AI & Machine Learning for Materials Sciences

Post 3: Key Mathematical Tools — Vectors, Matrices & Probability

MODULE 1 · FOUNDATIONS

Every ML algorithm — from linear regression to deep neural networks — is built on three mathematical pillars: linear algebra, calculus, and probability. You already use all three in DFT without thinking about it. This post makes the connection explicit and gives you the minimum toolkit needed to understand the algorithms in the rest of this course.
💡 Philosophy of this post

We focus on intuition and materials science connections, not formal proofs. The goal is recognition: when you see a matrix in an ML paper, you should immediately know what it represents physically.

1. Vectors — representing a material

In ML, every material is represented as a feature vector — a list of numbers that describes it. You already think this way when you prepare a DFT input file.

x = [x₁, x₂, x₃, ..., xₙ]
(a material described by n numerical features)

For example, a simple feature vector for a binary compound might be:

x = [a, Z_A, Z_B, ΔEN, r_A, r_B, Eg_A, Eg_B]
(lattice parameter, atomic numbers, electronegativity difference, atomic radii, elemental band gaps)

📏

Vector length (norm)

‖x‖ = √(x₁² + x₂² + … + xₙ²)
Measures the "size" of a feature vector. Used for normalisation.

📐

Dot product

x·w = Σ xᵢwᵢ
The core operation of every neuron: weighted sum of inputs.

📊

Cosine similarity

cos θ = (x·y) / (‖x‖‖y‖)
Measures how similar two materials are in feature space.
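
The three operations above fit in a few lines of plain Python. This is a minimal sketch: the two feature vectors use the layout [a (Å), Z_A, Z_B, ΔEN], a shortened version of the example vector chosen here for illustration, and the numbers are approximate values for Si-like and GaAs-like entries.

```python
import math

def dot(x, w):
    """Dot product: sum of element-wise products."""
    return sum(xi * wi for xi, wi in zip(x, w))

def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(dot(x, x))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (1.0 means same direction)."""
    return dot(x, y) / (norm(x) * norm(y))

# Illustrative feature vectors: [lattice parameter (Å), Z_A, Z_B, ΔEN].
si_like = [5.43, 14, 14, 0.0]
gaas_like = [5.65, 31, 33, 0.37]

print("norm:", norm(si_like))
print("dot:", dot(si_like, gaas_like))
print("cosine similarity:", cosine_similarity(si_like, gaas_like))
```

Because the atomic numbers are much larger than the other entries, they dominate the cosine similarity here. That is one reason features are usually rescaled before vectors are compared.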

🔬 DFT connection

In DFT, the electron density n(r) is a vector in a very high-dimensional space (one value per grid point). The Kohn-Sham wavefunctions ψᵢ(r) are also vectors — you expand them in a basis set (plane waves, APW+lo…). ML feature vectors are exactly the same concept, just much lower dimensional.

2. Matrices — datasets and transformations

A dataset of N materials, each described by d features, is naturally represented as a matrix:

X = | x₁₁  x₁₂  ···  x₁d |   ← material 1
    | x₂₁  x₂₂  ···  x₂d |   ← material 2
    |  ⋮    ⋮          ⋮  |
    | xN₁  xN₂  ···  xNd |   ← material N

Shape: N × d (N materials, d features each)

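| Matrix operation | ML meaning | Materials example |
| --- | --- | --- |
| X · w | Apply weights to all features | Predict band gap for all N materials at once |
| Xᵀ · X | Feature covariance matrix (up to a 1/N factor, if features are centred) | How correlated are lattice constant and band gap? |
| (XᵀX)⁻¹ Xᵀ | Solve the least-squares linear system (X itself is usually not square, so X⁻¹ does not exist) | Exact solution of linear regression |
| SVD(X) | Dimensionality reduction (PCA) | Find the 2 main axes of variation in a materials database |
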
🔬 DFT connection

In WIEN2k or VASP, the Hamiltonian H and overlap S matrices are the central objects: you solve the generalised eigenvalue problem Hψ = εSψ. In linear regression, you solve the normal equations XᵀXw = Xᵀy for the weights. The linear-algebra machinery is the same; the physical meaning is different.
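
The normal equations (XᵀX)w = Xᵀy can be sketched in a few lines of NumPy. This is a minimal illustration on a synthetic, noise-free dataset: the feature values and the weights `true_w` are invented for the example and carry no physical meaning.

```python
import numpy as np

# Synthetic dataset: N = 5 "materials", d = 2 features (hypothetical values).
X = np.array([[1.0, 0.5],
              [2.0, 1.5],
              [3.0, 1.0],
              [4.0, 3.0],
              [5.0, 2.0]])
true_w = np.array([0.8, -0.3])   # weights used only to generate the targets
y = X @ true_w                   # noise-free targets, so the fit can be exact

# Normal equations: (X^T X) w = X^T y.
# Solving the linear system avoids forming an explicit inverse.
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based least-squares solver; better behaved when features are nearly collinear.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# One matrix-vector product gives predictions for all N materials at once.
y_pred = X @ w_normal
```

Both routes should recover the generating weights here. On real data with correlated features, the least-squares solver is usually the safer choice because XᵀX can be badly conditioned.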

3. Derivatives and gradients — how ML learns

Machine learning training is fundamentally an optimisation problem: find the model parameters that minimise the prediction error. This is done using gradients.

L(w) = (1/N) Σᵢ (ŷᵢ − yᵢ)²
Loss function: mean squared error between predictions ŷ and true labels y

w ← w − η · ∂L/∂w
Gradient descent update rule: η is the learning rate

⚠️ The learning rate η

η controls how large each step is during optimisation. Too large → the model oscillates and diverges. Too small → training takes forever. Choosing η is one of the most important practical decisions in ML — analogous to choosing the SCF mixing parameter in DFT.

🔬 DFT connection

In DFT geometry optimisation, you compute the forces F = −∂E/∂R and move atoms downhill. In ML, you compute the gradient of the loss ∂L/∂w and move the weights downhill. Both are gradient-based minimisation on an energy-like surface.
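
The update rule and the effect of η can be tried on the simplest possible model. The sketch below assumes a one-parameter model ŷ = w·x and synthetic data generated from y = 2x; the function names are hypothetical and chosen for this example.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the one-parameter model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w, xs, ys):
    """dL/dw for the loss above: (2/N) * sum((w*x - y) * x)."""
    return 2.0 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, eta, steps=50, w0=0.0):
    """Repeat w <- w - eta * dL/dw and return the final weight."""
    w = w0
    for _ in range(steps):
        w -= eta * mse_gradient(w, xs, ys)
    return w

# Synthetic data generated from y = 2x, so the optimum is w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_good = gradient_descent(xs, ys, eta=0.05)  # small enough step: approaches 2
w_bad = gradient_descent(xs, ys, eta=0.5)    # step too large: overshoots and blows up

print("eta = 0.05 ->", w_good)
print("eta = 0.5  ->", w_bad)
```

For this quadratic loss the curvature is d²L/dw² = 28/3, so the iteration only converges for η < 2/(28/3) ≈ 0.21. Real loss surfaces have no such clean bound, which is why η is tuned empirically.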

4. Probability and statistics — understanding data

ML models deal with uncertainty — data has noise, and predictions are never exact. Probability gives us the language to quantify this.

📈

Mean (μ)

μ = (1/N) Σ xᵢ
Average value. Used to centre your feature data before training.

📉

Variance (σ²)

σ² = (1/N) Σ (xᵢ−μ)²
Spread of the data. Large σ² → widely scattered band gaps in your dataset.

🔗

Correlation

ρ = Cov(x,y) / (σₓ σᵧ)
Are two features linearly related? e.g. does lattice constant correlate with band gap?

🎯

Normal distribution

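p(x) = (1/(σ√(2π))) exp(−(x−μ)²/(2σ²))
Many ML models assume Gaussian noise; minimising the mean squared error corresponds to maximum likelihood under that assumption. DFT errors are often treated as approximately Gaussian too.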
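
The mean, variance and correlation formulas above translate directly into code. This is a minimal sketch using the same 1/N convention as the variance formula; the `lattice` and `gap` lists are made-up numbers for illustration, not measured data.

```python
import math

def mean(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance (1/N convention, matching the formula above)."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    """Pearson correlation: Cov(x, y) / (sigma_x * sigma_y)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / math.sqrt(variance(xs) * variance(ys))

def standardise(xs):
    """Centre to zero mean and scale to unit variance."""
    mu, sigma = mean(xs), math.sqrt(variance(xs))
    return [(x - mu) / sigma for x in xs]

# Made-up feature and target values (illustration only).
lattice = [4.0, 5.0, 6.0]   # lattice constants in Å
gap = [3.0, 2.0, 1.0]       # band gaps in eV

print("mean:", mean(lattice))
print("variance:", variance(lattice))
print("correlation:", correlation(lattice, gap))
print("standardised:", standardise(lattice))
```

Standardising each feature this way puts quantities with very different units, such as Å and eV, on a common scale before training.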

5. The three most important ML metrics

When an ML model predicts materials properties, you need to measure how good the predictions are. Three metrics are universally used:

| Metric | Formula | What it means | Good value |
| --- | --- | --- | --- |
| MAE | (1/N) Σ ∣ŷᵢ − yᵢ∣ | Mean Absolute Error: average prediction error in the same units as the target | e.g. <0.2 eV for band gap |
| RMSE | √((1/N) Σ (ŷᵢ − yᵢ)²) | Root Mean Square Error: penalises large errors more than MAE | e.g. <0.3 eV for band gap |
| R² | 1 − SS_res/SS_tot | Coefficient of determination: 1.0 is perfect, 0 means no better than always predicting the mean | >0.95 is excellent |

⚗️ Materials science context

For band gap prediction, state-of-the-art ML models achieve MAE ≈ 0.1–0.2 eV on large datasets. This is smaller than the typical error between GGA-DFT and experiment (~0.5–1.0 eV). For formation energy, MAE < 50 meV/atom is considered excellent.
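
The three metrics can be computed by hand to see how they relate. The following sketch uses four invented band-gap values in eV; `y_true` and `y_pred` are illustrative and not taken from any dataset.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error; large errors weigh more than in MAE."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Invented band gaps in eV (illustration only).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

print("MAE :", mae(y_true, y_pred))    # approximately 0.15 eV
print("RMSE:", rmse(y_true, y_pred))   # approximately 0.158 eV
print("R²  :", r2(y_true, y_pred))     # approximately 0.98
```

RMSE is never smaller than MAE; a large gap between the two suggests a few predictions with unusually large errors.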

Summary — the minimum math toolkit

| Concept | Why you need it | Where it appears |
| --- | --- | --- |
| Vectors | Represent materials as numbers | Feature engineering, embeddings |
| Dot product | Core operation of every neuron | Linear layers, attention |
| Matrix multiply | Apply transformations to data | Neural network layers |
| Gradient | Direction of steepest increase of loss | Backpropagation, optimisers |
| Mean & variance | Normalise features, understand data | Data preprocessing |
| MAE / RMSE / R² | Measure prediction quality | Model evaluation everywhere |

⚡ Interactive Companion App
Math Tools Explorer
Explore vectors, the gradient descent optimiser, and test your metrics knowledge live.

Quick check

1. A material is described by the vector x = [3.5, 26, 8, 1.83]. What does this representation assume?
2. In gradient descent, what does the learning rate η control?
3. An ML model predicts band gaps with MAE = 0.15 eV and R² = 0.97. How would you describe this model?