We focus on intuition and materials science connections, not formal proofs. The goal is recognition: when you see a matrix in an ML paper, you should immediately know what it represents physically.
1. Vectors — representing a material
In ML, every material is represented as a feature vector — a list of numbers that describes it. You already think this way when you prepare a DFT input file.
For example, a simple feature vector for a binary compound might be:
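The sketch below shows what such a vector could look like for GaAs. The three features and their names are assumptions chosen for illustration, and the numbers are approximate; a real descriptor set would usually be much longer.

```python
# Hypothetical three-component feature vector for the binary compound GaAs.
# Feature choice and names are illustrative assumptions; values are approximate.
feature_names = [
    "mean_atomic_number",            # (31 + 33) / 2
    "electronegativity_difference",  # Pauling scale, approximate
    "lattice_constant_angstrom",     # approximate room-temperature value
]
x_gaas = [32.0, 0.37, 5.65]

# Pair each number with its meaning so the vector stays readable.
features = dict(zip(feature_names, x_gaas))
for name, value in features.items():
    print(f"{name:30s} {value:6.2f}")
```

The order of the entries is arbitrary, but it must be the same for every material in the dataset.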
Vector length (norm)
‖x‖ = √(x₁² + x₂² + … + xₙ²)
Measures the "size" of a feature vector. Used for normalisation.
Dot product
x·w = Σ xᵢwᵢ
The core operation of every neuron: weighted sum of inputs.
Cosine similarity
cos θ = (x·y) / (‖x‖‖y‖)
Measures how similar two materials are in feature space.
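The three operations above fit in a few lines of plain Python. This is a minimal sketch using only the standard library; the two-component vectors `x` and `y` are toy values, not real material descriptors.

```python
import math

def dot(x, y):
    """Dot product: sum of element-wise products."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(dot(x, x))

def cosine_similarity(x, y):
    """Cosine of the angle between x and y (1 = same direction)."""
    return dot(x, y) / (norm(x) * norm(y))

# Toy vectors chosen so the arithmetic is easy to check by hand.
x = [3.0, 4.0]
y = [4.0, 3.0]

print(norm(x))                  # sqrt(9 + 16) = 5
print(dot(x, y))                # 12 + 12 = 24
print(cosine_similarity(x, y))  # 24 / (5 * 5) = 0.96
```

Cosine similarity ignores vector length, so it is only meaningful if the features have comparable scales; otherwise the largest feature dominates the result.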
In DFT, the electron density n(r) is a vector in a very high-dimensional space (one value per grid point). The Kohn-Sham wavefunctions ψᵢ(r) are also vectors — you expand them in a basis set (plane waves, APW+lo…). ML feature vectors are exactly the same concept, just much lower dimensional.
2. Matrices — datasets and transformations
A dataset of N materials, each described by d features, is naturally represented as a matrix:
X =
| x₁₁ x₁₂ ··· x₁d | ← material 1
| x₂₁ x₂₂ ··· x₂d | ← material 2
| ⋮ |
| xN₁ xN₂ ··· xNd | ← material N

Shape: N × d (N materials, d features each)

| Matrix operation | ML meaning | Materials example |
|---|---|---|
| X · w | Apply weights to all features | Predict band gap for all N materials at once |
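| Xᵀ · X | Feature covariance matrix (for mean-centred X, up to a factor 1/N) | How correlated are lattice constant and band gap? |
| (XᵀX)⁻¹Xᵀ | Pseudo-inverse: solve the least-squares system (X itself is N × d and usually not invertible) | Exact solution of linear regression |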
| SVD(X) | Dimensionality reduction (PCA) | Find the 2 main axes of variation in a materials database |
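As a rough illustration of the first two rows of the table, the sketch below computes X · w and Xᵀ · X for a tiny dataset in plain Python. The matrix `X` and weights `w` are made-up toy numbers, and the helper names `matvec` and `gram` are chosen here for illustration.

```python
# Toy dataset: N = 3 materials (rows), d = 2 features (columns).
X = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
w = [0.5, -1.0]  # one weight per feature

def matvec(X, w):
    """X · w: one dot product per row, i.e. one prediction per material."""
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]

def gram(X):
    """Xᵀ · X: a d × d matrix of feature-feature products summed over materials."""
    d = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]

predictions = matvec(X, w)
G = gram(X)

print(predictions)  # [-1.5, -2.5, -3.5]
print(G)            # [[35.0, 44.0], [44.0, 56.0]]
```

Note that `G` only becomes a covariance matrix once each column of `X` has had its mean subtracted and the result is divided by N.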
In WIEN2k or VASP, the Hamiltonian H and overlap S matrices are the central objects: you solve the generalised eigenvalue problem Hψ = εSψ. In ML, you solve the normal equations XᵀXw = Xᵀy for the weights. One is an eigenvalue problem and the other a linear system, but both rest on the same matrix algebra, with a different physical meaning.
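To make the normal equations XᵀXw = Xᵀy concrete, here is a minimal sketch for the special case of two features, solved with Cramer's rule in plain Python. The function name `solve_normal_equations_2d` and the data are assumptions for illustration; a practical code would call a linear algebra library instead.

```python
def solve_normal_equations_2d(X, y):
    """Solve (XᵀX) w = Xᵀy for a dataset with exactly two features."""
    # Entries of the symmetric 2 × 2 matrix A = XᵀX.
    a11 = sum(r[0] * r[0] for r in X)
    a12 = sum(r[0] * r[1] for r in X)
    a22 = sum(r[1] * r[1] for r in X)
    # Right-hand side b = Xᵀy.
    b1 = sum(r[0] * t for r, t in zip(X, y))
    b2 = sum(r[1] * t for r, t in zip(X, y))
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("XᵀX is singular: the two features are linearly dependent")
    # Cramer's rule for a 2 × 2 system.
    return [(b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

# Toy data generated from y = 1 + 2·x; the first column of ones acts as the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]

w = solve_normal_equations_2d(X, y)
print(w)  # intercept 1 and slope 2 are recovered
```

Because the toy data lie exactly on a line, the fit is exact; with noisy data the same equations give the least-squares best fit.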
3. Derivatives and gradients — how ML learns
Machine learning training is fundamentally an optimisation problem: find the model parameters that minimise the prediction error. This is done using gradients.
Gradient descent updates the weights as w ← w − η ∂L/∂w, where the learning rate η controls how large each step is during optimisation. Too large → the model oscillates and diverges. Too small → training takes forever. Choosing η is one of the most important practical decisions in ML, analogous to choosing the SCF mixing parameter in DFT.
In DFT geometry optimisation, you compute the forces F = −∂E/∂R and move atoms downhill. In ML, you compute the gradient of the loss ∂L/∂w and move the weights downhill. Both are gradient descent on an energy-like surface.
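The effect of the step size can be seen on a one-parameter toy problem. The sketch below minimises the assumed loss L(w) = (w − 3)², whose minimum is at w = 3, once with a small and once with a too-large step size; the function name `gradient_descent` and all numbers are illustrative.

```python
def gradient_descent(eta, w0=0.0, steps=50):
    """Minimise the toy loss L(w) = (w - 3)^2 by plain gradient descent."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)  # dL/dw
        w = w - eta * grad      # step against the gradient
    return w

# Each step multiplies the distance to the minimum by (1 - 2*eta).
w_good = gradient_descent(eta=0.1)  # factor 0.8: converges towards 3
w_bad = gradient_descent(eta=1.1)   # factor -1.2: oscillates and diverges

print(f"eta = 0.1 -> w = {w_good:.6f}")
print(f"eta = 1.1 -> w = {w_bad:.3e}")
```

For this quadratic loss any step size below 1 converges; real loss surfaces have no such simple bound, which is why the step size is usually tuned by trial.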
4. Probability and statistics — understanding data
ML models deal with uncertainty — data has noise, and predictions are never exact. Probability gives us the language to quantify this.
Mean (μ)
μ = (1/N) Σ xᵢ
Average value. Used to centre your feature data before training.
Variance (σ²)
σ² = (1/N) Σ (xᵢ−μ)²
Spread of the data. Large σ² → widely scattered band gaps in your dataset.
Correlation
ρ = Cov(x,y) / (σₓ σᵧ)
Are two features linearly related? e.g. does lattice constant correlate with band gap?
Normal distribution
p(x) = 1/(σ√(2π)) · exp(−(x−μ)²/(2σ²))
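Most ML noise assumptions are Gaussian; minimising a squared error, for example, corresponds to assuming Gaussian noise on the targets. DFT errors are only roughly Gaussian: they often include a systematic part, such as the band gap underestimation of GGA.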
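The four formulas above translate directly into plain Python. This is a minimal sketch with toy data; it uses the 1/N (population) convention from the text, whereas many libraries default to 1/(N − 1).

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance, dividing by N as in the formula above."""
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / math.sqrt(variance(xs) * variance(ys))

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and width sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

# Toy data: ys is exactly twice xs, so the two are perfectly correlated.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

print(mean(xs))                   # 2.5
print(variance(xs))               # 1.25
print(correlation(xs, ys))        # 1 up to rounding
print(normal_pdf(0.0, 0.0, 1.0))  # 1 / sqrt(2*pi), about 0.399
```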
5. The three most important ML metrics
When an ML model predicts materials properties, you need to measure how good the predictions are. Three metrics are universally used:
| Metric | Formula | What it means | Good value |
|---|---|---|---|
| MAE | (1/N) Σ \|ŷᵢ − yᵢ\| | Mean Absolute Error: average prediction error in the same units as the target | e.g. <0.2 eV for band gap |
| RMSE | √((1/N) Σ (ŷᵢ−yᵢ)²) | Root Mean Square Error: penalises large errors more than MAE | e.g. <0.3 eV for band gap |
| R² | 1 − SS_res/SS_tot | Coefficient of determination: 1.0 is perfect, 0 is no better than always predicting the mean, negative is worse than that | >0.95 is excellent |
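For band gap prediction, state-of-the-art ML models achieve MAE ≈ 0.1–0.2 eV on large datasets. This is smaller than the typical discrepancy between GGA-DFT and experiment (~0.5–1.0 eV), but the two numbers measure different things: a model trained on DFT band gaps reproduces DFT, including its systematic error. For formation energy, MAE < 50 meV/atom is considered excellent.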
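The three metrics can be computed by hand in a few lines. The sketch below is a plain-Python illustration; the function name `regression_metrics` and the four reference/predicted values are made up for the example and are not real band gaps.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R²) for two equally long lists of numbers."""
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    y_mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)   # spread of the targets
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Toy values in eV: two predictions are exact, two are off by 0.5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE  = {mae:.3f} eV")   # 0.250
print(f"RMSE = {rmse:.3f} eV")  # 0.354
print(f"R2   = {r2:.3f}")       # 0.900
```

RMSE comes out larger than MAE here because the two 0.5 eV misses are squared before averaging, which is exactly the sensitivity to large errors described in the table.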
Summary — the minimum math toolkit
| Concept | Why you need it | Where it appears |
|---|---|---|
| Vectors | Represent materials as numbers | Feature engineering, embeddings |
| Dot product | Core operation of every neuron | Linear layers, attention |
| Matrix multiply | Apply transformations to data | Neural network layers |
| Gradient | Direction of steepest increase of loss; step against it to reduce the loss | Backpropagation, optimisers |
| Mean & variance | Normalise features, understand data | Data preprocessing |
| MAE / RMSE / R² | Measure prediction quality | Model evaluation everywhere |