Post 3: Key Mathematical Tools — Vectors, Matrices & Probability

MODULE 1 · FOUNDATIONS

Every ML algorithm — from linear regression to deep neural networks — is built on three mathematical pillars: linear algebra, calculus, and probability. You already use all three in DFT without thinking about it. This post makes the connection explicit and gives you the minimum toolkit needed to understand the algorithms in the rest of this course.

💡 Philosophy of this post

We focus on intuition and materials science connections, not formal proofs. The goal is recognition: when you see a matrix in an ML paper, you should immediately know what it represents physically.

1. Vectors — representing a material

In ML, every material is represented as a feature vector — a list of numbers that describes it. You already think this way when you prepare a DFT input file.

x = [x₁, x₂, x₃, ..., xₙ]
A material described by n numerical features.

For example, a simple feature vector for a binary compound might be:

x = [a, Z_A, Z_B, ΔEN, r_A, r_B, Eg_A, Eg_B]
Lattice parameter · atomic numbers · electronegativity difference · atomic radii · elemental band gaps

📏

Vector length (norm)

‖x‖ = √(x₁² + x₂² + … + xₙ²)
Measures the "size" of a feature vector. Used for normalisation.

📐

Dot product

x·w = Σ xᵢwᵢ
The core operation of every neuron: weighted sum of inputs.

📊

Cosine similarity

cos θ = (x·y) / (‖x‖‖y‖)
Measures how similar two materials are in feature space.
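
To make the three operations above concrete, here is a minimal sketch in plain Python. The two feature vectors and the choice of four features are illustrative assumptions, not values from a real dataset; the helper names `dot`, `norm` and `cosine_similarity` are ours.

```python
import math

# Illustrative feature vectors (assumed values):
# [lattice parameter in angstrom, Z_A, Z_B, electronegativity difference]
x = [5.43, 14.0, 14.0, 0.00]
y = [5.65, 31.0, 33.0, 0.37]

def dot(u, v):
    """Dot product: sum of element-wise products."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    return dot(u, v) / (norm(u) * norm(v))

print("norm of x:        ", norm(x))
print("dot product x.y:  ", dot(x, y))
print("cosine similarity:", cosine_similarity(x, y))
```

Note that the atomic numbers are much larger than the other entries, so they dominate all three quantities. This is why features are usually rescaled to comparable ranges before similarities are computed.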

🔬 DFT connection

In DFT, the electron density n(r) is a vector in a very high-dimensional space (one value per grid point). The Kohn-Sham wavefunctions ψᵢ(r) are also vectors — you expand them in a basis set (plane waves, APW+lo…). ML feature vectors are exactly the same concept, just much lower dimensional.

2. Matrices — datasets and transformations

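A dataset of N materials, each described by d features, is naturally represented as an N × d matrix X: one row per material, one column per feature. The operations you will meet most often are: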

Matrix operation	ML meaning	Materials example
X · w	Apply weights to all features	Predict band gap for all N materials at once
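Xᵀ · X	Feature covariance matrix (for mean-centred X, up to a factor 1/N)	How correlated are lattice constant and band gap?
X⁺ = (XᵀX)⁻¹Xᵀ	Pseudo-inverse: least-squares solution of a linear system (X itself is rarely square, so X⁻¹ does not exist)	Closed-form solution of linear regression, w = X⁺y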
SVD(X)	Dimensionality reduction (PCA)	Find the 2 main axes of variation in a materials database
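
The first two rows of the table can be sketched in a few lines of plain Python. The 3 × 2 dataset and the weights are made-up numbers, and `matvec`, `transpose` and `matmul` are small helper functions written for this example; in practice a numerical library would do the same job.

```python
# Hypothetical dataset: N = 3 materials (rows), d = 2 features (columns)
X = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
w = [0.5, -1.0]  # hypothetical weights, one per feature

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

predictions = matvec(X, w)      # X . w: one prediction per material
gram = matmul(transpose(X), X)  # X^T X: a d x d matrix

print(predictions)  # [-1.5, -2.5, -3.5]
print(gram)         # [[35.0, 44.0], [44.0, 56.0]]
```

A single matrix-vector product yields the predictions for all N materials, and XᵀX has size d × d regardless of how many materials are in the dataset.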

🔬 DFT connection

In WIEN2k or VASP, the Hamiltonian H and overlap S matrices are the central objects: you solve the generalised eigenvalue problem Hψ = εSψ. In ML, you solve the normal equations XᵀXw = Xᵀy for the weights. One is an eigenvalue problem and the other a linear system, but both are built from the same matrix operations, with a different physical meaning.
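
As a small worked example of XᵀXw = Xᵀy, the sketch below fits a straight line ŷ = w·x + b to four made-up data points. It assumes a design matrix with columns [1, x], so XᵀX is only 2 × 2 and can be solved by hand with Cramer's rule; the data are hypothetical and generated from y = 2x + 1.

```python
# Hypothetical data: one feature x_i per material and a target y_i
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

n = len(xs)

# Entries of X^T X and X^T y for a design matrix with columns [1, x]
s_x = sum(xs)
s_xx = sum(x * x for x in xs)
s_y = sum(ys)
s_xy = sum(x * y for x, y in zip(xs, ys))

# Solve [[n, s_x], [s_x, s_xx]] [b, w]^T = [s_y, s_xy]^T with Cramer's rule
det = n * s_xx - s_x * s_x
b = (s_y * s_xx - s_x * s_xy) / det
w = (n * s_xy - s_x * s_y) / det

print(w, b)  # 2.0 1.0
```

The recovered slope and intercept match the line used to generate the data. With d features the same equations give a d × d system, which is solved with a matrix factorisation, not by hand.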

3. Derivatives and gradients — how ML learns

Machine learning training is fundamentally an optimisation problem: find the model parameters that minimise the prediction error. This is done using gradients.

L(w) = (1/N) Σᵢ (ŷᵢ − yᵢ)²
Loss function: mean squared error between predictions ŷ and true labels y.

w ← w − η · ∂L/∂w
Gradient descent update rule: η is the learning rate.
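
The loss and update rule can be sketched for the simplest possible model, ŷ = w·x with a single weight. The data, the starting weight, the learning rate and the number of steps are all assumptions chosen for illustration.

```python
# Hypothetical data generated from y = 2x, so the best weight is w = 2
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
n = len(xs)

def loss(w):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n

def grad(w):
    """Derivative dL/dw = (2/N) * sum((w*x - y) * x)."""
    return (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))

w = 0.0     # initial guess (assumed)
eta = 0.05  # learning rate (assumed)
for step in range(200):
    w -= eta * grad(w)  # gradient descent update: w <- w - eta * dL/dw

print("fitted w:", w)
print("final loss:", loss(w))
```

Each step moves w against the gradient, so the loss decreases until w settles at the value that generated the data.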

⚠️ The learning rate η

η controls how large each step is during optimisation. Too large → the weights overshoot the minimum and the loss oscillates or diverges. Too small → training takes forever. Choosing η is one of the most important practical decisions in ML, analogous to choosing the SCF mixing parameter in DFT.
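
You can see all three regimes on a toy loss L(w) = w², whose gradient is 2w and whose minimum is at w = 0. The function `run` and the three η values below are illustrative choices, not recommendations.

```python
def run(eta, steps=20, w0=1.0):
    """Run gradient descent on L(w) = w**2 and return the final w."""
    w = w0
    for _ in range(steps):
        w -= eta * 2.0 * w  # gradient of w**2 is 2w
    return w

for eta in (0.001, 0.1, 1.1):
    print(f"eta = {eta}: w after 20 steps = {run(eta):.4f}")
```

With η = 0.001 the weight has barely moved from its starting value of 1, with η = 0.1 it is close to the minimum, and with η = 1.1 every step overshoots and |w| grows instead of shrinking.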

🔬 DFT connection

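In DFT geometry optimisation, you compute the forces F = −∂E/∂R and move atoms downhill. In ML, you compute the gradient of the loss ∂L/∂w and move the weights downhill. Both follow the negative gradient on an energy-like surface (production relaxation codes usually use conjugate-gradient or quasi-Newton variants of the same idea).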

4. Probability and statistics — understanding data

ML models deal with uncertainty — data has noise, and predictions are never exact. Probability gives us the language to quantify this.

📈

Mean (μ)

μ = (1/N) Σ xᵢ
Average value. Used to centre your feature data before training.

📉

Variance (σ²)

σ² = (1/N) Σ (xᵢ−μ)²
Spread of the data. Large σ² → widely scattered band gaps in your dataset.

🔗

Correlation

ρ = Cov(x,y) / (σₓ σᵧ)
Are two features linearly related? e.g. does lattice constant correlate with band gap?

🎯

Normal distribution

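p(x) = 1/(σ√(2π)) · exp(−(x−μ)² / (2σ²))
Many ML noise models assume Gaussian errors; minimising the mean squared error corresponds to exactly this assumption. DFT errors are often treated as roughly Gaussian too, once any systematic offset is removed.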
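
The mean, variance and correlation formulas above translate directly into code. The sketch below uses made-up lattice constants and band gaps; `standardise` applies the usual centre-and-scale step (subtract μ, divide by σ), and all function names are ours.

```python
import math

# Hypothetical data for three materials
a = [4.0, 5.0, 6.0]   # lattice constants (angstrom)
eg = [3.0, 2.0, 1.0]  # band gaps (eV)

def mean(v):
    return sum(v) / len(v)

def variance(v):
    """Population variance: (1/N) * sum((x - mu)**2)."""
    mu = mean(v)
    return sum((x - mu) ** 2 for x in v) / len(v)

def standardise(v):
    """Centre to zero mean and scale to unit variance."""
    mu, sigma = mean(v), math.sqrt(variance(v))
    return [(x - mu) / sigma for x in v]

def correlation(u, v):
    """Pearson correlation: Cov(u, v) / (sigma_u * sigma_v)."""
    mu_u, mu_v = mean(u), mean(v)
    cov = sum((x - mu_u) * (y - mu_v) for x, y in zip(u, v)) / len(u)
    return cov / math.sqrt(variance(u) * variance(v))

print("mean:", mean(a), "variance:", variance(a))
print("standardised:", standardise(a))
print("correlation:", correlation(a, eg))
```

In this toy dataset the band gap falls by exactly 1 eV for every 1 Å increase in lattice constant, so the correlation comes out at −1 (up to rounding): a perfect negative linear relationship.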

5. The three most important ML metrics

When an ML model predicts materials properties, you need to measure how good the predictions are. Three metrics are universally used:

Metric	Formula	What it means	Good value
MAE	(1/N) Σ |ŷᵢ − yᵢ|	Mean Absolute Error: average prediction error, in the same units as the target	e.g. <0.2 eV for band gap
RMSE	√((1/N) Σ (ŷᵢ − yᵢ)²)	Root Mean Square Error: penalises large errors more than MAE	e.g. <0.3 eV for band gap
R²	1 − SS_res/SS_tot	Coefficient of determination: 1.0 is perfect, 0 is no better than always predicting the mean, and negative values are worse than that	>0.95 is excellent
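
All three metrics take only a few lines to compute. The sketch below uses four made-up band gaps and predictions; SS_res is the sum of squared residuals and SS_tot the sum of squared deviations of the true values from their mean.

```python
import math

# Hypothetical true and predicted band gaps (eV)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
n = len(y_true)

def mae(y, y_hat):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y, y_hat)) / len(y))

def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, y_hat))
    ss_tot = sum((t - y_mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

print("MAE :", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("R2  :", r2(y_true, y_pred))
```

For these numbers the MAE is about 0.15 eV, the RMSE is slightly larger at about 0.16 eV because the two 0.2 eV errors are weighted more heavily, and R² is about 0.98.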

⚗️ Materials science context

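For band gap prediction, state-of-the-art ML models achieve MAE ≈ 0.1–0.2 eV on large datasets. This is smaller than the typical discrepancy between GGA-DFT and experiment (~0.5–1.0 eV), though the ML error is normally measured against the DFT labels used for training, not against experiment. For formation energy, MAE < 50 meV/atom is considered excellent.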

Summary — the minimum math toolkit

Concept	Why you need it	Where it appears
Vectors	Represent materials as numbers	Feature engineering, embeddings
Dot product	Core operation of every neuron	Linear layers, attention
Matrix multiply	Apply transformations to data	Neural network layers
Gradient	Direction of steepest increase of loss; training steps in the opposite direction	Backpropagation, optimisers
Mean & variance	Normalise features, understand data	Data preprocessing
MAE / RMSE / R²	Measure prediction quality	Model evaluation everywhere


Quick check

1. A material is described by the vector x = [3.5, 26, 8, 1.83]. What does this representation assume?

2. In gradient descent, what does the learning rate η control?

3. An ML model predicts band gaps with MAE = 0.15 eV and R² = 0.97. How would you describe this model?

Coming up in this module

Post 4

Types of ML — Supervised, Unsupervised, Reinforcement

Post 5

Linear Regression — Predicting Band Gap from Simple Features

Post 6

Classification — Metal or Insulator?
