Post 2: ML vs Traditional Simulation — Where does DFT end and ML begin?

MODULE 1 · FOUNDATIONS

In Post 1 we established what AI and Machine Learning are. Now we face the most important practical question for a computational materials scientist: where exactly does traditional simulation end and machine learning begin? And more importantly — when should you use each?

The traditional simulation workflow

You already know this workflow well. It starts with a crystal structure and ends with a physical property, and every step in between is grounded in quantum mechanics:

🔬 Crystal Structure (atomic positions, cell) → ⚙️ DFT Setup (XC functional, basis set) → 🧮 SCF Cycle (solve KS equations) → 📊 Property (Eg, Ef, DOS, …)

This is a physics-first approach. The governing equations (the Kohn-Sham equations, together with a chosen exchange-correlation functional) are fixed in advance. No training data is needed, only the structure. The calculation is deterministic: the same input and settings produce the same output, up to numerical noise.

💡 Key characteristic of DFT

DFT solves the problem from first principles — no experimental data, no training. It can predict properties of a material that has never been synthesised. This is its greatest strength.
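
To make the "SCF Cycle" step more concrete, here is a minimal sketch of a self-consistency loop. It is not DFT: it is a toy mean-field model with a single level whose energy depends on its own occupation, and every name and parameter in it (`toy_scf`, `eps0`, `U`, `mixing`) is an assumption chosen for illustration. What it shares with a real SCF cycle is the structure: guess, recompute, mix, repeat until input and output agree.

```python
import math

def occupation(level_energy, mu=0.0, kT=0.05):
    """Fermi-Dirac occupation of a single level (toy stand-in for 'solve KS equations')."""
    return 1.0 / (1.0 + math.exp((level_energy - mu) / kT))

def toy_scf(eps0=-0.2, U=1.0, n_start=0.5, mixing=0.3, tol=1e-10, max_iter=500):
    """Iterate n -> occupation(eps0 + U*n) with linear mixing until self-consistent.

    Hypothetical toy model: eps0 is the bare level, U couples the level
    energy to its own occupation n. Returns (converged n, iterations used).
    """
    n = n_start
    for iteration in range(1, max_iter + 1):
        effective_level = eps0 + U * n          # "potential" built from current "density"
        n_out = occupation(effective_level)     # new "density" from that potential
        if abs(n_out - n) < tol:                # input and output agree: converged
            return n_out, iteration
        n = (1.0 - mixing) * n + mixing * n_out # linear mixing damps oscillations
    raise RuntimeError("SCF did not converge")

n_converged, n_iterations = toy_scf()
print(f"converged occupation: {n_converged:.6f} after {n_iterations} iterations")
```

Running `toy_scf()` twice with the same arguments gives the same occupation and the same iteration count, which is the determinism described above. The linear mixing step plays the same role as density mixing in a real DFT code, though production codes typically use more elaborate schemes.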

The machine learning workflow

The ML workflow looks very different. Instead of solving equations, it learns patterns from existing data:

🗄️ Dataset (structures + DFT labels) → 🧠 Training (model learns patterns) → ⚡ Prediction (new structure → property) → 📊 Property (Eg, Ef, … fast!)

The ML model does not solve the Schrödinger equation. It does not know quantum mechanics. It has simply seen thousands of (structure → property) pairs and learned to interpolate. Once trained, prediction takes milliseconds instead of hours.
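
The idea of "seeing pairs and learning to interpolate" can be sketched in a few lines. The example below is a deliberately simple assumption: each structure is reduced to a single made-up descriptor `x`, the function `reference_property` is a stand-in for a DFT label (it is not physics), and the model is a plain k-nearest-neighbour average rather than a neural network.

```python
import math

def reference_property(x):
    """Stand-in for an expensive DFT label; a made-up smooth function."""
    return 1.0 + 0.5 * math.sin(3.0 * x)

# "Training set": descriptor values on [0, 1] with their reference labels.
train_x = [i / 50 for i in range(51)]
train_y = [reference_property(x) for x in train_x]

def knn_predict(x, k=2):
    """Predict by averaging the labels of the k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

# A "new structure" that lies between training points.
x_new = 0.437
error = abs(knn_predict(x_new) - reference_property(x_new))
print(f"prediction error inside the training range: {error:.4f}")
```

The model never evaluates `reference_property` at prediction time. It only looks up what it has already seen, which is why prediction is cheap and why the quality of the answer depends on how well the training set covers the new point.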

⚠️ Critical limitation

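An ML model can only predict reliably for materials similar to those in its training set. It cannot extrapolate safely beyond its training distribution. DFT has no such dependence on training data: the same equations apply to any material, although the accuracy still depends on the chosen exchange-correlation functional.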
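
One simple way to guard against this, sketched below under strong simplifying assumptions, is to measure how far a new input lies from the nearest training example and to flag predictions that are too far away. The one-dimensional descriptor, the stand-in label function `reference_property`, and the threshold `max_distance=0.05` are all hypothetical choices for illustration. Real applicability-domain checks work in a high-dimensional descriptor space.

```python
import math

def reference_property(x):
    """Stand-in for an expensive DFT label; a made-up smooth function."""
    return 1.0 + 0.5 * math.sin(3.0 * x)

# Training data covers only the descriptor range [0, 1].
train_x = [i / 50 for i in range(51)]
train_y = [reference_property(x) for x in train_x]

def nearest_neighbour(x):
    """Return (label, distance) of the closest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i], abs(train_x[i] - x)

def predict_with_flag(x, max_distance=0.05):
    """1-NN prediction plus a flag: True if x is close to the training data."""
    prediction, distance = nearest_neighbour(x)
    return prediction, distance <= max_distance

inside = predict_with_flag(0.5)    # within the training range
outside = predict_with_flag(2.5)   # far outside the training range

print("inside :", inside, "true value:", reference_property(0.5))
print("outside:", outside, "true value:", reference_property(2.5))
```

For the point outside the training range the model still returns a number, but it is simply the label of the closest training point and has little to do with the true value. The flag is what tells you to fall back to DFT.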

Side-by-side comparison

Property	DFT / Ab initio	Machine Learning
Physical basis	✓ Quantum mechanics	✗ Statistical patterns
Requires training data	✓ No — only structure	✗ Yes — thousands of examples
Prediction speed	Slow — hours to days	Fast — milliseconds
Accuracy	High (within XC error)	Variable — depends on training
Extrapolation	✓ Any material	✗ Similar to training data only
Interpretability	High — physics-based	Low — often a black box
Screening 100,000 compounds	✗ Impractical (decades on a single node)	✓ Done in seconds to minutes
New compound (never seen)	✓ No problem	⚠️ Risky — uncertain reliability

The real workflow: DFT feeds ML

In modern computational materials science, DFT and ML are not competitors — they are partners in the same pipeline. DFT generates the reliable, physics-based data that ML needs to train on.

🔬 The real combined workflow

Step 1 — Generate data: Run DFT on hundreds or thousands of known structures. Calculate band gaps, formation energies, elastic constants.

Step 2 — Train ML model: Use the DFT results as labels. Train a model (e.g. ALIGNN, CGCNN) to predict these properties from structure.

Step 3 — Screen: Use the ML model to rapidly screen 100,000 candidate compounds. Cost: seconds to minutes.

Step 4 — Validate: Take the top 10–20 candidates identified by ML. Run full DFT calculations to verify. Cost: manageable.

Loops of this kind have been used to search for new battery materials, catalysts, and topological insulators, at a pace that DFT alone could not match.
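
The four steps can be sketched end to end in plain Python. Everything here is a stand-in: `run_dft` is a hypothetical cheap function playing the role of an expensive DFT calculation, each candidate is reduced to one random descriptor, and the surrogate is a 1-nearest-neighbour lookup rather than ALIGNN or CGCNN. The point is only the shape of the loop and the count of expensive calls.

```python
import math
import random

def run_dft(x):
    """Hypothetical stand-in for an expensive DFT band-gap calculation (eV)."""
    return 1.2 + 0.8 * math.sin(4.0 * x)

random.seed(0)
candidates = [random.random() for _ in range(2000)]  # made-up 1-D descriptors

# Step 1: generate data by running "DFT" on a subset of structures.
train_x = candidates[:200]
train_y = [run_dft(x) for x in train_x]

# Step 2: "train" a surrogate (here a 1-nearest-neighbour lookup).
def surrogate(x):
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

# Step 3: screen every candidate with the cheap surrogate.
target_gap = 1.25
ranked = sorted(candidates, key=lambda x: abs(surrogate(x) - target_gap))
shortlist = ranked[:10]

# Step 4: validate only the shortlist with "DFT".
validated = [(x, run_dft(x)) for x in shortlist]
confirmed = [(x, gap) for x, gap in validated if 1.0 <= gap <= 1.5]

dft_calls = len(train_x) + len(shortlist)
print(f"expensive calls: {dft_calls} instead of {len(candidates)}")
print(f"confirmed candidates: {len(confirmed)} of {len(shortlist)}")
```

The validation step matters even in this toy: the shortlist is ranked by the surrogate's opinion, and only the final `run_dft` call decides whether a candidate actually falls inside the target window.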

A concrete example: band gap prediction

Suppose you want to find a semiconductor with a band gap between 1.0 and 1.5 eV (close to the optimum for single-junction solar cells) from a database of 100,000 candidate structures.

⚗️ Pure DFT approach

Each DFT calculation takes ~4 hours on a 16-core node. For 100,000 structures: 400,000 node-hours (6.4 million core-hours). Equivalent to about 45 years on a single node. Clearly impractical.

🤖 DFT + ML approach

Run DFT on 10,000 structures (40,000 node-hours, feasible on a cluster). Train an ML model. Screen all 100,000 structures with ML in minutes. Run DFT only on the top 50 candidates (200 node-hours). Total cost: ~40,200 node-hours (about 643,000 core-hours), neglecting the comparatively small cost of training and inference. Speedup: ~10×.
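
The arithmetic behind this comparison is easy to reproduce and to vary. The sketch below counts wall-clock hours on one 16-core node and also converts to core-hours. The function name `campaign_cost` and its parameters are hypothetical, and it assumes that ML training and inference are cheap enough to neglect next to the DFT runs.

```python
def campaign_cost(n_candidates, n_train, n_validate, hours_per_calc=4.0, cores=16):
    """Compare pure-DFT screening with a DFT+ML campaign.

    Assumes every DFT run takes `hours_per_calc` wall-clock hours on one
    node with `cores` cores, and neglects ML training and inference cost.
    """
    pure_dft_hours = n_candidates * hours_per_calc
    combined_hours = (n_train + n_validate) * hours_per_calc
    return {
        "pure_dft_node_hours": pure_dft_hours,
        "combined_node_hours": combined_hours,
        "pure_dft_core_hours": pure_dft_hours * cores,
        "combined_core_hours": combined_hours * cores,
        "pure_dft_years_one_node": pure_dft_hours / (24 * 365),
        "speedup": pure_dft_hours / combined_hours,
    }

# Numbers from the example: 100,000 candidates, 10,000 training runs, 50 validations.
cost = campaign_cost(n_candidates=100_000, n_train=10_000, n_validate=50)
for name, value in cost.items():
    print(f"{name:>26}: {value:,.1f}")
```

Changing `n_train` shows where the speedup comes from: it is roughly the ratio of candidates to training structures, so a model that reaches useful accuracy with fewer DFT labels pays off directly.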

When to use each approach

Situation	Best approach
Studying one new material in depth	DFT — maximum accuracy, full physics
Screening thousands of candidates	ML — after training on DFT data
Unknown compound class (no training data)	DFT — ML cannot extrapolate safely
High-throughput property prediction	ML — orders of magnitude faster
Validating ML predictions	DFT — always use DFT for final validation
Understanding electronic structure	DFT — DOS, band structure, Fermi surface

💡 The key insight

Think of ML as a fast surrogate model trained on DFT. It approximates DFT at a fraction of the cost — but DFT remains the ground truth. Never trust an ML prediction without DFT validation for critical decisions.

What makes a good ML model for materials?

Not all ML models are equal for materials science. A good model must respect the fundamental symmetries of physics:

Physical symmetry	What it means	Example
Rotational invariance	Rotating a crystal changes nothing physical	Band gap of Fe₃O₄ is the same in any orientation
Translational invariance	Shifting the origin changes nothing physical	Properties don't depend on where you place the unit cell
Permutation invariance	Order of atoms in the list doesn't matter	Swapping atom 1 and atom 2 in the input gives the same property

Graph neural networks such as CGCNN and ALIGNN, which you may have encountered, are designed to respect these symmetries by construction. This is a major reason why they outperform simple neural networks fed with raw coordinates for materials property prediction. We will study them in detail in Module 3.
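
A small numerical check makes the three symmetries tangible. The sketch below uses a deliberately crude descriptor, the sorted list of all interatomic distances in a small made-up cluster, and verifies that it does not change under rotation, translation, or reordering of the atoms. This is an illustration only: it ignores periodic boundary conditions and atom types, and it is not how CGCNN or ALIGNN are implemented, although both build their graphs from interatomic distances (and, for ALIGNN, bond angles).

```python
import math

def sorted_distances(positions):
    """Sorted list of all pairwise distances: a simple invariant descriptor."""
    dists = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dists.append(math.dist(positions[i], positions[j]))
    return sorted(dists)

def rotate_z(p, angle):
    """Rotate a point about the z axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def same(a, b, tol=1e-9):
    """Element-wise comparison of two descriptors within a tolerance."""
    return len(a) == len(b) and all(abs(u - v) < tol for u, v in zip(a, b))

# A made-up four-atom cluster (arbitrary coordinates).
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (0.3, 0.4, 2.0)]

rotated = [rotate_z(p, 0.7) for p in atoms]
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in atoms]
permuted = [atoms[2], atoms[0], atoms[3], atoms[1]]

base = sorted_distances(atoms)
print("rotation invariant:   ", same(base, sorted_distances(rotated)))
print("translation invariant:", same(base, sorted_distances(shifted)))
print("permutation invariant:", same(base, sorted_distances(permuted)))
print("raw coordinates changed by rotation:", atoms != rotated)
```

A model fed the raw coordinate list would see three different inputs for the same physical system and would have to learn the equivalence from data. A model fed an invariant representation gets it for free.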

Quick check

1. You want to predict the formation energy of 50,000 new perovskite structures. Which approach is most practical?

2. An ML model trained on oxides is used to predict properties of nitrides (a completely different class). What is the main risk?

3. Which physical symmetry means that the predicted band gap must be the same regardless of how the crystal is oriented in space?

Coming up in this module

Post 3

Key mathematical tools — vectors, matrices, probability refresher

Post 4

Types of ML — supervised, unsupervised, reinforcement learning

Post 5

Linear regression — predicting band gap from simple features
