AI & Machine Learning for Materials Sciences

Post 1: What is Artificial Intelligence? A researcher's first look

You already use powerful computational tools every day, such as WIEN2k, VASP, and Quantum ESPRESSO, to approximately solve the Schrödinger equation and predict properties from first principles. These tools are built on deep physics. DFT is exact in principle, although every practical calculation relies on an approximate exchange-correlation functional, and it is expensive in practice: a single DFT calculation for a complex oxide can take days on a supercomputer.

Artificial Intelligence — and Machine Learning in particular — offers a complementary approach: instead of solving the physics equations from scratch each time, a trained model learns patterns from existing calculations and predicts new properties in milliseconds.

💡 Key Insight
AI does not replace DFT. It learns from DFT data. Think of it as a fast surrogate model trained on your expensive simulations — then able to screen thousands of new compounds instantly.

Real examples already transforming our field: predicting band gaps, formation energies, elastic constants, magnetic moments — all from composition and structure alone, without running a single SCF cycle.

The three layers: AI, ML, and Deep Learning

These three terms are often used interchangeably — incorrectly. Here is the correct relationship:

🤖 Artificial Intelligence
Any technique that makes machines behave "intelligently"
📊 Machine Learning
AI that learns from data — no explicit programming
🧠 Deep Learning
ML using multi-layer neural networks

Artificial Intelligence is the broad field — any system that mimics human reasoning. It includes rule-based expert systems, search algorithms, and much more.

Machine Learning is a subset of AI where the system learns from data rather than following hand-written rules. Given enough examples (band gap measurements + crystal structures), it finds patterns on its own.

Deep Learning is a subset of ML using artificial neural networks with many layers. It is behind image recognition, language models like ChatGPT — and modern materials property predictors like ALIGNN.
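To make "many layers" concrete, here is a minimal sketch of a two-layer network's forward pass in plain Python. The weights, biases, and input features are hand-picked, hypothetical numbers chosen only for illustration; a real network would learn its parameters from data and would normally be built with a library rather than by hand.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Simple non-linearity applied between layers."""
    return [max(0.0, v) for v in values]

# Hypothetical, hand-picked parameters (a trained network would learn these).
W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]   # layer 1: 2 inputs -> 3 hidden units
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5]]                        # layer 2: 3 hidden units -> 1 output
b2 = [0.2]

def predict(features):
    hidden = relu(dense(features, W1, b1))     # first layer + non-linearity
    return dense(hidden, W2, b2)[0]            # second layer gives the prediction

features = [1.0, 2.0]                          # two made-up input features
prediction = predict(features)
```

A "deep" network is the same idea with more layers stacked in `predict`, each one feeding its output to the next.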

How does a machine "learn"?

The core idea is remarkably simple. A machine learning model has internal parameters (numbers). During training, it:

📥

1. Takes input data

e.g., crystal structure features: lattice constant, atomic number, coordination number…

⚙️

2. Makes a prediction

e.g., "band gap = 1.8 eV" — based on current parameter values

📏

3. Measures its error

Compares prediction to the known DFT value. Calculates the difference (the "loss")

🔧

4. Adjusts parameters

Slightly changes internal numbers to reduce the error. This update step is called gradient descent; in neural networks, the gradients it needs are computed by backpropagation.

🔁

5. Repeats thousands of times

After enough iterations, and provided the training data is representative, the model can predict well on data it has never seen before
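The five steps above can be sketched in a few lines of plain Python. This is a toy illustration, not a materials model: the data are made up (labels generated from y = 2x + 1), the model is a straight line with two parameters, and the learning rate and number of steps are assumed values that happen to work for this data.

```python
# Toy data: one feature x and a label y generated from y = 2x + 1 (made up).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0          # internal parameters, arbitrary starting values
learning_rate = 0.05     # assumed step size
losses = []

for step in range(2000):                          # 5. repeat many times
    preds = [w * x + b for x in xs]               # 1-2. take inputs, make predictions
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)   # 3. mean squared error (the "loss")
    losses.append(loss)
    # 4. gradient of the loss with respect to each parameter, then a small step downhill
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2.0 * sum(errors) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
```

After the loop, `w` and `b` end up close to 2 and 1, the values used to generate the data, and `losses` shrinks toward zero. A neural network follows the same loop with many more parameters.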

⚠️ Common Misconception
"Learning" here is purely mathematical — optimization of numbers to minimize an error. There is no understanding, no intuition, no physics inside the model (unless you explicitly encode it). This is why combining ML with physical knowledge (physics-informed ML) is an active and important research direction.

The three types of Machine Learning

Type | How it learns | Materials science example
Supervised | From labeled examples (input → known output) | Predict band gap from structure (DFT labels)
Unsupervised | From unlabeled data; finds hidden patterns | Cluster similar crystal structures automatically
Reinforcement | From rewards and penalties in an environment | Optimize synthesis conditions through trial and error

For materials property prediction — which is where most beginners start — supervised learning is by far the most common approach. You have DFT-calculated properties as labels; you train a model to predict them from structural features.
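As a rough sketch of that supervised workflow, the snippet below fits a straight line to labeled examples and checks it on held-out data. Everything here is synthetic: `fake_dft_band_gap` is a hypothetical stand-in for real DFT labels, the single feature is a made-up descriptor, and the 80/20 split is just a common convention.

```python
import random

random.seed(0)

def fake_dft_band_gap(x):
    """Hypothetical label generator; NOT real materials data."""
    return 0.8 * x + 0.5 + random.gauss(0.0, 0.05)

features = [random.uniform(0.0, 5.0) for _ in range(200)]
labels = [fake_dft_band_gap(x) for x in features]

# Hold out 20% of the labeled examples to estimate performance on unseen data.
split = int(0.8 * len(features))
x_train, y_train = features[:split], labels[:split]
x_test, y_test = features[split:], labels[split:]

# Closed-form least-squares fit of y = slope * x + intercept on the training set.
mean_x = sum(x_train) / len(x_train)
mean_y = sum(y_train) / len(y_train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(x_train, y_train))
         / sum((x - mean_x) ** 2 for x in x_train))
intercept = mean_y - slope * mean_x

def mean_abs_error(xs, ys):
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

train_mae = mean_abs_error(x_train, y_train)
test_mae = mean_abs_error(x_test, y_test)   # the number that matters for new compounds
```

The held-out error is the one to report: it estimates how the model behaves on structures it was not trained on.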

A concrete analogy: the DFT mindset vs the ML mindset

As a computational physicist, you think like this:

🔬 DFT Mindset
"Given this crystal structure, I will solve the Kohn-Sham equations self-consistently, compute the electronic density, and derive the total energy from first principles."

Deterministic. Physics-based. Expensive. Interpretable.

An ML model thinks like this:

🤖 ML Mindset
"I have seen 50,000 crystal structures and their DFT-calculated band gaps. When I see a new structure, I find the closest patterns from my training and interpolate an answer."

Statistical. Data-driven. Fast (milliseconds). Often a black box.

Neither approach is universally better. The power comes from combining them: use DFT to generate reliable training data, use ML to screen vast chemical spaces quickly, use DFT again to validate the most promising candidates.
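The combined workflow can be sketched as a three-step funnel. This is an illustration under loud assumptions: `expensive_dft` is a hypothetical stand-in for a real DFT run (here a made-up property with its maximum at x = 3), each candidate is described by a single made-up number, and simple linear interpolation between training points plays the role of the trained ML model.

```python
dft_calls = 0

def expensive_dft(x):
    """Hypothetical stand-in for a DFT run; counts how often it is called."""
    global dft_calls
    dft_calls += 1
    return 4.0 - (x - 3.0) ** 2        # made-up property, best value at x = 3

# Step 1: run "DFT" on a small training set (13 points on a coarse grid).
train = [(i * 0.5, expensive_dft(i * 0.5)) for i in range(13)]

def surrogate(x):
    """Cheap model: interpolate linearly between the two bracketing training points."""
    for (x0, y0), (x1, y1) in zip(train, train[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x lies outside the training range")

# Step 2: screen a much larger candidate pool with the cheap surrogate only.
candidates = [i / 100 for i in range(601)]          # 601 candidates in [0, 6]
ranked = sorted(candidates, key=surrogate, reverse=True)
shortlist = ranked[:5]

# Step 3: validate only the shortlist with the expensive method.
validated = {x: expensive_dft(x) for x in shortlist}
best_x = max(validated, key=validated.get)
```

In this toy run, 601 candidates are screened with only 18 calls to the expensive function (13 for training, 5 for validation), which is the whole point of the funnel.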

What you will learn in this course

This blog follows a progressive path. No previous ML knowledge is assumed — only your existing background in physics and materials science:

🧮

Module 1 — Foundations

What is AI/ML/DL, mathematical refresher, types of learning

📐

Module 2 — Core Algorithms

Linear regression, classification, SVMs, decision trees

🧠

Module 3 — Neural Networks

Perceptrons, backpropagation, CNNs, Graph Neural Networks

🔬

Module 4 — Applications

ALIGNN, CGCNN, Materials Project data, property prediction pipelines

📚 Reference
This course is built around the rigorous framework of Mohri, Rostamizadeh & Talwalkar, "Foundations of Machine Learning" (MIT Press, 2nd ed. 2018), adapted for the materials science context. The book is freely available as a PDF from MIT Press.

Key terms to remember

Term | Simple definition
Model | A mathematical function with adjustable parameters
Training | The process of adjusting parameters using data
Features | The input variables (e.g. atomic numbers, lattice constants)
Label / Target | The output to predict (e.g. band gap, formation energy)
Loss function | Measures how wrong the model's prediction is
Overfitting | Model memorizes training data but fails on new data
Generalization | Model performs well on data it has never seen
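The last two terms are easiest to see side by side. The sketch below builds a deliberately extreme "memorizer" (a 1-nearest-neighbour lookup) on synthetic noisy data and compares its error on the training set with its error on fresh data. The data generator, noise level, and sample sizes are all made-up assumptions for illustration.

```python
import random

random.seed(42)

def make_data(n):
    """Synthetic (feature, label) pairs: y = 2x plus noise. Not real measurements."""
    data = []
    for _ in range(n):
        x = random.uniform(0.0, 1.0)
        data.append((x, 2.0 * x + random.gauss(0.0, 0.3)))
    return data

train, test = make_data(50), make_data(50)

def memorizer(x):
    """1-nearest-neighbour: return the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared error of a model over a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

train_mse = mse(memorizer, train)   # 0.0: every training point is its own neighbour
test_mse = mse(memorizer, test)     # larger: the memorized noise does not transfer
```

A perfect score on the training set combined with a worse score on new data is the signature of overfitting; the gap between the two numbers is a practical measure of how well the model generalizes.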

Test your understanding

Try the interactive quiz for Post 1 — 5 questions to check what you've learned so far.
