Module 2 · Core Algorithms

Post 6 — Classification: Is This Material a Metal or Insulator?

Your first binary classifier: logistic regression, the sigmoid function, and the decision boundary — applied to transition-metal compounds.

Abderrahmane REGGAD · June 14, 2026

🔀 Algorithm: Logistic Regression
🎯 Task: Supervised Classification
⚡ Materials Target: Metal vs. Insulator (E_g = 0 or > 0)
📊 Features: ΔEN, n_d (d-electron count)

In Post 5 we trained a linear regression model to predict the band gap E_g as a continuous value. But materials scientists often need a simpler answer first: is this compound conducting or not? This yes/no question is a classification problem, one of the most common tasks in supervised machine learning.

🎯 What we will classify

Twelve transition-metal compounds as Metal (E_g = 0 eV) or Insulator (E_g > 0 eV) using two crystal-chemistry features: the electronegativity difference ΔEN and the formal d-electron count n_d.

1. From Regression to Classification — A Shift in Output

Linear regression outputs a real number ŷ ∈ ℝ. Classification instead outputs a class label: 0 or 1. The simplest way to go from one to the other is to pass the linear combination through a function that squashes any real number into the interval (0, 1) and to interpret the result as a probability.

Model	Output	Loss function	Decision rule
Linear regression	ŷ ∈ ℝ (any real number)	MSE: (ŷ − y)²	—
Logistic regression	P(Metal) ∈ (0, 1)	Binary cross-entropy	If P ≥ 0.5 → Metal; else → Insulator

🔬 Why not just threshold the regression output?

You could set a threshold on the predicted E_g, but a regression fit is optimised to minimise the error in the energy value, not the number of compounds that land on the wrong side of the metal/insulator line, and its output is not a probability. Logistic regression optimises the class probability directly.

2. The Sigmoid Function — Turning Scores into Probabilities

The key ingredient is the sigmoid (logistic) function σ(z), which maps any real-valued score z to a probability between 0 and 1.

🔀 The sigmoid function

σ(z) = 1 / (1 + e^(−z))

z → −∞ : σ(z) → 0 (strongly predicts Insulator)
z = 0 : σ(z) = 0.5 (maximum uncertainty, decision boundary)
z → +∞ : σ(z) → 1 (strongly predicts Metal)
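
As a quick numerical check of the three limits above, here is a minimal sketch in plain Python. The helper name `sigmoid` and the branch on the sign of z (used so that `math.exp` never receives a large positive argument) are choices made for this illustration, not something the post prescribes.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z).
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Far left, centre, and far right of the curve
for z in (-10.0, 0.0, 10.0):
    print(f"z = {z:+5.1f}  ->  sigma(z) = {sigmoid(z):.5f}")
```

The symmetry σ(z) + σ(−z) = 1 also holds, which is why swapping the Metal and Insulator labels only flips the sign of every weight.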

The full logistic regression model for our two-feature classifier is:

⚡ Logistic regression model

z = θ₀ + θ₁ · ΔEN + θ₂ · n_d
P(Metal | x) = σ(z) = 1 / (1 + e^(−z))

θ₀ = bias (intercept)
θ₁ = weight on electronegativity difference ΔEN
θ₂ = weight on d-electron count n_d
n_d = number of formal d-electrons (0–10)

💡 Physical intuition for the features

A large ΔEN means a more ionic bond — ionic compounds tend to be insulators (θ₁ < 0 expected). A partially-filled d-shell (intermediate n_d) often correlates with metallic behaviour, but Mott insulators challenge this simple picture — exactly the kind of complexity a richer model must learn.

3. The Decision Boundary — Where the Model Is Maximally Uncertain

The model predicts Metal when σ(z) ≥ 0.5, which happens exactly when z ≥ 0. Setting z = 0 gives the equation of the decision boundary:

📐 Decision boundary in feature space

θ₀ + θ₁ · ΔEN + θ₂ · n_d = 0

Rearranging for n_d:
n_d = −(θ₀ + θ₁ · ΔEN) / θ₂

This is a straight line in (ΔEN, n_d) space separating Metal from Insulator predictions.
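
To see that points on this line really sit at P(Metal) = 0.5, the sketch below evaluates the rearranged formula for a few ΔEN values. The weights in `theta` are hypothetical numbers picked for readability, and the function names are illustrative only.

```python
import math

def boundary_n_d(theta, delta_en):
    """n_d on the decision boundary for a given ΔEN (requires theta[2] != 0)."""
    t0, t1, t2 = theta
    return -(t0 + t1 * delta_en) / t2

def p_metal(theta, delta_en, n_d):
    """P(Metal | x) = sigmoid(θ0 + θ1·ΔEN + θ2·n_d)."""
    t0, t1, t2 = theta
    z = t0 + t1 * delta_en + t2 * n_d
    return 1.0 / (1.0 + math.exp(-z))

theta = (1.0, -2.0, 0.5)  # hypothetical weights, for illustration only

for delta_en in (0.5, 1.0, 1.5):
    n_d_star = boundary_n_d(theta, delta_en)
    p = p_metal(theta, delta_en, n_d_star)
    print(f"ΔEN = {delta_en:.1f}: boundary at n_d = {n_d_star:.1f}, P(Metal) = {p:.2f}")
```

With these example weights θ₂ is positive, so points above the line (larger n_d at the same ΔEN) fall on the Metal side; a negative θ₂ would reverse that.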

Logistic regression is therefore a linear classifier: its decision boundary is always a hyperplane (a line in 2D). This is a strength (interpretable, fast) and a limitation (cannot learn curved boundaries without feature engineering).

4. How the Model Learns — Binary Cross-Entropy

MSE is a poor fit for classification: composed with the sigmoid it is no longer convex in θ, so gradient descent can stall on flat regions or settle in a poor local minimum. Instead we use binary cross-entropy, which is convex in θ and measures how surprised the model is by the true label.

📉 Binary cross-entropy loss

J(θ) = −(1/N) · Σᵢ [ yᵢ log(ŷᵢ) + (1−yᵢ) log(1−ŷᵢ) ]

yᵢ = true label (1 = Metal, 0 = Insulator)
ŷᵢ = P(Metal | xᵢ) from the sigmoid

When yᵢ = 1: loss = −log(ŷᵢ) → penalises low confidence in Metal
When yᵢ = 0: loss = −log(1−ŷᵢ) → penalises high confidence in Metal
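
The asymmetry between confident-and-right and confident-and-wrong predictions is easy to see numerically. This is a minimal sketch: the labels and probabilities are made up, and the `eps` clipping is an added safeguard against log(0), not part of the formula above.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; eps keeps probabilities away from exactly 0 or 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [1, 1, 0, 0]                    # hypothetical labels (1 = Metal)
confident_right = [0.9, 0.8, 0.1, 0.2]   # high P(Metal) for metals, low for insulators
undecided       = [0.5, 0.5, 0.5, 0.5]   # model sits on the decision boundary
confident_wrong = [0.1, 0.2, 0.9, 0.8]   # confident, but in the wrong direction

for name, probs in [("right", confident_right),
                    ("undecided", undecided),
                    ("wrong", confident_wrong)]:
    print(f"{name:>9}: J = {binary_cross_entropy(y_true, probs):.3f}")
```

The undecided model scores log 2 ≈ 0.693, and the confidently wrong one is penalised far more heavily than the confidently right one is rewarded.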

⚠️ No closed-form solution

Unlike linear regression, logistic regression has no Normal Equation. Setting the cross-entropy gradient to zero gives equations that cannot be solved for θ in closed form, so the minimum must be found iteratively with gradient descent or Newton's method. In practice, scikit-learn uses the L-BFGS solver by default, which typically converges in far fewer iterations than vanilla gradient descent.

5. Gradient Descent for Logistic Regression

The gradient of the cross-entropy loss with respect to each weight has a surprisingly clean form — identical in structure to the MSE gradient for linear regression:

🔄 Gradient update rule

∂J/∂θⱼ = (1/N) · Σᵢ (ŷᵢ − yᵢ) · xᵢⱼ

θⱼ ← θⱼ − η · ∂J/∂θⱼ

η = learning rate (typical: 0.01–0.5 for standardised features)
Repeat until convergence (J stops decreasing).

1. Standardise features: subtract the mean, divide by the standard deviation
2. Initialise θ = [0, 0, 0]
3. Compute z = Xθ and ŷ = σ(z) for all training examples
4. Compute the cross-entropy loss J(θ)
5. Compute the gradient ∇J and update all weights: θ ← θ − η · ∇J
6. Repeat steps 3–5 until J converges (typically 1,000–3,000 iterations)
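
Before trusting a hand-written training loop, it helps to confirm that the gradient formula above matches a finite-difference estimate of J. The sketch below does this on synthetic data; the random features, labels, starting point, and step size `h` are all assumptions made for the check and have nothing to do with the materials dataset.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    """Binary cross-entropy J(θ)."""
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(theta, X, y):
    """Analytic gradient: (1/N) · Xᵀ(ŷ − y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])  # bias column + 2 synthetic features
y = rng.integers(0, 2, size=20).astype(float)                # synthetic 0/1 labels
theta = rng.normal(scale=0.5, size=3)                        # arbitrary test point

# Central differences along each coordinate direction
h = 1e-6
numeric = np.array([
    (loss(theta + h * e, X, y) - loss(theta - h * e, X, y)) / (2 * h)
    for e in np.eye(3)
])

max_diff = np.max(np.abs(numeric - grad(theta, X, y)))
print(f"max |numeric - analytic| = {max_diff:.2e}")
```

A difference many orders of magnitude smaller than the gradient itself indicates that the update rule is implemented consistently with the loss.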

6. Our Training Dataset — Transition-Metal Compounds

We use 12 transition-metal chalcogenides and oxides whose metallic or insulating character is well established experimentally. The two features (ΔEN, n_d) are computed from standard atomic tables — no DFT required.

Compound	ΔEN (Pauling)	n_d	Class	Physical note
FeS	0.43	6	Metal	Pyrrhotite-type conductor
FeS₂	0.43	6	Insulator	Pyrite — band gap ~0.95 eV
NiO	1.40	8	Insulator	Classic Mott insulator
NiS	0.43	8	Metal	Millerite-type metal
MnTe	0.61	5	Insulator	Antiferromagnetic semiconductor
MnS	0.93	5	Insulator	Alabandite — ~3 eV gap
CrO₂	1.54	2	Metal	Half-metal, fully spin-polarised
Cr₂O₃	1.54	3	Insulator	Chromia — ~3.4 eV gap
CoO	1.40	7	Insulator	Mott-Hubbard insulator
CoS₂	0.43	7	Metal	Itinerant ferromagnet
TiO	1.54	2	Metal	NaCl-type metallic oxide
TiO₂	1.54	0	Insulator	Rutile — ~3.0 eV gap

🧪 Why do FeS and FeS₂ share the same ΔEN and n_d?

This is the deep challenge of this dataset: FeS (metal) and FeS₂ (insulator) have identical formal features. FeS₂ forms S₂²⁻ dimers that open a gap through molecular-orbital effects, physics our two scalar features cannot encode. Because both compounds map to the same point, every model receives the same input for both and must give the same answer, so at least one of the two will always be misclassified, whether the boundary is linear or not. This illustrates a hard limit of feature-engineered classifiers.
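
Overlaps like this can be found mechanically before any model is trained. The following sketch groups the table rows by their feature pair and reports groups with conflicting labels; the variable names are illustrative, and the chemical formulas are written in ASCII for convenience.

```python
from collections import defaultdict

# (compound, ΔEN, n_d, label) copied from the table above; 1 = Metal, 0 = Insulator
data = [
    ("FeS",  0.43, 6, 1), ("FeS2",  0.43, 6, 0), ("NiO",  1.40, 8, 0),
    ("NiS",  0.43, 8, 1), ("MnTe",  0.61, 5, 0), ("MnS",  0.93, 5, 0),
    ("CrO2", 1.54, 2, 1), ("Cr2O3", 1.54, 3, 0), ("CoO",  1.40, 7, 0),
    ("CoS2", 0.43, 7, 1), ("TiO",   1.54, 2, 1), ("TiO2", 1.54, 0, 0),
]

# Group compounds that share exactly the same feature vector
groups = defaultdict(list)
for name, delta_en, n_d, label in data:
    groups[(delta_en, n_d)].append((name, label))

# A group is a conflict when it contains more than one distinct label
conflicts = {k: v for k, v in groups.items() if len({lab for _, lab in v}) > 1}
print("Conflicting groups:", conflicts)

# Upper bound for ANY model on these features: predict the majority label per group
best_correct = sum(
    max(sum(lab for _, lab in v), len(v) - sum(lab for _, lab in v))
    for v in groups.values()
)
print(f"Best possible accuracy with these two features: {best_correct}/{len(data)}")
```

CrO₂ and TiO also share a feature pair, but both are metals, so only the FeS/FeS₂ pair is flagged. The bound printed here applies to every model built on these two features; a straight-line boundary can fall below it.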

7. A Worked Example — Reading the Auto-Fit Result

After gradient descent converges on our 12-compound dataset (η = 0.5, standardised features, 3000 iterations), the fitted weights in raw (un-standardised) feature space are approximately:

✅ Fitted coefficients (gradient descent)

θ₀ = +1.92 (bias)
θ₁ = −2.81 (ΔEN weight)
θ₂ = +0.47 (n_d weight)

P(Metal) = σ(1.92 − 2.81·ΔEN + 0.47·n_d)
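
To read the fitted formula, it helps to plug a few table rows into it. The sketch below uses the coefficients exactly as quoted above; the choice of the three compounds and the helper name `p_metal` are illustrative.

```python
import math

theta0, theta1, theta2 = 1.92, -2.81, 0.47   # coefficients quoted above

def p_metal(delta_en, n_d):
    """P(Metal) = sigmoid(θ0 + θ1·ΔEN + θ2·n_d) with the quoted weights."""
    z = theta0 + theta1 * delta_en + theta2 * n_d
    return 1.0 / (1.0 + math.exp(-z))

# (compound, ΔEN, n_d, experimental class) taken from the dataset table
examples = [
    ("FeS",  0.43, 6, "Metal"),
    ("TiO2", 1.54, 0, "Insulator"),
    ("NiO",  1.40, 8, "Insulator"),
]

results = {}
for name, delta_en, n_d, true_class in examples:
    p = p_metal(delta_en, n_d)
    predicted = "Metal" if p >= 0.5 else "Insulator"
    results[name] = (p, predicted)
    print(f"{name:5s} P(Metal) = {p:.2f} -> {predicted:9s} (experiment: {true_class})")
```

With these weights FeS comes out near 0.97 and TiO₂ near 0.08, both on the correct side. NiO comes out near 0.85 and is therefore predicted Metal although it is an insulator: its large d-electron count outweighs the ΔEN penalty, the kind of Mott-insulator failure anticipated in Section 2.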

✅ Physical interpretation check

θ₁ = −2.81 confirms: more ionic bonds (larger ΔEN) strongly reduce the probability of metallic character — physically correct (ionic compounds are typically insulators). θ₂ = +0.47 means a higher d-electron count mildly increases the metal probability, consistent with broader d-bands in late transition-metal sulfides.

8. Evaluating a Classifier — Confusion Matrix and Metrics

Classification models are never evaluated with R² or MAE. Instead we use the confusion matrix — a 2×2 count of correct and incorrect predictions — and four derived metrics.

Metric	Formula	Ideal	What it tells you
Accuracy	(TP + TN) / N	1.0	Fraction of all compounds classified correctly. Misleading on imbalanced datasets.
Precision	TP / (TP + FP)	1.0	Of all compounds predicted Metal, what fraction truly are? Penalises false alarms.
Recall	TP / (TP + FN)	1.0	Of all true Metals, what fraction did we detect? Penalises missed metals.
F1 score	2·Prec·Rec / (Prec + Rec)	1.0	Harmonic mean of precision and recall. A common single-number summary when classes are imbalanced.

⚠️ Accuracy can lie

If 90% of your dataset is Insulator, a model that always predicts Insulator achieves 90% accuracy — but has zero ability to find metals. Always inspect both precision and recall, or use the F1 score.
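
The four formulas in the table and the warning above can be reproduced in a few lines of plain Python. In this sketch the two label sets are hypothetical, and returning 0 when a denominator is zero is a convention chosen here for the example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and the four metrics, with Metal (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Convention for this sketch: a metric with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical imbalanced dataset: 10 metals, 90 insulators
y_true = [1] * 10 + [0] * 90

# Model A always predicts Insulator
always_insulator = classification_metrics(y_true, [0] * 100)

# Model B finds 6 of the 10 metals and raises 4 false alarms
y_pred_b = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 86
model_b = classification_metrics(y_true, y_pred_b)

print("Always Insulator:", always_insulator)
print("Model B:         ", model_b)
```

Model A reaches 0.90 accuracy with precision, recall, and F1 all at zero, while Model B is only slightly more accurate (0.92) yet has an F1 of 0.60 because it actually finds metals.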

9. Interactive Companion — App 6 Classification Explorer

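The companion app lets you adjust the three parameters θ₀, θ₁, θ₂ manually and watch the decision boundary, sigmoid output, and confusion matrix update in real time. Click Auto-fit with gradient descent to run the full training loop. Notice that even after Auto-fit the accuracy never reaches 12/12. FeS and FeS₂ share identical (ΔEN, n_d) coordinates, so no boundary of any shape can classify both correctly. In addition, at ΔEN = 1.54 the classes run Insulator (TiO₂, n_d = 0), Metal (CrO₂ and TiO, n_d = 2), Insulator (Cr₂O₃, n_d = 3), a sequence that no single straight line can reproduce.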

10. Python Implementation — scikit-learn Workflow

From scratch — gradient descent

import numpy as np

# Features [ΔEN, n_d] and labels (1 = Metal, 0 = Insulator), in table order
X_raw = np.array([
    [0.43, 6], [0.43, 6], [1.40, 8], [0.43, 8], [0.61, 5], [0.93, 5],
    [1.54, 2], [1.54, 3], [1.40, 7], [0.43, 7], [1.54, 2], [1.54, 0],
])
y = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0])

# Standardise each feature, then prepend a column of ones for the bias θ₀
mx, sx = X_raw.mean(axis=0), X_raw.std(axis=0)
X = np.hstack([np.ones((len(y), 1)), (X_raw - mx) / sx])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Batch gradient descent on the cross-entropy loss
theta = np.zeros(3)
eta, iters = 0.5, 3000
for _ in range(iters):
    yhat = sigmoid(X @ theta)
    grad = X.T @ (yhat - y) / len(y)
    theta -= eta * grad

print(f"θ (standardised features) = {theta.round(3)}")

# Map the weights back to raw (ΔEN, n_d) units
w_raw = theta[1:] / sx
b_raw = theta[0] - np.sum(w_raw * mx)
print(f"θ (raw features) = {np.round([b_raw, *w_raw], 3)}")

preds = (sigmoid(X @ theta) >= 0.5).astype(int)
print(f"Accuracy: {(preds == y).mean():.2f}")

Using scikit-learn (recommended)

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler

X = np.array([
    [0.43, 6], [0.43, 6], [1.40, 8], [0.43, 8], [0.61, 5], [0.93, 5],
    [1.54, 2], [1.54, 3], [1.40, 7], [0.43, 7], [1.54, 2], [1.54, 0],
])
y = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0])

# Standardise features
scaler = StandardScaler()
X_sc = scaler.fit_transform(X)

# Fit logistic regression (L-BFGS solver). The default L2 penalty (C=1.0)
# means the weights will differ from the unregularised from-scratch fit.
clf = LogisticRegression(max_iter=1000, random_state=42)
clf.fit(X_sc, y)

y_pred = clf.predict(X_sc)
print(confusion_matrix(y, y_pred))  # rows = true class, columns = predicted class
print(classification_report(y, y_pred, target_names=['Insulator', 'Metal']))

# clf.coef_ lives in standardised space; convert to raw units before
# writing the boundary in terms of ΔEN and n_d
w = clf.coef_[0] / scaler.scale_
b = clf.intercept_[0] - np.sum(w * scaler.mean_)
print(f"Boundary: n_d = {-b / w[1]:.2f} {-w[0] / w[1]:+.2f}·ΔEN")

11. Limitations — When Logistic Regression Is Not Enough

Limitation	Why it matters in materials science	Solution
Linear boundary only	Metal–insulator boundary in real materials is highly non-linear (Mott physics, topology)	Kernel SVM, decision trees, neural networks
Feature overlap	FeS and FeS₂ have identical ΔEN and n_d but different classes	Add structure-sensitive features: crystal field splitting, bond angle, coordination
Small dataset	12 examples is far too few for reliable generalisation	Augment with ICSD/Materials Project data; use cross-validation
No uncertainty	Cannot report confidence interval on the probability output	Platt scaling, Bayesian logistic regression, conformal prediction
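
For the small-dataset row, one option is leave-one-out cross-validation, which trains on 11 compounds and tests on the held-out one, twelve times over. The sketch below is one possible setup rather than the post's own workflow: wrapping the scaler and classifier in a pipeline, so that the scaling is re-fitted inside each fold, is a choice made here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same 12 compounds as in Section 6: [ΔEN, n_d], 1 = Metal, 0 = Insulator
X = np.array([
    [0.43, 6], [0.43, 6], [1.40, 8], [0.43, 8], [0.61, 5], [0.93, 5],
    [1.54, 2], [1.54, 3], [1.40, 7], [0.43, 7], [1.54, 2], [1.54, 0],
])
y = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0])

# The pipeline re-fits the scaler on each training fold, so the held-out
# compound never influences the mean and standard deviation.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Each fold tests on a single compound, so every score is either 0 or 1
scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(f"Leave-one-out accuracy: {scores.mean():.2f} ({int(scores.sum())}/{len(y)})")
```

The held-out accuracy is usually a more honest number than the training accuracy printed in Section 10, although with only 12 compounds it is still a noisy estimate.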

Quick Check

1. In logistic regression, what does σ(z) = 0.5 tell you about the model's prediction?

A. The compound is 50% metal by weight
B. The model is maximally uncertain — this is exactly the decision boundary
C. The loss function is minimised at this point
D. Half the features are positive and half are negative

2. Our model can never classify both FeS and FeS₂ correctly. The most likely reason is:

A. The learning rate η is too large
B. Gradient descent did not converge
C. Both compounds have identical features in our two-dimensional feature space
D. The sigmoid function saturates for these inputs

3. You have 5 metals and 45 insulators in your dataset. Your model predicts all compounds as Insulator. What is its accuracy and F1 score for the Metal class?

A. Accuracy = 0.50, F1 = 0.50
B. Accuracy = 0.05, F1 = 0.00
C. Accuracy = 0.90, F1 = 0.00
D. Accuracy = 0.90, F1 = 0.50

Tags: Core Algorithms, Classification, Logistic Regression, Sigmoid, Confusion Matrix, Metal vs Insulator, scikit-learn
