Post 6 — Classification: Is This Material a Metal or Insulator?
Your first binary classifier: logistic regression, the sigmoid function, and the decision boundary — applied to transition-metal compounds.
Logistic Regression
Supervised Classification
Metal vs. Insulator (Eg = 0 or > 0)
ΔEN, nd (d-electron count)
In Post 5 we trained a linear regression model to predict the band gap Eg as a continuous value. But materials scientists often need a simpler answer first: is this compound conducting or not? This yes/no question is a classification problem, one of the most common tasks in supervised machine learning.
We classify twelve transition-metal compounds as Metal (Eg = 0 eV) or Insulator (Eg > 0 eV) using two crystal-chemistry features: the electronegativity difference ΔEN and the formal d-electron count nd.
1. From Regression to Classification — A Shift in Output
Linear regression outputs a real number ŷ ∈ ℝ. Classification instead outputs a class label: 0 or 1. The simplest way to go from one to the other is to pass the linear combination through a function that squashes any real number into the [0, 1] range and interpret the result as a probability.
| Model | Output | Loss function | Decision rule |
|---|---|---|---|
| Linear regression | ŷ ∈ ℝ (any real number) | MSE: (ŷ − y)² | — |
| Logistic regression | P(metal) ∈ [0, 1] | Binary cross-entropy | If P ≥ 0.5 → Metal; else → Insulator |
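Why not simply threshold the Eg predicted by the regression model from Post 5? You could, but this ignores the structural difference between the two tasks. A regression model is trained to minimise the error in Eg across its whole range, not to place the Metal/Insulator boundary accurately, and its raw energy output says nothing about how confident the class assignment is. Classification calls for a probabilistic output, and logistic regression is purpose-built for this.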
2. The Sigmoid Function — Turning Scores into Probabilities
The key ingredient is the sigmoid (logistic) function σ(z), which maps any real-valued score z to a probability between 0 and 1.
z → −∞ : σ(z) → 0 (strongly predicts Insulator)
z = 0 : σ(z) = 0.5 (maximum uncertainty, decision boundary)
z → +∞ : σ(z) → 1 (strongly predicts Metal)
The full logistic regression model for our two-feature classifier is:
P(Metal | x) = σ(z) = 1 / (1 + e^(−z))
z = θ₀ + θ₁ · ΔEN + θ₂ · nd
θ₀ = bias (intercept)
θ₁ = weight on electronegativity difference ΔEN
θ₂ = weight on d-electron count nd
nd = number of formal d-electrons (0–10)
A large ΔEN means a more ionic bond — ionic compounds tend to be insulators (θ₁ < 0 expected). A partially-filled d-shell (intermediate nd) often correlates with metallic behaviour, but Mott insulators challenge this simple picture — exactly the kind of complexity a richer model must learn.
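To make the model concrete, here is a minimal sketch of the sigmoid and of P(Metal | x) for a single compound. The weights in `theta_demo` are hypothetical values chosen only to show the expected sign of θ₁; they are not fitted to any data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score z to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def p_metal(delta_en, n_d, theta):
    """P(Metal | x) for one compound; theta = (theta0, theta1, theta2)."""
    z = theta[0] + theta[1] * delta_en + theta[2] * n_d
    return sigmoid(z)

# Hypothetical weights for illustration only (theta1 < 0: ionic -> insulating).
theta_demo = (0.0, -2.0, 0.3)

# The three limiting cases listed above.
for z in (-5.0, 0.0, 5.0):
    print(f"sigma({z:+.1f}) = {sigmoid(z):.4f}")

# A covalent, d-rich compound versus an ionic d0 compound.
print(f"P(Metal) at dEN=0.43, n_d=8: {p_metal(0.43, 8, theta_demo):.3f}")
print(f"P(Metal) at dEN=1.54, n_d=0: {p_metal(1.54, 0, theta_demo):.3f}")
```

With these illustrative weights the low-ΔEN compound receives the higher metal probability, as the physical argument suggests.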
3. The Decision Boundary — Where the Model Is Maximally Uncertain
The model predicts Metal when σ(z) ≥ 0.5, which happens exactly when z ≥ 0. Setting z = 0 gives the equation of the decision boundary:
θ₀ + θ₁ · ΔEN + θ₂ · nd = 0
Rearranging for nd (assuming θ₂ ≠ 0):
nd = −(θ₀ + θ₁ · ΔEN) / θ₂
This is a straight line in (ΔEN, nd) space separating Metal from Insulator predictions.
Logistic regression is therefore a linear classifier: its decision boundary is always a hyperplane (a line in 2D). This is a strength (interpretable, fast) and a limitation (cannot learn curved boundaries without feature engineering).
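As a quick check of the algebra, the sketch below evaluates the boundary line for a few ΔEN values and confirms that the score z vanishes on it. It borrows the raw-space weights quoted in the worked example of Section 7; the helper names `boundary_nd` and `score` are made up for this illustration.

```python
import numpy as np

# Weights quoted in the worked example of Section 7 (raw feature space).
theta0, theta1, theta2 = 1.92, -2.81, 0.47

def boundary_nd(delta_en):
    """n_d on the decision boundary (z = 0) for a given delta_EN."""
    return -(theta0 + theta1 * delta_en) / theta2

def score(delta_en, n_d):
    """Linear score z = theta0 + theta1 * delta_EN + theta2 * n_d."""
    return theta0 + theta1 * delta_en + theta2 * n_d

for d_en in np.linspace(0.4, 1.6, 4):
    nd_star = boundary_nd(d_en)
    print(f"dEN = {d_en:.2f} -> boundary n_d = {nd_star:5.2f}, "
          f"z on boundary = {score(d_en, nd_star):.1e}")
```

Because θ₂ is positive here, points above the line (larger nd at the same ΔEN) have z > 0 and are predicted Metal.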
4. How the Model Learns — Binary Cross-Entropy
Squared error is a poor fit for classification: composed with the sigmoid, MSE becomes non-convex in θ, so gradient descent can stall on plateaus or in local minima. Instead we use binary cross-entropy, which is convex in θ and measures how surprised the model is by the true label:
J(θ) = −(1/N) · Σᵢ [ yᵢ · log(ŷᵢ) + (1 − yᵢ) · log(1 − ŷᵢ) ]
N = number of training examples
yᵢ = true label (1 = Metal, 0 = Insulator)
ŷᵢ = P(Metal | xᵢ) from the sigmoid
When yᵢ = 1: loss = −log(ŷᵢ) → penalises low confidence in Metal
When yᵢ = 0: loss = −log(1−ŷᵢ) → penalises high confidence in Metal
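The two cases can be tried out numerically. This is a small sketch; the clipping constant `eps` is an added safeguard against log(0) and is not part of the definition, and the probability lists are invented for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; eps keeps log() away from 0 (an assumption)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return losses.mean()

y = [1, 1, 0, 0]                        # two metals, two insulators
confident_right = [0.9, 0.8, 0.1, 0.2]  # high P(Metal) only for the metals
undecided = [0.5, 0.5, 0.5, 0.5]        # always sits on the boundary
confident_wrong = [0.1, 0.2, 0.9, 0.8]  # high P(Metal) for the insulators

for name, probs in [("confident, right", confident_right),
                    ("undecided", undecided),
                    ("confident, wrong", confident_wrong)]:
    print(f"{name:17s} J = {binary_cross_entropy(y, probs):.3f}")
```

The undecided model scores log 2 ≈ 0.693 per example; confident correct predictions fall below that value and confident wrong ones well above it.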
Unlike linear regression, logistic regression has no Normal Equation: setting the gradient of the cross-entropy cost to zero gives no closed-form solution for θ, so the cost must be minimised iteratively with gradient descent or Newton's method. In practice, scikit-learn's LogisticRegression uses the L-BFGS solver by default, which typically converges in far fewer iterations than vanilla gradient descent.
5. Gradient Descent for Logistic Regression
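The gradient of the cross-entropy loss with respect to each weight has a surprisingly clean form, identical in structure to the MSE gradient for linear regression:
∂J/∂θⱼ = (1/N) · Σᵢ (ŷᵢ − yᵢ) · xᵢⱼ
θⱼ ← θⱼ − η · ∂J/∂θⱼ
xᵢⱼ = value of feature j for example i (xᵢ₀ = 1 for the bias θ₀)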
η = learning rate (typical: 0.01–0.5 for standardised features)
Repeat until convergence (J stops decreasing).
1. Standardise features: subtract mean, divide by standard deviation
2. Initialise θ = [0, 0, 0]
3. Compute z = Xθ and ŷ = σ(z) for all training examples
4. Compute cross-entropy loss J(θ)
5. Compute gradient ∇J and update all weights θ ← θ − η · ∇J
6. Repeat steps 3–5 until J converges (typically 1,000–3,000 iterations)
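One way to build trust in the gradient used in step 5 is to compare the vectorised form ∇J = Xᵀ(ŷ − y)/N against a finite-difference estimate of J. The sketch below does this on small synthetic data generated with a fixed seed; the data and the step size `h` are assumptions for illustration, not the compound dataset.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Mean binary cross-entropy J(theta)."""
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def analytic_grad(theta, X, y):
    """Vectorised gradient: X^T (yhat - y) / N."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def numeric_grad(theta, X, y, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * h)
    return grad

# Synthetic data: a bias column plus two random features, random 0/1 labels.
rng = np.random.default_rng(0)
X = np.hstack([np.ones((8, 1)), rng.normal(size=(8, 2))])
y = rng.integers(0, 2, size=8).astype(float)
theta = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_grad(theta, X, y) - numeric_grad(theta, X, y)))
print(f"largest |analytic - numeric| component: {max_diff:.2e}")
```

If the two gradients disagree by more than rounding noise, the usual culprits are a sign error or a missing 1/N factor.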
6. Our Training Dataset — Transition-Metal Compounds
We use 12 transition-metal chalcogenides and oxides whose metallic or insulating character is well established experimentally. The two features (ΔEN, nd) are computed from standard atomic tables — no DFT required.
| Compound | ΔEN (Pauling) | nd | Class | Physical note |
|---|---|---|---|---|
| FeS | 0.43 | 6 | Metal | Pyrrhotite-type conductor |
| FeS₂ | 0.43 | 6 | Insulator | Pyrite — band gap ~0.95 eV |
| NiO | 1.40 | 8 | Insulator | Classic Mott insulator |
| NiS | 0.43 | 8 | Metal | Millerite-type metal |
| MnTe | 0.61 | 5 | Insulator | Antiferromagnetic semiconductor |
| MnS | 0.93 | 5 | Insulator | Alabandite — ~3 eV gap |
| CrO₂ | 1.54 | 2 | Metal | Half-metal, fully spin-polarised |
| Cr₂O₃ | 1.54 | 3 | Insulator | Chromia — ~3.4 eV gap |
| CoO | 1.40 | 7 | Insulator | Mott-Hubbard insulator |
| CoS₂ | 0.43 | 7 | Metal | Itinerant ferromagnet |
| TiO | 1.54 | 2 | Metal | NaCl-type metallic oxide |
| TiO₂ | 1.54 | 0 | Insulator | Rutile — ~3.0 eV gap |
This is the deep challenge of this dataset: FeS (metal) and FeS₂ (insulator) have identical formal features. FeS₂ forms S₂²⁻ dimers that open a gap through molecular-orbital effects, physics our two scalar features cannot encode. Because the two compounds sit at the same point in (ΔEN, nd) space, every model built on these features assigns them the same prediction, so at least one of them is always misclassified. This illustrates a hard limit of feature-engineered classifiers.
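Feature collisions like this one can be found mechanically by grouping the table rows by their (ΔEN, nd) pair. The sketch below does that, and also counts how many compounds the best conceivable classifier on these two features could label correctly (it predicts the majority class within each group of identical points). The rows are copied from the table above; the variable names are illustrative.

```python
from collections import Counter, defaultdict

# (name, delta_EN, n_d, class) copied from the table above
compounds = [
    ("FeS", 0.43, 6, "Metal"), ("FeS2", 0.43, 6, "Insulator"),
    ("NiO", 1.40, 8, "Insulator"), ("NiS", 0.43, 8, "Metal"),
    ("MnTe", 0.61, 5, "Insulator"), ("MnS", 0.93, 5, "Insulator"),
    ("CrO2", 1.54, 2, "Metal"), ("Cr2O3", 1.54, 3, "Insulator"),
    ("CoO", 1.40, 7, "Insulator"), ("CoS2", 0.43, 7, "Metal"),
    ("TiO", 1.54, 2, "Metal"), ("TiO2", 1.54, 0, "Insulator"),
]

# Group compounds that share exactly the same feature pair.
groups = defaultdict(list)
for name, d_en, n_d, label in compounds:
    groups[(d_en, n_d)].append((name, label))

# A group is in conflict when it contains more than one class.
conflicts = {feat: rows for feat, rows in groups.items()
             if len({label for _, label in rows}) > 1}
for feat, rows in conflicts.items():
    print("conflicting labels at", feat, "->", rows)

# Ceiling for ANY classifier on these features: majority label per group.
best_correct = sum(max(Counter(label for _, label in rows).values())
                   for rows in groups.values())
print(f"upper bound on accuracy: {best_correct}/{len(compounds)}")
```

For this table the only conflicting group is the FeS / FeS₂ pair, which caps any classifier on these two features at 11/12; a straight-line boundary can only match or fall below that ceiling.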
7. A Worked Example — Reading the Auto-Fit Result
After gradient descent converges on our 12-compound dataset (η = 0.5, standardised features, 3000 iterations), the fitted weights in raw (un-standardised) feature space are approximately:
θ₀ = +1.92 (bias)
θ₁ = −2.81 (ΔEN weight)
θ₂ = +0.47 (nd weight)
P(Metal) = σ(1.92 − 2.81·ΔEN + 0.47·nd)
θ₁ = −2.81 confirms: more ionic bonds (larger ΔEN) strongly reduce the probability of metallic character — physically correct (ionic compounds are typically insulators). θ₂ = +0.47 means a higher d-electron count mildly increases the metal probability, consistent with broader d-bands in late transition-metal sulfides.
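The weights above are quoted in raw feature space even though training runs on standardised features, so a conversion step sits in between. A minimal sketch of that conversion follows; the function name `to_raw_weights` and the numbers fed into it are hypothetical and chosen only to check that both parameterisations give the same score z.

```python
import numpy as np

def to_raw_weights(theta_std, mean, std):
    """Map [bias, w1, w2] learned on (x - mean) / std to raw-feature weights."""
    w_raw = theta_std[1:] / std
    b_raw = theta_std[0] - np.sum(theta_std[1:] * mean / std)
    return np.concatenate([[b_raw], w_raw])

# Hypothetical standardised-space weights and feature statistics.
theta_std = np.array([-0.4, -1.1, 0.9])
mean = np.array([0.98, 4.9])   # assumed means of (delta_EN, n_d)
std = np.array([0.49, 2.5])    # assumed standard deviations

theta_raw = to_raw_weights(theta_std, mean, std)

# The score for any compound must be identical in both parameterisations.
x = np.array([1.40, 8.0])
z_std = theta_std[0] + theta_std[1:] @ ((x - mean) / std)
z_raw = theta_raw[0] + theta_raw[1:] @ x
print(f"z (standardised) = {z_std:.6f}, z (raw) = {z_raw:.6f}")
```

Raw-space weights are easier to read physically (per unit of ΔEN, per d-electron), while standardised weights are better for comparing the relative importance of the two features.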
8. Evaluating a Classifier — Confusion Matrix and Metrics
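R² and MAE measure errors in a continuous quantity, so they are not the right tools for class labels. Instead we use the confusion matrix, a 2×2 count of correct and incorrect predictions, and four derived metrics. Taking Metal as the positive class: TP and TN count correctly predicted metals and insulators, FP counts insulators predicted as Metal, FN counts metals predicted as Insulator, and N is the total number of compounds.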
| Metric | Formula | Ideal | What it tells you |
|---|---|---|---|
| Accuracy | (TP + TN) / N | 1.0 | Fraction of all compounds classified correctly. Misleading on imbalanced datasets. |
| Precision | TP / (TP + FP) | 1.0 | Of all compounds predicted Metal, what fraction truly are? Penalises false alarms. |
| Recall | TP / (TP + FN) | 1.0 | Of all true Metals, what fraction did we detect? Penalises missed metals. |
| F1 score | 2·Prec·Rec / (Prec + Rec) | 1.0 | Harmonic mean of precision and recall. A common single-number summary when classes are imbalanced. |
If 90% of your dataset is Insulator, a model that always predicts Insulator achieves 90% accuracy — but has zero ability to find metals. Always inspect both precision and recall, or use the F1 score.
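The always-Insulator scenario is easy to reproduce. The sketch below computes the four metrics from raw counts for 10 metals and 90 insulators; returning 0 when a ratio is undefined (0/0) is a convention assumed here, not something the formulas themselves dictate.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with Metal = 1 as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, tn, fp, fn

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    # Undefined ratios (0/0) are reported as 0.0 by convention (assumption).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": (tp + tn) / (tp + tn + fp + fn),
            "precision": precision, "recall": recall, "f1": f1}

# 10 metals, 90 insulators, and a model that always answers "Insulator".
y_true = np.array([1] * 10 + [0] * 90)
y_pred = np.zeros(100, dtype=int)

m = metrics(y_true, y_pred)
print(confusion_counts(y_true, y_pred))
print({k: round(v, 2) for k, v in m.items()})
```

Accuracy comes out at 0.90 while recall and F1 for the Metal class are both 0, which is exactly the failure that accuracy alone hides.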
9. Interactive Companion — App 6 Classification Explorer
The companion app lets you adjust the three parameters θ₀, θ₁, θ₂ manually and watch the decision boundary, sigmoid output, and confusion matrix update in real time. Click Auto-fit with gradient descent to run the full training loop. Notice that even after Auto-fit, accuracy stays well short of 12/12. FeS and FeS₂ share identical (ΔEN, nd) coordinates, so no boundary can classify both correctly, and at ΔEN = 1.54 the metals at nd = 2 sit between insulators at nd = 0 and nd = 3, which a single straight line cannot split.
10. Python Implementation — scikit-learn Workflow
From scratch — gradient descent
```python
import numpy as np

# Data: [ΔEN, n_d], labels (1 = Metal, 0 = Insulator)
X_raw = np.array([
    [0.43, 6], [0.43, 6], [1.40, 8], [0.43, 8], [0.61, 5], [0.93, 5],
    [1.54, 2], [1.54, 3], [1.40, 7], [0.43, 7], [1.54, 2], [1.54, 0],
])
y = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0])  # 1 = Metal

# Standardise features and prepend a column of ones for the bias θ₀
mx, sx = X_raw.mean(0), X_raw.std(0)
X = np.hstack([np.ones((len(y), 1)), (X_raw - mx) / sx])

# Sigmoid
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Batch gradient descent on the mean cross-entropy loss
theta = np.zeros(3)
eta, iters = 0.5, 3000
for _ in range(iters):
    yhat = sigmoid(X @ theta)
    grad = X.T @ (yhat - y) / len(y)
    theta -= eta * grad

# Note: θ is expressed in standardised feature space
print(f"θ = {theta.round(3)}")
preds = (sigmoid(X @ theta) >= 0.5).astype(int)
print(f"Accuracy: {(preds == y).mean():.2f}")
```
Using scikit-learn (recommended)
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler

X = np.array([
    [0.43, 6], [0.43, 6], [1.40, 8], [0.43, 8], [0.61, 5], [0.93, 5],
    [1.54, 2], [1.54, 3], [1.40, 7], [0.43, 7], [1.54, 2], [1.54, 0],
])
y = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0])

# Standardise features
scaler = StandardScaler()
X_sc = scaler.fit_transform(X)

# Fit logistic regression. The defaults apply L2 regularisation (C=1.0),
# so the weights differ from the unregularised from-scratch fit.
clf = LogisticRegression(max_iter=1000, random_state=42)
clf.fit(X_sc, y)

y_pred = clf.predict(X_sc)
print(confusion_matrix(y, y_pred))
print(classification_report(y, y_pred, target_names=['Insulator', 'Metal']))

# Map the weights back to raw (ΔEN, n_d) units before printing the boundary
w = clf.coef_[0] / scaler.scale_
b = clf.intercept_[0] - np.sum(w * scaler.mean_)
print(f"Boundary: n_d = {-b / w[1]:.2f} {-w[0] / w[1]:+.2f}·ΔEN")
```
11. Limitations — When Logistic Regression Is Not Enough
| Limitation | Why it matters in materials science | Solution |
|---|---|---|
| Linear boundary only | Metal–insulator boundary in real materials is highly non-linear (Mott physics, topology) | Kernel SVM, decision trees, neural networks |
| Feature overlap | FeS and FeS₂ have identical ΔEN and nd but different classes | Add structure-sensitive features: crystal field splitting, bond angle, coordination |
| Small dataset | 12 examples is far too few for reliable generalisation | Augment with ICSD/Materials Project data; use cross-validation |
| No uncertainty | Cannot report confidence interval on the probability output | Platt scaling, Bayesian logistic regression, conformal prediction |
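The cross-validation remedy from the table can be sketched with the same from-scratch gradient descent used earlier. The example below runs leave-one-out cross-validation: each compound is held out in turn, the model is refit on the other eleven (including the standardisation statistics), and the held-out compound is predicted. The helper names `fit` and `predict` are illustrative, and with only 12 points the resulting estimate is very noisy.

```python
import numpy as np

X_raw = np.array([
    [0.43, 6], [0.43, 6], [1.40, 8], [0.43, 8], [0.61, 5], [0.93, 5],
    [1.54, 2], [1.54, 3], [1.40, 7], [0.43, 7], [1.54, 2], [1.54, 0],
])
y = np.array([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0])  # 1 = Metal

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X_train, y_train, eta=0.5, iters=3000):
    """Standardise on the training fold only, then run gradient descent."""
    mean, std = X_train.mean(0), X_train.std(0)
    X = np.hstack([np.ones((len(y_train), 1)), (X_train - mean) / std])
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= eta * (X.T @ (sigmoid(X @ theta) - y_train)) / len(y_train)
    return theta, mean, std

def predict(x_raw, theta, mean, std):
    """Class label (1 = Metal) for one raw feature vector."""
    z = theta[0] + theta[1:] @ ((x_raw - mean) / std)
    return int(z >= 0)

hits = []
for i in range(len(y)):
    train = np.arange(len(y)) != i          # boolean mask: all rows except i
    theta, mean, std = fit(X_raw[train], y[train])
    hits.append(predict(X_raw[i], theta, mean, std) == y[i])

loo_accuracy = float(np.mean(hits))
print(f"leave-one-out accuracy: {loo_accuracy:.2f}")
```

Computing the mean and standard deviation inside each fold keeps information about the held-out compound from leaking into training.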
Quick Check
1. In logistic regression, what does σ(z) = 0.5 tell you about the model's prediction?
2. FeS and FeS₂ can never both be classified correctly by our model. The most likely reason is:
3. You have 5 metals and 45 insulators in your dataset. Your model predicts all compounds as Insulator. What is its accuracy and F1 score for the Metal class?