Classification Metrics: The Health Inspector

⏱ TIME: 16 min

🍽️ YIELD: 1 confusion matrix you can actually read

📓 CHAPTER: S4E4

The Idea

CONCEPT

Two overlapping bell curves on a 'suspicion score' axis — teal fresh dishes left, red spoiled right — with a movable inspector's line between them. The four regions are labelled TP/FP/FN/TN and feed a 2×2 grid below. Margin: 'one threshold, four fates.'

Accuracy collapses four numbers into one and misleads whenever classes are imbalanced: in a kitchen where 99% of dishes are fresh, an inspector who never flags anything scores 99% accuracy while catching no spoiled dishes at all.
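A minimal sketch of that trap in plain Python, assuming a hypothetical kitchen of 990 fresh dishes and 10 spoiled ones, and an inspector who clears everything:

```python
# Hypothetical kitchen (illustrative numbers): 990 fresh dishes (0), 10 spoiled (1).
y_true = [0] * 990 + [1] * 10

# A lazy inspector who clears every dish without looking.
y_pred = [0] * len(y_true)

# Accuracy: share of verdicts that match the truth.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Recall: share of spoiled dishes that were actually flagged.
caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = caught / sum(y_true)

print(f"accuracy={accuracy:.2%}  recall={recall:.0%}")  # accuracy=99.00%  recall=0%
```

The headline number looks excellent; the second number shows nobody was protected.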

Precision asks "of the dishes I condemned, how many deserved it?"; recall asks "of the truly spoiled, how many did I catch?" They pull the threshold in opposite directions.
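To see the tug of war, here is a sketch that sweeps the threshold over the same toy scores used in the recipe further down. The helper `precision_recall` is hypothetical (not a library function), and it treats precision or recall as 0 when its denominator is empty, which is a convention rather than the only choice:

```python
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]          # 1 = spoiled, 0 = fresh
scores = [.9, .8, .6, .4, .7, .5, .3, .2, .2, .1]  # model's suspicion scores

def precision_recall(y_true, scores, t):
    """Precision and recall when every dish with score >= t is flagged."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < t and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0  # nothing flagged: call it 0
    recall = tp / (tp + fn) if tp + fn else 0.0      # nothing spoiled: call it 0
    return precision, recall

for t in (0.25, 0.45, 0.65, 0.85):
    p, r = precision_recall(y_true, scores, t)
    print(f"t={t:.2f}  precision={p:.2f}  recall={r:.2f}")

# t=0.25  precision=0.57  recall=1.00
# t=0.45  precision=0.60  recall=0.75
# t=0.65  precision=0.67  recall=0.50
# t=0.85  precision=1.00  recall=0.25
```

Raising the line makes each condemnation more trustworthy while letting more spoiled dishes through.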

F1 is their harmonic truce, 2·P·R / (P + R), which stays low unless both are high; AUC-ROC scores the model across every threshold at once, separating model quality from threshold policy.
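A sketch of both ideas in plain Python. The helper names `f1` and `auc_roc` are illustrative. For AUC it uses the standard rank interpretation: AUC-ROC equals the probability that a randomly chosen spoiled dish outscores a randomly chosen fresh one, with ties counted as half. That is why no threshold appears anywhere in it:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (defined as 0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def auc_roc(y_true, scores):
    """AUC via pairwise ranking: fraction of (spoiled, fresh) pairs in which
    the spoiled dish scores higher; tied pairs count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]          # same toy data as the recipe
scores = [.9, .8, .6, .4, .7, .5, .3, .2, .2, .1]

# A flawless-precision inspector who catches only 10% of spoilage:
print(round(f1(1.0, 0.1), 3))   # 0.182 (the arithmetic mean would claim 0.55)
print(auc_roc(y_true, scores))  # 0.875: 21 of 24 spoiled/fresh pairs ranked correctly
```

The harmonic mean is dragged toward the weaker of the two numbers, so a lopsided inspector cannot buy a good F1 with one strong metric.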

Pick the metric from the cost of each mistake: missed spoilage (FN) poisons customers, false alarms (FP) waste food. The matrix is a menu of consequences.
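One way to make "cost of each mistake" concrete is to price each error and pick the cheapest threshold. This is a sketch under loudly hypothetical costs (a missed spoiled dish costing ten times a wasted fresh one), reusing the recipe's toy scores; `total_cost` and `cheapest_threshold` are illustrative helpers, not library functions:

```python
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]          # 1 = spoiled, 0 = fresh
scores = [.9, .8, .6, .4, .7, .5, .3, .2, .2, .1]

def total_cost(t, cost_fn, cost_fp):
    """Total price of the mistakes made when every score >= t is flagged."""
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < t)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
    return cost_fn * fn + cost_fp * fp

def cheapest_threshold(cost_fn, cost_fp):
    # Candidate cut-offs: every distinct score, plus infinity (= flag nothing).
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: total_cost(t, cost_fn, cost_fp))

# Hypothetical prices: missed spoilage (FN) is 10x worse than a false alarm (FP).
print(cheapest_threshold(cost_fn=10.0, cost_fp=1.0))  # 0.4: catch every spoiled dish
# Flip the prices and the inspector's line moves right.
print(cheapest_threshold(cost_fn=1.0, cost_fp=10.0))  # 0.8: no false alarms at all
```

The model never changed; only the price list did, and with it the sensible threshold.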

In the Test Kitchen: shrink the model separation and watch every threshold become a bad compromise — metrics cannot rescue a weak model.
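The same effect can be computed without the slider. This sketch assumes an idealised model that the lesson does not specify: fresh scores follow N(0, 1), spoiled scores follow N(d, 1), both classes are equally common, and the inspector sits at the midpoint d/2 (the accuracy-optimal cut-off in that setup). Under those assumptions AUC works out to Φ(d/√2):

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF

def midpoint_metrics(d):
    """Assumed model: fresh ~ N(0, 1), spoiled ~ N(d, 1), equal class sizes,
    threshold at d / 2. Returns (precision, recall, AUC)."""
    recall = PHI(d / 2)                # spoiled dishes scoring above d/2
    false_alarm_rate = 1 - PHI(d / 2)  # fresh dishes scoring above d/2
    # Equal class sizes, so TP : FP = recall : false_alarm_rate.
    precision = recall / (recall + false_alarm_rate)
    # spoiled - fresh ~ N(d, 2), so P(spoiled outscores fresh) = PHI(d / sqrt(2)).
    auc = PHI(d / math.sqrt(2))
    return precision, recall, auc

for d in (3.0, 1.5, 0.5):              # shrinking separation
    p, r, auc = midpoint_metrics(d)
    print(f"separation={d}: precision={p:.2f} recall={r:.2f} AUC={auc:.2f}")
```

As d shrinks, precision, recall and AUC all slide toward 0.5 together; moving the threshold can only trade one metric for the other, never lift both.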

⚗️ The Test Kitchen

INTERACTIVE LAB

Don't just read the recipe — taste it. Drag, click and break things below.

EXP 01

The Health Inspector

"precision vs recall is one slider"

Teal dishes are fresh, red ones are spoiled; the model gives each a suspicion score. The inspector's threshold turns scores into verdicts. Slide it left and you catch more spoiled dishes but condemn more good food (recall ↑, precision ↓); slide it right and the reverse happens (precision ↑, recall ↓). Shrink the separation to feel why a weak model makes the trade-off brutal.

THRESHOLD: 0.50 · MODEL SEPARATION: 0.14

TP (spoiled, flagged): 90

FN (spoiled, missed): 10

FP (fresh, flagged): 10

TN (fresh, cleared): 90

precision 90% · recall 90% · F1 90% · accuracy 90%
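The readout is plain arithmetic on the four cells. A sketch, with `metrics_from_counts` as an illustrative helper that assumes no denominator is zero:

```python
def metrics_from_counts(tp, fn, fp, tn):
    """Precision, recall, F1 and accuracy from the four confusion-matrix cells.
    Assumes every denominator is non-zero."""
    precision = tp / (tp + fp)          # of the flagged, how many were spoiled
    recall = tp / (tp + fn)             # of the spoiled, how many were flagged
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return precision, recall, f1, accuracy

# The snapshot above: TP 90, FN 10, FP 10, TN 90.
print([f"{m:.0%}" for m in metrics_from_counts(90, 10, 10, 90)])
# ['90%', '90%', '90%', '90%']

# Hypothetical imbalanced kitchen: 10 spoiled, 990 fresh.
print([f"{m:.0%}" for m in metrics_from_counts(9, 1, 90, 900)])
# ['9%', '90%', '17%', '91%']
```

The four numbers agree only because this snapshot is balanced and symmetric; in the imbalanced variant, accuracy still reads 91% while nine in ten condemned dishes were fine.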

FIG L.6: CONFUSION MATRIX — ONE THRESHOLD, FOUR FATES. AMBER = FALSE ALARMS, DEEP RED = MISSED SPOILAGE

The Recipe

CODE

REQUIRED SPICES: confusion matrix · precision · recall · F1 · threshold

Same model, two verdicts

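from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]            # 1 = spoiled, 0 = fresh
scores = [.9, .8, .6, .4, .7, .5, .3, .2, .2, .1]  # model's suspicion scores

for t in (0.35, 0.65):                             # two inspectors, two thresholds
    y_pred = [int(s >= t) for s in scores]         # flag anything at or above t
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()  # sklearn's cell order
    print(f"t={t}: tn={tn} fp={fp} fn={fn} tp={tp}",
          f"precision={precision_score(y_true, y_pred):.2f}",
          f"recall={recall_score(y_true, y_pred):.2f}")

# t=0.35: tn=4 fp=2 fn=0 tp=4 precision=0.67 recall=1.00
# t=0.65: tn=5 fp=1 fn=2 tp=2 precision=0.67 recall=0.50
# Recall halves; on this toy data precision happens to stay at 0.67.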


CODE & CURRY

APPROVED

ML KITCHEN