Probability & Bayes: The Base Rate Trap

⏱TIME: 15 min

🍽️YIELD: 1 working posterior + immunity to the base rate fallacy

📓CHAPTER: S2E4

The Idea

CONCEPT

A pantry of 200 spice jars, 10 of them spoiled. A freshness test with 90% sensitivity and 85% specificity flags about 38 jars, but only 9 of the flags are real. Arrow from 'P(flag|spoiled)' to 'P(spoiled|flag)' with a big 'NOT THE SAME!' Margin: 'rare things stay rare even after good evidence.'

P(flag|spoiled) is the test's quality; P(spoiled|flag) is what you actually want. Bayes' theorem is the bridge, and the prior is the toll.
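Written out for the spice jars, the bridge looks like this. The numbers are the ones used in The Recipe (prior 5%, sensitivity 90%, specificity 85%); "fine" is shorthand here for "not spoiled".

```latex
P(\text{spoiled} \mid \text{flag})
  = \frac{P(\text{flag} \mid \text{spoiled})\, P(\text{spoiled})}
         {P(\text{flag} \mid \text{spoiled})\, P(\text{spoiled})
          + P(\text{flag} \mid \text{fine})\, P(\text{fine})}
  = \frac{0.90 \times 0.05}{0.90 \times 0.05 + 0.15 \times 0.95}
  = \frac{0.045}{0.1875}
  = 0.24
```

The denominator is the total chance of a flag, true or false. The prior appears in both terms, which is why it sets the toll.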

When the prior is tiny, false positives from the huge population of fine jars swamp true positives from the small spoiled one. Forgetting this is the base rate fallacy.
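To see the swamping happen, here is a minimal sketch that holds the test fixed and lowers only the prior. The function name `posterior` and the list of priors are illustrative choices; the 90% sensitivity and 85% specificity are the values used elsewhere in this lesson.

```python
def posterior(prior, sens=0.90, spec=0.85):
    """P(spoiled | flagged) for a given prior, via Bayes' theorem."""
    true_flags = prior * sens                # spoiled jars that get flagged
    false_flags = (1 - prior) * (1 - spec)   # fine jars flagged by mistake
    return true_flags / (true_flags + false_flags)

# Same test every time; only the base rate changes.
priors = [0.50, 0.20, 0.05, 0.01]
table = {p: posterior(p) for p in priors}

for p, post in table.items():
    print(f"prior {p:>4.0%} -> posterior {post:.1%}")
```

With these settings the posterior falls from about 85.7% at a 50% prior to about 5.7% at a 1% prior, even though the test itself never changed.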

A second independent test on the flagged jars uses the 24% posterior from the first test as its new prior: evidence compounds by updating, not by replacing.
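A rough sketch of that compounding follows. It assumes the second test has the same 90% sensitivity and 85% specificity as the first, and that the two results are independent given the jar's true state; the lesson does not specify the second test, so treat both as assumptions. The helper name `update` is illustrative.

```python
def update(prior, sens, spec):
    """One Bayesian update after a positive flag."""
    true_flags = prior * sens
    false_flags = (1 - prior) * (1 - spec)
    return true_flags / (true_flags + false_flags)

SENS, SPEC = 0.90, 0.85   # assumed identical for both tests

first = update(0.05, SENS, SPEC)     # start from the 5% base rate
second = update(first, SENS, SPEC)   # yesterday's posterior is today's prior

# Cross-check: applying both flags in a single step gives the same answer.
both_true = 0.05 * SENS**2
both_false = 0.95 * (1 - SPEC)**2
one_shot = both_true / (both_true + both_false)

print(f"after one flag:  {first:.1%}")
print(f"after two flags: {second:.1%}")
```

Under these assumptions, two flags in a row lift the posterior from 24.0% to roughly 65.5%: much stronger, yet still short of certainty.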

Doctors, spam filters and fraud models all live or die by this arithmetic; calibration is just Bayes done honestly.

In the Test Kitchen: drag the prior slider down to 1% and watch a 90% test produce mostly amber false alarms.

⚗️ The Test Kitchen

INTERACTIVE LAB

Don't just read the recipe — taste it. Drag, click and break things below.

EXP 01

The Taste Test

"randomness averages out — eventually"

Set the true chance a customer loves the dish, then serve plates. Each tasting is random — but watch the running average: noisy over the first few plates, glued to the dashed line after a few hundred. That is the law of large numbers, and it is why more data beats louder opinions.

0/0 loved it

TRUE P(LOVES IT): 0.70

FIG L.1: BERNOULLI TRIALS — THE RUNNING AVERAGE CONVERGES TO THE TRUE PROBABILITY
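If you would rather taste this offline, the experiment can be approximated in a few lines. This is only a sketch of the idea, not the lab's actual code; the function name `running_average`, the fixed seed and the plate counts are all illustrative choices.

```python
import random

def running_average(p_true, n_plates, seed=0):
    """Serve n_plates Bernoulli(p_true) tastings; return the running average."""
    rng = random.Random(seed)    # seeded so the run is repeatable
    loved = 0
    averages = []
    for served in range(1, n_plates + 1):
        loved += rng.random() < p_true    # True counts as 1
        averages.append(loved / served)
    return averages

averages = running_average(p_true=0.70, n_plates=2000)

for n in (10, 100, 2000):
    print(f"after {n:>4} plates: {averages[n - 1]:.3f}")
```

Early values can wander well away from 0.70; by a couple of thousand plates the average should sit close to it. Change the seed to see a different noisy start with the same destination.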

EXP 02

The Spice Jar Inspector

"never ignore the base rate"

Your freshness test is decent, yet when spoiled jars are rare, most jars it flags are actually fine. Slide the prior down and watch the amber false alarms swamp the red true catches. Bayes' theorem just counts: of all flagged jars, what fraction is genuinely spoiled?

flagged + spoiled (9) · flagged but fine (29) · missed spoiled (1) · cleared fine (161)

PRIOR P(SPOILED): 5% · SENSITIVITY: 90% · SPECIFICITY: 85%

P(SPOILED | FLAGGED): 24.0%. A flagged jar is probably fine, so trust the base rate!

FIG L.2: BAYES' THEOREM — POSTERIOR = TRUE FLAGS ÷ ALL FLAGS. RARE EVENTS MAKE GOOD TESTS LOOK BAD
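The counting view can be sketched by hand for a 200-jar pantry. The variable names are illustrative. Note that the expected counts come out fractional (28.5 false alarms, 161.5 cleared jars); the legend's 29 and 161 are presumably rounded for display, which would explain why 9 ÷ 38 is about 23.7% rather than the exact 24.0%.

```python
n_jars = 200
prior, sens, spec = 0.05, 0.90, 0.85

spoiled = n_jars * prior           # about 10 spoiled jars
fine = n_jars - spoiled            # about 190 fine jars

true_flags = spoiled * sens        # flagged + spoiled
missed = spoiled - true_flags      # missed spoiled
false_flags = fine * (1 - spec)    # flagged but fine
cleared = fine - false_flags       # cleared fine

# Posterior = true flags divided by all flags.
posterior = true_flags / (true_flags + false_flags)

print(f"flagged + spoiled: {true_flags:.1f}")
print(f"flagged but fine:  {false_flags:.1f}")
print(f"missed spoiled:    {missed:.1f}")
print(f"cleared fine:      {cleared:.1f}")
print(f"P(spoiled | flagged) = {posterior:.1%}")
```

No formula is needed beyond division: sort the jars into four piles, then compare the red pile with everything that was flagged.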

The Recipe

CODE

REQUIRED SPICES: probability · bayes theorem · prior · posterior · base rate

Bayes' theorem in 6 lines

# P(spoiled | flagged) via Bayes' theorem
prior = 0.05          # 5% of jars are spoiled
sens  = 0.90          # test catches 90% of spoiled jars
spec  = 0.85          # test clears 85% of fine jars

p_flag = prior * sens + (1 - prior) * (1 - spec)
posterior = prior * sens / p_flag
print(f"{posterior:.1%}")   # 24.0% — most flags are false alarms!


CODE & CURRY


ML KITCHEN