K-Nearest Neighbours: Ask the Neighbours

⏱TIME: 13 min

🍽️YIELD: 1 lazy learner + a rule of thumb for k

📓CHAPTER: S4E5

The Idea

CONCEPT

A flavour map of samosas (purple) and dosas (teal) with a mystery '?' dish in the overlap zone. A dashed circle encloses its k=5 nearest neighbours, each casting a vote. Two thought bubbles: k=1 'trust the nearest gossip', k=25 'poll the whole street'.

KNN has no training step — it memorises the pantry and answers queries by measuring. All the cost moves to prediction time.

k is a bias-variance dial: k=1 memorises noise (jagged boundary), huge k blurs everything toward the majority class.

Because it is pure distance, unscaled features silently rig every vote — the gram-valued column wins. Standardise first, always.

At production scale exact neighbour search dies; approximate indexes (FAISS, HNSW, Annoy) buy speed with a tiny accuracy tax.

In the Test Kitchen: drop the mystery dish in the overlap zone, then swing k from 1 to 15 and watch the verdict flip.

⚗️ The Test Kitchen

INTERACTIVE LAB

Don't just read the recipe — taste it. Drag, click and break things below.

EXP 01

Ask the Neighbours

"no training, just democracy"

Purple dots are samosas, teal dots are dosas, mapped by flavour. Click anywhere to drop a mystery dish — its k nearest neighbours vote on its identity. Try k = 1 near the messy middle (one odd neighbour fools it), then k = 15 (the vote smooths out, but local detail dies).

NEIGHBOURS k3vote 3🟣 : 0🟢 → it's a samosa

FIG L.7: K-NEAREST NEIGHBOURS — NO TRAINING, JUST DISTANCE + DEMOCRACY. THE DASHED CIRCLE IS THE ELECTORATE

The Recipe

CODE

REQUIRED SPICESKNNk selectiondistancelazy learningscaling

KNN done properly (scaled!)

from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# scaling is NOT optional: KNN is pure distance
knn = make_pipeline(StandardScaler(),
                    KNeighborsClassifier(n_neighbors=5))
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
# odd k avoids ties; sweep k with cross-validation, not vibes

NEXT EXPERIMENT →

CODE & CURRY

APPROVED

ML KITCHEN