Gradient Descent: Rolling Down Curry Hill

⏱TIME: 16 min

🍽️YIELD: 1 optimiser intuition (and a healthy fear of learning rates)

📓CHAPTER: S2E7

The Idea

CONCEPT

A loss landscape drawn as a hilly curry surface. A chickpea rolls downhill in steps; three flame settings show LOW (crawls, gets stuck), MEDIUM (settles in the deepest dip), HIGH (ricochets out of the pot). Margin: 'the gradient only knows the slope under its feet.'

Gradient descent is greedy and local: it never sees the map, only the slope where it stands. That is both its power (cheap) and its trap (local minima).

Learning rate is the whole game — too low crawls and stalls in dips, too high overshoots and diverges. Schedulers and Adam exist to tune the flame mid-cook.

SGD trades exact gradients for noisy cheap ones; the noise itself helps hop out of sharp local minima.

Every neural network you will ever train is this chickpea, in a million dimensions.

In the Test Kitchen: try all three flame settings and find the start position where LOW gets trapped but MEDIUM escapes.

⚗️ The Test Kitchen

INTERACTIVE LAB

Don't just read the recipe — taste it. Drag, click and break things below.

EXP 01

Curry Hill

"roll downhill, mind the local dips"

The pot wants the bottom of the valley (minimum loss). Each step follows the slope downhill: x ← x − α·f′(x). The flame is α, the learning rate — SIMMER takes steps so small it gets stuck in the pothole, MEDIUM powers through to the true bottom, and FULL BLAST overshoots harder every step until the pot flies off. Click anywhere on the curve to drop the pot there.

α=0.8 · x=-3.00 · steps 0

FIG V.3: GRADIENT DESCENT — RED ARROW POINTS DOWNHILL (MINUS THE GRADIENT)

The Recipe

CODE

REQUIRED SPICESgradient descentlearning ratelocal minimumSGDconvergence

Vanilla gradient descent

def f(x):  return 0.15*(x-1.8)**2 + 0.3   # the hill (loss)
def df(x): return 0.30*(x-1.8)              # its slope (gradient)

x, lr = -3.0, 0.5                            # start + flame setting
for step in range(40):
    x -= lr * df(x)                          # walk against the slope
print(round(x, 3), round(f(x), 4))           # → 1.8, the minimum

NEXT EXPERIMENT →

CODE & CURRY

APPROVED

ML KITCHEN