The model is ŷ = wx + b; training means choosing w and b to minimise the mean squared residual. Nothing more mystical than that.
Squaring residuals makes big misses very expensive and gives a smooth bowl-shaped loss with a single bottom — which is why OLS has a closed-form solution.
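To make the "single bottom" concrete, here is a minimal sketch in plain Python. It assumes one feature and uses the seven orders from the scikit-learn snippet; the helper names `mse` and `fit_closed_form` are made up for illustration. The closed form is the textbook one-feature result: slope = S_xy / S_xx, intercept = ȳ − slope · x̄.

```python
# Spice level (x) and happiness (y): the seven orders from the scikit-learn snippet.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2.1, 2.4, 3.4, 3.7, 4.6, 5.1, 5.9]


def mse(w, b, xs, ys):
    """Mean squared residual of the line y = w*x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_closed_form(xs, ys):
    """One-feature OLS: w = S_xy / S_xx, b = mean(y) - w * mean(x)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    w = s_xy / s_xx
    b = y_bar - w * x_bar
    return w, b


w, b = fit_closed_form(xs, ys)
best = mse(w, b, xs, ys)
print(f"slope={w:.3f} intercept={b:.3f} mse={best:.4f}")

# Any nudge away from the bottom of the bowl makes the loss worse.
for dw, db in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert mse(w + dw, b + db, xs, ys) > best
```

No iteration is needed here: because the loss is a quadratic bowl in w and b, setting its gradient to zero gives two linear equations with one solution.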
The slope is the story: "+0.63 happiness per spice level". Interpretability is linear regression's superpower; protect it by keeping features on sensible, comparable scales, because the size of a slope depends on the units of its feature.
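A small sketch of why scale matters when reading a slope, in plain Python. The tenfold unit change and the helper `fit_line` are hypothetical, chosen only for illustration; the data are the seven orders from the scikit-learn snippet.

```python
def fit_line(xs, ys):
    """Closed-form OLS for one feature; returns (slope, intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    w = s_xy / s_xx
    return w, y_bar - w * x_bar


xs = [1, 2, 3, 4, 5, 6, 7]                  # spice level
ys = [2.1, 2.4, 3.4, 3.7, 4.6, 5.1, 5.9]    # happiness
w, b = fit_line(xs, ys)

# Hypothetical unit change: the same spice levels recorded on a scale ten times larger.
xs_big = [10 * x for x in xs]
w_big, b_big = fit_line(xs_big, ys)

# Standardised spice: z = (x - mean) / standard deviation.
n = len(xs)
x_bar = sum(xs) / n
x_sd = (sum((x - x_bar) ** 2 for x in xs) / n) ** 0.5
zs = [(x - x_bar) / x_sd for x in xs]
w_z, b_z = fit_line(zs, ys)

print(f"original units : {w:.3f} happiness per spice level")
print(f"x multiplied 10: {w_big:.4f} happiness per unit")
print(f"standardised   : {w_z:.3f} happiness per standard deviation of spice")
```

The fitted line makes the same predictions in all three cases; only the number you read off as "the slope" changes, so always state the units alongside it.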
Always plot residuals: a curve in that plot means the straight-line assumption is wrong, and a fan means the spread of the errors is not constant, no matter how nice R² looks.
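Here is a sketch of the "nice R², wrong model" trap, using made-up data that follow y = x² exactly (an assumption for illustration, not data from the Test Kitchen). Instead of drawing the plot, it prints the sign of each residual from left to right, which is enough to see the curve.

```python
def fit_line(xs, ys):
    """Closed-form OLS for one feature; returns (slope, intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    w = s_xy / s_xx
    return w, y_bar - w * x_bar


# Hypothetical curved data: y = x squared, for x = 0..10.
xs = list(range(11))
ys = [x ** 2 for x in xs]

w, b = fit_line(xs, ys)
residuals = [y - (w * x + b) for x, y in zip(xs, ys)]

# R squared = 1 - (residual sum of squares) / (total sum of squares).
y_bar = sum(ys) / len(ys)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - y_bar) ** 2 for y in ys)
r2 = 1 - ss_res / ss_tot

signs = "".join("+" if r > 0 else "-" for r in residuals)
print(f"R^2 = {r2:.3f}")
print(f"residual signs, left to right: {signs}")
```

R² comes out above 0.9, yet the residuals are positive at both ends and negative in the middle. Well-behaved residuals would scatter around zero with no such run of signs.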
In the Test Kitchen: add a few outlier orders and watch the fitted line tilt toward them while the red residuals stretch.
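The same experiment can be sketched offline. The two outlier orders below (very spicy, very unhappy) are invented for illustration, and `fit_line` is a hypothetical helper; the base data are the seven orders from the scikit-learn snippet.

```python
def fit_line(xs, ys):
    """Closed-form OLS for one feature; returns (slope, intercept)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    w = s_xy / s_xx
    return w, y_bar - w * x_bar


xs = [1, 2, 3, 4, 5, 6, 7]                  # spice level
ys = [2.1, 2.4, 3.4, 3.7, 4.6, 5.1, 5.9]    # happiness
w, b = fit_line(xs, ys)

# Two invented outlier orders: high spice, very low happiness.
outliers = [(6, 0.2), (7, 0.5)]
xs_out = xs + [x for x, _ in outliers]
ys_out = ys + [y for _, y in outliers]
w_out, b_out = fit_line(xs_out, ys_out)

print(f"slope without outliers: {w:.3f}")
print(f"slope with outliers   : {w_out:.3f}")

# Share of the total squared error that comes from the two outliers after refitting.
sq_err = [(y - (w_out * x + b_out)) ** 2 for x, y in zip(xs_out, ys_out)]
share = sum(sq_err[-len(outliers):]) / sum(sq_err)
print(f"outliers' share of squared error: {share:.0%}")
```

Two points out of nine are enough to flatten the slope, because squaring makes their large misses dominate the loss.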
Don't just read the recipe — taste it. Drag, click and break things below.
Every dot is one order: x = spice level, y = customer happiness. Click the paper to add your own orders, then hit FIT LINE — gradient descent nudges the line until the squared error stops shrinking.
FIG V.1: LINEAR REGRESSION — RED DASHES = RESIDUALS (THE ERRORS BEING SQUARED)
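What FIT LINE does can be approximated with a few lines of batch gradient descent. This is a sketch, not the widget's actual code: the learning rate, tolerance, starting point and step cap are all assumptions, and the data are the seven orders from the scikit-learn snippet.

```python
def gradient_descent(xs, ys, lr=0.02, tol=1e-12, max_steps=10_000):
    """Batch gradient descent on mean squared error for the line y = w*x + b.

    lr, tol and max_steps are illustrative choices, not values from the widget.
    Stops once a step improves the loss by less than tol.
    """
    n = len(xs)
    w, b = 0.0, 0.0                      # start from a flat line through zero
    prev_loss = float("inf")
    for step in range(max_steps):
        residuals = [y - (w * x + b) for x, y in zip(xs, ys)]
        loss = sum(r * r for r in residuals) / n
        if prev_loss - loss < tol:       # squared error has stopped shrinking
            break
        prev_loss = loss
        # Gradient of the mean squared error with respect to w and b.
        grad_w = -2 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = -2 / n * sum(residuals)
        w -= lr * grad_w                 # nudge the line downhill
        b -= lr * grad_b
    return w, b, loss, step


xs = [1, 2, 3, 4, 5, 6, 7]                  # spice level
ys = [2.1, 2.4, 3.4, 3.7, 4.6, 5.1, 5.9]    # happiness
w, b, loss, steps = gradient_descent(xs, ys)
print(f"slope={w:.3f} intercept={b:.3f} mse={loss:.4f} after {steps} steps")
```

On these data the result should agree with the closed-form OLS fit to within a small tolerance. A learning rate that is too large would make the loss grow instead of shrink, so treat 0.02 as a value tuned to this particular x range.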
```python
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([[1], [2], [3], [4], [5], [6], [7]])  # spice level
y = np.array([2.1, 2.4, 3.4, 3.7, 4.6, 5.1, 5.9])

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # slope ŵ, bias b̂

resid = y - model.predict(X)
print((resid**2).mean())  # MSE: what OLS minimised
```