Backpropagation: How a Network Actually Learns

Backprop is just credit assignment — blame each parameter for the error, in proportion. Tune learning rate and batch size to see training stabilise or diverge.

Apr 29, 2026 · 3 min read

The bad-meal analogy

A restaurant kitchen serves a dish. The customer says "too salty." The head chef has to figure out whose fault it was: the saucier who poured the salt? The cook who reduced the stock? The dishwasher who… probably not.

Backpropagation is exactly that blame assignment, automated. The model produces an answer, the loss says "you were off by X," and backprop walks the network backwards assigning a slice of blame (a gradient) to every single parameter.

The two-pass dance

Training on one example takes two passes through the network:

Forward pass — input flows in, predictions come out, a loss is computed (how wrong was that?).
Backward pass — using the chain rule of calculus, the loss is propagated backwards through every operation, producing a gradient for every parameter.

Then the optimiser nudges each parameter a step opposite its gradient. Repeat for billions of examples.
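The two passes and the update step can be sketched in a few lines of plain Python. This is a minimal illustration, not how a framework does it: the one-hidden-unit model, the names `forward`, `backward` and `train_step`, the single training example and the learning rate of 0.1 are all assumptions chosen to keep the arithmetic visible.

```python
import math

def forward(w1, w2, x, y):
    """Forward pass: input flows in, prediction and loss come out."""
    h = math.tanh(w1 * x)        # hidden activation
    y_hat = w2 * h               # prediction
    loss = (y_hat - y) ** 2      # squared error: how wrong was that?
    return h, y_hat, loss

def backward(w1, w2, x, y, h, y_hat):
    """Backward pass: chain rule from the loss back to each parameter."""
    dloss_dyhat = 2.0 * (y_hat - y)
    dloss_dw2 = dloss_dyhat * h              # y_hat = w2 * h
    dloss_dh = dloss_dyhat * w2
    dh_dz = 1.0 - h ** 2                     # derivative of tanh
    dloss_dw1 = dloss_dh * dh_dz * x         # z = w1 * x
    return dloss_dw1, dloss_dw2

def train_step(w1, w2, x, y, lr=0.1):
    """One forward pass, one backward pass, one step opposite the gradient."""
    h, y_hat, loss = forward(w1, w2, x, y)
    g1, g2 = backward(w1, w2, x, y, h, y_hat)
    return w1 - lr * g1, w2 - lr * g2, loss

# Hypothetical starting weights and a single (x, y) pair.
w1, w2 = 0.5, -0.3
losses = []
for _ in range(50):
    w1, w2, loss = train_step(w1, w2, x=1.0, y=0.8)
    losses.append(loss)

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```

An autograd library derives the body of `backward` automatically from the operations recorded during `forward`; writing it by hand here only makes the chain rule explicit.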

Three things the gradient tells you

For each parameter w, the gradient ∂L/∂w says:

Sign: "increase me" or "decrease me" to lower the loss.
Magnitude: how strongly this parameter influences the current error.
Trust: the gradient from any single example is noisy, so the optimiser uses the average across a batch.

The optimiser does not know what a parameter "means." It just walks downhill on the loss landscape, one small step at a time.
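A small sketch of reading sign, magnitude and batch average off a gradient. The one-weight linear model, the toy data (roughly y = 2x) and the name `grad_single` are assumptions made up for this example.

```python
def grad_single(w, x, y):
    """dL/dw for L = (w*x - y)^2 on one example."""
    return 2.0 * (w * x - y) * x

# Hypothetical toy batch, roughly y = 2x with a little noise.
batch = [(1.0, 2.1), (2.0, 3.8), (3.0, 6.3), (4.0, 7.9)]
w = 0.5

per_example = [grad_single(w, x, y) for x, y in batch]
batch_grad = sum(per_example) / len(per_example)   # what the optimiser uses

# A negative gradient means the loss falls as w rises.
direction = "increase w" if batch_grad < 0 else "decrease w"

print("per-example gradients:", per_example)
print("batch-mean gradient:", batch_grad, "->", direction)
```

Every example here agrees on the sign but not on the size, which is why the step is taken along the mean rather than along any one example's gradient.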

The knobs that make or break training

Learning rate — too high and the network diverges (loss goes to infinity); too low and it never finishes. Modern training uses schedules (warmup → cosine decay) instead of one number.
Batch size — bigger batches give cleaner gradients but cost more memory; smaller batches inject useful noise. Effective batch size is often grown via gradient accumulation.
Gradient clipping — cap the gradient norm to stop occasional huge gradients (from rare tokens, etc.) from exploding the weights.
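The three knobs can be sketched without any framework. The functions below are illustrative assumptions, not a library API: `run_gd` minimises the toy loss L(w) = w^2, `warmup_cosine` is one common shape for a warmup-then-cosine schedule, and `clip_by_norm` rescales a gradient vector by its L2 norm.

```python
import math

def run_gd(lr, steps=20, w=1.0):
    """Plain gradient descent on L(w) = w^2, whose gradient is 2w."""
    for _ in range(steps):
        w = w - lr * 2.0 * w
    return w

def warmup_cosine(step, total_steps, warmup_steps, peak_lr):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector if its L2 norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

print("lr=0.1:", run_gd(0.1))   # each step multiplies w by 0.8, so w shrinks
print("lr=1.1:", run_gd(1.1))   # each step multiplies w by -1.2, so |w| grows
print("clipped:", clip_by_norm([3.0, 4.0], max_norm=1.0))
```

On this toy loss any learning rate above 1.0 diverges, because each step overshoots the minimum by more than it started from. Real loss surfaces have no such clean threshold, which is part of why schedules and clipping are used as safety margins.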

What goes wrong

Vanishing gradients: gradients shrink toward zero as they pass back through many layers, so the early layers stop learning. Residual connections and normalisation fix most of this.
Exploding gradients: gradients balloon and the weights overflow to NaN. Gradient clipping contains most cases.
Bad init: start with weights too big and forward activations blow up, or too small and they collapse toward zero, before backprop runs once. Modern frameworks initialise sensibly by default.
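Both gradient failure modes fall out of the chain rule multiplying one local slope per layer. The sketch below assumes a deliberately crude model, a chain of scalar layers sharing one weight, and the name `input_gradient` is hypothetical. Real networks use matrices, but the multiplication effect is the same.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(weight, depth, x=0.5, activation="sigmoid"):
    """d(output)/d(input) for `depth` stacked scalar layers h = act(weight * h)."""
    h, grad = x, 1.0
    for _ in range(depth):
        z = weight * h
        if activation == "sigmoid":
            h = sigmoid(z)
            local = h * (1.0 - h)    # sigmoid'(z), never above 0.25
        else:                         # identity activation
            h = z
            local = 1.0
        grad *= weight * local        # chain rule: multiply the local slopes
    return grad

vanishing = input_gradient(weight=1.0, depth=20)
exploding = input_gradient(weight=3.0, depth=20, activation="linear")

print(f"20 sigmoid layers, weight 1.0: {vanishing:.3e}")
print(f"20 linear layers,  weight 3.0: {exploding:.3e}")
```

Twenty factors that are each below 0.25 leave almost nothing for the first layer, while twenty factors of 3 give a number in the billions. Oversized initial weights push a network toward the second case from the very first step.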

Backprop in one sentence

Backpropagation is the chain rule of calculus, applied automatically, so the optimiser can blame each parameter in exact proportion to how much it contributed to the error.

That sentence is the whole field of deep learning training. Everything else — Adam, momentum, schedules, layer norm — is engineering on top of it.
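One way to convince yourself that the chain rule really gives the slope of the loss is a gradient check: compare the analytic gradient with a finite-difference estimate. The sketch below assumes a toy model with two weights and a tanh unit. The names `loss`, `chain_rule_grads` and `numeric_grads` are made up for the illustration.

```python
import math

def loss(w1, w2, x=1.0, y=0.8):
    """Squared error of the toy model y_hat = w2 * tanh(w1 * x)."""
    return (w2 * math.tanh(w1 * x) - y) ** 2

def chain_rule_grads(w1, w2, x=1.0, y=0.8):
    """Analytic gradients, derived by hand with the chain rule."""
    h = math.tanh(w1 * x)
    dloss_dyhat = 2.0 * (w2 * h - y)
    dloss_dw1 = dloss_dyhat * w2 * (1.0 - h ** 2) * x
    dloss_dw2 = dloss_dyhat * h
    return dloss_dw1, dloss_dw2

def numeric_grads(w1, w2, eps=1e-6):
    """Central finite differences: nudge each weight and measure the loss."""
    g1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
    g2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)
    return g1, g2

analytic = chain_rule_grads(0.5, -0.3)
numeric = numeric_grads(0.5, -0.3)
print("chain rule:        ", analytic)
print("finite differences:", numeric)
```

The two should agree to several decimal places. Finite differences need two loss evaluations per parameter, which is hopeless for billions of weights, whereas backprop gets every gradient from a single backward pass.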

From the field

I've never written a backward pass by hand — autograd does it — but these concepts earn their keep the day a fine-tune misbehaves. When a training run's loss goes to NaN, it's almost always the learning rate too high or missing gradient clipping, exactly the failure modes above. When it flatlines and learns nothing, the rate's too low or the data's wrong. Holding the picture that the optimiser is just blindly walking downhill on a loss surface keeps expectations honest: it won't find the "right" answer, only a good-enough basin — and the quality of that basin is set by your data far more than any clever optimiser trick.
