Neurons, Layers, and Why Depth Matters

A neuron is a weighted sum followed by a kink. Stack a million in layers and you get a function that approximates almost anything.

Apr 29, 2026 · 3 min read

The voting-committee analogy

A single neuron is a tiny voter. It listens to a few inputs, weighs them ("trust this one a lot, that one a little"), adds the votes up, and either fires "yes" or stays quiet depending on whether the total clears a bar.

Stack a thousand voters in a layer and you get a committee. Stack a hundred committees in layers that pass their decisions to the next, and you get a deep network that can recognise faces, translate Mandarin, or predict the next token of code. None of the individual voters is smart — the structure is.

What one neuron does

output = activation(w · x + b)

x — vector of inputs.
w — vector of learned weights (one per input).
b — a learned bias.
activation — a non-linear bend (ReLU, GELU, sigmoid, tanh).
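Here is a minimal sketch of that formula for a single neuron in plain Python, assuming ReLU as the activation. The inputs, weights and bias are made-up values. They show the neuron staying quiet, and then firing once a larger bias pushes the total above zero.

```python
def relu(z: float) -> float:
    """ReLU activation: pass positive values through, clip negatives to 0."""
    return max(0.0, z)


def neuron(x: list[float], w: list[float], b: float, activation=relu) -> float:
    """Compute activation(w · x + b) for one neuron."""
    if len(x) != len(w):
        raise ValueError("expected exactly one weight per input")
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(weighted_sum)


# Hypothetical inputs and weights, for illustration only.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]  # trust input 0 a bit, distrust input 1, trust input 2 a little

print(neuron(x, w, b=0.1))  # w · x + b is about -0.65, so ReLU outputs 0.0 (stays quiet)
print(neuron(x, w, b=1.0))  # w · x + b = 0.25, so ReLU outputs 0.25 (fires)
```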

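Without the activation, any stack of layers is mathematically equivalent to a single linear (strictly, affine) layer, however deep it is. Extra layers then add no expressive power, and a classifier built this way can only draw flat decision boundaries. The non-linear bend is what makes deep networks expressive.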
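To see why, here is a small sketch in plain Python. It uses two linear layers with no activation between them, with hypothetical integer-valued weights so the arithmetic is exact. Algebraically, W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2). The two-layer stack therefore computes exactly what one layer with weights W2W1 and bias W2b1 + b2 computes. Inserting a ReLU between the layers breaks that equivalence.

```python
Vector = list[float]
Matrix = list[list[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    cols = list(zip(*B))  # columns of B
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


def relu_vec(v: Vector) -> Vector:
    return [max(0.0, z) for z in v]


# Hypothetical weights: layer 1 maps 2 inputs to 3 hidden units, layer 2 maps 3 to 2.
W1 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b1 = [1.0, 0.0, -1.0]
W2 = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]
b2 = [0.5, -0.5]


def two_linear_layers(x: Vector) -> Vector:
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)


def collapsed_layer(x: Vector) -> Vector:
    W = matmul(W2, W1)           # W2 W1
    b = add(matvec(W2, b1), b2)  # W2 b1 + b2
    return add(matvec(W, x), b)


def with_relu(x: Vector) -> Vector:
    return add(matvec(W2, relu_vec(add(matvec(W1, x), b1))), b2)


for x in ([2.0, -1.0], [-1.0, 0.0]):
    print(x, two_linear_layers(x), collapsed_layer(x), with_relu(x))
```

For the input [-1.0, 0.0], the hidden pre-activations are negative. The ReLU version therefore returns just the bias [0.5, -0.5], while both linear versions agree on [6.5, -3.5].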

Why depth beats width

The universal approximation theorem says a single hidden layer, made wide enough, can approximate any continuous function on a bounded input range as closely as you like. Reality: for some functions that single layer needs exponentially many neurons, where a deeper network gets by with far fewer. Deep networks compose simpler functions, which can be exponentially more efficient for problems with hierarchical structure (like images, language, or code).
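One classic way to make "exponentially more efficient" concrete comes from the depth-separation literature: compose a tent-shaped function, built from two ReLUs, with itself. The sketch below is in plain Python, and the function names are illustrative. It counts how many straight-line pieces the result has on [0, 1]. Each extra layer costs two more ReLU units but doubles the number of pieces. By contrast, a single hidden layer of n ReLU units on a one-dimensional input can produce at most n + 1 pieces.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def tent(x: float) -> float:
    """Two ReLU units: rises as 2x on [0, 0.5], falls as 2 - 2x on [0.5, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def deep_sawtooth(x: float, depth: int) -> float:
    """Apply the tent map `depth` times: a network of `depth` layers with 2 ReLUs each."""
    for _ in range(depth):
        x = tent(x)
    return x


def count_linear_pieces(depth: int, samples_per_piece: int = 4) -> int:
    """Count straight-line pieces of deep_sawtooth on [0, 1] by detecting slope changes.

    The grid is aligned with the breakpoints (multiples of 1 / 2**depth), so every
    sample interval lies inside a single piece. All values are dyadic fractions,
    so the float arithmetic stays exact for moderate depths.
    """
    n = samples_per_piece * 2 ** depth
    ys = [deep_sawtooth(i / n, depth) for i in range(n + 1)]
    steps = [ys[i + 1] - ys[i] for i in range(n)]
    slope_changes = sum(1 for a, b in zip(steps, steps[1:]) if abs(a - b) > 1e-9)
    return slope_changes + 1


for depth in (1, 3, 10):
    print(f"depth={depth:2d}  relu_units={2 * depth:3d}  "
          f"linear_pieces={count_linear_pieces(depth)}")
```

At depth 10, the deep network uses 20 ReLU units and has 1024 linear pieces. A one-hidden-layer network would need at least 1023 units to match it. This is a toy 1-D function, not a measurement of any real model.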

Common pattern: early layers learn primitives (edges, syllables, tokens), middle layers learn parts (corners, words, phrases), late layers learn whole concepts (faces, sentences, intents).

Activation functions you'll meet

ReLU (max(0, x)) — fast, simple, default for most networks.
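GELU — smooth ReLU variant; widely used inside Transformer feed-forward (FFN) blocks.
Sigmoid / tanh — historically common, now mostly retired from the hidden layers of deep nets because they saturate and cause vanishing gradients; still used at outputs (e.g. sigmoid for binary classification) and as gates inside LSTMs.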
Softmax — used at the very end to turn raw scores into a probability distribution over classes or vocab tokens.
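For reference, here are scalar sketches of the activations above, using only the Python standard library. GELU uses the exact form x·Φ(x) via `math.erf`, where Φ is the standard normal CDF; many libraries also offer a tanh-based approximation. Tanh is `math.tanh`. The sigmoid branches on the sign of its input so it never overflows. The softmax subtracts the largest score before exponentiating. This standard trick avoids overflow without changing the result, because softmax is unaffected by adding a constant to every score.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(scores: list[float]) -> list[float]:
    # Subtracting the max keeps exp() in range; the output probabilities are unchanged.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  gelu={gelu(x):.3f}  "
          f"sigmoid={sigmoid(x):.3f}  tanh={math.tanh(x):.3f}")

print(softmax([2.0, 1.0, 0.1]))  # three raw scores -> a probability distribution
```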

Why "deep" learning is hard

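Stacking layers naively breaks training: gradients vanish or explode as they are multiplied back through many layers, activations drift out of a stable range, and gradient sizes end up wildly uneven from layer to layer. The infrastructure that makes deep networks trainable today is roughly: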

Skip connections (ResNet) — let gradients bypass layers.
Layer / batch normalisation — keep activations in a stable range.
Better optimisers (Adam, AdamW) — handle uneven gradients.
Initialisation schemes (Xavier, He) — start weights at the right scale.
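The toy model below shows why skip connections help. It assumes a chain of scalar "layers" that starts from h = 0. Real layers have weight matrices; this sketch strips them out to isolate the chain rule. In a plain chain, each layer multiplies the gradient by sigmoid'(h), which is at most 0.25, so the gradient reaching the input shrinks geometrically. A residual layer computes h + f(h), so its derivative is 1 + f'(h), and the identity path carries the gradient through. The branch scale of 0.1 is an arbitrary assumption for the sketch.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z: float) -> float:
    s = sigmoid(z)
    return s * (1.0 - s)  # never larger than 0.25 (reached at z = 0)


def input_gradient(depth: int, residual: bool, branch_scale: float = 0.1) -> float:
    """d(output) / d(input) for a toy chain of scalar layers, starting from h = 0.

    plain layer:    h -> sigmoid(h)                     local derivative sigmoid'(h)
    residual layer: h -> h + branch_scale * sigmoid(h)  local derivative 1 + branch_scale * sigmoid'(h)
    The chain rule multiplies the local derivatives together.
    """
    h, grad = 0.0, 1.0
    for _ in range(depth):
        if residual:
            grad *= 1.0 + branch_scale * sigmoid_grad(h)
            h = h + branch_scale * sigmoid(h)
        else:
            grad *= sigmoid_grad(h)
            h = sigmoid(h)
    return grad


for depth in (5, 20, 50):
    plain = input_gradient(depth, residual=False)
    skip = input_gradient(depth, residual=True)
    print(f"depth={depth:2d}  plain={plain:.2e}  residual={skip:.2f}")
```

At depth 20, the plain chain's gradient is below 0.25^20 (about 9e-13). The residual chain's gradient stays between 1 and about 1.64.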

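Without these, plain networks get harder to train as depth grows. In the original ResNet paper, a 56-layer plain network had higher training error on CIFAR-10 than a 20-layer one. With them, networks of 100+ layers train reliably.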
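Two more items from that list can be sketched in a few lines of plain Python; the function names are illustrative.

The first part pushes a random input through 20 fully connected ReLU layers of width 64 and reports the root-mean-square size of the final activations. With a tiny fixed weight scale (std 0.01, an arbitrary choice), the signal shrinks towards zero. He initialisation draws weights with variance 2 / fan_in, which keeps the signal at roughly its starting size.

The second part is the core of layer normalisation, which rescales each vector to zero mean and unit variance. Real implementations also learn a per-feature scale and shift, which are omitted here.

```python
import math
import random


def relu_layer(W: list[list[float]], x: list[float]) -> list[float]:
    """One fully connected layer (no bias) followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in W]


def rms(x: list[float]) -> float:
    return math.sqrt(sum(v * v for v in x) / len(x))


def final_rms(depth: int, width: int, weight_std: float, seed: int = 0) -> float:
    """RMS of the activations after `depth` random ReLU layers with N(0, weight_std^2) weights."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        W = [[rng.gauss(0.0, weight_std) for _ in range(width)] for _ in range(width)]
        x = relu_layer(W, x)
    return rms(x)


def layer_norm(x: list[float], eps: float = 1e-5) -> list[float]:
    """Rescale one vector to zero mean and (almost exactly) unit variance.
    Real layer norm also applies a learned per-feature scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


width, depth = 64, 20
he_std = math.sqrt(2.0 / width)  # He initialisation: Var(w) = 2 / fan_in (fan_in = width here)
print("tiny init (std 0.01):", final_rms(depth, width, 0.01))
print("He init:             ", final_rms(depth, width, he_std))
print("layer_norm([1, 2, 3, 4]):", layer_norm([1.0, 2.0, 3.0, 4.0]))
```

Exact numbers depend on the random seed. What matters is the orders of magnitude: the tiny-init run ends up vanishingly small, while the He-init run stays within a modest factor of 1.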

Capacity vs data

A deeper, wider network has more capacity, meaning it can fit harder functions but can also memorise noise. The brake is data: enough varied examples to force it to generalise instead of memorising. Modern LLMs train on trillions of tokens largely because models with billions of parameters need that much varied data to put their capacity to use.
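Here is a toy illustration of capacity outrunning data, in plain Python. It uses ten training points from the line y = 2x + 1, with a deterministic ±0.3 zig-zag standing in for noise. The zig-zag is chosen so the run is reproducible, not because it is realistic. A straight line (2 parameters) cannot fit the zig-zag. A degree-9 polynomial (10 parameters, built by Lagrange interpolation) passes through every training point exactly. On fresh points from the true line, the high-capacity model does worse, because it memorised the noise.

```python
def true_fn(x: float) -> float:
    return 2.0 * x + 1.0


# Ten training points on [0, 1]; a deterministic +/-0.3 zig-zag stands in for noise.
train_x = [i / 9 for i in range(10)]
train_y = [true_fn(x) + 0.3 * (-1) ** i for i, x in enumerate(train_x)]
# Fresh test points, labelled with the noise-free true function.
test_x = [j / 100 for j in range(101)]
test_y = [true_fn(x) for x in test_x]


def lagrange_poly(x: float) -> float:
    """Degree-9 polynomial through all ten training points (Lagrange form): it memorises them."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        term = yi
        for j, xj in enumerate(train_x):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def mse(pred: list[float], target: list[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def evaluate(model) -> tuple[float, float]:
    """(training error vs noisy labels, test error vs the true function)."""
    return (mse([model(x) for x in train_x], train_y),
            mse([model(x) for x in test_x], test_y))


a, b = fit_line(train_x, train_y)


def line(x: float) -> float:
    return a * x + b


for name, model in [("line, 2 params", line), ("degree-9 poly, 10 params", lagrange_poly)]:
    train_err, test_err = evaluate(model)
    print(f"{name:26s} train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```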

From the field

The depth-versus-data lesson here is the one I actually use when sizing a build. More capacity — a bigger model — only helps if you have the varied data to fill it; otherwise it memorises noise, dazzles in the demo, and falls apart on real inputs. For most client problems the honest move is the smallest model that clears the eval, not the biggest one the budget allows, because the small one is cheaper to serve, faster, and easier to reason about when it breaks. Capacity is seductive. Data and a clear eval are what actually decide whether the thing works in production.
