The Quantum Leap from Q-Tables to Neural Networks
Q-tables fail for complex environments because the state space explodes. A game screen of 210x160 RGB pixels has more possible states than atoms in the universe. Deep Q-Learning (DQN) replaces the table with a neural network that generalizes across states.
Why Q-Tables Fail
| Environment | States | Q-Table Feasibility |
|---|---|---|
| Frozen Lake (4×4) | 16 | ✓ Easy |
| CartPole | Continuous (∞) | ✗ Impossible |
| Lunar Lander | 8 continuous variables | ✗ Impossible |
| Pac-Man (pixels) | ~10^100000 | ✗ Absurd |
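A quick back-of-envelope calculation makes the magnitudes above concrete. The exact exponent depends on assumptions (resolution, color depth), so the numbers here are illustrative only:

```python
import math

# Commonly cited order of magnitude for atoms in the observable universe: ~10^80.
# Even a conservatively simplified grayscale 210x160 Atari screen dwarfs it:
pixels = 210 * 160                     # 33,600 pixels
digits = pixels * math.log10(256)      # log10 of 256**pixels possible screens
print(f"grayscale screen: ~10^{digits:.0f} states")

# Frozen Lake, by contrast, fits in a tiny table:
print(f"Frozen Lake 4x4: {4 * 4} states")
```

Enumerating a Q-table row per screen is hopeless; a color screen only makes the exponent larger.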
The DQN Insight
Instead of a lookup table, use a neural network to approximate Q(s,a):
Q-Table: Q[state][action] → value (exact lookup)
DQN: Q_θ(state) → [Q_a1, Q_a2, ...] (function approximation)
                 ┌──────────────┐ ───→ Q(s, left)  = 2.3
    State ───→   │   Neural     │ ───→ Q(s, right) = 5.1  ← Best!
    (vector)     │  Network (θ) │ ───→ Q(s, up)    = 1.7
                 └──────────────┘
DQN Architecture
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, state_size, action_size):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(state_size, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, action_size),
        )

    def forward(self, x):
        return self.network(x)
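To see how this replaces the table lookup, here is a minimal usage sketch. The sizes are hypothetical (chosen to match CartPole's 4 state variables and 2 actions), and the network is untrained, so the actual Q-values are meaningless random outputs:

```python
import torch
from torch import nn

# Re-declare the DQN from above so this sketch is self-contained.
class DQN(nn.Module):
    def __init__(self, state_size, action_size):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(state_size, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, action_size),
        )

    def forward(self, x):
        return self.network(x)

q_net = DQN(state_size=4, action_size=2)   # CartPole-sized: 4 state vars, 2 actions
state = torch.rand(1, 4)                   # dummy observation, batch of size 1
with torch.no_grad():
    q_values = q_net(state)                # shape (1, 2): one Q-value per action
action = int(q_values.argmax(dim=1).item())  # greedy action selection
```

One forward pass yields Q-values for every action at once, so greedy action selection is a single `argmax` rather than per-action lookups.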
Two Critical Innovations
DeepMind's DQN breakthrough (introduced in 2013 and refined in the 2015 Nature paper) relied on two stabilization techniques:
1. Experience Replay — store transitions and train on random mini-batches (breaks correlation between consecutive samples)
2. Target Network — use a separate, slowly-updated network to compute targets (prevents moving-target instability)
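Both innovations appear directly in the DQN loss, written here in the standard formulation (the notation θ⁻ for the target-network parameters is assumed, following common convention):

```latex
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}
\Big[\big(r + \gamma \max_{a'} Q_{\theta^-}(s', a') - Q_{\theta}(s, a)\big)^2\Big]
```

Sampling (s, a, r, s′) from the buffer 𝒟 is what experience replay provides, and holding θ⁻ fixed between periodic copies of θ is what the target network provides: without it, the target moves every time θ is updated.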
Without these techniques, DQN training typically diverges. With them, DQN achieved superhuman performance on 29 of the 49 Atari games tested.
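A minimal replay buffer can be sketched in a few lines. The class name, capacity, and dummy transitions below are illustrative, not taken from the original DQN code:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutive transitions that destabilizes naive training.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Fill with dummy transitions, then draw a training mini-batch:
buf = ReplayBuffer(capacity=100)
for t in range(250):
    buf.push(t, 0, 1.0, t + 1, False)

batch = buf.sample(32)
```

Because the deque has `maxlen=100`, the buffer holds only the 100 most recent transitions after 250 pushes, and each sampled mini-batch of 32 is drawn uniformly from it.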
| Feature | Q-Learning | DQN |
|---|---|---|
| State space | Discrete, small | Continuous, large |
| Representation | Table | Neural network |
| Generalization | None | Yes — similar states get similar Q-values |
| Training stability | Convergent under standard conditions | Requires replay + target net |