Chapter 3 Deep Q-Learning — Neural Networks Meet Reinforcement Learning

Deep Q-Learning Fundamentals — Why Tables Are Not Enough

22 min read · Lesson 11 / 50

The Quantum Leap from Q-Tables to Neural Networks

Q-tables fail for complex environments because the state space explodes. A game screen of 210×160 RGB pixels has more possible states than atoms in the observable universe. Deep Q-Learning replaces the table with a neural network, the Deep Q-Network (DQN), which generalizes across states.

Why Q-Tables Fail

Environment          | States           | Q-Table Feasibility
─────────────────────┼──────────────────┼────────────────────
Frozen Lake (4×4)    | 16               | ✓ Easy
CartPole             | Continuous (∞)   | ✗ Impossible
Lunar Lander         | 8 continuous vars| ✗ Impossible
Pac-Man (pixels)     | ~10^100000       | ✗ Absurd
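To make the explosion concrete, here is a rough back-of-the-envelope count for a raw game screen. This assumes 256 intensity levels per color channel, which is an upper bound; real Atari games use a much smaller palette, so the exact exponent varies, but the conclusion does not:

```python
import math

# 210x160 screen, 3 color channels, 256 intensity levels per channel.
pixels = 210 * 160 * 3

# Number of decimal digits in 256**pixels, computed via logarithms
# (the integer itself would have hundreds of thousands of digits).
digits = pixels * math.log10(256)
print(f"roughly 10^{digits:.0f} possible screens")
```

No table with one row per state could ever be stored, let alone filled in by visiting each state.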

The DQN Insight

Instead of a lookup table, use a neural network to approximate Q(s,a):

Q-Table:  Q[state][action] → value     (exact lookup)
DQN:      Q_θ(state) → [Q_a1, Q_a2, ...]  (function approximation)

           ┌──────────────┐
State ───→ │ Neural       │ ───→ Q(s, left)  = 2.3
(vector)   │ Network (θ)  │ ───→ Q(s, right) = 5.1 ← Best!
           └──────────────┘ ───→ Q(s, up)    = 1.7

DQN Architecture

import torch.nn as nn

class DQN(nn.Module):
    """Maps a state vector to one Q-value per action."""
    def __init__(self, state_size, action_size):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(state_size, 128), nn.ReLU(),   # state -> hidden
            nn.Linear(128, 128), nn.ReLU(),          # hidden -> hidden
            nn.Linear(128, action_size),             # hidden -> Q-values (no activation)
        )

    def forward(self, x):
        return self.network(x)
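Using the network is a single forward pass followed by an argmax, exactly as in the diagram above. The snippet below repeats the class so it runs standalone; the sizes are hypothetical CartPole-like values (4 state variables, 2 actions):

```python
import torch
import torch.nn as nn

# The DQN class from above, repeated so this snippet is self-contained.
class DQN(nn.Module):
    def __init__(self, state_size, action_size):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(state_size, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, action_size),
        )
    def forward(self, x):
        return self.network(x)

# Hypothetical CartPole-sized problem: 4 state variables, 2 actions.
net = DQN(state_size=4, action_size=2)
state = torch.tensor([[0.02, -0.01, 0.03, 0.04]])  # batch of one state

with torch.no_grad():
    q_values = net(state)               # shape (1, 2): one Q-value per action
action = q_values.argmax(dim=1).item()  # greedy action index: 0 or 1
```

One forward pass yields Q-values for every action at once, which is what makes the per-action lookup of the Q-table unnecessary.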

Two Critical Innovations

DeepMind's 2013 breakthrough required two stabilization techniques:

1. Experience Replay — Store transitions in a buffer and sample random mini-batches (breaks the correlation between consecutive experiences)
2. Target Network — A separate, slowly-updated copy of the network computes the targets (prevents the moving-target instability)

Without these, DQN training diverges. With them, DQN achieved superhuman performance on 29 out of 49 Atari games.
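The two mechanisms fit together in a short training step. The sketch below is a minimal illustration under stated assumptions, not DeepMind's exact setup: `ReplayBuffer`, `train_step`, and all hyperparameters (capacity, batch size, gamma, the sync schedule) are hypothetical names and values, and any `nn.Module` that maps states to Q-values can stand in for the DQN class above.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class ReplayBuffer:
    """Store transitions; sample uncorrelated random mini-batches."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (torch.tensor(states, dtype=torch.float32),
                torch.tensor(actions),
                torch.tensor(rewards, dtype=torch.float32),
                torch.tensor(next_states, dtype=torch.float32),
                torch.tensor(dones, dtype=torch.float32))

def train_step(q_net, target_net, buffer, optimizer, batch_size=32, gamma=0.99):
    states, actions, rewards, next_states, dones = buffer.sample(batch_size)

    # Q(s, a) for the actions actually taken.
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Targets come from the frozen target network (no gradients flow here).
    with torch.no_grad():
        max_next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1 - dones) * max_next_q

    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Demo with tiny stand-in networks (any nn.Module with matching
# input/output sizes works in place of the DQN class above):
q_net = nn.Linear(4, 2)
target_net = nn.Linear(4, 2)
target_net.load_state_dict(q_net.state_dict())  # start in sync
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

buffer = ReplayBuffer()
for _ in range(64):
    s = [random.random() for _ in range(4)]
    buffer.push(s, random.randrange(2), 1.0, s, False)

loss = train_step(q_net, target_net, buffer, opt)

# Periodically (e.g. every few thousand steps; the schedule is a tunable
# hyperparameter), copy the online weights into the target network:
# target_net.load_state_dict(q_net.state_dict())
```

Sampling randomly from the buffer breaks the temporal correlation of consecutive transitions, and keeping the target network frozen between syncs means the regression target does not shift on every gradient step.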

Feature            | Q-Learning       | DQN
───────────────────┼──────────────────┼───────────────────────────────
State space        | Discrete, small  | Continuous, large
Representation     | Table            | Neural network
Generalization     | None             | Yes — similar states get similar Q-values
Training stability | Guaranteed       | Requires replay + target net