Generative AI: From Next-Token Prediction to Real Creation

Generative AI is autoregressive prediction with style. Adjust temperature and top-p to see why the same prompt can sound boring or wildly creative.

Apr 29, 2026 · 2 min read

¶ The analogy

The improv-musician analogy

A jazz musician hears the last bar and plays the next note. Then the next, then the next. Each note is a prediction of what fits — but the chain of predictions becomes a solo nobody has ever heard before.

Generative AI works the same way. It does not "have an idea" up front. It predicts the next token, appends it, predicts again, and the chain becomes a paragraph, a function, a poem. Out of pure prediction emerges something that looks and feels like creation.

Generative vs discriminative

Discriminative models answer "which class is this?" — spam vs not-spam, cat vs dog.
Generative models answer "what comes next?" — and by repeating that question, produce open-ended output.

LLMs, diffusion models for images, and audio models for speech are all generative. The output type changes; the idea of building the output through repeated prediction steps does not.

How autoregressive text works

1. Tokenize the prompt.
2. Run the model to get a probability distribution over every possible next token (often 50k–200k options).
3. Sample one token from that distribution.
4. Append it. Go back to step 2 until you hit a stop token or max_tokens.

The whole "intelligence" of the output rides on two things: how good the distribution is (the model) and how you sample from it (decoding strategy).
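The tokenize, predict, sample, append loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a real LLM: `VOCAB`, `tokenize`, and `toy_model` are hypothetical stand-ins invented for the example, and the "model" simply favours whichever vocabulary entry comes after the last token.

```python
import math
import random

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["<stop>", "the", "cat", "sat", "on", "mat"]

def tokenize(text):
    """Toy whitespace tokenizer: map known words to vocabulary ids."""
    return [VOCAB.index(w) for w in text.lower().split() if w in VOCAB]

def toy_model(token_ids):
    """Stand-in for a trained network: return one logit per vocabulary entry.

    It favours the id that follows the last token, so behaviour is easy to follow.
    """
    last = token_ids[-1] if token_ids else 0
    favoured = (last + 1) % len(VOCAB)
    return [3.0 if i == favoured else 0.0 for i in range(len(VOCAB))]

def softmax(logits):
    """Turn logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt, max_tokens=10, seed=0):
    rng = random.Random(seed)
    ids = tokenize(prompt)                                  # tokenize the prompt
    for _ in range(max_tokens):
        probs = softmax(toy_model(ids))                     # distribution over next token
        next_id = rng.choices(range(len(VOCAB)), weights=probs, k=1)[0]  # sample one
        if VOCAB[next_id] == "<stop>":                      # stop token ends the loop
            break
        ids.append(next_id)                                 # append and repeat
    return " ".join(VOCAB[i] for i in ids)

print(generate("the cat"))
```

Swapping `toy_model` for a real network changes the quality of the distribution; the loop around it stays the same.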

The two knobs that change the vibe

Setting	Low value	High value
Temperature	Greedy, repetitive, "safe"	Creative, surprising, sometimes nonsense
Top-p (nucleus)	Only the most likely tokens	Long tail allowed, more variety
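For readers who want the math behind the table, here is the usual formulation of both knobs. The symbols are notation chosen for this sketch, not taken from the article: $z_i$ is the model's raw score (logit) for token $i$, $T$ is the temperature, and $p_{\text{top}}$ is the top-p threshold.

```latex
% Temperature: divide logits by T before the softmax
p_i = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}, \qquad T > 0

% Top-p (nucleus): keep the smallest set V of highest-probability tokens with
\sum_{i \in V} p_i \ge p_{\text{top}}
% then renormalize p_i over V and sample from that set only
```

As $T$ approaches 0 the distribution collapses onto the single most likely token (greedy decoding), $T = 1$ leaves the model's distribution unchanged, and $T > 1$ flattens it so unlikely tokens get picked more often.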

Production tip: use temperature 0–0.3 for code, classification, and structured output, and 0.7–1.0 for creative prose. Adjust top-p before you crank temperature past 1.
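To make the two knobs concrete, here is a small sketch of temperature scaling followed by top-p filtering, using only the Python standard library. The function names (`apply_temperature`, `top_p_filter`, `sample`) are made up for this example; real inference libraries expose the same ideas under their own APIs, and this code treats temperature 0 as an error rather than silently switching to greedy decoding.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Softmax of logits / temperature. Lower values sharpen the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; handle 0 as greedy argmax instead")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

def sample(logits, temperature=1.0, top_p=1.0, rng=random):
    """Draw one token id after applying temperature and top-p."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for a 4-token vocabulary
print(apply_temperature(logits, 0.2))   # sharp: almost all mass on token 0
print(apply_temperature(logits, 2.0))   # flat: the tail gets real probability
print(sample(logits, temperature=0.8, top_p=0.9))
```

With probabilities `[0.5, 0.3, 0.15, 0.05]` and `top_p=0.7`, `top_p_filter` keeps only the first two tokens and zeroes out the rest, which is why lowering top-p trims the long tail without reshaping the head of the distribution.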

Beyond text

Diffusion — start with noise, iteratively denoise toward an image.
Speech — generate audio tokens or waveform chunks autoregressively.
Code — same as text, but the eval metric is "does it compile and pass tests".

The shape of the model differs. The "predict, sample, repeat" loop does not.
