Embeddings and Vector Search, Without the Math

Embeddings turn meaning into coordinates. A vector store uses those coordinates to find the nearest neighbours of a query.

Apr 29, 2026 · 3 min read

The map-of-meaning analogy

Imagine a vast city where every shop is placed by what it sells. Bakeries cluster on one street. Bookshops down the next. Vegan bakeries sit at the intersection of bakery and health-food. Walk anywhere and the shops nearby will feel related.

An embedding model is the mapmaker. It places every chunk of text (or image, or audio) at coordinates in a high-dimensional space, such that similar things end up close together. Vector search is just asking: "what is nearest to this point?"

What an embedding actually is

An embedding is a list of numbers — typically 384 to 3072 dimensions — produced by a model that has learned what "similar" means from billions of examples.

"how do I reset my password" → [0.014, -0.221, 0.087, … ]   (768 numbers)
"forgot my login"             → [0.019, -0.213, 0.091, … ]   (very close)
"recipe for sourdough"        → [-0.41,  0.318, 0.002, … ]   (far away)

You can't read the numbers. You don't have to. All you need is the distance between two of them.
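
As a rough illustration, the sketch below computes cosine similarity using only the three numbers visible in each example above. The real vectors have 768 numbers, so the scores here are illustrative only, and the helper name cosine_similarity is an assumption for this sketch, not any library's API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Only the three components shown in the example; the rest stay elided.
reset_password = [0.014, -0.221, 0.087]
forgot_login = [0.019, -0.213, 0.091]
sourdough = [-0.41, 0.318, 0.002]

print(cosine_similarity(reset_password, forgot_login))  # close to 1
print(cosine_similarity(reset_password, sourdough))     # negative, so far apart
```

The two password-related texts score near 1, while the sourdough text points in a very different direction.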

The two distances you'll meet

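Cosine similarity: the angle between two vectors, ignoring their length. The usual default for text. Ranges from −1 to 1, where 1 means the two vectors point in exactly the same direction, which is as close to "identical meaning" as the model can express.
Dot product / inner product: fast, but sensitive to magnitude. On vectors normalised to unit length it returns the same value as cosine similarity, which is why it is common there.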

For most RAG systems, cosine is the right answer until you have a reason otherwise.
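
To make the difference concrete, here is a minimal sketch in plain Python. The vectors are made-up 2-D toy values and the helper names are assumptions for illustration. It shows that stretching a vector changes the dot product but not the cosine, and that after normalising to unit length the two agree.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def cosine(a, b):
    """Dot product divided by both lengths, so magnitude cancels out."""
    return dot(a, b) / (norm(a) * norm(b))

def normalise(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

a = [3.0, 4.0]
b = [4.0, 3.0]
a_long = [2 * x for x in a]  # same direction as a, twice the length

print(dot(a, b), dot(a_long, b))        # 24.0 48.0
print(cosine(a, b), cosine(a_long, b))  # both approximately 0.96
print(dot(normalise(a), normalise(b)))  # approximately 0.96
```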

Why we don't just compare every pair

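A million chunks × one query = a million distance calculations, repeated for every query. Doable, but slow, and the cost grows linearly with the corpus. Vector databases use Approximate Nearest Neighbour (ANN) indexes (HNSW, IVF, ScaNN) that trade a little recall for large speedups: they occasionally miss a true nearest neighbour in exchange for inspecting only a small fraction of the vectors. A well-tuned ANN index can return top-k in milliseconds over hundreds of millions of vectors.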
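
The idea behind an IVF-style index can be sketched in a few lines. This is a toy, not how any particular vector database implements it: the centroids are hand-picked here (a real IVF index learns them, typically with k-means), the vectors are made-up 2-D values, and build_buckets, search, and nprobe are names chosen for this sketch.

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_buckets(vectors, centroids):
    """Assign every vector index to the bucket of its most similar centroid."""
    buckets = {c: [] for c in range(len(centroids))}
    for idx, vec in enumerate(vectors):
        best = max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
        buckets[best].append(idx)
    return buckets

def search(query, vectors, centroids, buckets, k=2, nprobe=1):
    """Scan only the nprobe buckets closest to the query, then take top-k."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: cosine(query, centroids[c]), reverse=True)
    candidates = [i for c in ranked[:nprobe] for i in buckets[c]]
    return heapq.nlargest(k, candidates, key=lambda i: cosine(query, vectors[i]))

vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [-1.0, 0.1]]
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hand-picked for the toy
buckets = build_buckets(vectors, centroids)

# Only bucket 0 (two vectors) is scanned instead of all five.
print(search([1.0, 0.15], vectors, centroids, buckets, k=2, nprobe=1))  # [0, 1]
```

Raising nprobe scans more buckets, which recovers recall at the cost of speed. That is the trade-off the paragraph above describes.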

What you actually need to choose

Decision	Sane default
Embedding model	A modern hosted model matched to your text language
Vector dimensions	What the model gives you — do not truncate without testing
Index type	HNSW for most workloads
Distance metric	Cosine for text
Top-k	5–10, then rerank

Where embeddings break

Domain shift: a general-purpose embedder may not know your jargon. Try domain-specific or fine-tuned variants.
Multi-language: pick a multilingual embedder, or embed and query in the same language.
Long documents: embedding a 10-page PDF as one vector blurs all its topics into a single point, and a model may silently truncate input beyond its token limit. Chunk first, then embed.
Symmetry trap: some embedders treat queries and passages differently, with separate encoders or required prefixes for each. Mixing them up quietly degrades retrieval, so read the model's docs.

From the field

Teams agonise over which embedding model to pick, then lose all that quality at the chunking step. How you split documents matters more than the model in most builds I've shipped: chunk too big and the vector is a blurry average of five topics; too small and you retrieve a sentence with no context. I start at roughly a paragraph-to-section per chunk, with a little overlap, and I store the surrounding heading alongside each chunk so retrieval carries context. Swap the embedding model last — after chunking and reranking are dialled in. That's the order that actually moves recall.
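
The chunking approach described above could look roughly like this. It is a sketch under stated assumptions, not the exact pipeline from those builds: it assumes headings are lines starting with "# " and paragraphs are separated by blank lines, and the names Chunk, chunk_document, size, and overlap are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    heading: str  # nearest heading above the chunk, stored for context
    text: str

def chunk_document(doc: str, size: int = 2, overlap: int = 1) -> list[Chunk]:
    """Group paragraphs into overlapping windows, one section at a time."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")

    # Split into (heading, paragraphs) sections.
    sections = []
    for block in doc.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("# "):
            sections.append((block[2:].strip(), []))
        else:
            if not sections:
                sections.append(("", []))  # text before any heading
            sections[-1][1].append(block)

    # Slide a window over each section's paragraphs.
    chunks = []
    step = size - overlap
    for heading, paragraphs in sections:
        for start in range(0, len(paragraphs), step):
            window = paragraphs[start:start + size]
            chunks.append(Chunk(heading, "\n\n".join(window)))
            if start + size >= len(paragraphs):
                break  # the last window already reached the section end
    return chunks

doc = "# Reset password\n\nP1\n\nP2\n\nP3\n\n# Billing\n\nB1"
for chunk in chunk_document(doc):
    print(repr(chunk.heading), repr(chunk.text))
```

Windows never cross a heading, so each chunk stays on one topic, and the stored heading can be prepended to the chunk text before embedding.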
