The map-of-meaning analogy
Imagine a vast city where every shop is placed by what it sells. Bakeries cluster on one street. Bookshops down the next. Vegan bakeries sit at the intersection of bakery and health-food. Walk anywhere and the shops nearby will feel related.
An embedding model is the mapmaker. It places every chunk of text (or image, or audio) at coordinates in a high-dimensional space, such that similar things end up close together. Vector search is just asking: "what is nearest to this point?"
What an embedding actually is
An embedding is a list of numbers — typically 384 to 3072 dimensions — produced by a model that has learned what "similar" means from billions of examples.
"how do I reset my password" → [0.014, -0.221, 0.087, … ] (768 numbers)
"forgot my login" → [0.019, -0.213, 0.091, … ] (very close)
"recipe for sourdough" → [-0.41, 0.318, 0.002, … ] (far away)
You can't read the numbers. You don't have to. All you need is the distance between two of them.
The two distances you'll meet
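- Cosine similarity: the angle between two vectors, ignoring their length. Default for text. Range −1 to 1, where 1 means the vectors point the same way (closest in meaning), 0 means unrelated, and −1 means opposite directions.
- Dot product / inner product: fast, but sensitive to magnitude, so a longer vector scores higher even at the same angle. On vectors normalised to unit length it equals cosine similarity, which is why it is common there.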
For most RAG systems, cosine is the right answer until you have a reason otherwise.
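To make the difference between the two measures concrete, here is a minimal sketch in plain Python. The three-number vectors are only the leading values from the examples above, used as toy stand-ins for full embeddings, and the helper names (`dot`, `norm`, `cosine_similarity`, `normalise`) are illustrative rather than taken from any library.

```python
import math

def dot(a, b):
    """Dot (inner) product: sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Dot product divided by both lengths, so only the angle matters."""
    return dot(a, b) / (norm(a) * norm(b))

def normalise(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

# Toy 3-dimensional vectors (assumption: real embeddings have hundreds of dimensions).
password = [0.014, -0.221, 0.087]
login = [0.019, -0.213, 0.091]
sourdough = [-0.41, 0.318, 0.002]

print("password vs login:    ", cosine_similarity(password, login))
print("password vs sourdough:", cosine_similarity(password, sourdough))

# Doubling a vector's length changes the dot product but not the cosine.
doubled = [2 * x for x in login]
print("dot, original vs doubled:", dot(password, login), dot(password, doubled))

# On unit-length vectors the two measures agree.
print("dot of normalised vectors:", dot(normalise(password), normalise(login)))
```

If your embedding model already returns unit-length vectors, the cheaper dot product will rank results in the same order as cosine.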
Why we don't just compare every pair
A million chunks × one query = a million distance calculations, and that cost is paid again on every query. Doable, but slow, and it grows linearly with the corpus. Vector databases use Approximate Nearest Neighbour (ANN) indexes (HNSW, IVF, ScaNN) that trade a small amount of recall for large speedups by checking only a fraction of the vectors. A well-tuned ANN index can return the top-k in milliseconds over hundreds of millions of vectors.
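For reference, this is roughly what the exact, compare-everything search looks like. It is the baseline that an ANN index approximates, not an ANN index itself. The function name `exact_top_k` and the two-dimensional toy corpus are assumptions made for illustration.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k best.

    Returns (score, index) pairs, best first. The cost is one similarity
    computation per stored vector, which is the work ANN indexes avoid.
    """
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)

# Toy corpus of 2-dimensional vectors (hypothetical data).
corpus = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
]
query = [1.0, 0.05]

for score, index in exact_top_k(query, corpus, k=2):
    print(index, round(score, 4))
```

Running a brute-force search like this over a sample of your data is also a handy way to measure how much recall a given ANN configuration gives up.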
What you actually need to choose
| Decision | Sane default |
|---|---|
| Embedding model | A modern hosted model matched to your text language |
| Vector dimensions | What the model gives you — do not truncate without testing |
| Index type | HNSW for most workloads |
| Distance metric | Cosine for text |
| Top-k | 5–10, then rerank |
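The "then rerank" step in the last row can be sketched as a second pass over the retrieved candidates with a slower, more careful scorer. Everything below is hypothetical: `rerank` is an illustrative helper, and `word_overlap` is a deliberately crude stand-in for whatever reranking model you actually use.

```python
def rerank(query, candidates, score_fn, keep=5):
    """Re-order first-stage candidates with a second scorer and keep the best."""
    ranked = sorted(candidates, key=lambda text: score_fn(query, text), reverse=True)
    return ranked[:keep]

def word_overlap(query, text):
    """Toy scorer: fraction of query words that also appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)

# Pretend these came back from the vector index, in arbitrary order.
candidates = [
    "recipe for sourdough",
    "forgot my login",
    "how do I reset my password",
]

top = rerank("reset my password", candidates, word_overlap, keep=2)
print(top)
```

The design point is that the vector index only has to get the right chunks into the candidate set; the reranker decides their final order.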
Where embeddings break
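- Domain shift: a general-purpose embedder may not know your jargon, so terms that matter to you can land near unrelated text. Try domain-specific or fine-tuned variants.
- Multi-language: pick a multilingual embedder, or make sure documents and queries are embedded in the same language.
- Long documents: embedding a 10-page PDF as one vector blurs all its topics into a single point, so specific passages become hard to retrieve. Chunk first, then embed.
- Symmetry trap: some embedders expect queries and passages to be encoded differently, whether through separate models or a different prefix or instruction for each. Read the docs.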
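The "chunk first, then embed" advice can be sketched as a simple word-window splitter. This is one minimal approach under stated assumptions: the function name `chunk_words`, the default `chunk_size` of 200 words, and the `overlap` of 40 words are all illustrative choices, not recommendations from any particular model's documentation.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into word windows that overlap, so a sentence cut at one
    boundary still appears whole in the neighbouring chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks

# Tiny demonstration with a 10-word "document".
document = " ".join(str(i) for i in range(10))
for chunk in chunk_words(document, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk would then be embedded and stored separately. Splitting on paragraphs or headings instead of raw word counts often gives cleaner chunks, at the cost of uneven sizes.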