The Mathematical Foundation of RAG
Vector embeddings are the secret weapon behind modern search, recommendation, and RAG systems. They convert text into numbers that capture meaning, so computers can find similar content even when the words are different.
What Are Embeddings?
An embedding converts text into a fixed-size array of numbers (a vector) where similar meanings are close together in vector space.
"I love programming" → [0.12, -0.45, 0.78, 0.33, ...] (1536 dimensions)
"I enjoy coding" → [0.11, -0.43, 0.76, 0.35, ...] (very similar!)
"The weather is nice" → [0.89, 0.22, -0.15, 0.67, ...] (very different)
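"Close together in vector space" is usually measured with cosine similarity. A quick sketch using the illustrative 4-number prefixes above (toy values, not real model output):

```python
import numpy as np

def cos(a, b):
    """Cosine similarity: dot product of the vectors divided by their lengths."""
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

love_programming = [0.12, -0.45, 0.78, 0.33]
enjoy_coding     = [0.11, -0.43, 0.76, 0.35]
nice_weather     = [0.89, 0.22, -0.15, 0.67]

sim_close = cos(love_programming, enjoy_coding)  # near 1.0
sim_far   = cos(love_programming, nice_weather)  # near 0.1
```

Real embeddings have hundreds or thousands of dimensions, but the geometry is the same: similar meanings point in similar directions.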
Generating Embeddings with OpenAI
```python
from openai import OpenAI
import numpy as np

client = OpenAI()

def get_embedding(text, model="text-embedding-3-small"):
    """Get embedding vector for a text string."""
    response = client.embeddings.create(
        model=model,
        input=text
    )
    return response.data[0].embedding

# Generate embeddings
e1 = get_embedding("How to train a neural network")
e2 = get_embedding("Steps to build a deep learning model")
e3 = get_embedding("Best restaurants in Paris")

print(f"Dimension: {len(e1)}")  # 1536 for text-embedding-3-small

# Calculate cosine similarity
def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"Similar texts: {cosine_similarity(e1, e2):.4f}")    # ~0.85
print(f"Different texts: {cosine_similarity(e1, e3):.4f}")  # ~0.15
```
Open-Source Embeddings with Sentence Transformers
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("all-MiniLM-L6-v2")  # Free, runs locally

texts = [
    "Machine learning is a subset of AI",
    "Deep learning uses neural networks",
    "I went grocery shopping yesterday",
]
embeddings = model.encode(texts)
print(f"Shape: {embeddings.shape}")  # (3, 384)

# Compare all pairs
similarities = cos_sim(embeddings, embeddings)
print(similarities)
```
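Once you have an embedding matrix, retrieval for RAG is just a similarity ranking: encode the query, score it against every document, and take the top matches. A self-contained sketch of that step, using tiny hand-made 3-dimensional vectors in place of `model.encode(...)` output:

```python
import numpy as np

def top_k(query_emb, doc_embs, k=2):
    """Rank documents by cosine similarity to a query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity for each document
    idx = np.argsort(-sims)[:k]       # indices of the k best matches
    return idx, sims[idx]

# Toy embeddings standing in for the three texts above
docs = np.array([
    [0.9, 0.1, 0.0],   # "Machine learning is a subset of AI"
    [0.8, 0.2, 0.1],   # "Deep learning uses neural networks"
    [0.0, 0.1, 0.9],   # "I went grocery shopping yesterday"
])
query = np.array([0.85, 0.15, 0.05])  # stand-in for an encoded query

idx, scores = top_k(query, docs)
```

The two ML-related documents rank above the grocery one; with real embeddings the same `argsort` over similarities is the core of a retriever.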
Embedding Models Compared
| Model | Dimensions | Cost/1M tokens | Quality | Speed |
|---|---|---|---|---|
| text-embedding-3-large | 3072 | $0.13 | Best | Fast (API) |
| text-embedding-3-small | 1536 | $0.02 | Very good | Fast (API) |
| all-MiniLM-L6-v2 | 384 | Free | Good | Very fast |
| all-mpnet-base-v2 | 768 | Free | Very good | Fast |
| BGE-large-en-v1.5 | 1024 | Free | Excellent | Medium |
| Cohere embed-v3 | 1024 | $0.10 | Excellent | Fast (API) |
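One practical note when comparing models: several providers (OpenAI's embedding endpoints among them) return unit-length vectors, and sentence-transformers can do the same via `model.encode(..., normalize_embeddings=True)`. For unit vectors the plain dot product already equals cosine similarity, which simplifies and speeds up scoring. A quick numpy check of the equivalence:

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=(2, 384))
v /= np.linalg.norm(v, axis=1, keepdims=True)  # unit-normalize, as many APIs do

dot = float(v[0] @ v[1])
cos = float(v[0] @ v[1] / (np.linalg.norm(v[0]) * np.linalg.norm(v[1])))
# dot and cos are identical for normalized vectors
```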
Batch Processing for Efficiency
```python
def batch_embed(texts, batch_size=100):
    """Efficiently embed large collections of texts."""
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=batch
        )
        batch_embeddings = [d.embedding for d in response.data]
        all_embeddings.extend(batch_embeddings)
        print(f"Embedded {min(i + batch_size, len(texts))}/{len(texts)}")
    return all_embeddings

# Embed 1000 documents efficiently
documents = ["Document " + str(i) for i in range(1000)]
embeddings = batch_embed(documents)
```
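Batching reduces round trips, but every re-run still pays for every text. A simple content-hash cache avoids re-embedding unchanged documents. This is a sketch: `embed_fn` stands in for the API call above, and `cache` is an in-memory dict you could persist to disk.

```python
import hashlib

def cached_embed(texts, embed_fn, cache=None):
    """Embed only texts not already in the cache; reuse prior results."""
    cache = {} if cache is None else cache

    def _key(t):
        return hashlib.sha256(t.encode()).hexdigest()

    # Deduplicate and keep only texts we have not embedded before
    missing = list(dict.fromkeys(t for t in texts if _key(t) not in cache))
    if missing:
        for t, e in zip(missing, embed_fn(missing)):
            cache[_key(t)] = e
    return [cache[_key(t)] for t in texts], cache

# Usage with a fake embedder that records how many texts each call embeds
calls = []
def fake_embed(batch):
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]

embs, cache = cached_embed(["a", "bb", "a"], fake_embed)          # embeds 2 texts
embs2, _ = cached_embed(["a", "bb", "ccc"], fake_embed, cache)    # embeds only "ccc"
```

On the second call only the new text hits the embedder; everything else comes from the cache.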
Key Takeaway
Embeddings are the foundation of semantic search and RAG. They convert meaning into math, enabling computers to find relevant content even when exact keywords do not match. Choose OpenAI for best quality, or sentence-transformers for free local embeddings.