Vector Embeddings — How Computers Understand Meaning

Chapter 5 Mastering RAG — Build Knowledge Systems with Vector Embeddings

22 min read · Lesson 17 / 40

The Mathematical Foundation of RAG

Vector embeddings are the secret weapon behind modern search, recommendation, and RAG systems. They convert text into numbers that capture meaning — so computers can find similar content even when words are different.

What Are Embeddings?

An embedding converts text into a fixed-size array of numbers (a vector) where similar meanings are close together in vector space.

"I love programming" → [0.12, -0.45, 0.78, 0.33, ...]  (1536 dimensions)
"I enjoy coding"     → [0.11, -0.43, 0.76, 0.35, ...]  (very similar!)
"The weather is nice" → [0.89, 0.22, -0.15, 0.67, ...]  (very different)

Generating Embeddings with OpenAI

from openai import OpenAI
import numpy as np

client = OpenAI()

def get_embedding(text, model="text-embedding-3-small"):
    """Get embedding vector for a text string."""
    response = client.embeddings.create(
        model=model,
        input=text
    )
    return response.data[0].embedding

# Generate embeddings
e1 = get_embedding("How to train a neural network")
e2 = get_embedding("Steps to build a deep learning model")
e3 = get_embedding("Best restaurants in Paris")

print(f"Dimension: {len(e1)}")  # 1536 for text-embedding-3-small

# Calculate cosine similarity
def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"Similar texts: {cosine_similarity(e1, e2):.4f}")    # ~0.85
print(f"Different texts: {cosine_similarity(e1, e3):.4f}")   # ~0.15
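A useful property worth knowing: OpenAI's embedding endpoint returns vectors already normalized to unit length, so a plain dot product gives the same number as the full cosine formula and is cheaper to compute. The toy vectors below (illustrative values, not real model output) check that equivalence locally:

```python
import numpy as np

def normalize(v):
    """Scale a vector to unit length, as OpenAI embeddings already are."""
    v = np.array(v, dtype=float)
    return v / np.linalg.norm(v)

# Toy 4-dimensional "embeddings" (made up for illustration)
a = normalize([0.12, -0.45, 0.78, 0.33])
b = normalize([0.11, -0.43, 0.76, 0.35])

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot = np.dot(a, b)
print(np.isclose(cosine, dot))  # True: for unit vectors, dot == cosine
```

In practice this means you can store normalized embeddings and rank results by dot product alone.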

Open-Source Embeddings with Sentence Transformers

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # Free, runs locally

texts = [
    "Machine learning is a subset of AI",
    "Deep learning uses neural networks",
    "I went grocery shopping yesterday",
]

embeddings = model.encode(texts)
print(f"Shape: {embeddings.shape}")  # (3, 384)

# Compare all pairs
from sentence_transformers.util import cos_sim
similarities = cos_sim(embeddings, embeddings)
print(similarities)

Embedding Models Compared

| Model | Dimensions | Cost / 1M tokens | Quality | Speed |
|---|---|---|---|---|
| text-embedding-3-large | 3072 | $0.13 | Best | Fast (API) |
| text-embedding-3-small | 1536 | $0.02 | Very good | Fast (API) |
| all-MiniLM-L6-v2 | 384 | Free | Good | Very fast |
| all-mpnet-base-v2 | 768 | Free | Very good | Fast |
| BGE-large-en-v1.5 | 1024 | Free | Excellent | Medium |
| Cohere embed-v3 | 1024 | $0.10 | Excellent | Fast (API) |

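Dimensions also affect storage cost. The text-embedding-3 models accept a `dimensions` parameter that returns shorter vectors (e.g. 256 instead of 1536); per OpenAI's documentation, this is equivalent to truncating the full vector and renormalizing it. The sketch below reproduces that truncate-and-renormalize step locally, using a random unit vector as a stand-in for a real embedding:

```python
import numpy as np

def shorten_embedding(embedding, dims):
    """Truncate an embedding to `dims` entries and renormalize to unit length.

    Mirrors what passing dimensions=dims to a text-embedding-3 model does.
    """
    v = np.array(embedding, dtype=float)[:dims]
    return v / np.linalg.norm(v)

# Stand-in for a real 1536-dim embedding (random, for illustration only)
rng = np.random.default_rng(0)
full = rng.normal(size=1536)
full /= np.linalg.norm(full)

short = shorten_embedding(full, 256)
print(short.shape)  # (256,)
```

Shorter vectors trade a little retrieval quality for 6x less storage and faster similarity search.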
Batch Processing for Efficiency

def batch_embed(texts, batch_size=100):
    """Efficiently embed large collections of texts."""
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=batch
        )
        batch_embeddings = [d.embedding for d in response.data]
        all_embeddings.extend(batch_embeddings)
        print(f"Embedded {min(i+batch_size, len(texts))}/{len(texts)}")
    return all_embeddings

# Embed 1000 documents efficiently
documents = [f"Document {i}" for i in range(1000)]
embeddings = batch_embed(documents)
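At this scale, embedding calls can hit API rate limits. A simple retry with exponential backoff keeps the batch job from dying on a transient error; the helper below is a generic sketch (not part of the OpenAI SDK), and the wiring comment shows one hypothetical way to use it inside `batch_embed`:

```python
import time

def with_retries(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Hypothetical wiring inside batch_embed:
# response = with_retries(lambda: client.embeddings.create(
#     model="text-embedding-3-small", input=batch))
```

For production workloads, retrying only on rate-limit and server errors (rather than any exception) is a reasonable refinement.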

Key Takeaway

Embeddings are the foundation of semantic search and RAG. They convert meaning into math, enabling computers to find relevant content even when exact keywords do not match. Choose OpenAI for best quality, or sentence-transformers for free local embeddings.