Chapter 5: Building RAG Systems


Embeddings and Vector Databases

Retrieval-Augmented Generation (RAG) allows Claude to answer questions grounded in your own documents — without fine-tuning. The core components are embeddings (numerical representations of text) and a vector database (optimized for similarity search).

What Are Embeddings?

An embedding converts a piece of text into a vector of floating-point numbers that encodes semantic meaning. Texts with similar meanings produce vectors that are close together in high-dimensional space.

import numpy as np
import voyageai

# Anthropic doesn't offer a first-party embeddings endpoint; it recommends
# Voyage AI's models (e.g. voyage-3, voyage-3-lite) for embeddings.
vo = voyageai.Client()  # reads VOYAGE_API_KEY from the environment

def embed(text: str) -> list[float]:
    """Generate an embedding via the Voyage API (voyage-3 returns 1024 dimensions)."""
    result = vo.embed([text], model="voyage-3")
    return result.embeddings[0]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity in [-1, 1]; higher means more semantically similar."""
    a_arr = np.array(a)
    b_arr = np.array(b)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))
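
A quick sanity check of the idea (the texts and score gap are illustrative; exact values depend on the embedding model):

# Related texts should score noticeably higher than unrelated ones.
v_query = embed("How do I reset my password?")
v_related = embed("Steps to recover access to your account")
v_unrelated = embed("Best pizza toppings for a summer party")

print(cosine_similarity(v_query, v_related))    # expect the higher score
print(cosine_similarity(v_query, v_unrelated))  # expect the lower score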

ChromaDB: Local Vector Store

ChromaDB is a developer-friendly vector database that runs locally — ideal for development and small-to-medium datasets.

import chromadb
from chromadb.utils import embedding_functions

# Chroma embeds documents for you via a pluggable embedding function.
# This one requires the sentence-transformers package.
embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

# chromadb.Client() is in-memory; use chromadb.PersistentClient(path=...) to persist to disk.
chroma_client = chromadb.Client()
collection = chroma_client.create_collection(
    name="knowledge_base",
    embedding_function=embedding_fn,
)

# Add documents
documents = [
    "Claude is Anthropic's AI assistant, known for being helpful, harmless, and honest.",
    "The Anthropic API uses the Messages format with roles: user and assistant.",
    "Tool use allows Claude to call external functions and APIs during a conversation.",
    "RAG stands for Retrieval-Augmented Generation — it grounds LLM answers in documents.",
]

collection.add(
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
)

# Query
results = collection.query(
    query_texts=["How does Claude use external tools?"],
    n_results=2,
)

print(results["documents"][0])
# ['Tool use allows Claude to call external functions...', 'Claude is Anthropic\'s AI assistant...']
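
Retrieval is only half of RAG; the other half is passing the retrieved passages to Claude as context. A minimal sketch of the generation step, reusing the collection above (the prompt wording is illustrative):

import anthropic

client = anthropic.Anthropic()

question = "How does Claude use external tools?"
retrieved = collection.query(query_texts=[question], n_results=2)
context = "\n\n".join(retrieved["documents"][0])

response = client.messages.create(
    model="claude-haiku-4-5",
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": (
            f"Answer using only the context below.\n\n"
            f"<context>\n{context}\n</context>\n\n"
            f"Question: {question}"
        ),
    }],
)
print(response.content[0].text)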

Pinecone: Production Vector Store

For production workloads with millions of vectors, a managed service like Pinecone is a better fit. Unlike the ChromaDB example above, you generate embeddings yourself (for example with the embed() helper from earlier) and upsert the raw vectors:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")

# Assumes an index named "knowledge-base" already exists, created with a
# dimension matching your embedding model (1024 for voyage-3).
index = pc.Index("knowledge-base")

# Upsert vectors (embeddings must be pre-generated)
index.upsert(vectors=[
    {"id": "doc_0", "values": embed("Claude is Anthropic's AI..."), "metadata": {"text": "Claude is..."}},
    {"id": "doc_1", "values": embed("Tool use allows Claude..."), "metadata": {"text": "Tool use..."}},
])

# Query
query_vector = embed("How do tools work in Claude?")
results = index.query(vector=query_vector, top_k=3, include_metadata=True)

for match in results["matches"]:
    print(f"Score: {match['score']:.3f} | {match['metadata']['text'][:80]}")

Vector databases are optimized for approximate nearest neighbor search — finding the most semantically relevant documents in milliseconds even at scale.
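
For intuition about what these indexes approximate: exact search scores the query against every stored vector, which is O(n) per query. A minimal NumPy sketch of that brute-force baseline (ANN indexes such as HNSW reach near-identical results in sub-linear time):

import numpy as np

def brute_force_top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 3):
    """Exact cosine-similarity search: the O(n) baseline that ANN approximates."""
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q                   # cosine similarity via dot products of unit vectors
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest scores
    return top, scores[top]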