Unlimited AI Memory: Pinecone + Claude System I Built
I was six prompts deep into a client strategy session with Claude last Tuesday when the context window hit the wall. Again. The conversation we had three days earlier about their ICP — gone. The Gmail thread where the founder explained their biggest churn driver — gone. The notes I had pasted from a 90-minute sales call — compressed into a vague summary Claude kept hallucinating details from.
I closed the chat. Opened a new one. Started typing the same background context I had already typed four times that week.
That was the moment I decided I was done fighting Claude's memory problem with willpower. The context window is not getting meaningfully longer fast enough for the way I actually work, which is across dozens of projects, hundreds of emails, and years of notes I do not want to re-explain to an AI every single morning. So I built something I have wanted for two years: a proper unlimited AI memory setup on Pinecone and Claude that actually remembers everything I tell it, searches by meaning instead of keywords, and plugs into Claude Code, Claude for Work, and the desktop apps without breaking.
This is not theory. I have been running it for three weeks on a real workload — 200+ research documents, my last 90 days of Gmail, client project notes, and chat logs from ongoing Claude sessions. Here is exactly how I built it, what it costs, where it broke, and the one thing I would tell anyone before they try the same thing.
Why Claude's Memory Problem Is Not Actually a Memory Problem
Let me reframe this before we go further, because the way most people talk about "AI memory" is wrong — and it kept me from building the right thing for a full year.
Claude does not have a memory problem. Claude has a retrieval problem.
The model itself is brilliant at reasoning over whatever is in the context window. Opus 4.6 handles a million tokens now. Sonnet holds 200K comfortably. That is already enough context to fit most client projects, a few books, or a month of email threads. The issue is not that Claude cannot hold context. The issue is that you, the human, have no practical way to decide which context to stuff into the window on any given turn.
Think about your own workflow. Right now, your "second brain" is scattered across Gmail, Notion, Google Docs, Slack threads, a messy Downloads folder full of PDFs, and probably a few Claude conversations you wish you had saved. When you start a new session with Claude and ask "help me write a follow-up to that investor who passed last quarter," Claude has no way to know which investor, which quarter, which email thread, or what you said in that previous strategy session.
You could paste all of it in. But then you are back to doing the librarian job yourself — the exact thing you wanted the AI to help with in the first place.
A vector database fixes this by letting Claude ask the library, instead of you carrying the books. That is the entire game. And once I understood that, everything about the setup got simpler. Before you write a single line of config, I need you to understand what semantic search actually does — because the difference between a Pinecone memory that feels magical and one that feels like a waste of $25/month comes down to this one concept.
Semantic Search vs Keyword Search: The Distinction That Changes Everything
Here is a test I ran last month that made this click for me. I took the same question and ran it against Gmail search and against a Pinecone index that held my last 90 days of email.
The question: "What did the founder from the fintech startup say about their churn issue?"
Gmail's result: nothing. Zero matches. I had to manually search for "churn," then "retention," then the founder's first name, then the startup's name. Four separate searches to piece together a single answer. Gmail is matching strings. If the founder said "users keep leaving after month two" without using the word churn, Gmail will never find it. That is a keyword search engine pretending to be a knowledge tool.
Pinecone's result: three emails, ranked by relevance. The top hit was a thread where the founder wrote "retention is our #1 problem right now — we are losing 40% of users between week two and week four." The word churn appears nowhere in that email. Semantic search found it because it understood that churn, retention loss, and users leaving all live in the same region of meaning.
That is the difference. Keyword search matches the letters you typed. Semantic search matches what you meant. When Claude is sitting on top of that second one, you can ask questions like "what were my best lead generation strategies last quarter" or "which clients pushed back on my pricing" and get real answers pulled from your actual history — not a hallucinated guess.
The magic that makes this work is embeddings. An embedding model reads a chunk of text and converts it into a list of 1,024 numbers that represent its meaning in mathematical space. Two pieces of text that mean similar things land near each other in that space, even if they share zero words. Pinecone stores those vectors and lets you query them with a second vector (your question, also embedded) and returns whichever stored vectors are closest in meaning.
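To make "near each other in that space" concrete, here is a minimal sketch of the ranking step in plain Python. The three-number vectors are invented for illustration (real multilingual-e5-large embeddings have 1,024 numbers), and the function names are my own, not part of any Pinecone API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, stored, top_k=2):
    """Return the ids of the top_k stored vectors most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Toy vectors, made up for this example only.
stored = [
    ("retention-email", [0.9, 0.1, 0.0]),
    ("dinner-plans", [0.0, 0.2, 0.9]),
    ("users-leaving", [0.8, 0.3, 0.1]),
]
churn_question = [1.0, 0.2, 0.0]

print(nearest(churn_question, stored))
```

The two emails about retention come back ahead of the dinner plans even though nothing here compares words. A vector database does the same comparison, just over far more vectors and with an index that avoids scanning every one.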
If that sounds abstract, here is the only thing you need to remember: Pinecone is a database where the search index is meaning, not words. Everything else in this post is plumbing. The plumbing is where most people get stuck, though, so let me walk you through exactly what I set up.
The Full Stack I Am Running
Before I show you the step-by-step, here is what the system actually looks like on my machine as of April 2026, so you know what you are building toward:
- Pinecone Starter plan — free, 2GB storage, 5M embedding tokens per month on the hosted multilingual-e5-large model, 2M write units and 1M read units per month. That is more than enough for personal memory at my scale. I have not hit a single limit in three weeks.
- Pinecone Plugin for Claude Code: Anthropic and Pinecone shipped an official plugin that exposes Pinecone operations as slash commands and natural language tools. /pinecone:quickstart literally walks you through your first index. This did not exist when I started experimenting last year.
- Three separate indexes: one for research documents, one for Gmail archives, one for saved Claude conversations. I tried to cram everything into one index first. Do not do that. I will explain why below.
- Antigravity IDE as the visual layer for bulk uploading files into Pinecone. You can do the same thing from Claude Code directly, but Antigravity is faster when you are dragging 200 PDFs in at once.
- A custom "remember this" skill in Claude that forwards the current conversation into Pinecone on command.
Total setup time on a clean machine: about 45 minutes if you already have Claude Code installed and a Pinecone account. Total monthly cost so far: $0. I expect that to become about $25/month once I scale up the email indexing past 10,000 messages, but right now it is genuinely free.
Now let us build it.
Step 1: Pinecone Account and API Key
Go to pinecone.io, sign up, pick the Starter plan, and create an API key from the dashboard. Copy the key immediately — Pinecone shows it to you once, and if you lose it you have to rotate it.
Set it as an environment variable on your machine before you start Claude Code:
export PINECONE_API_KEY="your-key-here"
On macOS or Linux, I put this in ~/.zshrc so it is available in every new terminal. On Windows, use System Environment Variables. The reason this has to be an env var and not pasted into a config file: the official Pinecone plugin reads PINECONE_API_KEY from the environment at startup, and Claude Code will not prompt you for it later. Miss this step and every Pinecone command will fail with a confusing auth error.
Pro tip that saved me an hour: if you already had Claude Code open when you set the env var, you need to fully close and reopen it. Claude Code does not pick up new environment variables on hot reload. I wasted a solid thirty minutes convincing myself my API key was broken before I realized I just needed to restart the CLI.
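If you want to rule out the "key is not visible to this process" problem before blaming the key itself, a tiny check like the one below can help. It is a sketch of my own, not something the plugin ships: key_status is a hypothetical helper, and it only prints a short prefix so the secret never ends up in your terminal history.

```python
import os

def key_status(env=None):
    """Report whether PINECONE_API_KEY is visible to the current process."""
    env = os.environ if env is None else env
    key = env.get("PINECONE_API_KEY", "").strip()
    if not key:
        return ("PINECONE_API_KEY is missing: export it, then fully restart "
                "your terminal and Claude Code")
    # Show only the first four characters, never the full secret.
    return f"PINECONE_API_KEY is set (starts with {key[:4]}...)"

print(key_status())
```

Run it from the same terminal you launch Claude Code from. If it reports the key as missing there, the plugin will not see it either.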
Step 2: Install the Pinecone Plugin for Claude Code
Inside Claude Code, install the official plugin:
/plugin install pinecone
This is the part that did not exist a year ago, and it is what makes this whole setup viable for people who do not want to write Python glue code. The plugin adds a set of slash commands like /pinecone:query, /pinecone:upsert, /pinecone:list-indexes, and the one you should run first: /pinecone:quickstart. Quickstart walks through a tiny example so you can confirm your API key is working and your environment is ready.
More importantly, the plugin also registers Pinecone as a tool Claude can call in natural language. Once it is installed, I can just type "search my research index for anything about customer acquisition in B2B SaaS" and Claude invokes the right query under the hood. No memorizing command syntax.
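For the curious, here is roughly what a natural-language search like that could look like if you made the call yourself. This is a sketch under assumptions: I am assuming a recent Pinecone Python SDK where an index with a hosted embedding model accepts a text query through index.search, and I do not know what the plugin actually does internally. The helper names, the namespace argument, and the sample response are all mine.

```python
def build_query(question, top_k=3, metadata_filter=None):
    """Build the query payload for a text search against a hosted-embedding index."""
    query = {"inputs": {"text": question}, "top_k": top_k}
    if metadata_filter:
        query["filter"] = metadata_filter
    return query

def summarize_hits(response):
    """Flatten a search response into (id, rounded score) pairs, best match first."""
    return [(hit["_id"], round(hit["_score"], 2))
            for hit in response["result"]["hits"]]

def search_memory(index, question, namespace, top_k=3):
    """`index` is assumed to be an SDK index handle, e.g. pc.Index("research-library")."""
    response = index.search(namespace=namespace, query=build_query(question, top_k))
    return summarize_hits(response)

# Offline example: a hand-written response in the shape I expect back.
sample_response = {"result": {"hits": [
    {"_id": "playbook.pdf#chunk-4", "_score": 0.8712,
     "fields": {"title": "B2B SaaS playbook"}},
    {"_id": "notes.pdf#chunk-1", "_score": 0.7049,
     "fields": {"title": "Acquisition notes"}},
]}}
print(summarize_hits(sample_response))
```

The point of the plugin is that you never have to write this. It is still useful to know the shape of the request when a query returns something odd and you want to debug it.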
If you prefer a pure MCP setup or you are on Claude for Work where the plugin is not available yet, there is a Pinecone MCP server you can configure manually. But for most people reading this, the plugin is the path of least resistance.
Pinecone Plugin for Claude Code: https://github.com/pinecone-io/pinecone-claude-code-plugin
Step 3: Create Your First Index (And Why I Named Mine Wrong)
An "index" in Pinecone is just a named collection of vectors with a fixed dimensionality and distance metric. You need one per logical memory bucket. I am going to save you a mistake I made on day one:
Do not name your index after a project, a topic, or a city.
The guy in the video that inspired this whole setup named his first index "Los Angeles" and it is a perfect example of what not to do. The name should describe the category of memory it holds, because you will be typing it in queries and sharing it across sessions. I started with my-stuff — equally bad. Six days in, I migrated everything to three indexes with real names:
- research-library: PDFs, articles, book summaries, transcripts
- gmail-archive: email content with metadata
- claude-conversations: saved AI chat history
Inside Claude Code, creating an index is a one-liner once the plugin is installed:
Create a Pinecone index called "research-library" using the
multilingual-e5-large hosted embedding model, 1024 dimensions, cosine
metric, serverless on AWS us-east-1.
Claude handles the API call and returns a confirmation. The multilingual-e5-large model is the one I recommend for most people because Pinecone hosts it, you do not have to manage an embedding API key separately, and the free tier gives you 5 million embedding tokens per month on it. That is roughly 3.5 million words. You will not run out during setup.
One gotcha: you cannot change an index's dimensionality or embedding model after you create it. If you create an index with one model and try to upsert vectors from a different model later, Pinecone will reject them. Pick your embedding model once, commit, and use the same one everywhere in that index.
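If you would rather script the index creation than prompt for it, the sketch below shows one way it might look. It assumes a recent Pinecone Python SDK that exposes has_index and create_index_for_model, and it expects you to pass in the client yourself (for example pc = Pinecone(api_key=...)). The chunk_text field name and both helper functions are my own choices. The metric and dimension are left to the model's defaults, which I am assuming match the cosine and 1024 settings above.

```python
MEMORY_INDEXES = ("research-library", "gmail-archive", "claude-conversations")

def index_request(name):
    """Keyword arguments for creating one serverless index with hosted embeddings."""
    return {
        "name": name,
        "cloud": "aws",
        "region": "us-east-1",
        "embed": {
            "model": "multilingual-e5-large",
            # Which record field holds the text to embed (my naming choice).
            "field_map": {"text": "chunk_text"},
        },
    }

def create_memory_indexes(pc, names=MEMORY_INDEXES):
    """Create any missing indexes and return the names that were created.

    `pc` is assumed to be a Pinecone client object from the official SDK.
    """
    created = []
    for name in names:
        if not pc.has_index(name):
            pc.create_index_for_model(**index_request(name))
            created.append(name)
    return created

print(index_request("research-library"))
```

Keeping the embedding model in one function is deliberate. Every index in the system gets the same model, so a vector from one pipeline never ends up in an index built for another.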
Step 4: Vectorize Your First Batch of Content
This is the part where most people stall, so I want to walk you through my actual workflow instead of the hypothetical version.
Here is what I did on day one. I had about 40 PDFs sitting in a folder called ~/research — a mix of marketing playbooks, a few books I had summarized, and transcripts from YouTube videos I had downloaded. I opened Antigravity IDE, pointed it at that folder, and dragged the whole thing into a Claude Code session with this prompt:
Read every PDF in this folder. For each one, chunk it into sections of roughly 500 tokens with 50 tokens of overlap. Generate embeddings using the hosted multilingual-e5-large model, and upsert each chunk into the research-library index. For each vector, include metadata: source_file, chunk_index, title, and date_added. Skip any file that already exists in the index based on source_file.
Claude chewed through it in about six minutes. 40 files became about 1,800 vector entries. The metadata part is the piece people skip, and I am begging you not to skip it. Metadata is what lets you filter queries later — "search the research library, but only chunks from files I added in the last 30 days" — without it, you are stuck searching the whole index every time.
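Here is a sketch of what those records and that "last 30 days" filter might look like as data. The four metadata fields come from the prompt above. Everything else is an assumption on my part: the id format, the chunk_text field name, and storing date_added as a Unix timestamp. I store it as a number because, as far as I know, Pinecone's range operators such as $gte compare numbers rather than date strings.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86400

def build_records(source_file, title, chunks, added_at):
    """One upsert-ready record per chunk, carrying the metadata from the prompt."""
    date_added = int(added_at.timestamp())  # numeric so range filters can compare it
    return [
        {
            "_id": f"{source_file}#chunk-{i}",  # hypothetical id scheme
            "chunk_text": text,
            "source_file": source_file,
            "chunk_index": i,
            "title": title,
            "date_added": date_added,
        }
        for i, text in enumerate(chunks)
    ]

def added_since_filter(now, days):
    """Metadata filter matching records added within the last `days` days."""
    cutoff = int(now.timestamp()) - days * SECONDS_PER_DAY
    return {"date_added": {"$gte": cutoff}}

now = datetime(2026, 4, 1, tzinfo=timezone.utc)
records = build_records("playbook.pdf", "B2B SaaS playbook",
                        ["first chunk", "second chunk"], now)
print(records[0]["_id"], added_since_filter(now, 30))
```

Deterministic ids are what make the "skip any file that already exists" rule cheap. Re-importing the same file produces the same ids, so an upsert overwrites instead of duplicating.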
A few rules I learned the hard way about chunking:
- Too small and you lose context. I tried 200-token chunks and the retrieved results were meaningless fragments. 400 to 600 tokens is the sweet spot for most text.
- Overlap matters. A 10% overlap between chunks means a sentence that crosses a boundary still has a chance of being retrieved whole. No overlap, and you lose the glue.
- Tables and code blocks get mangled by naive chunkers. For documents heavy in either, tell Claude explicitly to preserve code blocks as single units and not split them across chunks.
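The chunk-with-overlap idea is simple enough to sketch in a few lines. This version is my own illustration, not the chunker Claude ran for me, and it treats whitespace-separated words as a stand-in for tokens, which is only an approximation of what an embedding model's tokenizer would count.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

def chunk_text(text, size=500, overlap=50):
    """Word-based approximation: real token counts will differ somewhat."""
    return [" ".join(chunk) for chunk in chunk_tokens(text.split(), size, overlap)]

# 1,200 fake tokens become three chunks: 0-499, 450-949, 900-1199.
demo = chunk_tokens(list(range(1200)))
print([(chunk[0], chunk[-1]) for chunk in demo])
```

With the defaults, the last 50 tokens of each chunk are repeated as the first 50 of the next one. That repetition is the 10% overlap that gives a boundary-crossing sentence a chance of surviving intact.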
If you are thinking "this is exactly what RAG Anything solved for scanned PDFs," you are right — that post covers the multimodal version of the same problem. For plain text, the simple chunker Claude runs inline is fine.
You can now ask Claude natural questions against your research library and get real answers pulled from your actual source material. That alone is worth the 45 minutes. But this is where the system goes from "cool trick" to "genuinely changes how you work" — and it is the part nobody in the YouTube tutorials explains clearly.
Step 5: Making Claude Remember Its Own Conversations
A Pinecone index of research documents is useful. A Pinecone index of your own conversations with Claude is transformative. Here is why.
Every time I solve a problem with Claude — debug a weird Postgres error, work through a positioning exercise, sketch out a campaign strategy — that conversation contains signal I will need again in 30 days when a similar problem shows up. Right now, 95% of that signal gets thrown away the moment I close the chat. I have built this same solution to the same problem probably twelve times in the last year, because Claude does not remember what we figured out last month.
The fix is embarrassingly simple. I added a custom skill in Claude Code that does one thing: when I type "remember this conversation as [topic]," it takes the current transcript, chunks it, embeds it, and upserts it into the claude-conversations index with metadata including the date, the topic I specified, and the project I was working on.
Then, at the start of any future session, my default system prompt tells Claude: "Before answering any substantive question, check the claude-conversations index for prior discussions on related topics. If relevant results exist, read them and reference the prior thinking."
What that turns into in practice: last week I asked Claude to help me think through pricing for a new service offering. Before answering, it queried its own memory, found a conversation from six weeks earlier where we had worked through pricing psychology for a different offering, and opened its answer with "based on the pricing framework we developed on February 24th for the audit service, here is how that might apply to this new offering."
I did not tell it about February 24th. I did not paste anything in. I did not even remember the conversation until it surfaced it. That is what a proper unlimited AI memory system built on Pinecone and Claude unlocks, and it is the feature that made me stop using anything else. If you want to go deeper on this specific pattern, I wrote up my earlier experiment with it in the Claude Code Autodream memory system post. This Pinecone approach is essentially the production-grade version of that idea.
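The data-shaping half of a "remember this conversation" skill could be sketched like this. It is an illustration rather than my actual skill file: the function name, the blank-line splitting rule, the id scheme, and the chunk_text field are all assumptions, and the embedding and upsert steps are left to whatever writes the records to the claude-conversations index.

```python
import re

def remember_conversation(transcript, topic, project, date):
    """Turn a chat transcript into records tagged with date, topic, and project."""
    # Treat each blank-line-separated block as one turn and one chunk.
    turns = [block.strip() for block in transcript.split("\n\n") if block.strip()]
    # Slug the topic so ids stay readable, e.g. "Pricing Psychology" -> "pricing-psychology".
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return [
        {
            "_id": f"{date}-{slug}-{i}",  # same date and topic overwrite on re-save
            "chunk_text": turn,
            "topic": topic,
            "project": project,
            "date": date,
            "turn_index": i,
        }
        for i, turn in enumerate(turns)
    ]

saved = remember_conversation(
    "User: how should I price the audit?\n\nClaude: start from the outcome.",
    topic="Pricing Psychology",
    project="audit-service",
    date="2026-02-24",
)
print([record["_id"] for record in saved])
```

Long turns would still need the same 500-token chunking as the research documents. I left that out to keep the sketch focused on the metadata.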
Step 6: Vectorizing Gmail (The One That Broke Everything)
Everything up to this step worked on the first try. This step did not.
Gmail's API is a hostile environment for bulk exports. It has aggressive rate limits, no good "give me everything since date X" endpoint for body content, and attachment handling that will break your script if you are not careful. My first attempt, a "just let Claude write a script that pulls the last 500 messages and upserts them" approach, failed three times in a row. The script kept hitting the per-user rate limit, which Gmail meters in quota units rather than raw requests (250 quota units per user per second, with each message fetch costing several units), and came back with partial results.
Here is what finally worked. I used the Gmail MCP server already available inside Claude to pull emails in batches of 50, one batch at a time, with a 5-second pause between batches. For each email I extracted: subject, sender, date, body (plain text, not HTML), and any labels. I stripped out quoted reply threads — if you do not, you get the same content vectorized five times because the same thread quotes itself on every reply. Then I chunked the body into 500-token pieces (most emails fit in one chunk) and upserted them to the gmail-archive index with rich metadata.
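Two pieces of that workflow are easy to show in code: stripping quoted replies and pacing the batches. The sketch below is my own reconstruction, not the script Claude wrote. The "On ... wrote:" pattern is a heuristic that will miss some mail clients and wrapped header lines, and handle_batch is a hypothetical callback standing in for the fetch-and-upsert step.

```python
import re
import time

# Typical one-line header that introduces a quoted reply in plain-text email.
REPLY_HEADER = re.compile(r"^On .+ wrote:\s*$")

def strip_quoted_replies(body):
    """Keep only the new text of an email, dropping the quoted thread below it."""
    kept = []
    for line in body.splitlines():
        if REPLY_HEADER.match(line):
            break  # everything after this header is the quoted thread
        if line.lstrip().startswith(">"):
            continue  # individually quoted line
        kept.append(line)
    return "\n".join(kept).strip()

def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def process_in_batches(message_ids, handle_batch, size=50, pause_seconds=5,
                       sleep=time.sleep):
    """Process ids batch by batch, pausing between batches to respect rate limits."""
    processed = 0
    for n, batch in enumerate(batches(message_ids, size)):
        if n > 0:
            sleep(pause_seconds)
        handle_batch(batch)
        processed += len(batch)
    return processed

email = ("Thanks, will do.\n\n"
         "On Tue, Mar 3, 2026 at 9:00 AM Jane <jane@example.com> wrote:\n"
         "> Can you send the deck?")
print(strip_quoted_replies(email))
```

Passing sleep in as a parameter is a small design choice that lets you test the pacing logic without actually waiting five seconds per batch.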
Processing 250 emails took about four minutes. Processing 2,000 emails took about 40 minutes. I would not try 10,000+ in a single pass without a proper queue and resume logic — the moment Claude's session times out mid-run, you lose your place and have to restart.
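The "resume logic" I am describing does not have to be fancy. A minimal sketch, assuming a local JSON checkpoint file and a hypothetical handle callback that processes one message id, could look like this:

```python
import json
import os

def load_done(path):
    """Read the set of already-processed message ids, or an empty set on first run."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))

def save_done(path, done):
    """Write the checkpoint to a temp file, then swap it in, so a crash cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)

def run_with_resume(message_ids, handle, path):
    """Process each id at most once across runs, checkpointing after every message."""
    done = load_done(path)
    for message_id in message_ids:
        if message_id in done:
            continue  # finished in an earlier run
        handle(message_id)
        done.add(message_id)
        save_done(path, done)
    return done
```

Call it with the full list of ids and the same checkpoint path every time. If the session dies at message 1,400, the next run skips straight past the first 1,399. Checkpointing after every message is wasteful for very large runs, so saving once per batch would be the obvious refinement.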
The payoff is absurd. I can now ask things like "find any emails where someone mentioned wanting to collaborate but we never followed up" and get a ranked list of real threads from real people. No Gmail search in the world does that.
One honest limitation before anyone gets too excited: if you vectorize emails, you are creating a searchable copy of every email body on Pinecone's infrastructure. Think about what is in your inbox. Client NDAs. Personal health conversations. Financial statements. For me, on a personal free-tier Pinecone account, the tradeoff was fine because I control the account and I am not storing anything regulated. For a business use case, you need to have the compliance conversation before you do this — especially if you handle any healthcare, legal, or financial data that falls under HIPAA, GDPR, or similar frameworks. If your business lives in those waters, talk to somebody like xCyberSecurity before you hit upsert on a production mailbox.
What I Got Wrong On The First Pass
I want to save you the specific mistakes I made, because most of them cost me real time.
Mistake 1: One giant index for everything. My first index was called mejba-brain and it contained PDFs, emails, chats, and project notes all mixed together. Queries got progressively worse as it grew, because an email from a friend about dinner plans was competing with a marketing playbook for semantic relevance. Separate indexes by category. It is not a performance thing — it is a precision thing.
Mistake 2: No metadata. Day one, I just upserted raw vectors. No source file. No date. No tags. After three days, I had 2,400 vectors and no way to filter them. I ended up wiping the index and rebuilding it with proper metadata schemas. Do this right the first time.
Mistake 3: Trusting the default chunk size. The first tool I tried used 1,000-token chunks with no overlap. Retrieved results were technically accurate but too long to be useful — Claude was getting huge walls of text for every query and spending most of its token budget on retrieval instead of reasoning. 400-600 token chunks with 10% overlap is the range that actually works.
Mistake 4: Not pruning. Three weeks in, I realized some of my earliest vectors were from experiments I had long abandoned — half-formed notes, duplicate chunks from messy imports, even some test data I upserted while learning the API. They were polluting results. I now run a monthly cleanup where I query for anything with a date_added older than 60 days that has not been touched and either re-validate or delete it. It takes ten minutes and keeps the system honest.
Mistake 5: Treating it like a backup. A vector database is not a backup. It is a lossy, searchable representation of your data. Do not delete the originals after vectorizing them. The vectors cannot reconstruct the source. If you want a system like this to feel reliable, keep the original files in a plain folder on disk and treat Pinecone as the search layer on top.
None of these are catastrophic. Every one of them cost me between 30 minutes and two hours to figure out. Now you do not have to.
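The cleanup from Mistake 4 comes down to one selection rule. Here is a sketch of it, assuming each record carries a numeric date_added timestamp and an optional last_used timestamp. The last_used field is hypothetical: it is my stand-in for "has not been touched", and you would have to maintain it yourself.

```python
SECONDS_PER_DAY = 86400

def stale_ids(records, now_ts, max_age_days=60):
    """Ids of records added before the cutoff and not used since."""
    cutoff = now_ts - max_age_days * SECONDS_PER_DAY
    stale = []
    for record in records:
        # Records that were never used fall back to their creation time.
        last_used = record.get("last_used", record["date_added"])
        if record["date_added"] < cutoff and last_used < cutoff:
            stale.append(record["_id"])
    return stale

day = SECONDS_PER_DAY
inventory = [
    {"_id": "old-experiment", "date_added": 10 * day},
    {"_id": "old-but-used", "date_added": 10 * day, "last_used": 95 * day},
    {"_id": "recent-note", "date_added": 80 * day},
]
print(stale_ids(inventory, now_ts=100 * day))
```

I would treat the output as a review list rather than a delete list. Look at what is in it, re-validate anything that still matters, and only then remove the rest from the index.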
What Actually Changed After Three Weeks
I am going to be careful here, because "results" sections are where most AI posts start inventing numbers. I do not have before-and-after dashboards. What I have is three weeks of lived workflow change, and I will tell you what I actually noticed.
The single biggest shift is that I stopped starting sessions with context-dumping. I used to open a new Claude chat and spend the first three to five minutes pasting background, project status, prior decisions. That is gone. I now just ask the question, and Claude pulls the context from Pinecone itself. My average "time-to-first-useful-answer" on any complex question dropped from roughly five minutes to under one.
The second shift is harder to quantify but more important: I started asking questions I used to skip. When the cost of a question is "dig through email for 15 minutes to remind yourself what happened," you ask fewer questions. When the cost drops to "type the question," you ask more. More questions means better decisions. I cannot put a number on that, but I can tell you I have noticed it every single day since I set this up.
The third shift is the one I did not expect. Having a persistent memory changed what I save in the first place. I now deliberately create notes I would never have bothered writing before, because I know they will be findable. Quick sales call notes. Half-baked ideas I want to revisit. Client quotes I want to reference later. The memory layer raised the value of writing things down, which raised the quality of what I was writing down, which fed the memory layer even better results. A flywheel.
If you are looking for exact numbers, I do not have them. The figure I see quoted most often is a 60-80% reduction in retrieval time for knowledge work when RAG replaces manual search. That lines up with my experience, but I cannot point you to a specific benchmark and I did not run a formal study. What I can say with confidence is that I have not turned this system off once since I set it up, and every time Claude surfaces something from two weeks ago unprompted, I get the same reaction I had the first time: "wait, you actually remembered that?"
FAQ
For personal use, it costs $0/month on Pinecone's Starter plan as of April 2026. The Starter tier includes 2GB of storage, 5M embedding tokens per month on multilingual-e5-large, and enough read/write units for a single person's memory workload. Expect to move to the $25/month Standard plan only if you scale past about 10,000 documents or vectorize a multi-year email archive. For the full breakdown, see the "Full Stack" section above.
Pinecone is not a replacement for Claude's context window — it is a selector for it. Claude's window handles reasoning; Pinecone handles which pieces of your knowledge base get loaded into that window on any given turn. For workflows that span more than one session or more than a few documents, you need both. See the "Why Claude's Memory Problem Is Not Actually a Memory Problem" section for the full mental model.
The same setup works outside Claude Code, but the official Pinecone plugin is easier inside Claude Code today. For Claude for Work, you can configure Pinecone as an MCP server or use the Pinecone skill that wraps the same operations. The core architecture (indexes, embeddings, semantic queries) is identical across both. The only difference is how you invoke it.
Use multilingual-e5-large hosted on Pinecone for personal use. It is free up to 5M tokens per month on the Starter plan, handles over 100 languages, and produces 1024-dimensional vectors that work well for general knowledge retrieval. Only switch to OpenAI's text-embedding-3-large or Voyage's voyage-3 if you are doing specialized domain work that e5 struggles with.
Obsidian and NotebookLM both fit into this setup. Obsidian markdown files vectorize cleanly: point Claude Code at your vault folder, chunk, and upsert to a dedicated index. NotebookLM integrates through its own skill that can forward source content into Pinecone. I cover the Obsidian version in my Obsidian and Claude Code persistent memory post, and the NotebookLM version in NotebookLM + Claude Code.
The Thing I Wish Someone Had Told Me
Here is the reframe I wish someone had put in front of me a year ago, because it would have saved me twelve months of context-dumping.
Your AI is not forgetful. Your life is disorganized. The context is not missing — it is scattered across Gmail, Slack, Notion, a downloads folder, and a pile of closed Claude tabs. A vector database does not give Claude a memory. It gives you a way to stop being the librarian for a brilliant assistant who is sitting there waiting for you to hand him the right book.
The moment you stop thinking of this as "fixing Claude" and start thinking of it as "building a second brain that Claude happens to read from," everything about the setup gets easier. You stop trying to cram everything into one giant index. You start naming things properly. You start writing more notes because you know they will be findable. You start asking better questions because the cost of a question drops.
Go sign up for Pinecone tonight. Install the plugin. Create one index — just one — called research-library. Vectorize the five most important PDFs on your machine, the ones you keep meaning to go back to. Then ask Claude one question against that index. That is the entire tutorial. The rest of this post is optimization on top of that first five-minute experience.
And the next time your Claude session forgets something important, you will not feel that sinking frustration. You will just say "check the research library for anything we said about this before" — and watch three weeks of your own thinking come back to you, ranked by relevance, ready to use.