Why AI Subscriptions Are About to Be Commoditized
I was sitting at my desk on a Tuesday, looking at four different AI billing dashboards open in four different tabs, and I realized I had become exactly the customer the AI industry was built to monetize.
Claude Max, $100. Codex Plus, $20. A DeepSeek API key with about $20 of credit on it. An OpenRouter account with another $15. A Kimi K2.6 trial running through OpenCode that I'd forgotten was still active. I added it up and the number made me uncomfortable in a way I didn't fully understand yet — not because it was a lot of money, but because for the first time in two years I genuinely couldn't tell you which of those subscriptions I needed.
That's the thing about AI subscription commoditization. It doesn't announce itself. It doesn't show up as a price war or a dramatic launch event. It shows up as a slow, creeping suspicion that the thing you've been paying for at the top of the stack is not actually the thing creating the value at the bottom of the stack. And if that's true — if the model itself is becoming a commodity while the application layer eats all the margin — then the subscription stack I've spent two years building might be solving a problem that's about to stop existing.
I'm going to walk you through what I actually pay for, what I'm actually getting, why I think the open-weight models have closed the gap fast enough to break the pricing model that funds the frontier labs, and where I think the real moats are moving. Some of this is going to read like heresy if you're deep in the Claude or OpenAI ecosystem. I am too. That's why I'm writing it.
The Stack I'm Actually Paying For
Let me lay it out so we're working from the same numbers.
I pay $100 a month for Claude Max 5x, which gives me roughly 225 messages every five hours on Sonnet 4.7 and a much smaller envelope on Opus 4.7. That's the plan I use for design-forward work, long-form writing, and the projects where I want Opus's specific taste profile in the output. There's a Max 20x tier at $200 that I've toggled on twice and toggled back off both times because I couldn't justify the spend on a steady-state basis.
I pay $20 a month for ChatGPT Plus, which gets me into Codex with the GPT-5.5 model and — through May 31, 2026 — a temporary 25x boost on the 5-hour Codex limits that drops back to 5x once the promotion ends. That's the plan I use for backend code, data pipelines, ML scaffolding, and the kind of grind work where GPT-5.5's efficiency is genuinely measurable. I covered the head-to-head in detail when I wrote about Codex versus Claude Code, and the gap I called out there has only widened since.
Then there's the rotating cast of API keys. DeepSeek, currently sitting on V4 Pro at heavily discounted promotional pricing through May 31. Kimi K2.6 through OpenRouter at $0.60 per million input tokens and $2.50 per million output. A handful of free models on OpenRouter that get rate-limited at twenty requests per minute but work fine for batch jobs that aren't time-sensitive. I keep these around partly as fallbacks for when Anthropic or OpenAI hits a status-page incident, and partly because I'm running enough experiments per month that I actually use them.
The total damage, before token spend, is somewhere between $135 and $160 a month depending on what I've been testing. Add in API usage and a typical month lands in the $200-$280 range. That's the number on the spreadsheet I was looking at on Tuesday.
Here's what I noticed when I drilled into the breakdown: the proprietary subscriptions accounted for roughly 78% of the cost and roughly 60% of the actual reasoning tokens I used that month. Not 60% of the value — 60% of the tokens. The cheap stuff was carrying more of the load than I'd assumed.
That's the moment the question stopped being "how do I optimize my AI stack?" and started being "what am I actually paying these top-tier subscriptions for?"
The Catch-Up Problem
To understand why the question matters, you have to look at where the open-weight models are now versus where they were a year ago.
In May 2025, the conversation was simple. Claude and GPT were the frontier. Open-weight models like Mistral and the early Qwen and DeepSeek releases were catching up on specific benchmarks but losing badly on the work that mattered — long-context reasoning, agentic tool use, the kind of multi-step engineering tasks that actual developers care about. Paying $100 a month for Claude was a no-brainer because the next-best alternative wasn't really an alternative.
That gap has not just narrowed in 2026. It has, on several specific benchmarks, closed entirely.
Artificial Analysis ranks DeepSeek V4 Pro at 52 on the Intelligence Index — the #2 open-weight reasoning model, behind Kimi K2.6 — and the model costs roughly $1,071 to run the full benchmark suite versus $4,811 for Claude Opus 4.7. That's a 4.5x cost gap on a head-to-head intelligence comparison. On SWE-Bench Verified, the leaderboard's most-cited coding evaluation, DeepSeek V4 Pro Max scores 80.6%, Kimi K2.6 hits 80.2%, and MiniMax M2.5 lands at 80.2% — all within a percentage point of Claude Opus 4.6's 80.8%. HumanEval is effectively saturated at this point; Kimi K2.5 led at 99.0% before the benchmark stopped meaningfully discriminating between top models.
Read those numbers carefully. The open-weight models aren't beating the proprietary models. They're matching them, on the benchmarks that the proprietary models were specifically designed to win, at a fraction of the cost. And the cost story is the part that's actually destabilizing.
DeepSeek V3.2 cut its API pricing in half in late 2025 to $0.028 per million cache-hit input tokens and $0.42 per million output. V4 Pro is currently running at a 75% promotional discount through May 31, 2026. Kimi K2.6 is at $0.60 input and $2.50 output. For comparison, Claude Opus 4.7 is roughly 8-10x more expensive on output tokens than Kimi, and roughly 30x more expensive than DeepSeek V4 on input. A SaaS workload that processes 100 million tokens a month — not unusual for an agentic application — pays around $310 with Kimi versus $4,000+ with GPT-5.4 or Opus 4.7.
This is the chicken-and-egg cycle nobody at the top of the stack wants to talk about. The frontier labs train an expensive new model. They charge a premium for it because they have to recoup the training cost and fund the next generation. The open-weight labs reverse-engineer the techniques, ship a model that's 90-95% as capable at 1/10th the price, and the market routes accordingly. By the time the proprietary lab announces version N+1, the open-weight model is already pricing in a way that makes most of the previous generation's revenue opportunity disappear.
That's not a five-year trend. That's the cycle we're already inside.
The Android iOS Analogy and Why It Breaks
The cleanest analogy I've heard for what's happening is the Android-versus-iOS dynamic from the 2010s. Proprietary AI is iOS — controlled, polished, vertically integrated, expensive. Open-weight AI is Android — flexible, modifiable, fragmented, cheap. iOS held a premium for a decade because Apple's hardware-software integration created lock-in that Android's openness couldn't replicate at the same quality bar.
The analogy works, right up until you notice the part where it breaks completely.
Apple's iOS moat was hardware. You could not run iOS on a Samsung phone. The vertical integration that made the iPhone premium was protected by the literal physical chips in the device. Apple controlled the Photonic Engine, the Neural Engine, the Secure Enclave — and that hardware lock-in is what kept the platform's pricing power intact for fifteen years.
There is no equivalent moat in AI inference.
A Kimi K2.6 model running on an Nvidia H200 in a Singapore data center produces tokens that are functionally identical to a Kimi K2.6 model running on a Huawei Ascend 950PR in Shenzhen, which are functionally identical to a Kimi K2.6 model running on whatever cluster OpenRouter happens to route the request to that day. The "hardware" is fungible. The "operating system" — the model weights — is downloadable. The "app store" — the API gateway — is being commoditized by services like OpenRouter that aggregate dozens of providers behind a single key.
If Apple had been forced to ship iOS as a downloadable ISO that ran on any handset with the right specs, iOS would have looked very different by 2015. That's the position the proprietary AI labs are in today. The thing they're trying to charge a premium for can be replicated by a competitor with $5.6 million in compute, and the resulting model can be served by anyone with a GPU and an API endpoint.
This is why the analogy I actually use now isn't iOS-versus-Android. It's Apple-versus-everyone in the laptop market of the late 2000s. Apple still made beautiful machines. Apple still commanded a premium. But the moment the underlying components — the chips, the displays, the operating systems — became broadly available to other manufacturers, Apple's market share dropped to single digits and stayed there for a decade. Apple survived not because of the hardware, but because of the application ecosystem, the developer tooling, the design language, and the brand story. The hardware became table stakes.
That's where the AI labs are heading. The model is becoming table stakes. The question is what's left after that.
Where I Think the Real Moats Are
Here's the part I've been thinking about for weeks, because it determines what survives the transition.
I see four real moats forming, and only one of them is the thing I'm currently paying for.
The first moat is the application layer. This is Claude Code. This is Codex. This is the integration of the model into a specific workflow with specific tools, specific UX decisions, specific design choices about when to ask for confirmation versus when to act autonomously. When I pay $100 a month for Claude Max, the part I actually can't replicate with a DeepSeek API key isn't the model — it's the eight months of Claude Code workflow refinements Anthropic has iterated on, the agent skills ecosystem, the slash commands, the way the agent harness handles long-running tasks. Anthropic isn't selling tokens. They're selling a coding environment that happens to use tokens. That distinction is going to matter more every month for the next two years.
The second moat is compliance infrastructure. Healthcare, finance, legal, and government workloads care about things that DeepSeek and Kimi can't easily provide — data residency guarantees, audit trails, SOC 2 attestation, constitutional AI safety policies, the kind of paperwork that lets a Fortune 500 procurement team check a box. Anthropic has reportedly won 70% of head-to-head enterprise matchups against OpenAI for first-time AI buyers, and a meaningful slice of that is governance maturity, not raw model quality. This is the moat that scales with regulatory complexity, and it's the one open-weight labs have the hardest time replicating because the regulatory work is fundamentally orthogonal to the model work.
The third moat is the ecosystem. This is the Model Context Protocol. This is the integrations with Slack, Notion, Figma, Canva, GitHub, every database that matters. This is the developer documentation, the SDK quality, the conference presence, the way third-party tools light up around a platform. Apple won the laptop war on ecosystem, not hardware. The AI labs that win the next decade will win on ecosystem, not model intelligence. And ecosystems take years to build, which means the proprietary labs have a real but time-limited head start.
The fourth moat is brand and trust. When I'm building something for a paying client, I default to Claude or GPT not because they're measurably better at the specific task, but because if something goes wrong I can defend the choice. "I used Claude" is a defensible answer in a client meeting. "I used DeepSeek" requires a fifteen-minute explanation about why a Chinese open-weight model is appropriate for their HIPAA workflow. That defensibility is worth real money, and it's a moat that the proprietary labs underinvest in talking about because they take it for granted.
The thing I'm paying $100 a month for, when I'm being honest, is moats one and three. The model is no longer the product. The harness is the product, the integrations are the product, the ecosystem is the product. Everything else can be replicated by an open-weight model at 1/10th the price.
That's a fundamentally different business than the one Anthropic and OpenAI were building in 2024.
What This Means for Anthropic and OpenAI
The frontier labs know this. You can see it in their product strategy if you're paying attention.
Anthropic hit $30 billion in annualized revenue in March 2026, up roughly 1,400% year-over-year. OpenAI is at about $25 billion ARR. Those are extraordinary numbers, but the composition is what matters. A growing share of both companies' revenue is coming from enterprise contracts and platform integrations — the application layer and the compliance layer — not from individual API token sales. Anthropic and OpenAI both launched joint ventures for enterprise AI services in early May. Neither of those ventures is about selling tokens. They're about selling implementations.
The strategic shift is clear: stop competing on raw model intelligence, where the open-weight labs can match you for 1/10th the price, and start competing on the layer above the model where you can charge for outcomes instead of inference. Claude Code isn't priced like a model API. It's priced like a developer tool. Codex isn't priced like a model API. It's priced like a coding subscription. The thing that's getting commoditized is the part that's increasingly bundled rather than sold as a line item.
This is also why the bundling matters. When my $100 a month buys me Sonnet 4.7 and Opus 4.7 access plus Claude Code plus the agent skills marketplace plus MCP integrations plus the desktop app plus voice mode plus a dozen other things, Anthropic isn't charging me for the model. Anthropic is charging me for the bundle, and the model is the part of the bundle that's becoming least defensible. Strip the bundle apart and the model alone is worth maybe $20 a month at current open-weight benchmarks. Strip the bundle apart and the application layer alone is worth $80-$120 a month easily. The bundling isn't accidental. It's the survival strategy.
The risk is what happens when a third party builds a sufficiently good application layer on top of an open-weight model. That's not hypothetical anymore. OpenCode is a credible competitor to Claude Code that runs on multiple model backends. The OpenCode Go subscription gets you four parallel agents and access to V4 Pro, V4 Flash, and several other open-weight models for $5 the first month and $10 a month after that. That's a 90% discount on a stack that does most of what Claude Code does. The application layer moat is real, but it's not infinite. The open-source ecosystem is going to chip away at it the same way it chipped away at the model layer.
This is where it gets interesting for the existential question. If Adobe — to use the example I keep coming back to — wraps a fine-tuned DeepSeek V4 Pro inside Photoshop and ships it as "Adobe Intelligence" with full design system integration and a polished UX, what exactly is Anthropic selling that I can't get from Adobe? What is OpenAI selling that I can't get from a similarly motivated competitor with deep distribution? The model becomes invisible. The application layer is what the customer pays for. And every application company on the planet now has a path to building their own.
What I'm Doing With My Subscription Stack
Let me get specific about what's changing in my own setup, because the strategic picture only matters if it actually changes behavior.
I'm keeping Claude Max for now. The application layer value is real, the design taste in Opus 4.7's output is still genuinely better than anything I can get from open-weight models, and Claude Code's agent skills system does things I can't reproduce elsewhere. But I'm watching the pricing carefully. If Anthropic raises the Max tier or weakens the value, I'll downgrade to Pro and route the heavy work through OpenRouter.
I'm keeping Codex Plus for the same reason. The 25x promotional limit through May 31 makes the $20 plan absurdly good value right now, and GPT-5.5's efficiency in the agentic coding loop is the best in class for the kind of backend work I do. After May 31 the limits drop back to 5x and I'll re-evaluate.
I'm increasing my OpenRouter and DeepSeek spend, deliberately. I want enough operational fluency with the open-weight stack that if the proprietary subscriptions stop making sense, I can switch the bulk of my workload over with a weekend of effort instead of a quarter of migration pain. This is a strategic hedge, not an immediate cost optimization. The cost optimization is a side effect. I covered the free Claude Code proxy approach in detail if you want to set up the same fallback infrastructure.
I'm running OpenCode in parallel for at least one project a month. Not because I'm switching off Claude Code — I'm not — but because the gap between the open-source coding agents and the proprietary ones is closing faster than most people realize, and the day a third-party agent gets within 95% of Claude Code's UX is the day a meaningful chunk of Anthropic's revenue is at risk. I want to know when that day is, and I'd rather know early than late.
I'm not adding any new proprietary subscriptions until I see a moat that justifies it. Gemini Advanced, Cursor Pro, the various enterprise AI tools — none of them have shown me an application layer that's distinct enough from what I already have. Until that changes, the open-weight stack is going to absorb any new workload that doesn't have a specific reason to live on a proprietary platform.
That's the discipline I'm building into my own usage. Subscribe where the application layer creates value I can't get elsewhere. Pay tokens where the model is the only thing that matters. Run open-weight models everywhere I can without sacrificing output quality. And reassess the whole stack every quarter, because the price-per-quality curve is moving fast enough that last quarter's optimal allocation is this quarter's overspend.
What This Means for Solo Devs and Small Teams
If you're a solo developer or running a small team, here's the practical version.
Start with one proprietary subscription, not three. Pick the application layer you're going to live in. For most builders right now that's either Claude Code on the $20 Pro plan or Codex on the $20 Plus plan. You don't need both. Pick the one whose UX matches how you work, commit to it for at least a month, and stop running comparison shops every week.
Add a single open-weight access point as a fallback. OpenRouter is the cleanest entry — one account, one API key, dozens of models, free models for low-stakes work. Spend $20 to load credits and route any workload that's not latency-critical or quality-critical through Kimi K2.6 or DeepSeek V4. You'll be surprised how much of your daily work fits that profile.
Use the savings to pay for tools that compound. The application layer is where the moat is, and that includes tools that aren't AI subscriptions. A good observability platform. A real testing setup. A vector database with proper hybrid search. The leverage you get from those compounds with whatever model you're running, and they don't lose value when the model layer moves underneath you.
Watch for the consolidation. The current pricing is unstable. Within twelve months I expect at least one major proprietary lab to bundle aggressively, at least one major application company to ship a credible vertical AI product on open-weight infrastructure, and at least one open-weight lab to release a model that closes the remaining gap on long-context agentic work. When any of those things happens, the optimal subscription stack will shift, and the only way to know is to be paying close enough attention that you can re-evaluate when the signals come in.
If you're running a team of three to ten people, you have a different calculus. Centralize your model access through a single gateway — OpenRouter or your own routing layer — so you can swap providers without touching application code. Negotiate enterprise pricing with whichever lab gives you the best application-layer value, because the volume discounts on the proprietary side are still meaningful. Keep at least one open-weight model warm in production, even if it's only handling 10% of traffic. The day you need to fall over to it, you don't want to be doing the integration work for the first time.
For larger teams, the answer is increasingly that the model is a procurement decision, not an engineering decision. The engineering work is in the application layer. That's the part that creates differentiation. Whoever's making your AI subscription decisions in 2026 should be the same person making your developer tooling decisions, because the line between the two has effectively disappeared.
The Bigger Picture
I don't think Anthropic or OpenAI are going to disappear. The companies are too well-positioned, the application-layer moats are too real, and the brand premium is too valuable to evaporate quickly. But I do think the business they're running in 2027 is going to look different from the business they're running today.
The traditional AI subscription model — pay us a fixed monthly fee for access to our model, and the model is the product — is under serious pressure. It works right now because the application layer is bundled into the subscription and most users can't easily separate the two. As open-weight models continue to close the capability gap, the bundle is going to come under pressure from both sides: third parties building competing application layers on top of cheap open-weight models, and savvy users routing workloads to whichever provider delivers the best price-to-quality on a given task.
The future I think we're heading toward is hundreds of winners, not two or three. Different application layers for different verticals. Different open-weight models for different cost-sensitivity profiles. Different orchestration tools that route between them based on the task. The frontier labs will still matter — they'll still be training the models the open-weight labs are reverse-engineering, they'll still be selling the most polished application layers, they'll still be commanding premiums in regulated industries. But they'll be one segment of a much wider market, not the whole market.
That's a healthier industry, in my opinion. It's a more competitive industry. It's an industry where the moat that matters is what you build on top of the model, not whether you happen to own the model. And it's an industry where the subscription stack I'm running today — three proprietary plans, three API keys, half a dozen tools — is going to look like an artifact of an earlier era within eighteen months.
I'm paying for proprietary subscriptions today because the application layer is still where the value is, and the proprietary labs still build the best application layers. I'll keep paying as long as that's true. But I'm building the muscle to switch the moment it isn't, because the alternative — riding a subscription stack into obsolescence because changing it felt like too much work — is the most expensive mistake I could make in a market that's moving this fast.
So look at your own stack. Add up what you're paying. Ask yourself which subscriptions are buying you a model and which are buying you an application layer. Cancel the ones that are only buying you a model. Use the savings to pay for the ones that are buying you a workflow you genuinely couldn't build yourself. And run an open-weight model in parallel, even if it's only for one workload, even if it's only for an afternoon a week — because the day the math flips, you want to already know how to live in that world.
That's the bet I'm making. The model is becoming a commodity. The application layer is the product. And the subscription stack you're running on May 6, 2026 is almost certainly not the subscription stack you should be running on May 6, 2027.
FAQ
Frequently Asked Questions
Everything you need to know about this topic
Yes, but for a narrower reason than they were two years ago. The model itself is now a commodity — open-weight options like DeepSeek V4 Pro and Kimi K2.6 match the proprietary frontier on most coding benchmarks at 1/10th the cost. What you're actually paying for at $100/month is the application layer: Claude Code, Codex, the agent skills, the integrations, the polished UX. If the application layer creates value you can't replicate, the subscription is worth it. If it doesn't, route through OpenRouter instead.
The application layer is everything wrapped around a base AI model that turns it into a useful product — the coding agent harness, the workflow integrations, the UX decisions, the safety policies, the developer tooling, the ecosystem of third-party plugins. Claude Code and Codex are application layers built on top of Claude and GPT models. As model intelligence becomes commoditized, the application layer is where the durable moat lives.
On raw token costs, roughly 4-30x depending on the model and workload. Claude Opus 4.7 costs about $4,811 to run the Artificial Analysis Intelligence Index suite versus $1,071 for DeepSeek V4 Pro. A 100M-token-per-month workload runs around $310 on Kimi K2.6 versus $4,000+ on GPT-5.4. The catch is that you're paying for raw tokens — you don't get the application layer (Claude Code, Codex) without building it yourself or using something like OpenCode.
Don't cancel both, but you probably don't need both. Pick whichever application layer matches how you work — Claude Max if you do design-forward and long-form work, Codex Plus if you do backend, ML, and data pipeline work — and route everything else through an open-weight model on OpenRouter. The current 25x Codex promotion through May 31, 2026 makes the $20 Plus plan exceptional value if Codex matches your workflow.
OpenRouter is a single API endpoint that gives you access to 300+ AI models — proprietary and open-weight — with no monthly fee. You add credits and pay per token at close to raw provider pricing. It's the cleanest way to fall back to open-weight models without managing multiple API keys, and the free tier (rate-limited at 20 requests per minute, 200 per day) is enough for low-stakes batch work. I use it as the routing layer behind any workload that doesn't need to live on a proprietary platform.
Let's Work Together
Looking to build AI systems, automate workflows, or scale your tech infrastructure? I'd love to help.
- Fiverr (custom builds & integrations): fiverr.com/s/EgxYmWD
- Portfolio: mejba.me
- Ramlit Limited (enterprise solutions): ramlit.com
- ColorPark (design & branding): colorpark.io
- xCyberSecurity (security services): xcybersecurity.io