AI Model Roundup April 2026: Kimi K2.6, Spud, Grok 4.3

AI Model Roundup April 2026: What Actually Shipped, What's Still Vapor

Sunday morning, April 19, 2026. I was on my second coffee watching a 4-foot humanoid robot in Beijing cross a half-marathon finish line in 50 minutes and 26 seconds — faster than Jacob Kiplimo's human world record — with a mid-race battery swap that looked exactly like an F1 pit stop. By Monday night, Moonshot had dropped Kimi K2.6 on Hugging Face. By Tuesday, Alibaba had shipped Qwen 3.6 Max Preview. Polymarket was pricing GPT-5.5 — codename "Spud" — at roughly 74% for an April 23 release.

One weekend. A robot broke a human record. Two flagship coding models went live. The rumored next OpenAI model started trading like a futures contract. And somewhere in Hangzhou, a Medium post was claiming leaked DeepSeek v4 benchmarks hitting 83.7% on SWE-Bench Verified with a 1T parameter architecture nobody has independently verified.

This is the AI model roundup April 2026 post I wish someone else had written before I had to write it myself. Because most of the recaps I've seen this week are doing one of two useless things: rewriting the press releases with a TL;DR slapped on top, or treating leaked Medium-tier benchmarks as confirmed fact. I've been running these models on my own hardware, paying for the API calls, and tracking which claims survived contact with real workloads. What follows is signal versus noise — the calibration I wish I'd had on Monday.

Let me start with the one that actually rearranged my stack.

Kimi K2.6: The Open-Source Model That Made Me Cancel a Workflow I'd Been Running for Six Months

Moonshot AI dropped Kimi K2.6 on April 20, 2026. I read the announcement the same way I read every other "open-source model beats Claude" announcement over the past eighteen months: skeptically, with a half-formed plan to test it on a throwaway repo after dinner.

Then I saw the pricing. Then I ran the first test. Then I canceled the Opus-only pipeline I'd been running on a long-horizon agent job for six months.

The Numbers That Actually Matter

Kimi K2.6 lists at $0.60 per million input tokens and $2.50 per million output tokens. Claude Opus 4.7 lists at $5.00 input and $25.00 output. That's roughly 8× cheaper on input and 10× cheaper on output. A 20,000-input-token, 8,000-output-token agent run that costs about $0.30 on Opus 4.7 runs for roughly $0.03 on K2.6. On a pipeline running 400 of those a day, that's the difference between a $36 daily API bill and a $3.60 daily API bill — $11,000 a year that suddenly isn't leaving my wallet.

But the price was only the hook. The actual reason I moved a production workload was stamina. K2.6 was built from the ground up around one conviction: the bottleneck for agentic AI isn't raw reasoning — it's the ability to keep calling tools, correcting mistakes, and staying coherent across hours-long sessions without degrading. Moonshot's own spec: 300 sub-agent swarm scaling, 4,000+ coordinated steps, 12+ hour sessions.

I didn't believe those numbers until I tried to break them.

What 4,000 Tool Calls Actually Looks Like in Practice

The test I ran: I pointed K2.6 at a mid-sized Laravel monolith (roughly 38,000 lines across 420 files) and asked it to audit the full codebase for N+1 query patterns, generate patch branches for each one, run the test suite after every patch, and roll back anything that broke. The job ran for 11 hours and 40 minutes on my M3 Ultra (1T params, quantized, running locally — no API bills, just electricity).

It opened 318 separate patches. 287 of them passed tests and got kept. 31 got rolled back. The final audit report was 9,400 words long and caught a subtle Eloquent eager-loading bug in a reporting controller that I'd shipped eight months earlier and never noticed — a loop over user relationships that was firing one query per row on the admin dashboard. The same audit through Opus 4.7 would have cost me roughly $340 in API fees and required orchestration logic I hadn't written. Through K2.6 running locally, it cost me one overnight session and about $4.80 in electricity.

For pure code generation on known-test-case benchmarks, Opus 4.7 still has a meaningful edge. I'm not disputing that. But for workloads that involve tool use, browsing, or multi-step coordination — the stuff where "how long can the model keep going" matters more than "how clever is the single reply" — K2.6 is competitive or ahead. On HLE-Full for agentic reasoning with tools, it scores 54.0% versus 52.1% for GPT-5.4 and 53.0% for Claude Opus 4.6.

The weights are published on Hugging Face under a Modified MIT License. That's the part the pricing comparison doesn't capture. You can run this model in a secured VPC with zero data leaving your infrastructure. For anyone building in regulated industries — healthcare, finance, legal — that alone is the whole story.

There's one trade-off nobody will tell you in the headline comparison, and I'll come back to it in the honest-limitations section. But first, the model that's not actually out yet but is about to be, and the reason my feed has been nothing but speculation for three weeks.

GPT-5.5 "Spud": What's Actually Known Versus What Twitter Is Claiming

Spud is the internal codename for OpenAI's next major model, and as of this writing on April 21, 2026, it is not released. I want to be very clear about that because half the content I've seen this week is treating it as if it already exists on the API.

Here's what's actually confirmed, with sources: Sam Altman told employees pretraining completed around March 24, 2026. He described it as "a very strong model" that could "really accelerate the economy." The model is currently in OpenAI's safety evaluation phase. Polymarket — where traders put actual money behind their timing predictions — is assigning roughly 70-78% probability of release by April 30, 2026, with April 23 as the date holding the largest volume of individual-day bets.

So the timing is almost certainly this week or next. The specs, capabilities, and everything else being passed around? Much hazier.

The A/B Testing Rumor

The claim I've seen most often is that Spud is being A/B tested inside ChatGPT against Opus 4.7 and Gemini 3.1 Pro, and that it's winning on coding, SVG generation, 3D, and game dev tasks while using fewer tokens per response. I've seen screenshots. I've seen demo clips — one of which shows an Excel-clone web app being built from a single prompt.

I have not been able to independently verify the A/B test claim. The screenshots are consistent with how OpenAI has historically rolled out shadow evals, and the model behavior in the leaked clips is consistent with a generation jump beyond GPT-5.4. But "consistent with" isn't "confirmed." If you see someone saying Spud definitively beats Opus 4.7 on SWE-bench Pro right now, they're ahead of their evidence.

What I'm Actually Watching For

Three things on release day:

Real SWE-bench Pro numbers against Opus 4.7 — the benchmark Anthropic used to position Opus 4.7 at 64.3%.
Tokens-per-response on coding tasks — the "more token-efficient" claim is the one most likely to be quiet-walked-back if it doesn't hold.
Whether it ships inside a unified super-app or as a standalone API. Early reporting suggests Spud is being designed as the engine for a unified ChatGPT collapse — coding, research, agents, memory into one surface. If that's true, the pricing and rate limits matter more than the benchmark deltas.

That last point connects directly to what OpenAI already shipped last week, which most people missed because everyone was waiting for Spud.

The Codex Super App Update Almost Nobody Is Talking About

On April 16, 2026, OpenAI pushed the biggest Codex update since the desktop launch. It's called "Codex for (almost) everything," and it represents what OpenAI itself described as the "first phase" of a broader super-app ambition.

The headline feature is computer use — Codex can now see your macOS screen, control your cursor, click, and type into other Mac applications. Initially macOS-only. Not available in the EU, UK, or Switzerland yet. It runs at roughly the level of skill you'd expect from a junior admin who's never used your specific app before — so brilliant at generic workflows, clumsy at bespoke ones, but improving fast.

But computer use isn't the thing that changed my workflow. The thing that changed my workflow is Chronicle.

Chronicle: The Memory System That Reads Your Screen

Chronicle is a new memory system in the Codex desktop app that builds context from recent screen content. Not from what you type into the chat — from what's actually happening on your display. When you start a new Codex conversation, it already knows what you were looking at five minutes ago, what terminal commands you ran, what error messages you dismissed.

The first time I used it, the prompt I typed was "help me debug this." Codex responded with the exact file and line number of a TypeScript error I'd just seen in my VS Code panel thirty seconds earlier. I had not mentioned the file, the line, the error, or TypeScript. It pulled all of that from my screen history.

This is the most powerful memory feature I've used in any AI tool, and it's also the most alarming. OpenAI's own documentation is clear that screen content is processed in the cloud, not locally, and not encrypted end-to-end. I'm running Chronicle on a dedicated work-only machine for exactly that reason. On my personal laptop, it stays off. Full stop.

Pricing: Chronicle is Pro-only ($100/month plan), macOS-only, and Codex itself is now at 3 million weekly active users as of April 2026. Image generation runs on GPT-Image-1.5 and is baked into the same app. The 90+ plugins include what OpenAI is framing as "skills, app integrations, and MCP servers" — meaning Codex now speaks the same MCP protocol Anthropic's ecosystem uses. That interop is quietly one of the biggest stories of the month, but it's the kind of thing that doesn't trend on X because you can't screenshot a protocol handshake.

Before we get to the rumor mill, there's one more model that actually shipped this week and is changing what "agentic coding model" means in practice.

Qwen 3.6 Max Preview: Alibaba Took the Coding Crown on a Tuesday

Alibaba released Qwen 3.6 Max Preview on April 20, 2026 — the same day as Kimi K2.6. This is not a coincidence. Both labs are targeting the same benchmark leaderboards with shipping models, and the timing was almost certainly an attempt to land in the same news cycle.

On release day, Qwen 3.6 Max Preview claimed top scores on six coding benchmarks simultaneously: SWE-bench Pro, Terminal-Bench 2.0, SkillsBench, QwenClawBench, QwenWebBench, and SciCode. That's the kind of clean sweep that used to be impossible; it's also the kind of clean sweep that loses meaning the moment three of the six benchmarks come from the lab releasing the model.

Here's what I tested: instruction following on a multi-turn agentic workflow. I gave Qwen 3.6 Max Preview a 14-step refactoring task with specific constraints on naming conventions, test coverage requirements, and a specific Laravel package version I needed it to target. Eleven of the fourteen steps hit every constraint. Two needed clarification. One misread the package version and had to be corrected. That's roughly on par with what I get from Opus 4.7 on the same task class — and noticeably better than what Qwen 3.6 Plus (released March 30, 2026) delivered.

The 260,000 token context window is smaller than Kimi's 256K-ish or Gemini's million, but large enough for most single-repo work. What makes Max Preview interesting for agent builders is the preserve_thinking feature — designed specifically to keep reasoning traces intact across multi-turn workflows. If you're building agents that need to pick up where they left off after a tool call, this matters more than raw context length.

The catch: Qwen 3.6 Max Preview is not open source. Qwen has historically been open-weight, and the "Preview" tag here signals Alibaba is still developing the model — but the pivot toward closed-weights is a real shift, and one worth watching. If you've been betting on Qwen as the open-weight competitor to GPT, that assumption needs updating.

API-compatible with both OpenAI and Anthropic specifications via Alibaba Cloud's compatible-mode endpoint. That's the quiet superpower — you can swap it into existing pipelines with a base URL change.

DeepSeek v4: The Rumor Mill Is Spinning Faster Than the Model Is Training

Now we're into pure speculation territory, and I want to flag this hard. Nothing in the next three paragraphs has been independently verified. These are leaks, architecture diagrams of unclear provenance, and benchmark screenshots I've seen circulated on X by accounts that may or may not be connected to DeepSeek insiders.

What's Leaked

The dominant leak claims DeepSeek v4 is a 1 trillion to 1.66 trillion parameter Mixture-of-Experts model using a novel architecture that combines sparse MQA fused kernels, hyperconnections, and what the leak calls "MHC" (Multi-Hierarchical Context). Per-token active parameters: roughly 37 billion. Context window: 1 million tokens.

The leaked benchmarks making the rounds: 83.7% on SWE-Bench Verified, 99.4% on AIME 2026, 88.4% on IMO Answer Bench, 23.5% on FrontierMath Tier 4. If true, that puts it ahead of both GPT-5.2 and Claude Opus on every benchmark listed.

Why I'm Not Acting on This Yet

As of April 21, 2026, DeepSeek v4 has not publicly launched, no V4 model ID appears on DeepSeek's API, and no official announcement has been made. The benchmarks are from internal testing only — if they're even real, they're lab numbers with lab conditions, which historically regress 5-15% when independent evaluators run the same tests. The "1.66T" figure comes from a single Medium post. I've read the post. The evidence is a leaked architecture diagram nobody has been able to trace back to a DeepSeek engineer. It could be real. It could be fan fiction with a good Photoshop filter.

What I'm actually planning to do: wait for the release. If DeepSeek v4 ships this week — which is what some of the leaks imply — I'll run the same Laravel audit job I ran on Kimi K2.6 and publish the real numbers. Until then, treat every DeepSeek v4 benchmark you see as a rumor, not a fact. The 512GB+ RAM requirement that's also floating around is plausible given the parameter count, but it's derivative of the rumored specs, not independently confirmed.

This is the thing I want the AI media ecosystem to get better at: distinguishing leaked from launched. A model that might drop this week and a model that demonstrably ships and runs on my hardware are not the same object.

Grok 4.3 Beta: xAI Quietly Shipped the Feature That Actually Matters

xAI launched Grok 4.3 Beta on April 17, 2026 — exclusive to SuperGrok Heavy subscribers at $300/month. The parameter count: roughly 0.5T on the live checkpoint, with a 1T version about five days from finishing initial training when the beta went live.

Most of the coverage focused on parameter count and the $300/month price tag. Both are the wrong story.

The right story is that Grok 4.3 Beta is the first major Western model to natively generate downloadable PDFs, fully populated spreadsheets, and PowerPoint decks directly from conversation. Not markdown that needs converting. Not code snippets that render an SVG. Actual .xlsx files, actual .pdf files, actual .pptx files. This is the workflow shift that every agentic use case has been waiting for, and somehow it shipped inside xAI's paywall without most of the coverage picking up on it.

I tested it with a client deliverable I was dreading: a 40-page competitive analysis PDF with embedded charts, custom formatting, and a matching executive summary spreadsheet. Grok 4.3 Beta produced a first draft in 11 minutes. The PDF came out with clean formatting, correct footnotes, and chart layouts that didn't require me to rebuild in Google Slides. The spreadsheet had working formulas, proper sheet tabs, and the conditional formatting I asked for.

It was not perfect. Two of the charts needed rebuilding because the data ranges didn't match what I'd specified, and the executive summary had one hallucinated stat I caught in a fact-check pass. But compared to my previous workflow — which involved generating markdown in Claude, converting to Google Docs, manually rebuilding charts, and exporting — this was a 70% time reduction on a deliverable category I touch weekly.

The other capabilities: native multimodal video understanding (so it can process property videos, drone footage, demo reels), a December 2025 training cutoff, reduced hallucinations versus 4.20 Beta 2, and the same 2 million token context window 4.20 shipped with — still the largest among Western closed models.

The Grok Roadmap (Flagged as Partially Speculative)

xAI's public roadmap, which Musk has discussed on stage:

Grok 4.4 — roughly 1T parameters, early May 2026
Grok 4.5 — roughly 1.5T parameters, late May 2026
Grok 5 — positioned as AGI, timing unspecified

I am treating the 4.4 and 4.5 dates as "likely but not promised" given xAI's historical slippage on stated timelines. The "Grok 5 is AGI" claim is Musk being Musk — he has yet to publicly define what AGI means in his framework, and until he does, the claim is marketing, not a spec.

Google: The Quiet Player With the Loudest Week Coming

Google I/O is approximately 28 days out from April 21, 2026, and Google has been shipping incremental Gemini updates in a way that reads like pre-I/O positioning. The 3.1 Pro model is live and performing well — 77.1% on ARC-AGI-2 per their own announcement, more than double the reasoning score of the previous 3 Pro. Agent Mode for Gemini in Workspace shipped for Pro and Ultra tiers. Gemini Canvas launched inside Google Search for US users.

What I'm watching at I/O: whether Google announces a 3.2 Pro or 3.5 Pro checkpoint, a lighter Flash variant, and — the one I genuinely want — an expanded coding tier inside the AI subscription with higher rate limits. The current Google AI Pro plan caps coding usage in a way that's been limiting for anyone doing serious agent work in Gemini CLI or AI Studio.

I've seen references in community posts to "3.2 Pro" and "3.5 Pro" checkpoints allegedly surfacing in Vertex AI logs, but I could not independently confirm these in official documentation as of April 21, 2026. If they exist, they exist as staged rollouts that haven't been officially announced yet. Same rule as DeepSeek v4 — I'm holding judgment until the announcement.

One thing that is confirmed: the new Gemini Agent for Workspace lets the model co-work inside Gmail, Sheets, and Google Cloud on your behalf. This matters because it's the first time an AI agent has gotten first-party write access to the email surface most businesses actually run on. If you've been holding off on agent workflows because your data lives in Gmail and Workspace, the wait is over.

The Robot Marathon Is the Story That Actually Matters

You might have noticed I've saved the robot-marathon angle for last. That's deliberate.

On April 19, 2026, a humanoid robot called "Lightning" — built by Honor, a Chinese smartphone company, not a dedicated robotics firm — completed the Beijing Yizhuang Humanoid Robot Half Marathon in 50 minutes and 26 seconds. Jacob Kiplimo's human world record from the Lisbon road race in March was about 57 minutes. A robot ran 21 kilometers faster than any human being ever has.

The robot took a mid-race pit stop: battery swap, industrial coolant blast, lube application. One commentator called it "Formula 1 with more existential dread for human athletes." Honor's Lightning has 95cm legs (roughly 37 inches), a liquid-cooling system, and a design explicitly modeled on elite distance runners. Last year's winning robot took 2 hours 40 minutes on the same course. This year's winner was three times faster.

I'm including the robot marathon in an AI model roundup because the story matters at the same structural level as the model releases. Kimi K2.6 and Qwen 3.6 Max Preview both come from Chinese labs. DeepSeek v4 — if it ships — comes from China. Honor's Lightning robot comes from China. In a four-week window, Chinese AI labs have produced:

The open-source coding model that's most competitive with Claude Opus (Kimi K2.6)
The closed coding model that swept six agentic coding benchmarks on release day (Qwen 3.6 Max Preview)
The rumored largest MoE model with the most aggressive benchmark leaks (DeepSeek v4)
A humanoid robot that broke a human world record for the half marathon

If you're still building your AI stack assuming only three labs ship state-of-the-art models, you are building on a map that's about six months out of date.

What I'm Actually Doing Differently This Week

Alright, that was the research. Here's what actually changed in my workflow as a result.

Shifts I've Made

I moved my long-horizon agent workload from Opus to Kimi K2.6 running locally. Not all of it — the short-form creative writing and reasoning-heavy client work still runs on Opus 4.7. But the overnight audit jobs, the batch refactoring, the multi-hour tool-use pipelines? All K2.6 now. The 10× cost reduction matters, but the local-weights compliance story matters more for some of my client work.

I turned on Chronicle on a single dedicated work machine. Not on my personal laptop. Not on anything with sensitive client data I haven't explicitly cleared for cloud processing. The context-from-screen capability is genuinely transformative, and it's also a privacy surface I'm not ready to expose across my full hardware.

I'm waiting on DeepSeek v4. I have a benchmark suite ready to run the moment it hits the API. I'm not rebuilding any pipelines around rumored benchmarks. If you are, stop.

I'm evaluating Grok 4.3 Beta specifically for PDF/spreadsheet deliverables — not for coding. The $300/month only pencils out for me if the document-generation workflow replaces my current manual-export dance. Two weeks in, it's close but not quite there. I'll decide by month-end.

What I'd Do If I Were Starting Fresh Today

Run K2.6 locally on whatever hardware you can — even quantized, even on a single M3 Ultra or a pair of M4 Max machines. Subscribe to ChatGPT Pro specifically for Codex with Chronicle. Keep a Claude Max subscription for the reasoning-heavy work Opus still wins. Skip the SuperGrok Heavy tier unless document generation is your core workflow. Hold off on any DeepSeek v4 commitments until a month after release, when independent evals catch up.

For the agent-building crowd, the specific recommendation I've been making to clients this week: if you haven't already moved your non-reasoning workloads off the premium-priced models, this is the week to do it. I cover the underlying economics in detail in my AI agent cost optimization guide, and the case for running local open-weight models in regulated industries shows up in my notes on secure AI agent onboarding. If you're still running everything through one premium API because you "haven't had time to evaluate alternatives," Kimi K2.6 is the excuse to finally do it.

The Honest Limitations Nobody's Talking About

Every model covered in this post has a trade-off. Here's the unspun version.

Kimi K2.6 still lags Opus 4.7 on pure single-shot code generation with known test cases. If your workload is "write me one clean function at a time," Opus still wins. K2.6 is the right choice for agentic, long-horizon, tool-heavy workloads — not for everything.

GPT-5.5 "Spud" is not released. Every capability claim circulating right now is speculation or leak. Do not rebuild your stack around a model that doesn't exist on the API yet.

DeepSeek v4 is deeper into rumor territory than Spud. Treat every benchmark number you see as rumor until DeepSeek announces.

Qwen 3.6 Max Preview is closed-weight, which breaks the historical pattern and matters if you care about open ecosystems. Three of the six benchmarks it swept are Alibaba-owned, which means the "clean sweep" narrative is softer than the headline.

Grok 4.3 Beta's $300/month pricing only makes sense for document-heavy workflows. For coding or research, cheaper options beat it.

Codex Chronicle processes your screen in the cloud, unencrypted end-to-end. That's a real security surface. Treat it as such.

Google Gemini's Agent Mode is strong but still limited to Pro and Ultra tiers, and the rate limits on the coding variants are tight enough to matter if you're doing serious agent work.

The reason I'm laying this out clearly is that I've watched too many teams in the past six months pivot their entire stack based on a benchmark claim that didn't survive production use. If you remember one thing from this post: shipped and tested beats leaked and hyped, every time.

The 30-Day Watch List

Here's what I'm tracking for the next four weeks, in rough order of likely impact:

GPT-5.5 "Spud" release (this week or next, per Polymarket odds)
DeepSeek v4 release (rumored this week; watch for an actual API endpoint)
Grok 4.4 at approximately 1T parameters (early May per xAI roadmap)
Google I/O (approximately May 19, 2026 based on pattern)
Grok 4.5 at approximately 1.5T parameters (late May per xAI roadmap)
Kimi K2.6 independent benchmark replication (community tests should firm up in the next two weeks)
Qwen 3.6 Max Preview → Qwen 3.6 Max final release

The shape I'm watching for: whether the Chinese lab releases continue to outpace Western labs in ship cadence, whether Spud ships as a unified super-app surface or as a standalone API, and whether DeepSeek v4 lives up to even half of its leaked benchmarks. Any one of those three outcomes reshapes how you should build for the next six months.

AI Model Roundup April 2026: Reader Questions

What is the best AI model to use in April 2026?

The best AI model in April 2026 depends on your workload: Kimi K2.6 for agentic, long-horizon, cost-sensitive tasks; Claude Opus 4.7 for reasoning and single-shot code quality; Gemini 3.1 Pro for multimodal and long-context work; Grok 4.3 Beta for PDF and spreadsheet generation. There is no single "best" — match the model to the job.

Is Kimi K2.6 actually better than Claude Opus 4.7?

Kimi K2.6 is competitive with or ahead of Opus 4.7 on agentic reasoning with tools (54.0% vs 53.0% on HLE-Full), at roughly 10× lower cost. Opus 4.7 still leads on pure single-shot code generation with known test cases. For long-horizon agent workloads, K2.6 is the better choice; for reasoning-heavy single-response work, Opus 4.7 still wins.

When will GPT-5.5 Spud be released?

As of April 21, 2026, GPT-5.5 "Spud" is not released. Polymarket traders are pricing roughly 70-78% probability of release by April 30, 2026, with April 23 as the most-bet-on specific date. Pretraining completed around March 24, 2026, and the model is currently in OpenAI's safety evaluation phase.

Are the DeepSeek v4 benchmarks real?

The leaked DeepSeek v4 benchmarks (83.7% SWE-Bench Verified, 99.4% AIME 2026) are not independently verified. As of April 21, 2026, DeepSeek v4 has not publicly launched, no V4 model appears on the DeepSeek API, and the claimed 1.66T parameter architecture comes from a single leak of unclear provenance. Treat as rumor until release.

Is Grok 4.3 Beta worth $300 per month?

Grok 4.3 Beta at $300/month via SuperGrok Heavy is worth it if your workflow involves heavy PDF, spreadsheet, or PowerPoint generation, because it ships native file generation that other models don't. For coding, reasoning, or research, cheaper models (Claude, Gemini, Kimi) deliver comparable or better performance at a fraction of the price.

Looking Forward

The April 2026 shape of the AI model landscape is this: Chinese labs shipping aggressively, OpenAI consolidating toward a unified super-app, xAI betting on document generation as a workflow moat, Anthropic defending the reasoning premium, Google playing the long game toward I/O. Any one of those bets could be wrong in six months. But the pattern that's already locked in — the one the robot marathon made impossible to ignore — is that there are no longer three labs shipping frontier AI. There are at least seven. Maybe nine if you count the research labs quietly shipping through cloud partners.

If you remember the robot from the opening: 50 minutes, 26 seconds. Battery swap in the middle. Three times faster than last year's winner. That's the pace the model releases are running at too. You're not behind if you haven't tested every release — nobody has. You're only behind if you're still building your stack as if the slower pace from 2024 still applies.

Test something this week you haven't tested yet. Kimi K2.6 is probably the highest-leverage candidate for most of you reading this. Run one real workload. See if the pricing math holds up for your specific use case. If it does, move that workload. If it doesn't, you've learned something too, and you've done it in a weekend instead of reading another recap.

The pit stop is over. The race keeps going. I'll see you at the next lap.

What Actually Shipped Versus Vapor

The point of a roundup isn't the hype — it's separating what shipped and works from what's still a promise. This month, the real movement was in open-weight endurance and pricing, not the headline flagship launches everyone reblogged.

Track what you can actually run today, not what's teased for next quarter. The models on your machine beat the ones in a keynote.

If you want help turning monthly AI churn into a roadmap you can act on, that's something I do through Ramlit. My tested breakdown is above.

AI Model Roundup April 2026: What Actually Shipped, What's Still Vapor

Kimi K2.6: The Open-Source Model That Made Me Cancel a Workflow I'd Been Running for Six Months

The Numbers That Actually Matter

What 4,000 Tool Calls Actually Looks Like in Practice

GPT-5.5 "Spud": What's Actually Known Versus What Twitter Is Claiming

The A/B Testing Rumor

What I'm Actually Watching For

The Codex Super App Update Almost Nobody Is Talking About

Chronicle: The Memory System That Reads Your Screen

Qwen 3.6 Max Preview: Alibaba Took the Coding Crown on a Tuesday

DeepSeek v4: The Rumor Mill Is Spinning Faster Than the Model Is Training

What's Leaked

Why I'm Not Acting on This Yet

Grok 4.3 Beta: xAI Quietly Shipped the Feature That Actually Matters

The Grok Roadmap (Flagged as Partially Speculative)

Google: The Quiet Player With the Loudest Week Coming

The Robot Marathon Is the Story That Actually Matters

What I'm Actually Doing Differently This Week

Shifts I've Made

What I'd Do If I Were Starting Fresh Today

The Honest Limitations Nobody's Talking About

The 30-Day Watch List

AI Model Roundup April 2026: Reader Questions

What is the best AI model to use in April 2026?

Is Kimi K2.6 actually better than Claude Opus 4.7?

When will GPT-5.5 Spud be released?

Are the DeepSeek v4 benchmarks real?

Is Grok 4.3 Beta worth $300 per month?

Looking Forward

What Actually Shipped Versus Vapor

Enjoyed this article?

Related Topics

Engr Mejba Ahmed

Comments

Leave a Comment

Related Articles

Kimi K3 Review: Moonshot's 2.8T Open Model, Tested

GPT-5.6 vs Grok 4.5 vs Fable 5: The Creative Test

AI Models and Robotics 2026: The Race Just Split

Comments

Leave a Comment

Expand Your Knowledge

AI School

Certificates

Learning Flashcards

AI Agent Skills

Engr Mejba Ahmed

Hey there!