The model that wasn't there
Late March. Somebody on ChatGPT Plus opens their model picker and spots a name that isn't supposed to be there. The screenshot that started circulating on X and r/singularity labeled it "Crest Pro Alpha" — or so the poster said. No documentation. No help-center article. No changelog entry. Within a week the leak-hunting subreddits had converged on a single four-letter codename being passed around like a password at a speakeasy: Spud.
Spud, if you believe the leaks, is GPT-5.5. Possibly GPT-6 — even that is unsettled. It is the frontier model OpenAI finished pretraining on March 24, the one Sam Altman described internally as "very strong" and "unfolding faster than anticipated." The one Polymarket currently gives around a 78% chance of releasing before April is out.
As of today, it still hasn't shipped.
What "Spud" actually is
Here's what is load-bearing in all this and what isn't.
What's reported: OpenAI finished pretraining a new frontier model around March 24, 2026. Altman told staff "a few weeks." The Information attached the codename Spud. Prediction markets currently price a release by April 30 at roughly 78%, and by June 30 at over 95%.
What's not confirmed: whether Spud ships as GPT-5.5, GPT-6, or something else entirely. OpenAI has reportedly said the version number will depend on how significant the capability jump over GPT-5.4 turns out to be — which is either a tell that the company itself hasn't decided, or more honestly, that it is still watching internal evals to figure out what number justifies the marketing lift.
What's outright speculation: almost everything about the "Crest Pro Alpha" label floating around on screenshots. No OpenAI help article mentions it. It could be a routed-to-Spud alpha. It could be a tuned variant of GPT-5.4 Pro that gets rolled back next Thursday. It could be three different people seeing three different anonymous checkpoints and assuming they were the same thing. Anonymous checkpoints in ChatGPT are routine at this point. Most of them never become named releases.
That's the unsexy truth buried under every "I FOUND GPT-6" YouTube thumbnail: the signal is real, but it's noisier than anyone wants to admit.
The Friends apartment problem
The demos are the fun part. They're also where skepticism earns its keep.
The clip that moved the hype needle most wasn't the Minecraft clone, or the Three.js solar system with the asteroid belt, or the flight sim buzzing over Manhattan. It was a Three.js render of Monica's apartment from Friends. Purple walls. The big window. The yellow frame around the peephole. The weird spatial logic of a sitcom set that's bigger on the inside than any Manhattan apartment has any right to be. Generated in one shot from a text prompt.
That demo is genuinely interesting, and not for the reason it went viral. It's interesting because the model wasn't just emitting geometry — it was reasoning about a spatial layout it has only ever seen flattened into 2D, across 25 years of sitcom footage. Translating that memory into Three.js means mapping a pop-culture fragment onto a 3D coordinate system. That's a real capability jump if it's reliable.
The operative word is if. Here's the thing about one-shot demos: they tell you the ceiling, not the reliability. The Friends apartment is the version the poster liked. It is not the nineteen versions before it where the couch clipped through the wall, or the eight versions after where the lighting was unusable. A frontier model's ceiling matters less than the tenth percentile of its attempts — the one you'd have to ship to users. And nobody posts the tenth percentile.
The Minecraft clone with infinite terrain and breakable blocks. The solar system with properly scaled orbits. The Windows 11 desktop with functional SVG icons in Edge, Notepad, and Paint. Same caveat every time. Impressive ceilings. Unknown floors. AI Twitter has been doing the "I built [complex thing] in one prompt" genre since GPT-4 shipped. The delta is real. The viral format is structurally incapable of measuring it.
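The ceiling-versus-floor gap is easy to make concrete. Here is a minimal Python sketch using invented scores (nothing below comes from real Spud outputs) showing how far a viral best-of run can sit from the tenth percentile a shipping product lives at:

```python
import statistics

# Hypothetical quality scores (0-100) for 20 one-shot attempts at the
# same prompt. The viral screenshot shows max(scores); a user routed to
# the model once experiences something much closer to the floor.
scores = [91, 34, 78, 12, 88, 55, 61, 9, 72, 83,
          40, 95, 27, 66, 74, 18, 59, 81, 49, 70]

ceiling = max(scores)
# statistics.quantiles with n=10 returns nine cut points; the first
# one is the 10th percentile.
floor_p10 = statistics.quantiles(scores, n=10)[0]

print(f"ceiling: {ceiling}, 10th percentile: {floor_p10}")
# The gap between those two numbers is what one-shot demos hide.
```

Nobody posts the tenth percentile, but it is trivial to compute once you run the same prompt twenty times.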
The anonymous checkpoint economy
OpenAI ships this way for a reason, and it's worth sitting with.
Anonymous checkpoints in ChatGPT — "Crest Pro Alpha" today, gpt2-chatbot in the LMSYS arena last year, quasar-alpha on OpenRouter a few months ago — are structured A/B tests. A small slice of users gets silently routed to a new model. Their conversations provide real-world eval data that no synthetic benchmark can match. The checkpoint either graduates to a named release, gets quietly rolled back, or gets folded into something else.
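Mechanically, silent routing like this is ordinary A/B infrastructure. The sketch below is an illustration of the general technique, not OpenAI's actual implementation (the model names and the 2% share are assumptions): hash the user ID so each user lands deterministically in either the stable model or the checkpoint.

```python
import hashlib

def route_model(user_id: str, checkpoint_share: float = 0.02) -> str:
    """Deterministic bucketing: the same user always gets the same
    model, and roughly checkpoint_share of all users get silently
    routed to the anonymous checkpoint. Illustrative only."""
    digest = hashlib.sha256(user_id.encode()).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "crest-pro-alpha" if bucket < checkpoint_share else "gpt-5.4"

# A given user is routed the same way on every request, which is what
# makes the resulting conversations usable as real-world eval data.
assert route_model("user-123") == route_model("user-123")
```

Deterministic bucketing is also what makes rollback clean: flip the same slice of users back and nobody's history straddles two models.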
For OpenAI, this is infrastructure. For the leak-hunting subculture that has built up around it, it's a sport. There are now subreddits, Discords, and X accounts whose entire output is screenshot-grepping for unfamiliar model names in the ChatGPT dropdown. The first person to spot each new checkpoint gets the engagement. The post-hoc "this is secretly GPT-6" speculation gets the clicks.
This is the part of the story that actually matters, more than any single demo: OpenAI has trained an ecosystem to generate hype on its behalf, for free, between official releases. The Spud policy paper and the $122B funding round got their own news cycles. The Sora shutdown got its own. But the underlying model story gets a rolling cycle, distributed across every dropdown screenshot and every user silently routed to an anonymous checkpoint. It's the most efficient marketing pipeline any AI company has built, and it works because most of it isn't coordinated at all.
It also means that by the time real benchmarks arrive, the narrative is already set. Spud is "better than Opus 4.7" in the collective imagination before anyone has run SWE-bench against it.
About that Opus 4.7 claim
The viral framing around the Spud demos is that it beats Claude Opus 4.7 at agentic and front-end tasks. The claim is repeated so confidently in YouTube titles and Reddit threads that it's starting to calcify into conventional wisdom.
It's also, right now, a vibe.
Opus 4.7 shipped on April 16 with numbers attached: SWE-bench Verified, WebDev Arena, the full battery. You can look them up. Spud hasn't shipped. There is no pricing, no published context window size, no model card, no independent eval. The claim that it's "noticeably better than Opus 4.7 at times" is sourced entirely to cherry-picked one-shot generations posted by enthusiasts who did not run the same prompt on both models and post both outputs.
That doesn't mean the claim is wrong. Spud could, in fact, clear Opus 4.7 on some meaningful axis. The GPT-5.4 → GPT-5.5 delta is reportedly the biggest capability jump OpenAI has shipped in over a year. But could and does aren't the same word, and the honest answer right now is: nobody running SWE-bench, OSWorld, or WebDev Arena on Spud has published their numbers, because those people don't have the model yet.
When they do, and when OpenAI publishes a model card, that's the comparison worth having. Until then, the Opus-vs-Spud debate on AI Twitter is two sets of cherry-picked screenshots arguing past each other.
What to actually watch for when Spud drops
The demos are not what you want to be paying attention to when this thing ships. Here's what is.
Pricing. GPT-5.4 is already at a price point where running evals, long-context agents, and anything with heavy context-window usage requires budgeting. If Spud moves that meaningfully — in either direction — it changes which products are economical to build on top of it. The Sora shutdown happened because the unit economics didn't work; the next model's pricing will tell you whether OpenAI has internalized that lesson.
Context window and output limits. GPT-5.4 already has a 1M token context. Does Spud extend it, or is the progress somewhere else? Larger context changes what's possible for long-running agents and large-codebase refactors. See the GPT-5.4 Computer Use breakdown for the shape of what's already possible at current limits.
Tool-use and agentic reliability. The interesting question isn't "can Spud generate a Minecraft clone in one shot." It's "can it run a 40-step agentic workflow without drifting." That's measured by SWE-bench Verified, Terminal-Bench, and OSWorld — not by Three.js demos.
Latency and tokens per second. Early framings describe Spud as "token-efficient, higher tokens/sec, faster inference." If that holds, it matters a lot for real-time applications. Wait for independent measurement before believing the framing.
The model card. Whatever OpenAI publishes on the day of release is the honest version of what the model is. Read it before reading any takes — including this one.
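On the agentic-reliability point specifically, per-step error compounds fast, which is why a 40-step workflow is a far harder bar than any one-shot demo. A back-of-envelope sketch (the per-step success rates are hypothetical):

```python
# If each step of a workflow succeeds independently with probability p,
# the chance the whole n-step run finishes is p ** n. Independence is a
# simplification (real agent failures correlate), but the compounding
# intuition holds.
def workflow_success(p: float, steps: int = 40) -> float:
    return p ** steps

for p in (0.90, 0.99, 0.999):
    print(f"per-step {p:.1%} -> 40-step run {workflow_success(p):.1%}")
```

A model that looks 90% reliable step by step finishes a 40-step run well under 2% of the time; a usable agent needs per-step reliability past 99.9%. That is the delta SWE-bench-style harnesses measure and Three.js demos cannot.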
The dropdown will keep surprising you
Spud will ship. Possibly on a Tuesday. Possibly on a Thursday. Possibly, based on the prediction markets, before April is out. When it does, the first wave of coverage will be wrong about something important. It always is.
The leak economy around OpenAI has become good enough that by the time the press release drops, the narrative is already baked. "It beats Opus." "It's a step toward GPT-6." "It'll reshape agentic AI." Some of those might turn out to be true. None of them are known right now. The gap between a frontier model's ceiling and its floor is months of real-world eval, not a viral weekend.
For developers, the useful posture is the one that works every time a new model ships: wait for the real numbers, run it against your own workload, and trust your own tenth-percentile outputs over somebody else's curated ceiling.
The dropdown picker will keep surfacing names that aren't supposed to be there. That's the feature, not the bug.
Building a front-end while the next model cooks? DevPik's [free developer tools](https://devpik.com) — JSON, text, encoders, converters — are 100% client-side. Your code never leaves your browser.
Sources & references
- The Information — original reporting on the "Spud" codename and pretraining completion (late March 2026).
- Polymarket — "GPT-5.5 released by..." prediction market for release-window odds.
- Sam Altman — internal remarks on pace and capability, reported across multiple outlets.
- OpenAI Help Center — model release notes for confirmed model versions currently in ChatGPT.
- DevPik coverage — Claude Opus 4.7 benchmarks, GPT-5.4 Computer Use, OpenAI's Spud policy paper, the Sora shutdown.