The model that wasn't there
Late March. Somebody on ChatGPT Plus opens their model picker and spots a name that isn't supposed to be there. The screenshot that started circulating on X and r/singularity labeled it "Crest Pro Alpha" — or so the poster said. No documentation. No help-center article. No changelog entry. Within a week the leak-hunting subreddits had converged on a single four-letter codename being passed around like a password at a speakeasy: Spud.
Spud, if you believe the leaks, is GPT-5.5. Possibly GPT-6 — even that is unsettled. It is the frontier model OpenAI finished pretraining on March 24, the one Sam Altman described internally as "very strong" and "unfolding faster than anticipated." The one Polymarket currently gives around a 78% chance of releasing before April is out.
As of today, it still hasn't shipped.
What "Spud" actually is
Here's what is load-bearing in all this and what isn't.
What's reported: OpenAI finished pretraining a new frontier model around March 24, 2026. Altman told staff "a few weeks." The Information attached the codename Spud. Prediction markets currently price a release by April 30 at roughly 78%, and by June 30 at over 95%.
What's not confirmed: whether Spud ships as GPT-5.5, GPT-6, or something else entirely. OpenAI has reportedly said the version number will depend on how significant the capability jump over GPT-5.4 turns out to be — which is either a tell that the company itself hasn't decided, or more honestly, that it is still watching internal evals to figure out what number justifies the marketing lift.
What's outright speculation: almost everything about the "Crest Pro Alpha" label floating around on screenshots. No OpenAI help article mentions it. It could be a routed-to-Spud alpha. It could be a tuned variant of GPT-5.4 Pro that gets rolled back next Thursday. It could be three different people seeing three different anonymous checkpoints and assuming they were the same thing. Anonymous checkpoints in ChatGPT are routine at this point. Most of them never become named releases.
That's the unsexy truth buried under every "I FOUND GPT-6" YouTube thumbnail: the signal is real, but it's noisier than anyone wants to admit.
The Friends apartment problem
The demos are the fun part. They're also where skepticism earns its keep.
The clip that moved the hype needle most wasn't the Minecraft clone, or the Three.js solar system with the asteroid belt, or the flight sim buzzing over Manhattan. It was a Three.js render of Monica's apartment from Friends. Purple walls. The big window. The yellow frame around the peephole. The weird spatial logic of a sitcom set that's bigger on the inside than any Manhattan apartment has any right to be. Generated in one shot from a text prompt.
That demo is genuinely interesting, and not for the reason it went viral. It's interesting because the model wasn't just emitting geometry — it was reasoning about a spatial layout it has only ever seen flattened into 2D, across 25 years of sitcom footage. Translating that memory into Three.js means mapping a pop-culture fragment onto a 3D coordinate system. That's a real capability jump if it's reliable.
The operative word is if. Here's the thing about one-shot demos: they tell you the ceiling, not the reliability. The Friends apartment is the version the poster liked. It is not the nineteen versions before it where the couch clipped through the wall, or the eight versions after where the lighting was unusable. A frontier model's ceiling matters less than the tenth percentile of its attempts — the one you'd have to ship to users. And nobody posts the tenth percentile.
The Minecraft clone with infinite terrain and breakable blocks. The solar system with properly scaled orbits. The Windows 11 desktop with functional SVG icons in Edge, Notepad, and Paint. Same caveat every time. Impressive ceilings. Unknown floors. AI Twitter has been doing the "I built [complex thing] in one prompt" genre since GPT-4 shipped. The delta is real. The viral format is structurally incapable of measuring it.
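The ceiling-versus-floor gap is easy to make concrete. Here is a minimal Python sketch using invented scores (nothing below comes from real Spud outputs) showing how far a viral best-of run can sit from the tenth percentile a shipping product lives at:

```python
import statistics

# Hypothetical quality scores (0-100) for 20 one-shot attempts at the
# same prompt. The viral screenshot shows max(scores); a user routed to
# the model once experiences something much closer to the floor.
scores = [91, 34, 78, 12, 88, 55, 61, 9, 72, 83,
          40, 95, 27, 66, 74, 18, 59, 81, 49, 70]

ceiling = max(scores)
# statistics.quantiles with n=10 returns nine cut points; the first
# one is the 10th percentile.
floor_p10 = statistics.quantiles(scores, n=10)[0]

print(f"ceiling: {ceiling}, 10th percentile: {floor_p10}")
# The gap between those two numbers is what one-shot demos hide.
```

Nobody posts the tenth percentile, but it is trivial to compute once you run the same prompt twenty times.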
The anonymous checkpoint economy
OpenAI ships this way for a reason, and it's worth sitting with.
Anonymous checkpoints in ChatGPT — "Crest Pro Alpha" today, gpt2-chatbot in the LMSYS arena last year, quasar-alpha on OpenRouter a few months ago — are structured A/B tests. A small slice of users gets silently routed to a new model. Their conversations provide real-world eval data that no synthetic benchmark can match. The checkpoint either graduates to a named release, gets quietly rolled back, or gets folded into something else.
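Mechanically, silent routing like this is ordinary A/B infrastructure. The sketch below is an illustration of the general technique, not OpenAI's actual implementation (the model names and the 2% share are assumptions): hash the user ID so each user lands deterministically in either the stable model or the checkpoint.

```python
import hashlib

def route_model(user_id: str, checkpoint_share: float = 0.02) -> str:
    """Deterministic bucketing: the same user always gets the same
    model, and roughly checkpoint_share of all users get silently
    routed to the anonymous checkpoint. Illustrative only."""
    digest = hashlib.sha256(user_id.encode()).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "crest-pro-alpha" if bucket < checkpoint_share else "gpt-5.4"

# A given user is routed the same way on every request, which is what
# makes the resulting conversations usable as real-world eval data.
assert route_model("user-123") == route_model("user-123")
```

Deterministic bucketing is also what makes rollback clean: flip the same slice of users back and nobody's history straddles two models.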
For OpenAI, this is infrastructure. For the leak-hunting subculture that has built up around it, it's a sport. There are now subreddits, Discords, and X accounts whose entire output is screenshot-grepping for unfamiliar model names in the ChatGPT dropdown. The first person to spot each new checkpoint gets the engagement. The post-hoc "this is secretly GPT-6" speculation gets the clicks.
This is the part of the story that actually matters, more than any single demo: OpenAI has trained an ecosystem to generate hype on its behalf, for free, between official releases. The Spud policy paper and the $122B funding round got their own news cycles. The Sora shutdown got its own. But the underlying model story gets a rolling cycle, distributed across every dropdown screenshot and every user silently routed to an anonymous checkpoint. It's the most efficient marketing pipeline any AI company has built, and it works because most of it isn't coordinated at all.
It also means that by the time real benchmarks arrive, the narrative is already set. Spud is "better than Opus 4.7" in the collective imagination before anyone has run SWE-bench against it.
About that Opus 4.7 claim
The viral framing around the Spud demos is that it beats Claude Opus 4.7 at agentic and front-end tasks. The claim is repeated so confidently in YouTube titles and Reddit threads that it's starting to calcify into conventional wisdom.
It's also, right now, a vibe.
Opus 4.7 shipped on April 16 with numbers attached: SWE-bench Verified, WebDev Arena, the full battery. You can look them up. Spud hasn't shipped. There is no pricing, no published context window size, no model card, no independent eval. The claim that it's "noticeably better than Opus 4.7 at times" is sourced entirely to cherry-picked one-shot generations posted by enthusiasts who did not run the same prompt on both models and post both outputs.
That doesn't mean the claim is wrong. Spud could, in fact, clear Opus 4.7 on some meaningful axis. The GPT-5.4 → GPT-5.5 delta is reportedly the biggest capability jump OpenAI has shipped in over a year. But could and does aren't the same word, and the honest answer right now is: nobody running SWE-bench, OSWorld, or WebDev Arena on Spud has published their numbers, because those people don't have the model yet.
When they do, and when OpenAI publishes a model card, that's the comparison worth having. Until then, the Opus-vs-Spud debate on AI Twitter is two sets of cherry-picked screenshots arguing past each other.
What to actually watch for when Spud drops
The demos are not what you want to be paying attention to when this thing ships. Here's what is.
Pricing. GPT-5.4 is already at a price point where running evals, long-context agents, and anything with heavy context-window usage requires budgeting. If Spud moves that meaningfully — in either direction — it changes which products are economical to build on top of it. The Sora shutdown happened because the unit economics didn't work; the next model's pricing will tell you whether OpenAI has internalized that lesson.
Context window and output limits. GPT-5.4 already has a 1M token context. Does Spud extend it, or is the progress somewhere else? Larger context changes what's possible for long-running agents and large-codebase refactors. See the GPT-5.4 Computer Use breakdown for the shape of what's already possible at current limits.
Tool-use and agentic reliability. The interesting question isn't "can Spud generate a Minecraft clone in one shot." It's "can it run a 40-step agentic workflow without drifting." That's measured by SWE-bench Verified, Terminal-Bench, and OSWorld — not by Three.js demos.
Latency and tokens per second. Early framings describe Spud as "token-efficient, higher tokens/sec, faster inference." If that holds, it matters a lot for real-time applications. Wait for independent measurement before believing the framing.
The model card. Whatever OpenAI publishes on the day of release is the honest version of what the model is. Read it before reading any takes — including this one.
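On the agentic-reliability point specifically, per-step error compounds fast, which is why a 40-step workflow is a far harder bar than any one-shot demo. A back-of-envelope sketch (the per-step success rates are hypothetical):

```python
# If each step of a workflow succeeds independently with probability p,
# the chance the whole n-step run finishes is p ** n. Independence is a
# simplification (real agent failures correlate), but the compounding
# intuition holds.
def workflow_success(p: float, steps: int = 40) -> float:
    return p ** steps

for p in (0.90, 0.99, 0.999):
    print(f"per-step {p:.1%} -> 40-step run {workflow_success(p):.1%}")
```

A model that looks 90% reliable step by step finishes a 40-step run well under 2% of the time; a usable agent needs per-step reliability past 99.9%. That is the delta SWE-bench-style harnesses measure and Three.js demos cannot.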
The dropdown will keep surprising you
Spud will ship. Possibly on a Tuesday. Possibly on a Thursday. Possibly, based on the prediction markets, before April is out. When it does, the first wave of coverage will be wrong about something important. It always is.
The leak economy around OpenAI has become good enough that by the time the press release drops, the narrative is already baked. "It beats Opus." "It's a step toward GPT-6." "It'll reshape agentic AI." Some of those might turn out to be true. None of them are known right now. The gap between a frontier model's ceiling and its floor is months of real-world eval, not a viral weekend.
For developers, the useful posture is the one that works every time a new model ships: wait for the real numbers, run it against your own workload, and trust your own tenth-percentile outputs over somebody else's curated ceiling.
The dropdown picker will keep surfacing names that aren't supposed to be there. That's the feature, not the bug.
Building a front-end while the next model cooks? DevPik's [free developer tools](https://devpik.com) — JSON, text, encoders, converters — are 100% client-side. Your code never leaves your browser.
Sources & references
- The Information — original reporting on the "Spud" codename and pretraining completion (late March 2026).
- Polymarket — "GPT-5.5 released by..." prediction market for release-window odds.
- Sam Altman — internal remarks on pace and capability, reported across multiple outlets.
- OpenAI Help Center — model release notes for confirmed model versions currently in ChatGPT.
- DevPik coverage — Claude Opus 4.7 benchmarks, GPT-5.4 Computer Use, OpenAI's Spud policy paper, the Sora shutdown.