Alcazar · Technical Blog

Technical notes, architecture writeups, and release stories.


Published Apr 9, 2026

Claude Mythos is the first model Anthropic didn't really release

Claude Mythos matters for one reason above all: Anthropic seems to think it trained a model good enough at finding and exploiting bugs that the normal release playbook no longer made sense.

That is more interesting than another few points on GPQA.

Instead of putting Mythos into broad general availability, Anthropic put it behind Project Glasswing, limited access to a small set of defenders and infrastructure companies, published a long cybersecurity writeup on its Frontier Red Team blog, and paired it with a public alignment risk report. That does not prove every claim is right. It does tell us Anthropic thinks Mythos is different in a way that matters operationally, not just in marketing. The story became public in stages; Fortune reported a leak and Anthropic’s confirmation before the full April launch, and SecurityWeek summarized the dual-use framing others picked up.

My read is simple:

  • Claude Mythos looks like a real step up over Claude Opus 4.6
  • the biggest jump appears in agentic coding and cybersecurity, not in casual chat
  • on AGI timelines, it looks more like acceleration inside the current curve than a clean break from it
  • the most important number is still missing: there is no public METR time-horizon score for Mythos

What Anthropic has actually said

The basic facts are now public.

Anthropic says Claude Mythos Preview is a general-purpose frontier model, stronger than its prior public models, but unusually strong at cybersecurity tasks. It says the model found thousands of high-severity vulnerabilities, including bugs in every major operating system and every major browser, and that it can autonomously turn some of those bugs into working exploits. Anthropic also says it does not plan to make Mythos Preview generally available for now. Instead, it is giving access to a narrow group of defenders through Project Glasswing, including organizations like AWS, Google, Microsoft, Cisco, CrowdStrike, Palo Alto Networks, JPMorganChase, and the Linux Foundation.

That is already unusual.

Labs often say a new model is powerful. They do not often say, in effect, “we are going to keep this one mostly off the public shelf because the cyber capabilities look too spicy.”

Anthropic’s own risk report also says Mythos is the most capable model it has trained, that it is used internally for coding and agentic work, and that overall alignment risk remains “very low, but higher than for previous models.” In other words, Anthropic is not claiming Mythos is rogue. It is claiming the capability jump is large enough that the old comfort level no longer applies automatically.

The company has been telegraphing a cyber inflection for a while. In “Building AI for Cyber Defenders” it argued that models had crossed from theory to practice on security tasks and that defenders needed to adopt AI or cede the advantage. Glasswing is the operational version of that argument.

The benchmark picture

On public numbers, Mythos looks strongest in the places that matter for real software work.

Compared with Opus 4.6, Anthropic reports gains across its benchmark suite (the figures are in the Glasswing announcement).

Some of those gains are moderate. Some are not.

The biggest story here is not GPQA. Going from 91.3% to 94.6% on a hard science benchmark is real, but it is not the part that should reset your mental model.

The bigger story is the cluster:

  • big jump on harder software engineering benchmarks
  • big jump on terminal work
  • big jump on cybersecurity reproduction
  • decent jump on computer use

That cluster fits the release decision. Anthropic is not behaving like it trained a nicer chatbot. It is behaving like it trained a stronger software operator.

There are two caveats worth keeping in your head the whole time:

First, these are mostly vendor-reported numbers.

Second, Anthropic itself notes possible memorization concerns on Humanity’s Last Exam, and says it ran memorization screens on the SWE-bench family. So these results are informative, but they are not sacred tablets.

The cyber claims are the real headline

The most striking claims are not the benchmark tables. They are in Anthropic’s cyber writeup.

Anthropic says Mythos can identify and exploit zero-day vulnerabilities across major operating systems and browsers. More concretely, it gives several examples:

  • a 27-year-old OpenBSD bug (Anthropic links a now-patched advisory path)
  • a 16-year-old FFmpeg vulnerability (fixes landed in FFmpeg 8.1 among others)
  • a 17-year-old FreeBSD remote code execution bug triaged as CVE-2026-4747
  • multiple exploit chains in the Linux kernel (one commit Anthropic cites is e2f78c7ec165)

The numbers in the writeup are even more eye-catching than the examples.

Anthropic says that when it re-ran a Firefox 147 exploit-development benchmark, Opus 4.6 produced working exploits only 2 times out of several hundred tries, while Mythos produced working exploits 181 times and got register control 29 more times.

It also says that on roughly 7000 entry points from the OSS-Fuzz corpus, earlier models mostly topped out at low-severity crashes, while Mythos got 595 crashes at tiers 1 and 2, a handful at tiers 3 and 4, and 10 full control-flow hijacks at tier 5.

If those numbers are broadly right, then the right frame is not “AI is helping with security now.” The right frame is “the cost of searching for serious bugs may be dropping fast enough to change who finds them first.”

That is the scary part and the exciting part at the same time.

Anthropic’s argument is that defenders should still win in the long run, because the same models that find bugs can also patch them, review code, and harden systems at scale. I think that is probably right over a long enough time horizon. But the transition could still be ugly. If model capability improves faster than patching pipelines, testing infrastructure, and boring institutional response, attackers get a window. Anthropic follows coordinated vulnerability disclosure for what it finds; the question is whether the rest of the ecosystem can absorb the pace.

That seems to be exactly what Anthropic is worried about.

So where does Mythos fit on AGI timelines?

This is where people start overclaiming.

Dario Amodei has been publicly arguing that we are near “the end of the exponential” and that something like a “country of geniuses in a data center” could show up on a 1-3 year horizon, with very high confidence on a 10-year view. You can hear it in the Dwarkesh Patel interview and the YouTube recording. If you read those comments in 2024, they sounded to many people like frontier-lab chest-beating. In 2026, Mythos makes them sound more grounded, though still not settled.

The strongest evidence for near-term AGI has never been “the model got a little better at trivia.” It is the idea that models keep getting better at economically real, verifiable, tool-using work. Software is the cleanest example because code lives in text, terminals, test suites, logs, diffs, and reproducible environments. It is exactly the kind of world language models can inhabit.

On that front, Mythos is a bullish datapoint.

But it is not a complete one.

The cleanest outside metric we have for long-horizon autonomy is still METR’s task-completion time horizon. METR explains the methodology in “Measuring AI ability to complete long tasks” and the underlying paper. It currently publishes numbers for Opus 4.6, which lands at roughly 14.5 hours at the 50% success level on the updated benchmark. That is already a big deal. It is well above where frontier models were not long ago.
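The mechanics behind that headline statistic are simple: fit a logistic curve to per-task success as a function of (log) task length, then read off the length at which predicted success crosses 50%. Here is a minimal sketch of that kind of fit. The durations and outcomes are invented for illustration; this is my toy reconstruction of the idea, not METR’s code or data.

```python
import math

def fit_time_horizon(durations_hours, successes, lr=0.5, iters=20000):
    """Fit success ~ logistic(a + b * log(duration)) by gradient descent,
    then return the duration at which predicted success crosses 50%."""
    xs = [math.log(d) for d in durations_hours]
    a = b = 0.0
    n = len(xs)
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, successes):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += p - y
            grad_b += (p - y) * x
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    # Predicted success is 50% where a + b*x = 0, i.e. at duration exp(-a/b).
    return math.exp(-a / b)

# Invented results: short tasks mostly succeed, long ones mostly fail.
durations = [0.5, 0.5, 1, 1, 2, 2, 4, 4, 8, 8, 16, 16]
successes = [1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
horizon = fit_time_horizon(durations, successes)  # lands in the low hours here
```

The useful property is that one number summarizes performance across many task lengths at once, which is exactly why a missing Mythos score leaves a real hole in the picture.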

The problem is that we do not have a public METR score for Mythos.

So if someone tells you Mythos proves week-long autonomous software work is here, they are getting ahead of the evidence. The honest answer is narrower:

  • Mythos strongly suggests the current curve is still moving
  • Mythos strengthens the case that software and cyber will keep going first
  • Mythos does not yet close the loop on broad autonomous work the way a fresh METR score would

For a different slice of “autonomy,” RE-Bench looks at ML research engineering tasks; it is a reminder that “long horizon” means different things in different domains. METR’s review of Anthropic’s Opus 4.6 sabotage risk report is also worth reading if you care how third parties stress-test lab risk narratives.

Are we accelerating, or just staying on pace?

My answer is: both, depending on the slice.

On broad reasoning, Mythos looks like strong continued progress. GPQA, HLE, BrowseComp, and OSWorld all improve, but they do not scream “new paradigm.”

On agentic coding and cyber, Mythos looks more like acceleration.

That distinction matters. A lot of the public AGI argument gets confused because people average together very different capability domains. If you average everything into one vibe, you miss what is actually changing first.

The sharpest thing to say is this:

Mythos looks ahead of the previous pace where the real world is easiest to score and easiest to monetize: coding, terminals, exploit generation, bug finding, and computer-use loops.

That does not mean the whole AGI debate is over. It does mean the “maybe models are kind of plateauing” story is getting harder to tell with a straight face, at least for software-adjacent work.

Still, I would not call Mythos a clean discontinuity yet. It looks more like a steepening continuation of the same story we have been watching since agentic coding started to feel real.

Anthropic’s Economic Index shows coding and API automation still dominating real usage; the January primitives report ties task success to estimated human time in ways that rhyme with METR’s story. None of that is Mythos-specific, but it is context for why software-shaped benchmarks keep biting first.

A few things you might care about

The first is benchmark saturation.

Anthropic says Mythos now saturates enough older cyber evaluations that it has shifted toward real-world zero-day discovery. Cybench (paper) is the kind of CTF-style suite labs have used to track this; when models stop separating on familiar suites, the action moves to messier targets. That is a bigger deal than another leaderboard shuffle.

The second is friction-based defense.

Anthropic explicitly argues that some defenses work mainly by making exploitation tedious, not impossible. That matters because tedious work is exactly what you can throw cheap, parallel model runs at. If a mitigation’s main virtue is “this would take a human exploit developer a long weekend,” that mitigation may age badly.
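The economics here are easy to sketch: if a tedious step succeeds with some small probability per cheap automated attempt, the number of parallel attempts needed for a high overall success rate grows only logarithmically in the target. The numbers below are purely illustrative:

```python
import math

def runs_needed(p_single, p_target):
    """Independent attempts: P(at least one success) = 1 - (1 - p_single)**n.
    Return the smallest n that reaches p_target."""
    return math.ceil(math.log(1.0 - p_target) / math.log(1.0 - p_single))

# A "tedious" step with a 1% chance per cheap automated try needs only
# a couple hundred parallel runs for a 90% chance of at least one success.
print(runs_needed(0.01, 0.90))  # → 230
```

A defense priced in human exploit-developer weekends looks very different when the unit of attack is a few hundred cheap, parallel model runs.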

The third is memory-safe triumphalism.

Mythos reportedly found bugs not only in classic C and C++ targets but also in “memory-safe” systems where unsafe boundaries still existed. Anthropic points to a critical Botan TLS issue as one cryptography-library example. That is not an argument against memory-safe languages. It is an argument against thinking language choice makes security easy. The interesting security question is moving from “does this code use Rust?” to “where are the unsafe seams, protocol assumptions, and logic gaps?”
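A logic gap in a memory-safe language needs no buffer overflow. Here is a toy example in Python, itself a memory-safe language: a length-prefixed record parser that trusts an attacker-controlled length field silently desynchronizes, because slicing past the end truncates instead of failing. The framing format is invented for illustration and has nothing to do with Botan specifically.

```python
def parse_record_naive(buf, offset):
    """Trusts the 2-byte big-endian length field without checking it
    against the bytes actually remaining in the buffer."""
    length = int.from_bytes(buf[offset:offset + 2], "big")
    body = buf[offset + 2:offset + 2 + length]  # silently truncates past the end
    return body, offset + 2 + length

def parse_record_checked(buf, offset):
    """Same framing, but the declared length is validated first."""
    length = int.from_bytes(buf[offset:offset + 2], "big")
    if offset + 2 + length > len(buf):
        raise ValueError("declared length exceeds remaining bytes")
    return buf[offset + 2:offset + 2 + length], offset + 2 + length

# Two records; the first lies about its length and swallows the second.
evil = (1000).to_bytes(2, "big") + b"hi"
honest = (3).to_bytes(2, "big") + b"key"
stream = evil + honest

body, nxt = parse_record_naive(stream, 0)
# The naive parser "consumed" bytes that do not exist: the honest record,
# b"key" included, was absorbed into the first record's body, and nxt now
# points past the end of the stream.
```

No memory is corrupted at any point, which is exactly why this class of bug survives the switch to safe languages.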

The fourth is the bottleneck shift.

If models get much better at bug discovery than organizations get at patch validation, rollout, and incident response, the limiting factor stops being finding problems. It becomes operations. Software teams may soon drown less in unknown bugs than in known bugs they cannot fix fast enough.
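That dynamic is just queueing arithmetic: whenever the discovery rate exceeds the patch pipeline's throughput, the known-but-unfixed backlog grows linearly, no matter how good discovery gets. A toy model with invented rates:

```python
def backlog_over_time(discovered_per_week, patched_per_week, weeks):
    """Known-but-unfixed bug count under constant discovery and patch rates."""
    backlog = 0
    history = []
    for _ in range(weeks):
        backlog += discovered_per_week
        backlog -= min(backlog, patched_per_week)
        history.append(backlog)
    return history

# If model-assisted discovery surfaces 50 real bugs a week but the pipeline
# can validate and ship fixes for 30, the backlog grows by 20 every week.
print(backlog_over_time(50, 30, 5))  # → [20, 40, 60, 80, 100]
```

The uncomfortable implication is that past the crossover point, better discovery tooling makes the operational problem worse, not better, until patching scales too.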

My take

Claude Mythos does not prove AGI.

It does not prove ASI.

It does not even prove Anthropic’s strongest cyber claims in the sense a fully independent public evaluation would.

But it does look like the first recent model where the vendor’s own behavior changed in a way that is more informative than the benchmark gains. Anthropic is telling us, through its release choices, that it thinks the cyber jump is real enough to warrant a different deployment regime.

That is the part I would take seriously.

If you only look at Mythos as “Opus plus some benchmark uplift,” you miss the main story. The main story is that software security may be entering the phase where scaled model search beats a lot of human intuition, human patience, and older defensive habits.

And if that is true, then Mythos is less a product launch than a warning shot.
