Claude Opus 4.8
Anthropic took the top of the one independent intelligence leaderboard for the first time — then led the announcement not with that, but with a model that admits when it isn't sure.
Six weeks after Opus 4.7, Anthropic shipped Opus 4.8 and, at launch, edged past OpenAI to the top of the Artificial Analysis Intelligence Index — the one composite score run by a disinterested third party — at 61.4, about a point clear of GPT-5.5. It is the first time the independent crown has sat with Anthropic rather than OpenAI.
Anthropic's headline number isn't the leaderboard win — it's a model ~4x less likely than 4.7 to let a flaw it wrote pass unflagged, turning that flag into a signal for human review.
The crown is the thinner half of the story. The margin is roughly a point, the index moves constantly, and a newer Claude variant had already passed 4.8 within days. The larger coding gaps Anthropic quotes — a wide lead on the hard SWE-Bench Pro test, being the only model to finish every task in its own agent benchmark — are the company's own numbers, not anyone else's, and several analysts note such cross-vendor gaps shrink once you account for how each model was run.
What Anthropic actually led with is quieter: calibration. The headline feature is that the model is roughly four times less likely than its predecessor to let a flaw in code it wrote pass without flagging it. That is not better code — it is a model that tells you where it is unsure. One analyst's read: a flagged output becomes a first-class signal for human review rather than something to catch later. The day it could have led with its first independent #1, the lab instead opened with a model that volunteers its own doubt.
Switch the model id to claude-opus-4-8 in claude.ai or the API and watch where it volunteers that it isn't sure.
Try it at anthropic.com →The lenses
The facts
Concepts
How this connects
Tap a node to open it