Artificial Intelligence high · first-party

Claude Opus 4.8

Anthropic took the top of the one independent intelligence leaderboard for the first time — then led the announcement not with that, but with a model that admits when it isn't sure.

app Anthropic · 2 min read

Six weeks after Opus 4.7, Anthropic shipped Opus 4.8 and, at launch, edged past OpenAI to the top of the Artificial Analysis Intelligence Index — the one composite score run by a disinterested third party — at 61.4, about a point clear of GPT-5.5. It is the first time the independent crown has sat with Anthropic rather than OpenAI.

Anthropic's headline number isn't the leaderboard win — it's a model ~4x less likely than 4.7 to let a flaw it wrote pass unflagged, turning that flag into a signal for human review.

The crown is the thinner half of the story. The margin is roughly a point, the index moves constantly, and a newer Claude variant had already passed 4.8 within days. The larger coding gaps Anthropic quotes — a wide lead on the hard SWE-Bench Pro test, being the only model to finish every task in its own agent benchmark — are the company's own numbers, not anyone else's, and several analysts note such cross-vendor gaps shrink once you account for how each model was run.

What Anthropic actually led with is quieter: calibration. The headline feature is that the model is roughly four times less likely than its predecessor to let a flaw in code it wrote pass without flagging it. That is not better code — it is a model that tells you where it is unsure. One analyst's read: a flagged output becomes a first-class signal for human review rather than something to catch later. The day it could have led with its first independent #1, the lab instead opened with a model that volunteers its own doubt.

Want to try it?

Switch the model id to claude-opus-4-8 in claude.ai or the API and watch where it volunteers that it isn't sure.

Try it at anthropic.com →

The lenses

Novelty 2

Impact · breadth 4

Impact · depth 3

Actionable 5

Substance 4

Hype 3

The facts

AvailableNow, in claude.ai and the API

PriceUnchanged from 4.7

Independent rank#1 on the AA Intelligence Index at launch — by ~1 point, already passed since

The lead claimWider coding-benchmark gaps are Anthropic's own, not externally confirmed

Concepts

Frontier models AI alignment

Open anthropic.com →

How this connects

Tap a node to open it

Claude Opus 4.8

The lenses

The facts

Concepts

More in Artificial Intelligence

Safety's rounding error

The Jevons bill comes due

Money stopped being the bottleneck

How this connects