Mentatcurated
Artificial Intelligence high · first-party

The lab that wants a brake

In a new paper, Anthropic reports that Claude now writes more than 80% of the code merged into its own production codebase — then spends pages explaining why you shouldn't fully trust its own numbers.

A frontier lab put hard internal figures on AI building AI. In 'When AI Builds Itself,' Anthropic's Marina Favaro and Jack Clark report that as of May 2026 more than 80% of the code merged into Anthropic's production codebase is written by Claude, up from low single digits before Claude Code launched in early 2025, and that a typical engineer now merges roughly eight times as much code per day as in 2024.

Anthropic concedes its own eightfold productivity figure 'is almost certainly an overstatement of the true productivity gain.'

What makes the paper unusual is that it argues against itself. Anthropic concedes the eightfold figure 'is almost certainly an overstatement of the true productivity gain'; its own poll of 130 researchers put the median self-estimated uplift at about four times — and the paper adds that even that is probably inflated by the known habit of overrating AI's help. The 80% number is self-reported on a private codebase no outsider can audit. The leg that does hold up independently is METR's public benchmark of how long a task an AI can finish unaided: that horizon is now doubling every four months, down from seven, a curve the paper leans on rather than invents.

The genuinely new move is institutional, not technical. The lab building the fastest is the one asking the world to build a brake: the paper formally proposes the option of a verifiable, treaty-style mechanism by which competing labs and governments could coordinate a slowdown, which Anthropic says it would join if rivals verifiably did. Clark puts the odds of fully autonomous self-improvement before the end of 2028 at around 60%. A company pre-discounting its own evidence while requesting a pause button is a strange artifact — and a more honest one than the press cycle that repeated its headline stats at face value.

The lenses

Novelty 3
Impact · breadth 4
Impact · depth 3
Actionable 1
Substance 4
Hype 4

The facts

SourceAnthropic internal paper, self-reported figures (codebase numbers unaudited)
Independent checkMETR's public task-horizon benchmark corroborates the acceleration
The askA verifiable, multi-lab coordinated pause option Anthropic says it would join
Open anthropic.com →

How this connects

Tap a node to open it