Mentatcurated
Artificial Intelligence high · independent

The Einstein test

Demis Hassabis wants AGI judged by whether a model fed only physics from before 1905 can re-derive Einstein's special relativity on its own — and a hobbyist quietly ran the test.

At Google I/O in May, the Google DeepMind chief narrowed his guess for human-level AI to "around 2030, plus or minus a year" and called today's agents "a practice run." But the durable idea wasn't the date. It was the bar he proposed for judging AGI: measure it by discovery, not by task scores. Train a model on the world's knowledge frozen before 1905, hand it the era's puzzling experimental data, and see if it can invent special relativity the way Einstein did. Current systems, Hassabis said, "clearly can't."

The framing isn't his alone — a 2025 paper laid out the same frozen-knowledge relativity test, and Hassabis is popularizing it rather than coining it. What makes it concrete is that someone already built a working version. Independent researcher Michael Hla spent about a month training a small model on roughly 22 billion words of pre-1900 text and period physics, then fed it the data of the day. It threw off flashes — at one point concluding from light experiments that light "must be composed of disconnected parts," brushing up against quantum theory — then explained particles in terms of steam engines and rivers, its Victorian training leaking out as metaphor.

Hla's own verdict was that the near-misses were "sophisticated plausibility matching rather than genuine physical intuition" — pattern, not insight. That gap is the whole point: a benchmark that rewards rederiving known physics may only catch a model good at sounding like a 1905 journal. And the field can't even agree on the question. While Hassabis says AGI is years away, others argue it already arrived with the first large language models around 2020 and nobody noticed. The value of the Einstein test is that it forces a falsifiable answer onto a word that has had none.

The lenses

Novelty 3
Impact · breadth 3
Impact · depth 3
Actionable 2
Substance 3
Hype 4

The facts

What it is: A proposed yardstick for AGI: invent known physics from frozen historical knowledge, rather than ace a benchmark
Has it been done? An independent researcher built a small model and ran it. It produced flashes of physics, but no genuine derivation
The verdict so far: "Plausibility matching, not physical intuition." Nothing passes the test yet
Open axios.com →
