Mentat · curated
Artificial Intelligence high · independent

The forecasting gap, in Brier points

On ForecastBench, the best AI now trails the world's top human forecasters by about 0.017 Brier points, close enough that the institute behind the benchmark projects the machines will draw level later this year.

As of this spring the humans are still ahead, but barely. ForecastBench is a live tournament of thousands of unresolved questions about the world (elections, economic numbers, sports, weather), answered both by leading language models and by superforecasters. There, the gap between the best model and the humans is about 0.017 Brier points: roughly a 20 percent edge for the humans, and roughly one year of model improvement at the current rate. The superforecasters are the top sliver of human predictors who, in earlier work, beat intelligence analysts with classified access by roughly 30 percent. Forecasts are scored once reality settles them, using the Brier score, a measure that rewards being both confident and right.
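To make the scoring concrete, here is a minimal sketch of the standard binary Brier score: the mean squared difference between a forecast probability and the 0-or-1 outcome, where lower is better. The function name and the toy forecasts are illustrative assumptions, and ForecastBench's exact scoring may differ from this textbook form.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect score, 0.25 is what a constant
    50 percent forecast earns on any set of binary questions.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical resolved questions (1 = happened, 0 = did not). Not real data.
outcomes = [1, 0, 1, 1]

confident_right = [0.9, 0.1, 0.9, 0.9]  # confident and correct
hedged = [0.5, 0.5, 0.5, 0.5]           # never commits either way
confident_wrong = [0.1, 0.9, 0.1, 0.1]  # confident and incorrect

score_confident_right = brier_score(confident_right, outcomes)
score_hedged = brier_score(hedged, outcomes)
score_confident_wrong = brier_score(confident_wrong, outcomes)
```

In this toy set the confident-and-right forecaster scores best, the hedger lands in the middle, and the confident-and-wrong forecaster scores worst. On a scale like this, a difference of 0.017 is small in absolute terms but meaningful between forecasters who are already close to each other.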


The institute's own extrapolation puts parity on the easier 'dataset' questions around the middle of 2026 and overall parity around November, with a confidence range that stretches into 2027. No single model owns the milestone; the strongest entries are a shifting set of frontier systems and ensembles.

The sharper story is the argument over whether the finish line is honestly drawn. Superforecasters who looked at the benchmark say it is tilted toward machines: the human baseline was frozen in 2024 while the models keep getting fresh attempts, many questions lean on the data-heavy domains where computers already excel, and a model that takes several swings a week will land some good scores by luck. By that reading, the headline 'AI matches superforecasters' — whenever it arrives — measures how well a model answers ForecastBench, not whether it can actually out-think a human about an uncertain future.
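The "several swings" objection is a selection effect, and a small simulation can illustrate it. The sketch below is a toy model, not ForecastBench's methodology: the question count, number of attempts, noise level, and every function name are assumptions chosen for illustration. It scores many attempts of identical underlying skill and compares the luckiest one with the average.

```python
import random


def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def simulate_attempt(rng, true_probs, outcomes, noise):
    """One attempt: the true probability plus Gaussian noise, clipped to [0, 1]."""
    forecasts = [min(1.0, max(0.0, p + rng.gauss(0.0, noise))) for p in true_probs]
    return brier_score(forecasts, outcomes)


def best_vs_typical(n_questions=200, n_attempts=20, noise=0.15, seed=0):
    """Return (best score, mean score) over attempts that all share the same skill."""
    rng = random.Random(seed)
    # Hypothetical question set: each question has a hidden true probability,
    # and its outcome is drawn from that probability.
    true_probs = [rng.random() for _ in range(n_questions)]
    outcomes = [1 if rng.random() < p else 0 for p in true_probs]
    scores = [
        simulate_attempt(rng, true_probs, outcomes, noise)
        for _ in range(n_attempts)
    ]
    return min(scores), sum(scores) / len(scores)


best, typical = best_vs_typical()
print(f"best attempt: {best:.4f}, typical attempt: {typical:.4f}")
```

Every attempt here has the same skill, yet the best one always scores at least as well as the average, purely through selection. How large that effect is on the real leaderboard depends on how many entries are submitted and how much they vary, which this sketch does not attempt to estimate.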

That is a useful caution for a year that will almost certainly produce that headline. The narrowing gap is real and worth watching; the claim it will be sold as is narrower than it sounds.

The lenses

Novelty 2
Impact · breadth 2
Impact · depth 3
Actionable 2
Substance 4
Hype 2

The facts

Current gap: Best AI trails top humans by ~0.017 Brier points (~20%)
Projected parity: Dataset questions ~mid-2026, overall ~Nov 2026 (projected, not reached)
Who's ahead: Superforecasters still rank #1 overall
Source: forecastingresearch.substack.com
