AI benchmarks
Standardised tests used to measure what AI models can do — and the main proxy for tracking whether the field is making progress.
A benchmark is a fixed dataset of tasks with known correct answers. A model is run on every task, its answers are scored, and the result is a single number that labs use to compare models and claim progress. The hard part is the gap between that number and real capability. A score can rise without a matching gain in capability: the model may have seen the same or similar questions during training (contamination), it may have been tuned to the test format, or the tasks may have been too narrow to begin with. When every frontier model clusters near the maximum score, the benchmark stops discriminating between them; this is what "saturated" means. Harder benchmarks replace it, the cycle repeats, and the field keeps debating whether any of these numbers track what we actually care about.
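To make scoring, saturation, and contamination concrete, here is a minimal Python sketch under stated assumptions. It assumes exact-match scoring against a single reference answer, treats a benchmark as saturated when every model scores within an arbitrary 5-point margin of the ceiling, and flags contamination by whitespace-token n-gram overlap with training text. The `Item` type, the function names, and both thresholds are hypothetical illustrations, not the method of any particular benchmark or lab.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One benchmark task with its known correct answer (hypothetical schema)."""
    question: str
    answer: str


def accuracy(predictions: list[str], items: list[Item]) -> float:
    """Fraction of items whose prediction exactly matches the reference answer.

    Exact match after stripping whitespace is an assumption; real harnesses
    usually normalise answers more carefully.
    """
    if not items or len(predictions) != len(items):
        raise ValueError("need one prediction per item and at least one item")
    correct = sum(p.strip() == it.answer.strip() for p, it in zip(predictions, items))
    return correct / len(items)


def is_saturated(scores: dict[str, float], ceiling: float = 1.0, margin: float = 0.05) -> bool:
    """Illustrative rule: the benchmark no longer discriminates if every
    model scores within `margin` of the ceiling. The 0.05 margin is arbitrary."""
    return all(ceiling - s <= margin for s in scores.values())


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-cased, whitespace-tokenised n-grams (a deliberately crude tokeniser)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contaminated(item: Item, training_text: str, n: int = 8) -> bool:
    """Flag an item if any n-gram of its question also occurs in the training text.
    The default n=8 is an illustrative choice, not a standard."""
    return bool(ngrams(item.question, n) & ngrams(training_text, n))


if __name__ == "__main__":
    items = [Item("2 + 2 = ?", "4"), Item("What is the capital of France?", "Paris")]
    print(accuracy(["4", "Lyon"], items))                    # 0.5
    print(is_saturated({"model_a": 0.97, "model_b": 0.99}))  # True
    print(contaminated(items[1], "quiz: what is the capital of france? answer paris", n=4))  # True
```

Exact match and literal n-gram overlap are the simplest versions of each idea: they miss answers that are correct but worded differently, and they cannot catch a paraphrased question that leaked into training data.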









