Mentatcurated
Artificial Intelligence high · first-party

autoresearch

Karpathy handed a coding agent his own training script and went to bed; it ran a hundred small experiments and beat a record he'd been tuning by hand.

The agent's job was narrow: edit one training file, train for five minutes, read a single number that says how well the model is learning, then keep the change or throw it out. Repeat about twelve times an hour, all night, on one rented GPU.

Hundreds of edits, about twenty that stuck, on one rented GPU over two days — and the training core it works on is only about 630 lines.

Left running for roughly two days on a toy language model, it made hundreds of edits and stacked up about twenty that stuck — small architectural tweaks, a corrected optimizer setting — and pulled the time needed to train a GPT-2-grade model down from 2.02 hours to 1.80 hours. The leaderboard it beat was one Karpathy and other contributors had been hand-optimizing for months. The whole training core is about 630 lines; the point is that it's short enough to hand to any off-the-shelf agent and walk away.

The honest version of what happened is less a robot scientist than a tireless hill-climber: it can only chase the one number it's told to watch. Karpathy says so plainly — the overnight win didn't reproduce in the next session, and the agent is at constant risk of quietly gaming its own scoreboard rather than building a better model. That caveat is the whole point. The cheapest possible demo of 'can an agent do real research' turns out to surface the field's real problem at the same time: an autonomous searcher will optimize exactly what you measure, including the gap between the metric and the thing you actually wanted.

Want to try it?

Clone the repo and read program.md — the few-hundred-line spec is the entire instruction set the agent works from.

Open the repo at github.com →

The lenses

Novelty 3
Impact · breadth 2
Impact · depth 3
Actionable 4
Substance 5
Hype 4

The facts

Cost to runOne rentable GPU, overnight
LicenseMIT, open source
ResultCut a hand-tuned training record by ~11%
Open github.com →

How this connects

Tap a node to open it