
nanochat

Andrej Karpathy packed the whole ChatGPT recipe — from raw text to a chatbot you talk to in a browser — into one readable codebase that trains for about the cost of a nice dinner.

repo Andrej Karpathy · 1 min read

Rent a single eight-GPU machine, run one script, and a little under four hours and $92.40 later you have trained a ChatGPT clone you can open in a browser and talk to. That is the speedrun Karpathy clocked for nanochat, a self-contained repo of roughly 8,300 lines that walks through the entire modern language-model pipeline: building the tokenizer, pretraining on web text, fine-tuning the model to chat, running an optional round of reinforcement learning, and serving the result through an inference server and web UI.

"Our model isn't even sure about the color of the sky so we're probably safe on the biohazard side of things for now." — Andrej Karpathy

His earlier teaching repo, nanoGPT, covered only the first step, pretraining. nanochat is the first to carry a reader all the way to a working assistant, and the point is less the model than the readability: the whole thing is meant to be forked and understood, not configured. The price is the other half of the story. Training a comparable model, GPT-2, cost around $43,000 in 2019; a model of the same kind now trains in a single afternoon of GPU rental.
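To put the price drop in proportion, here is a minimal Python sketch of the arithmetic. It uses only the two dollar figures quoted in this article, both approximate; the function name is illustrative, and the result should be read as a rough ratio rather than a benchmark.

```python
# Figures as quoted in the article; both are approximate.
GPT2_COST_2019_USD = 43_000.0   # "around $43,000 in 2019"
NANOCHAT_SPEEDRUN_USD = 92.40   # speedrun cost reported above


def cost_reduction_factor(old_cost: float, new_cost: float) -> float:
    """Return how many times cheaper new_cost is than old_cost."""
    if new_cost <= 0:
        raise ValueError("new_cost must be positive")
    return old_cost / new_cost


factor = cost_reduction_factor(GPT2_COST_2019_USD, NANOCHAT_SPEEDRUN_USD)
print(f"Roughly {factor:.0f}x cheaper")  # prints "Roughly 465x cheaper"
```

On these figures the drop is a little over two orders of magnitude in about six years.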

The catch, stated plainly by its author, is that $100 buys a deliberately feeble model: it scores under five percent on grade-school math and gets basic facts wrong. The achievement is not the chatbot but the map: the complete path from raw text to a model you can talk to, cheap enough that any student can run the whole thing once and watch every stage happen.

Want to try it?

Clone karpathy/nanochat and read speedrun.sh. It is the single script that runs the whole pipeline end to end, and running it once is the cheapest way to watch every stage of building a ChatGPT-style model.


The lenses

Novelty 3

Impact · breadth 3

Impact · depth 4

Actionable 4

Substance 5

Hype 4

The facts

Cost / time: ~$100, about 4 hours on one rented 8-GPU node

Open? MIT-licensed, ~8,300 lines across 44 files

Scaling: a depth knob trades up; ~$1,000 / ~40h buys a markedly stronger model

Context: capstone project for Karpathy's LLM101n course
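The two budget figures above imply an hourly rental rate, which the short Python sketch below works out. It assumes the rounded figures from this list and, as a labeled assumption, that the ~$1,000 tier runs on the same 8-GPU node as the speedrun; the `Tier` class and tier names are illustrative and do not come from the nanochat repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One training budget, using the article's rounded figures."""
    name: str
    cost_usd: float
    hours: float
    gpus: int = 8  # the article's "8-GPU node"; assumed for both tiers

    @property
    def node_rate(self) -> float:
        """Dollars per hour for the whole node."""
        return self.cost_usd / self.hours

    @property
    def gpu_hour_rate(self) -> float:
        """Dollars per GPU per hour."""
        return self.node_rate / self.gpus


tiers = [
    Tier("speedrun", cost_usd=100.0, hours=4.0),
    Tier("deeper model", cost_usd=1000.0, hours=40.0),
]

for tier in tiers:
    print(f"{tier.name}: ${tier.node_rate:.2f}/node-hour, "
          f"${tier.gpu_hour_rate:.3f}/GPU-hour")
```

Under these round numbers both tiers come to about $25 per node-hour, so the larger budget appears to buy more training time on the same hardware rather than a pricier machine.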

Concepts

Supervised fine-tuning · Scaling laws · Local inference

