Google forks the TPU
For the first time in a decade of TPUs, Google built two chips instead of one — a training chip and a serving chip — because the economics of the two jobs have pulled apart.
For ten years Google built one general-purpose AI chip per generation, tuned to handle both halves of the job: the enormous batch math of training a model and the latency-sensitive trickle of serving it to users. The eighth generation, shown at Cloud Next in April, ends that. There are now two chips, a training part and an inference part, on diverging roadmaps, designed with different partners but deployed in the same data centers.
Google still trails Nvidia by roughly three to one per chip; its entire case is that the contest is decided at pod scale, not per socket.
The split is an admission, not a flex. Training and serving have always wanted different hardware; for years it was cheaper to compromise on one design than to maintain two. That it now pays to fork the silicon says the two workloads' cost curves have separated far enough to justify the expense. Amazon made the same bet years ago with its separate Trainium and Inferentia chips.
The numbers Google led with deserve a squint. Its headline compute figure is measured in 4-bit precision, which inflates it against the higher-precision math most comparisons use, and chip for chip Google still trails Nvidia by roughly three to one. Google's actual argument is about scale: tens of thousands of chips wired into one logical cluster over a single fabric. That is where its lead lives, not in any one socket. Whether that wins depends on whether your workloads can be spread across the whole machine.
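The arithmetic behind that squint can be sketched in a few lines. This is a rough illustration, not a benchmark: the function names are hypothetical, every input is a placeholder rather than a vendor figure, and the precision conversion assumes throughput doubles each time the bit width halves, a rule of thumb that real chips only approximate.

```python
# Illustrative sketch only: every number below is a placeholder, not a vendor spec.

def normalize_peak(peak, from_bits, to_bits):
    """Rescale a peak-throughput figure from one precision to another.

    Assumes throughput doubles each time the bit width halves, a common
    rule of thumb that real hardware only approximates.
    """
    return peak * from_bits / to_bits


def cluster_throughput(per_chip, chips, scaling_efficiency):
    """Aggregate throughput of a cluster at a given scaling efficiency (0 to 1)."""
    return per_chip * chips * scaling_efficiency


def breakeven_chips(per_chip_ratio, rival_chips, rival_efficiency, own_efficiency):
    """Chips needed to match a rival cluster whose chips are per_chip_ratio times faster."""
    return per_chip_ratio * rival_chips * rival_efficiency / own_efficiency


if __name__ == "__main__":
    # A headline of 8 units at 4-bit is about 4 units on an 8-bit basis.
    print(normalize_peak(8.0, from_bits=4, to_bits=8))
    # With a three-to-one per-chip gap, matching a hypothetical 100-chip
    # rival cluster takes 300 chips if both sides scale perfectly...
    print(breakeven_chips(3.0, 100, 1.0, 1.0))
    # ...and 600 if the larger cluster runs at only 50% efficiency.
    print(breakeven_chips(3.0, 100, 1.0, 0.5))
```

The scaling-efficiency term is where the question of spreading a workload across the whole machine shows up: the lower it falls, the more chips a pod-scale strategy needs to close a per-chip gap.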
For the companies renting this (Anthropic, Google's own labs, and outside cloud customers), the pitch is roughly twice the serving volume at the same cost. If it holds, the consequence is competitive: it gives the one credible alternative to Nvidia a cheaper way to keep pace, by paying Broadcom and MediaTek to help design two specialized chips rather than one chip that does everything adequately.