Artificial Intelligence high · first-party

Google forks the TPU

For the first time in a decade of TPUs, Google built two chips instead of one — a training chip and a serving chip — because the economics of the two jobs have pulled apart.

demo Google · 2 min read

For ten years Google built one general-purpose AI chip per generation, tuned to handle both halves of the job: the enormous batch math of training a model and the latency-sensitive trickle of serving it to users. The eighth generation, shown at Cloud Next in April, ends that. There are now two chips — a training part and an inference part — on diverging roadmaps, designed by different partners and assembled into the same data centers.

Google still trails Nvidia by roughly three to one per chip; its entire case is that the contest is decided at pod scale, not per socket.

The split is an admission, not a flex. Training and serving have always wanted different hardware; for years it was cheaper to compromise on one design than to maintain two. That it now pays to fork the silicon says the two workloads' cost curves have separated far enough to justify the expense. Amazon made the same bet years ago with its separate Trainium and Inferentia chips.

The numbers Google led with deserve a squint. Its headline compute figure is measured in 4-bit precision, which inflates it relative to the higher-precision figures most comparisons use, and per individual chip Google still trails Nvidia by roughly three to one. Google's actual argument is about scale: its lead lives in tens of thousands of chips wired into one logical cluster over a single fabric, not in any one socket. Whether that wins depends on whether your workloads can be spread across the whole machine.
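The per-chip versus pod-scale argument comes down to simple arithmetic, sketched below. Every number here is a hypothetical placeholder rather than a vendor specification: only the 3:1 per-chip ratio comes from the comparison above, while the cluster sizes, the scaling efficiencies, and the names `pod_throughput` and `break_even_efficiency` are assumptions made for illustration.

```python
# Illustrative only: all values are hypothetical, not measured or vendor figures.

def pod_throughput(per_chip: float, chips: int, scaling_efficiency: float) -> float:
    """Aggregate throughput of a cluster, in the same units as per_chip.

    scaling_efficiency (0..1) is the fraction of ideal linear scaling that a
    workload actually achieves when spread across the cluster.
    """
    return per_chip * chips * scaling_efficiency


def break_even_efficiency(target: float, per_chip: float, chips: int) -> float:
    """Scaling efficiency a cluster needs in order to reach a target throughput."""
    return target / (per_chip * chips)


# Per-chip throughput at a common precision, in arbitrary units.
# Only the 3:1 ratio is taken from the article; the absolute values are made up.
rival_chip = 3.0
tpu_chip = 1.0

# Hypothetical cluster sizes and efficiencies (assumptions, not reported data).
rival_pod = pod_throughput(rival_chip, chips=1_000, scaling_efficiency=0.9)
tpu_pod = pod_throughput(tpu_chip, chips=10_000, scaling_efficiency=0.8)

# Efficiency the larger pod must sustain just to match the smaller one.
needed = break_even_efficiency(rival_pod, tpu_chip, chips=10_000)

print(f"per-chip ratio (rival / TPU): {rival_chip / tpu_chip:.1f}")
print(f"pod ratio (TPU / rival):      {tpu_pod / rival_pod:.2f}")
print(f"break-even scaling efficiency for the TPU pod: {needed:.2f}")
```

Under these made-up inputs the larger pod comes out ahead despite the per-chip deficit, but only while the workload keeps scaling. If `scaling_efficiency` falls below the break-even value, the per-chip gap decides the comparison again.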

For the companies renting this (Anthropic, Google's own labs, and outside cloud customers), the pitch is roughly twice the serving volume at the same cost. If it holds, the consequence is competitive: it gives the one credible alternative to Nvidia a cheaper way to keep pace, by paying Broadcom and MediaTek to help design two specialized chips rather than one chip that does everything adequately.

The lenses

Novelty 3

Impact · breadth 4

Impact · depth 4

Actionable 1

Substance 4

Hype 4

The facts

What: Two purpose-built TPUs, one for training and one for inference, replacing the single design

Availability: On Google Cloud later in 2026; broader silicon timeline reported around 2027

Claim to watch: Roughly 2x serving volume at the same cost (a vendor figure, not independently verified)

Concepts

AI infrastructure · AI chips

Open blog.google →
