The model rewrote a third of the protein
OpenAI's protein language model redesigned two of the factors that turn an adult cell back into a stem cell — and the winning variants differ from the natural proteins by more than a hundred amino acids.
Protein engineers usually edit a few residues at a time, because a transcription factor is a folded machine and most changes break it. The model OpenAI built with Retro Biosciences ignored that caution. Trained mostly on protein sequences, it rewrote roughly a third of SOX2 and KLF4, two of the four Yamanaka factors that revert an adult cell to a stem-cell state, and the redesigned versions drove expression of pluripotency markers more than fifty times higher than the originals did in human cells.
The surprise isn't the multiple; it's the method. AlphaFold and its successors predict structure. This was a language model, a cousin of the one behind ChatGPT, proposing radical sequence changes a human engineer wouldn't risk. Decades of hand-tuning the reprogramming cocktail had barely moved the needle. The model jumped past that plateau by being willing to rewrite the part everyone treats as untouchable, and the rewritten proteins worked better, not worse.
Two cautions belong next to the number. First, the fifty-fold gain is in marker expression in a dish, which is the easy endpoint; it is not whole-cell reprogramming efficiency, and nothing has yet been shown in a living animal. Second, the result arrived as a company blog post, not a peer-reviewed paper or even a public preprint. Tumorigenicity, scale-up, and in-vivo behaviour are all unproven, and Sam Altman, OpenAI's chief executive, has personally put around $180 million into Retro. What it does establish, if it holds, is narrower and still large: a general-purpose language model can do functional protein design, not just read structure.