Mentat curated
Concept. Also: pLM, protein LM, protein sequence model

Protein language model

A transformer trained on amino acid sequences the way a text model is trained on words: it learns which residues tend to appear in which contexts, and what that implies about structure and function.

In a nutshell

A protein is a chain of amino acids, each one a letter in a 20-character alphabet. A protein language model trains on millions of known sequences without any structural labels, predicting masked or next residues the way a text model predicts masked or next words. In doing so it picks up an evolutionary grammar: which substitutions proteins tolerate and which positions are load-bearing. The resulting internal representation turns out to encode three-dimensional structure and, to a degree, function. That makes the model useful for predicting how a mutation changes a protein, or for generating new sequences with desired properties.
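To make the masked-residue idea concrete, here is a minimal sketch in plain Python. It is not how any particular model is implemented: the mask symbol, the function names, and the made-up sequence family are all hypothetical, and a simple per-position frequency table stands in for the probability distribution a trained transformer would output at a masked position. It shows two things: how a training example is built by hiding residues, and how a mutation can be scored as a log-likelihood ratio between the mutant and wild-type residue.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
MASK = "#"  # hypothetical mask symbol for this sketch


def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Hide a random subset of residues.

    Returns the masked sequence and a {position: true residue} dict,
    which is what a masked-residue objective would ask the model to recover.
    """
    rng = rng or random.Random(0)
    n_masked = max(1, round(len(seq) * mask_rate))
    positions = rng.sample(range(len(seq)), n_masked)
    targets = {i: seq[i] for i in positions}
    masked = "".join(MASK if i in targets else aa for i, aa in enumerate(seq))
    return masked, targets


def toy_position_probs(sequences, position, pseudocount=1.0):
    """Stand-in for a model's output at one position.

    Assumption: residue frequencies across equal-length sequences replace
    the distribution a trained model would predict for a masked position.
    """
    counts = {aa: pseudocount for aa in AMINO_ACIDS}
    for s in sequences:
        counts[s[position]] += 1.0
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def mutation_score(probs, wild_type, mutant):
    """Log-likelihood ratio; negative means the mutant is less expected."""
    return math.log(probs[mutant]) - math.log(probs[wild_type])


family = ["MKTAYIAK", "MKTAYLAK", "MKSAYIAK", "MRTAYIAK"]  # made-up sequences
masked, targets = mask_sequence(family[0])
probs = toy_position_probs(family, position=4)  # 'Y' in every sequence
score = mutation_score(probs, wild_type="Y", mutant="P")
print(masked, targets)
print(score)
```

Because position 4 is the same residue in every toy sequence, swapping it for proline scores negative, which is the sketch's version of a "load-bearing" position. With a real model, the frequency table would be replaced by the model's predicted distribution given the rest of the sequence.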

Where it came from

Year: 2019
Source: Rives et al., "Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences" (Meta AI / ESM-1)
Why it mattered: Showed that a transformer pretrained on sequences alone recovered structural and evolutionary signal without structural supervision.
