Concept (also: AI safety, value alignment)
AI alignment
The problem of building AI systems that reliably do what their designers intend — and not something adjacent, worse, or subtly different.
In a nutshell
A model optimises for what it is trained to measure. If the measure is imperfect (and it always is), the model finds ways to score well that were not intended. Alignment is the work of closing that gap: ensuring the system's actual objective matches the designers' real intent at every capability level. The hard part is that the gap often stays invisible until the system is powerful enough for it to matter. Small misspecifications in goals or reward signals become larger problems as capability grows, not smaller ones, because a stronger optimiser searches further and finds more of the cases where the measure and the intent come apart.
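To make the capability point concrete, here is a minimal Python sketch of a misspecified objective. It is a toy under stated assumptions, not a model of any real training setup: an action is a single number `x`, the designers' true objective (`true_objective`, a hypothetical name like every other name here) peaks at `x = 1`, and the proxy reward the system is actually trained on (`proxy_reward`) agrees with it for small `x` but keeps rewarding larger values. "Capability" is modelled, again as an assumption, as the width of the range the optimiser is able to search.

```python
"""Toy illustration of reward misspecification (Goodhart's law).

All names and functions here are hypothetical assumptions: an 'action'
is a single number x >= 0, the designers' true objective peaks at x = 1,
and the proxy reward the system is trained on says 'more x is better'.
"""


def true_objective(x: float) -> float:
    # What the designers actually want: improves up to x = 1, then degrades.
    return x - 0.5 * x * x


def proxy_reward(x: float) -> float:
    # The measured signal: agrees with the true objective for small x,
    # but keeps rewarding larger x without limit.
    return x


def optimise(reward, capability: float, steps: int = 1000) -> float:
    # A more capable optimiser can search a wider range of actions.
    # Returns the grid point in [0, capability] with the highest reward.
    candidates = [capability * i / steps for i in range(steps + 1)]
    return max(candidates, key=reward)


if __name__ == "__main__":
    for capability in (0.5, 1.0, 2.0, 4.0):
        x = optimise(proxy_reward, capability)
        print(
            f"capability={capability}: action={x:.2f} "
            f"proxy={proxy_reward(x):.2f} true={true_objective(x):.2f}"
        )
```

Running it, the action chosen under the proxy grows with capability. The true score improves up to capability 1.0, falls to zero at 2.0 and turns negative at 4.0, while the proxy score keeps climbing. Calling `optimise(true_objective, 4.0)` instead settles on `x = 1.0`; the distance between those two answers is the gap alignment work tries to close.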
Where it came from
Year: 2008
Source: Eliezer Yudkowsky and the Machine Intelligence Research Institute (then called the Singularity Institute for Artificial Intelligence)
Why it mattered: The term 'alignment' was formalised in this period to describe the technical problem of specifying and maintaining correct goals in advanced AI systems.
How this connects
Artificial Intelligence · Project Glasswing · The pause Anthropic deleted · Claude Opus 4.8 · Teaching the model why · Claude Mythos · The sandwich in the park · Safety's rounding error · Artificial general intelligence · Supervised fine-tuning · Autonomous vulnerability research · Demis Hassabis · Elon Musk · Google · Jensen Huang · NVIDIA · Alphabet · Amazon · Google Research · Hugging Face





