Concept (also: AI safety, value alignment)

AI alignment

The problem of building AI systems that reliably do what their designers intend — and not something adjacent, worse, or subtly different.

Learn first

Artificial general intelligence, Supervised fine-tuning

Leads to

Autonomous vulnerability research, Defense AI

In a nutshell

A model optimises for whatever its training signal measures. If that measure is imperfect (and it always is), the model finds ways to score well that were not intended. Alignment is the work of closing that gap: ensuring the system's actual objective matches the designers' real intent, at every capability level. The hard part is that the gap often stays invisible until the system is powerful enough for it to matter. Small misspecifications in goals or reward signals become larger problems as capability grows, not smaller ones.
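A toy sketch can make the gap concrete. Everything in it is an illustrative assumption rather than a real training setup: `true_value` stands in for what the designers want, `proxy_reward` for what training measures, and `capability` for how wide a range of actions the system can reach. The two measures agree for small actions, so the gap only appears once the system can reach further.

```python
def true_value(x: float) -> float:
    """Hypothetical designer intent: more is better up to x = 5, then harmful."""
    return x - 0.1 * x * x


def proxy_reward(x: float) -> float:
    """Hypothetical training measure: rewards more of x without limit."""
    return x


def best_action(capability: int) -> int:
    """Pick the reachable action that scores highest on the proxy."""
    actions = range(capability + 1)  # a more capable system reaches further
    return max(actions, key=proxy_reward)


def alignment_gap(capability: int) -> float:
    """True value lost by optimising the proxy instead of the real intent."""
    actions = range(capability + 1)
    ideal = max(true_value(a) for a in actions)
    chosen = best_action(capability)
    return ideal - true_value(chosen)


for capability in (2, 5, 10, 20):
    print(capability, round(alignment_gap(capability), 2))
```

In this sketch the gap is zero at capability 2 and 5, because the proxy and the intent still point the same way, and it grows once the system can reach actions beyond the point where they diverge. The specific functions are arbitrary; only the shape of the effect is the point.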

Where it came from

Year: 2008

Source: Eliezer Yudkowsky and the Machine Intelligence Research Institute

Why it mattered: The term 'alignment' was formalised in this period to describe the technical problem of specifying and maintaining correct goals in advanced AI systems.

In megatrends

Artificial Intelligence

Models, agents, and AI–human collaboration — general-purpose capability scaling into every domain.

Related players

◆ Claude Mythos ◆ Claude Opus 4.8

Finds citing this concept

Project Glasswing

The pause Anthropic deleted

Teaching the model why

Claude Opus 4.8

The sandwich in the park

Safety's rounding error

How this connects