Concept. Also: text-to-video, video synthesis, video diffusion
Video generation
A model that produces video clips from a text prompt, an image, or both by learning the statistical structure of large video datasets, extending image generation across time.
In a nutshell
A video generation model takes a text description, a still image, or both, and produces a sequence of frames that match it. The core problem is temporal coherence: each frame must follow plausibly from the last, objects must persist, and motion must be physically believable. None of these constraints apply to a single image. Current systems extend diffusion or autoregressive approaches along the time axis. The hard part is that errors compound: a small geometry mistake in frame two can drift into an incoherent scene by frame twenty.
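The compounding-error point can be made concrete with a toy sketch. The code below is not a video model: it tracks a single object position per frame, and the function names and the `step_error` bias are hypothetical values chosen purely for illustration. It assumes an autoregressive setup in which each frame is predicted from the previous prediction rather than from ground truth.

```python
# Toy illustration only: a 1-D "object" moving at constant velocity.
# Not a real video model; all names and constants here are hypothetical.

def true_position(frame: int, start: float = 0.0, velocity: float = 1.0) -> float:
    """Ground-truth position of the object at a given frame index."""
    return start + velocity * frame


def autoregressive_rollout(num_frames: int, start: float = 0.0,
                           velocity: float = 1.0,
                           step_error: float = 0.05) -> list[float]:
    """Predict each frame from the previous prediction, not from the truth.

    step_error is an assumed small per-step bias in the predicted motion.
    """
    positions = [start]
    for _ in range(1, num_frames):
        # Each step inherits every earlier mistake and adds its own.
        positions.append(positions[-1] + velocity + step_error)
    return positions


def drift(num_frames: int = 21, step_error: float = 0.05) -> list[float]:
    """Absolute gap between predicted and true position at each frame."""
    predicted = autoregressive_rollout(num_frames, step_error=step_error)
    return [abs(p - true_position(i)) for i, p in enumerate(predicted)]


if __name__ == "__main__":
    errors = drift()
    print(f"frame 2 drift:  {errors[2]:.2f}")
    print(f"frame 20 drift: {errors[20]:.2f}")
```

In this simplified setup the gap grows linearly with the frame index, because a constant bias is added at every step. Real generators work on full frames and their errors need not grow linearly, but the mechanism is the same: later frames are conditioned on earlier output, so early mistakes are never corrected.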
Where it came from
Year: 2022
Source: Ho et al., Video Diffusion Models (NeurIPS 2022)
Why it mattered: Extended the image diffusion framework to jointly model frames, establishing the template most subsequent video generators follow.
