Concept. Also known as: text-to-video, video synthesis, video diffusion

Video generation

A model that produces video clips from a text prompt or image by learning the statistical structure of large video datasets — extending image generation across time.

Learn first

Image generation, Scaling laws, World model

In a nutshell

A video generation model takes a text description, a still image, or both, and produces a sequence of frames that match it. The core problem is temporal coherence: each frame must follow plausibly from the last, objects must persist, and motion must be physically believable — constraints that don't exist for single images. Current systems extend diffusion or autoregressive approaches to the time axis. The hard part is that errors compound: a small geometry mistake in frame two drifts into an incoherent scene by frame twenty.
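The compounding-error point can be illustrated with a toy rollout. This is a minimal sketch and not any real video model: a "frame" is reduced to the 1-D position of a single object, and the hypothetical `rollout` function and its `step_error` parameter are assumptions made up for illustration. The stand-in model extrapolates the motion it last produced and slightly overestimates it, so each mistake becomes part of the input for the next frame.

```python
def rollout(num_frames, step_error):
    """Toy autoregressive rollout: each frame is predicted from the two before it.

    A "frame" here is just the 1-D position of one object moving at constant
    speed. The hypothetical model extrapolates the last generated motion but
    overestimates it by a factor of (1 + step_error).
    Returns the absolute drift from the true position at every frame.
    """
    true_speed = 1.0
    truth = [true_speed * t for t in range(num_frames)]
    # The first two frames are given as conditioning (e.g. from an input clip).
    generated = truth[:2]
    for _ in range(2, num_frames):
        last_motion = generated[-1] - generated[-2]
        # The model's own previous output is its input, so its error feeds back.
        generated.append(generated[-1] + last_motion * (1.0 + step_error))
    return [abs(g - t) for g, t in zip(generated, truth)]


drift = rollout(num_frames=21, step_error=0.05)
for frame in (2, 5, 10, 20):
    print(f"frame {frame:2d}: drift = {drift[frame]:.3f}")
```

With a 5% per-step overestimate, the drift at frame twenty is far larger than nineteen copies of the frame-two error, because every step builds on an already inflated motion. Setting `step_error=0.0` keeps the rollout on the true trajectory.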

Where it came from

Year: 2022

Source: Ho et al., Video Diffusion Models (NeurIPS 2022)

Why it mattered: Extended the image diffusion framework to jointly model frames, establishing the template most subsequent video generators follow.

In megatrends

Artificial Intelligence

Models, agents, and AI–human collaboration — general-purpose capability scaling into every domain.

Related players

Finds citing this concept

China takes the video-model lead
