Mentatcurated
▸ Concept

Supervised fine-tuning

Training a pre-trained model further on labelled examples to shift its behaviour toward a specific task.

Learn first

Leads to

In a nutshell

A pre-trained model has learned broad patterns from vast data. Supervised fine-tuning (SFT) continues training that model on a curated set of input–output pairs — question/answer, instruction/response — so the weights tilt toward the desired behaviour. The hard part is the data: a small set of high-quality demonstrations outperforms a large noisy one, but "quality" is expensive to define and gather. SFT can also overfit — the model memorises the examples instead of generalising the intent. It is the first adaptation step between raw pre-training and a model that follows instructions.

How this connects

Tap a node to open it