Concept · also: local LLMs, on-device inference, running models locally

Local inference

Running a model on your own hardware rather than calling a hosted API: private, free at the margin (no per-request fees once you have the hardware), and offline-capable, at the cost of some quality and speed.

Knowledge entry · updated June 2026

Local inference became practical through two advances: small models that perform better than their parameter count suggests, and quantisation, which stores weights at lower precision so that a model fits on consumer hardware. Tools now hide the setup behind a single command.
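To make the quantisation point concrete, here is a rough back-of-the-envelope sketch of how much memory a model's weights need at different precisions. It assumes weight storage is simply parameter count times bits per weight. The 7-billion-parameter figure and the function name `weight_memory_gb` are illustrative choices, not taken from any particular model or tool. Real quantised files also carry some overhead (scales, metadata), and inference needs extra memory for activations and context, none of which is modelled here.

```python
# Rough estimate of weight storage at different precisions.
# Assumption: memory = parameters * bits per weight / 8 bytes.
# Overhead from quantisation scales, activations and context is ignored.

def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Hypothetical 7-billion-parameter model, chosen only for illustration.
N_PARAMS = 7e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(N_PARAMS, bits):.1f} GB")
# prints:
# 16-bit weights: 14.0 GB
#  8-bit weights: 7.0 GB
#  4-bit weights: 3.5 GB
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the footprint to a quarter, which is roughly the difference between needing a workstation GPU and fitting in an ordinary laptop's memory.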

Local inference sits on the democratisation track, since anyone can run capable AI privately, and it feeds the proliferation worry, since that same accessibility cuts both ways. Both are true at once.

In themes

Open Models

Open weights, local inference, and running real models on your own machine.

Related concepts

Vision-language model

Mentioned in 1 find

Ollama