concept also: local LLMs, on-device inference, running models locally
Local inference
Running a model on your own hardware rather than calling a hosted API — private, free at the margin, and offline-capable, at the cost of some quality and speed.
Local inference became practical through two advances: small models that punch above their parameter count, and quantisation, which stores weights at lower precision (for example 4 bits instead of 16) so a model fits in consumer memory. Tools now hide the setup behind a single command.
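The arithmetic behind "shrinks them to fit" is simple. Weight memory is roughly parameter count times bits per weight. The sketch below assumes a hypothetical 7-billion-parameter model and counts weights only, ignoring activations and cache. It also shows a deliberately simplified per-tensor symmetric quantiser. Real local-inference tools typically use per-block scales and their own file formats, so this is an illustration of the idea, not any tool's scheme.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def quantise_symmetric(weights: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Simplified per-tensor symmetric quantisation (illustrative assumption).

    Maps each float to a signed integer in [-qmax, qmax] using one shared scale.
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp into the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats; each value is off by at most scale / 2."""
    return [v * scale for v in q]


if __name__ == "__main__":
    n = 7e9  # hypothetical 7-billion-parameter model
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(n, bits):.1f} GB")

    w = [0.12, -0.5, 0.33, 0.01]
    q, s = quantise_symmetric(w, bits=4)
    print(q, [round(x, 3) for x in dequantise(q, s)])
```

Going from 16 to 4 bits cuts weight memory by a factor of four (about 14 GB to 3.5 GB here). The small rounding error per weight is the "some quality" that local inference trades away.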
It sits on the democratisation track, since anyone can now run capable AI privately, and it feeds the proliferation worry, since the same accessibility cuts both ways. Both are true at once.