Ollama: your local model rack
Ollama is a model runtime that makes pulling and running open-weight LLMs as easy as docker run. It is the fastest path to "I am running serious models on my own hardware."
Install
macOS / Linux:
curl -fsSL https://ollama.com/install.sh | sh
ollama serve # auto-starts on macOS
Windows: install the desktop app, it ships a server on 127.0.0.1:11434.
Pull a daily-driver model
ollama pull llama3.1:8b-instruct-q4_K_M # general
ollama pull qwen2.5-coder:14b # coding
ollama pull nomic-embed-text # embeddings
A 4-bit 8B chat model and a 4-bit 14B coder will run comfortably on a 16 GB consumer GPU or even a beefy CPU-only laptop.
Wire it into OpenClaw
In your model registry:
models:
llama-3-1-8b:
provider: ollama
base_url: http://127.0.0.1:11434
name: llama3.1:8b-instruct-q4_K_M
cost_per_1k_input: 0
cost_per_1k_output: 0
Now Ollama models are first-class citizens in your routing rules.
When local wins
- Heartbeat ticks (cost compounds, quality bar low)
- Internal classification ("is this email spam, urgent, or routine?")
- Privacy-sensitive content (local-only journal entries, draft contracts)
When local loses
- Open-ended reasoning over long context
- Tool-heavy multi-step tasks (frontier tool-calling is still better)
- Anything you would not trust a junior model for
Try it
Switch your Heartbeat's morning-briefing tick from a frontier model to llama 3.1 8B. Compare quality for a week. Most users keep the local version.