Run open-weights LLMs locally with a one-line install.
Ollama runs open-weights LLMs locally on your laptop with a one-line install. It wraps llama.cpp behind a clean CLI and REST API (including an OpenAI-compatible endpoint), so you can prototype with Llama, Mistral, Qwen, DeepSeek, and dozens of other models without a cloud bill. We use it in the bootcamp during prompt-engineering exercises so students can iterate freely without burning API credits.
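To give a feel for the REST side, here is a minimal sketch that asks a locally pulled model for a completion. It assumes Ollama is serving on its default port (11434) and uses llama3.2 purely as an example model tag; swap in whatever you have pulled.

```python
# Minimal sketch: one-shot completion against a local Ollama server.
# Assumes `ollama serve` is running and a model has been pulled, e.g.
# `ollama pull llama3.2` -- the model name below is just an example.
import json
import urllib.request

payload = {
    "model": "llama3.2",  # any locally pulled model tag
    "prompt": "Explain beam search in two sentences.",
    "stream": False,      # return a single JSON object instead of a stream
}

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

print(body["response"])   # the generated completion text
```

The same server also exposes an OpenAI-compatible /v1/chat/completions endpoint, so existing client code usually ports over with little more than a base-URL change.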
Reach for it when data must not leave your machine, when you're prototyping at high volume on your own GPU, or when you're experimenting with prompt structure and want free, unlimited iteration. For production traffic, hosted APIs usually still win on quality and ops cost.