
Ollama Quickstart

🦙 Ollama: Local AI · 10 min · 100 BASE XP

One-Command LLM Deployment

Ollama is the easiest way to run open-source models locally. It handles downloading, quantization, GPU detection, and API serving automatically.

Getting Started

# Install (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# Run a model (auto-downloads on first use)
ollama run llama4:scout
ollama run mistral-large
ollama run qwen3:32b
ollama run gemma3:4b

OpenAI-Compatible API

Ollama exposes an OpenAI-compatible API on localhost:11434, so the OpenAI SDK works with just a new base URL. The api_key is required by the SDK but ignored by Ollama, so any non-empty string will do:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
response = client.chat.completions.create(
    model="mistral-large",
    messages=[{"role": "user", "content": "Explain Docker networking"}]
)
print(response.choices[0].message.content)
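Besides the OpenAI-compatible route, Ollama also serves its own native REST API at /api/chat. A minimal standard-library sketch (build_chat_request and chat are illustrative helper names, not part of any SDK), assuming a server on the default port:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's native chat endpoint

def build_chat_request(model, prompt):
    """Build the JSON payload for Ollama's native /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON object instead of a stream
    }

def chat(model, prompt):
    """POST the payload; requires a running Ollama server on localhost:11434."""
    data = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Example (needs a running server):
# print(chat("mistral-large", "Explain Docker networking"))
```

With "stream": True (the default), the endpoint instead returns one JSON object per line as tokens arrive.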

Custom Modelfiles

# Modelfile
FROM mistral-small:latest
SYSTEM "You are a senior DevOps engineer. Always provide Docker and Kubernetes examples."
PARAMETER temperature 0.3
PARAMETER num_ctx 32768

Build it with ollama create devops-assistant -f Modelfile, then chat with ollama run devops-assistant.
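If you generate Modelfiles programmatically (say, one per team role), the template above can be rendered from Python. A small sketch; render_modelfile is a hypothetical helper, not part of Ollama's tooling:

```python
def render_modelfile(base, system_prompt, temperature=0.3, num_ctx=32768):
    """Render a Modelfile string with the same directives used above."""
    return "\n".join([
        f"FROM {base}",
        f'SYSTEM "{system_prompt}"',
        f"PARAMETER temperature {temperature}",
        f"PARAMETER num_ctx {num_ctx}",
    ])

# Write it out, then build as usual (requires ollama on PATH):
#   pathlib.Path("Modelfile").write_text(render_modelfile(
#       "mistral-small:latest", "You are a senior DevOps engineer."))
#   subprocess.run(["ollama", "create", "devops-assistant", "-f", "Modelfile"])
```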

🐳 Container Deployment:
# Persist models in the "ollama" named volume and expose the API
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull mistral-large
Now any app on your network can call http://host:11434/v1/chat/completions (replace host with the container host's address).
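To verify which models a local or containerized instance has pulled, Ollama's native /api/tags endpoint lists them as JSON. A sketch that separates URL building and response parsing so the network call stays optional (tags_url, parse_models, and list_models are illustrative helper names; the sample JSON is abbreviated):

```python
import json
import urllib.request

def tags_url(host="localhost", port=11434):
    """URL of Ollama's native model-listing endpoint."""
    return f"http://{host}:{port}/api/tags"

def parse_models(body):
    """Extract model names from an /api/tags JSON response body."""
    return [m["name"] for m in json.loads(body)["models"]]

def list_models(host="localhost", port=11434):
    """Fetch and parse; requires a reachable Ollama instance."""
    with urllib.request.urlopen(tags_url(host, port)) as resp:
        return parse_models(resp.read())

# Example (needs a running server):
# print(list_models())
```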
KNOWLEDGE CHECK
QUERY 1 // 2
What port does Ollama's local API run on by default?
8080
3000
11434
5000
Ollama Quickstart | Ollama: Local AI — Open Source AI Academy