
SGLang & The Engine Landscape


SGLang: RadixAttention

SGLang takes a different approach to KV cache management than vLLM, organizing the cache as a radix tree.

RadixAttention Explained

Instead of vLLM's block-based paging, SGLang organizes the KV cache in a radix tree (trie) that automatically discovers and reuses shared prefixes across requests; no manual configuration is needed.
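To make the idea concrete, here is a toy sketch of prefix matching over token IDs. It is an uncompressed trie rather than a true radix tree (a radix tree would merge single-child chains into one edge), and real RadixAttention attaches KV-cache blocks to tree nodes rather than a boolean flag; this only illustrates how a shared prefix is discovered:

```python
class RadixNode:
    """Toy trie node keyed by token ID. Illustration only: SGLang's
    real radix tree stores KV-cache entries for each cached prefix."""
    def __init__(self):
        self.children = {}  # token_id -> RadixNode

def insert(root, tokens):
    """Record a request's prompt tokens so later requests can reuse them."""
    node = root
    for t in tokens:
        node = node.children.setdefault(t, RadixNode())

def match_prefix(root, tokens):
    """Return the length of the longest cached prefix of `tokens` --
    those tokens would not need to be prefilled again."""
    node, matched = root, 0
    for t in tokens:
        if t not in node.children:
            break
        node = node.children[t]
        matched += 1
    return matched

root = RadixNode()
insert(root, [1, 2, 3, 4, 5])            # first request's prompt
print(match_prefix(root, [1, 2, 3, 9]))  # -> 3 tokens reusable
```

A new request sharing the first three tokens skips prefill for them; only the divergent suffix is computed.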

| Feature | vLLM (PagedAttention) | SGLang (RadixAttention) |
|---|---|---|
| Cache strategy | Block-based virtual memory | Radix-tree prefix sharing |
| Best for | High-throughput, diverse requests | Prefix-heavy workloads (RAG, multi-turn, agents) |
| Speedup | Baseline | 10-20%+ on prefix-heavy workloads |
| Config | Manual prefix-caching setup | Automatic prefix detection |
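Why do prefix-heavy workloads benefit so much? In a multi-turn chat, every request re-sends the entire conversation history, but with prefix caching only the new suffix needs prefill. A back-of-the-envelope simulation (the per-turn token counts are made-up round numbers, not measurements):

```python
def prefill_tokens(turns, user_tokens=100, reply_tokens=150, cached=True):
    """Total tokens the engine must prefill across a multi-turn chat.
    Hypothetical sizes: each turn adds ~100 prompt and ~150 reply tokens."""
    total = 0
    history = 0  # tokens accumulated in the conversation so far
    for _ in range(turns):
        request = history + user_tokens       # full history is re-sent each turn
        total += user_tokens if cached else request  # cache skips the shared prefix
        history = request + reply_tokens      # reply joins the history
    return total

without = prefill_tokens(5, cached=False)
with_cache = prefill_tokens(5, cached=True)
print(without, with_cache)  # -> 3000 500: 6x less prefill with prefix caching
```

The longer the conversation, the larger the shared prefix and the bigger the win, which is exactly the regime RadixAttention targets.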

When To Use What

| Engine | Best Use Case | Hardware |
|---|---|---|
| Ollama | Local dev, single user, prototyping | Any (CPU/GPU) |
| llama.cpp | CPU inference, edge, hybrid GPU/CPU, max flexibility | Universal |
| vLLM | Production multi-user GPU serving | NVIDIA GPUs |
| SGLang | RAG, multi-turn chat, agentic workloads | NVIDIA GPUs |
| TensorRT-LLM | Maximum throughput on NVIDIA hardware | NVIDIA (Hopper+) |
| ExLlamaV2 | Fastest single-user local inference | High-end NVIDIA |
💡 Rule of Thumb: Start with Ollama for prototyping → graduate to vLLM/SGLang for production → consider TensorRT-LLM only if you need absolute maximum throughput on NVIDIA hardware.
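As a rough sketch of that progression, these are the typical launch commands for the first three stages (the model names are placeholders; check each project's docs for current flags and defaults):

```shell
# Prototyping: Ollama runs a local model with one command
ollama run llama3

# Production GPU serving: vLLM's OpenAI-compatible server
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# Prefix-heavy workloads: SGLang server (RadixAttention is on by default)
python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
```

Both vLLM and SGLang expose an OpenAI-compatible HTTP API, so switching between them usually only means changing the base URL in your client.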
KNOWLEDGE CHECK

QUERY 1 // 2

What data structure does SGLang use for KV cache management?

- Hash table
- Linked list
- Radix tree (trie)
- B-tree
SGLang & The Engine Landscape | SGLang & Alternative Engines — Open Source AI Academy