
Quantization Formats Compared

📐 Quantization Mastery · 12 min · 125 BASE XP

Why Quantize?

A 70B-parameter model in FP16 requires ~140GB of VRAM: 70 billion weights × 2 bytes each, before counting the KV cache and activations. Quantization reduces weight precision to fit models on smaller hardware while preserving most of their quality.
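
The arithmetic is straightforward: parameter count × bits per weight ÷ 8 gives bytes of weight memory. A minimal Python sketch, illustrative only, since real deployments add KV cache, activations, and per-block quantization overhead on top:

```python
# Back-of-the-envelope weight memory for a 70B model at various precisions.
# Illustrative only: ignores KV cache, activations, and the small per-block
# scale/zero-point overhead that real quant formats carry.
def weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e9

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"70B @ {label:4}: ~{weight_vram_gb(70e9, bits):.0f} GB")
# 70B @ FP16: ~140 GB
# 70B @ INT8: ~70 GB
# 70B @ INT4: ~35 GB
```

The GGUF tier table below lists ~40GB rather than 35GB for a 4-bit 70B model; the gap comes from block scales and from K-quants keeping some tensors at higher precision.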

The Decision Matrix

| Format | Best For | Key Advantage | Hardware |
|--------|----------|---------------|----------|
| GGUF | Local / CPU / hybrid | Runs on anything (CPU, Mac, consumer GPU) | Universal |
| AWQ | Production GPU serving | Best quality at 4-bit, vLLM optimized | NVIDIA GPUs |
| GPTQ | Broad GPU inference | Wide ecosystem support, mature | NVIDIA GPUs |
| EXL2 | Maximum speed (single GPU) | Lowest latency for local high-end setups | High-end NVIDIA |
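
To make the AWQ row concrete, here is a minimal serving sketch with vLLM. The checkpoint name is a placeholder, and the call assumes vLLM's `LLM` constructor with its `quantization` flag:

```python
# Minimal sketch: serving an AWQ-quantized checkpoint with vLLM.
# The model ID below is a placeholder -- substitute any AWQ checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="some-org/some-model-AWQ",  # hypothetical AWQ checkpoint
    quantization="awq",               # select vLLM's AWQ kernels
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain 4-bit quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```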

GGUF Quality Tiers

| Quant | Bits/Weight | Quality | 70B VRAM |
|-------|-------------|---------|----------|
| Q8_0 | 8-bit | Near-lossless | ~70GB |
| Q6_K | 6-bit | Excellent | ~54GB |
| Q4_K_M | 4-bit | Great (recommended) | ~40GB |
| Q3_K_S | 3-bit | Acceptable | ~30GB |
| Q2_K | 2-bit | Quality cliff ⚠️ | ~20GB |

⚠️ The 4-Bit Rule: In 2026, 4-bit quantization is the industry standard. Going below 3-bit causes significant quality degradation (the "quality cliff"). If you have VRAM headroom, prefer Q6_K.
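
To run one of these tiers locally, a minimal llama-cpp-python sketch; the file path is a placeholder for any downloaded Q4_K_M GGUF:

```python
# Minimal sketch: running a GGUF quant with llama-cpp-python.
# The path is a placeholder -- point it at any downloaded Q4_K_M file.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/model-70b-Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=-1,   # offload every layer to the GPU if one is present
    n_ctx=4096,        # context window
)
out = llm("Q: What does Q4_K_M mean? A:", max_tokens=64)
print(out["choices"][0]["text"])
```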

Calibration Best Practice

Post-training quantization quality depends on calibration data. For domain-specific use (medical, legal, coding), always calibrate with a sample of your actual production data rather than generic datasets.
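
A hedged sketch of domain calibration with AutoAWQ follows. The model ID and corpus file are placeholders, and it assumes AutoAWQ's `quantize()` accepts a `calib_data` list of strings, as recent versions do:

```python
# Minimal sketch: AWQ quantization calibrated on domain text.
# Model ID and corpus path are placeholders; assumes AutoAWQ's quantize()
# accepts a calib_data list of strings (check your installed version).
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_id = "some-org/some-model"               # hypothetical base model
model = AutoAWQForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Pull calibration samples from your own production data, not a generic set.
with open("domain_samples.txt") as f:          # hypothetical domain corpus
    calib_data = [line.strip() for line in f if line.strip()][:512]

quant_config = {"zero_point": True, "q_group_size": 128,
                "w_bit": 4, "version": "GEMM"}
model.quantize(tokenizer, quant_config=quant_config, calib_data=calib_data)
model.save_quantized("model-awq-domain")
tokenizer.save_pretrained("model-awq-domain")
```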

KNOWLEDGE CHECK
QUERY 1 // 3
Which quantization format is recommended for production GPU serving with vLLM?
GGUF
AWQ
EXL2
FP16