
Model Distillation

🎯 Fine-Tuning & Distillation · 12 min · 250 BASE XP

Shrink the Cost, Keep the Quality

Model Distillation is the process of using a large, expensive model (teacher) to generate training data, then fine-tuning a smaller, cheaper model (student) to replicate the teacher's behavior.

Distillation Pipeline

  1. Generate: Run GPT-5.4 Thinking on 1,000 real-world queries. Save the outputs.
  2. Curate: Filter for high-quality responses. Remove errors.
  3. Fine-Tune: Train GPT-5.4 Mini on these curated examples.
  4. Evaluate: Compare Mini's outputs to Thinking's on a held-out test set.
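
The generate-and-curate steps above can be sketched in code. This is a minimal illustration, not the lesson's official pipeline: the `is_high_quality` filter and `build_training_jsonl` helper are hypothetical names, and the quality heuristics are placeholder assumptions you would replace with checks specific to your task.

```python
import json

# Hypothetical curation filter: keep teacher responses that are non-empty,
# reasonably long, and free of obvious refusal/error markers. Replace these
# placeholder heuristics with checks suited to your own task.
def is_high_quality(response: str) -> bool:
    bad_markers = ("i'm sorry", "as an ai", "error")
    return len(response) > 40 and not any(m in response.lower() for m in bad_markers)

# Convert (query, teacher_output) pairs into JSONL lines in the chat format
# used for fine-tuning data, dropping low-quality examples along the way.
def build_training_jsonl(pairs):
    lines = []
    for query, teacher_output in pairs:
        if not is_high_quality(teacher_output):
            continue  # Curate: filter out errors and weak responses
        example = {
            "messages": [
                {"role": "user", "content": query},
                {"role": "assistant", "content": teacher_output},
            ]
        }
        lines.append(json.dumps(example))
    return "\n".join(lines)
```

The resulting JSONL file would then be uploaded and used to launch a fine-tuning job on the smaller student model, after which you compare student and teacher outputs on a held-out test set.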

Cost Impact

| Metric | GPT-5.4 Thinking | Distilled Mini | Savings |
| --- | --- | --- | --- |
| Cost per 1M tokens | ~$15 | ~$0.40 | 97% |
| Latency | ~3-8 s | ~0.3 s | 90% |
| Quality (on your task) | 98% | 92-95% | Minimal loss |
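
The savings column can be checked with quick arithmetic (latency uses the low end of the teacher's 3-8 s range):

```python
# Cost savings: ~$15 vs ~$0.40 per 1M tokens, from the table above
teacher_cost, student_cost = 15.00, 0.40
cost_savings = (1 - student_cost / teacher_cost) * 100   # ~97%

# Latency savings: ~3 s (low end) vs ~0.3 s
teacher_latency, student_latency = 3.0, 0.3
latency_savings = (1 - student_latency / teacher_latency) * 100   # 90%
```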
💡 OpenAI Stored Completions: If you set store: true in the Responses API, OpenAI retains your completions server-side. You can then use these stored outputs directly as fine-tuning data for distillation, with no manual data collection needed.
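
A minimal sketch of that tip, assuming the OpenAI Python SDK. The `store_teacher_output` helper and the `gpt-5.4-thinking` model identifier are illustrative assumptions from this lesson's examples, not official names:

```python
# Hypothetical helper: run the teacher model via the Responses API with
# store=True so the completion is persisted server-side, where it can later
# be exported as fine-tuning data for distillation.
def store_teacher_output(client, query, model="gpt-5.4-thinking"):
    return client.responses.create(
        model=model,   # assumed model identifier from this lesson
        input=query,
        store=True,    # persist this completion for later distillation
    )

# Usage sketch (requires an API key):
#   from openai import OpenAI
#   response = store_teacher_output(OpenAI(), "Summarize this contract...")
```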
SYNAPSE VERIFICATION
QUERY 1 // 3
What is model distillation?
Compressing model weights
Using a large model's outputs to train a smaller model to replicate its behavior
Removing unused parameters
Converting to a different format
Model Distillation | Fine-Tuning & Distillation — OpenAI Academy