For models not available as serverless APIs, or when you need full control, use Managed Compute deployments.
| Aspect | Serverless API | Managed Compute |
|---|---|---|
| Infrastructure | Fully managed by Microsoft | You manage VM quota |
| Billing | Per-token / PTU | Per-hour (VM hosting) |
| Setup | Minutes | 15-30 minutes |
| Control | Limited | Full (GPU type, scaling) |
| Best For | OpenAI models, quick starts | Open-source models, custom configs |
Managed compute uses Azure ML Online Endpoints under the hood, deploying models to VMs with specific GPU SKUs (like A100, H100).