Serverless API deployments are the simplest way to consume models: Microsoft hosts the infrastructure, and you just call the endpoint.
| Tier | Billing | Best For | Data Processing |
|---|---|---|---|
| Standard | Pay-per-token | Development, variable workloads | Global (any region) |
| Provisioned (PTU) | Reserved capacity | Production, predictable throughput | Specific region |
| Data Zone | Pay-per-token | EU/US data residency compliance | Within zone (EU or US) |
| Batch | 50% discount | Async bulk processing | Non-real-time |
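As a rough illustration of the Batch discount in the table above, token charges halve relative to Standard pay-per-token pricing. A minimal sketch (the per-1K-token price below is a hypothetical placeholder, not a published rate):

```python
def token_cost(tokens: int, price_per_1k: float, batch: bool = False) -> float:
    """Estimate spend for a token count; the Batch tier bills at a 50% discount."""
    rate = price_per_1k * (0.5 if batch else 1.0)
    return tokens / 1000 * rate

# Hypothetical price of $0.01 per 1K tokens.
standard = token_cost(2_000_000, 0.01)           # 2M tokens on Standard -> $20.00
batch = token_cost(2_000_000, 0.01, batch=True)  # same volume on Batch  -> $10.00
```

The trade-off is latency: Batch jobs are processed asynchronously, so the discount only makes sense for workloads that can tolerate delayed results.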
Via the Azure CLI:

```bash
az cognitiveservices account deployment create \
  --name my-foundry \
  --resource-group my-rg \
  --deployment-name gpt4o-deploy \
  --model-name gpt-4o \
  --model-version "2024-11-20" \
  --sku-name "Standard" \
  --sku-capacity 10
```
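Once the deployment exists, calling it is a single HTTPS request against the resource endpoint. A minimal sketch that assembles an Azure OpenAI chat completions request (the resource and deployment names mirror the CLI example above; the API key placeholder and `api_version` value are assumptions you should replace with your own):

```python
def build_chat_request(resource: str, deployment: str, api_key: str,
                       messages: list, api_version: str = "2024-10-21"):
    """Assemble URL, headers, and JSON body for an Azure OpenAI chat
    completions call; send with requests.post(url, headers=headers, json=body)."""
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/chat/completions?api-version={api_version}")
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    body = {"messages": messages}
    return url, headers, body

url, headers, body = build_chat_request(
    "my-foundry", "gpt4o-deploy", "<YOUR_API_KEY>",
    [{"role": "user", "content": "Hello"}])
```

Note that the URL path uses the *deployment* name (`gpt4o-deploy`), not the underlying model name: billing and routing are tied to the deployment you created.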