Serverless API Deployments

Pay-Per-Token Model Access

Serverless API deployments are the simplest way to use models: Microsoft hosts the infrastructure, and you call an endpoint and pay per token.
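
As a rough illustration of what "calling the endpoint" can look like, here is a minimal sketch using curl. It assumes an Azure OpenAI-style chat completions route, the resource name `my-foundry` and deployment name `gpt4o-deploy` used in the CLI example in this lesson, API version `2024-10-21`, and a key stored in `AZURE_OPENAI_API_KEY`. The helper names `build_chat_url` and `chat_once` are hypothetical, and your resource's actual endpoint host may differ from the one built here.

```shell
#!/usr/bin/env bash
# Sketch only: host pattern, API version, and helper names are assumptions.

# Build the request URL for a resource and deployment name.
build_chat_url() {
  local resource="$1" deployment="$2" api_version="${3:-2024-10-21}"
  printf 'https://%s.openai.azure.com/openai/deployments/%s/chat/completions?api-version=%s\n' \
    "$resource" "$deployment" "$api_version"
}

# Send one user message. Requires curl and a valid key; it is defined here
# but not called. The prompt must not contain double quotes or backslashes,
# because it is pasted into the JSON body without escaping.
chat_once() {
  local url="$1" prompt="$2"
  curl -sS "$url" \
    -H "Content-Type: application/json" \
    -H "api-key: ${AZURE_OPENAI_API_KEY:?set AZURE_OPENAI_API_KEY first}" \
    -d "{\"messages\":[{\"role\":\"user\",\"content\":\"$prompt\"}]}"
}

# Print the URL that chat_once would call.
build_chat_url my-foundry gpt4o-deploy
```

With a real key exported, `chat_once "$(build_chat_url my-foundry gpt4o-deploy)" "Hello"` would send a single request, and each such call is billed by the tokens it consumes.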

Deployment Tiers

| Tier | Billing | Best For | Data Processing |
|---|---|---|---|
| Standard | Pay-per-token | Development, variable workloads | Global (any region) |
| Provisioned (PTU) | Reserved capacity | Production, predictable throughput | Specific region |
| Data Zone | Pay-per-token | EU/US data residency compliance | Within zone (EU or US) |
| Batch | 50% discount | Async bulk processing | Non-real-time |
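
The tiers above can be read as a simple decision rule. The sketch below is one possible encoding, not an official selection algorithm: the function name `choose_tier`, its yes/no arguments, and especially the priority order (residency first, then async, then predictable load) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: choose_tier and its priority order are hypothetical.
# Arguments are "yes" or "no":
#   $1 needs EU/US data residency
#   $2 workload is async bulk (no real-time response needed)
#   $3 production load is known and predictable
choose_tier() {
  local residency="$1" async="$2" predictable="$3"
  if [ "$residency" = "yes" ]; then
    echo "Data Zone"
  elif [ "$async" = "yes" ]; then
    echo "Batch"
  elif [ "$predictable" = "yes" ]; then
    echo "Provisioned (PTU)"
  else
    echo "Standard"
  fi
}

# A development workload with no special requirements.
choose_tier no no no
```

Putting residency first reflects the idea that a compliance requirement usually outranks a cost preference; reorder the branches if your constraints differ.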

Creating a Serverless Deployment

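# Via Azure CLI: create a deployment named gpt4o-deploy on the my-foundry resource.
# --model-format is a required argument of this command; OpenAI is assumed here for gpt-4o.
az cognitiveservices account deployment create \
  --name my-foundry \
  --resource-group my-rg \
  --deployment-name gpt4o-deploy \
  --model-name gpt-4o \
  --model-version "2024-11-20" \
  --model-format OpenAI \
  --sku-name "Standard" \
  --sku-capacity 10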

🎯 Pro Tip: Start with the Standard tier for development, where you pay only for the tokens you use. Once you know your production load, switch to Provisioned (PTU) for guaranteed throughput and predictable costs.
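
To make the cost side of the tip concrete, here is a small break-even sketch in integer cents. Every number in it is a placeholder rather than a real Azure price, and the function names `monthly_token_cost_cents` and `cheaper_option` are hypothetical. It compares cost only; the throughput guarantee that comes with PTU is a separate consideration.

```shell
#!/usr/bin/env bash
# Sketch only: all prices are hypothetical placeholders, not Azure pricing.

# Monthly pay-per-token cost in whole cents.
#   $1 tokens per month
#   $2 price in cents per 1,000,000 tokens
monthly_token_cost_cents() {
  local tokens="$1" cents_per_million="$2"
  echo $(( tokens * cents_per_million / 1000000 ))
}

# Name the cheaper option given a fixed monthly reserved cost in cents ($3).
cheaper_option() {
  local tokens="$1" cents_per_million="$2" reserved_cents="$3"
  local pay_per_token
  pay_per_token=$(monthly_token_cost_cents "$tokens" "$cents_per_million")
  if [ "$pay_per_token" -gt "$reserved_cents" ]; then
    echo "Provisioned (PTU)"
  else
    echo "Standard"
  fi
}

# 2 billion tokens at 500 cents per million costs 1,000,000 cents,
# which exceeds a reserved cost of 600,000 cents.
cheaper_option 2000000000 500 600000   # prints "Provisioned (PTU)"
```

At a lower volume, such as 100 million tokens with the same placeholder prices, the pay-per-token cost is 50,000 cents and the function prints "Standard".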
FOUNDRY VERIFICATION
QUERY 1 // 2
Which deployment tier offers a 50% cost discount for bulk processing?
Standard
Provisioned
Data Zone
Batch