
Production Monitoring & Alerts

📊 Observability & Monitoring · 9 min

Keeping AI Systems Healthy

Key Metrics to Monitor

Metric                  | What It Tells You                  | Alert Threshold
Latency (P95)           | Response time for 95th percentile  | > 5 seconds
Token Usage             | Input/output tokens per request    | > budget threshold
Error Rate              | Percentage of failed requests      | > 2%
Content Filter Triggers | How often safety filters activate  | Unusual spike
Groundedness Score      | Average quality of RAG responses   | < 3.5/5
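A minimal sketch of how these thresholds could be checked over a batch of request telemetry. The record field names (latency_ms, error, groundedness, tokens) and the token budget default are illustrative assumptions, not a fixed schema:

```python
# Minimal alert-threshold check over a batch of request telemetry.
# Field names and the token budget are illustrative assumptions.
from statistics import quantiles

def check_alerts(requests, token_budget=100_000):
    latencies = sorted(r["latency_ms"] for r in requests)
    # P95 latency: 95th percentile of response times
    p95 = quantiles(latencies, n=100)[94]
    error_rate = sum(r["error"] for r in requests) / len(requests)
    avg_groundedness = sum(r["groundedness"] for r in requests) / len(requests)
    total_tokens = sum(r["tokens"] for r in requests)

    alerts = []
    if p95 > 5_000:                  # > 5 seconds
        alerts.append(f"P95 latency {p95:.0f} ms exceeds 5s")
    if error_rate > 0.02:            # > 2%
        alerts.append(f"Error rate {error_rate:.1%} exceeds 2%")
    if avg_groundedness < 3.5:       # < 3.5/5
        alerts.append(f"Groundedness {avg_groundedness:.2f} below 3.5")
    if total_tokens > token_budget:  # budget threshold
        alerts.append(f"Token usage {total_tokens} over budget")
    return alerts
```

In practice these checks run inside your monitoring platform rather than application code, but the threshold logic is the same.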

KQL Query Examples

// Find slow agent runs (> 10 seconds)
// customDimensions values are strings, so cast them before comparing
traces
| where timestamp > ago(24h)
| extend duration = todouble(customDimensions.duration_ms),
         tokens = tolong(customDimensions.total_tokens)
| where duration > 10000
| project timestamp, operation_Name, duration, tokens
| order by duration desc
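The same filter-project-sort pipeline can be sketched in Python over exported trace records. The record shape below (a timestamp plus a customDimensions dict of strings) mirrors the query and is an assumption about how the data is exported:

```python
# Mirror of the KQL pipeline: filter traces from the last 24 hours with
# duration > 10s, keep a few columns, and sort by duration descending.
from datetime import datetime, timedelta, timezone

def slow_agent_runs(traces, threshold_ms=10_000):
    cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
    rows = [
        {
            "timestamp": t["timestamp"],
            "operation_Name": t["operation_Name"],
            # customDimensions values arrive as strings, so cast them
            "duration": float(t["customDimensions"]["duration_ms"]),
            "tokens": int(t["customDimensions"]["total_tokens"]),
        }
        for t in traces
        if t["timestamp"] > cutoff
        and float(t["customDimensions"]["duration_ms"]) > threshold_ms
    ]
    return sorted(rows, key=lambda r: r["duration"], reverse=True)
```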
💡 Key Insight: Set up continuous evaluation alongside performance monitoring. A fast response that's wrong is worse than a slow response that's correct. Monitor quality metrics (groundedness, relevance) in production, not just latency and errors.
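One way to act on that insight is a rolling-window quality monitor: score each response with an evaluator, then alert when the recent average drops below the 3.5/5 threshold from the table. The window size and class shape here are illustrative assumptions:

```python
# Rolling-window quality monitor: alert when the average groundedness of
# the last N scored responses drops below the 3.5/5 threshold.
from collections import deque

class GroundednessMonitor:
    def __init__(self, window=50, threshold=3.5):
        self.scores = deque(maxlen=window)
        self.threshold = threshold

    def record(self, score):
        """Add one evaluator score (1-5); return True if the alert fires."""
        self.scores.append(score)
        avg = sum(self.scores) / len(self.scores)
        # Only alert once the window is full, to avoid noisy cold starts
        return len(self.scores) == self.scores.maxlen and avg < self.threshold
```

Sampling a fraction of production traffic for evaluation keeps the cost of this continuous scoring manageable.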
FOUNDRY VERIFICATION
QUERY 1
Why should you monitor 'Groundedness Score' in production alongside latency?
  A. It's required by Azure
  B. A fast response that's incorrect is worse than a slow correct one — quality metrics catch degradation
  C. It reduces costs
  D. It improves model speed
Production Monitoring & Alerts | Observability & Monitoring — Azure Foundry Academy