# Keeping AI Systems Healthy
## Key Metrics to Monitor
| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| Latency (P95) | Response time at the 95th percentile | > 5 seconds |
| Token Usage | Input/output tokens per request | > budget threshold |
| Error Rate | Percentage of failed requests | > 2% |
| Content Filter Triggers | How often safety filters activate | Unusual spike |
| Groundedness Score | Average quality of RAG responses | < 3.5/5 |
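To make the first two thresholds concrete, here is a minimal Python sketch that checks a batch of request records against them. The record fields (`latency_s`, `failed`) and the sample data are hypothetical, not tied to any particular monitoring SDK:

```python
import math

def p95(values):
    """95th percentile via the nearest-rank method on a sorted copy."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered)) - 1
    return ordered[rank]

def check_thresholds(requests):
    """Return the names of breached alerts for a batch of request records.

    Field names are illustrative: `latency_s` is response time in seconds,
    `failed` marks an errored request.
    """
    latencies = [r["latency_s"] for r in requests]
    errors = sum(1 for r in requests if r["failed"])
    alerts = []
    if p95(latencies) > 5.0:              # Latency (P95) > 5 seconds
        alerts.append("latency")
    if errors / len(requests) > 0.02:     # Error rate > 2%
        alerts.append("error_rate")
    return alerts

# Hypothetical batch: 94 healthy requests, 6 slow failures (6% error rate)
sample = [{"latency_s": 0.8, "failed": False} for _ in range(94)]
sample += [{"latency_s": 7.2, "failed": True} for _ in range(6)]
print(check_thresholds(sample))  # → ['latency', 'error_rate']
```

In production these checks would run inside your monitoring platform's alert rules rather than application code, but the arithmetic is the same.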
## KQL Query Examples
```kusto
// Find slow agent runs (> 10 seconds)
// customDimensions values are dynamic, so convert before comparing numerically
traces
| where timestamp > ago(24h)
| extend duration = tolong(customDimensions.duration_ms)
| where duration > 10000
| project timestamp, operation_Name,
    duration,
    tokens = tolong(customDimensions.total_tokens)
| order by duration desc
```
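A similar query can watch the groundedness threshold from the metrics table. The custom-dimension key below (`groundedness_score`) is a placeholder; substitute whatever field your evaluation pipeline actually logs:

```kusto
// Daily average groundedness; surface days below the 3.5/5 threshold
traces
| where timestamp > ago(7d)
| extend groundedness = todouble(customDimensions["groundedness_score"])
| where isnotnull(groundedness)
| summarize avg_groundedness = avg(groundedness) by bin(timestamp, 1d)
| where avg_groundedness < 3.5
| order by timestamp asc
```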
> 💡 **Key Insight:** Set up continuous evaluation alongside performance monitoring. A fast response that's wrong is worse than a slow response that's correct. Monitor quality metrics (groundedness, relevance) in production, not just latency and errors.