Automated Agent Testing
The ASSERT Framework (Automated Safety & Security Evaluation for Responsible Testing) is Microsoft's approach to systematically testing AI agents against natural language policies. Unlike traditional unit tests that check specific inputs/outputs, ASSERT evaluates whether agents comply with business rules expressed in plain English.
How ASSERT Works
- Define Policies: Write safety and business rules in natural language (e.g., "The agent must never reveal internal pricing formulas" or "The agent must escalate to a human if the customer mentions legal action")
- Generate Test Cases: ASSERT automatically generates adversarial test scenarios designed to probe each policy boundary
- Execute & Evaluate: The framework runs agents through generated scenarios and evaluates compliance using an LLM-as-judge approach
- Report: Produces detailed compliance reports showing which policies passed, failed, or were borderline
Agent Control Specification
The Agent Control Specification complements ASSERT by providing a standardised format for defining agent identity, permitted actions, and policy enforcement at deployment time. It acts as a "constitution" for your agent:
| Component | Purpose | Example |
| Identity | Who the agent is | Name, role, domain boundaries |
| Permissions | What the agent can do | Allowed tools, data sources, actions |
| Policies | Rules the agent must follow | Escalation triggers, prohibited outputs, compliance rules |
| Boundaries | Hard limits on behaviour | Maximum spend per action, rate limits, geographic restrictions |
💡 Key Insight: ASSERT + Agent Control Specification together create a governance-first deployment pattern: define policies → auto-generate tests → validate compliance → deploy with enforceable controls. This is essential for regulated industries where agent behaviour must be auditable and provably compliant.