[ ABORT TO HUD ]
SEQ. 1
SEQ. 2
SEQ. 3

ASSERT Framework & Agent Control

🛡️ Evaluation & Safety 10 min 90 BASE XP

Automated Agent Testing

The ASSERT Framework (Automated Safety & Security Evaluation for Responsible Testing) is Microsoft's approach to systematically testing AI agents against natural language policies. Unlike traditional unit tests that check specific inputs/outputs, ASSERT evaluates whether agents comply with business rules expressed in plain English.

How ASSERT Works

  1. Define Policies: Write safety and business rules in natural language (e.g., "The agent must never reveal internal pricing formulas" or "The agent must escalate to a human if the customer mentions legal action")
  2. Generate Test Cases: ASSERT automatically generates adversarial test scenarios designed to probe each policy boundary
  3. Execute & Evaluate: The framework runs agents through generated scenarios and evaluates compliance using an LLM-as-judge approach
  4. Report: Produces detailed compliance reports showing which policies passed, failed, or were borderline

Agent Control Specification

The Agent Control Specification complements ASSERT by providing a standardised format for defining agent identity, permitted actions, and policy enforcement at deployment time. It acts as a "constitution" for your agent:

ComponentPurposeExample
IdentityWho the agent isName, role, domain boundaries
PermissionsWhat the agent can doAllowed tools, data sources, actions
PoliciesRules the agent must followEscalation triggers, prohibited outputs, compliance rules
BoundariesHard limits on behaviourMaximum spend per action, rate limits, geographic restrictions
💡 Key Insight: ASSERT + Agent Control Specification together create a governance-first deployment pattern: define policies → auto-generate tests → validate compliance → deploy with enforceable controls. This is essential for regulated industries where agent behaviour must be auditable and provably compliant.
FOUNDRY VERIFICATION
QUERY 1 // 3
What is unique about ASSERT compared to traditional testing?
It tests hardware performance
It evaluates agent compliance against natural language policies rather than specific input/output pairs
It only tests response speed
It requires manual test creation
Watch: 139x Rust Speedup
ASSERT Framework & Agent Control | Evaluation & Safety — Azure Foundry Academy