
Defense in Depth

🛡️ Safety & Guardrails · 12 min · 100 BASE XP

Securing the Loop

You cannot rely on the LLM's built-in safety alone. You must build defenses into the orchestrator:

  • Sandboxing: Run all agent code in isolated environments without network access to internal systems.
  • Least Privilege: Only give the agent the exact tools it needs. Don't give a read-only agent a delete_row tool.
  • Human-in-the-Loop (HITL): Require a human to click "Approve" before any irreversible action (e.g., sending an email, dropping a table).
  • Input/Output Filters: Before executing the agent's planned action, pass it through a smaller, faster model trained specifically to detect malicious intent.
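The first three defenses above can be combined in the orchestrator itself. Here is a minimal sketch: a least-privilege tool registry plus a Human-in-the-Loop gate on irreversible tools. All names (`least_privilege_registry`, `execute`, the role and tool names) are illustrative, not a real framework API.

```python
# Illustrative defense-in-depth for a tool-calling loop (not a real framework).

# HITL: tools whose effects cannot be undone require explicit human approval.
IRREVERSIBLE = {"send_email", "drop_table"}

def least_privilege_registry(role):
    """Least privilege: only expose the tools this role actually needs."""
    tools = {
        "reader": {"read_row"},                             # read-only agent
        "admin": {"read_row", "delete_row", "drop_table"},  # trusted agent
    }
    return tools.get(role, set())

def execute(tool, args, role, approve=lambda tool, args: False):
    """Run a tool call only if it passes every guardrail."""
    # Least privilege: a read-only agent never even sees delete_row.
    if tool not in least_privilege_registry(role):
        return "DENIED: tool not granted to this role"
    # HITL: irreversible actions are blocked until a human clicks "Approve",
    # modeled here by the approve() callback returning True.
    if tool in IRREVERSIBLE and not approve(tool, args):
        return "BLOCKED: awaiting human approval"
    return f"EXECUTED: {tool}({args})"
```

Note the default `approve` callback denies everything: the safe path is the default, and approval must be wired in explicitly.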
SYNAPSE VERIFICATION

QUERY 1 // 1: What is the most effective defense against an agent making a catastrophic, irreversible mistake?

  • A very strong system prompt
  • Human-in-the-Loop (HITL) approval gates for sensitive tools
  • Using temperature 0
  • Prompt caching