Part 3: Production
Chapter 17: Security and Guardrails
Agentic systems introduce new security considerations. An agent that can take actions in the world can be manipulated to take the wrong actions.
Threat: Prompt Injection
Malicious users (or malicious content the agent retrieves) attempt to override the agent's instructions.
Mitigations:
- Clearly separate system instructions from user input
- Validate and sanitise user inputs
- Treat retrieved content as untrusted
- Test with adversarial inputs
Threat: Data Leakage
The agent reveals information it shouldn't — from its knowledge base, other users' conversations, or internal system details.
Mitigations:
- Principle of least privilege
- Output filtering
- Clear data boundaries
- Regular audits
Threat: Excessive Agency
The agent takes actions beyond what's appropriate — either through manipulation or misconfiguration.
Mitigations:
- Limit tool permissions
- Implement approval gates
- Rate limiting
- Audit logs for all actions
Defence in Depth
No single mitigation is sufficient. Layer your defences across:
- Model-level: Built-in safety features of the LLM
- DNA-level: Instructions that define boundaries and behaviours
- Platform-level: Tool permissions, rate limits, approval gates
- Infrastructure-level: Network controls, access management, audit logging
Security Mindset
Assume that users will try to misuse your agent. Design your guardrails for the adversarial case, not just the happy path.
