Part 3: Production
Chapter 19: Operating Agentic Systems
Production agents need ongoing care and feeding. They're not "set and forget."
Continuous Improvement
Your agent should get better over time:
- Review feedback: What are users saying? What's working and what isn't?
- Analyse failures: Why did conversations go wrong?
- Update DNA: Refine instructions based on what you learn
- Expand knowledge: Add new information as it becomes relevant
- Tune thresholds: Adjust escalation triggers based on experience
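As one way to make "tune thresholds" concrete, here is a minimal sketch. It assumes your agent escalates to a human whenever a confidence score falls below a threshold, and that reviewers have labelled past conversations with whether escalation was actually needed. The function names, the sample format, and the `miss_weight` penalty are all illustrative assumptions, not part of any particular framework.

```python
from typing import Iterable, List, Tuple

# Each sample is (agent_confidence, escalation_was_needed), taken from
# human review of past conversations. This format is an assumption.
Sample = Tuple[float, bool]


def escalation_cost(samples: List[Sample], threshold: float,
                    miss_weight: float = 5.0) -> float:
    """Score a threshold: the agent escalates when confidence < threshold.

    A missed escalation (needed, but confidence was at or above the
    threshold) is weighted more heavily than an unnecessary one.
    """
    missed = sum(1 for conf, needed in samples
                 if needed and conf >= threshold)
    unnecessary = sum(1 for conf, needed in samples
                      if not needed and conf < threshold)
    return miss_weight * missed + unnecessary


def tune_threshold(samples: List[Sample], candidates: Iterable[float],
                   miss_weight: float = 5.0) -> float:
    """Return the candidate threshold with the lowest cost on the samples."""
    return min(candidates,
               key=lambda t: escalation_cost(samples, t, miss_weight))


reviewed = [(0.9, False), (0.8, False), (0.6, True), (0.4, True), (0.5, False)]
best = tune_threshold(reviewed, [0.3, 0.5, 0.7, 0.9])
```

The weighting is the real design decision here: how much worse is a missed escalation than an unnecessary one for your use case? Revisit it as part of your regular review rather than fixing it once.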
Change Management
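Changes to agent DNA can have unexpected effects: an instruction that fixes one behaviour can quietly change another. Treat every change like a release: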
- Test changes against your eval suite before deploying
- Deploy gradually where possible
- Monitor closely after changes
- Have a rollback plan
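The practices above can be sketched in a few lines. This is a hedged illustration, not a prescribed implementation: it assumes DNA versions are identified by strings, that your eval suite produces a pass rate between 0 and 1, and that a small regression tolerance is acceptable. The names `passes_eval_gate`, `choose_dna_version`, and `max_regression` are hypothetical.

```python
import hashlib


def passes_eval_gate(candidate_pass_rate: float, baseline_pass_rate: float,
                     max_regression: float = 0.02) -> bool:
    """Allow deployment only if the candidate has not regressed beyond tolerance."""
    return candidate_pass_rate >= baseline_pass_rate - max_regression


def rollout_bucket(user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100).

    Hashing keeps the assignment consistent across requests, so a user
    does not flip between DNA versions mid-conversation.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_dna_version(user_id: str, stable: str, candidate: str,
                       rollout_percent: int) -> str:
    """Serve the candidate to roughly rollout_percent of users.

    Setting rollout_percent to 0 is the rollback: everyone gets stable.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    if rollout_bucket(user_id) < rollout_percent:
        return candidate
    return stable
```

With this shape, rollback is a configuration change rather than a redeploy, which matters when you are under pressure during an incident.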
Incident Response
Have a plan before things go wrong. A typical sequence:
- Detect: Know something is wrong (via monitoring)
- Triage: Assess severity and impact
- Mitigate: Stop the bleeding (disable agent if necessary)
- Investigate: Understand root cause
- Fix: Address the underlying issue
- Learn: Update processes to prevent recurrence
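The triage and mitigation steps are the easiest to prepare in advance. Below is a minimal sketch under stated assumptions: the severity levels, the error-rate and user-count cut-offs, and the `AgentSwitch` kill switch are all hypothetical and would need to reflect your own monitoring and risk tolerance.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = 1
    HIGH = 2
    CRITICAL = 3


@dataclass
class AgentSwitch:
    """A simple kill switch that request handlers check before invoking the agent."""
    enabled: bool = True
    reason: str = ""

    def disable(self, reason: str) -> None:
        self.enabled = False
        self.reason = reason


def triage(error_rate: float, users_affected: int) -> Severity:
    """Classify an incident. The cut-offs are illustrative, not recommendations."""
    if error_rate >= 0.25 or users_affected >= 1000:
        return Severity.CRITICAL
    if error_rate >= 0.05 or users_affected >= 50:
        return Severity.HIGH
    return Severity.LOW


def mitigate(severity: Severity, switch: AgentSwitch) -> str:
    """Stop the bleeding first; investigation comes afterwards."""
    if severity is Severity.CRITICAL:
        switch.disable("critical incident")
        return "agent disabled, traffic routed to humans"
    if severity is Severity.HIGH:
        return "owner paged, agent still running"
    return "logged for next review"
```

Deciding these cut-offs ahead of time means nobody has to debate severity in the middle of an incident.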
Governance and Accountability
Agentic systems make decisions. Someone needs to be accountable.
- Ownership: Who is responsible for this agent?
- Audit trails: Can you explain what the agent did and why?
- Review cadence: How often do you review agent performance?
- Escalation paths: Who gets called when things go wrong?
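An audit trail is only useful if each record answers "what did the agent do, and why?" without further digging. As a rough sketch, a record might look like the following. The field names and the JSON-lines format are assumptions chosen for illustration; adapt them to whatever logging system you already run.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditRecord:
    agent_id: str      # which agent acted
    owner: str         # who is accountable for this agent
    dna_version: str   # which instructions were in effect
    action: str        # what the agent did
    rationale: str     # why, as recorded at decision time
    timestamp: str = field(default_factory=_utc_now)


def to_json_line(record: AuditRecord) -> str:
    """Serialise a record as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = AuditRecord(
    agent_id="support-agent",
    owner="support-ops",
    dna_version="v2",
    action="escalated_to_human",
    rationale="refund request above approval limit",
)
line = to_json_line(record)
```

Recording the DNA version alongside each action is what lets you connect a bad outcome back to the change that caused it.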
Operational Reality
Plan for ongoing operations from day one. The work doesn't end at deployment; that's when the real work begins.
