Most agent incidents are not caused by exotic model failure.
They are caused by missing basics.
Use this checklist before scaling any agentic workflow beyond a pilot.
Architecture Checklist
- Clear agent role boundaries
- No over-privileged execution role
- Deterministic handoff between stages
- Explicit failure states and retries
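
The last two items can be made concrete with a small stage runner. The following is a minimal sketch, not a reference implementation: `StageState`, `RetryableError`, `StageResult`, and `run_stage` are hypothetical names, and the default of three attempts is an arbitrary assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class StageState(Enum):
    # Every stage ends in exactly one of these states (names are illustrative).
    SUCCEEDED = "succeeded"
    RETRIES_EXHAUSTED = "retries_exhausted"
    FAILED_TERMINAL = "failed_terminal"


class RetryableError(Exception):
    """Hypothetical marker for transient failures that are worth retrying."""


@dataclass
class StageResult:
    state: StageState
    attempts: int
    output: object = None
    error: str = ""


def run_stage(stage: Callable[[], object], max_attempts: int = 3) -> StageResult:
    """Run one stage with bounded retries and return an explicit end state."""
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return StageResult(StageState.SUCCEEDED, attempt, output=stage())
        except RetryableError as exc:
            last_error = str(exc)  # transient: loop again until attempts run out
        except Exception as exc:
            # Any other error is treated as terminal and is never retried.
            return StageResult(StageState.FAILED_TERMINAL, attempt, error=str(exc))
    return StageResult(StageState.RETRIES_EXHAUSTED, max_attempts, error=last_error)
```

Because the next stage receives a `StageResult` rather than an exception or a partial output, the handoff logic can branch on `state` alone.
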
Governance Checklist
- Repo and branch allowlists
- Command allowlists
- Capability flags by environment
- Global and scoped kill switches
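
One way to combine these four controls is a single deny-by-default check that runs before every agent action. This is a sketch under stated assumptions: the `Policy` fields, the `check` function, and the reason strings are all hypothetical, and matching only the first token of a command is a simplification that a real allowlist would likely tighten.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class Policy:
    # All field names here are hypothetical, chosen for illustration only.
    allowed_repos: set = field(default_factory=set)
    allowed_branch_prefixes: tuple = ()
    allowed_commands: set = field(default_factory=set)
    capabilities_by_env: dict = field(default_factory=dict)  # env name -> set of flags
    global_kill: bool = False
    killed_repos: set = field(default_factory=set)  # scoped kill switch


def check(policy, env, repo, branch, command, capability):
    """Return (allowed, reason). Deny by default; kill switches are checked first."""
    if policy.global_kill:
        return False, "global kill switch is on"
    if repo in policy.killed_repos:
        return False, "kill switch is on for this repo"
    if repo not in policy.allowed_repos:
        return False, "repo not in allowlist"
    # str.startswith accepts a tuple; an empty tuple matches nothing.
    if not branch.startswith(policy.allowed_branch_prefixes):
        return False, "branch not in allowlist"
    argv = shlex.split(command)
    if not argv or argv[0] not in policy.allowed_commands:
        return False, "command not in allowlist"
    if capability not in policy.capabilities_by_env.get(env, set()):
        return False, "capability disabled in this environment"
    return True, "ok"
```

Returning a reason alongside the decision keeps denials explainable in logs and in review.
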
Quality and Evaluation Checklist
- Required test gates
- Structured evaluator scoring
- Block/warn/pass thresholds
- Fallback behavior for failed checks
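
As an illustration of how these items fit together, the sketch below maps a test result and an evaluator score to a block/warn/pass verdict. The `Verdict` enum, the `gate` function, and the 0.6 and 0.8 thresholds are assumptions made for the example. The fallback shown is to fail closed: a missing score is treated as a block.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "pass"
    WARN = "warn"
    BLOCK = "block"


def gate(
    tests_passed: bool,
    score: Optional[float],
    block_below: float = 0.6,  # hypothetical threshold
    warn_below: float = 0.8,   # hypothetical threshold
) -> Verdict:
    """Combine a required test gate with a structured evaluator score."""
    if not tests_passed:
        return Verdict.BLOCK  # required tests are a hard gate
    if score is None:
        return Verdict.BLOCK  # fallback: the evaluator failed, so fail closed
    if score < block_below:
        return Verdict.BLOCK
    if score < warn_below:
        return Verdict.WARN
    return Verdict.PASS
```
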
Observability Checklist
- Correlation IDs across all stages
- Tool-call logging with timestamps
- Decision logs for evaluator outcomes
- Fast path for incident reconstruction
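
A rough sketch of what this can look like in practice: every event carries the same correlation ID and a UTC timestamp, so reconstructing an incident becomes a filter over the log. The function names, record fields, and the in-memory list used as a log sink are all hypothetical stand-ins for whatever logging backend is actually in use.

```python
import json
import uuid
from datetime import datetime, timezone


def new_correlation_id() -> str:
    """One ID per agent run, passed to every stage of that run."""
    return uuid.uuid4().hex


def log_event(sink: list, correlation_id: str, stage: str, event: str, **fields) -> dict:
    """Append one structured JSON record to the sink and return it."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(timespec="microseconds"),
        "correlation_id": correlation_id,
        "stage": stage,
        "event": event,  # e.g. "tool_call" or "evaluator_decision"
        **fields,
    }
    sink.append(json.dumps(record, sort_keys=True))
    return record


def reconstruct(sink: list, correlation_id: str) -> list:
    """Return all records for one run, in the order they were appended."""
    records = (json.loads(line) for line in sink)
    return [r for r in records if r["correlation_id"] == correlation_id]
```

A usage example: log a tool call with `log_event(sink, cid, "plan", "tool_call", tool="git_diff")`, then call `reconstruct(sink, cid)` to get that run's timeline.
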
Human Review Checklist
- Explicit merge ownership
- Policy override workflow
- Escalation path for high-risk changes
- Audit trail for approvals
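
The sketch below shows one possible shape for these controls: an append-only approval log, a written reason required for policy overrides, and a stricter rule for high-risk changes. `Approval`, `AuditTrail`, and `can_merge` are hypothetical names, and the rule that high-risk changes need two distinct owners is an assumption standing in for whatever escalation path a team defines.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Approval:
    change_id: str
    approver: str
    kind: str    # "merge" or "policy_override" (illustrative values)
    reason: str
    at: str      # UTC timestamp, ISO 8601


class AuditTrail:
    """Append-only record of approvals; entries are never edited or removed."""

    def __init__(self):
        self._entries = []

    def record(self, change_id: str, approver: str, kind: str, reason: str) -> Approval:
        if kind == "policy_override" and not reason.strip():
            raise ValueError("policy overrides require a written reason")
        entry = Approval(change_id, approver, kind, reason,
                         datetime.now(timezone.utc).isoformat())
        self._entries.append(entry)
        return entry

    def approvers(self, change_id: str, kind: str = "merge") -> set:
        return {e.approver for e in self._entries
                if e.change_id == change_id and e.kind == kind}


def can_merge(trail: AuditTrail, change_id: str, owners: set, high_risk: bool) -> bool:
    """Only owners count; high-risk changes need two distinct owners (assumed rule)."""
    owner_approvals = trail.approvers(change_id) & set(owners)
    required = 2 if high_risk else 1
    return len(owner_approvals) >= required
```
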
Rollout Checklist
- Gradual rollout by team or repo
- Baseline metrics captured pre-launch
- Weekly reliability review in first month
- Exit criteria for rollback mode
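
To make the first and last items concrete, here is a minimal sketch of a deterministic percentage rollout keyed on the repo name, plus a rollback check against a pre-launch baseline. Both function names are hypothetical, and the 50% relative increase in failure rate is an arbitrary example of an exit criterion, not a recommended value.

```python
import hashlib


def in_rollout(repo: str, percent: int) -> bool:
    """Deterministically place a repo in or out of the rollout.

    The same repo always lands in the same bucket, so raising `percent`
    only adds repos and never removes ones that were already enabled.
    """
    digest = hashlib.sha256(repo.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def should_roll_back(
    baseline_failure_rate: float,
    current_failure_rate: float,
    max_relative_increase: float = 0.5,  # hypothetical exit criterion
) -> bool:
    """Trigger rollback when failures exceed the pre-launch baseline by too much."""
    limit = baseline_failure_rate * (1 + max_relative_increase)
    return current_failure_rate > limit
```

Without a baseline captured before launch, `should_roll_back` has nothing to compare against, which is why that item sits on the same checklist.
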
Final Take
If your team can answer “yes” to every item above, you likely have a production-ready foundation for AI agents.
If not, fix the controls first. Scaling only multiplies the consequences of architecture decisions you have already made.