Stop Letting Agents Freestyle Production Work.

The phrase “autonomous agent” makes people brave in exactly the wrong way.

Suddenly the agent is allowed to inspect a repo, interpret a ticket, edit files, run commands, write tests, create branches, open PRs, and summarize the result. This can be useful. It can also become a jazz performance in the middle of your production workflow.

I like jazz. I do not want it in my deployment path.

The difference between useful autonomy and expensive improvisation is a playbook.

Freestyle Looks Productive

Freestyle agents look impressive in demos because demos reward motion. The agent reads the issue. It searches the codebase. It edits files. It runs tests. It writes a PR summary with a level of confidence normally reserved for courtroom dramas.

The problem is not that any single step is wrong. The problem is that nobody defined the allowed sequence.

Can it refactor adjacent code? Can it update snapshots? Can it change a shared helper? Can it run a migration? Can it add a dependency? Can it skip a failing test if the failure looks unrelated?

If the answer is “the agent will decide,” you have delegated policy to a text generator.

Playbooks Are Not Anti-Autonomy

A playbook is not a leash. It is the thing that lets an agent move without asking a human for every tiny action.

The playbook says: for this class of ticket, inspect these sources, modify only these paths, run these checks, create this branch format, produce this evidence, and stop when these risk conditions appear.

That is not bureaucracy. That is how you convert “please be smart” into an executable workflow.
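One way to picture "executable" is a playbook written as plain data that a runner can check before every action. The sketch below is only an illustration: the `Playbook` class, its field names, and the example values are all hypothetical, and a real team would pick its own shape.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Playbook:
    """Hypothetical playbook for one class of ticket; field names are illustrative."""
    ticket_class: str
    sources_to_inspect: Tuple[str, ...]
    writable_paths: Tuple[str, ...]
    required_checks: Tuple[str, ...]
    branch_format: str
    required_evidence: Tuple[str, ...]
    stop_conditions: Tuple[str, ...]

    def branch_name(self, ticket_id: str) -> str:
        """Build the branch name this playbook requires for a ticket."""
        return self.branch_format.format(ticket_id=ticket_id)


# Made-up example values, not taken from any real project.
DOCS_FIX = Playbook(
    ticket_class="docs-fix",
    sources_to_inspect=("ticket", "docs/"),
    writable_paths=("docs/",),
    required_checks=("make docs-lint",),
    branch_format="agent/{ticket_id}-docs",
    required_evidence=("diff", "check output"),
    stop_conditions=("change outside docs/", "ambiguous requirement"),
)
```

Freezing the dataclass is a deliberate choice in this sketch: the agent can read the playbook, but it cannot quietly rewrite it mid-run.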

Humans also work this way. A senior engineer does not freestyle payroll code on a Friday afternoon because they are feeling inspired. They follow the release process. The process exists because inspiration has poor rollback semantics.

The Minimum Production Playbook

For coding agents, I want at least five sections.

First: input rules. What does the agent need before it starts? Ticket ID, acceptance criteria, target repo, branch base, affected component, and known constraints.
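A minimal sketch of an input check, assuming the ticket arrives as a dictionary. The key names are hypothetical spellings of the items listed above, and the rule that an empty list of constraints counts as "present" is an assumption of this example.

```python
from typing import Any, Dict, List

# Hypothetical key names mirroring the inputs listed above.
REQUIRED_INPUTS = (
    "ticket_id",
    "acceptance_criteria",
    "target_repo",
    "branch_base",
    "affected_component",
    "known_constraints",
)


def missing_inputs(ticket: Dict[str, Any]) -> List[str]:
    """Return the required inputs that are absent or blank.

    An empty list (for example, no known constraints) counts as present,
    because someone explicitly said "none". A missing key does not.
    """
    missing = []
    for name in REQUIRED_INPUTS:
        value = ticket.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing
```

If the returned list is not empty, the run does not start. That is cheaper than discovering the gap after the agent has already edited files.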

Second: action rules. Which tools, commands, paths, and operations are allowed? This is where command allowlists and repo boundaries stop being theoretical governance and start saving your afternoon.
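A command allowlist and a repo boundary can be sketched in a few lines. The allowed commands and writable roots below are made-up examples, and the sketch assumes commands are later executed as an argument list without a shell, so operators like `&&` are never interpreted. It does not resolve symlinks.

```python
import posixpath
import shlex

# Hypothetical allowlist: each entry is a required argv prefix.
ALLOWED_COMMANDS = {("git", "status"), ("git", "diff"), ("pytest",)}
# Hypothetical writable roots, relative to the repo root.
WRITABLE_ROOTS = ("src/billing/", "tests/billing/")


def command_allowed(command_line: str) -> bool:
    """True if the command starts with an allowlisted argv prefix."""
    try:
        argv = tuple(shlex.split(command_line))
    except ValueError:  # unbalanced quotes and similar
        return False
    return any(argv[: len(prefix)] == prefix for prefix in ALLOWED_COMMANDS)


def path_writable(path: str) -> bool:
    """True if the normalized relative path stays inside a writable root."""
    normalized = posixpath.normpath(path)
    if normalized.startswith(("/", "..")):
        return False
    return any((normalized + "/").startswith(root) for root in WRITABLE_ROOTS)
```

Normalizing before comparing is the important part: `src/billing/../auth/login.py` looks like it sits in the billing lane until you resolve it.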

Third: quality gates. Which tests must run? What counts as enough evidence? What happens when tests fail?
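Here is one possible gate, sketched under the assumption that each check reports a name and a pass/fail result. The check names and the three outcomes (`stop`, `escalate`, `handoff`) are hypothetical; the point is that a failing or skipped check has a defined consequence instead of an improvised one.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical names of the checks this playbook requires.
REQUIRED_CHECKS = ("unit-tests", "lint")


@dataclass
class CheckResult:
    name: str
    passed: bool


def gate(results: Iterable[CheckResult]) -> Tuple[str, str]:
    """Decide what happens next from the check results.

    A required check that never ran is treated as worse than one that failed.
    """
    by_name = {result.name: result for result in results}
    not_run = [name for name in REQUIRED_CHECKS if name not in by_name]
    if not_run:
        return ("stop", "required checks not run: " + ", ".join(not_run))
    failed = [name for name in REQUIRED_CHECKS if not by_name[name].passed]
    if failed:
        return ("escalate", "required checks failed: " + ", ".join(failed))
    return ("handoff", "all required checks passed")
```

Notice what is missing: there is no branch where the agent decides a failure "looks unrelated" and moves on.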

Fourth: escalation triggers. Secrets, infra config, migrations, auth logic, payment flows, destructive commands, ambiguous requirements, and missing acceptance criteria should usually stop the run.
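Several of these triggers can be approximated by matching changed paths against patterns. The sketch below assumes hypothetical directory conventions (`infra/`, `migrations/`, `auth/`, `payments/`); real repos will need their own patterns, and path matching alone will not catch every risky change.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Hypothetical patterns; "*" in fnmatch also matches "/" characters.
ESCALATION_PATTERNS = {
    "secrets": ("*.env", "*secrets*"),
    "infra config": ("infra/*", "*.tf"),
    "migrations": ("*/migrations/*",),
    "auth logic": ("*/auth/*",),
    "payment flows": ("*/payments/*",),
}


def escalation_reasons(changed_paths: Iterable[str], acceptance_criteria: str) -> List[str]:
    """Return every reason this run should stop and ask a human."""
    paths = list(changed_paths)
    reasons = []
    for label, patterns in ESCALATION_PATTERNS.items():
        hits = [p for p in paths if any(fnmatchcase(p, pat) for pat in patterns)]
        if hits:
            reasons.append(f"{label}: {', '.join(hits)}")
    if not acceptance_criteria.strip():
        reasons.append("missing acceptance criteria")
    return reasons
```

An empty list means the run may continue. Anything else is a stop, with the reason already written down for the human who picks it up.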

Fifth: handoff format. The final output should explain what changed, what was verified, what failed, what remains risky, and what a human reviewer should inspect first.
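A handoff is easier to enforce when it is a structure rather than free text. This is a minimal sketch with hypothetical field names and plain-text rendering; the one design choice worth copying is that empty sections are still printed, so "nothing failed" is an explicit claim rather than an omission.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Handoff:
    """Hypothetical handoff record with one field per required section."""
    changed: List[str]
    verified: List[str]
    failed: List[str]
    risks: List[str]
    inspect_first: List[str]

    def render(self) -> str:
        sections = [
            ("What changed", self.changed),
            ("What was verified", self.verified),
            ("What failed", self.failed),
            ("What remains risky", self.risks),
            ("Inspect first", self.inspect_first),
        ]
        lines = []
        for title, items in sections:
            lines.append(title)
            # Print an explicit marker instead of silently dropping a section.
            lines.extend(f"- {item}" for item in (items or ["(none)"]))
            lines.append("")
        return "\n".join(lines).rstrip()
```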

Operating Rule

If an agent cannot explain which rule allowed an action, the action probably should not have happened.
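In code, this rule could look like a gate that refuses any action no rule claims. The rule IDs and predicates below are invented for illustration; the sketch only shows the idea that the allowing rule is recorded at the moment of the action, not reconstructed afterwards.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical rules: each maps an ID to a predicate over (kind, target).
RULES: Dict[str, Callable[[str, str], bool]] = {
    "edit-docs": lambda kind, target: kind == "edit" and target.startswith("docs/"),
    "run-docs-lint": lambda kind, target: kind == "run" and target == "make docs-lint",
}


def allowing_rule(kind: str, target: str) -> Optional[str]:
    """Return the ID of the first rule that allows the action, or None."""
    for rule_id, allows in RULES.items():
        if allows(kind, target):
            return rule_id
    return None


def request_action(kind: str, target: str, log: List[dict]) -> bool:
    """Record the decision and report whether the action may proceed."""
    rule_id = allowing_rule(kind, target)
    status = "allowed" if rule_id is not None else "refused"
    log.append({"kind": kind, "target": target, "rule": rule_id, "status": status})
    return rule_id is not None
```

Refusals are logged too. A run with many refused actions is a signal that either the playbook is too narrow or the agent is wandering.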

Start Smaller Than Your Ambition

The first production agent should not be allowed to roam across the whole engineering organization like a caffeinated platform team.

Start with a narrow lane. One repo. One task type. One branch pattern. One test command. One review process. Make it boring. Make it reliable. Make the evidence easy to inspect.

Then expand based on observed behavior, not optimism.

This is the part teams skip because the demo already worked. The demo always works. The demo is a garden with no weather.

Production has weather.

The Real Goal

The goal is not to make agents passive. It is to make their autonomy legible.

When an agent changes code, you should know why it touched that file, why it ran that command, why it stopped, and what evidence supports the result. If you cannot reconstruct the run, you do not have an autonomous workflow. You have a mystery with a PR link.
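Reconstructing a run is mostly a logging question. As a rough sketch, assume the runner writes one JSON object per action with `action`, `target`, `rule`, and `reason` fields; that log format is an assumption of this example, not an existing standard.

```python
import json
from typing import Iterable, List


def explain_target(log_lines: Iterable[str], target: str) -> List[str]:
    """Answer "why did it touch this?" for one file or command."""
    explanations = []
    for line in log_lines:
        event = json.loads(line)
        if event["target"] == target:
            explanations.append(
                f'{event["action"]} {target}: rule {event["rule"]}, because {event["reason"]}'
            )
    return explanations


def unexplained(log_lines: Iterable[str]) -> List[dict]:
    """Return events that lack a rule or a reason: the 'mystery' part of the run."""
    events = [json.loads(line) for line in log_lines]
    return [e for e in events if not e.get("rule") or not e.get("reason")]


# Made-up log for illustration.
RUN_LOG = [
    json.dumps({"action": "edit", "target": "docs/setup.md",
                "rule": "edit-docs", "reason": "ticket names this file"}),
    json.dumps({"action": "run", "target": "make docs-lint",
                "rule": "run-docs-lint", "reason": "required check"}),
    json.dumps({"action": "edit", "target": "src/app.py", "rule": None, "reason": ""}),
]
```

If `unexplained` returns anything, the PR link is not enough. Someone has to ask why that action happened before the change is trusted.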

Stop letting agents freestyle production work.

Give them a playbook. Then let them move fast inside it.

Muzammil Bashir