A few months ago I gave an agent what I thought was a clean task:
“Pick up this ticket, implement the obvious fix, open a PR.”
It did exactly that. It opened a PR. It changed code. It even wrote tests.
The PR was beautifully formatted and strategically wrong.
Not broken. Not random. Wrong in a way that looked reasonable for fifteen seconds, then triggered a twenty-minute code review intervention and one deeply disappointed architect.
The Model Was Not the Problem
When people see this, they blame the model first.
Wrong suspect.
The model only had the instructions I gave it, the tools I exposed, and the guardrails I did not define. That is not intelligence failure. That is interface failure.
If a human engineer got a ticket that vague, they would ask clarifying questions. Agents do not ask enough clarifying questions by default. They infer. And inference under ambiguity creates confident nonsense with clean formatting.
Prompt Quality Is Overrated Without Task Contracts
A lot of teams spend too much time tuning prompts and too little time defining task contracts.
A useful contract for agentic work includes:
- clear success criteria
- allowed files and folders
- explicit output schema
- stop conditions
- fallback behavior
- reviewer expectations
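A contract like this can live in code, as a small object the orchestration layer can enforce before and during a run. A minimal sketch, with illustrative field names that mirror the bullets above (nothing here is a real framework's API):

```python
from dataclasses import dataclass

# Illustrative contract object; field names mirror the list above
# and are assumptions, not part of any real agent framework.
@dataclass
class TaskContract:
    success_criteria: list[str]   # testable statements of "done"
    allowed_paths: list[str]      # files/folders the agent may touch
    output_schema: dict           # expected shape of the final artifact
    stop_conditions: list[str]    # when the agent must halt and ask
    fallback: str                 # behavior when blocked, e.g. "open draft PR, tag owner"
    reviewer_expectations: str    # what the reviewer will inspect hardest

    def is_path_allowed(self, path: str) -> bool:
        """Reject edits outside the declared file boundaries."""
        return any(
            path == p or path.startswith(p.rstrip("/") + "/")
            for p in self.allowed_paths
        )
```

The point is not this exact shape. The point is that the contract is an object your tooling can check, not a vibe in the prompt.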
That looks boring. It is also the reason one workflow feels reliable and another feels haunted.
Most agent failures I see in production are specification failures wearing a model-shaped costume.
What We Changed in Practice
We replaced vague ticket execution with a small structured contract template.
Each ticket now includes:
- objective in one sentence
- non-goals in one sentence
- acceptance test bullets
- risk note for impacted modules
- reviewer lens (what to inspect hardest)
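Enforcing the template at intake is cheap. A sketch, assuming tickets arrive as plain dicts; the field names are invented to match the bullets above, not tied to any real issue tracker:

```python
# Required template fields; names mirror the bullets above and are
# illustrative, not any real tracker's schema.
REQUIRED_FIELDS = {
    "objective",         # one sentence
    "non_goals",         # one sentence
    "acceptance_tests",  # bullets the agent's tests must cover
    "risk_note",         # impacted modules
    "reviewer_lens",     # what to inspect hardest
}

def missing_fields(ticket: dict) -> list[str]:
    """Return the template fields a ticket lacks; empty means agent-ready."""
    return sorted(REQUIRED_FIELDS - ticket.keys())
```

A ticket that fails the check goes back to its author, not to the agent.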
That template did more for output quality than all the prompt adjective upgrades combined.
And yes, I still use good prompts. I just stopped pretending prompts are architecture.
The Review Experience Changed Immediately
Once contract clarity improved, the PRs changed in a very visible way:
- fewer broad rewrites
- fewer speculative abstractions
- tighter test intent
- faster reviewer decisions
The surprising part was not speed. It was reviewer energy. People were less irritated because they were reviewing work, not untangling assumptions.
If You Want Better Agent Output, Start Here
Before changing models, do this for one workflow this week:
- define output schema
- define file boundaries
- define stop conditions
- define what “done” means
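Stop conditions are the item teams skip most often, so one concrete sketch: a list of predicates checked after every agent step. The thresholds and the shape of the `step` dict are assumptions for illustration, not a real framework's API.

```python
# Illustrative stop conditions evaluated between agent steps;
# thresholds and the `step` dict shape are invented for this sketch.
STOP_CONDITIONS = [
    ("touched file outside boundary", lambda step: step["files_outside_boundary"] > 0),
    ("diff too large to review",      lambda step: step["changed_lines"] > 400),
    ("tests deleted",                 lambda step: step["tests_removed"] > 0),
]

def should_stop(step: dict) -> list[str]:
    """Return reasons the agent must halt and ask a human; empty means continue."""
    return [reason for reason, tripped in STOP_CONDITIONS if tripped(step)]
```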
Then let the agent run.
You will still get mistakes. But they will be smaller, easier to review, and less theatrical.
The model can be smart and still fail under vague instructions.
That is not a paradox. That is engineering.