? vague brief in? confident output out? no schema no mercy? agent did exactly what you asked? team still surprised somehow? workflow contract matters more than prompt polish? vague brief in? confident output out? no schema no mercy? agent did exactly what you asked? team still surprised somehow? workflow contract matters more than prompt polish
VAGUE TASKTASK CONTRACT”Handle ticket""Use best judgment”confident chaosreview pain: highinputs, outputs, ownerschema, limits, fallbackpredictable deliveryreview pain: manageablesame model � different contract � completely different outcome

A few months ago I gave an agent what I thought was a clean task:

“Pick up this ticket, implement the obvious fix, open a PR.”

It did exactly that. It opened a PR. It changed code. It even wrote tests.

The PR was beautifully formatted and strategically wrong.

Not broken. Not random. Wrong in a way that looked reasonable for fifteen seconds, then triggered a twenty-minute code review intervention and one deeply disappointed architect.

The Model Was Not the Problem

When people see this, they blame the model first.

Wrong suspect.

The model only had the instructions I gave it, the tools I exposed, and the guardrails I did not define. That is not intelligence failure. That is interface failure.

If a human engineer got a ticket that vague, they would ask clarifying questions. Agents do not ask enough clarifying questions by default. They infer. And inference under ambiguity creates confident nonsense with clean formatting.

Prompt Quality Is Overrated Without Task Contracts

A lot of teams spend too much time tuning prompts and too little time defining task contracts.

A useful contract for agentic work includes:

That looks boring. It is also the reason one workflow feels reliable and another feels haunted.

// Field Observation

Most agent failures I see in production are specification failures wearing a model-shaped costume.

What We Changed in Practice

We replaced vague ticket execution with a small structured contract template.

Each ticket now includes:

That template did more for output quality than all the prompt adjective upgrades combined.

And yes, I still use good prompts. I just stopped pretending prompts are architecture.

The Review Experience Changed Immediately

Once contract clarity improved, the PRs changed in a very visible way:

The surprising part was not speed. It was reviewer energy. People were less irritated because they were reviewing work, not untangling assumptions.

-41%
review rework after first pass
+28%
PR decision speed
1
task contract template that mattered

If You Want Better Agent Output, Start Here

Before changing models, do this for one workflow this week:

Then let the agent run.

You will still get mistakes. But they will be smaller, easier to review, and less theatrical.

The model can be smart and still fail under vague instructions.

That is not a paradox. That is engineering.