Putting the AI in Agile, Part 1: Talk is No Longer Cheap

Probably just write it down

The biggest change AI brings to agile is not speed. It is audience.

For years, most delivery artefacts were written for humans who would not read very much, could ask clarifying questions, and could carry a lot of context in conversation. User stories, lightweight acceptance criteria, planning meetings, and Slack threads all made sense in that world.

AI coding agents change the economics of that arrangement. They can consume far more written context than a human teammate would tolerate in one sitting, but they are much less reliable at resolving ambiguity through conversation. They execute quickly, fill gaps inconsistently, and happily act on a partial understanding if you let them.

That creates a new problem for product and engineering teams. Much of our delivery process is still optimised for the wrong reader.

You can see the same pressure in the current tooling conversation. A new Kaggle-hosted paper by Google employees, The New SDLC With Vibe Coding, describes the shift from vibe coding to agentic engineering: less casual prompting, more structure, more verification, more deliberate context.

The paper itself is not only about solo developers; it talks about individuals, teams, organisations, and production systems. But the phrase "vibe coding" still evokes a useful reference point: one person, one agent, one very tight feedback loop.

That is not the same as working inside a real team. Teams still have handoffs, shared ownership, review, dependencies, product decisions, operational constraints, and other people who need to understand what changed. But the solo workflow is still a useful signal, because it shows how dramatically the implementation loop can shrink. The team question is how that faster loop interacts with handoffs, feedback timing, review load, and shared ownership. This article focuses on one part of that problem: how intent becomes a spec the delivery system can actually use.

When parts of implementation collapse from weeks to hours, the slow parts of delivery move. They are no longer mainly about typing code. They are about deciding what should be true, capturing that clearly, decomposing it into sensible units, routing the right context to the right agent or human, and checking the result.

That is where specification starts to matter again.

Not specification in the old waterfall sense. Not a giant document handed over the wall and left to rot. Specification as a living, iterative, executable part of the delivery loop.

Or, put more bluntly: a spec without a harness is just advice.

How we got here

There is an obvious historical discomfort here. Software teams used to write big specs. Then agile told us, for good reasons, that big specs were often the problem. Now AI arrives and suddenly everyone is talking about specs again.

That can look like a circle, but I do not think it is.

In the 1990s, software teams leaned heavily on large requirements documents. The theory was straightforward: define enough upfront and delivery becomes predictable. It did not work out that way. Heavy documentation went stale fast, changes were expensive, and teams often shipped the wrong thing slowly.

Agile was a rational reaction to that failure. Extreme Programming shifted the emphasis from speculative documentation to engineering discipline. The Agile Manifesto reinforced the broader principle that working software matters more than comprehensive documentation. Ron Jeffries, one of the Manifesto's authors, described user stories as "conversation starters," not complete specifications. The point was not to eliminate thinking. It was to move detail into collaborative discussion at the moment of execution.

That model shaped a lot of my own career. I learned to compress my thinking into a few sentences and let close communication fill in the rest. If you wrote too much, people would not read it. That was not cynicism. It was usually true.

But there was an assumption hiding inside that practice: the builder was a human who could participate in the missing conversation.

That assumption is no longer stable.

So we are not going back to one big specification upfront. We are moving toward many smaller specifications, created and revised inside the delivery loop, each close to the work it governs.

[Figure: Timeline showing the shift from large upfront specification documents, to agile story cards and conversation, to many smaller AI-ready specifications attached to work items.]

The delivery loop has collapsed unevenly

The paper's useful observation is that AI compresses the delivery loop unevenly. Implementation can become dramatically faster, while requirements, architecture, review, and verification remain human-paced. That means the old loop does not simply become faster. It changes shape around the parts that are still slow.

When implementation was expensive, a vague story could survive for a while. There was time for discovery during delivery. Engineers would pull at the edges, product people would clarify, designers would adjust, testers would find gaps, and the story would accrete meaning as it moved across the board. The elapsed time of delivery created room for the missing conversation.

When implementation is cheap, ambiguity becomes more dangerous. An agent can turn a thin story into a large diff before the team has noticed the story was thin. The cost has not disappeared. It has moved from "waiting for code to be written" to "working out whether the code represents the right intent."

That is why "AI makes developers faster" is only the shallow version of the story. AI changes where the constraint sits. A tighter loop starts to emerge:

Idea -> specification -> implementation -> verification -> revised specification

The organisations that get good at this will not be the ones that write the longest documents. They will be the ones that can keep this loop tight without letting intent dissolve into chat history.

One more thing has changed: writing is cheaper. You can now talk through an idea with an AI and turn it into structured material quickly. That does not make documentation good by default, but it removes one of the historical reasons teams avoided writing more of their thinking down.

That undermines one of agile's working compromises. "Write less, talk more" worked when conversation was the execution medium. In an AI-assisted workflow, undocumented nuance is not lightweight. It is missing execution context.

The spec-tool wave is the symptom

The obvious answer is to write it down.

That is directionally right, but incomplete. If AI can consume more context, surely the answer is a much better product requirements document (PRD). That reasoning helps explain the recent growth of spec-driven development approaches: GitHub's Spec Kit, Amazon's Kiro, grassroots methods like BMAD, and a broader pattern of teams treating the PRD as the key input to implementation.

That wave of tools is useful evidence. A lot of people have reached the same conclusion at the same time: if agents are going to do more of the implementation, the intent they receive has to get better.

But "more written down" is not the same as "better execution context."

Birgitta Böckeler at Thoughtworks tested Kiro, Spec Kit, and Tessl and found that agents still skipped important instructions, while the generated artefacts became repetitive and heavy. Addy Osmani has described the same tension: vague prompts produce vague code, but overlong context is not a substitute for context quality. Anthropic's engineering team makes a similar point when they talk about context engineering and the limits of an LLM's attention budget.

So the failure mode is not just "we need a better document." The failure mode is that the document is treated as advice. It may be verbose, repetitive, stale, too broad for the task, or full of instructions the agent can simply fail to follow.

The old challenge was, "How do I compress my thinking into a few sentences and some clear acceptance criteria, so I can have healthy human conversations?"

The new challenge is closer to, "How do I turn intent into something the delivery system is forced to respect?"

The missing layer is a harness

This is where I think the paper's harness framing is helpful. The model is not the whole system. The harness around it matters: instructions, tools, sandboxes, orchestration, hooks, feedback loops, observability, and the rules that decide what happens next.

The same is true for specs.

A useful spec is not just a file an agent may or may not read carefully. It should be part of a harness that can enforce behaviour around it.

That means things like:

Work cannot be marked ready without concrete, testable acceptance.
A spec must include scope boundaries and constraints, especially what should not be built.
Assumptions must be marked as assumptions and verified before coding starts.
Shared behaviour should be written once and referenced, not duplicated across items.
The build agent must receive the relevant context before it starts.
Completion requires evidence, not just a status change.
Verification failures feed back into the spec or work graph, rather than disappearing into a chat transcript.

Those are not just better paragraphs. They are workflow rules.
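
As a rough illustration, a few of the rules above can be expressed as a check that runs before an item is allowed to move to ready. This is a minimal sketch in Python, not any particular tool's implementation. The Spec and Assumption classes, their field names, and the readiness_problems function are all hypothetical, invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    text: str
    verified: bool = False


@dataclass
class Spec:
    # All field names here are hypothetical, chosen for illustration only.
    title: str
    acceptance: list[str] = field(default_factory=list)    # concrete, testable statements
    out_of_scope: list[str] = field(default_factory=list)  # what should not be built
    assumptions: list[Assumption] = field(default_factory=list)


def readiness_problems(spec: Spec) -> list[str]:
    """Return the reasons this spec is not ready; an empty list means ready."""
    problems = []
    if not spec.acceptance:
        problems.append("no testable acceptance criteria")
    if not spec.out_of_scope:
        problems.append("no scope boundary: nothing is listed as out of scope")
    for assumption in spec.assumptions:
        if not assumption.verified:
            problems.append(f"unverified assumption: {assumption.text}")
    return problems


# A thin story with nothing filled in, and a fuller one for comparison.
thin = Spec(title="Export invoices")
solid = Spec(
    title="Export invoices as CSV",
    acceptance=["Downloaded file has one row per invoice"],
    out_of_scope=["PDF export"],
    assumptions=[Assumption("Invoices fit in memory", verified=True)],
)

print(readiness_problems(thin))   # two problems reported
print(readiness_problems(solid))  # empty list
```

The design choice worth noticing is that the check returns reasons rather than a bare yes or no, so a refusal tells the human what to fix instead of just blocking the work.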

That distinction matters because it is also where I think some spec-driven tools feel slightly the wrong shape. They help you produce more specification material, but the harder problem is enforcing the right behaviour around that material: when to ask a human, when to refuse to build, when to split the work, when to load shared context, when to verify assumptions, and when to update the source of truth.

The backlog is part of the harness

This is also why I have been building Haven, but Haven is not really the point of this article.

The point is the layer it sits in.

I have been building Haven because this is the layer I kept missing: not another place to track tickets after the work is already understood, but a place to turn unclear work into ready work.

Haven is a local-first, AI-native backlog manager. It is for the mess before delivery looks neat: half-formed ideas, evidence, assumptions, decomposition, readiness, and the handoff from "we might do this" to "this is ready for a human or agent to build."

That distinction matters. The problem I keep seeing is not just that specs are weak. It is that backlog quality is weak. A story exists, but nobody has forced the assumptions into the open. A ticket is prioritised, but not ready. A decision happened in chat, but never became durable context. An agent can pick something up, but there is no clear contract for what counts as done.

In Haven, an item cannot really be ready unless it has done_looks_like: a concrete description of what success looks like. If the item needs more than that, the spec carries the scope boundary, constraints, assumptions, and design detail. If several related items share architecture, behaviour, or sequencing, a Context Pack can hold the shared context once rather than letting it fragment across tickets.
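
To make the "held once, referenced many times" idea concrete, here is a small Python sketch of how shared context could be assembled for a builder. It is not Haven's actual schema or API. Only the names done_looks_like and Context Pack come from the description above; the classes, the pack_names field, and the build_context function are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextPack:
    # Hypothetical shape: a named block of shared context.
    name: str
    body: str


@dataclass
class Item:
    title: str
    done_looks_like: str
    pack_names: tuple[str, ...] = ()  # references to packs, not copies of them


def build_context(item: Item, packs: dict[str, ContextPack]) -> str:
    """Assemble what a builder receives: shared packs first, then the item."""
    sections = []
    for name in item.pack_names:
        pack = packs[name]  # raises KeyError if an item references a missing pack
        sections.append(f"Shared context: {pack.name}\n{pack.body}")
    sections.append(f"Item: {item.title}\nDone looks like: {item.done_looks_like}")
    return "\n\n".join(sections)


# The shared rule is written once and referenced by both items.
packs = {"billing": ContextPack("billing", "All amounts are stored in minor units.")}
export_item = Item("CSV export", "File has one row per invoice", ("billing",))
totals_item = Item("Invoice totals", "Totals match the ledger", ("billing",))

print(build_context(export_item, packs))
```

Because both items point at the same pack by name, correcting the shared rule in one place changes what every referencing item hands to its builder.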

I do not want to overclaim this. Haven does not magically solve "the AI ignored the instruction." It reduces the surface area for that failure by moving some instructions out of prose and into gates, routing, and checks.

If an item is not ready, it should not be dispatched. If a shared pack exists, the agent should load it before building. If a claim is unverified, it should be marked and checked. If the work is complete, there should be evidence.
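
Two of those rules, the dispatch gate and the evidence gate, can be sketched as guarded state transitions. The Python below is a hedged illustration only: WorkItem, GateError, dispatch, and complete are hypothetical names, and a real system would track far more than this.

```python
from dataclasses import dataclass, field


class GateError(Exception):
    """Raised when a workflow transition is refused."""


@dataclass
class WorkItem:
    # Hypothetical fields for illustration.
    title: str
    ready: bool = False
    status: str = "backlog"
    evidence: list[str] = field(default_factory=list)


def dispatch(item: WorkItem) -> None:
    """Hand the item to a builder, but only if it has passed readiness."""
    if not item.ready:
        raise GateError(f"'{item.title}' is not ready and cannot be dispatched")
    item.status = "in_progress"


def complete(item: WorkItem) -> None:
    """Mark the item done, but only if there is evidence to back it up."""
    if item.status != "in_progress":
        raise GateError(f"'{item.title}' was never dispatched")
    if not item.evidence:
        raise GateError(f"'{item.title}' has no evidence of completion")
    item.status = "done"


item = WorkItem("CSV export")
try:
    dispatch(item)  # refused: the item has not been marked ready
except GateError as err:
    print(err)

item.ready = True
dispatch(item)
item.evidence.append("link to a passing test run")  # placeholder evidence
complete(item)
print(item.status)
```

The point of raising an error rather than logging a warning is that the instruction stops being prose the agent might skip and becomes a transition the system will not perform.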

Those are backlog behaviours as much as delivery behaviours. The spec matters, but the workflow around the spec is what gives it teeth.

What changes for agile teams

I do not think user stories disappear. They can still exist in the backlog as planning and prioritisation units. But once work is taken forward for AI-assisted development, the story itself is not enough.

For a human builder, a story could be a conversation starter. For an AI-assisted delivery loop, it has to become a stronger routing contract. Why does this work exist? What is in scope? What is out of scope? What assumptions need checking? What context matters? What would count as done? How will we verify it?

That is even more important in teams. The next builder may not be the person who heard the original discussion. The reviewer may be looking at a large AI-generated diff without the missing product context. The agent may be working from whatever the harness gave it, not from the nuance everyone assumed was obvious.

So the definition of ready has to become more real. Not heavier for its own sake. More operational.

Ready means the work is bounded enough to build, explicit enough to hand over, and testable enough to verify.

Talk is no longer cheap

Agile did not fail by preferring conversation over documentation. It was reacting to a real problem: documents that were too large, too early, too detached from delivery, and too expensive to maintain. But AI changes the cost model. Conversation that never becomes durable context now creates drag.

So the principle needs updating.

Do not write giant documents for their own sake. Do not bring back the PRD as a sacred object. Do not confuse ceremony with clarity.

But do make intent durable where execution depends on it. Put it somewhere the delivery system can route, check, revise, and enforce.

The next discipline is not bigger documents. It is better backlog thinking: clear intent, close to the work, surrounded by enough harness that the delivery system has to respect it.

Talk is still where thinking starts. It is no longer cheap if it never becomes part of the system.