Open a pull request from a team running "fully AI-powered" development. Not a team using AI with discipline. A team that handed the wheel to the model and waited for code to arrive.
You know the code immediately. A function that calls an API endpoint that does not exist. A field named user_id in one file and userId in the next. An authentication flow that contradicts the one written two files ago, because the model forgot what it decided earlier. It compiles. The CI passes. Nobody owns it.
The problem is not the model. The model is doing exactly what you asked. The problem is that you never gave it a harness.
Andrej Karpathy coined the term vibe coding for this mode: you describe a feeling and accept what comes back. Fine for a weekend demo. Dangerous for software that runs in production and pages someone at 3 AM.
What SDD tries to solve
Chat history is not a spec. Prompts are not requirements. Agents drift when intent lives only in someone's head.
At the AI Engineer World's Fair, Sean Grove of OpenAI made the point precisely: the new scarce skill in AI-augmented development is not coding — it is writing specifications that fully capture your intent and values. The model can generate code at extraordinary speed. What it cannot do is invent your constraints, your invariants, or your acceptance criteria.
Thoughtworks flagged spec-driven development as one of the defining engineering practices of 2025. The reason is structural: multi-agent pipelines, autonomous coding agents, and long-running AI tasks all share a common failure mode. They optimize for the stated goal and drift away from everything unstated. Without a document the agent can read — and re-read — every new context window starts from scratch.
What SDD actually is in the AI era
The practical definition is simpler than the acronym suggests. A spec, in the AI-augmented sense, is a living markdown document that captures goals, constraints, examples, and acceptance criteria — written so that both humans and agents can read it, use it, and update it.
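As a concrete sketch, a one-page spec in this style might look like the following. Every name, endpoint, and criterion here is invented for illustration, not taken from a real project:

```markdown
# Spec: Refund flow

## Goal
Support partial refunds on completed charges from the dashboard.

## Constraints
- A refund never exceeds the remaining balance on the charge.
- Refunds are idempotent: retrying the same request must not double-refund.
- All amounts are integer cents; no floats anywhere in the flow.

## Acceptance criteria
- [ ] A refund request within the remaining balance succeeds and returns the new balance.
- [ ] A refund request above the remaining balance is rejected and changes nothing.
- [ ] Replaying a request with the same idempotency key returns the original result.

## Out of scope
- Refunds to a payment method other than the one used for the original charge.
```

One page, written for two readers at once: a human reviewer can argue with every bullet, and an agent can treat each criterion as a guardrail.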
It is not a 200-page requirements document frozen at kickoff. It is not a Confluence page nobody updates. It is a file that lives next to the code, versioned in git, changed via pull request, and treated as a first-class artifact.
There is a heavier industrial flavor: Kiro, spec-kit, and Tessl — platforms that formalize SDD with their own systems of record, tooling, and process layers. Martin Fowler's writeup is worth reading. For a large engineering org with dedicated tooling budgets, these tools are interesting. For a team of three to ten engineers, they are overkill.
The honest critique — why skeptics are not wrong
The strongest objections to SDD deserve a direct answer, not a dismissal.
"This is just waterfall." The HN reaction to SDD was swift and recognizable: Spec-Driven Development: The Waterfall Strikes Back. The concern is real. Frozen specs reproduce frozen-spec failure modes: the world changes, the spec does not, and you build the wrong thing with great precision. Marc Brooker, Senior Principal Engineer at AWS, addressed this directly in his post Spec Driven Development isn't Waterfall: the difference is whether the spec is treated as a living artifact reviewed continuously or a contract signed once and filed. The problem is not specs. It is frozen specs.
Spec drift is invisible work. Drew Breunig put it well in his piece on the spec-driven development triangle: keeping the spec, the code, and the agent's behavior in sync is work that loses every planning meeting to features and fires. Drift is not a theoretical concern. It is the default outcome.
Bigger context windows do not fix this. Augment Code's engineering blog makes the case against naive SDD cleanly: even a model that can read a hundred thousand tokens does not automatically synthesize intent from a long document. Retrieval is not understanding.
Executable specs are oversold. Robert Encarnacao's Emperor's New Code lands the harder punch: the vendors selling spec-driven platforms often promise that the spec is the software. That is marketing. Specs guide software. They do not replace the judgment required to build it.
The honest position: SDD is not waterfall if the spec is a living document and the team is small enough to keep it honest.
OpenSpec — the lightweight harness for small teams
OpenSpec, maintained by Fission-AI, takes a deliberately minimal approach: proposals, change records, and spec deltas — all in markdown, all in git, all reviewed via pull request.
No new system of record. No platform license. No onboarding ceremony. The GitHub Blog's introduction to spec-driven development with AI explains the toolkit well: agents and humans read the same files, the diff is the audit trail, and the spec evolves at the pace of the codebase because it lives there.
"The sweet spot is a spec light enough that a developer actually updates it and precise enough that an agent can use it."
For a three-to-ten-person team, that is roughly where openspec.dev lives. Kiro and spec-kit assume a larger org with a dedicated tooling budget and platform engineers to maintain the scaffolding. OpenSpec assumes a team with a git repo and the discipline to write one page before they start.
The IBERANT playbook — five rules that work
These are the practices we have found worth keeping.
Start every non-trivial feature with a one-page spec, not a prompt. Goals, constraints, acceptance criteria. One page. If you cannot write it in one page, the feature is not understood well enough to build.
Keep specs in /specs next to the code. Version them. Review them in PRs. A spec that lives in Confluence is a spec nobody reads.
Write acceptance criteria as testable bullets. They double as agent guardrails during development and as your test plan at review. If you cannot write a test for it, you cannot verify it.
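To make that concrete, suppose a payments spec says "a refund never exceeds the remaining balance." The bullet translates almost verbatim into tests. A minimal sketch; the `validate_refund` helper and its signature are invented for illustration, not from any real codebase:

```python
# Sketch: acceptance criteria from a hypothetical payments spec,
# written directly as executable checks. All names are illustrative.

def validate_refund(charge_cents: int, refunded_cents: int, request_cents: int) -> bool:
    """Allow a refund only if it is positive and within the remaining balance."""
    return 0 < request_cents <= charge_cents - refunded_cents

# Criterion: "A refund never exceeds the remaining balance."
assert validate_refund(charge_cents=1000, refunded_cents=0, request_cents=1000)
assert not validate_refund(charge_cents=1000, refunded_cents=600, request_cents=500)

# Criterion: "A zero or negative refund is rejected."
assert not validate_refund(charge_cents=1000, refunded_cents=0, request_cents=0)
```

The same bullets serve three audiences: the reviewer reads them as requirements, the agent reads them as guardrails, and CI runs them as tests.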
Treat stale specs as bugs. If a spec drifts from the code, fix one of them in the next PR. Do not let the gap widen. Drift compounds.
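One cheap way to make drift visible is a CI step that flags any spec untouched long after its code changed. A minimal sketch, assuming a `/specs` directory beside the source and an arbitrary 14-day grace window; file mtimes are a crude proxy, and a real check would compare git commit timestamps instead:

```python
from pathlib import Path

GRACE_SECONDS = 14 * 24 * 3600  # assumed grace window: 14 days

def stale_specs(spec_dir: Path, code_dir: Path) -> list[Path]:
    """Return specs whose mtime lags the newest code file by more than the grace window."""
    newest_code = max(
        (f.stat().st_mtime for f in code_dir.rglob("*.py")), default=0.0
    )
    return [
        spec for spec in sorted(spec_dir.glob("*.md"))
        if newest_code - spec.stat().st_mtime > GRACE_SECONDS
    ]
```

Wire it into CI so the build fails when the list is non-empty, and "the spec is stale" stops being invisible work and starts being a red build someone has to fix.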
Do not spec the trivial. A CRUD endpoint does not need a four-page document. A payments flow does. A background job that touches customer data does. A "forgot password" email does not.
The harness is the point
SDD is not magic and it is not waterfall. It is the harness that lets you point AI at non-trivial work without producing slop.
The failure mode of vibe coding is not that the model is bad. The failure mode is that the model is good at producing confident-sounding code for the problem it understood, which is never exactly the problem you had. The spec is the translation layer. It is how you transfer intent from a human head into a form that an agent can act on without hallucinating the rest.
The sweet spot for most small teams is somewhere between frozen 200-page requirements and vibes in a chat window. For most of us, that spot is roughly where OpenSpec lives: lightweight enough to maintain, precise enough to be useful, and boring enough to actually get done.
If you want to build software that ships with discipline and speed, let us talk.