I have been writing software professionally for fifteen years. In the last eighteen months, more code has been written in my team’s repositories by AI agents than by human hands. That sentence still surprises me when I read it back.
What surprised me even more was how wrong we got it the first time — and how dramatically things improved once we adopted Spec-Driven Development.
This is a field report. It covers what SDD actually is, the frameworks and tooling that matter in 2026, real project experiences from my teams, and the hard lessons we had to learn twice before they stuck.
The Problem: AI Raises the Stakes on Ambiguity
Before we get into solutions, let’s be honest about the failure mode.
In early 2025, my team’s workflow was what the industry now calls vibe coding: describe what you want, hit enter, review the output, repeat. It felt like a superpower. For prototypes, it was. For production systems, it eventually wasn’t.
The core problem: AI coding agents are brilliant interns who never say “I don’t understand.”
Give a vague requirement to a human engineer and they’ll ask clarifying questions. Give the same vague requirement to an AI agent and it will confidently generate clean, professional-looking code that solves a different problem than the one you had. It won’t warn you. It will look correct. You will review it, approve it, and discover the mismatch three sprints later.
As Thoughtworks put it in their 2025 engineering practices report: “AI raises the stakes because it will happily translate fuzzy intent into crisp code without asking clarifying questions. In a world of infinite code, clarity is the scarce resource.”
That insight is the foundation of Spec-Driven Development.
What Is Spec-Driven Development?
Spec-Driven Development (SDD) is a development paradigm where well-crafted software specifications become the primary artifact — and code becomes a generated output derived from those specifications.
It is the disciplined answer to vibe coding.
```mermaid
graph LR
    A["Vibe Coding\n(Ad-hoc prompting)"] -->|Problems| B["Context Loss\nHallucinations\nTechnical Debt"]
    C["Spec-Driven Dev\n(SDD)"] -->|Results in| D["Living Specs\nReviewable Artifacts\nProduction Quality"]
    style A fill:#e74c3c,stroke:#c0392b,color:#fff
    style B fill:#c0392b,stroke:#922b21,color:#fff
    style C fill:#2ecc71,stroke:#27ae60,color:#fff
    style D fill:#27ae60,stroke:#1e8449,color:#fff
```

The workflow has three phases, and the order is non-negotiable:
- Requirements — What are we building and why? Acceptance criteria, edge cases, constraints.
- Design — How will we build it? Architecture, system interactions, data models, security considerations.
- Tasks — Discrete, ordered, dependency-aware implementation steps.
Only after all three are reviewed and approved by a human does implementation begin.
The key word is reviewed. Not skimmed. Not auto-approved. Actually read, challenged, and iterated on. This is where the technical lead’s judgment is irreplaceable — and where, paradoxically, it has the highest leverage.
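The phase gating described above can be sketched in code. This is an illustrative control-flow skeleton, not any framework's actual API: `generate` and `review` are hypothetical callbacks standing in for your agent tool and your human reviewer. The point is that no phase starts until the previous artifact has an explicit approval on record.

```python
# Sketch of the three-phase SDD gate. `generate(phase, context)` and
# `review(phase, draft)` are hypothetical stand-ins for the agent call
# and the human review step; nothing here is a real framework API.

PHASES = ["requirements", "design", "tasks"]

def run_sdd_cycle(feature, generate, review):
    """Run requirements -> design -> tasks, blocking on human approval."""
    approved = {}
    for phase in PHASES:
        draft = generate(phase, context=approved)
        # Iterate until the reviewer explicitly approves this phase.
        while True:
            verdict = review(phase, draft)  # ("approve", None) or ("revise", feedback)
            if verdict[0] == "approve":
                break
            draft = generate(phase, context={**approved, "feedback": verdict[1]})
        approved[phase] = draft
    return approved  # implementation may begin only now
```

Notice that later phases receive the approved earlier artifacts as context, which is what makes the ordering non-negotiable: a design generated before requirements are approved would be built on unreviewed assumptions.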
The SDD Workflow in Practice
Here’s how a feature development cycle looks with SDD:
```mermaid
sequenceDiagram
    participant TL as Tech Lead
    participant AI as AI Agent
    participant Code as Codebase
    TL->>AI: "Build user notification preferences feature"
    AI->>TL: requirements.md (user stories, acceptance criteria, edge cases)
    TL->>AI: Review + feedback (reject 2 assumptions, add GDPR constraint)
    AI->>TL: design.md (architecture, DB schema, API contracts)
    TL->>AI: Approve design with 1 modification
    AI->>TL: tasks.md (12 ordered implementation tasks)
    TL->>AI: Approve tasks
    AI->>Code: Implement task 1 (subagent with isolated context)
    AI->>Code: Implement task 2
    Note over AI,Code: Human reviews final output, not every step
    AI->>TL: Implementation complete
```

The transformation is dramatic. Instead of being interrupted dozens of times during implementation, you review three documents upfront, then the agent works. Your approval count during implementation drops from constant to rare — because the important decisions were already made at the spec level.
The Spec Files
Specs typically live as Markdown files in your repository. Common structure:
```
.
├── specs/
│   ├── requirements.md   # What to build
│   ├── design.md         # How to build it
│   └── tasks.md          # Step-by-step plan
├── CLAUDE.md             # Project constitution (persistent context)
├── AGENTS.md             # Agent behavior rules
└── src/
```
CLAUDE.md (or AGENTS.md in other tools) is the project constitution. It captures conventions, architecture decisions, tech stack, coding standards, and anything an agent needs to know before touching your codebase. This file is the single most valuable investment you can make when adopting SDD. Write it once, refine it continuously, and it pays dividends on every subsequent task.
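One lightweight way to enforce this layout is a CI guard that refuses to proceed while the spec set is incomplete. The script below is a hypothetical sketch, not part of any SDD framework; the `specs/` layout matches the tree above.

```python
# Hypothetical CI guard (not part of any SDD framework): report which
# required spec files are missing or empty before implementation starts.
from pathlib import Path

REQUIRED_SPECS = ["requirements.md", "design.md", "tasks.md"]

def missing_specs(repo_root):
    """Return the names of required spec files that are absent or empty."""
    spec_dir = Path(repo_root) / "specs"
    return [
        name for name in REQUIRED_SPECS
        if not (spec_dir / name).is_file()
        or (spec_dir / name).stat().st_size == 0
    ]

# Usage: fail the pipeline if missing_specs(".") returns a non-empty list.
```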
The Ecosystem in 2026: Five Frameworks Worth Knowing
The SDD ecosystem has matured significantly. Here is what I consider the canonical tooling landscape as of Q1 2026:
```mermaid
graph TB
    subgraph Structured["Structured SDD Frameworks"]
        K["AWS Kiro\n(Purpose-built IDE)"]
        B["BMAD Method\n(Multi-agent, 21 roles)"]
        GS["GitHub Spec Kit\n(Open-source, any agent)"]
    end
    subgraph Workflow["Workflow Plugins"]
        CS["cc-sdd\n(Claude Code + Cursor)"]
        PZ["claude-code-spec-workflow\n(Pimzino, 60-80% token reduction)"]
    end
    subgraph Agents["AI Agents"]
        CC["Claude Code"]
        CU["Cursor"]
        CP["GitHub Copilot"]
        GE["Gemini CLI"]
    end
    Structured --> Agents
    Workflow --> Agents
    style K fill:#ff9900,stroke:#e68a00,color:#000
    style B fill:#3498db,stroke:#2980b9,color:#fff
    style GS fill:#24292e,stroke:#000,color:#fff
    style CC fill:#7c3aed,stroke:#6d28d9,color:#fff
```

1. AWS Kiro — The Purpose-Built SDD IDE
Kiro is the most complete SDD implementation. AWS built it to solve a specific problem: AI-generated code that works brilliantly until someone needs to maintain it.
Kiro’s standout features:
- Structured spec generation: Takes your natural language prompt and generates requirements in EARS notation (a formal requirements engineering standard), a full technical design, and a sequenced task list — in 30-45 seconds for moderate complexity features.
- Agent Hooks: Event-driven automations that execute when specific things happen (save a React component → update tests; modify API endpoint → refresh README; pre-commit → scan for credentials). Once configured, they fire automatically for the entire team.
- Steering Files: Persistent markdown files that give the agent lasting context about your project conventions, eliminating the need to re-explain standards in every chat.
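EARS (Easy Approach to Requirements Syntax) constrains each requirement to a handful of sentence templates — ubiquitous ("The system shall ..."), event-driven ("When ..., the system shall ..."), state-driven ("While ..."), unwanted behaviour ("If ..., then ..."), and optional feature ("Where ..."). As a rough illustration of why that matters for machine-checkability (this is not Kiro's implementation), a few lines of regex can flag requirements that fit none of the templates:

```python
import re

# Illustrative EARS shape-check (not Kiro's implementation). Each pattern
# corresponds to one of the standard EARS requirement templates.
EARS_PATTERNS = [
    re.compile(r"^The \S.* shall \S.*", re.IGNORECASE),            # ubiquitous
    re.compile(r"^When \S.*, the \S.* shall \S.*", re.IGNORECASE),  # event-driven
    re.compile(r"^While \S.*, the \S.* shall \S.*", re.IGNORECASE), # state-driven
    re.compile(r"^If \S.*, then the \S.* shall \S.*", re.IGNORECASE),  # unwanted behaviour
    re.compile(r"^Where \S.*, the \S.* shall \S.*", re.IGNORECASE), # optional feature
]

def non_ears_requirements(lines):
    """Return requirement lines that match none of the EARS templates."""
    return [
        line for line in lines
        if line.strip() and not any(p.match(line.strip()) for p in EARS_PATTERNS)
    ]
```

A statement like "Users probably want fewer emails" fails every template, which is exactly the kind of fuzziness a spec phase should surface before an agent turns it into code.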
I trialled Kiro on an AWS-native project. The notification feature that would have taken two weeks was done in two days. That’s not marketing copy — I tracked the time. The biggest gain wasn’t code generation speed; it was eliminated rework. The spec review caught three architectural assumptions that would have required substantial changes if discovered post-implementation.
Best for: AWS-native projects, enterprise teams, teams with high documentation requirements.
2. BMAD Method — Multi-Agent at Scale
BMAD (Breakthrough Method for Agile AI-Driven Development) is the most ambitious framework in the ecosystem. It defines 21 specialized agents, each with explicit ownership constraints: PM (requirements, feasibility), Architect (system design, pattern selection), Developer (implementation), QA (test design), and more.
The philosophy: treat Claude Code not as a solo coder, but as a team.
```mermaid
graph LR
    PM["PM Agent\n(Requirements)"] --> ARCH["Architect Agent\n(Design)"]
    ARCH --> DEV["Developer Agent\n(Implementation)"]
    DEV --> QA["QA Agent\n(Testing)"]
    QA --> DOC["Doc Agent\n(Documentation)"]
    style PM fill:#e74c3c,stroke:#c0392b,color:#fff
    style ARCH fill:#3498db,stroke:#2980b9,color:#fff
    style DEV fill:#2ecc71,stroke:#27ae60,color:#fff
    style QA fill:#f39c12,stroke:#d68910,color:#fff
    style DOC fill:#9b59b6,stroke:#8e44ad,color:#fff
```

One practitioner combined BMAD with Langfuse (LLM observability) and properly scoped agent teams. The result: “Skip BMAD and you get agents that produce code that doesn’t match requirements. Skip Langfuse and you’re flying blind. Skip proper agent teams and you get merge conflicts.”
BMAD also provides an enterprise governance layer: every requirement, architectural decision, and code change is versioned and auditable — a continuous compliance ledger that simplifies SOC 2 and HIPAA certifications.
Best for: Greenfield enterprise projects, distributed teams, regulated industries.
3. GitHub Spec Kit — The Open Standard
GitHub’s Spec Kit is the most accessible entry point. It’s open-source, works with any major coding agent (Copilot, Claude Code, Gemini CLI, Cursor), and focuses on one thing: placing a specification at the centre of the engineering process.
Specs in Spec Kit drive implementation, checklist generation, and task breakdowns. The key insight is making specs a first-class citizen in your version control — they evolve alongside your code rather than living in a separate wiki that nobody reads.
Best for: Teams starting with SDD, multi-tool environments, open-source projects.
4. cc-sdd — Kiro-Style Workflow for Claude Code
cc-sdd brings the structured AI-Driven Development Lifecycle to Claude Code, Cursor, Codex CLI, Gemini CLI, GitHub Copilot, and Windsurf. It enforces the requirements → design → tasks workflow with approval gates at each phase.
Best for: Teams already using Claude Code or Cursor who want structured SDD without switching IDEs.
5. claude-code-spec-workflow — Token Efficiency
This workflow plugin achieves 60-80% token reduction by eliminating redundant document fetching across spec phases. For teams running agentic sessions at scale, this translates to meaningful cost savings.
Best for: Cost-sensitive teams with high agentic usage volume.
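The plugin's internals are its own, but the general idea behind this class of savings — load each spec document once per session and reuse it across phases instead of re-fetching it for every request — can be sketched with a simple cache. All names here are illustrative:

```python
# Illustrative sketch of session-level spec caching (not the plugin's
# actual code): each spec file is read from disk at most once, then
# reused when assembling prompts for later phases.
from functools import lru_cache
from pathlib import Path

@lru_cache(maxsize=None)
def load_spec(path):
    return Path(path).read_text()

def build_prompt(phase, spec_paths):
    """Assemble a phase prompt; repeated spec reads hit the cache."""
    docs = "\n\n".join(load_spec(p) for p in spec_paths)
    return f"# Phase: {phase}\n\n{docs}"
```

Whether the saved work shows up as tokens, latency, or API cost depends on where the cache sits, but the principle is the same: don't pay twice for context you already have.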
Real Project Experiences
Here are three concrete projects where SDD changed outcomes:
Project 1: Multi-Tenant SaaS Platform Migration
Context: Migrating a monolith to microservices. 8-person team. Six-month timeline.
The vibe-coding attempt (first two months): We let agents generate service boundaries based on loose descriptions. The resulting services had overlapping responsibilities, inconsistent API contracts, and data ownership ambiguities that only emerged during integration testing.
The SDD reset: We stopped implementation entirely, ran each service through the full requirements → design → tasks cycle, and encoded the architecture decisions into a shared CLAUDE.md. Total planning time: three weeks.
Result: The next four months of implementation had a fraction of the integration issues. We shipped on time. More importantly, the spec documents became onboarding material — new engineers understood the system’s design intent before touching any code.
Lesson: For projects with complex service boundaries, the “lost” time upfront is not a cost. It’s an investment with a documented return.
Project 2: AI-Powered Notification Engine
Context: Customer-facing notification preferences, 12 microservices affected, GDPR compliance required.
Workflow: Used Claude Code in plan mode → requirements.md → design.md → tasks.md. Technical lead reviewed each document before agent proceeded to next phase.
Key catch during spec review: The initial design assumed a centralized preference store. During design review, I identified that this violated our existing data residency architecture. Caught in the spec, not in code.
Result: Feature delivered in 4 days. Estimated manual development time: 2-3 weeks.
Lesson: The spec review is not bureaucracy. It is the mechanism by which the technical lead’s architectural knowledge is transferred to the agent. Without it, the agent will make plausible-but-wrong decisions at every ambiguous junction.
Project 3: Legacy System Documentation
Context: 200K LOC legacy Java application. No documentation. Three of the original engineers had left.
SDD application: Used an Architect agent (BMAD) to analyze the codebase and generate specifications for each major subsystem — not to generate new code, but to generate understanding. The output was a living spec repository.
Result: Onboarding time for new engineers dropped from 6 weeks to 2 weeks. The specs also became the foundation for a planned rewrite.
Lesson: SDD is not only for greenfield. Generating specs from existing code is underexplored but high-value.
Hard Lessons: What Goes Wrong
Mistake 1: Treating Plan Mode as Optional
Claude Code has a plan mode (activated with Shift+Tab). It forces the agent to explore your codebase and draft a plan before writing any code. Early on, my team would skip this because it felt slow. We paid for it in rework.
Now it’s non-negotiable. Start every non-trivial task in plan mode. Review the plan. Check it for over-engineering (I’ve caught the agent adding Redis caching to endpoints with 10 requests/day). Only then execute.
Mistake 2: Underspecifying CLAUDE.md
A thin CLAUDE.md is worse than no CLAUDE.md. It gives the agent false confidence that it understands the project while leaving critical gaps. We now spend one day per project writing the CLAUDE.md before any implementation begins.
Our template sections: Tech stack and versions, Architecture patterns in use, What NOT to do (anti-patterns), Testing conventions, Branch/commit conventions, Critical business rules the code must respect, Third-party integrations and their quirks.
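To make sure no section gets forgotten at project start, a scaffolding script can stamp out the skeleton. This is a hypothetical helper of our own, not part of any tool; the section list is the template above.

```python
# Hypothetical scaffolding helper: write a CLAUDE.md skeleton containing
# every template section, so none is silently omitted at project start.
from pathlib import Path

SECTIONS = [
    "Tech stack and versions",
    "Architecture patterns in use",
    "What NOT to do (anti-patterns)",
    "Testing conventions",
    "Branch/commit conventions",
    "Critical business rules the code must respect",
    "Third-party integrations and their quirks",
]

def scaffold_claude_md(repo_root):
    """Create a CLAUDE.md skeleton with one heading per template section."""
    path = Path(repo_root) / "CLAUDE.md"
    body = "# Project Constitution\n\n" + "\n\n".join(
        f"## {section}\n\n<!-- TODO: fill in -->" for section in SECTIONS
    )
    path.write_text(body + "\n")
    return path
```

The skeleton is deliberately full of TODOs: an empty heading is an honest gap, while a missing heading is a gap nobody notices.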
Mistake 3: Reviewing Code Instead of Specs
The old habit: review the PR diff. The new discipline: review the spec before implementation starts.
Code review still matters — but if you are reviewing code to catch wrong architectural decisions, you are too late. Architectural errors caught in the spec take 5 minutes to fix. Architectural errors caught in code review take days.
Shift your review energy upstream.
Mistake 4: Using Weak Models
SDD places high demands on reasoning capability. We tested several models for the planning phases. Smaller models — anything below the quality tier of Claude 3.5 Sonnet or GPT-4o — exhibited “logic collapse”: they would forget constraints from earlier in the spec, contradict their own designs, or silently drop edge cases.
For production SDD, use the best available model for planning phases. You can sometimes use a faster model for implementation tasks, but never for spec generation.
Mistake 5: SDD on Legacy Codebases Without Boundaries
We tried to “SDD-ify” an entire legacy application at once. It failed. The scope was too large; the specs became unmanageable.
The fix: apply SDD to new features and isolated modules only. Draw a clear boundary around the change, write specs for that bounded scope, and implement within the boundary. This scales. Attempting to specify everything at once does not.
Measuring the Impact
After eighteen months, here is what we track:
```mermaid
graph TD
    subgraph Before["Before SDD (Vibe Coding)"]
        B1["Avg feature cycle: 2-3 weeks"]
        B2["Rework rate: ~40% of features"]
        B3["Architectural drift: frequent"]
        B4["AI review catches: ~55% issues"]
    end
    subgraph After["After SDD"]
        A1["Avg feature cycle: 3-7 days"]
        A2["Rework rate: ~8% of features"]
        A3["Architectural drift: rare"]
        A4["AI review catches: ~81% issues"]
    end
    Before -->|18 months| After
    style Before fill:#e74c3c,stroke:#c0392b,color:#fff
    style After fill:#2ecc71,stroke:#27ae60,color:#fff
```

The 81% AI code review catch rate comes from Qodo’s 2025 AI Code Quality report. Our rework reduction is internal data. Your numbers will differ, but the direction is consistent across teams I have talked to.
The harder-to-measure gain: engineer cognitive load. When the spec is clear, implementation is execution, not exploration. Engineers are less drained. Reviews are faster. Context handoffs between engineers (including to AI agents) are more reliable.
The Technical Lead’s Role in an SDD World
SDD does not eliminate the need for technical leadership. It relocates it.
Before SDD: Technical lead as code reviewer, catching problems in PRs.
After SDD: Technical lead as spec challenger, catching problems before implementation begins.
The skills that matter:
- Spec review: Can you read a requirements document and identify the three assumptions that will cause problems in month two?
- Design critique: Can you evaluate a proposed architecture against non-functional requirements the agent doesn’t know about (operational constraints, team capabilities, organizational politics)?
- Task decomposition: Can you validate that a task list is properly ordered, appropriately granular, and won’t create blockers?
- CLAUDE.md authorship: Can you encode your architectural philosophy into a document an AI agent will actually follow?
The technical lead who masters these skills in an agentic world has significantly more leverage than one who reviews code line by line.
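The task-decomposition check in particular is mechanical enough to automate: a list is properly ordered when every task appears after all of its declared prerequisites. A minimal sketch, assuming a hypothetical task format of `(task_id, [prerequisite_ids])` rather than any real tasks.md schema:

```python
# Sketch: verify a tasks.md-style list is dependency-consistent, i.e.
# every task appears after all of its declared prerequisites. The
# (task_id, [prerequisite_ids]) format is hypothetical, for illustration.

def ordering_violations(tasks):
    """Return (task, prereq) pairs where the prerequisite has not
    appeared earlier in the list."""
    seen, violations = set(), []
    for task_id, prereqs in tasks:
        for prereq in prereqs:
            if prereq not in seen:
                violations.append((task_id, prereq))
        seen.add(task_id)
    return violations
```

A clean run means the agent can execute tasks top to bottom without hitting a blocker; any violation is worth a conversation before implementation starts, not after.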
Getting Started: A 30-Day Plan
Week 1: Foundation
- Write a CLAUDE.md for your primary project. Cover tech stack, patterns, anti-patterns, and critical business rules.
- Install Claude Code (or your preferred agentic tool).
- Pick one small, well-scoped feature as your SDD pilot.
Week 2: First Spec Cycle
- Use plan mode for your pilot feature. Generate requirements.md, review it, iterate.
- Generate design.md, review against your architecture. Push back on at least one assumption.
- Generate tasks.md, verify ordering and granularity.
Week 3: Implementation and Reflection
- Let the agent implement with minimal interruption.
- Track time spent on rework vs. comparable previous feature.
- Capture what the CLAUDE.md missed; update it.
Week 4: Team Rollout
- Run a 2-hour session with your team on the SDD workflow.
- Establish team conventions for spec file naming and storage.
- Define which classes of work require full SDD vs. lightweight spec.
For teams adopting Kiro or BMAD: budget 2-3 weeks for setup and team training before expecting meaningful productivity gains. The investment is real but front-loaded.
The Bigger Picture: Specs as the New Source of Truth
There is a pattern emerging across all the tools and teams I’ve observed: specifications are becoming the primary interface between human intent and software systems.
Code is becoming a build artifact — something generated from specs the way compiled binaries are generated from source code. Version control for your thinking, not just your implementation.
This is a profound shift for the profession. The engineers who thrive in this world will be those who can write clear, unambiguous, well-structured specifications — who can think rigorously about what they are building before a line of code exists.
That sounds a lot like good engineering has always looked. The difference is that the leverage has changed. A well-written spec now drives an agent that can implement it in hours. The cost of specification quality has always been high; now the return is dramatically higher too.
Spec-Driven Development may not have the visibility of “vibe coding” as a term. But it is, without question, the practice that will define how serious engineering teams use AI in 2026 and beyond.
Write the spec first. Review it ruthlessly. Then let the agents work.
References
- Thoughtworks — Spec-Driven Development: Unpacking One of 2025’s Key New Engineering Practices
- AWS Kiro — Spec-Driven Development Documentation
- AWS — Introducing Kiro
- InfoQ — Beyond Vibe Coding: Amazon Introduces Kiro
- GitHub — BMAD-METHOD Repository
- Medium — BMAD Method + Langfuse + Claude Code Agent Teams in Production
- GitHub — Spec Kit Repository
- GitHub — claude-code-spec-workflow by Pimzino
- GitHub — cc-sdd (Kiro-style for Claude Code & Cursor)
- Medium — Using Spec-Driven Development with Claude Code (Heeki Park)
- Addy Osmani — How to Write a Good Spec for AI Agents
- Built In — Why Spec-Driven Development Is the Future of AI-Assisted Software Engineering
- Qodo — 2025 AI Code Quality Report
- Agent Factory / Panaversity — Chapter 16: Spec-Driven Development with Claude Code
- Medium — Spec-Driven Development Is Eating Software Engineering: A Map of 30+ Agentic Coding Frameworks