A client once sent me a Slack message that said: “Can you build me a platform for connecting freelancers with clients? Like Upwork, but better. Should take a few weeks.”

I have been writing software for twelve years. I know what “like Upwork but better, a few weeks” actually means. It means the client has a vision of the final product in their head — the polished, feature-complete, successful version — and has no visibility into the gap between that vision and the first line of code. Upwork has around 600 engineers and a $1.5 billion market cap. “A few weeks” does not get you there.

But here is the thing: this is not a client problem. It is a communication problem. The client is doing exactly what clients should do — they are describing their desired outcome in the language of their domain, not yours. The job of a good Product Owner is to translate that dream into something a development team can actually build.

In Part 4, we built the BaseAgent class — the foundation every agent in our AI software team inherits from. We got LLM integration, conversation history, domain event emission, and retry logic all working in a clean, reusable base. Now we build the first two domain-specific agents: the Product Owner (PO) and the Business Analyst (BA).

These two agents have the hardest job in the system. Every agent that comes after them — the Tech Architect, the Senior Software Engineer, the QC Engineer — depends entirely on the quality of what the PO and BA produce. If they produce vague, ambiguous output, the downstream agents will amplify that vagueness into broken code and failed tests. If they produce precise, well-structured output, everything else in the pipeline becomes dramatically easier.

Let’s build them right.


1. The Client Brief Problem

Here is the brief I get most often in one form or another:

“Build me an app like Uber but for dogs. Users should be able to book dog walkers, track their dog in real time, pay through the app, and leave reviews. I also want a web dashboard for walkers. And maybe an admin panel. Should be pretty simple.”

This brief contains real information. There is genuine intent here. But it is completely useless as a specification for a development team. Let me count the problems:

Problem 1: Missing scope definition. What does “book a dog walker” mean exactly? Does the user browse a list of available walkers and request one? Does the system automatically match based on location and availability? Is there a calendar? A cancellation policy? A waitlist?

Problem 2: Technology assumptions buried in business language. “Real-time tracking” is not a feature. It is a system architecture decision. Real-time GPS tracking requires a mobile app (not a web app), a WebSocket backend, push notifications, and battery optimization concerns. “Pretty simple” does not survive contact with “real-time GPS.”

Problem 3: Undefined user personas. Who is the dog owner? A busy professional who books once a week? A retiree who wants someone to walk their dog every morning? What is their technical comfort level? What device do they use?

Problem 4: No success criteria. What does “success” mean for this app? Revenue? Number of bookings? Walker retention? Customer satisfaction scores? Without success criteria, we cannot prioritize features, and we cannot know when we are done.

Problem 5: Scope creep in plain sight. “Maybe an admin panel” is doing a lot of work in that sentence. An admin panel for what? User management? Financial reconciliation? Fraud detection? Dispute resolution? Each of those is a significant feature vertical.

The person who wrote this brief is not careless. They are describing what they want with the vocabulary they have. The carelessness would be on our side if we accepted this brief as-is and started building.

This is the problem the PO agent solves.

What “Good Requirements” Actually Look Like

A good requirement document answers five questions:

  1. Who is the user? (Persona — name, motivation, pain points)
  2. What do they need to do? (Feature — specific, bounded, implementable)
  3. Why does it matter? (Success metric — measurable, time-bound)
  4. What is out of scope? (Explicit exclusions — prevents scope creep)
  5. What assumptions are we making? (Surfaced, not hidden — prevents surprises)

The PO agent’s job is to extract these answers from a raw brief, ask targeted questions to fill gaps, and produce a RequirementDoc that a development team can work from.

The BA agent’s job is to take that RequirementDoc and decompose it into individual UserStory objects with acceptance criteria — the atomic unit of work that an engineer can implement and a QC engineer can validate.


2. PO Agent Design

Before writing a single line of code, let’s think about the PO agent’s role precisely.

Receives: A raw brief from the human stakeholder. May also receive answers to clarifying questions on a second pass.

Produces: A structured RequirementDoc with personas, features, success metrics, out-of-scope items, assumptions, and technical constraints.

Personality: The PO agent is named Alex. Alex is senior — patient, professionally skeptical, focused on value delivery. Alex does not get excited about features for their own sake. Alex asks uncomfortable questions: “What happens if we cut that feature? Would users still get value?” Alex is the person in the room who pushes back on “wouldn’t it be cool if.”

Two-pass design: The PO agent runs twice. First pass: given a raw brief, it asks five clarifying questions. Second pass: given the answers to those questions, it produces the structured RequirementDoc. This two-pass design is important. It forces a human checkpoint — the stakeholder has to respond to the questions before the system moves forward. This is not a limitation; it is intentional design. Requirements that have never been challenged are not requirements. They are wishes.

Tools Available to the PO Agent

The PO agent has access to two tools:

search_web(query: str) -> str — For market research. If the brief mentions a competitor (“like Airbnb for boats”), the PO agent should look up what that competitor actually does before asking questions about scope. This spares the stakeholder clarifying questions whose answers the agent is expected to already know.

create_document(title: str, content: str) -> str — For writing the RequirementDoc to a persistent store. This is how the document becomes available to downstream agents.

Permission Boundaries

The PO agent CAN:

  • Ask clarifying questions about business goals and user needs
  • Do market research to understand the competitive landscape
  • Define scope (what is in vs out of scope)
  • Approve user stories produced by the BA agent
  • Request revisions to user stories if they do not match the RequirementDoc

The PO agent CANNOT:

  • Write code or provide technical implementation details
  • Make deployment decisions
  • Approve technical architecture choices (that is the Tech Architect’s domain)
  • Commit to delivery timelines (that is the PM’s domain)

These permission boundaries are not just documentation. They are enforced by the system prompt and by the tool set we give the agent. The PO agent has no code execution tool, no GitHub tool, no deployment tool. What is not possible cannot be done by accident.
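
To make that concrete, here is a minimal sketch of capability-by-construction. The tool names match the ones above, but the typing and registration shape are assumptions; the real registration API is the BaseAgent from Part 4.

# Sketch: the PO agent's entire capability surface, defined by injection.
# Illustrative only; the actual registration API lives in BaseAgent (Part 4).

from typing import Awaitable, Callable

ToolFn = Callable[..., Awaitable[str]]

async def search_web(query: str) -> str:
    return f"(stub) search results for: {query}"

async def create_document(title: str, content: str) -> str:
    return f"(stub) saved document: {title}"

# Deliberately absent: run_code, github_push, deploy_service.
# A tool the agent was never given is a tool it cannot misuse,
# no matter what the LLM generates.
PO_TOOLS: dict[str, ToolFn] = {
    "search_web": search_web,
    "create_document": create_document,
}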


3. POAgent Full Implementation

Let me show the full implementation. This builds directly on the BaseAgent from Part 4.

# agents/po_agent.py

import json
import re

from agents.base import BaseAgent
from domain.models import RequirementDoc
from domain.state import TeamState
from domain.events import RequirementsCleared


class POAgent(BaseAgent):
    """
    Product Owner Agent — Alex.

    Transforms raw client briefs into structured RequirementDoc objects.
    Runs in two passes:
      Pass 1: raw_brief is present, clarified_requirements is None
              → ask 5 clarifying questions
      Pass 2: both raw_brief and clarification_answers are present
              → produce RequirementDoc
    """

    AGENT_ID = "po_agent"
    AGENT_NAME = "Alex"
    AGENT_ROLE = "Senior Product Owner"
    DEFAULT_MODEL = "claude-opus-4-6"

    # System prompt stays constant across both passes
    SYSTEM_PROMPT = """You are Alex, a Senior Product Owner with 12 years of experience
delivering software products. You have worked at startups and scale-ups. You are
methodical, user-focused, and professionally skeptical of features that don't
directly serve user needs.

Your job is to turn raw client briefs into structured requirement documents that
development teams can work from. You ask the uncomfortable questions early so
that nobody wastes time building the wrong thing.

You always think in terms of:
- User personas: who are the real humans using this?
- Core value: what problem does this solve, specifically?
- MVP scope: what is the minimum set of features that delivers that value?
- Success metrics: how will we know if this worked?
- Out-of-scope items: what are we explicitly NOT building?

You write clearly, concisely, and in plain language. You avoid jargon.
When you produce JSON, it is valid JSON with no extra text before or after.
"""

    def _prepare_prompt(self, state: TeamState) -> str:
        """Build the prompt based on current state."""
        has_brief = bool(state.get("raw_brief"))
        has_answers = bool(state.get("clarification_answers"))

        if not has_brief:
            raise ValueError("POAgent requires raw_brief in state")

        if not has_answers:
            # First pass — ask clarifying questions
            return self._clarification_prompt(state["raw_brief"])
        else:
            # Second pass — produce structured RequirementDoc
            return self._structuring_prompt(state)

    def _clarification_prompt(self, brief: str) -> str:
        return f"""A client has submitted the following project brief:

---
{brief}
---

Your task is to ask exactly 5 clarifying questions that will give you what you
need to write a complete requirement document.

Each question should target one of these areas:
1. Core user personas (who are the primary users, and what is their context?)
2. Key features — MVP vs nice-to-have (if you could only ship 3 things, what are they?)
3. Success metrics (how will the client measure whether this project succeeded?)
4. Technical constraints (existing systems, platforms, compliance requirements, etc.)
5. Timeline and budget signals (what is the urgency? is there a hard deadline?)

Before asking your questions, do a quick internal assessment of what the brief
already tells you. Do not ask questions the brief already answers.

Format your response as a numbered list of 5 questions. Be direct. Each question
should be answerable in 1-3 sentences.
"""

    def _structuring_prompt(self, state: TeamState) -> str:
        brief = state["raw_brief"]
        answers = state.get("clarification_answers", "")
        search_context = state.get("po_research_notes", "")

        research_section = ""
        if search_context:
            research_section = f"""
Market Research Notes:
{search_context}

"""

        return f"""You now have everything you need to produce a structured requirement document.

Original Brief:
---
{brief}
---

Answers to Clarifying Questions:
---
{answers}
---
{research_section}

Produce a RequirementDoc as a JSON object with this exact schema. Return ONLY
the JSON object — no preamble, no explanation, no code fences.

{{
  "project_name": "short name for the project",
  "elevator_pitch": "one sentence: who it's for, what it does, why it matters",
  "personas": [
    {{
      "name": "persona name (e.g. 'Dog Owner Dana')",
      "description": "2-3 sentences describing this person",
      "pain_points": ["pain point 1", "pain point 2"],
      "goals": ["goal 1", "goal 2"]
    }}
  ],
  "features": [
    {{
      "title": "feature name",
      "description": "what this feature does",
      "priority": "must-have" | "should-have" | "nice-to-have",
      "estimated_complexity": "small" | "medium" | "large"
    }}
  ],
  "success_metrics": ["metric 1", "metric 2"],
  "out_of_scope": ["item 1", "item 2"],
  "assumptions": ["assumption 1", "assumption 2"],
  "tech_constraints": ["constraint 1", "constraint 2"]
}}

Rules:
- Minimum 2 personas, maximum 4
- Minimum 5 features, maximum 15
- Features must cover both must-haves and should-haves; at least 3 must be must-have
- Success metrics must be measurable (avoid vague metrics like "user satisfaction")
- Out-of-scope must be explicit — list at least 3 items
- Assumptions must be surfaced — list at least 2
"""

    def _parse_output(self, response: str, state: TeamState) -> TeamState:
        """
        Parse LLM output into state updates.

        Pass 1 (clarification questions): stores questions in state for
        human review and response.
        Pass 2 (structured doc): parses JSON into RequirementDoc and
        emits RequirementsCleared domain event.
        """
        has_answers = bool(state.get("clarification_answers"))

        if not has_answers:
            # Pass 1: response contains clarifying questions
            return {
                **state,
                "po_clarifying_questions": response.strip(),
                "requirement_state": "awaiting_clarification",
            }
        else:
            # Pass 2: response should be JSON RequirementDoc
            req_doc = self._parse_requirement_doc(response)
            event = RequirementsCleared(
                source_agent=self.AGENT_ID,
                project_name=req_doc.project_name,
                feature_count=len(req_doc.features),
                persona_count=len(req_doc.personas),
            )
            return {
                **state,
                "clarified_requirements": req_doc.model_dump(),
                "requirement_state": "clarified",
                "event_outbox": self._append_event(state, event),
            }

    def _parse_requirement_doc(self, response: str) -> RequirementDoc:
        """Extract and validate RequirementDoc from LLM response."""
        # Strip any accidental markdown fences
        cleaned = re.sub(r"```(?:json)?", "", response).strip()
        # Find the JSON object
        match = re.search(r"\{.*\}", cleaned, re.DOTALL)
        if not match:
            raise ValueError(
                f"Could not find JSON object in PO response: {response[:200]}"
            )
        data = json.loads(match.group())
        return RequirementDoc.model_validate(data)

    async def _run_tools(self, state: TeamState) -> TeamState:
        """
        Pre-LLM tool execution: if the brief mentions competitors,
        do market research before structuring the document.
        Only runs on the second pass.
        """
        if not state.get("clarification_answers"):
            return state  # No research needed on first pass

        if state.get("po_research_notes"):
            return state  # Already researched

        brief = state.get("raw_brief", "")
        competitor_signals = [
            "like ", "similar to", "uber for", "airbnb for", "spotify for"
        ]
        needs_research = any(sig in brief.lower() for sig in competitor_signals)

        if needs_research:
            # Extract the comparison and search for context
            query = f"market analysis {brief[:100]} key features competitors"
            research = await self.tools["search_web"](query)
            return {**state, "po_research_notes": research}

        return state

A few things worth pointing out in this implementation.

The two-pass design is enforced by state inspection, not by a separate node. The same POAgent.run() method handles both passes. It checks state.get("clarification_answers") to determine which prompt to build. This keeps the LangGraph graph simple — there is one po_agent node, and it does the right thing based on state.

The system prompt is constant. Alex’s personality and constraints do not change between passes. Only the user prompt changes. This keeps the agent’s identity stable.

JSON parsing is defensive. LLMs will occasionally wrap JSON in markdown code fences even when told not to. The _parse_requirement_doc method strips those fences before parsing. This is not a workaround; it is a production reality.

Tool execution happens before the LLM call. The _run_tools method enriches the state with research notes before _prepare_prompt is called. The prompt can then incorporate that research context. The base class calls _run_tools before building the prompt.
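
For reference, the ordering the base class guarantees looks roughly like this. It is a simplified sketch of the Part 4 run() method, with retry and event plumbing elided; _call_llm is an assumed internal name.

# Simplified sketch of BaseAgent.run(), not the Part 4 code verbatim.
# The guarantee the PO agent relies on: tools run first, so the prompt
# can incorporate whatever the tools wrote into state.

async def run(self, state: TeamState) -> TeamState:
    state = await self._run_tools(state)        # 1. enrich state (e.g. research notes)
    prompt = self._prepare_prompt(state)        # 2. build the pass-specific prompt
    response = await self._call_llm(            # 3. LLM call, with retry (Part 4)
        system=self.SYSTEM_PROMPT, prompt=prompt
    )
    return self._parse_output(response, state)  # 4. parse, validate, emit events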


4. RequirementDoc Schema

The Pydantic models that define what the PO agent produces. These live in domain/models.py alongside the other domain objects from Part 3.

# domain/models.py  (PO section)

from __future__ import annotations

from typing import Literal, Optional
from pydantic import BaseModel, Field, field_validator


class UserPersona(BaseModel):
    """
    Represents a distinct type of user for the product.

    Personas are not demographics — they are behavioral archetypes.
    Two people of the same age can be completely different personas
    if they use the product for different reasons.
    """
    name: str = Field(
        description="A memorable name that evokes the persona (e.g. 'Commuter Carlos')"
    )
    description: str = Field(
        description="2-3 sentences describing who this person is and their relationship to the product"
    )
    pain_points: list[str] = Field(
        min_length=1,
        description="The problems this persona has that the product addresses"
    )
    goals: list[str] = Field(
        min_length=1,
        description="What this persona wants to achieve using the product"
    )

    @field_validator("name")
    @classmethod
    def name_not_generic(cls, v: str) -> str:
        generic = ["user", "customer", "person", "individual"]
        if v.lower().strip() in generic:
            raise ValueError(
                f"Persona name '{v}' is too generic. Use a descriptive name."
            )
        return v


class Feature(BaseModel):
    """
    A discrete capability the product will (or will not) provide.

    Features at this stage are intentionally coarse-grained.
    The BA agent will decompose them into user stories.
    """
    title: str = Field(
        description="Short, action-oriented title (e.g. 'Real-time GPS Tracking')"
    )
    description: str = Field(
        description="What this feature does and why it matters to the user"
    )
    priority: Literal["must-have", "should-have", "nice-to-have"] = Field(
        description=(
            "MoSCoW priority: "
            "must-have = product fails without it, "
            "should-have = significantly reduces value if absent, "
            "nice-to-have = delightful but not required for MVP"
        )
    )
    estimated_complexity: Literal["small", "medium", "large"] = Field(
        description=(
            "Order-of-magnitude complexity estimate: "
            "small = hours/1-2 days, "
            "medium = days/1 week, "
            "large = weeks/multiple sprints"
        )
    )

    @field_validator("title")
    @classmethod
    def title_is_not_empty(cls, v: str) -> str:
        if len(v.strip()) < 3:
            raise ValueError("Feature title too short")
        return v.strip()


class RequirementDoc(BaseModel):
    """
    The structured output of the PO agent.

    This is the aggregate root of the Requirement Context.
    Everything downstream in the pipeline derives from this document.

    Design principle: every field in this model should be actionable.
    If a field cannot influence a decision (technical, prioritization,
    or scope), it should not be in this model.
    """
    project_name: str = Field(
        description="A short, memorable name for the project"
    )
    elevator_pitch: str = Field(
        description=(
            "One sentence: 'For [persona] who [need], [product] is a [category] "
            "that [key benefit]. Unlike [alternative], [differentiator].'"
        )
    )
    personas: list[UserPersona] = Field(
        min_length=2,
        max_length=4,
        description="The 2-4 primary user archetypes"
    )
    features: list[Feature] = Field(
        min_length=5,
        max_length=15,
        description="The features being considered, with priorities"
    )
    success_metrics: list[str] = Field(
        min_length=2,
        description="Measurable indicators of project success"
    )
    out_of_scope: list[str] = Field(
        min_length=3,
        description="Explicit list of things we are NOT building"
    )
    assumptions: list[str] = Field(
        min_length=2,
        description="Stated assumptions that could invalidate requirements if wrong"
    )
    tech_constraints: list[str] = Field(
        default_factory=list,
        description="Known technical constraints (platforms, integrations, compliance)"
    )

    @property
    def must_have_features(self) -> list[Feature]:
        return [f for f in self.features if f.priority == "must-have"]

    @property
    def mvp_complexity_estimate(self) -> str:
        """Rough total complexity of must-have features."""
        weights = {"small": 1, "medium": 3, "large": 8}
        total = sum(
            weights[f.estimated_complexity]
            for f in self.must_have_features
        )
        if total <= 5:
            return "small (days to 1-2 weeks)"
        elif total <= 15:
            return "medium (2-6 weeks)"
        else:
            return "large (2+ months)"

    def summary(self) -> str:
        """Human-readable summary for logging and notifications."""
        must_have_count = len(self.must_have_features)
        return (
            f"RequirementDoc: {self.project_name} | "
            f"{len(self.personas)} personas | "
            f"{len(self.features)} features ({must_have_count} must-have) | "
            f"MVP estimate: {self.mvp_complexity_estimate}"
        )

The schema enforces several constraints that prevent common requirement document failures:

  • UserPersona.name rejects generic names like “user” or “customer.” If the PO agent produces a persona named “User,” it means it is not thinking in terms of real people.
  • RequirementDoc.out_of_scope requires at least 3 items. This is a forcing function. A requirement document without explicit out-of-scope items will accumulate scope creep until it collapses.
  • RequirementDoc.assumptions requires at least 2 items. Hidden assumptions are how projects fail six months in. Surface them early.
  • The mvp_complexity_estimate property gives downstream agents (and the PM agent) a rough sense of scale without adding a separate field that the LLM has to populate.
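
To see the roll-up in action, here is the arithmetic for the TaskFlow example from the walkthrough in section 9, using the weights from the property:

# Worked example of mvp_complexity_estimate's roll-up.
weights = {"small": 1, "medium": 3, "large": 8}

# TaskFlow's must-haves (section 9): Task Creation (medium),
# Kanban Board (medium), Email Notifications (small), Auth (small).
total = 2 * weights["medium"] + 2 * weights["small"]  # = 8
assert 5 < total <= 15  # falls in the "medium (2-6 weeks)" bucket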

5. BA Agent Design

With the RequirementDoc in hand, the Business Analyst agent takes over. The BA agent’s job is decomposition: taking high-level features and turning them into specific, testable, implementable user stories.

The BA agent is named Jordan. Jordan is precise, methodical, and adversarial in the best sense — they think about what can go wrong, what edge cases are lurking, and what the acceptance criteria need to cover to prevent a future debate about whether a feature is “done.”

Receives: A RequirementDoc from the PO agent.

Produces: A list of UserStory objects, each with:

  • The story text in “As a… I want to… so that…” format
  • Acceptance criteria in Given/When/Then format
  • Edge cases to test
  • Dependencies on other stories
  • Story point estimate

The BA’s Key Insight: Challenging Assumptions

Every feature in the RequirementDoc carries hidden assumptions. The BA agent’s job is to surface them at the story level by asking “what happens when X fails?” systematically.

Consider a feature like “Users can book a dog walker.” The assumptions hiding in that sentence:

  • What if no walkers are available in the user’s area?
  • What if the user tries to book for a time that is less than 2 hours away?
  • What if the user’s payment method fails during booking?
  • What if the walker cancels after the booking is confirmed?
  • What if the user has more dogs than the walker is rated for?

None of these are edge cases. They are main cases that happen every day on any marketplace platform. The BA agent needs to produce user stories that cover the happy path and these failure modes. Otherwise, the QC agent will find them later — after the SSE agent has already shipped code that does not handle them.

Tools Available to the BA Agent

read_document(doc_id: str) -> str — To retrieve the RequirementDoc written by the PO agent. This is the BA’s primary input.

create_document(title: str, content: str) -> str — To write the user story backlog as a persistent document accessible to downstream agents.

The BA agent does not have web search access. It is working entirely from the RequirementDoc. If it finds gaps in the requirements — things that are ambiguous or missing — it flags them as assumptions rather than inventing its own answers.


6. BAAgent Full Implementation

# agents/ba_agent.py

import json
import re

from agents.base import BaseAgent
from domain.models import UserStory
from domain.state import TeamState
from domain.events import UserStoriesCreated


class BAAgent(BaseAgent):
    """
    Business Analyst Agent — Jordan.

    Decomposes a RequirementDoc into a prioritized backlog of UserStory
    objects with acceptance criteria, edge cases, and story point estimates.
    """

    AGENT_ID = "ba_agent"
    AGENT_NAME = "Jordan"
    AGENT_ROLE = "Senior Business Analyst"
    DEFAULT_MODEL = "claude-opus-4-6"

    SYSTEM_PROMPT = """You are Jordan, a Senior Business Analyst with 10 years of
experience writing user stories for Agile software teams. You have worked on
consumer apps, B2B platforms, and internal tools.

Your output is what developers read when they start writing code. Every ambiguity
you leave in a story is a decision a developer will make on their own — often wrong.
Every edge case you miss is a bug that gets found in QA or, worse, in production.

You write stories in the classic format: "As a [persona], I want to [action] so
that [benefit]." You write acceptance criteria in Given/When/Then format. You
think adversarially about edge cases — not paranoid, but thorough.

Story points follow the Fibonacci scale: 1, 2, 3, 5, 8.
  1 = trivial (a label change, a static page)
  2 = simple (one happy path, no edge cases)
  3 = medium (one happy path, 2-3 edge cases)
  5 = complex (multiple paths, system integration)
  8 = epic that should be split further

When you produce JSON, it is valid JSON with no extra text before or after.
"""

    def _prepare_prompt(self, state: TeamState) -> str:
        req_doc = state.get("clarified_requirements")
        if not req_doc:
            raise ValueError("BAAgent requires clarified_requirements in state")

        # Format the requirement doc as readable text for the prompt
        req_text = self._format_req_doc(req_doc)

        return f"""You have received the following requirement document from the Product Owner:

{req_text}

Your task: produce a complete set of user stories covering all must-have features,
and the most critical should-have features.

Instructions:
1. Write one or more user stories per must-have feature. Large features may need
   2-3 stories. Small features may map to one story.
2. For each story, write 2-4 acceptance criteria in Given/When/Then format.
3. For each story, list 2-3 edge cases that must be handled.
4. Identify any dependencies between stories (a story that cannot be implemented
   before another story).
5. Estimate story points using Fibonacci scale: 1, 2, 3, 5, 8.
   If a story estimates to 8 points, split it into two stories.

Return a JSON array of story objects. Return ONLY the JSON array — no preamble,
no explanation, no code fences.

[
  {{
    "story_id": "US-001",
    "title": "short title",
    "story_text": "As a [persona], I want to [action] so that [benefit]",
    "persona": "persona name from RequirementDoc",
    "feature_ref": "title of the feature this story implements",
    "priority": "must-have" | "should-have" | "nice-to-have",
    "story_points": 1 | 2 | 3 | 5,
    "acceptance_criteria": [
      {{
        "id": "AC-001-1",
        "given": "the precondition",
        "when": "the action taken",
        "then": "the expected outcome"
      }}
    ],
    "edge_cases": ["edge case description 1", "edge case description 2"],
    "dependencies": ["US-XXX", "US-YYY"],
    "notes": "optional: anything the developer should know that doesn't fit above"
  }}
]

Think carefully before writing. A good user story:
- Is testable: the acceptance criteria are binary pass/fail
- Is small: a developer can finish it in 1-3 days
- Is independent: it can be tested in isolation (or dependencies are explicit)
- Has a clear "so that" clause that explains business value, not just functionality
"""

    def _format_req_doc(self, req_doc: dict) -> str:
        """Format the RequirementDoc dict as readable text for the prompt."""
        lines = [
            f"Project: {req_doc['project_name']}",
            f"Elevator Pitch: {req_doc['elevator_pitch']}",
            "",
            "Personas:",
        ]
        for p in req_doc.get("personas", []):
            lines.append(f"  - {p['name']}: {p['description']}")
            for pain in p.get("pain_points", []):
                lines.append(f"    Pain: {pain}")
        lines.append("")
        lines.append("Features:")
        for f in req_doc.get("features", []):
            lines.append(
                f"  [{f['priority'].upper()}] {f['title']} ({f['estimated_complexity']})"
            )
            lines.append(f"    {f['description']}")
        lines.append("")
        lines.append("Success Metrics:")
        for m in req_doc.get("success_metrics", []):
            lines.append(f"  - {m}")
        lines.append("")
        lines.append("Out of Scope:")
        for o in req_doc.get("out_of_scope", []):
            lines.append(f"  - {o}")
        lines.append("")
        lines.append("Assumptions:")
        for a in req_doc.get("assumptions", []):
            lines.append(f"  - {a}")
        if req_doc.get("tech_constraints"):
            lines.append("")
            lines.append("Technical Constraints:")
            for t in req_doc["tech_constraints"]:
                lines.append(f"  - {t}")
        return "\n".join(lines)

    def _parse_output(self, response: str, state: TeamState) -> TeamState:
        """Parse the LLM's JSON array of user stories into UserStory objects."""
        stories = self._extract_stories(response)

        # Validate that we have enough stories to proceed
        must_have_stories = [s for s in stories if s.priority == "must-have"]
        if len(must_have_stories) == 0:
            raise ValueError(
                "BA agent produced no must-have stories. "
                "Check that the RequirementDoc contains must-have features."
            )

        event = UserStoriesCreated(
            source_agent=self.AGENT_ID,
            story_count=len(stories),
            must_have_count=len(must_have_stories),
            total_story_points=sum(s.story_points for s in stories),
        )

        return {
            **state,
            "user_stories": [s.model_dump() for s in stories],
            "requirement_state": "user_stories_created",
            "event_outbox": self._append_event(state, event),
        }

    def _extract_stories(self, response: str) -> list[UserStory]:
        """Extract and validate UserStory list from LLM response."""
        cleaned = re.sub(r"```(?:json)?", "", response).strip()
        # Find the JSON array
        match = re.search(r"\[.*\]", cleaned, re.DOTALL)
        if not match:
            raise ValueError(
                f"Could not find JSON array in BA response: {response[:200]}"
            )
        raw_stories = json.loads(match.group())
        stories = []
        for raw in raw_stories:
            try:
                stories.append(UserStory.model_validate(raw))
            except Exception as e:
                # Log but don't fail on a single malformed story
                self.logger.warning(f"Skipping malformed story {raw.get('story_id')}: {e}")
        return stories

The _format_req_doc method deserves a note. We could pass the RequirementDoc to the BA agent as raw JSON in the prompt. Sometimes that is fine. But a well-formatted text representation is easier for the LLM to reason about. JSON is optimized for machines. Structured plain text — indented lists, clear labels, bracketed priorities — is easier to parse in the context of a long prompt. In practice, the text format produces more consistent BA output.
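
To illustrate, the formatted text for the TaskFlow document from section 9 begins like this (abbreviated):

Project: TaskFlow
Elevator Pitch: For small engineering teams drowning in spreadsheet chaos, ...

Personas:
  - Engineer Sam: A software engineer on a 4-6 person team. ...
    Pain: Loses track of tasks assigned by teammates in long Slack threads
    Pain: Misses due dates because the spreadsheet is always stale

Features:
  [MUST-HAVE] Task Creation and Assignment (medium)
    Create tasks with title, description, assignee, due date, and priority. ...
  [MUST-HAVE] Kanban Board View (medium)
    Visual board with Todo, In Progress, and Done columns. ...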


7. UserStory Schema

# domain/models.py  (BA section)

from __future__ import annotations

import re
from enum import Enum
from typing import Literal, Optional

from pydantic import BaseModel, Field, field_validator, model_validator


class AcceptanceCriteria(BaseModel):
    """
    A single acceptance criterion in Given/When/Then format.

    The Given/When/Then structure forces each criterion to be:
    - Given: a specific precondition (not "when logged in" — which login state?)
    - When: a specific action (not "user submits form" — which form, which data?)
    - Then: a specific, observable, binary outcome (not "it works" — what exactly?)
    """
    id: str = Field(description="Unique ID within the story, e.g. 'AC-001-1'")
    given: str = Field(description="The precondition / system state")
    when: str = Field(description="The action taken by actor or system")
    then: str = Field(description="The observable, verifiable outcome")

    @field_validator("then")
    @classmethod
    def then_is_specific(cls, v: str) -> str:
        vague_endings = ["works", "succeeds", "is done", "is complete"]
        if any(v.lower().strip().endswith(ending) for ending in vague_endings):
            raise ValueError(
                f"Acceptance criterion 'then' clause is too vague: '{v}'. "
                "Specify the exact observable outcome."
            )
        return v


class StoryStatus(str, Enum):
    BACKLOG = "backlog"
    READY = "ready"           # all dependencies resolved
    IN_PROGRESS = "in_progress"
    IN_REVIEW = "in_review"
    DONE = "done"
    BLOCKED = "blocked"


class UserStory(BaseModel):
    """
    A single user story — the atomic unit of work in our system.

    This is what the SSE agent reads when writing code.
    This is what the QC agent reads when writing tests.
    Every ambiguity here becomes a decision made without the team.

    Design principle: a UserStory is a conversation starter, not a contract.
    The 'notes' field exists for things that don't fit the structured fields —
    use it for developer-facing context that would be lost in a ticket title.
    """
    story_id: str = Field(description="Unique ID, e.g. 'US-001'")
    title: str = Field(description="Short, verb-first title, e.g. 'Book a Dog Walker'")
    story_text: str = Field(
        description="'As a [persona], I want to [action] so that [benefit]'"
    )
    persona: str = Field(
        description="The persona name from the RequirementDoc"
    )
    feature_ref: str = Field(
        description="The RequirementDoc feature title this story implements"
    )
    priority: Literal["must-have", "should-have", "nice-to-have"]
    story_points: Literal[1, 2, 3, 5] = Field(
        description=(
            "Fibonacci estimate. Note: 8 is intentionally excluded — "
            "stories that large must be split."
        )
    )
    acceptance_criteria: list[AcceptanceCriteria] = Field(
        min_length=2,
        max_length=6,
        description="2-6 Given/When/Then criteria"
    )
    edge_cases: list[str] = Field(
        min_length=1,
        description="Edge cases the implementation must handle"
    )
    dependencies: list[str] = Field(
        default_factory=list,
        description="story_ids that must be completed before this story"
    )
    notes: Optional[str] = Field(
        default=None,
        description="Free-text notes for developers"
    )
    status: str = Field(
        default=StoryStatus.BACKLOG,
        description="Current status in the workflow"
    )

    @field_validator("story_text")
    @classmethod
    def validate_story_format(cls, v: str) -> str:
        v_lower = v.lower()
        if "as a " not in v_lower:
            raise ValueError("Story text must contain 'As a'")
        if "i want to " not in v_lower:
            raise ValueError("Story text must contain 'I want to'")
        if "so that " not in v_lower:
            raise ValueError("Story text must contain 'so that'")
        return v

    @field_validator("story_id")
    @classmethod
    def validate_story_id_format(cls, v: str) -> str:
        if not re.match(r"^US-\d{3,}$", v):
            raise ValueError(
                f"story_id must match format US-NNN, got: {v}"
            )
        return v

    @model_validator(mode="after")
    def check_acceptance_criteria_ids_match_story(self) -> "UserStory":
        """Validate that AC ids reference this story's story_id."""
        prefix = self.story_id.replace("US-", "AC-")
        for ac in self.acceptance_criteria:
            if not ac.id.startswith(prefix):
                raise ValueError(
                    f"AcceptanceCriteria id {ac.id} does not match "
                    f"story {self.story_id}. Expected prefix: {prefix}"
                )
        return self

    def to_markdown(self) -> str:
        """Human-readable Markdown for dashboards and notifications."""
        lines = [
            f"### [{self.story_id}] {self.title}",
            f"**Priority:** {self.priority} | **Points:** {self.story_points}",
            f"**Persona:** {self.persona}",
            "",
            f"> {self.story_text}",
            "",
            "**Acceptance Criteria:**",
        ]
        for ac in self.acceptance_criteria:
            lines.extend([
                f"- **Given** {ac.given}",
                f"  **When** {ac.when}",
                f"  **Then** {ac.then}",
            ])
        if self.edge_cases:
            lines.append("")
            lines.append("**Edge Cases:**")
            for ec in self.edge_cases:
                lines.append(f"- {ec}")
        if self.dependencies:
            lines.append("")
            lines.append(f"**Depends on:** {', '.join(self.dependencies)}")
        if self.notes:
            lines.append("")
            lines.append(f"**Notes:** {self.notes}")
        return "\n".join(lines)

The story_points field uses Literal[1, 2, 3, 5] — notice that 8 is intentionally excluded. Any story that would be estimated at 8 points is too large and must be split. This constraint is baked into the schema, not just the system prompt. The BA agent cannot produce an 8-point story even if the LLM tries.

The validate_story_format validator enforces the classic user story format at the data layer. If the LLM produces “User can book a walker” instead of “As a Dog Owner, I want to book a walker so that I can…”, the validation fails and the base agent’s retry mechanism kicks in.


8. The PO → BA Handoff

[Figure: PO to BA workflow: client brief → PO clarification pass → RequirementDoc → BA processing → user stories]

The handoff between the PO agent and the BA agent is the first major state transition in the pipeline. Let me show exactly what the state looks like at each point, because this is where bugs tend to hide.

State entering the PO agent (Pass 1):

{
    "raw_brief": "Build me a task manager...",
    "clarified_requirements": None,
    "clarification_answers": None,
    "po_clarifying_questions": None,
    "requirement_state": "received",
}

State after PO agent Pass 1:

{
    "raw_brief": "Build me a task manager...",
    "clarified_requirements": None,
    "clarification_answers": None,
    "po_clarifying_questions": "1. Who are the primary users...\n2. ...",
    "requirement_state": "awaiting_clarification",
    # Graph pauses here at the human_clarification_checkpoint node
}

State after human responds (checkpoint resumed):

{
    "raw_brief": "Build me a task manager...",
    "clarified_requirements": None,
    "clarification_answers": "1. Primary users are software engineers...\n2. ...",
    "po_clarifying_questions": "1. Who are the primary users...\n2. ...",
    "requirement_state": "awaiting_clarification",
}

State after PO agent Pass 2:

{
    "raw_brief": "Build me a task manager...",
    "clarified_requirements": {
        "project_name": "TaskFlow",
        "elevator_pitch": "For software engineers who lose track of...",
        "personas": [...],
        "features": [...],
        "success_metrics": [...],
        "out_of_scope": [...],
        "assumptions": [...],
        "tech_constraints": [...]
    },
    "clarification_answers": "...",
    "po_clarifying_questions": "...",
    "requirement_state": "clarified",
    "event_outbox": {"pending": [{"event": "RequirementsCleared", ...}]}
}

State entering the BA agent:

The BA agent sees the full state above. The key field it reads is clarified_requirements. It does not re-read the raw_brief or the clarification_answers — that work is done. The RequirementDoc is the BA agent’s source of truth. This is the anti-corruption layer at work: the BA’s world starts at the RequirementDoc, not at the raw brief.

State after the BA agent:

{
    # All previous fields preserved
    "user_stories": [
        {
            "story_id": "US-001",
            "title": "Create a Task",
            "story_text": "As a Software Engineer Dana, I want to create...",
            "priority": "must-have",
            "story_points": 2,
            "acceptance_criteria": [
                {
                    "id": "AC-001-1",
                    "given": "Dana is on the task list screen",
                    "when": "she taps the 'New Task' button",
                    "then": "a task creation form appears with Title, Due Date, and Priority fields"
                },
                # ...more criteria
            ],
            "edge_cases": [
                "User tries to save a task with an empty title",
                "User enters a due date in the past"
            ],
            "dependencies": [],
            "status": "backlog"
        },
        # ... more stories
    ],
    "requirement_state": "user_stories_created",
}

The state is now ready for the stories approval checkpoint — the second human gate, where a stakeholder reviews the user stories before the system proceeds to technical architecture.
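
That gate uses the same interrupt primitive as the clarification checkpoint in the wiring below. A sketch of what it might look like (the payload shape and resume value are assumptions; the real node is built later in the series):

# Sketch: the second human gate. Illustrative, not the final node.

from langgraph.types import interrupt

from domain.state import TeamState


def stories_approval_checkpoint(state: TeamState) -> TeamState:
    """Pause after the BA produces the backlog so a stakeholder can
    approve the stories before the Tech Architect runs."""
    if state.get("requirement_state") == "user_stories_created":
        decision = interrupt({
            "type": "stories_approval_required",
            "story_count": len(state.get("user_stories", [])),
            "stories": state.get("user_stories", []),
        })
        # `decision` is whatever the resuming caller supplies, e.g.
        # {"approved": True} or {"approved": False, "feedback": "..."}
        return {**state, "stories_approval": decision}
    return state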

The LangGraph Node Wiring

In the graph builder from Part 3, the PO and BA nodes are connected like this:

# graph/builder.py

import asyncio

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.types import interrupt

from agents.po_agent import POAgent
from agents.ba_agent import BAAgent
from domain.state import TeamState


po_agent = POAgent()
ba_agent = BAAgent()


def run_po_agent(state: TeamState) -> TeamState:
    """LangGraph node wrapper for the PO agent."""
    return asyncio.run(po_agent.run(state))


def run_ba_agent(state: TeamState) -> TeamState:
    """LangGraph node wrapper for the BA agent."""
    return asyncio.run(ba_agent.run(state))


def human_clarification_checkpoint(state: TeamState) -> TeamState:
    """
    Pause execution after PO asks clarifying questions.
    Human must provide answers via the approval gateway.
    """
    if state.get("requirement_state") == "awaiting_clarification":
        # Interrupt the graph — resume when answers are provided
        answers = interrupt({
            "type": "clarification_required",
            "questions": state["po_clarifying_questions"],
            "project": state.get("raw_brief", "")[:100],
        })
        return {**state, "clarification_answers": answers}
    return state  # Already has answers, pass through


def route_after_po(state: TeamState) -> str:
    """Determine where to go after the PO agent runs."""
    req_state = state.get("requirement_state")
    if req_state == "awaiting_clarification":
        return "clarification_checkpoint"
    elif req_state == "clarified":
        return "ba_agent"
    else:
        # Unexpected state — escalate to PM. The pm_agent node is part of
        # the full builder and is wired in elsewhere in the series.
        return "pm_agent"


builder = StateGraph(TeamState)
builder.add_node("po_agent", run_po_agent)
builder.add_node("clarification_checkpoint", human_clarification_checkpoint)
builder.add_node("ba_agent", run_ba_agent)

builder.set_entry_point("po_agent")

builder.add_conditional_edges(
    "po_agent",
    route_after_po,
    {
        "clarification_checkpoint": "clarification_checkpoint",
        "ba_agent": "ba_agent",
        "pm_agent": "pm_agent",
    },
)

# After clarification, always loop back to PO agent for Pass 2
builder.add_edge("clarification_checkpoint", "po_agent")

The route_after_po function is the key routing decision. If the PO agent completed Pass 1 (state is awaiting_clarification), route to the checkpoint. If it completed Pass 2 (state is clarified), route to the BA agent. This means the po_agent node is visited twice in a typical run, and the routing function handles both cases cleanly.
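
To close the loop, here is how the compiled graph is driven across the human checkpoint. This is a sketch: it assumes a recent langgraph release where interrupt() pairs with Command(resume=...), and it assumes the rest of the builder (the pm_agent node, the stories approval checkpoint, terminal edges) has been wired as in the full project. The thread ID and database filename are illustrative.

# Sketch: compiling and driving the graph across the checkpoint.

import sqlite3

from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.types import Command

# A checkpointer is required for interrupt/resume to work.
checkpointer = SqliteSaver(sqlite3.connect("checkpoints.db", check_same_thread=False))
graph = builder.compile(checkpointer=checkpointer)

config = {"configurable": {"thread_id": "taskflow-001"}}

# First run: PO Pass 1 executes, then the graph pauses at the checkpoint.
state = graph.invoke(
    {"raw_brief": "Build a simple task manager...", "requirement_state": "received"},
    config,
)
print(state["po_clarifying_questions"])

# Later, a human answers. Resuming delivers the answers as interrupt()'s
# return value; the checkpoint writes them to state, and the PO runs Pass 2.
state = graph.invoke(
    Command(resume="1. Primary users are software engineers..."),
    config,
)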


9. Real Example Walkthrough

Let me run through a complete example from raw brief to user stories. The input is deliberately short to be readable, but the pattern holds for more complex briefs.

Input brief:

“Build a simple task manager app for our engineering team. We need to track tasks, assign them to people, and know when things are due. We currently use spreadsheets and it’s getting messy.”

PO Agent Pass 1 — Clarifying Questions:

1. Who are the primary users of this task manager, and what is their typical
   workflow? For example: are they assigning tasks to themselves, to teammates,
   or managing work across a whole team?

2. If you could ship only three features in the first version, which three would
   make the spreadsheets feel obviously inadequate by comparison?

3. How will you know this project was successful? Are you measuring something
   specific — like time spent in status meetings, number of missed deadlines,
   or team onboarding time?

4. Are there any technical constraints we should know about — for example,
   does this need to integrate with GitHub, Slack, Jira, or another tool your
   team already uses?

5. Is there a hard deadline for the first version, and do you have a rough
   sense of the available development budget (in time or money)?

Human answers:

1. Primary users are software engineers (4-6 person team). They assign tasks
   to themselves and to teammates. A tech lead occasionally needs to see the
   whole team's workload.

2. Must-haves: create/assign tasks with due dates, a board view showing
   status (Todo/In Progress/Done), and email notifications for due dates.
   Everything else is secondary.

3. Success = the team stops using the spreadsheet within 3 months and
   nobody misses a due date that was in the system.

4. Needs to work in a browser. No integrations required for v1, but
   Slack notifications would be nice someday. Simple email auth is fine —
   no SSO needed.

5. We'd like something usable within 6 weeks. Budget is one developer
   for that period.

PO Agent Pass 2 — RequirementDoc:

{
  "project_name": "TaskFlow",
  "elevator_pitch": "For small engineering teams drowning in spreadsheet chaos, TaskFlow is a task management app that makes assignments, due dates, and status visible to everyone — so nothing falls through the cracks.",
  "personas": [
    {
      "name": "Engineer Sam",
      "description": "A software engineer on a 4-6 person team. Works on 3-5 tasks at any given time. Needs to know what to work on next and when things are due.",
      "pain_points": [
        "Loses track of tasks assigned by teammates in long Slack threads",
        "Misses due dates because the spreadsheet is always stale"
      ],
      "goals": [
        "Know exactly what I own and when it's due",
        "Update task status without a meeting or a message"
      ]
    },
    {
      "name": "Tech Lead Morgan",
      "description": "The team lead who assigns tasks and monitors workload across the whole team. Needs a high-level view without micromanaging.",
      "pain_points": [
        "No visibility into who is overloaded or blocked",
        "Status meetings take 30 minutes to recreate information that should be visible"
      ],
      "goals": [
        "See team workload at a glance",
        "Identify blockers before they become missed deadlines"
      ]
    },
    {
      "name": "New Hire Taylor",
      "description": "A developer who joined the team 2 weeks ago. Needs to understand what exists, what they own, and how work flows through the team.",
      "pain_points": [
        "No single place to understand team priorities",
        "Afraid of stepping on colleagues' toes by working on the wrong thing"
      ],
      "goals": [
        "Onboard without needing a 1-on-1 to understand team priorities",
        "Find and claim available tasks independently"
      ]
    }
  ],
  "features": [
    {
      "title": "Task Creation and Assignment",
      "description": "Create tasks with title, description, assignee, due date, and priority. Assignee receives email notification.",
      "priority": "must-have",
      "estimated_complexity": "medium"
    },
    {
      "title": "Kanban Board View",
      "description": "Visual board with Todo, In Progress, and Done columns. Drag-and-drop to change status.",
      "priority": "must-have",
      "estimated_complexity": "medium"
    },
    {
      "title": "Due Date Email Notifications",
      "description": "Email reminder 24 hours before a task is due. Email notification when a task is assigned to you.",
      "priority": "must-have",
      "estimated_complexity": "small"
    },
    {
      "title": "User Authentication",
      "description": "Email and password login. No SSO, no OAuth for v1.",
      "priority": "must-have",
      "estimated_complexity": "small"
    },
    {
      "title": "Task Detail View",
      "description": "Full task details including history of status changes, comments thread, and file attachment.",
      "priority": "should-have",
      "estimated_complexity": "medium"
    },
    {
      "title": "Team Workload View",
      "description": "A view (for tech leads) showing all team members and their open task counts, grouped by due date.",
      "priority": "should-have",
      "estimated_complexity": "small"
    },
    {
      "title": "Task Filtering and Search",
      "description": "Filter tasks by assignee, status, priority, and due date. Full-text search on task title.",
      "priority": "should-have",
      "estimated_complexity": "small"
    },
    {
      "title": "Slack Notifications",
      "description": "Optional Slack webhook integration for task assignment and due date notifications.",
      "priority": "nice-to-have",
      "estimated_complexity": "small"
    }
  ],
  "success_metrics": [
    "Spreadsheet abandoned by 100% of team members within 90 days of launch",
    "Zero missed deadlines for tasks that were logged in TaskFlow (measured monthly)",
    "New team member can find and claim their first task without asking a teammate"
  ],
  "out_of_scope": [
    "SSO or OAuth authentication (email/password only for v1)",
    "Mobile app (browser only for v1)",
    "Time tracking or billing features",
    "External integrations except optional Slack webhook",
    "Advanced reporting, burndown charts, or velocity metrics"
  ],
  "assumptions": [
    "Team size stays at 4-6 people during v1; no multi-team or org-level features needed",
    "Email delivery is handled by a transactional email service (SendGrid, Postmark); no self-hosted email server"
  ],
  "tech_constraints": [
    "Must be browser-based (no mobile app for v1)",
    "One developer, 6-week delivery timeline"
  ]
}

BA Agent — User Stories Output (8 stories):

The BA agent produces the following stories from the RequirementDoc:

Story ID | Title                                 | Persona          | Points | Priority
---------|---------------------------------------|------------------|--------|------------
US-001   | Register and Log In                   | Engineer Sam     | 2      | must-have
US-002   | Create a Task and Assign It           | Tech Lead Morgan | 3      | must-have
US-003   | View the Kanban Board                 | Engineer Sam     | 3      | must-have
US-004   | Move a Task Across Status Columns     | Engineer Sam     | 2      | must-have
US-005   | Receive Assignment Email Notification | Engineer Sam     | 2      | must-have
US-006   | Receive Due Date Reminder             | Engineer Sam     | 2      | must-have
US-007   | View Team Workload                    | Tech Lead Morgan | 2      | should-have
US-008   | Filter Tasks by Assignee and Status   | Engineer Sam     | 2      | should-have

Total: 18 story points, 14 of them must-have. For a single developer at a velocity of 5-6 points per two-week sprint, the full backlog is roughly 3-4 sprints, or 6 to 8 weeks. The 6-week timeline is tight but possible if the should-have stories slip to a second release. The PO agent’s mvp_complexity_estimate called this “medium (2-6 weeks)” based on the must-haves alone — reasonable alignment.

Let me show one complete story to illustrate the depth of the BA output:

US-002 — Create a Task and Assign It:

{
  "story_id": "US-002",
  "title": "Create a Task and Assign It",
  "story_text": "As Tech Lead Morgan, I want to create a task with a title, description, due date, and assignee so that the assigned engineer knows exactly what they need to do and when it needs to be done by.",
  "persona": "Tech Lead Morgan",
  "feature_ref": "Task Creation and Assignment",
  "priority": "must-have",
  "story_points": 3,
  "acceptance_criteria": [
    {
      "id": "AC-002-1",
      "given": "Morgan is logged in and on the task board",
      "when": "she clicks the 'New Task' button",
      "then": "a task creation form opens with fields: Title (required), Description (optional), Assignee (dropdown of team members), Due Date (date picker), Priority (Low/Medium/High)"
    },
    {
      "id": "AC-002-2",
      "given": "Morgan has filled in all required fields and selected an assignee",
      "when": "she clicks 'Save Task'",
      "then": "the task appears in the Todo column on the board, and the assigned engineer receives an email notification within 5 minutes"
    },
    {
      "id": "AC-002-3",
      "given": "Morgan tries to save a task without a title",
      "when": "she clicks 'Save Task'",
      "then": "the form shows an inline error on the Title field: 'Task title is required' and the form does not close"
    },
    {
      "id": "AC-002-4",
      "given": "Morgan selects a due date that is in the past",
      "when": "she clicks 'Save Task'",
      "then": "the form shows a warning: 'Due date is in the past. Are you sure?' with Confirm and Cancel options — the task can still be saved if Morgan confirms"
    }
  ],
  "edge_cases": [
    "Assignee is not a registered user (e.g., Morgan types an email that isn't in the system) — show inline error",
    "Email notification delivery fails — task should still be created; log the failed notification for retry"
  ],
  "dependencies": ["US-001"],
  "notes": "The email notification for US-002 and the due date reminder in US-006 should share the same email service integration. Implement them as a single EmailNotificationService class to avoid duplicated SendGrid setup code."
}
[Figure: user story card for US-002, showing story text, acceptance criteria, story points badge, and dependencies]

Notice what the notes field carries: a cross-story implementation insight that does not fit neatly into the structured fields. “The email notification for US-002 and the due date reminder in US-006 should share the same EmailNotificationService class.” This kind of observation — connecting the dots between two stories — is what distinguishes a good BA output from a mediocre one. It does not dictate architecture (that is the TA agent’s domain), but it flags a coupling that the TA agent should know about.


10. Testing PO and BA

Tests for agents are different from tests for regular business logic, because the LLM output is non-deterministic. We need to test at two levels: the output parsing logic (deterministic, unit-testable) and the end-to-end agent behavior (non-deterministic, integration-testable with mocking).

Unit Tests: Schema Validation

Start with the data models. These are entirely deterministic.

# tests/unit/test_requirement_doc.py

import pytest
from pydantic import ValidationError
from domain.models import RequirementDoc, UserPersona, Feature, UserStory, AcceptanceCriteria


class TestUserPersona:
    def test_rejects_generic_name(self):
        with pytest.raises(ValidationError, match="too generic"):
            UserPersona(
                name="user",
                description="A person who uses the app",
                pain_points=["pain"],
                goals=["goal"],
            )

    def test_accepts_descriptive_name(self):
        persona = UserPersona(
            name="Engineer Sam",
            description="A software engineer on a small team.",
            pain_points=["Loses track of tasks"],
            goals=["Know what to work on next"],
        )
        assert persona.name == "Engineer Sam"


class TestRequirementDoc:
    def test_rejects_too_few_features(self):
        with pytest.raises(ValidationError):
            RequirementDoc(
                project_name="Test",
                elevator_pitch="A test app",
                personas=[
                    UserPersona(
                        name="Persona One",
                        description="A person.",
                        pain_points=["pain"],
                        goals=["goal"],
                    ),
                    UserPersona(
                        name="Persona Two",
                        description="Another person.",
                        pain_points=["pain"],
                        goals=["goal"],
                    ),
                ],
                features=[
                    Feature(
                        title="One Feature",
                        description="Just one",
                        priority="must-have",
                        estimated_complexity="small",
                    )
                ],  # Too few — minimum is 5
                success_metrics=["metric one", "metric two"],
                out_of_scope=["item one", "item two", "item three"],
                assumptions=["assumption one", "assumption two"],
            )

    def test_mvp_complexity_estimate(self):
        """Medium complexity features should produce a medium estimate."""
        features = [
            Feature(
                title=f"Feature {i}",
                description="A feature",
                priority="must-have",
                estimated_complexity="medium",
            )
            for i in range(4)
        ]
        # Add one padding feature to reach the five-feature minimum
        features += [
            Feature(
                title="Extra Feature",
                description="Extra",
                priority="nice-to-have",
                estimated_complexity="small",
            )
        ]
        doc = RequirementDoc(
            project_name="Test",
            elevator_pitch="A test app",
            personas=[
                UserPersona(name="User A", description=".", pain_points=["p"], goals=["g"]),
                UserPersona(name="User B", description=".", pain_points=["p"], goals=["g"]),
            ],
            features=features,
            success_metrics=["metric 1", "metric 2"],
            out_of_scope=["o1", "o2", "o3"],
            assumptions=["a1", "a2"],
        )
        # 4 medium features = 4 * 3 = 12 points → "medium (2-6 weeks)"
        assert "medium" in doc.mvp_complexity_estimate


class TestUserStory:
    def test_rejects_invalid_story_format(self):
        """Stories that don't follow 'As a / I want / so that' format are rejected."""
        with pytest.raises(ValidationError, match="I want to"):
            UserStory(
                story_id="US-001",
                title="Create Task",
                story_text="User can create a task",  # Missing format
                persona="Engineer Sam",
                feature_ref="Task Creation",
                priority="must-have",
                story_points=2,
                acceptance_criteria=[
                    AcceptanceCriteria(
                        id="AC-001-1",
                        given="user is on the board",
                        when="they click New Task",
                        then="the task creation form appears with Title and Due Date fields",
                    ),
                    AcceptanceCriteria(
                        id="AC-001-2",
                        given="user fills in the title",
                        when="they click Save",
                        then="the task appears in the Todo column",
                    ),
                ],
                edge_cases=["Empty title attempt"],
            )

    def test_rejects_vague_acceptance_criteria(self):
        """Acceptance criteria with vague 'then' clauses are rejected."""
        with pytest.raises(ValidationError, match="too vague"):
            AcceptanceCriteria(
                id="AC-001-1",
                given="user is logged in",
                when="they submit the form",
                then="it works",  # Too vague
            )

    def test_rejects_8_story_points(self):
        """8-point stories must be split — Literal[1,2,3,5] excludes 8."""
        with pytest.raises(ValidationError):
            UserStory(
                story_id="US-001",
                title="Giant Story",
                story_text="As Engineer Sam, I want to do everything so that I can ship the whole app",
                persona="Engineer Sam",
                feature_ref="All Features",
                priority="must-have",
                story_points=8,  # Not in Literal[1, 2, 3, 5]
                acceptance_criteria=[
                    AcceptanceCriteria(
                        id="AC-001-1",
                        given="context",
                        when="action",
                        then="the system displays the confirmation message",
                    ),
                    AcceptanceCriteria(
                        id="AC-001-2",
                        given="context 2",
                        when="action 2",
                        then="the system saves the record to the database",
                    ),
                ],
                edge_cases=["Something fails"],
            )

Unit Tests: Output Parsing

Test the parsing logic with fixed strings, not LLM calls.

# tests/unit/test_po_agent_parsing.py

import json
import pytest
from agents.po_agent import POAgent
from domain.models import RequirementDoc


VALID_REQUIREMENT_DOC_JSON = json.dumps({
    "project_name": "TaskFlow",
    "elevator_pitch": "For engineering teams who struggle with spreadsheets.",
    "personas": [
        {
            "name": "Engineer Sam",
            "description": "A software engineer.",
            "pain_points": ["Loses track of tasks"],
            "goals": ["Know what to work on"]
        },
        {
            "name": "Tech Lead Morgan",
            "description": "The team lead.",
            "pain_points": ["No visibility"],
            "goals": ["See team workload"]
        }
    ],
    "features": [
        {"title": "Task Creation", "description": "Create tasks", "priority": "must-have", "estimated_complexity": "medium"},
        {"title": "Kanban Board", "description": "Board view", "priority": "must-have", "estimated_complexity": "medium"},
        {"title": "Notifications", "description": "Email alerts", "priority": "must-have", "estimated_complexity": "small"},
        {"title": "Auth", "description": "Login", "priority": "must-have", "estimated_complexity": "small"},
        {"title": "Task Detail", "description": "Full view", "priority": "should-have", "estimated_complexity": "medium"},
    ],
    "success_metrics": [
        "Spreadsheet abandoned within 90 days",
        "Zero missed deadlines for logged tasks"
    ],
    "out_of_scope": ["SSO", "Mobile app", "Time tracking"],
    "assumptions": ["Team stays at 4-6 people", "Transactional email service available"],
    "tech_constraints": ["Browser-only"]
})


class TestPOAgentParsing:
    def setup_method(self):
        self.agent = POAgent()

    def test_parses_valid_json(self):
        doc = self.agent._parse_requirement_doc(VALID_REQUIREMENT_DOC_JSON)
        assert isinstance(doc, RequirementDoc)
        assert doc.project_name == "TaskFlow"
        assert len(doc.personas) == 2
        assert len(doc.features) == 5

    def test_strips_markdown_fences(self):
        wrapped = f"```json\n{VALID_REQUIREMENT_DOC_JSON}\n```"
        doc = self.agent._parse_requirement_doc(wrapped)
        assert doc.project_name == "TaskFlow"

    def test_raises_on_missing_json(self):
        with pytest.raises(ValueError, match="Could not find JSON"):
            self.agent._parse_requirement_doc("Here is my analysis of your brief...")

Integration Test: PO → BA Pipeline

# tests/integration/test_po_ba_pipeline.py

import pytest
from unittest.mock import AsyncMock, patch
from agents.po_agent import POAgent
from agents.ba_agent import BAAgent
from domain.state import TeamState


# A realistic RequirementDoc to inject (bypassing the LLM for the PO pass)
MOCK_REQUIREMENT_DOC = {
    "project_name": "TaskFlow",
    "elevator_pitch": "For engineering teams who struggle with spreadsheets.",
    "personas": [
        {
            "name": "Engineer Sam",
            "description": "A software engineer on a 4-6 person team.",
            "pain_points": ["Loses track of tasks"],
            "goals": ["Know what to work on"]
        },
        {
            "name": "Tech Lead Morgan",
            "description": "The team lead who assigns tasks.",
            "pain_points": ["No visibility into workload"],
            "goals": ["See team workload at a glance"]
        }
    ],
    "features": [
        {"title": "Task Creation", "description": "Create and assign tasks", "priority": "must-have", "estimated_complexity": "medium"},
        {"title": "Kanban Board", "description": "Visual board view", "priority": "must-have", "estimated_complexity": "medium"},
        {"title": "Email Notifications", "description": "Assignment and due date emails", "priority": "must-have", "estimated_complexity": "small"},
        {"title": "User Authentication", "description": "Email/password login", "priority": "must-have", "estimated_complexity": "small"},
        {"title": "Task Detail View", "description": "Full task details with comments", "priority": "should-have", "estimated_complexity": "medium"},
    ],
    "success_metrics": ["Spreadsheet abandoned in 90 days", "Zero missed deadlines"],
    "out_of_scope": ["SSO", "Mobile app", "Time tracking"],
    "assumptions": ["Team stays at 4-6 people", "Transactional email available"],
    "tech_constraints": ["Browser-only"],
}


@pytest.mark.asyncio
async def test_ba_agent_produces_stories_from_requirement_doc():
    """
    Integration test: given a RequirementDoc, the BA agent should produce
    at least one user story per must-have feature, and all stories should
    pass schema validation.
    """
    ba_agent = BAAgent()

    initial_state: TeamState = {
        "raw_brief": "Build a task manager",
        "clarified_requirements": MOCK_REQUIREMENT_DOC,
        "clarification_answers": "Answered in previous step",
        "user_stories": [],
        "requirement_state": "clarified",
        "event_outbox": {"pending": [], "delivered": []},
        "conversation_id": "test-integration-001",
    }

    # Mock the LLM to return a realistic JSON response
    mock_llm_response = """[
      {
        "story_id": "US-001",
        "title": "Register and Log In",
        "story_text": "As Engineer Sam, I want to register with my email and log in so that I can access the team task board.",
        "persona": "Engineer Sam",
        "feature_ref": "User Authentication",
        "priority": "must-have",
        "story_points": 2,
        "acceptance_criteria": [
          {
            "id": "AC-001-1",
            "given": "Sam is on the registration page",
            "when": "she enters a valid email and password and clicks Register",
            "then": "her account is created and she is redirected to the task board"
          },
          {
            "id": "AC-001-2",
            "given": "Sam tries to register with an email already in the system",
            "when": "she clicks Register",
            "then": "the form shows: 'This email is already registered. Log in instead?'"
          }
        ],
        "edge_cases": ["Weak password attempt", "Invalid email format"],
        "dependencies": [],
        "notes": null,
        "status": "backlog"
      },
      {
        "story_id": "US-002",
        "title": "Create and Assign a Task",
        "story_text": "As Tech Lead Morgan, I want to create a task with a title, due date, and assignee so that the assigned engineer knows what to work on.",
        "persona": "Tech Lead Morgan",
        "feature_ref": "Task Creation",
        "priority": "must-have",
        "story_points": 3,
        "acceptance_criteria": [
          {
            "id": "AC-002-1",
            "given": "Morgan is on the task board",
            "when": "she clicks New Task",
            "then": "a form appears with Title, Description, Assignee, Due Date, and Priority fields"
          },
          {
            "id": "AC-002-2",
            "given": "Morgan fills all required fields and clicks Save",
            "when": "the form is submitted",
            "then": "the task appears in the Todo column and the assignee receives an email notification within 5 minutes"
          }
        ],
        "edge_cases": ["Empty title attempt", "Past due date warning"],
        "dependencies": ["US-001"],
        "notes": null,
        "status": "backlog"
      }
    ]"""

    with patch.object(ba_agent, "_call_llm", new=AsyncMock(return_value=mock_llm_response)):
        result_state = await ba_agent.run(initial_state)

    # Assertions
    stories = result_state.get("user_stories", [])
    assert len(stories) > 0, "BA agent must produce at least one story"

    must_have_stories = [s for s in stories if s["priority"] == "must-have"]
    assert len(must_have_stories) > 0, "Must have at least one must-have story"

    # Every story must have at least 2 acceptance criteria
    for story in stories:
        assert len(story["acceptance_criteria"]) >= 2, (
            f"Story {story['story_id']} has fewer than 2 acceptance criteria"
        )

    # State should be updated
    assert result_state["requirement_state"] == "user_stories_created"
    assert len(result_state["event_outbox"]["pending"]) > 0

Run these with:

# Unit tests (no LLM calls — fast)
pytest tests/unit/ -v

# Integration tests (with mocked LLM — still fast)
pytest tests/integration/test_po_ba_pipeline.py -v

# Full end-to-end (real LLM — slow, costs tokens, run sparingly)
pytest tests/e2e/ -v --run-llm

The unit tests are the most valuable for daily development — they run in milliseconds and catch schema and parsing regressions immediately. The integration tests catch wiring issues. The end-to-end tests are reserved for pre-release validation.
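
One note on the last command: --run-llm is not a built-in pytest flag, so it has to be registered in a conftest.py. Here is a minimal sketch, assuming the e2e tests are marked with @pytest.mark.llm:

# tests/conftest.py (sketch)

import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--run-llm",
        action="store_true",
        default=False,
        help="Run tests that call the real LLM (slow, costs tokens)",
    )


def pytest_configure(config):
    config.addinivalue_line("markers", "llm: test calls the real LLM")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--run-llm"):
        return
    skip_llm = pytest.mark.skip(reason="needs --run-llm to call the real LLM")
    for item in items:
        if "llm" in item.keywords:
            item.add_marker(skip_llm)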


11. Common Failure Modes (And How We Handle Them)

Building the PO and BA agents revealed several failure modes worth documenting, because you will encounter them too.

PO agent produces requirements without asking clarifying questions. Sometimes the LLM skips Pass 1 and jumps straight to producing a RequirementDoc even when asked to ask questions. The fix: in the Pass 1 prompt, explicitly say “Your response must be a numbered list of exactly 5 questions. Do not produce a requirement document in this step.” The base class also validates the response — if it looks like JSON, it is assumed to be a Pass 2 response, and the agent is called again with the Pass 1 prompt explicitly enforced.
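
Here is a minimal sketch of that guard. The build_pass_one_prompt helper is hypothetical; the real prompt assembly lives wherever the PO agent builds its Pass 1 prompt:

# Sketch: detect a skipped Pass 1 and re-ask with the constraint restated.
# build_pass_one_prompt is a hypothetical helper name.

def _looks_like_json(text: str) -> bool:
    """Heuristic: does the response start with a JSON object or array?"""
    stripped = text.strip()
    if stripped.startswith("```"):
        stripped = stripped.strip("`").removeprefix("json").strip()
    return stripped.startswith(("{", "["))


async def _run_pass_one(agent, brief: str, max_attempts: int = 2) -> str:
    """Ask for clarifying questions; re-ask if the LLM jumped to Pass 2."""
    prompt = agent.build_pass_one_prompt(brief)  # hypothetical helper
    for _ in range(max_attempts):
        response = await agent._call_llm(prompt)
        if not _looks_like_json(response):
            return response  # the numbered list of questions we asked for
        prompt += (
            "\n\nREMINDER: respond ONLY with a numbered list of exactly "
            "5 clarifying questions. Do NOT produce a requirement document yet."
        )
    raise ValueError("PO agent skipped Pass 1 on every attempt")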

BA agent produces 8-point stories. This happens when the BA agent treats a feature as a single story rather than decomposing it. The Literal[1, 2, 3, 5] type constraint catches this at validation time. The base agent’s retry mechanism kicks in and re-calls the LLM with an added instruction: “The following stories had invalid story point estimates. Split any stories with more than 5 points into two smaller stories.”
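
A sketch of that retry loop follows. The last_prompt attribute is hypothetical, standing in for whatever the base agent keeps around for re-prompting:

# Sketch: validate BA output, re-prompting once per failure with a
# corrective instruction. last_prompt is a hypothetical attribute.

import json

from pydantic import ValidationError

from domain.models import UserStory


async def _parse_stories_with_retry(agent, raw: str, max_retries: int = 2) -> list[UserStory]:
    for attempt in range(max_retries + 1):
        try:
            return [UserStory.model_validate(s) for s in json.loads(raw)]
        except (ValidationError, json.JSONDecodeError) as exc:
            if attempt == max_retries:
                raise
            raw = await agent._call_llm(
                agent.last_prompt  # hypothetical: the prompt from the first call
                + f"\n\nYour previous output failed validation:\n{exc}\n"
                + "Split any story with more than 5 points into two smaller stories."
            )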

BA agent references personas not in the RequirementDoc. The LLM occasionally invents new personas (a “casual user” that was not in the doc). The UserStory.persona field should be validated against RequirementDoc.personas. Add a model validator to UserStory that checks this when the RequirementDoc is available. In practice, we pass the valid persona names in the prompt as an explicit constraint: “Persona must be one of: Engineer Sam, Tech Lead Morgan, New Hire Taylor.”
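
One way to wire that up, assuming Pydantic v2's validation context, where the caller supplies the allowed names via model_validate(..., context=...):

# Sketch: reject personas that are not in the RequirementDoc.
# Only the relevant field is shown; the full UserStory model is defined earlier.

from pydantic import BaseModel, ValidationInfo, field_validator


class UserStory(BaseModel):
    persona: str

    @field_validator("persona")
    @classmethod
    def persona_must_be_known(cls, v: str, info: ValidationInfo) -> str:
        allowed = (info.context or {}).get("personas")
        if allowed and v not in allowed:
            raise ValueError(f"persona '{v}' is not in the RequirementDoc: {allowed}")
        return v


# Usage:
# UserStory.model_validate(data, context={"personas": [p.name for p in doc.personas]})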

Acceptance criteria are too vague. The AcceptanceCriteria.then_is_specific validator catches the worst offenders (“it works,” “it succeeds”). For subtler vagueness — “the user is redirected” without specifying where — the prompt explicitly instructs the BA agent: “The ‘then’ clause must name the exact page or UI element that appears, the exact message shown, or the exact data change that occurs.”

Circular dependencies in user stories. The BA agent occasionally produces dependency cycles (US-003 depends on US-004, which depends on US-003). This is detected by the PM agent’s dependency graph checker, which runs after the BA agent and flags cycles for human resolution before the stories go into the implementation queue.
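
The check itself is a small depth-first search over the dependency graph. A sketch, operating on the story dicts held in state (the production version belongs to the PM agent, which we have not built yet):

# Sketch: find dependency cycles among user stories with a depth-first search.

def find_dependency_cycles(stories: list[dict]) -> list[list[str]]:
    graph = {s["story_id"]: s.get("dependencies", []) for s in stories}
    cycles: list[list[str]] = []
    visiting: set[str] = set()
    done: set[str] = set()

    def visit(node: str, path: list[str]) -> None:
        if node in done:
            return
        if node in visiting:
            # node is on the current DFS path: slice out the cycle
            cycles.append(path[path.index(node):] + [node])
            return
        visiting.add(node)
        for dep in graph.get(node, []):
            visit(dep, path + [node])
        visiting.discard(node)
        done.add(node)

    for story_id in graph:
        visit(story_id, [])
    return cycles


# find_dependency_cycles([
#     {"story_id": "US-003", "dependencies": ["US-004"]},
#     {"story_id": "US-004", "dependencies": ["US-003"]},
# ]) returns [["US-003", "US-004", "US-003"]]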


12. What We Have Built — And What Comes Next

In this part, we built the first two working agents in the AI software team.

The PO agent (Alex) transforms a vague client brief into a structured RequirementDoc through a two-pass clarification process. It does market research when needed, enforces scope discipline through explicit out-of-scope fields, and surfaces assumptions before they become surprises. Its output is a validated Pydantic model with 40+ fields that passes strict schema validation.

The BA agent (Jordan) takes the RequirementDoc and decomposes it into a backlog of UserStory objects with Given/When/Then acceptance criteria, edge cases, story point estimates, and dependency mapping. Its output is the atomic unit of work that every downstream agent will read.

Together, they turn “build me an app like Uber but for dogs” into eight well-structured user stories with testable acceptance criteria — in minutes, not weeks.

The handoff mechanism — state inspection, conditional routing, human checkpoints — is production-grade. The schemas enforce quality at the data layer, not just in prose instructions. The test suite covers parsing, validation, and end-to-end agent behavior.

In Part 6, we build the Technical Architect agent. The TA agent reads the RequirementDoc and UserStory list, does its own research on technology choices, and produces a TechnicalSpec: component diagrams, data models, API contracts, and an ArchitectureDecisionRecord for every major choice. We will also build the anti-corruption layer that translates UserStory objects into FunctionalRequirement objects — the first real test of the bounded context model we defined in Part 3.

The interesting challenge in Part 6: the TA agent needs to make real technical decisions. Not “it depends” — actual choices, with documented reasoning. We will see how to prompt an agent to be decisive without being reckless.


The full source code for Part 5 is at github.com/thuanpham582002/ai-software-team in the part-5 branch.
