Every codebase has a moment where the architecture either holds or collapses. For multi-agent systems, that moment is not when you build the flashy agent that writes code or the one that talks to clients. It is when the second agent needs something from the first and there is no clean way to get it.

I have seen this happen three times. Someone builds a Product Owner (PO) agent. Works beautifully. Builds a Business Analyst (BA) agent. Also works. Wires them together and everything falls apart — the PO returns a string, the BA expects a dict, there is no shared state contract, no retry logic, no memory, and no way to trace what went wrong.

In Part 3, we designed the architecture — bounded contexts, domain events, state machines. Now in Part 4, we write the code that makes all of that real. Every agent in Parts 5 through 10 inherits from the BaseAgent we build here, reads from the TeamState we define here, and accesses tools through the ToolRegistry we create here.


1. Project Structure

ai-software-team/
├── agents/
│   ├── __init__.py
│   ├── base.py              # BaseAgent ABC — this part
│   ├── po_agent.py           # Part 5
│   ├── ba_agent.py           # Part 5
│   ├── ta_agent.py           # Part 6
│   ├── tl_agent.py           # Part 7
│   ├── sse_agent.py          # Part 8
│   ├── qc_agent.py           # Part 9
│   ├── devops_agent.py       # Part 9
│   └── pm_agent.py           # Part 10
├── domain/
│   ├── __init__.py
│   ├── state.py              # TeamState TypedDict
│   └── events.py             # Domain events (Part 3)
├── infra/
│   ├── __init__.py
│   ├── memory.py             # SQLiteMemory
│   └── tools.py              # ToolRegistry
├── config/
│   ├── loader.py             # YAML config loader
│   └── agents.yaml           # Agent configuration
├── tests/
│   ├── test_base_agent.py
│   ├── test_memory.py
│   └── test_tool_registry.py
├── main.py
├── requirements.txt
└── .env.example
Figure 1: Project structure and shared infrastructure. main.py loads agents.yaml, which feeds into BaseAgent (with SQLiteMemory), TeamState, and ToolRegistry. All eight agents inherit from BaseAgent.

2. Requirements

# requirements.txt
langgraph==0.4.1
langchain-core==0.3.40
langchain-anthropic==0.3.12
anthropic>=0.42.0
pydantic>=2.9.0
pyyaml>=6.0.2
python-dotenv>=1.0.1
rich>=13.9.0
pytest>=8.3.0
pytest-asyncio>=0.24.0

# .env.example
ANTHROPIC_API_KEY=sk-ant-...
DEFAULT_MODEL=claude-sonnet-4-20250514
MEMORY_DB_PATH=./data/memory.db

3. Agent Configuration: agents.yaml

One early mistake is hardcoding agent config inside Python classes, where every temperature or tool-list tweak becomes a code change. The fix is YAML.

# config/agents.yaml
project:
  name: "AI Software Team"
  version: "1.0.0"
  default_model: "claude-sonnet-4-20250514"
  max_retries: 3
  retry_delay_seconds: 2

agents:
  po_agent:
    name: "Alex"
    role: "Senior Product Owner"
    model: "claude-sonnet-4-20250514"
    temperature: 0.3          # creative interpretation
    max_tokens: 4096
    tools: ["search_web", "create_document"]
    permissions: { can_write_code: false, can_deploy: false, can_approve_stories: true }

  ba_agent:
    name: "Jordan"
    role: "Senior Business Analyst"
    temperature: 0.2
    max_tokens: 8192
    tools: ["create_document"]
    permissions: { can_write_code: false }

  ta_agent:
    name: "Sam"
    role: "Technical Architect"
    model: "claude-opus-4-6"   # deeper reasoning for design
    temperature: 0.2
    max_tokens: 8192
    tools: ["create_document", "search_web"]
    permissions: { can_approve_architecture: true }

  tl_agent:
    name: "Morgan"
    role: "Tech Lead"
    model: "claude-opus-4-6"
    temperature: 0.1
    max_tokens: 4096
    tools: ["read_file", "create_document"]
    permissions: { can_approve_stories: true, can_approve_architecture: true }

  sse_agent:
    name: "Riley"
    role: "Senior Software Engineer"
    temperature: 0.1           # deterministic code gen
    max_tokens: 16384          # full source files
    tools: ["write_file", "read_file", "run_tests", "run_linter"]
    permissions: { can_write_code: true, can_deploy: false }

  qc_agent:
    name: "Casey"
    role: "QC Engineer"
    temperature: 0.1
    max_tokens: 8192
    tools: ["read_file", "run_tests", "create_document"]
    permissions: { can_write_code: false }

  devops_agent:
    name: "Drew"
    role: "DevOps Engineer"
    temperature: 0.1
    max_tokens: 4096
    tools: ["write_file", "read_file", "create_document"]
    permissions: { can_write_code: true, can_deploy: true }

  pm_agent:
    name: "Taylor"
    role: "Project Manager"
    temperature: 0.3
    max_tokens: 4096
    tools: ["create_document"]
    permissions: { can_write_code: false, can_deploy: false }

Temperature varies by role: SSE/QC use 0.1 for determinism, PO uses 0.3 for creative interpretation. Tools are whitelisted and enforced at runtime by the ToolRegistry.

The config loader merges per-agent settings with project defaults:

# config/loader.py
from pathlib import Path
from typing import Any
import yaml

def load_config(config_path: str = "config/agents.yaml") -> dict[str, Any]:
    path = Path(config_path)
    if not path.exists():
        raise FileNotFoundError(f"Config not found: {config_path}")
    with open(path) as f:
        config = yaml.safe_load(f)
    # Validate all 8 agents present, each with name + role
    required = ["po_agent","ba_agent","ta_agent","tl_agent",
                "sse_agent","qc_agent","devops_agent","pm_agent"]
    missing = [a for a in required if a not in config.get("agents", {})]
    if missing:
        raise ValueError(f"Missing agent configs: {missing}")
    return config

def get_agent_config(config: dict[str, Any], agent_id: str) -> dict[str, Any]:
    agent_cfg = config["agents"][agent_id]
    project_cfg = config.get("project", {})
    return {
        "agent_id": agent_id,
        "name": agent_cfg["name"], "role": agent_cfg["role"],
        "model": agent_cfg.get("model", project_cfg.get("default_model")),
        "temperature": agent_cfg.get("temperature", 0.2),
        "max_tokens": agent_cfg.get("max_tokens", 4096),
        "tools": agent_cfg.get("tools", []),
        "permissions": agent_cfg.get("permissions", {}),
        "max_retries": project_cfg.get("max_retries", 3),
        "retry_delay": project_cfg.get("retry_delay_seconds", 2),
    }
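The merge precedence is easy to misread, so here is a standalone sketch of the same fallback chain. merge_agent_config and the values below are invented for this demo; per-agent keys win, project defaults fill the gaps:

```python
from typing import Any

def merge_agent_config(agents: dict, project: dict, agent_id: str) -> dict[str, Any]:
    # Same dict.get fallback chain as get_agent_config, reduced to three keys.
    cfg = agents[agent_id]
    return {
        "model": cfg.get("model", project.get("default_model")),
        "temperature": cfg.get("temperature", 0.2),
        "max_tokens": cfg.get("max_tokens", 4096),
    }

project = {"default_model": "model-x", "max_retries": 3}
agents = {
    "ba_agent": {"name": "Jordan", "temperature": 0.2, "max_tokens": 8192},  # no model key
    "ta_agent": {"name": "Sam", "model": "model-y", "temperature": 0.2},     # overrides model
}

ba = merge_agent_config(agents, project, "ba_agent")  # model falls back to the project default
ta = merge_agent_config(agents, project, "ta_agent")  # model keeps the per-agent override
```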

4. TeamState: The Shared Backbone

We defined TeamState in Part 3. Here is the production version — streamlined, with a factory that ensures every key has a default.

# domain/state.py
from typing import Literal, Optional, Annotated
from typing_extensions import TypedDict
import operator

class QualityGates(TypedDict):
    min_test_coverage: float
    max_cyclomatic_complexity: int
    lint_must_pass: bool
    type_check_must_pass: bool
    security_scan_must_pass: bool

class TeamState(TypedDict):
    task_id: str
    created_at: str
    updated_at: str
    phase: Literal["requirement","design","implementation","quality","deployment","done"]
    requirement_state: str
    schema_version: str
    # Requirement
    raw_brief: str
    clarified_requirements: Optional[dict]
    po_clarifying_questions: Optional[str]
    clarification_answers: Optional[str]
    user_stories: list[dict]
    stories_human_approved: bool
    rejection_reason: Optional[str]
    # Design
    technical_spec: Optional[dict]
    architecture_decisions: list[dict]
    design_review_notes: list[str]
    design_approved_by: Optional[str]
    # Implementation
    implementation_plan: list[dict]
    code_artifacts: list[dict]
    implementation_attempts: int
    current_task_id: Optional[str]
    self_test_results: Optional[dict]
    # Quality
    test_cases: list[dict]
    quality_gates: QualityGates
    quality_test_results: Optional[dict]
    quality_score: Optional[float]
    quality_gate_passed: bool
    qc_notes: list[str]
    # Deployment
    cicd_config: Optional[dict]
    deployment_status: Optional[dict]
    deployment_human_approved: bool
    deployment_notes: list[str]
    # Coordination (append-only via LangGraph reducers)
    agent_messages: Annotated[list[dict], operator.add]
    event_outbox: list[dict]
    current_agent: str
    blockers: list[dict]
    human_feedback: Optional[str]
    human_checkpoint_pending: bool
    state_transitions: Annotated[list[dict], operator.add]
    error_log: Annotated[list[dict], operator.add]

def create_initial_state(task_id: str, raw_brief: str) -> TeamState:
    """Factory — the only place TeamState should be created from scratch."""
    from datetime import datetime, timezone
    now = datetime.now(timezone.utc).isoformat()
    return TeamState(
        task_id=task_id, created_at=now, updated_at=now,
        phase="requirement", requirement_state="received", schema_version="1.0.0",
        raw_brief=raw_brief, clarified_requirements=None,
        po_clarifying_questions=None, clarification_answers=None,
        user_stories=[], stories_human_approved=False, rejection_reason=None,
        technical_spec=None, architecture_decisions=[], design_review_notes=[],
        design_approved_by=None, implementation_plan=[], code_artifacts=[],
        implementation_attempts=0, current_task_id=None, self_test_results=None,
        test_cases=[], quality_gates=QualityGates(
            min_test_coverage=0.80, max_cyclomatic_complexity=10,
            lint_must_pass=True, type_check_must_pass=True,
            security_scan_must_pass=False),
        quality_test_results=None, quality_score=None, quality_gate_passed=False,
        qc_notes=[], cicd_config=None, deployment_status=None,
        deployment_human_approved=False, deployment_notes=[],
        agent_messages=[], event_outbox=[], current_agent="", blockers=[],
        human_feedback=None, human_checkpoint_pending=False,
        state_transitions=[], error_log=[])

Without the factory, you end up with agents defensively checking if "user_stories" in state everywhere. With it, every key always exists, so agents can read state unconditionally.
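The Annotated fields behave differently from the rest: when a node returns a partial update, LangGraph merges those fields with the declared reducer instead of overwriting them. This toy simulation (not LangGraph itself, just the merge semantics) shows the difference:

```python
import operator

def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    # Mimic how LangGraph merges a node's partial update into state:
    # fields with a reducer are combined, all others are overwritten.
    merged = dict(state)
    for key, value in update.items():
        reducer = reducers.get(key)
        merged[key] = reducer(state[key], value) if reducer else value
    return merged

reducers = {"state_transitions": operator.add, "error_log": operator.add}
state = {"phase": "requirement", "state_transitions": [{"agent_id": "po_agent"}]}
update = {"phase": "design", "state_transitions": [{"agent_id": "ta_agent"}]}

new_state = apply_update(state, update, reducers)
# "phase" is overwritten; "state_transitions" grows to two entries
```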


5. The BaseAgent: Abstract Base Class

This is the core of the system. Every agent inherits from BaseAgent. It handles LLM calls, conversation history, retry logic, and state updates. Individual agents implement two methods: _prepare_prompt and _parse_output.

# agents/base.py
import json
import time
import logging
from abc import ABC, abstractmethod
from datetime import datetime, timezone
from typing import Optional

from anthropic import Anthropic
from config.loader import load_config, get_agent_config
from domain.state import TeamState
from infra.memory import SQLiteMemory
from infra.tools import ToolRegistry

logger = logging.getLogger(__name__)


class BaseAgent(ABC):
    """
    Abstract base class for all agents.

    Subclasses MUST implement:
        _prepare_prompt(state) -> str
        _parse_output(response, state) -> TeamState
    Subclasses MAY override:
        _get_system_prompt() -> str
    """

    AGENT_ID: str = ""
    AGENT_NAME: str = ""
    AGENT_ROLE: str = ""
    SYSTEM_PROMPT: str = ""

    def __init__(
        self,
        config_path: str = "config/agents.yaml",
        memory: Optional[SQLiteMemory] = None,
        tool_registry: Optional[ToolRegistry] = None,
    ):
        full_config = load_config(config_path)
        self.config = get_agent_config(full_config, self.AGENT_ID)
        self.model = self.config["model"]
        self.temperature = self.config["temperature"]
        self.max_tokens = self.config["max_tokens"]
        self.max_retries = self.config["max_retries"]
        self.retry_delay = self.config["retry_delay"]
        self.client = Anthropic()
        self.memory = memory or SQLiteMemory()
        self.tool_registry = tool_registry or ToolRegistry()

    def run(self, state: TeamState) -> TeamState:
        """
        Execute the agent. This is the method LangGraph calls.
        1. Prepare prompt  2. Call LLM with retries
        3. Parse output    4. Save to memory
        """
        state = {
            **state,
            "current_agent": self.AGENT_ID,
            "updated_at": datetime.now(timezone.utc).isoformat(),
        }
        prompt = self._prepare_prompt(state)
        system = self._get_system_prompt()
        history = self.memory.get_history(state["task_id"], self.AGENT_ID)

        response = self._call_llm_with_retry(system, prompt, history, state)

        self.memory.save_interaction(
            task_id=state["task_id"], agent_id=self.AGENT_ID,
            prompt=prompt, response=response,
        )
        updated_state = self._parse_output(response, state)
        # Single-item list: the operator.add reducer on state_transitions
        # appends this entry rather than overwriting the field.
        updated_state["state_transitions"] = [{
            "agent_id": self.AGENT_ID,
            "agent_name": self.AGENT_NAME,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "phase": state["phase"],
        }]
        return updated_state

    @abstractmethod
    def _prepare_prompt(self, state: TeamState) -> str:
        """Build the user prompt from current state."""
        ...

    @abstractmethod
    def _parse_output(self, response: str, state: TeamState) -> TeamState:
        """Parse LLM response into state updates."""
        ...

    def _get_system_prompt(self) -> str:
        return self.SYSTEM_PROMPT

    def _call_llm_with_retry(
        self, system: str, prompt: str,
        history: list[dict], state: TeamState,
    ) -> str:
        messages = self._build_messages(history, prompt)
        for attempt in range(1, self.max_retries + 1):
            try:
                response = self.client.messages.create(
                    model=self.model, max_tokens=self.max_tokens,
                    temperature=self.temperature,
                    system=system, messages=messages,
                )
                text = response.content[0].text
                logger.info(
                    "%s response: %d chars, %d+%d tokens",
                    self.AGENT_ID, len(text),
                    response.usage.input_tokens,
                    response.usage.output_tokens,
                )
                return text
            except Exception as e:
                logger.warning(
                    "%s retry %d/%d: %s",
                    self.AGENT_ID, attempt, self.max_retries, e,
                )
                if attempt == self.max_retries:
                    raise
                time.sleep(self.retry_delay * attempt)
        raise RuntimeError("Unreachable")

    def _build_messages(
        self, history: list[dict], current_prompt: str
    ) -> list[dict]:
        messages = []
        for entry in history[-10:]:
            messages.append({"role": "user", "content": entry["prompt"]})
            messages.append({"role": "assistant", "content": entry["response"]})
        messages.append({"role": "user", "content": current_prompt})
        return messages

    def _emit_event(
        self, state: TeamState, event_name: str, payload: dict,
    ) -> TeamState:
        event = {
            "event_name": event_name,
            "source_agent": self.AGENT_ID,
            "payload": payload,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        return {**state, "event_outbox": state.get("event_outbox", []) + [event]}

    def _parse_json_from_response(self, response: str) -> dict:
        """Extract JSON from LLM output that may include fences or preamble."""
        text = response.strip()
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            pass
        if "```json" in text:
            start = text.index("```json") + 7
            end = text.find("```", start)  # guard: the fence may be unterminated
            if end != -1:
                return json.loads(text[start:end].strip())
        if "```" in text:
            start = text.index("```") + 3
            end = text.find("```", start)
            if end != -1:
                return json.loads(text[start:end].strip())
        first = text.find("{")
        last = text.rfind("}")
        if first != -1 and last != -1:
            return json.loads(text[first:last + 1])
        raise ValueError(f"No JSON found in: {text[:200]}...")
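One detail worth making explicit: the retry loop sleeps retry_delay * attempt between attempts, a simple linear backoff. A sketch of that schedule, with the sleeping itself stubbed out (backoff_schedule is invented for this illustration):

```python
def backoff_schedule(max_retries: int, retry_delay: float) -> list[float]:
    # Attempt N fails, sleep retry_delay * N before attempt N + 1.
    # There is no sleep after the final attempt; it raises instead.
    return [retry_delay * attempt for attempt in range(1, max_retries)]

delays = backoff_schedule(max_retries=3, retry_delay=2)
# Project defaults (3 retries, 2s base) give sleeps of 2s then 4s
```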

run() is not abstract. The flow — prepare, call, parse, save — is identical for every agent. Only the prompt and parsing logic change. This is the Template Method pattern.
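The Template Method split is easiest to see in miniature. Here is a toy version with the LLM call stubbed out, so only the two hook methods vary per agent. MiniAgent, FakePOAgent, and the echoed response are all invented for this demo, not part of the real codebase:

```python
from abc import ABC, abstractmethod

class MiniAgent(ABC):
    """Toy Template Method skeleton: run() is fixed, the hooks vary."""

    def run(self, state: dict) -> dict:
        prompt = self._prepare_prompt(state)        # hook 1: agent-specific
        response = self._call_llm(prompt)           # fixed step (stubbed here)
        return self._parse_output(response, state)  # hook 2: agent-specific

    def _call_llm(self, prompt: str) -> str:
        return f"echo: {prompt}"  # stand-in for the real Anthropic call

    @abstractmethod
    def _prepare_prompt(self, state: dict) -> str: ...

    @abstractmethod
    def _parse_output(self, response: str, state: dict) -> dict: ...

class FakePOAgent(MiniAgent):
    def _prepare_prompt(self, state: dict) -> str:
        return f"Clarify this brief: {state['raw_brief']}"

    def _parse_output(self, response: str, state: dict) -> dict:
        return {**state, "po_clarifying_questions": response}

result = FakePOAgent().run({"raw_brief": "todo app"})
```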

_parse_json_from_response handles messy output. No matter how firmly you instruct an LLM to return only JSON, it will sometimes add preamble or code fences. This method handles all those cases so individual agents don’t have to.
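To see why each fallback matters, here is the same extraction logic as a standalone function, exercised against the three messy shapes LLMs actually produce: bare JSON, fenced JSON, and JSON with preamble. (The triple-backtick marker is built indirectly here only to keep this listing fence-safe.)

```python
import json

FENCE = "`" * 3  # literal triple-backtick

def extract_json(response: str) -> dict:
    # Same fallback chain as _parse_json_from_response, standalone.
    text = response.strip()
    try:
        return json.loads(text)          # case 1: clean JSON
    except json.JSONDecodeError:
        pass
    marker = FENCE + "json"
    if marker in text:                   # case 2: fenced JSON block
        start = text.index(marker) + len(marker)
        end = text.find(FENCE, start)
        if end != -1:
            return json.loads(text[start:end].strip())
    first, last = text.find("{"), text.rfind("}")
    if first != -1 and last != -1:       # case 3: JSON buried in prose
        return json.loads(text[first:last + 1])
    raise ValueError(f"No JSON found in: {text[:200]}")

clean = extract_json('{"ok": true}')
fenced = extract_json(FENCE + 'json\n{"ok": true}\n' + FENCE)
buried = extract_json('Sure! Here is the JSON:\n{"ok": true}')
```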

History capped at 10 interactions. Enough for continuity without drowning the context window.


6. SQLiteMemory

Agents need memory across executions. When the SSE agent fails and the graph restarts from a checkpoint, it should know what it tried before.

# infra/memory.py
import sqlite3
import json
import logging
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


class SQLiteMemory:
    def __init__(self, db_path: str = "data/memory.db"):
        self.db_path = db_path
        Path(db_path).parent.mkdir(parents=True, exist_ok=True)
        self.conn = sqlite3.connect(db_path)
        self.conn.row_factory = sqlite3.Row
        self._create_tables()

    def _create_tables(self) -> None:
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS interactions (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                task_id TEXT NOT NULL,
                agent_id TEXT NOT NULL,
                prompt TEXT NOT NULL,
                response TEXT NOT NULL,
                created_at TEXT NOT NULL
            );
            CREATE TABLE IF NOT EXISTS domain_events (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                task_id TEXT NOT NULL,
                event_name TEXT NOT NULL,
                source_agent TEXT NOT NULL,
                payload TEXT NOT NULL,
                created_at TEXT NOT NULL
            );
            CREATE INDEX IF NOT EXISTS idx_interactions_task
                ON interactions(task_id, agent_id);
            CREATE INDEX IF NOT EXISTS idx_events_task
                ON domain_events(task_id);
        """)
        self.conn.commit()

    def save_interaction(
        self, task_id: str, agent_id: str,
        prompt: str, response: str,
    ) -> None:
        self.conn.execute(
            """INSERT INTO interactions
               (task_id, agent_id, prompt, response, created_at)
               VALUES (?, ?, ?, ?, ?)""",
            (task_id, agent_id, prompt, response,
             datetime.now(timezone.utc).isoformat()),
        )
        self.conn.commit()

    def get_history(
        self, task_id: str, agent_id: str, limit: int = 20,
    ) -> list[dict]:
        rows = self.conn.execute(
            """SELECT prompt, response, created_at
               FROM interactions
               WHERE task_id = ? AND agent_id = ?
               ORDER BY id DESC LIMIT ?""",
            (task_id, agent_id, limit),
        ).fetchall()
        return [dict(r) for r in reversed(rows)]

    def save_event(
        self, task_id: str, event_name: str,
        source_agent: str, payload: dict,
    ) -> None:
        self.conn.execute(
            """INSERT INTO domain_events
               (task_id, event_name, source_agent, payload, created_at)
               VALUES (?, ?, ?, ?, ?)""",
            (task_id, event_name, source_agent, json.dumps(payload),
             datetime.now(timezone.utc).isoformat()),
        )
        self.conn.commit()

    def get_events(
        self, task_id: str, event_name: Optional[str] = None,
    ) -> list[dict]:
        if event_name:
            rows = self.conn.execute(
                "SELECT * FROM domain_events WHERE task_id = ? AND event_name = ? ORDER BY id",
                (task_id, event_name),
            ).fetchall()
        else:
            rows = self.conn.execute(
                "SELECT * FROM domain_events WHERE task_id = ? ORDER BY id",
                (task_id,),
            ).fetchall()
        return [{**dict(r), "payload": json.loads(r["payload"])} for r in rows]

    def close(self) -> None:
        self.conn.close()
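One subtle design choice in get_history: the query selects newest-first (ORDER BY id DESC LIMIT ?) so the cap keeps the most recent interactions, and reversed() then restores chronological order for the message list. A standalone sqlite3 sketch of that pattern:

```python
import sqlite3

# Demonstrate the "DESC LIMIT, then reverse" pattern used by get_history.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactions (id INTEGER PRIMARY KEY, prompt TEXT)")
for i in range(5):
    conn.execute("INSERT INTO interactions (prompt) VALUES (?)", (f"p{i}",))

rows = conn.execute(
    "SELECT prompt FROM interactions ORDER BY id DESC LIMIT 3"
).fetchall()
history = [r[0] for r in reversed(rows)]
conn.close()
# Keeps the 3 newest rows, returned oldest-to-newest
```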

7. ToolRegistry

Agents only access the tools listed in their config. The PO agent cannot write files. The SSE agent cannot deploy. This is enforced here.

# infra/tools.py
from typing import Any, Callable

ToolFn = Callable[..., str]

class ToolRegistry:
    def __init__(self):
        self._tools: dict[str, dict[str, Any]] = {}

    def register(self, name: str, fn: ToolFn, description: str, parameters: dict) -> None:
        self._tools[name] = {"name": name, "fn": fn, "description": description, "parameters": parameters}

    def get_tools_for(self, allowed: list[str]) -> list[dict[str, Any]]:
        return [self._tools[n] for n in allowed if n in self._tools]

    def execute(self, name: str, **kwargs) -> str:
        if name not in self._tools:
            raise KeyError(f"Tool '{name}' not registered")
        return self._tools[name]["fn"](**kwargs)

    def list_tools(self) -> list[str]:
        return list(self._tools.keys())

def create_default_registry() -> ToolRegistry:
    """Register all 6 standard tools. Called once at startup."""
    r = ToolRegistry()
    r.register("search_web", lambda query: f"[results for: {query}]", "Web search",
               {"type": "object", "properties": {"query": {"type": "string"}},
                "required": ["query"]})
    r.register("create_document", _create_document, "Save a document",
               {"type": "object", "properties": {"title": {"type": "string"},
                "content": {"type": "string"}}, "required": ["title", "content"]})
    r.register("write_file", _write_file, "Write a file",
               {"type": "object", "properties": {"file_path": {"type": "string"},
                "content": {"type": "string"}}, "required": ["file_path", "content"]})
    r.register("read_file", _read_file, "Read a file",
               {"type": "object", "properties": {"file_path": {"type": "string"}},
                "required": ["file_path"]})
    r.register("run_tests", lambda test_path: '{"total":0,"passed":0,"failed":0}', "Run tests",
               {"type": "object", "properties": {"test_path": {"type": "string"}},
                "required": ["test_path"]})
    r.register("run_linter", lambda file_path: '{"errors":0,"warnings":0,"passed":true}', "Run linter",
               {"type": "object", "properties": {"file_path": {"type": "string"}},
                "required": ["file_path"]})
    return r

# Stubs — real implementations replace these in Part 8
def _create_document(title: str, content: str) -> str:
    from pathlib import Path
    path = Path(f"output/docs/{title.replace(' ', '_').lower()}.md")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return f"Saved: {path}"

def _write_file(file_path: str, content: str) -> str:
    from pathlib import Path
    path = Path(f"output/{file_path}")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return f"Written: {path}"

def _read_file(file_path: str) -> str:
    from pathlib import Path
    path = Path(f"output/{file_path}")
    return path.read_text() if path.exists() else f"Not found: {path}"
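The entries returned by get_tools_for are deliberately one adapter away from the shape the Anthropic Messages API expects for its tools parameter (name, description, input_schema). A hedged sketch of that adapter, using a throwaway inline registry rather than the real class:

```python
from typing import Any

# Minimal stand-in mirroring one registry entry from create_default_registry.
registry_entries: list[dict[str, Any]] = [{
    "name": "read_file",
    "fn": lambda file_path: f"[contents of {file_path}]",
    "description": "Read a file",
    "parameters": {"type": "object",
                   "properties": {"file_path": {"type": "string"}},
                   "required": ["file_path"]},
}]

def to_anthropic_tools(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Map registry entries to the Messages API tool shape; drop the callable,
    # which the API must never see.
    return [{"name": e["name"],
             "description": e["description"],
             "input_schema": e["parameters"]} for e in entries]

api_tools = to_anthropic_tools(registry_entries)
```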

8. Entry Point: main.py

# main.py
import uuid, logging
from dotenv import load_dotenv
from rich.console import Console
from rich.panel import Panel
from config.loader import load_config
from domain.state import create_initial_state
from infra.memory import SQLiteMemory
from infra.tools import create_default_registry

load_dotenv()
logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(name)s] %(levelname)s: %(message)s")
console = Console()

def main():
    console.print(Panel("[bold blue]AI Software Team[/bold blue]", border_style="blue"))
    config = load_config()
    memory = SQLiteMemory()
    tools = create_default_registry()
    console.print(f"[green]Ready:[/green] {len(config['agents'])} agents, {len(tools.list_tools())} tools")

    brief = input("\nProject brief (or 'demo'): ").strip()
    if brief.lower() == "demo":
        brief = "Build a task management API with auth, CRUD, team collaboration, dashboard. Python/FastAPI, PostgreSQL, AWS."

    task_id = str(uuid.uuid4())[:8]
    state = create_initial_state(task_id=task_id, raw_brief=brief)
    console.print(f"[green]Task {task_id}:[/green] {len(state)} fields | phase={state['phase']}")
    console.print("[dim]Graph construction comes in Parts 5-10. Run tests: pytest tests/ -v[/dim]")
    memory.close()

if __name__ == "__main__":
    main()

9. Testing the Foundation

These components are called by every agent. A bug here propagates everywhere.

# tests/test_base_agent.py
import pytest
from domain.state import create_initial_state
from infra.memory import SQLiteMemory
from infra.tools import ToolRegistry, create_default_registry


class TestCreateInitialState:
    def test_all_keys_present(self):
        state = create_initial_state("t1", "Build a todo app")
        assert state["task_id"] == "t1"
        assert state["phase"] == "requirement"
        assert state["requirement_state"] == "received"
        assert state["user_stories"] == []
        assert state["implementation_attempts"] == 0
        assert state["quality_gate_passed"] is False

    def test_quality_gates_defaults(self):
        state = create_initial_state("t1", "brief")
        assert state["quality_gates"]["min_test_coverage"] == 0.80
        assert state["quality_gates"]["lint_must_pass"] is True


class TestSQLiteMemory:
    @pytest.fixture
    def memory(self, tmp_path):
        mem = SQLiteMemory(db_path=str(tmp_path / "test.db"))
        yield mem
        mem.close()

    def test_save_and_retrieve(self, memory):
        memory.save_interaction("t1", "po_agent", "prompt", "response")
        history = memory.get_history("t1", "po_agent")
        assert len(history) == 1
        assert history[0]["prompt"] == "prompt"

    def test_filtered_by_agent(self, memory):
        memory.save_interaction("t1", "po_agent", "p1", "r1")
        memory.save_interaction("t1", "ba_agent", "p2", "r2")
        assert len(memory.get_history("t1", "po_agent")) == 1
        assert len(memory.get_history("t1", "ba_agent")) == 1

    def test_events(self, memory):
        memory.save_event("t1", "StoriesCreated", "ba_agent", {"count": 5})
        events = memory.get_events("t1")
        assert len(events) == 1
        assert events[0]["payload"]["count"] == 5


class TestToolRegistry:
    def test_filter_by_allowed(self):
        reg = ToolRegistry()
        reg.register("a", lambda: "a", "A", {})
        reg.register("b", lambda: "b", "B", {})
        reg.register("c", lambda: "c", "C", {})
        tools = reg.get_tools_for(["a", "c"])
        assert [t["name"] for t in tools] == ["a", "c"]

    def test_execute(self):
        reg = ToolRegistry()
        reg.register("add", lambda x, y: str(x + y), "Add", {})
        assert reg.execute("add", x=2, y=3) == "5"

    def test_default_registry(self):
        reg = create_default_registry()
        names = reg.list_tools()
        assert "search_web" in names
        assert "write_file" in names
        assert "run_tests" in names

Run: pytest tests/ -v


10. How It All Connects

The flow when Part 5's PO agent runs:

1. main.py loads config and creates the shared infrastructure.
2. The user provides a brief; create_initial_state() builds TeamState.
3. LangGraph invokes po_agent.run(state).
4. BaseAgent.run() calls _prepare_prompt (PO-specific), then _call_llm_with_retry, then _parse_output (PO-specific).
5. The interaction is saved to SQLiteMemory.
6. LangGraph receives the updated state and routes to the next node.

Every agent follows this exact flow.


What We Built

Component    | File               | Purpose
------------ | ------------------ | ---------------------------------------
agents.yaml  | config/agents.yaml | Configuration for all 8 agents
BaseAgent    | agents/base.py     | ABC with LLM calls, retry, memory
TeamState    | domain/state.py    | Shared state TypedDict with factory
SQLiteMemory | infra/memory.py    | Persistent conversation and event store
ToolRegistry | infra/tools.py     | Per-agent tool isolation

These five components produce no visible output. No requirement documents, no user stories, no code. They are pure infrastructure. And they are why the next six parts can move quickly.

In Part 5, we build the first two domain agents: the Product Owner (Alex) and the Business Analyst (Jordan). They take a raw client brief and transform it into structured requirements and user stories. The system starts producing real artifacts.

The foundation is laid. Time to build on it.
