QC and TA Agents: Defining Quality and Technical Design (Part 6/12)

There is a mistake that almost every engineering team makes at least once. You write the code first, hand it to QA, and then discover that the tester has a completely different mental model of what “done” means. The developer thinks “users can log in” means a working form. The tester thinks it means SSO, two-factor authentication, session expiry, and graceful handling when the identity provider goes down. Two weeks of rework follow, usually at the worst possible time.

The fix is not a better QA process. The fix is a better sequence. Quality has to be defined before the code exists, not after.

And while QC is drawing those quality boundaries, your Technical Architect should be making the hard technology decisions: not as an afterthought, not in a pull request, but up front, with explicit reasoning, before a single line of implementation is written.

In Part 5, we built the Business Analyst agent. The BA produces user stories: structured, prioritized, and rich in acceptance criteria. Now we have two questions to answer at the same time. The QC agent asks: “What does success look like, and what does failure look like?” The TA agent asks: “Which technology choices and design constraints govern the implementation?” The two questions are independent of each other. Both need answers before the SSE agent touches the keyboard.

That independence is the key insight of this article. QC and TA run in parallel.

1. The Test-First, Design-First Mindset

I want to say something that sounds obvious but is widely ignored in practice: the best time to write test cases is when you have not a single line of code to look at.

Once code exists, test authors unconsciously test what the code does rather than what the requirements say it must do. You find the happy path and write a test for it. You see the error handling the developer chose and write a test for that. You miss the error handling the developer did not choose, the case they forgot to imagine. This is confirmation bias in test writing, and it is common.

The QC agent avoids this entirely because it has no code to look at. It reads the user stories and asks: “What will the user experience if everything goes perfectly? What will they experience if things go wrong? What are the boundary conditions? What happens under load?” The answers become test cases (formal, numbered, structured) before implementation begins.

This is the spirit of Behavior-Driven Development and test-driven development, applied at the level of team architecture rather than only at the level of the individual developer.

Why Early Architecture Decisions Save 10x the Refactoring Time

The TA agent's work is front-loaded for the same reason: decisions made late are expensive.

The classic example is the database choice. If your Technical Architect decides in week three that PostgreSQL is the right choice, after the SSE has spent two weeks designing around MongoDB's document model, you have a rewrite of the entire data layer ahead of you. More commonly, late decisions impose a hidden tax: developers make local decisions (I'll use this library, I'll structure this API this way) that are inconsistent with what the architect would have chosen. These inconsistencies accumulate. They become the “technical debt” that makes every subsequent feature twice as hard to add.

Architecture Decision Records (ADRs) are the artifact the TA agent produces. An ADR is a short document that records one decision, the context that motivated it, the alternatives considered, and the consequences of the choice. It is not a design document that explains everything. It is a decision record, a snapshot of the reasoning at one point in time. ADRs are valuable because they answer the question every new team member eventually asks: “Why did we build it this way?”

The Parallel Fan-Out Pattern

Here is the key structural insight. Look at the two questions:

QC: “What does success look like for each user story?”
TA: “What are the technology stack, data model, and architecture?”

These two questions do not depend on each other. The QC agent does not need to know the technology stack to write test cases. The TA agent does not need to know which edge cases QC will test to make architecture decisions. Both need the same input, the user stories from the BA, and they produce different outputs that the later agents will both need.

This is a parallel fan-out pattern. A single upstream output fans out to multiple downstream processes that run concurrently, and their results merge before the next stage begins.

In LangGraph, this is a first-class concept. The graph explicitly models which nodes can run in parallel and which must wait for specific predecessors. The sequential alternative (run QC, wait, then run TA) costs the sum of the two run times, while the parallel version costs only the slower of the two, so waiting in sequence buys nothing. This is the first significant time optimization in our pipeline, and it will not be the last.

Parallel fan-out pattern: BA output feeds QC and TA simultaneously, both merge into combined state
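
The time saving is easy to see in miniature. The sketch below uses plain asyncio rather than LangGraph, with a hypothetical fake_agent that sleeps in place of an LLM call; the sleep durations are arbitrary. It is only meant to show that sequential execution costs the sum of the two durations while a fan-out costs roughly the longer one.

```python
import asyncio
import time


async def fake_agent(name: str, seconds: float) -> dict:
    # Stand-in for an LLM call: sleeps instead of doing real work.
    await asyncio.sleep(seconds)
    return {name: "done"}


async def run_sequential() -> tuple[dict, float]:
    start = time.perf_counter()
    qc = await fake_agent("qc", 0.10)
    ta = await fake_agent("ta", 0.15)
    return {**qc, **ta}, time.perf_counter() - start


async def run_fan_out() -> tuple[dict, float]:
    start = time.perf_counter()
    # Both branches start from the same input and run concurrently.
    qc, ta = await asyncio.gather(
        fake_agent("qc", 0.10),
        fake_agent("ta", 0.15),
    )
    return {**qc, **ta}, time.perf_counter() - start


sequential_result, sequential_s = asyncio.run(run_sequential())
fan_out_result, fan_out_s = asyncio.run(run_fan_out())
print(f"sequential: {sequential_s:.2f}s, fan-out: {fan_out_s:.2f}s")
```

Both versions produce the same combined result; only the elapsed time differs.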

2. Designing the QC Agent

Sam is our QC Engineer. Eight years of experience means Sam has seen every failure mode that shipping to production brings. Sam knows that “it works on my machine” is not a test result. Sam knows that “we'll add error handling later” is a lie.

Sam's job in our pipeline is specific and bounded: turn user stories into test cases and quality gates. Sam does not run tests. Sam does not write test code. Sam does not configure CI pipelines. Sam defines what must be tested and what the pass/fail criteria are. The SSE agent will write the actual test code later. DevOps will configure the pipeline. Sam defines the contract.

Role Definition

The QC agent's formal role:

Input: UserStory objects from TeamState
Output: a TestSuite containing TestCase objects
Tools: read_document, create_test_file
NOT: running tests, writing implementation code, configuring CI

The distinction between “defining tests” and “running tests” matters for agent design. If Sam tried to run tests, Sam would need access to a running environment, a build system, and live code, none of which exist at this stage of the pipeline. By limiting Sam to test definition, we keep the agent's scope clean and its output purely logical rather than environment-dependent.

What Sam Produces for Each Story

For each user story, Sam produces:

Happy path test cases: the expected successful flow, end to end
Edge cases: boundary conditions such as empty input, maximum length, zero values, single-item lists, concurrent users
Error scenarios: what happens when external dependencies fail, such as a network timeout, an unavailable database, an invalid authentication token, or a malformed request body
Performance acceptance criteria: specific and measurable, covering response time under load, throughput requirements, and memory limits

The output is not prose. It is structured data: TestCase objects with formal Given/When/Then steps, expected results, and metadata linking each test case back to the story it validates.

3. Full Implementation of QCAgent

from typing import Optional
from agents.base import BaseAgent
from state import TeamState
from schemas.qc import TestSuite, TestCase, TestType


class QCAgent(BaseAgent):
    QC_SYSTEM_PROMPT = """You are Sam, a QC Engineer with 8 years of experience.
    Your job is to think about what can go wrong BEFORE the code is written.

    For every user story, you create:
    1. Happy path test cases — the successful flow a real user would follow
    2. Edge cases — empty input, max length strings, zero values,
       concurrent users, single-item collections, unicode characters
    3. Error scenarios — network failure, invalid data, auth failures,
       timeout conditions, third-party service outages
    4. Performance acceptance criteria — response time < 200ms at p95,
       throughput targets, memory ceiling

    You are NOT writing test code. You are defining test contracts.
    Every test case must be actionable by a developer who has not read
    the requirements — it must stand alone as a specification.

    Your test IDs follow the format TC-XXX where XXX is a zero-padded
    three-digit integer. Start from TC-001 for each new suite.

    Always output valid JSON matching the TestSuite schema.
    """

    def _prepare_prompt(self, state: TeamState) -> str:
        stories = state["user_stories"]
        formatted = self._format_stories(stories)

        return f"""Create comprehensive test cases for these user stories:

{formatted}

For each story produce test cases covering:

HAPPY PATH:
- Normal user flow from start to finish
- All required fields present and valid
- Expected system response and state change

EDGE CASES (at minimum):
- Empty / null / whitespace-only inputs
- Maximum length inputs (assume 255 chars for strings)
- Numeric boundary values (0, 1, max int, negative)
- Concurrent requests (at least 2 simultaneous users)
- Unicode and special character inputs

ERROR SCENARIOS:
- Missing required fields
- Invalid data types
- Authentication / authorization failure
- Downstream service unavailable (500 / timeout)
- Rate limit exceeded

PERFORMANCE:
- Define p95 response time ceiling
- Define acceptable throughput (requests/second)
- Define memory growth ceiling if applicable

For each test case provide:
- test_id: TC-001, TC-002, ...
- story_id: matches the user story ID
- title: short descriptive name
- test_type: unit | integration | e2e | performance
- given: list of preconditions
- when: list of actions / inputs
- then: list of expected outcomes
- expected_result: single-sentence summary
- priority: critical | high | medium | low

Return a JSON object matching this structure:
{{
  "suite_id": "TS-001",
  "story_coverage": ["US-001", "US-002", ...],
  "test_cases": [ ... ],
  "quality_gates": {{
    "min_unit_coverage": 80,
    "max_p95_response_ms": 200,
    "max_error_rate_percent": 0.1
  }}
}}
"""

    def _format_stories(self, stories: list) -> str:
        lines = []
        for story in stories:
            lines.append(f"Story {story.story_id}: {story.title}")
            lines.append(f"  As a {story.as_a}")
            lines.append(f"  I want to {story.i_want}")
            lines.append(f"  So that {story.so_that}")
            lines.append(f"  Acceptance criteria:")
            for criterion in story.acceptance_criteria:
                lines.append(f"    - {criterion}")
            lines.append(f"  Priority: {story.priority}")
            lines.append("")
        return "\n".join(lines)

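    async def run(self, state: TeamState) -> dict:
        prompt = self._prepare_prompt(state)
        # JSON mode: the reply is syntactically valid JSON, but its shape is
        # not guaranteed, so schema validation still happens below.
        response = await self.llm.ainvoke(
            [
                {"role": "system", "content": self.QC_SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            response_format={"type": "json_object"},
        )
        raw = response.content
        # Raises pydantic.ValidationError if the reply does not fit TestSuite.
        suite = TestSuite.model_validate_json(raw)

        # Return only the keys QC owns. QC and TA run in the same graph step,
        # and LangGraph rejects two nodes writing the same key in one step
        # unless that key declares a reducer, so echoing **state back is unsafe.
        return {
            "test_suite": suite,
            "qc_complete": True,
        }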

The _format_stories helper follows the same pattern used in the BA agent in Part 5: it converts structured Pydantic objects into readable prose, giving the LLM enough context without overwhelming it with JSON syntax. The LLM reads English; give it English.
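
To see what the model actually receives, here is a small standalone sketch. The Story dataclass is a hypothetical stand-in for the UserStory model from Part 5 (only the attributes that _format_stories reads are included), format_stories is the same line layout written as a free function, and the sample story is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    # Hypothetical stand-in for the UserStory model; only the attributes
    # that _format_stories reads are included.
    story_id: str
    title: str
    as_a: str
    i_want: str
    so_that: str
    acceptance_criteria: list[str] = field(default_factory=list)
    priority: str = "medium"


def format_stories(stories: list[Story]) -> str:
    # Same line layout as QCAgent._format_stories, written as a free function.
    lines = []
    for story in stories:
        lines.append(f"Story {story.story_id}: {story.title}")
        lines.append(f"  As a {story.as_a}")
        lines.append(f"  I want to {story.i_want}")
        lines.append(f"  So that {story.so_that}")
        lines.append("  Acceptance criteria:")
        for criterion in story.acceptance_criteria:
            lines.append(f"    - {criterion}")
        lines.append(f"  Priority: {story.priority}")
        lines.append("")
    return "\n".join(lines)


# Made-up sample story for illustration only.
sample = Story(
    story_id="US-001",
    title="Create task",
    as_a="team member",
    i_want="create a task with a title and priority",
    so_that="my work is tracked",
    acceptance_criteria=["Title is required", "New tasks start as pending"],
    priority="high",
)
formatted = format_stories([sample])
print(formatted)
```

Each story becomes a short indented block, with one bullet per acceptance criterion and a blank line separating it from the next story.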

The response_format={"type": "json_object"} parameter enables OpenAI's JSON mode. It instructs the model to return syntactically valid JSON, but it does not enforce any particular schema. The schema check comes from TestSuite.model_validate_json(raw): if the model's output cannot be parsed into a TestSuite, the agent raises a validation error that the orchestrator can retry or escalate. Together the two give us a hard guarantee about what enters TeamState, not about what the model emits.
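
The retry-or-escalate step might look something like the sketch below. It is a simplified illustration using only the standard library: parse_suite stands in for TestSuite.model_validate_json, call_model is a hypothetical callable wrapping the LLM, and the fake answers are invented. A real orchestrator would catch Pydantic's validation error instead.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"suite_id", "story_coverage", "test_cases"}


def parse_suite(raw: str) -> dict:
    # Stand-in for TestSuite.model_validate_json: valid JSON is necessary
    # but not sufficient, the expected keys must be present too.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def run_with_retry(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    last_error = None
    for _ in range(max_attempts):
        # On a retry, tell the model what was wrong with its previous answer.
        hint = f"\n\nYour previous answer was rejected: {last_error}" if last_error else ""
        raw = call_model(prompt + hint)
        try:
            return parse_suite(raw)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
    raise RuntimeError(f"escalating after {max_attempts} attempts: {last_error}")


# Fake model: the first answer is valid JSON with the wrong shape,
# the second has the expected keys.
answers = iter([
    '{"tests": []}',
    '{"suite_id": "TS-001", "story_coverage": ["US-001"], "test_cases": []}',
])
suite = run_with_retry(lambda prompt: next(answers), "Create test cases")
```

The first fake answer is exactly the case JSON mode cannot prevent: well-formed JSON that does not match the schema.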

4. The TestCase and TestSuite Schemas

from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field


class TestType(str, Enum):
    UNIT = "unit"
    INTEGRATION = "integration"
    E2E = "e2e"
    PERFORMANCE = "performance"


class TestPriority(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


class TestCase(BaseModel):
    test_id: str = Field(
        description="Unique identifier, format TC-001",
        pattern=r"^TC-\d{3}$",
    )
    story_id: str = Field(
        description="References the user story this test validates",
    )
    title: str = Field(
        description="Short descriptive name for the test case",
        max_length=120,
    )
    test_type: TestType
    given: list[str] = Field(
        description="Preconditions that must be true before the test runs",
        min_length=1,
    )
    when: list[str] = Field(
        description="Actions or inputs that trigger the behavior under test",
        min_length=1,
    )
    then: list[str] = Field(
        description="Observable outcomes that must be true after the action",
        min_length=1,
    )
    expected_result: str = Field(
        description="Single-sentence summary of the expected outcome",
        max_length=200,
    )
    priority: TestPriority
    tags: list[str] = Field(
        default_factory=list,
        description="Optional labels: happy-path, edge-case, error-scenario, performance",
    )


class QualityGates(BaseModel):
    min_unit_coverage: int = Field(
        default=80,
        ge=0,
        le=100,
        description="Minimum percentage of lines/branches covered by unit tests",
    )
    max_p95_response_ms: int = Field(
        default=200,
        gt=0,
        description="Maximum acceptable p95 response time in milliseconds",
    )
    max_error_rate_percent: float = Field(
        default=0.1,
        ge=0.0,
        le=100.0,
        description="Maximum acceptable error rate percentage under normal load",
    )
    min_test_pass_rate: float = Field(
        default=100.0,
        ge=0.0,
        le=100.0,
        description="Minimum percentage of tests that must pass to gate deployment",
    )


class TestSuite(BaseModel):
    suite_id: str = Field(
        description="Unique identifier for this test suite, format TS-001",
    )
    story_coverage: list[str] = Field(
        description="List of story IDs covered by this test suite",
    )
    test_cases: list[TestCase] = Field(
        description="All test cases in this suite",
        min_length=1,
    )
    quality_gates: QualityGates = Field(
        default_factory=QualityGates,
        description="Measurable thresholds that gate deployment",
    )
    total_cases: int = Field(default=0)
    critical_cases: int = Field(default=0)

    def model_post_init(self, __context) -> None:
        self.total_cases = len(self.test_cases)
        self.critical_cases = sum(
            1 for tc in self.test_cases if tc.priority == TestPriority.CRITICAL
        )

A few design decisions are worth noting:

The pattern=r"^TC-\d{3}$" validator on test_id enforces the naming convention at the data layer. If the LLM hallucinates an ID such as “TEST_1” or “tc001”, Pydantic rejects it immediately. The agent framework catches the validation error and can retry with a more specific error message in the prompt.

QualityGates is a first-class model, not a loose dictionary. These numbers will be referenced by the DevOps agent when it configures the CI/CD pipeline gates in Part 10. Having them as a typed model means the DevOps agent can read suite.quality_gates.min_unit_coverage with full type safety instead of parsing strings.

model_post_init computes the derived fields after validation. This spares the LLM from counting its own output (something it does unreliably) while ensuring that TeamState always holds accurate numbers.
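
Two of these guards can be imitated with the standard library alone, which makes their behavior easy to poke at. The sketch below is not the Pydantic schema: Case and Suite are hypothetical dataclasses, re.fullmatch plays the role of the anchored pattern, and __post_init__ plays the role of model_post_init.

```python
import re
from dataclasses import dataclass

TEST_ID = re.compile(r"TC-\d{3}", re.ASCII)


@dataclass
class Case:
    test_id: str
    priority: str

    def __post_init__(self) -> None:
        # fullmatch plays the role of the ^...$ anchors in the schema pattern.
        if not TEST_ID.fullmatch(self.test_id):
            raise ValueError(f"bad test_id: {self.test_id!r}")


@dataclass
class Suite:
    cases: list[Case]
    # Whatever the model claims here is ignored and recomputed below.
    total_cases: int = 0
    critical_cases: int = 0

    def __post_init__(self) -> None:
        self.total_cases = len(self.cases)
        self.critical_cases = sum(1 for c in self.cases if c.priority == "critical")


def is_valid_id(candidate: str) -> bool:
    try:
        Case(test_id=candidate, priority="low")
        return True
    except ValueError:
        return False


suite = Suite(
    cases=[Case("TC-001", "critical"), Case("TC-002", "medium")],
    total_cases=99,  # a wrong count, as a model might report
)
print(suite.total_cases, suite.critical_cases)
```

The wrong count passed in is silently replaced by the real one, which is the point: the counts are derived from the data, never trusted from the model.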

5. Designing the TA Agent

Morgan is our Technical Architect. Morgan's defining trait, and the one we encode in the system prompt, is a preference for simplicity. “Boring technology over clever technology” is not a joke or a rejection of innovation. It is an engineering philosophy distilled from experience: the system you ship is the system you maintain. Systems built on exotic choices require experts in those exotic choices. Systems built on PostgreSQL, HTTP, and boring file I/O require senior developers, a much larger talent pool.

Morgan's formal role:

Input: clarified_requirements and user_stories from TeamState
Output: a TechnicalSpec containing ArchitectureDecision objects, data models, API contracts, and implementation phases
NOT: writing implementation code, defining test cases, configuring CI/CD

The TA agent's output is the design envelope within which all subsequent agents operate. The SSE will implement against the tech stack Morgan chooses. The DevOps agent will configure infrastructure for the architecture Morgan specifies. The TL agent will enforce the coding standards Morgan defines in the ADRs. Getting this right up front is not optional.

What Morgan Produces

Technology stack with justification: not just “we will use FastAPI” but “we use FastAPI because the team has Python expertise, the async model fits our I/O pattern, and automatic OpenAPI generation reduces documentation overhead compared with Flask”
Data models: the key entities with their attributes and relationships, documented as entity definitions rather than SQL DDL (the SSE will write the actual DDL)
API contracts: the key endpoints with HTTP methods, request/response shapes, and status codes
Architecture Decision Records: one ADR per significant choice, with context, decision, alternatives considered, and consequences
Implementation phases: what gets built first and in what order, with dependency reasoning
Technical risks and mitigations: what could go wrong architecturally, and what the plan is if it does

6. Full Implementation of TAAgent

from agents.base import BaseAgent
from state import TeamState
from schemas.ta import TechnicalSpec, ArchitectureDecision, DataModel, ApiContract


class TAAgent(BaseAgent):
    TA_SYSTEM_PROMPT = """You are Morgan, a Technical Architect with 12 years of
    experience building production systems at scale.

    Your core philosophy:
    - Prefer boring technology over clever technology
    - Simple systems that work beat complex systems that might work
    - Document decisions with explicit reasoning — future Morgan will thank you
    - "It depends" is a valid answer, but you must always say what it depends ON

    You design systems that are:
    - Simple enough for a mid-level developer to understand
    - Maintainable by a team that does not include you
    - Appropriately scalable — not over-engineered for traffic that doesn't exist
    - Explicit about trade-offs, never pretending trade-offs don't exist

    You document every significant decision as an ADR (Architecture Decision Record).
    An ADR is short: title, status, context (why this decision was needed),
    decision (what you chose), alternatives considered, consequences (good and bad).

    Always output valid JSON matching the TechnicalSpec schema.
    """

    def _prepare_prompt(self, state: TeamState) -> str:
        requirements = state.get("clarified_requirements", "")
        stories = state.get("user_stories", [])
        formatted_stories = self._format_stories(stories)

        return f"""Based on these requirements, produce a complete technical specification.

CLARIFIED REQUIREMENTS:
{requirements}

USER STORIES:
{formatted_stories}

Produce the following sections:

1. TECHNOLOGY STACK
   For each component (backend framework, database, cache, message queue if needed,
   frontend if applicable), specify:
   - The chosen technology
   - The version or version range
   - The justification (2-3 sentences minimum)
   - Alternatives you considered and why you rejected them

2. DATA MODELS
   For each key entity:
   - Entity name and description
   - Fields with types and constraints
   - Relationships to other entities
   - Any important indexes or unique constraints

3. API CONTRACTS
   For each significant endpoint:
   - HTTP method and path
   - Request body schema (if applicable)
   - Response schema
   - Key HTTP status codes and their meaning
   - Authentication requirement

4. ARCHITECTURE DECISION RECORDS
   Create one ADR for each significant technical decision. At minimum, address:
   - Primary database choice
   - Caching strategy (if applicable)
   - Authentication approach
   - API style (REST vs GraphQL vs other)
   Each ADR must include: title, status (Proposed/Accepted/Deprecated),
   context, decision, alternatives_considered (list), consequences (list)

5. IMPLEMENTATION PHASES
   Define 3-4 phases of implementation in dependency order:
   - Phase name
   - What gets built
   - What it depends on from previous phases
   - Estimated relative complexity (low/medium/high)

6. TECHNICAL RISKS
   For each identified risk:
   - Risk description
   - Likelihood (low/medium/high)
   - Impact (low/medium/high)
   - Mitigation strategy

Return a JSON object matching the TechnicalSpec schema.
"""

    def _format_stories(self, stories: list) -> str:
        if not stories:
            return "No user stories provided."
        lines = []
        for story in stories:
            lines.append(f"[{story.story_id}] {story.title}")
            lines.append(f"  As a {story.as_a}, I want {story.i_want}")
        return "\n".join(lines)

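    async def run(self, state: TeamState) -> dict:
        prompt = self._prepare_prompt(state)
        response = await self.llm.ainvoke(
            [
                {"role": "system", "content": self.TA_SYSTEM_PROMPT},
                {"role": "user", "content": prompt},
            ],
            response_format={"type": "json_object"},
        )
        raw = response.content
        # Raises pydantic.ValidationError if the reply does not fit TechnicalSpec.
        spec = TechnicalSpec.model_validate_json(raw)

        # Return only the keys TA owns. TA and QC run in the same graph step,
        # and LangGraph rejects two nodes writing the same key in one step
        # unless that key declares a reducer, so echoing **state back is unsafe.
        return {
            "technical_spec": spec,
            "ta_complete": True,
        }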

The prompt structure is explicit about the minimum requirements for each section. “2-3 sentences minimum” for the justification is not politeness; it keeps the LLM from producing one-word answers like “FastAPI: fast.” “Alternatives you considered” forces the ADR to be a genuine decision record rather than a post-hoc rationalization of the first option that came to mind.

7. The TechnicalSpec and ArchitectureDecision Schemas

from enum import Enum
from typing import Optional
from pydantic import BaseModel, Field


class AdrStatus(str, Enum):
    PROPOSED = "Proposed"
    ACCEPTED = "Accepted"
    DEPRECATED = "Deprecated"
    SUPERSEDED = "Superseded"


class Complexity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class RiskLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class ArchitectureDecision(BaseModel):
    adr_id: str = Field(
        description="Unique identifier, format ADR-001",
        pattern=r"^ADR-\d{3}$",
    )
    title: str = Field(
        description="Short noun phrase describing what was decided",
        max_length=100,
    )
    status: AdrStatus
    context: str = Field(
        description="Why this decision was needed — what problem it addresses",
        min_length=50,
    )
    decision: str = Field(
        description="What was chosen and the primary reasoning",
        min_length=50,
    )
    alternatives_considered: list[str] = Field(
        description="Technologies or approaches that were evaluated and not chosen",
        min_length=1,
    )
    consequences: list[str] = Field(
        description="Trade-offs, both positive and negative, of this decision",
        min_length=1,
    )


class TechComponent(BaseModel):
    name: str
    technology: str
    version: Optional[str] = None
    justification: str = Field(min_length=30)
    alternatives_rejected: list[str] = Field(default_factory=list)


class EntityField(BaseModel):
    field_name: str
    field_type: str
    nullable: bool = False
    description: Optional[str] = None
    constraints: list[str] = Field(default_factory=list)


class DataModel(BaseModel):
    entity_name: str
    description: str
    fields: list[EntityField]
    relationships: list[str] = Field(
        default_factory=list,
        description="Plaintext descriptions: 'User has many Tasks', etc.",
    )
    indexes: list[str] = Field(
        default_factory=list,
        description="Key indexes and unique constraints",
    )


class ApiContract(BaseModel):
    method: str = Field(pattern=r"^(GET|POST|PUT|PATCH|DELETE|OPTIONS|HEAD)$")
    path: str = Field(description="URL path, e.g. /api/v1/tasks/{task_id}")
    summary: str
    request_body: Optional[dict] = None
    response_schema: dict = Field(default_factory=dict)
    status_codes: dict[str, str] = Field(
        description="Map of status code string to meaning, e.g. {'200': 'Success', '404': 'Task not found'}",
    )
    requires_auth: bool = True


class ImplementationPhase(BaseModel):
    phase_number: int
    name: str
    deliverables: list[str]
    depends_on: list[str] = Field(
        default_factory=list,
        description="Phase names this phase depends on",
    )
    complexity: Complexity


class TechnicalRisk(BaseModel):
    description: str
    likelihood: RiskLevel
    impact: RiskLevel
    mitigation: str


class TechnicalSpec(BaseModel):
    spec_id: str = Field(description="Format SPEC-001")
    project_name: str
    tech_stack: list[TechComponent]
    data_models: list[DataModel]
    api_contracts: list[ApiContract]
    adrs: list[ArchitectureDecision]
    implementation_phases: list[ImplementationPhase]
    technical_risks: list[TechnicalRisk] = Field(default_factory=list)
    total_adrs: int = Field(default=0)

    def model_post_init(self, __context) -> None:
        self.total_adrs = len(self.adrs)

The schema enforces a minimum level of quality. min_length=50 on the context and decision fields of ArchitectureDecision prevents the LLM from producing ADRs that say only “We chose Postgres. It is good.” The regex on method ensures there are no typos in the API contracts. The model_post_init count once again removes a task that the LLM handles unreliably.
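
It helps to be clear about what these floors can and cannot do. The standalone sketch below mirrors the two checks with the standard library (the helper names are hypothetical, and the padded string is invented): the length floor rejects the terse ADR, but a padded sentence with no extra substance still gets through, so the floor is a heuristic rather than a quality measure.

```python
import re

MIN_LEN = 50
METHOD = re.compile(r"(GET|POST|PUT|PATCH|DELETE|OPTIONS|HEAD)")


def passes_length_floor(text: str, minimum: int = MIN_LEN) -> bool:
    # Mirrors a min_length constraint: counts characters, nothing more.
    return len(text) >= minimum


def is_known_method(method: str) -> bool:
    # fullmatch plays the role of the ^...$ anchors in the schema pattern.
    return METHOD.fullmatch(method) is not None


terse = "We chose Postgres. It is good."
padded = terse + " " + "Really, really, really good."

print(passes_length_floor(terse), passes_length_floor(padded))
print(is_known_method("PATCH"), is_known_method("GTE"), is_known_method("get"))
```

Note that the method check is case-sensitive, so a lowercase "get" is rejected just like the typo.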

8. The Parallel Fan-Out Pattern in LangGraph

Here is how we wire QC and TA to run concurrently in a LangGraph workflow:

from langgraph.graph import END, StateGraph
from state import TeamState
from agents.ba import BAAgent
from agents.qc import QCAgent
from agents.ta import TAAgent
from nodes.merge import merge_qc_ta

# ba_node, qc_node, and ta_node are the thin wrappers defined below.


def build_workflow():
    workflow = StateGraph(TeamState)

    # --- Node registration ---
    workflow.add_node("ba_agent", ba_node)
    workflow.add_node("qc_agent", qc_node)
    workflow.add_node("ta_agent", ta_node)
    workflow.add_node("parallel_merge", merge_qc_ta)

    # --- Entry point ---
    workflow.set_entry_point("ba_agent")

    # --- BA completes, fans out to both QC and TA ---
    workflow.add_edge("ba_agent", "qc_agent")
    workflow.add_edge("ba_agent", "ta_agent")

    # --- Both feed into the merge node ---
    # A list of sources makes the join explicit: parallel_merge waits for both.
    workflow.add_edge(["qc_agent", "ta_agent"], "parallel_merge")

    # --- After merge, the SSE agent has everything it needs ---
    # "sse_agent" is not registered in this part, so the graph ends here for now.
    # Point this edge at "sse_agent" once that node exists.
    workflow.add_edge("parallel_merge", END)

    # compile() returns a runnable compiled graph, not the StateGraph builder.
    return workflow.compile()

LangGraph recognizes that when a node has multiple outgoing edges, the target nodes can be dispatched concurrently. The runtime does not start parallel_merge until both qc_agent and ta_agent have completed and written their outputs to TeamState.

The node wrappers are thin:

async def ba_node(state: TeamState) -> TeamState:
    agent = BAAgent(llm=get_llm())
    return await agent.run(state)


async def qc_node(state: TeamState) -> TeamState:
    agent = QCAgent(llm=get_llm())
    return await agent.run(state)


async def ta_node(state: TeamState) -> TeamState:
    agent = TAAgent(llm=get_llm())
    return await agent.run(state)

The get_llm() factory creates a new LLM client for each call. This is deliberate: during concurrent execution, sharing a single LLM client between agents can leak state (some client libraries keep conversation history internally). The isolation is worth the small overhead.
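
The kind of leak this guards against can be shown with a toy client. ChattyClient and get_client below are hypothetical; they do not correspond to any real LLM library, and whether a given client keeps history depends on that library. The sketch only illustrates why a fresh instance per agent removes the question.

```python
class ChattyClient:
    """Hypothetical client that remembers every message it has sent."""

    def __init__(self) -> None:
        self.history: list[str] = []

    def invoke(self, message: str) -> int:
        self.history.append(message)
        # Return how many messages this call would carry as context.
        return len(self.history)


def get_client() -> ChattyClient:
    # Factory in the spirit of get_llm(): a new client on every call.
    return ChattyClient()


# Shared client: the TA call carries the QC prompt along with it.
shared = ChattyClient()
shared.invoke("QC prompt")
shared_context = shared.invoke("TA prompt")

# One client per agent: each call sees only its own prompt.
qc_context = get_client().invoke("QC prompt")
ta_context = get_client().invoke("TA prompt")

print(shared_context, qc_context, ta_context)
```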

9. Merge Node

The merge node has one job: make sure the outputs of QC and TA end up in a single coherent TeamState with no conflicts.

The potential conflict is simple. Both QC and TA start from the same TeamState snapshot, and both return an update. If two updates are spread naively ({**state, **qc_update, **ta_update}), the second spread overwrites any field it shares with the first. In practice QC and TA write to different keys (test_suite vs technical_spec), so this is not dangerous, but it is worth making explicit.

from state import TeamState


def merge_qc_ta(state: TeamState) -> dict:
    """
    Join point for the QC and TA branches.

    A LangGraph node receives one state object, not a list of branch states.
    Each branch returns a partial update, the runtime applies both updates to
    the shared TeamState, and this node runs on the combined result.

    Both agents write to disjoint keys:
    - QC writes: test_suite, qc_complete
    - TA writes: technical_spec, ta_complete

    We assert this invariant explicitly rather than hoping it stays true.
    """
    # Each branch must have set its completion flag
    if not state.get("qc_complete"):
        raise ValueError("merge_qc_ta reached without qc_complete set")
    if not state.get("ta_complete"):
        raise ValueError("merge_qc_ta reached without ta_complete set")

    # Sanity check: each branch must have delivered its artifact
    if state.get("test_suite") is None:
        raise ValueError("QC branch finished without writing test_suite")
    if state.get("technical_spec") is None:
        raise ValueError("TA branch finished without writing technical_spec")

    # Nothing to combine by hand: the state already holds both outputs.
    # Re-affirm the flags so the node returns a non-empty update.
    return {
        "qc_complete": True,
        "ta_complete": True,
    }

Explicit invariant checks are not paranoia. In a system where LLMs generate outputs and agents write to shared state, “this should never happen” scenarios happen regularly during development. Making the invariant explicit and loud means you catch a violation immediately rather than hours later, when a downstream agent produces baffling output.
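
The disjoint-key invariant can also be checked generically. The sketch below is plain Python and independent of LangGraph: apply_branch_updates is a hypothetical helper that overlays each branch's partial update onto a base state and refuses any key written by more than one branch. The state values are placeholder strings rather than real TestSuite or TechnicalSpec objects.

```python
def apply_branch_updates(base: dict, updates: dict[str, dict]) -> dict:
    """Overlay each branch's partial update, refusing overlapping keys."""
    merged = dict(base)
    written_by: dict[str, str] = {}
    for branch, update in updates.items():
        for key, value in update.items():
            if key in written_by:
                raise ValueError(
                    f"{branch} and {written_by[key]} both wrote {key!r}"
                )
            written_by[key] = branch
            merged[key] = value
    return merged


base_state = {"user_stories": ["US-001"], "test_suite": None, "technical_spec": None}
merged = apply_branch_updates(
    base_state,
    {
        "qc": {"test_suite": "TS-001", "qc_complete": True},
        "ta": {"technical_spec": "SPEC-001", "ta_complete": True},
    },
)
print(merged)
```

If a branch ever starts writing a key that belongs to the other, the helper fails at the merge instead of letting the later write win silently.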

10. A Worked Example: The Task Manager Application

Let's trace a concrete run using the task manager stories introduced in Part 5. The BA produced 8 user stories for a task management application: create task, assign task, set due date, mark complete, filter by status, add comment, set priority, and archive task.

QC Output: 24 Test Cases

Sam reviews all 8 stories and produces 24 test cases, distributed as follows:

Create Task (US-001) — 4 test cases:

TC-001 | Happy Path | Create task with all required fields
  Given: authenticated user, valid session token
  When:  POST /api/v1/tasks with {"title": "Deploy to prod", "priority": "high"}
  Then:  Response 201 Created
         Response body contains task_id (UUID format)
         task.status equals "pending"
         task.created_at within last 5 seconds
  Priority: critical

TC-002 | Edge Case | Create task with maximum-length title (255 chars)
  Given: authenticated user
  When:  POST /api/v1/tasks with title of exactly 255 characters
  Then:  Response 201 Created
         task.title stored without truncation
  Priority: medium

TC-003 | Error Scenario | Create task with empty title
  Given: authenticated user
  When:  POST /api/v1/tasks with {"title": ""}
  Then:  Response 422 Unprocessable Entity
         Error body contains field "title" with message about minimum length
  Priority: high

TC-004 | Error Scenario | Create task without authentication
  Given: no session token in request headers
  When:  POST /api/v1/tasks with valid body
  Then:  Response 401 Unauthorized
         Response body does not leak system information
  Priority: critical

Mark Complete (US-005), 4 test cases, including a concurrency edge case:

TC-017 | Edge Case | Concurrent completion — two users mark same task simultaneously
  Given: task in "pending" status, two authenticated users (user_a, user_b)
  When:  user_a and user_b both send PATCH /api/v1/tasks/{id}
         with {"status": "complete"} within 50ms of each other
  Then:  One request receives 200 OK
         The other request receives 409 Conflict or 200 OK (idempotent)
         Task appears in completed state exactly once
         No duplicate completion events emitted
  Priority: high
  Tags: concurrency, edge-case
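The behavior TC-017 describes can be sketched without a running API. The following is a minimal in-memory simulation, assuming the 409 Conflict variant of the expected result rather than the idempotent 200 variant. The InMemoryTask class and race_to_complete helper are hypothetical names invented for this illustration, and the lock stands in for whatever the real implementation would use (a row lock or a conditional UPDATE in PostgreSQL, for example).

```python
import threading
from dataclasses import dataclass, field


@dataclass
class InMemoryTask:
    """Hypothetical stand-in for a Task row; not the article's data model."""
    status: str = "pending"
    completion_events: list = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def complete(self, user: str) -> int:
        """Return an HTTP-like status code: 200 for the winner, 409 otherwise."""
        with self._lock:  # stands in for a row lock or conditional UPDATE
            if self.status != "pending":
                return 409  # another request already completed the task
            self.status = "complete"
            self.completion_events.append(user)  # emitted exactly once
            return 200


def race_to_complete(task: InMemoryTask, users: list) -> list:
    """Send one completion request per user at roughly the same moment."""
    results = []
    results_lock = threading.Lock()
    barrier = threading.Barrier(len(users))

    def worker(user: str) -> None:
        barrier.wait()  # line the threads up so they contend for the task
        code = task.complete(user)
        with results_lock:
            results.append(code)

    threads = [threading.Thread(target=worker, args=(u,)) for u in users]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Calling race_to_complete(InMemoryTask(), ["user_a", "user_b"]) yields one 200 and one 409 regardless of which thread wins, and the task records a single completion event. That is the property the test case pins down before any implementation exists.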

The quality_gates block Sam produces:

{
  "suite_id": "TS-001",
  "quality_gates": {
    "min_unit_coverage": 85,
    "max_p95_response_ms": 150,
    "max_error_rate_percent": 0.05,
    "min_test_pass_rate": 100.0
  }
}

Note that Sam sets the error-rate ceiling at 0.05%, tighter than the schema default of 0.1%. This reflects a requirement Sam read in the user stories: the PM explicitly asked for "reliable" task tracking.

TA Output: FastAPI + PostgreSQL + Redis, 4 ADRs, 3 Data Models

Morgan reviews the same 8 stories and produces a TechnicalSpec with the following key decisions:

Technology Stack:

Component	Choice	Key Justification
Backend Framework	FastAPI 0.111	Async-native, auto-generated OpenAPI docs, Python type hints throughout
Primary Database	PostgreSQL 16	ACID transactions for task state machines, mature ecosystem, excellent JSON support for metadata
Cache Layer	Redis 7	Session storage, rate limiting, real-time task status broadcasting via pub/sub
Task Queue	None (Phase 1)	Current scale does not justify the complexity of async workers; revisit at 10k users

The "Task Queue: None" decision shows Morgan's philosophy in practice. The requirements do not call for background processing. Adding Celery or RQ at this stage would be premature complexity. The ADR for this decision reads:

ADR-004: Defer Async Task Queue to Phase 2
Status: Accepted
Context: Several team members suggested adding an async task queue (Celery/RQ)
  upfront for operations like email notifications and status webhooks. The current
  requirements specify no background operations. Adding queue infrastructure adds
  a Redis dependency (partially mitigated by the cache decision), a worker process
  to manage, a dead-letter queue strategy, and monitoring for queue depth.
Decision: No task queue in Phase 1. Use synchronous processing. Add queue in Phase 2
  if response time SLAs cannot be met synchronously.
Alternatives Considered:
  - Celery with Redis broker — well-understood but heavy for current scale
  - RQ (Redis Queue) — lighter than Celery, still adds operational complexity
  - AWS SQS — vendor lock-in not justified for single-region deployment
Consequences:
  + Simpler deployment and monitoring in Phase 1
  + Fewer failure modes (no queue backpressure, no worker health checks)
  - Email notification latency will be synchronous (adds ~200ms to request path)
  - Refactor required if async processing becomes necessary

Three Data Models:

Task:
  - task_id (UUID, primary key)
  - title (varchar 255, not null)
  - description (text, nullable)
  - status (enum: pending | in_progress | complete | archived)
  - priority (enum: low | medium | high | critical)
  - created_by (UUID, foreign key → User.user_id)
  - assigned_to (UUID, nullable, foreign key → User.user_id)
  - due_date (timestamp with timezone, nullable)
  - created_at (timestamp with timezone, default now())
  - updated_at (timestamp with timezone, auto-updated)
  Indexes: (status, created_by), (assigned_to, status), (due_date) where not null

User:
  - user_id (UUID, primary key)
  - email (varchar 320, unique, not null)
  - display_name (varchar 100, not null)
  - created_at (timestamp with timezone)

Comment:
  - comment_id (UUID, primary key)
  - task_id (UUID, foreign key → Task.task_id, cascade delete)
  - author_id (UUID, foreign key → User.user_id)
  - body (text, not null)
  - created_at (timestamp with timezone)
  Indexes: (task_id, created_at)

Three data models and four ADRs give the SSE agent an unambiguous technical envelope. The SSE does not need to decide between MySQL and PostgreSQL, choose between sync and async, or design the data schema from scratch. Those decisions have been made, documented, and frozen.
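To make the "frozen" schema concrete, here is one way the Task model above might be mirrored in plain Python. This is a sketch only: it uses stdlib dataclasses rather than whatever ORM the SSE would actually choose, the default priority of medium is an assumption (the spec does not state one), and the title check simply restates the varchar 255 / not null constraint that TC-002 and TC-003 exercise.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional
from uuid import UUID, uuid4


class TaskStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETE = "complete"
    ARCHIVED = "archived"


class TaskPriority(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def _utc_now() -> datetime:
    """Timezone-aware timestamp, matching 'timestamp with timezone'."""
    return datetime.now(timezone.utc)


@dataclass
class Task:
    title: str
    created_by: UUID
    priority: TaskPriority = TaskPriority.MEDIUM  # assumed default, not in the spec
    status: TaskStatus = TaskStatus.PENDING       # TC-001 expects "pending"
    description: Optional[str] = None
    assigned_to: Optional[UUID] = None
    due_date: Optional[datetime] = None
    task_id: UUID = field(default_factory=uuid4)
    created_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)

    def __post_init__(self) -> None:
        # Mirrors "varchar 255, not null"; see TC-002 and TC-003.
        if not 1 <= len(self.title) <= 255:
            raise ValueError("title must be 1-255 characters")
```

A 255-character title constructs cleanly and an empty title raises ValueError, so the model and the test cases agree on the boundary without either having seen the other's code.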

[Figure: QC TestCase card and TA Architecture Decision Record side by side, showing how they connect]

11. The Quality Gates Concept

The QualityGates object Sam produces is not just documentation. It is an operational contract that travels through the entire pipeline and ultimately governs deployment.

Here is how quality gates operate across the lifecycle:

At QC time (now): Sam defines the gates from the requirements. These are the minimum bars the shipped system must clear. They are set before any code exists, so they reflect requirements, not implementation. An 85% unit-coverage gate set before the code is written is a real requirement. An 85% coverage gate set after the code is written is a threshold the existing code happens to clear.

At SSE time (Part 7): When the SSE agent writes code, it also writes the test implementations that correspond to Sam's test case definitions. The SSE reads the TestSuite and implements each TestCase as an actual pytest test. The QualityGates thresholds are embedded as comments in the test configuration.

At TL time (Part 8): When the Tech Lead agent reviews the SSE's output, one review criterion is whether the test implementation actually covers the test cases Sam defined. An SSE that implements 18 of the 24 test cases fails TL review no matter how clean the code looks.

At CI/CD time (Part 10): The DevOps agent reads the QualityGates object and configures the pipeline accordingly:

# Generated by DevOps agent from QualityGates object
coverage:
  minimum: 85  # From QualityGates.min_unit_coverage
  fail_under: true

performance:
  p95_threshold_ms: 150  # From QualityGates.max_p95_response_ms
  test_environment: staging

reliability:
  max_error_rate: 0.05  # From QualityGates.max_error_rate_percent
  measurement_window: 5m
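Stepping back to the Tech Lead stage in the lifecycle above: the "18 of 24" criterion is easy to check mechanically. The sketch below assumes a naming convention in which each pytest function starts with test_tc_NNN, so that it can be mapped back to a TC-NNN identifier. That convention and the missing_test_cases helper are assumptions made for this illustration, not something the pipeline is known to enforce.

```python
import re

# Assumed convention: test functions are named test_tc_003_<description>.
TEST_ID_PATTERN = re.compile(r"test_(tc_\d{3})", re.IGNORECASE)


def missing_test_cases(defined_ids, test_function_names):
    """Return the defined test-case IDs that have no matching test function."""
    implemented = set()
    for name in test_function_names:
        match = TEST_ID_PATTERN.match(name)
        if match:
            # "tc_003" -> "TC-003"
            implemented.add(match.group(1).upper().replace("_", "-"))
    return sorted(set(defined_ids) - implemented)


if __name__ == "__main__":
    defined = [f"TC-{i:03d}" for i in range(1, 25)]          # Sam's 24 cases
    written = [f"test_tc_{i:03d}_case" for i in range(1, 19)]  # SSE wrote 18
    print(missing_test_cases(defined, written))
```

A non-empty result is enough for the review to fail, which keeps the criterion objective: the reviewer is comparing two lists, not judging how clean the code looks.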

The chain from Sam's original definition to the actual CI/CD configuration is what makes quality gates meaningful. They are not suggestions. They are gates: deployment cannot proceed unless they are met. And critically, they are defined by the QC agent before any code exists, which means they are defined by the requirements, not by what the code happens to do.
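As a rough illustration of what "deployment cannot proceed" might look like in code, the sketch below compares measured values against the four thresholds from Sam's quality_gates block. The QualityGates class here is a stand-in that reuses the field names from that JSON, not the article's actual schema, and MeasuredMetrics and gate_violations are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityGates:
    # Stand-in for the article's schema; field names follow the quality_gates JSON.
    min_unit_coverage: float
    max_p95_response_ms: float
    max_error_rate_percent: float
    min_test_pass_rate: float


@dataclass(frozen=True)
class MeasuredMetrics:
    # Hypothetical container for what a pipeline run actually measured.
    unit_coverage: float
    p95_response_ms: float
    error_rate_percent: float
    test_pass_rate: float


def gate_violations(gates: QualityGates, measured: MeasuredMetrics) -> list:
    """Return one human-readable message per violated gate (empty means pass)."""
    violations = []
    if measured.unit_coverage < gates.min_unit_coverage:
        violations.append(
            f"coverage {measured.unit_coverage}% < {gates.min_unit_coverage}%")
    if measured.p95_response_ms > gates.max_p95_response_ms:
        violations.append(
            f"p95 {measured.p95_response_ms}ms > {gates.max_p95_response_ms}ms")
    if measured.error_rate_percent > gates.max_error_rate_percent:
        violations.append(
            f"error rate {measured.error_rate_percent}% > {gates.max_error_rate_percent}%")
    if measured.test_pass_rate < gates.min_test_pass_rate:
        violations.append(
            f"pass rate {measured.test_pass_rate}% < {gates.min_test_pass_rate}%")
    return violations


if __name__ == "__main__":
    gates = QualityGates(85, 150, 0.05, 100.0)
    run = MeasuredMetrics(84.9, 180, 0.01, 100.0)
    for message in gate_violations(gates, run):
        print("GATE FAILED:", message)
```

A pipeline step could exit non-zero whenever the returned list is non-empty. Returning every violation, rather than stopping at the first, gives the team the full picture from a single failed run.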

What Quality Gates Protect Against

There are three failure modes that quality gates specifically prevent:

Coverage Drift: Without a coverage gate, unit test coverage tends to fall over time as features are added faster than tests. The gate turns coverage into a hard requirement on every merge, not a metric you glance at quarterly.

Performance Regression: Without a performance gate, response times tend to creep upward. An endpoint that took 80ms on day one takes 300ms six months later because six features each added 30-50ms. The p95 gate catches regressions immediately instead of letting them accumulate.

Silent Error Rate Increase: Production error rates often climb through small changes that each look harmless. A 0.05% error-rate ceiling means the team is alerted to increases that might not even be visible on a dashboard at its default scale.

All three failure modes share a root cause: they are invisible unless you explicitly measure them against defined thresholds. Quality gates make them visible. Defining them before code exists keeps the thresholds honest.
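The performance-regression arithmetic is worth making explicit. Six features at 30-50ms each turn an 80ms endpoint into one that takes between 260ms and 380ms, which brackets the 300ms figure and sits well above a 150ms gate. The sketch below shows that projection alongside a nearest-rank p95 calculation. Nearest-rank is one common percentile definition chosen here for simplicity, and both function names are made up for the example.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) using integer arithmetic to avoid float rounding surprises
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def projected_latency_ms(baseline_ms, per_feature_ms, features):
    """Latency after `features` additions that each cost `per_feature_ms`."""
    return baseline_ms + per_feature_ms * features


if __name__ == "__main__":
    low = projected_latency_ms(80, 30, 6)
    high = projected_latency_ms(80, 50, 6)
    print(f"after six features: {low}-{high} ms")

    # One slow request in twenty does not move the p95; two in twenty does.
    print(p95([80] * 19 + [300]))
    print(p95([80] * 18 + [300, 300]))
```

The second half illustrates why the gate is set on p95 rather than on the mean or the maximum: a single outlier is tolerated, but once slow responses exceed 5% of requests the measured value jumps and the gate trips.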

Putting It All Together: State After This Stage

After QC and TA complete and the merge node fires, TeamState contains:

state = TeamState(
    # From Part 4 (PO)
    user_brief=UserBrief(...),

    # From Part 5 (BA)
    clarified_requirements="...",
    user_stories=[UserStory(...), ...],  # 8 stories

    # From QC (this article)
    test_suite=TestSuite(
        suite_id="TS-001",
        test_cases=[TestCase(...), ...],  # 24 test cases
        quality_gates=QualityGates(
            min_unit_coverage=85,
            max_p95_response_ms=150,
            max_error_rate_percent=0.05,
            min_test_pass_rate=100.0,
        ),
    ),
    qc_complete=True,

    # From TA (this article)
    technical_spec=TechnicalSpec(
        spec_id="SPEC-001",
        tech_stack=[TechComponent(...), ...],  # 4 components
        data_models=[DataModel(...), ...],  # 3 models
        api_contracts=[ApiContract(...), ...],  # 12 contracts
        adrs=[ArchitectureDecision(...), ...],  # 4 ADRs
        implementation_phases=[ImplementationPhase(...), ...],  # 3 phases
    ),
    ta_complete=True,
)

The SSE agent, which comes next, inherits this state. It has:

8 user stories that say what to build
24 test cases that say what success and failure look like
A technology stack that says which tools to use
3 data models that say which entities to create
12 API contracts that say which endpoints to implement
4 ADRs that explain the reasoning behind the key decisions

The SSE makes no architectural decisions. The SSE defines no quality criteria. The SSE implements a clearly specified system. This is, by design, the narrowest job in the pipeline. The more we specify up front, the less ambiguity the SSE faces, and the less ambiguity the SSE faces, the better the output quality.

What We Built in Part 6

Two agents, one structural optimization:

QCAgent (Sam): reads user stories and produces 24 test cases with Given/When/Then structure, TestPriority classification, and QualityGates thresholds
TAAgent (Morgan): reads requirements and user stories and produces a TechnicalSpec with tech stack choices, 3 data models, 12 API contracts, 4 fully reasoned ADRs, and 3 implementation phases
Parallel fan-out: both agents run concurrently once the BA finishes, cutting wall-clock time for this stage roughly in half (down to the duration of the slower agent)
Merge node: safely combines both outputs into a single TeamState with explicit invariant checks
Quality gates: formal thresholds defined before code exists, which will govern CI/CD pipeline behavior in Part 10

The core principle connecting both agents is that definition must precede implementation. Quality is defined before code is written. Architecture is decided before the SSE picks a library. This sequencing is not bureaucracy. It is the engineering discipline that prevents the most expensive kind of rework.

What's Next

In Part 7, we build the SSE agent, the Senior Software Engineer. The SSE is the pipeline's implementation engine. It reads the entire TeamState (user stories + test cases + technical spec) and produces working code: Python files, test files, a requirements manifest, and a brief implementation log. The SSE is where the abstract becomes concrete.

The SSE faces an interesting challenge: how do you prompt an LLM to produce production-quality code across multiple files without letting it drift off-spec? The answer involves strict output schemas, file-level granularity, and a pattern I call "spec-first prompting": giving the model constraints before freedom.

See you in Part 7.

Series Navigation

Part 1: Why Build an AI Software Team?
Part 2: Mapping the Roles: 8 Agents, One Pipeline
Part 3: Designing the AI Team: Architecture and DDD
Part 4: TeamState: The Shared Brain
Part 5: PO and BA Agents: From Brief to User Stories
Part 6: QC and TA Agents: Quality and Technical Design (this article)
Part 7: SSE Agent: Implementation Engine (coming soon)
Parts 8-12: Coming soon
