LangGraph Multi-Agent Platform
Production-ready multi-agent AI system built on LangGraph 1.1+ with a Supervisor/Orchestrator pattern. Seven specialized agents (Research, Code, Data, Ops, Knowledge, Deep Research, Software Architect) coordinated by a supervisor backed by AWS Bedrock (Claude), with real-time voice (Deepgram STT + AWS Polly TTS), PostgreSQL + pgvector persistence, and a Next.js 15 frontend.
Single LLM calls hit a ceiling fast. Ask one model to research a topic, architect a solution, write the code, test it, and deploy it — quality collapses at every step. The answer is specialization: give each task to an agent that is purpose-built for it, with the right tools, the right context, and nothing else.
This is my most complete AI systems project — a production-ready multi-agent platform where 7 specialist agents collaborate under a Master Orchestrator to handle complex, multi-step tasks end-to-end.
Architecture: Supervisor Pattern
The core design is a Supervisor/Orchestrator pattern implemented on LangGraph 1.1+. A Master Orchestrator receives tasks, determines which specialist agent(s) should handle them, routes work accordingly, and synthesizes results.
Agents self-register via an AgentRegistry — the Orchestrator discovers capabilities dynamically rather than having agents hardcoded. This means adding a new specialist agent requires zero changes to the orchestration logic.
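The self-registration idea can be sketched in a few lines. This is a hypothetical illustration of the pattern, not the project's actual API: the names `AgentRegistry`, `AgentSpec`, and `capabilities()` are assumptions.

```python
# Hypothetical sketch of agent self-registration; all names here are
# illustrative, not the project's real classes.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class AgentSpec:
    name: str
    description: str      # capability summary the supervisor routes on
    entrypoint: Callable  # callable that runs the agent's graph

class AgentRegistry:
    def __init__(self) -> None:
        self._agents: Dict[str, AgentSpec] = {}

    def register(self, spec: AgentSpec) -> None:
        self._agents[spec.name] = spec

    def capabilities(self) -> str:
        """Rendered into the supervisor's routing prompt at runtime."""
        return "\n".join(f"- {a.name}: {a.description}"
                         for a in self._agents.values())

registry = AgentRegistry()
registry.register(AgentSpec("research", "Quick web research via Tavily", lambda task: ...))
registry.register(AgentSpec("code", "Write and execute Python in a sandbox", lambda task: ...))
```

Because the supervisor reads capabilities from the registry at runtime, a new agent only has to call `register()` — the orchestration logic never changes.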
flowchart LR
U(["👤 User"])
subgraph EP ["📡 Entry Points"]
A1["FastAPI\n/api/chat/stream"]
A2["WebSocket\n/ws/voice"]
A3["CLI Scripts"]
end
subgraph CORE ["🧠 Supervisor Core"]
SUP["Master Supervisor\nAgentRegistry · LangGraph 1.1+"]
LLM["☁️ AWS Bedrock\nClaude Sonnet 4"]
SUP -. calls .-> LLM
end
subgraph AGENTS ["🤖 7 Specialist Agents"]
B1["🔍 Research"]
B2["💻 Code"]
B3["📊 Data"]
B4["⚙️ Ops ⚠️"]
B5["📚 Knowledge + RAG"]
B6["🔬 Deep Research\nsub-graph"]
B7["🏗️ Architect\n6-phase"]
end
subgraph SVC ["🔧 Services"]
C1["🌐 Tavily\n🐍 Python REPL"]
C2["🗄️ PostgreSQL\n+ pgvector"]
C3["🎤 Deepgram\n🔊 AWS Polly"]
end
U --> EP
EP --> CORE
CORE --> AGENTS
B1 & B2 & B3 & B4 & B6 & B7 --> C1
B5 --> C2
A2 --> C3
style SUP fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
style LLM fill:#2d1f5e,stroke:#8b5cf6,color:#c4b5fd
style B4 fill:#3d2000,stroke:#f59e0b,color:#fbbf24
style B6 fill:#1a1040,stroke:#7c3aed,color:#a78bfa
style B7 fill:#1a1040,stroke:#7c3aed,color:#a78bfa
style C2 fill:#0a1530,stroke:#2563eb,color:#93c5fd
style C3 fill:#0a1a15,stroke:#0d9488,color:#5eead4
The 7 Specialist Agents
Research Agent — Quick web research and URL summarization using Tavily API. Designed for fast, shallow lookups where the Orchestrator needs current information to proceed.
Code Agent — Writes, executes, and reviews code using a sandboxed Python REPL. Used for data transformation, proof-of-concept implementations, and automated code review tasks.
Data Agent — SQL query execution and data analysis against connected databases. Uses describe_tables() to self-discover schema before writing queries — no hallucinated column names.
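The schema-first step can be illustrated like this. The tool name `describe_tables()` comes from the text above, but this implementation is an assumption, using SQLite as a stand-in for the real PostgreSQL connection:

```python
# Illustrative sketch of schema self-discovery; sqlite3 stands in for the
# real PostgreSQL connection, and the implementation is an assumption.
import sqlite3

def describe_tables(conn) -> str:
    """Return CREATE TABLE statements so the LLM sees real column names."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return "\n".join(r[0] for r in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL, created_at TEXT)")
print(describe_tables(conn))  # fed to the agent before it writes any SQL
```

Feeding the real DDL into the prompt is what prevents hallucinated column names: the model quotes the schema instead of guessing it.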
Ops Agent — CLI commands and file system operations. The only agent with a permission gate: configurable via environment variable to off / read / execute / full. Default is off — the agent exists but cannot act until explicitly enabled.
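A minimal sketch of such a gate, assuming the levels form an ordered hierarchy checked per tool call (the `allowed()` helper and its wiring are illustrative, not the project's code):

```python
# Minimal sketch of an off/read/execute/full permission gate; the helper
# names are assumptions, only the OPS_PERMISSION variable is from the doc.
import os

LEVELS = ["off", "read", "execute", "full"]

def allowed(action_level: str) -> bool:
    """True if the configured OPS_PERMISSION covers the requested action."""
    configured = os.environ.get("OPS_PERMISSION", "off")
    return LEVELS.index(configured) >= LEVELS.index(action_level)

def run_shell(cmd: str) -> str:
    if not allowed("execute"):
        return "Ops Agent is disabled: set OPS_PERMISSION=execute to enable."
    ...  # actually run the command
```

With the default of `off`, every action is refused, which matches the exists-but-cannot-act behavior described above.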
Knowledge Agent — Semantic search over ingested documents using pgvector (1024-dim Amazon Titan embeddings or 1536-dim OpenAI embeddings). Powers the RAG layer for private knowledge bases.
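The retrieval step typically boils down to a single pgvector query. This is a hedged sketch: the table and column names (`documents`, `embedding`, `content`) are assumptions, while `<=>` is pgvector's cosine-distance operator.

```python
# Sketch of a pgvector retrieval query; table/column names are assumptions.
def retrieval_sql(top_k: int = 5) -> str:
    # "<=>" is pgvector's cosine-distance operator; the query embedding is
    # bound as a parameter and compared against stored document vectors.
    return (
        "SELECT content, 1 - (embedding <=> %(query_vec)s::vector) AS score "
        "FROM documents "
        "ORDER BY embedding <=> %(query_vec)s::vector "
        f"LIMIT {top_k}"
    )
```

The embedding dimension (1024 for Titan, 1536 for OpenAI) is fixed per table, so the vector column must match whichever embedding provider ingested the documents.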
Deep Research Agent — A sub-graph that spawns parallel research workers, aggregates results, and produces cited reports. Built on the deepagents package. Used when depth and citations matter more than speed.
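The fan-out/fan-in shape of that sub-graph can be sketched with plain `asyncio`; the real version is built on the deepagents package, and the worker here is a placeholder for a Tavily-backed researcher:

```python
# Hedged sketch of parallel research workers + aggregation; the worker body
# is a placeholder, not the deepagents implementation.
import asyncio

async def research_worker(subtopic: str) -> dict:
    # stand-in for a web-search-backed worker producing cited findings
    return {"subtopic": subtopic, "findings": f"notes on {subtopic}", "sources": []}

async def deep_research(topic: str, subtopics: list[str]) -> dict:
    # fan out one worker per subtopic, then aggregate in order
    results = await asyncio.gather(*(research_worker(s) for s in subtopics))
    return {"topic": topic, "sections": list(results)}

report = asyncio.run(deep_research(
    "vector databases", ["pgvector", "HNSW", "quantization"]))
```

Running workers concurrently is what makes depth affordable: each subtopic is researched independently, and only the aggregation step sees all of them.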
Software Architect Agent — The most complex agent: a 6-phase pipeline sub-graph that handles the full development lifecycle:
- Understand requirements
- Research existing solutions
- Design architecture
- Implement code
- Write tests
- Prepare deployment plan
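The six phases above can be sketched as a linear pipeline. In the real system this is a LangGraph sub-graph with LLM-backed nodes; this plain-Python stand-in only shows the state-accumulation shape:

```python
# The six phases as a linear pipeline; each node is a placeholder for an
# LLM-backed LangGraph node in the real system.
PHASES = [
    "understand_requirements",
    "research_existing_solutions",
    "design_architecture",
    "implement_code",
    "write_tests",
    "prepare_deployment_plan",
]

def run_architect(task: str) -> dict:
    state = {"task": task, "artifacts": {}}
    for phase in PHASES:
        # each phase reads the accumulated state and appends its output
        state["artifacts"][phase] = f"{phase} complete for: {task}"
    return state
```

Because each phase only appends to shared state, a failed or interrupted phase can be retried from the last checkpoint without redoing earlier work.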
Voice Interface
Real-time voice is a first-class feature, not an add-on.
- STT: Deepgram streaming via WebSocket — near-zero latency transcription
- TTS: AWS Polly — converts agent responses to natural speech
- Protocol: WebSocket endpoint at /ws/voice handles bidirectional audio streaming
- Frontend: voice-panel.tsx React component handles mic input, playback, and visual feedback
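One way the bidirectional stream might be framed is JSON for control messages and binary for audio; this is an assumed convention for illustration, not the documented protocol:

```python
# Sketch of /ws/voice frame routing, assuming JSON-for-control /
# binary-for-audio framing (the actual protocol may differ).
import json

def route_frame(frame: "bytes | str") -> str:
    """Classify an incoming WebSocket frame from the browser."""
    if isinstance(frame, bytes):
        return "audio"            # raw audio chunk -> forward to Deepgram STT
    msg = json.loads(frame)
    if msg.get("type") == "start":
        return "open_stt_stream"  # begin a Deepgram streaming session
    if msg.get("type") == "stop":
        return "flush_and_tts"    # finalize transcript, synthesize via Polly
    return "ignore"
```

Keeping control and audio on one socket avoids a second connection handshake, which matters when the goal is conversational latency.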
The result: you can have a spoken conversation with the multi-agent system, with streaming responses read aloud as they arrive.
Persistence & State
Every conversation thread is persisted in PostgreSQL 16 with pgvector using LangGraph’s AsyncPostgresSaver checkpointing. This means:
- Conversations survive server restarts
- Long-running tasks (Deep Research, Software Architect) can span multiple sessions
- All agent decisions, tool calls, and handoffs are captured and replayable
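What checkpointing buys can be shown with a stdlib stand-in: every step is saved under a thread_id, so a fresh process resumes from the latest state. The real system uses LangGraph's AsyncPostgresSaver against PostgreSQL; this SQLite sketch only mirrors the thread_id → checkpoint-history shape.

```python
# Stdlib stand-in for checkpointing semantics; the real system uses
# LangGraph's AsyncPostgresSaver on PostgreSQL 16.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE checkpoints (thread_id TEXT, step INTEGER, state TEXT)")

def save(thread_id: str, step: int, state: dict) -> None:
    db.execute("INSERT INTO checkpoints VALUES (?, ?, ?)",
               (thread_id, step, json.dumps(state)))

def resume(thread_id: str) -> dict:
    """Return the latest checkpointed state for a thread, or {} if none."""
    row = db.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ? "
        "ORDER BY step DESC LIMIT 1", (thread_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}

save("thread-42", 1, {"phase": "research"})
save("thread-42", 2, {"phase": "design"})
# after a restart, the supervisor resumes from step 2, not from scratch
```

This is exactly why long-running agents like Deep Research and Software Architect can span multiple sessions: the graph replays from the last saved step.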
API Surface (FastAPI)
GET /api/health — Health check
POST /api/chat — Synchronous chat
POST /api/chat/stream — SSE streaming (preferred)
GET /api/threads — List conversation threads
GET /api/stats — Token usage and cost metrics
PUT /api/ops/permission — Update Ops Agent permission level
GET /api/skills — List available agent skills
PUT /api/settings/prompt — Update system prompt at runtime
WS /ws/voice — WebSocket voice interface
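A client consuming POST /api/chat/stream mostly needs an SSE parser. This sketch assumes standard SSE framing (`data: ` lines, blank-line-delimited events); the payload shape shown is illustrative:

```python
# Minimal client-side SSE parsing, assuming standard "data: ..." framing;
# the token payload format is an assumption.
def parse_sse(raw: str):
    """Yield the data payload of each server-sent event."""
    for event in raw.split("\n\n"):
        for line in event.splitlines():
            if line.startswith("data: "):
                yield line[len("data: "):]

chunk = 'data: {"token": "Hel"}\n\ndata: {"token": "lo"}\n\n'
tokens = list(parse_sse(chunk))
```

Rendering each `data:` payload as it arrives is what gives the frontend its token-by-token streaming feel.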
Frontend
Next.js 15 with React 19 and Tailwind CSS 4. Key features:
- Real-time streaming — SSE responses rendered as they arrive
- Mermaid diagrams — Architecture diagrams rendered inline from agent output
- Slash commands — /research, /code, /architect to invoke specific agents directly
- AI skills learning — The UI surfaces which agent handled each part of a response
- Custom system prompts — Editable per session via the settings panel
- Export — Full conversation export to PDF or ZIP via html2pdf.js and jszip
Safety & Cost Controls
Running multiple agents with real tools requires hard limits:
AGENT_TIMEOUT_SECONDS=120 # No agent runs more than 2 minutes
AGENT_RECURSION_LIMIT=25 # Prevents infinite loops
MAX_REQUEST_TOKENS=50000 # Per-request cap
MAX_REQUEST_COST=0.50 # Hard cost ceiling per request
DAILY_USER_BUDGET=10.00 # Daily spending limit per user
OPS_PERMISSION=off # Ops agent disabled by default
AUTH_ENABLED=false # JWT auth toggleable
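A plausible enforcement point for these caps is a guard run before each LLM call; the function below is an assumed sketch of that check, reusing the limits from the config above:

```python
# Sketch of pre-call budget enforcement; the guard function and its wiring
# are assumptions, the limit values come from the config above.
MAX_REQUEST_TOKENS = 50_000
MAX_REQUEST_COST = 0.50    # USD, hard ceiling per request
DAILY_USER_BUDGET = 10.00  # USD, per user per day

def check_budget(tokens_used: int, request_cost: float, daily_spend: float) -> None:
    """Raise before the next LLM call if any hard limit is exceeded."""
    if tokens_used > MAX_REQUEST_TOKENS:
        raise RuntimeError("per-request token cap exceeded")
    if request_cost > MAX_REQUEST_COST:
        raise RuntimeError("per-request cost ceiling exceeded")
    if daily_spend > DAILY_USER_BUDGET:
        raise RuntimeError("daily user budget exhausted")

check_budget(tokens_used=12_000, request_cost=0.08, daily_spend=1.25)  # within limits
```

Raising before the call, rather than after, is the point: a runaway agent loop is stopped before it spends, not billed retroactively.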
Observability
Full LangSmith tracing enabled by default. Every agent decision, tool invocation, token count, and cost is traced. Combined with structured logging via structlog and rate limiting via slowapi, the system is production-observable from day one.
Deployment
One command brings up the full stack:
docker compose up -d
# Services: PostgreSQL 16 (pgvector) + API + Next.js frontend + optional Nginx
MCP integration via .mcp.json connects Claude Code directly to the live PostgreSQL instance and project filesystem during development — the platform builds itself using the same agent infrastructure.
Tech Stack
| Layer | Technology |
|---|---|
| AI Framework | LangGraph 1.1+, LangChain 0.3+ |
| LLM (default) | AWS Bedrock — Claude Sonnet 4 |
| LLM (alternatives) | OpenAI GPT-4o, Anthropic direct |
| Deep Research | deepagents package |
| Web Search | Tavily API |
| Voice STT | Deepgram (WebSocket) |
| Voice TTS | AWS Polly |
| API Server | FastAPI + Uvicorn |
| Database | PostgreSQL 16 + pgvector |
| RAG Embeddings | Amazon Titan / OpenAI (1024–1536 dim) |
| Frontend | Next.js 15, React 19, TypeScript |
| Styling | Tailwind CSS 4 |
| E2E Testing | Playwright BDD (Cucumber) |
| Backend Testing | pytest + httpx |
| Frontend Testing | Vitest + Testing Library |
| Containers | Docker Compose |
| Observability | LangSmith + structlog |
| MCP | server-postgres + server-filesystem |