LangGraph Multi-Agent Platform
Production-ready multi-agent AI system built on LangGraph 1.1+ with a Supervisor/Orchestrator pattern. Seven specialized agents (Research, Code, Data, Ops, Knowledge, Deep Research, Software Architect) coordinated by a supervisor backed by AWS Bedrock (Claude), with real-time voice (Deepgram STT + AWS Polly TTS), PostgreSQL + pgvector persistence, and a Next.js 15 frontend.
Single LLM calls hit a ceiling fast. Ask one model to research a topic, architect a solution, write the code, test it, and deploy it — quality collapses at every step. The answer is specialization: give each task to an agent that is purpose-built for it, with the right tools, the right context, and nothing else.
This is my most complete AI systems project — a production-ready multi-agent platform where 7 specialist agents collaborate under a Master Orchestrator to handle complex, multi-step tasks end-to-end.
Architecture: Supervisor Pattern
The core design is a Supervisor/Orchestrator pattern implemented on LangGraph 1.1+. A Master Orchestrator receives tasks, determines which specialist agent(s) should handle them, routes work accordingly, and synthesizes results.
Agents self-register via an AgentRegistry — the Orchestrator discovers capabilities dynamically rather than having agents hardcoded. This means adding a new specialist agent requires zero changes to the orchestration logic.
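The self-registration idea can be sketched in a few lines. This is a hypothetical illustration of the pattern, not the project's actual API: the names `AgentRegistry`, `AgentSpec`, and `capabilities()` are assumptions.

```python
# Hypothetical sketch of agent self-registration; all names here are
# illustrative, not the project's real classes.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class AgentSpec:
    name: str
    description: str      # capability summary the supervisor routes on
    entrypoint: Callable  # callable that runs the agent's graph

class AgentRegistry:
    def __init__(self) -> None:
        self._agents: Dict[str, AgentSpec] = {}

    def register(self, spec: AgentSpec) -> None:
        self._agents[spec.name] = spec

    def capabilities(self) -> str:
        """Rendered into the supervisor's routing prompt at runtime."""
        return "\n".join(f"- {a.name}: {a.description}"
                         for a in self._agents.values())

registry = AgentRegistry()
registry.register(AgentSpec("research", "Quick web research via Tavily", lambda task: ...))
registry.register(AgentSpec("code", "Write and execute Python in a sandbox", lambda task: ...))
```

Because the supervisor reads capabilities from the registry at runtime, a new agent only has to call `register()` — the orchestration logic never changes.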
flowchart LR
U(["👤 User"])
subgraph EP ["📡 Entry Points"]
A1["FastAPI\n/api/chat/stream"]
A2["WebSocket\n/ws/voice"]
A3["CLI Scripts"]
end
subgraph CORE ["🧠 Supervisor Core"]
SUP["Master Supervisor\nAgentRegistry · LangGraph 1.1+"]
LLM["☁️ AWS Bedrock\nClaude Sonnet 4"]
SUP -. calls .-> LLM
end
subgraph AGENTS ["🤖 7 Specialist Agents"]
B1["🔍 Research"]
B2["💻 Code"]
B3["📊 Data"]
B4["⚙️ Ops ⚠️"]
B5["📚 Knowledge + RAG"]
B6["🔬 Deep Research\nsub-graph"]
B7["🏗️ Architect\n6-phase"]
end
subgraph SVC ["🔧 Services"]
C1["🌐 Tavily\n🐍 Python REPL"]
C2["🗄️ PostgreSQL\n+ pgvector"]
C3["🎤 Deepgram\n🔊 AWS Polly"]
end
U --> EP
EP --> CORE
CORE --> AGENTS
B1 & B2 & B3 & B4 & B6 & B7 --> C1
B5 --> C2
A2 --> C3
style SUP fill:#1e3a5f,stroke:#3b82f6,color:#93c5fd
style LLM fill:#2d1f5e,stroke:#8b5cf6,color:#c4b5fd
style B4 fill:#3d2000,stroke:#f59e0b,color:#fbbf24
style B6 fill:#1a1040,stroke:#7c3aed,color:#a78bfa
style B7 fill:#1a1040,stroke:#7c3aed,color:#a78bfa
style C2 fill:#0a1530,stroke:#2563eb,color:#93c5fd
style C3 fill:#0a1a15,stroke:#0d9488,color:#5eead4
The 7 Specialist Agents
Research Agent — Quick web research and URL summarization using Tavily API. Designed for fast, shallow lookups where the Orchestrator needs current information to proceed.
Code Agent — Writes, executes, and reviews code using a sandboxed Python REPL. Used for data transformation, proof-of-concept implementations, and automated code review tasks.
Data Agent — SQL query execution and data analysis against connected databases. Uses describe_tables() to self-discover schema before writing queries — no hallucinated column names.
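The schema-first step can be illustrated like this. The tool name `describe_tables()` comes from the text above, but this implementation is an assumption, using SQLite as a stand-in for the real PostgreSQL connection:

```python
# Illustrative sketch of schema self-discovery; sqlite3 stands in for the
# real PostgreSQL connection, and the implementation is an assumption.
import sqlite3

def describe_tables(conn) -> str:
    """Return CREATE TABLE statements so the LLM sees real column names."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    return "\n".join(r[0] for r in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL, created_at TEXT)")
print(describe_tables(conn))  # fed to the agent before it writes any SQL
```

Feeding the real DDL into the prompt is what prevents hallucinated column names: the model quotes the schema instead of guessing it.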
Ops Agent — CLI commands and file system operations. The only agent with a permission gate: configurable via environment variable to off / read / execute / full. Default is off — the agent exists but cannot act until explicitly enabled.
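A minimal sketch of such a gate, assuming the levels form an ordered hierarchy checked per tool call (the `allowed()` helper and its wiring are illustrative, not the project's code):

```python
# Minimal sketch of an off/read/execute/full permission gate; the helper
# names are assumptions, only the OPS_PERMISSION variable is from the doc.
import os

LEVELS = ["off", "read", "execute", "full"]

def allowed(action_level: str) -> bool:
    """True if the configured OPS_PERMISSION covers the requested action."""
    configured = os.environ.get("OPS_PERMISSION", "off")
    return LEVELS.index(configured) >= LEVELS.index(action_level)

def run_shell(cmd: str) -> str:
    if not allowed("execute"):
        return "Ops Agent is disabled: set OPS_PERMISSION=execute to enable."
    ...  # actually run the command
```

With the default of `off`, every action is refused, which matches the exists-but-cannot-act behavior described above.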
Knowledge Agent — Semantic search over ingested documents using pgvector (1024-dim Amazon Titan embeddings or 1536-dim OpenAI embeddings). Powers the RAG layer for private knowledge bases.
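The retrieval step typically boils down to a single pgvector query. This is a hedged sketch: the table and column names (`documents`, `embedding`, `content`) are assumptions, while `<=>` is pgvector's cosine-distance operator.

```python
# Sketch of a pgvector retrieval query; table/column names are assumptions.
def retrieval_sql(top_k: int = 5) -> str:
    # "<=>" is pgvector's cosine-distance operator; the query embedding is
    # bound as a parameter and compared against stored document vectors.
    return (
        "SELECT content, 1 - (embedding <=> %(query_vec)s::vector) AS score "
        "FROM documents "
        "ORDER BY embedding <=> %(query_vec)s::vector "
        f"LIMIT {top_k}"
    )
```

The embedding dimension (1024 for Titan, 1536 for OpenAI) is fixed per table, so the vector column must match whichever embedding provider ingested the documents.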
Deep Research Agent — A sub-graph that spawns parallel research workers, aggregates results, and produces cited reports. Built on the deepagents package. Used when depth and citations matter more than speed.
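The fan-out/fan-in shape of that sub-graph can be sketched with plain `asyncio`; the real version is built on the deepagents package, and the worker here is a placeholder for a Tavily-backed researcher:

```python
# Hedged sketch of parallel research workers + aggregation; the worker body
# is a placeholder, not the deepagents implementation.
import asyncio

async def research_worker(subtopic: str) -> dict:
    # stand-in for a web-search-backed worker producing cited findings
    return {"subtopic": subtopic, "findings": f"notes on {subtopic}", "sources": []}

async def deep_research(topic: str, subtopics: list[str]) -> dict:
    # fan out one worker per subtopic, then aggregate in order
    results = await asyncio.gather(*(research_worker(s) for s in subtopics))
    return {"topic": topic, "sections": list(results)}

report = asyncio.run(deep_research(
    "vector databases", ["pgvector", "HNSW", "quantization"]))
```

Running workers concurrently is what makes depth affordable: each subtopic is researched independently, and only the aggregation step sees all of them.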
Software Architect Agent — The most complex agent: a 6-phase pipeline sub-graph that handles the full development lifecycle:
- Understand requirements
- Research existing solutions
- Design architecture
- Implement code
- Write tests
- Prepare deployment plan
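The six phases above can be sketched as a linear pipeline. In the real system this is a LangGraph sub-graph with LLM-backed nodes; this plain-Python stand-in only shows the state-accumulation shape:

```python
# The six phases as a linear pipeline; each node is a placeholder for an
# LLM-backed LangGraph node in the real system.
PHASES = [
    "understand_requirements",
    "research_existing_solutions",
    "design_architecture",
    "implement_code",
    "write_tests",
    "prepare_deployment_plan",
]

def run_architect(task: str) -> dict:
    state = {"task": task, "artifacts": {}}
    for phase in PHASES:
        # each phase reads the accumulated state and appends its output
        state["artifacts"][phase] = f"{phase} complete for: {task}"
    return state
```

Because each phase only appends to shared state, a failed or interrupted phase can be retried from the last checkpoint without redoing earlier work.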
Voice Interface
Real-time voice is a first-class feature, not an add-on.
- STT: Deepgram streaming via WebSocket — near-zero latency transcription
- TTS: AWS Polly — converts agent responses to natural speech
- Protocol: WebSocket endpoint at /ws/voice handles bidirectional audio streaming
- Frontend: voice-panel.tsx React component handles mic input, playback, and visual feedback
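One way the bidirectional stream might be framed is JSON for control messages and binary for audio; this is an assumed convention for illustration, not the documented protocol:

```python
# Sketch of /ws/voice frame routing, assuming JSON-for-control /
# binary-for-audio framing (the actual protocol may differ).
import json

def route_frame(frame: "bytes | str") -> str:
    """Classify an incoming WebSocket frame from the browser."""
    if isinstance(frame, bytes):
        return "audio"            # raw audio chunk -> forward to Deepgram STT
    msg = json.loads(frame)
    if msg.get("type") == "start":
        return "open_stt_stream"  # begin a Deepgram streaming session
    if msg.get("type") == "stop":
        return "flush_and_tts"    # finalize transcript, synthesize via Polly
    return "ignore"
```

Keeping control and audio on one socket avoids a second connection handshake, which matters when the goal is conversational latency.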
The result: you can have a spoken conversation with the multi-agent system, with streaming responses read aloud as they arrive.
Persistence & State
Every conversation thread is persisted in PostgreSQL 16 with pgvector using LangGraph’s AsyncPostgresSaver checkpointing. This means:
- Conversations survive server restarts
- Long-running tasks (Deep Research, Software Architect) can span multiple sessions
- All agent decisions, tool calls, and handoffs are captured and replayable
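What checkpointing buys can be shown with a stdlib stand-in: every step is saved under a thread_id, so a fresh process resumes from the latest state. The real system uses LangGraph's AsyncPostgresSaver against PostgreSQL; this SQLite sketch only mirrors the thread_id → checkpoint-history shape.

```python
# Stdlib stand-in for checkpointing semantics; the real system uses
# LangGraph's AsyncPostgresSaver on PostgreSQL 16.
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE checkpoints (thread_id TEXT, step INTEGER, state TEXT)")

def save(thread_id: str, step: int, state: dict) -> None:
    db.execute("INSERT INTO checkpoints VALUES (?, ?, ?)",
               (thread_id, step, json.dumps(state)))

def resume(thread_id: str) -> dict:
    """Return the latest checkpointed state for a thread, or {} if none."""
    row = db.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ? "
        "ORDER BY step DESC LIMIT 1", (thread_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}

save("thread-42", 1, {"phase": "research"})
save("thread-42", 2, {"phase": "design"})
# after a restart, the supervisor resumes from step 2, not from scratch
```

This is exactly why long-running agents like Deep Research and Software Architect can span multiple sessions: the graph replays from the last saved step.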
API Surface (FastAPI)
GET /api/health — Health check
POST /api/chat — Synchronous chat
POST /api/chat/stream — SSE streaming (preferred)
GET /api/threads — List conversation threads
GET /api/stats — Token usage and cost metrics
PUT /api/ops/permission — Update Ops Agent permission level
GET /api/skills — List available agent skills
PUT /api/settings/prompt — Update system prompt at runtime
WS /ws/voice — WebSocket voice interface
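A client consuming POST /api/chat/stream mostly needs an SSE parser. This sketch assumes standard SSE framing (`data: ` lines, blank-line-delimited events); the payload shape shown is illustrative:

```python
# Minimal client-side SSE parsing, assuming standard "data: ..." framing;
# the token payload format is an assumption.
def parse_sse(raw: str):
    """Yield the data payload of each server-sent event."""
    for event in raw.split("\n\n"):
        for line in event.splitlines():
            if line.startswith("data: "):
                yield line[len("data: "):]

chunk = 'data: {"token": "Hel"}\n\ndata: {"token": "lo"}\n\n'
tokens = list(parse_sse(chunk))
```

Rendering each `data:` payload as it arrives is what gives the frontend its token-by-token streaming feel.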
Frontend
Next.js 15 with React 19 and Tailwind CSS 4. Key features:
- Real-time streaming — SSE responses rendered as they arrive
- Mermaid diagrams — Architecture diagrams rendered inline from agent output
- Slash commands — /research, /code, /architect to invoke specific agents directly
- AI skills learning — The UI surfaces which agent handled each part of a response
- Custom system prompts — Editable per session via the settings panel
- Export — Full conversation export to PDF or ZIP via html2pdf.js and jszip
Safety & Cost Controls
Running multiple agents with real tools requires hard limits:
AGENT_TIMEOUT_SECONDS=120 # No agent runs more than 2 minutes
AGENT_RECURSION_LIMIT=25 # Prevents infinite loops
MAX_REQUEST_TOKENS=50000 # Per-request cap
MAX_REQUEST_COST=0.50 # Hard cost ceiling per request
DAILY_USER_BUDGET=10.00 # Daily spending limit per user
OPS_PERMISSION=off # Ops agent disabled by default
AUTH_ENABLED=false # JWT auth toggleable
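A plausible enforcement point for these caps is a guard run before each LLM call; the function below is an assumed sketch of that check, reusing the limits from the config above:

```python
# Sketch of pre-call budget enforcement; the guard function and its wiring
# are assumptions, the limit values come from the config above.
MAX_REQUEST_TOKENS = 50_000
MAX_REQUEST_COST = 0.50    # USD, hard ceiling per request
DAILY_USER_BUDGET = 10.00  # USD, per user per day

def check_budget(tokens_used: int, request_cost: float, daily_spend: float) -> None:
    """Raise before the next LLM call if any hard limit is exceeded."""
    if tokens_used > MAX_REQUEST_TOKENS:
        raise RuntimeError("per-request token cap exceeded")
    if request_cost > MAX_REQUEST_COST:
        raise RuntimeError("per-request cost ceiling exceeded")
    if daily_spend > DAILY_USER_BUDGET:
        raise RuntimeError("daily user budget exhausted")

check_budget(tokens_used=12_000, request_cost=0.08, daily_spend=1.25)  # within limits
```

Raising before the call, rather than after, is the point: a runaway agent loop is stopped before it spends, not billed retroactively.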
Observability
Full LangSmith tracing enabled by default. Every agent decision, tool invocation, token count, and cost is traced. Combined with structured logging via structlog and rate limiting via slowapi, the system is production-observable from day one.
Deployment
One command brings up the full stack:
docker compose up -d
# Services: PostgreSQL 16 (pgvector) + API + Next.js frontend + optional Nginx
MCP integration via .mcp.json connects Claude Code directly to the live PostgreSQL instance and project filesystem during development — the platform builds itself using the same agent infrastructure.
Tech Stack
| Layer | Technology |
|---|---|
| AI Framework | LangGraph 1.1+, LangChain 0.3+ |
| LLM (default) | AWS Bedrock — Claude Sonnet 4 |
| LLM (alternatives) | OpenAI GPT-4o, Anthropic direct |
| Deep Research | deepagents package |
| Web Search | Tavily API |
| Voice STT | Deepgram (WebSocket) |
| Voice TTS | AWS Polly |
| API Server | FastAPI + Uvicorn |
| Database | PostgreSQL 16 + pgvector |
| RAG Embeddings | Amazon Titan / OpenAI (1024–1536 dim) |
| Frontend | Next.js 15, React 19, TypeScript |
| Styling | Tailwind CSS 4 |
| E2E Testing | Playwright BDD (Cucumber) |
| Backend Testing | pytest + httpx |
| Frontend Testing | Vitest + Testing Library |
| Containers | Docker Compose |
| Observability | LangSmith + structlog |
| MCP | server-postgres + server-filesystem |