Introduction

Building a demo agent takes a weekend. Shipping a multi-agent system that handles thousands of concurrent tasks, costs pennies per query, and scales without rearchitecting? That takes months of hard-won production experience.

This guide documents the architecture decisions, cost strategies, and infrastructure patterns I’ve used to build scalable multi-agent systems with LangGraph on AWS. It’s written from the perspective of a Technical Lead who’s responsible for both the technical design and the budget that funds it.

What you’ll learn:

  • The Brain/Worker separation pattern that achieves 5-10x cost reduction vs. single-model architectures
  • A 5-layer component architecture with clear responsibility boundaries
  • AWS infrastructure patterns (Lambda + ECS + Step Functions + Bedrock) with real pricing numbers
  • CI/CD pipelines specifically designed for agent systems
  • Production best practices for error handling, observability, and human-in-the-loop workflows

Who is this for? Technical Leads evaluating multi-agent architectures, AI engineers moving from prototype to production, and anyone who’s been asked “how much will this cost at scale?” by their CTO.


Table of Contents

  1. Why Multi-Agent? The Case for Separation of Concerns
  2. Architecture Principles
  3. The 5-Layer Component Architecture
  4. LangGraph: Why Graph-Based Orchestration Wins
  5. The Brain Agent: Claude Sonnet as Orchestrator
  6. Worker Agents: Cheap, Fast, Focused
  7. State Management and Checkpointing
  8. AWS Infrastructure Deep Dive
  9. Cost Modeling: From USD 13,500 to USD 2,790/month
  10. CI/CD Pipeline for Agent Systems
  11. Observability and Monitoring
  12. Security Architecture
  13. Human-in-the-Loop Patterns
  14. Error Handling and Resilience
  15. Scaling Patterns
  16. Production Checklist
  17. LangGraph vs. Alternatives: When to Choose What
  18. Resources and References

1. Why Multi-Agent? The Case for Separation of Concerns

The instinct is to build one powerful agent that does everything. This fails at scale for three reasons:

Cost explosion. Claude Sonnet costs USD 3 per 1M input tokens and USD 15 per 1M output tokens. If your agent processes 1M tasks/month and every task hits Sonnet, a single Sonnet step runs you roughly USD 13,500/month in model costs alone. Most of those tasks don't need Sonnet-level reasoning.

Blast radius. When one monolithic agent fails, everything fails. When a SQL-writing worker fails, only the SQL step fails. The orchestrator can retry, route to a fallback, or escalate to a human.

Extensibility. Adding a new capability (say, a chart generator) to a monolithic agent means rewriting and re-validating the entire prompt. In a multi-agent system, you add a new worker and register it with the orchestrator. Zero changes to existing agents.

The pattern that works in production is Brain/Worker separation:

  • Brain (Orchestrator): An expensive, high-reasoning model (Claude Sonnet) that plans, coordinates, and synthesizes. Invoked 1-2 times per task.
  • Workers: Cheap, fast models (Claude Haiku, DeepSeek) that execute specific subtasks. They don’t decide what to do; they do what they’re told.

This is the primary cost optimization lever. Get this right, and everything else falls into place.


2. Architecture Principles

Before diving into components, here are the non-negotiable principles that guide every decision:

| Principle | Implementation | Why It Matters |
|---|---|---|
| Brain/Worker separation | Sonnet handles reasoning; Haiku/DeepSeek handle execution | 5-10x cost reduction vs. all-Sonnet |
| Context injection over fine-tuning | Agent knowledge injected at runtime via system prompt and RAG | No retraining when schemas change |
| Human-in-the-loop on exceptions | Low-confidence results flagged, not silently returned | Analyst trust maintained as system scales |
| Pay-per-use infrastructure | Athena + Lambda + S3 with no always-on compute for queries | Cost scales linearly with usage |
| Typed contracts between agents | Structured JSON specs between brain and workers | Eliminates ambiguity, enables testing |
| Checkpoint everything | Every state transition persisted to durable storage | Resume from any failure point |

3. The 5-Layer Component Architecture

The system is organized into five layers, each with a defined responsibility boundary:

Layer 1 — Data Inputs Raw data sources, business rules, and schema definitions. Examples: S3 data lake, weighting rules, data dictionaries, configuration files.

Layer 2 — Orchestrator (Brain) Claude Sonnet loaded with full schema context and business rules. Receives natural-language questions, plans query strategies, issues typed instructions to workers, validates results, and synthesizes final answers.

Layer 3 — Worker Agents Specialized executors: SQL Writer, Data Processor, Formatter. Each receives structured input from the orchestrator and returns structured output. Uses Haiku or DeepSeek for cost efficiency.

Layer 4 — Data and State Layer Query execution (Athena), agent state persistence (DynamoDB), artifact storage (S3), and task queuing (SQS).

Layer 5 — Output Final deliverables: weighted insights, chart-ready JSON (Recharts/D3 compatible), markdown tables, or plain-English narrative summaries.

[Figure: 5-Layer Architecture Diagram]


4. LangGraph: Why Graph-Based Orchestration Wins

After evaluating CrewAI, AutoGen, Semantic Kernel, and LangGraph, here’s why LangGraph is the right choice for production multi-agent systems:

The Graph Model

LangGraph models agent workflows as directed graphs where:

  • Nodes are processing steps (LLM calls, tool executions, human approvals)
  • Edges define transitions (conditional routing, parallel fan-out, loops)
  • State flows through the graph, accumulating results at each step

This gives you something no other framework provides: you can see every possible execution path before the first line of code runs. For a Technical Lead responsible for production reliability, this is non-negotiable.
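Because the topology is declared up front, path enumeration needs no framework at all. A toy sketch (node names and edges are illustrative, not LangGraph API):

```python
def all_paths(edges: dict, start: str = "plan", end: str = "END") -> list:
    """Enumerate every execution path through a statically declared agent graph."""
    paths = []

    def walk(node, path):
        if node == end:
            paths.append(path)
            return
        for nxt in edges.get(node, []):
            if nxt not in path:  # skip back-edges so enumeration terminates
                walk(nxt, path + [nxt])

    walk(start, [start])
    return paths

# A small Brain/Worker graph with a retry edge back to the planner
EDGES = {
    "plan": ["sql_writer"],
    "sql_writer": ["processor", "plan"],
    "processor": ["formatter"],
    "formatter": ["synthesize"],
    "synthesize": ["END"],
}
```

Running `all_paths(EDGES)` lists the acyclic routes before any model is invoked, which is what makes graph-based designs reviewable.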

LangGraph 1.0 (October 2025)

LangGraph hit its first stable release in October 2025. Key facts:

  • No breaking changes from v0.6.6
  • LangGraph is now the low-level runtime; LangChain is the high-level API built on top
  • create_react_agent moved to langchain.agents as create_agent
  • 38M+ monthly PyPI downloads
  • Nearly 400 companies deployed agents via LangGraph Platform during beta

The Supervisor Pattern

The supervisor pattern is the recommended architecture for multi-agent orchestration. A central supervisor (Brain) coordinates specialized workers:

from langchain.agents import create_agent
from langgraph_supervisor import create_supervisor
from langchain_aws import ChatBedrock

# Brain model (expensive, high-reasoning)
brain = ChatBedrock(
    model_id="anthropic.claude-sonnet-4-5-20250514-v1:0",
    region_name="us-east-1"
)

# Worker model (cheap, fast execution)
worker_model = ChatBedrock(
    model_id="anthropic.claude-3-5-haiku-20241022-v1:0",
    region_name="us-east-1"
)

# Specialized worker agents
sql_agent = create_agent(
    model=worker_model,
    tools=[execute_athena_query, validate_sql],
    name="sql_writer",
    prompt="You are a SQL specialist. Translate query "
           "specs into Athena-compatible SQL. Never decide "
           "WHAT to query; only translate intent to SQL."
)

formatter_agent = create_agent(
    model=worker_model,
    tools=[format_json, format_markdown, format_narrative],
    name="formatter",
    prompt="You convert data into the requested output format."
)

# Supervisor orchestration
workflow = create_supervisor(
    agents=[sql_agent, formatter_agent],
    model=brain,
    prompt="""You are the orchestrator of a data analytics system.
    You have access to a data dictionary and business rules.
    Plan the query strategy, delegate to specialists,
    validate results, and synthesize final answers."""
)

app = workflow.compile()

Manual Handoff for More Control

For production systems where you need explicit control over agent transitions:
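In LangGraph, a node can return a `Command` object naming the next node explicitly. The mechanics can be shown framework-free; agent names follow the earlier example, and the toy SQL and state keys are illustrative:

```python
from typing import Optional, Tuple

# Each agent returns (new_state, name of next agent); None means finish.
def brain(state: dict) -> Tuple[dict, Optional[str]]:
    if "sql" not in state:
        return state, "sql_writer"   # explicit handoff to the SQL worker
    if "report" not in state:
        return state, "formatter"
    return state, None               # all work done

def sql_writer(state: dict) -> Tuple[dict, Optional[str]]:
    state["sql"] = "SELECT region, AVG(score) FROM responses WHERE dt = '2025-01-01' GROUP BY region"
    return state, "brain"            # workers always hand control back

def formatter(state: dict) -> Tuple[dict, Optional[str]]:
    state["report"] = "Ran: " + state["sql"]
    return state, "brain"

AGENTS = {"brain": brain, "sql_writer": sql_writer, "formatter": formatter}

def run(state: dict, start: str = "brain", max_steps: int = 10) -> dict:
    """Drive explicit transitions until an agent declines to hand off."""
    current: Optional[str] = start
    while current is not None:
        if max_steps <= 0:
            raise RuntimeError("step budget exhausted")  # bounded cycles
        state, current = AGENTS[current](state)
        max_steps -= 1
    return state
```

The point of manual handoff is that every transition is visible and testable: you can assert on the exact sequence of agents, not just the final answer.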

Hierarchical Teams

For complex systems, compose subgraphs with multi-level supervisors.

Each team is an independent LangGraph subgraph. The top-level supervisor routes tasks to teams; team supervisors route to workers. This scales to dozens of agents without the orchestrator becoming a bottleneck.


5. The Brain Agent: Claude Sonnet as Orchestrator

Claude Sonnet serves as the central intelligence. It is not a data executor. It is a planner and synthesizer.

What the Brain Does

  1. Receives analyst or client natural-language questions
  2. Interprets the business intent behind the question
  3. Consults the injected data dictionary for schema, column semantics, and business rules
  4. Plans the query strategy: which table, which filters, which dimensions
  5. Issues tool calls to worker agents with structured, typed instructions
  6. Receives results from workers, validates against expected schema
  7. Synthesizes a final answer: narrative text, structured data, or both

Context Injection Pattern

Instead of fine-tuning (expensive, brittle), inject knowledge at runtime:

BRAIN_SYSTEM_PROMPT = """
You are the orchestrator of a data analytics system.

## Data Dictionary
[DATA_DICTIONARY]

## Business Rules
[BUSINESS_RULES]

## Available Workers
- sql_writer: Translates query specs to Athena SQL
- weight_applier: Applies rim weighting to raw results
- formatter: Converts data to JSON/markdown/narrative

## Instructions
1. Always plan before executing
2. Issue structured query specs to sql_writer (never raw SQL)
3. ALL metric outputs MUST pass through weight_applier
4. Flag low-confidence results for human review
"""

This approach means:

  • No retraining when the schema changes (just update the dictionary)
  • Version-controlled knowledge (prompts live in git alongside code)
  • A/B testable (swap prompts via feature flags)
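Filling the placeholders at runtime is plain string substitution; a minimal sketch (how you load the dictionary and rules files is up to you):

```python
def build_brain_prompt(template: str, data_dictionary: str, business_rules: str) -> str:
    """Inject versioned knowledge into the system prompt at call time --
    knowledge lives in files under git, not in model weights."""
    return (template
            .replace("[DATA_DICTIONARY]", data_dictionary)
            .replace("[BUSINESS_RULES]", business_rules))
```

When the schema changes, you edit the dictionary file and redeploy; the prompt template itself stays stable and diff-reviewable.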

Cost Profile

Sonnet is invoked 1-2 times per query (planning + synthesis). The majority of token consumption is in worker agents, which use cheaper models. This is the primary cost optimization lever.

| Phase | Model | Tokens | Cost per Task |
|---|---|---|---|
| Planning | Sonnet (USD 3/1M in) | ~2,000 input, ~500 output | ~USD 0.0135 |
| Synthesis | Sonnet (USD 3/1M in) | ~1,500 input, ~300 output | ~USD 0.009 |
| Brain Total | | | ~USD 0.023 |

6. Worker Agents: Cheap, Fast, Focused

Workers are the cost-efficiency engine. They don’t decide what to do; they execute structured instructions from the Brain.

6.1 SQL Writer (Haiku)

Receives a structured query specification from the Brain and produces Athena-compatible SQL:

| Input | Output |
|---|---|
| Structured query spec (JSON) from Brain | Valid Athena SQL string |
| Target table, filters, GROUP BY dimensions | Result set (rows/columns) |
| Max scan size constraint | Error if query exceeds scan budget |

SQL_WRITER_PROMPT = """You are an Athena SQL specialist.

Input: A JSON query specification with:
- table: target table name
- filters: WHERE conditions
- dimensions: GROUP BY columns
- aggregations: SUM/AVG/COUNT functions
- scan_budget_gb: maximum data scan allowed

Output: A single Athena-compatible SQL query.

Rules:
- NEVER use SELECT * (always specify columns)
- ALWAYS include partition filters for cost control
- NEVER exceed the scan budget
- Return ONLY the SQL, no explanation"""
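The rules in the prompt can also be enforced in code before any SQL reaches Athena. A sketch of a cheap post-hoc check (field names follow the spec above; the checks are illustrative, not exhaustive):

```python
def validate_query(spec: dict, sql: str) -> list:
    """Guardrails on worker output: return a list of problems (empty = OK)."""
    problems = []
    lowered = sql.lower()
    if "select *" in lowered:
        problems.append("SELECT * is forbidden")
    if spec["table"] not in sql:
        problems.append("query does not reference the target table " + spec["table"])
    if "where" not in lowered:
        problems.append("missing partition/WHERE filter for cost control")
    return problems
```

Rejecting bad SQL here is far cheaper than discovering the problem as an Athena scan bill.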

6.2 Data Processor (Haiku/DeepSeek)

Receives raw query results and applies domain-specific transformations. In analytics systems, this might include statistical weighting, normalization, or aggregation:

PROCESSOR_PROMPT = """You are a data processing specialist.

Input: Raw query results and processing instructions from the orchestrator.
Output: Processed results with a confidence report.

Rules:
- Apply all specified transformations in order
- Report confidence level (high/medium/low)
- Flag any anomalies or missing data
- Return structured JSON output"""

6.3 Formatter (Haiku)

Converts processed data into the requested output format:

| Format | Use Case |
|---|---|
| Structured JSON | Dashboard/chart rendering (Recharts, D3 compatible) |
| Markdown table | Report embedding |
| Plain-English narrative | Client-facing summaries |
| CSV | Data export |

Worker Cost Profile

| Worker | Model | Avg Tokens | Cost per Task |
|---|---|---|---|
| SQL Writer | Haiku (USD 0.80/1M in) | ~800 in, ~200 out | ~USD 0.0014 |
| Processor | Haiku (USD 0.80/1M in) | ~1,200 in, ~300 out | ~USD 0.002 |
| Formatter | Haiku (USD 0.80/1M in) | ~600 in, ~400 out | ~USD 0.002 |
| Workers Total | | | ~USD 0.0054 |

Combined cost per task: ~USD 0.028 (Brain USD 0.023 + Workers USD 0.005). An all-Sonnet pipeline runs ~USD 0.045 per task (Scenario A in Section 9), and the monthly gap grows to ~5.8x once Sonnet planning and synthesis are invoked only when needed.
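The per-task arithmetic checks out directly from the per-token prices in the tables above:

```python
# USD per token (Sonnet and Haiku, input/output)
S_IN, S_OUT = 3.00e-6, 15.00e-6
H_IN, H_OUT = 0.80e-6, 4.00e-6

# Brain: planning (2,000 in / 500 out) + synthesis (1,500 in / 300 out)
brain = (2_000 * S_IN + 500 * S_OUT) + (1_500 * S_IN + 300 * S_OUT)

# Workers: SQL writer, processor, formatter on Haiku
workers = (800 * H_IN + 200 * H_OUT) \
        + (1_200 * H_IN + 300 * H_OUT) \
        + (600 * H_IN + 400 * H_OUT)

total = brain + workers  # ~USD 0.028 per task
```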


7. State Management and Checkpointing

In production, agents crash. Networks fail. Servers restart. Without durable state management, you lose everything.

How LangGraph Checkpointing Works

LangGraph follows a “read-execute-write” cycle:

  1. User sends message with thread_id
  2. LangGraph queries the database for the latest checkpoint
  3. State loaded into memory
  4. Agent processes through graph nodes
  5. After each super-step, new state serialized and saved as a checkpoint

Checkpoints are never overwritten. Full history is maintained, enabling “time travel” debugging. If your agent is 3 steps into a 10-step workflow when the server restarts, it picks up exactly where it left off.
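A framework-agnostic sketch of that read-execute-write loop, with an in-memory store standing in for the checkpoint database (step names and the crash hook are illustrative):

```python
import copy

class CheckpointStore:
    """Append-only checkpoint history per thread -- never overwritten."""
    def __init__(self):
        self._history = {}  # thread_id -> list of state snapshots

    def latest(self, thread_id):
        snaps = self._history.get(thread_id)
        return copy.deepcopy(snaps[-1]) if snaps else None

    def save(self, thread_id, state):
        self._history.setdefault(thread_id, []).append(copy.deepcopy(state))

def run_workflow(store, thread_id, steps, fail_after=None):
    """Read latest checkpoint, execute remaining steps, checkpoint each one."""
    state = store.latest(thread_id) or {"done": []}
    for step in steps:
        if step in state["done"]:
            continue  # completed in a previous run -- skip on resume
        if fail_after is not None and len(state["done"]) >= fail_after:
            raise RuntimeError("simulated crash")
        state["done"].append(step)
        store.save(thread_id, state)  # checkpoint after each super-step
    return state
```

Crash mid-workflow, call `run_workflow` again with the same `thread_id`, and execution resumes at the first incomplete step; the full history stays available for time-travel debugging.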

DynamoDB Checkpointer for AWS

from langgraph_checkpoint_aws import DynamoDBSaver

checkpointer = DynamoDBSaver(
    table_name="agent-checkpoints",
    ttl_seconds=86400  # Auto-expire after 24 hours
)

# Intelligent payload handling:
# Payloads under 350KB -> stored directly in DynamoDB
# Payloads 350KB+ -> uploaded to S3 with DynamoDB reference

Checkpointer Comparison

| Backend | Package | Best For |
|---|---|---|
| InMemorySaver | built-in | Prototyping only (never production) |
| SQLite | langgraph-checkpoint-sqlite | Local development |
| PostgreSQL | langgraph-checkpoint-postgres | Production (used by LangSmith) |
| DynamoDB | langgraph-checkpoint-aws | AWS-native, serverless |
| Azure Cosmos DB | langgraph-checkpoint-azure | Azure production |

Production Rules

  1. Never use InMemorySaver in production — ephemeral and local
  2. Don’t store large binaries in state — store files in S3, save only URLs in state (a new copy is saved at every step)
  3. Use TTL for automatic checkpoint expiration
  4. Use compression to reduce checkpoint size
  5. Thread-scoped checkpoints — isolate state per conversation/task

8. AWS Infrastructure Deep Dive

Compute: Lambda vs. ECS/Fargate

| Criterion | Lambda | ECS/Fargate |
|---|---|---|
| Max duration | 15 minutes | Unlimited |
| State management | External only (DynamoDB) | In-memory + external |
| Max memory | 10 GB | 120 GB |
| Cold start | 1-4s (200ms with SnapStart) | 30-90s |
| Cost model | Pay per invocation + duration | Pay per vCPU-hour |
| Scaling | Per-request, instant | Per-container, 30-90s |

Decision framework:

  • Lambda for stateless, short-lived tasks: preprocessing, single inference calls, webhook handlers, lightweight tool execution
  • ECS/Fargate for long-running agent sessions that exceed 15 minutes, need in-memory state, or require persistent connections

Orchestration: Step Functions

AWS Step Functions provides native orchestration for multi-agent workflows:

Key patterns:

  • Parallel fan-out for multi-agent work
  • Human-in-the-loop with task tokens (.waitForTaskToken)
  • Direct Bedrock integration (no Lambda needed)
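The three patterns can be condensed into one Amazon States Language sketch. State names, the `notify-reviewer` function, and the `$.sql_request` / `$.format_request` payload paths are illustrative; the branches assume the Bedrock request body is prepared upstream:

```json
{
  "StartAt": "FanOutWorkers",
  "States": {
    "FanOutWorkers": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "SqlWorker",
          "States": {
            "SqlWorker": {
              "Type": "Task",
              "Resource": "arn:aws:states:::bedrock:invokeModel",
              "Parameters": {
                "ModelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
                "Body.$": "$.sql_request"
              },
              "End": true
            }
          }
        },
        {
          "StartAt": "FormatterWorker",
          "States": {
            "FormatterWorker": {
              "Type": "Task",
              "Resource": "arn:aws:states:::bedrock:invokeModel",
              "Parameters": {
                "ModelId": "anthropic.claude-3-5-haiku-20241022-v1:0",
                "Body.$": "$.format_request"
              },
              "End": true
            }
          }
        }
      ],
      "Next": "HumanReview"
    },
    "HumanReview": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
      "Parameters": {
        "FunctionName": "notify-reviewer",
        "Payload": {
          "taskToken.$": "$$.Task.Token",
          "results.$": "$"
        }
      },
      "TimeoutSeconds": 86400,
      "End": true
    }
  }
}
```

The `bedrock:invokeModel` integration calls the model directly from the state machine (no Lambda in the path), and `waitForTaskToken` pauses with zero compute until `SendTaskSuccess` is called with the token.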

Data Layer: S3 + Athena

Athena pricing: USD 5.00 per TB scanned. Optimize ruthlessly:

| Strategy | Savings |
|---|---|
| Columnar format (Parquet/ORC) | 85-99% scan reduction |
| Partition pruning (date, region) | Proportional to filter selectivity |
| Compression (Snappy/ZSTD) | 3-10x size reduction |
| Column selection (never SELECT *) | Proportional to columns skipped |
| Scan budget guardrail | Hard cost ceiling per query |

Scan budget pattern:
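One application-level version of the guardrail checks each query's `Statistics.DataScannedInBytes` (returned by Athena's `GetQueryExecution`) against a per-query budget; the function below is a sketch with assumed limits. Athena workgroups can additionally enforce a hard server-side cap per query:

```python
TB = 1024 ** 4
ATHENA_PRICE_PER_TB = 5.00  # USD per TB scanned

def enforce_scan_budget(data_scanned_bytes: int, budget_gb: float) -> float:
    """Reject queries that blew the scan budget; return the query cost in USD."""
    scanned_gb = data_scanned_bytes / 1024 ** 3
    if scanned_gb > budget_gb:
        raise RuntimeError(
            "Scan budget exceeded: %.1f GB > %.1f GB" % (scanned_gb, budget_gb))
    return (data_scanned_bytes / TB) * ATHENA_PRICE_PER_TB
```

Surfacing the per-query cost back into agent state also gives the Brain a signal for choosing cheaper query plans.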

Bedrock: Model Pricing

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Use Case |
|---|---|---|---|
| Claude Sonnet 4.5 | USD 3.00 | USD 15.00 | Brain/planner |
| Claude 3.5 Haiku | USD 0.80 | USD 4.00 | Worker/executor |
| Claude 3.5 Haiku (Batch) | USD 0.50 | USD 2.00 | Async batch |
| Claude Opus 4.5 | USD 5.00 | USD 25.00 | Complex reasoning |

Prompt caching (critical for agents):

  • 5-minute cache: 1.25x write cost, 0.1x read cost (90% savings)
  • 1-hour cache: 2.0x write cost, 0.1x read cost

For agent systems with repeated system prompts and tool definitions, prompt caching can reduce costs by 85-90% on cached portions.
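Illustrative arithmetic for a reused system prompt, using the 5-minute cache multipliers above (the 2,000-token prompt and 100 calls per cache window are assumptions):

```python
IN_PRICE = 3.00 / 1_000_000        # Sonnet input, USD per token
CACHE_WRITE = 1.25 * IN_PRICE      # 5-minute cache write multiplier
CACHE_READ = 0.10 * IN_PRICE       # cached read multiplier

prompt_tokens = 2_000
calls = 100                        # calls landing within the cache window

uncached = calls * prompt_tokens * IN_PRICE
cached = (prompt_tokens * CACHE_WRITE                      # one cache write
          + (calls - 1) * prompt_tokens * CACHE_READ)      # 99 cached reads
savings = 1 - cached / uncached    # ~89% on the cached portion
```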

Cross-region inference: Routes requests across AWS regions for high availability with no additional charge. This is essentially free HA.


9. Cost Modeling: From USD 13,500 to USD 2,790/month

Here’s the real math for processing 1M agent tasks per month.

Scenario A: All-Sonnet (Naive Approach)

Every task uses Claude Sonnet for every step:

| Step | Tokens | Cost per Task | Monthly (1M tasks) |
|---|---|---|---|
| Planning | 2,000 in / 500 out | USD 0.0135 | USD 13,500 |
| SQL Writing | 1,000 in / 200 out | USD 0.006 | USD 6,000 |
| Processing | 1,200 in / 300 out | USD 0.0081 | USD 8,100 |
| Formatting | 600 in / 400 out | USD 0.008 | USD 8,000 |
| Synthesis | 1,500 in / 300 out | USD 0.009 | USD 9,000 |
| Total | | USD 0.0446 | USD 44,600/month |

Scenario B: Brain/Worker Split

Sonnet for planning + synthesis (2 calls), Haiku for everything else (3 calls):

| Step | Model | Cost per Task | Monthly (1M tasks) |
|---|---|---|---|
| Planning | Sonnet | USD 0.0135 | USD 1,350 (10% of tasks need replanning) |
| SQL Writing | Haiku | USD 0.0014 | USD 1,400 |
| Processing | Haiku | USD 0.002 | USD 2,000 |
| Formatting | Haiku | USD 0.002 | USD 2,000 |
| Synthesis | Sonnet | USD 0.009 | USD 900 (only when needed) |
| Total | | USD 0.028 | USD 7,650/month |

Scenario C: Brain/Worker + Prompt Caching + Batch

Add prompt caching (90% of system prompt cached) and batch API for non-real-time tasks:

| Optimization | Savings |
|---|---|
| Prompt caching (system prompts) | -40% on token costs |
| Batch API for 60% of tasks | -30% on those tasks |
| Token compression | -15% on context |
| Estimated monthly | ~USD 2,790 |

Infrastructure Costs

| Component | Monthly Cost |
|---|---|
| Lambda (orchestrator + workers) | ~USD 60 |
| ECS Fargate (baseline, 2 tasks) | ~USD 320 |
| Fargate Spot (burst capacity) | ~USD 96 |
| Step Functions (10-state, 100K/month) | ~USD 25 |
| SQS (5M messages) | ~USD 2 |
| DynamoDB (state storage) | ~USD 15 |
| Athena (50 GB scanned) | ~USD 0.25 |
| CloudWatch | ~USD 30 |
| Infrastructure Total | ~USD 550/month |

Grand Total

| Approach | Model Costs | Infra Costs | Total |
|---|---|---|---|
| All-Sonnet | USD 44,600 | USD 550 | USD 45,150 |
| Brain/Worker | USD 7,650 | USD 550 | USD 8,200 |
| Brain/Worker + Caching + Batch | USD 2,790 | USD 550 | USD 3,340 |

Total savings: 13.5x from the naive approach.


10. CI/CD Pipeline for Agent Systems

Agent CI/CD is different from traditional software CI/CD. You’re not just testing code correctness; you’re testing reasoning quality, cost efficiency, and output accuracy.

Pipeline Architecture

GitHub Actions Pipeline

name: Deploy Agent System
on:
  push:
    branches: [main]
    paths:
      - 'agents/**'
      - 'prompts/**'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run unit tests
        run: pytest tests/agents/
      - name: Run agent eval suite
        run: python -m agent_eval --config eval_config.yaml

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Log in to ECR
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build agent container
        run: docker build -t agent-worker .

  deploy-canary:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - name: Deploy canary (10% traffic)
        uses: aws-actions/amazon-ecs-deploy-task-definition@v1
      - name: Monitor canary metrics (15 min)
        run: python scripts/monitor_canary.py --duration 900
      - name: Promote or rollback
        run: python scripts/promote_or_rollback.py

Agent-Specific Evaluation

Traditional testing checks “does the code work?” Agent testing checks “does the agent think well?”

LangSmith Evaluation Types:

  • Trajectory evaluation: Did the agent take the right steps? (e.g., did it call sql_writer before formatter?)
  • Output accuracy: Is the final answer correct against ground truth?
  • Cost monitoring: Did it stay within the token budget?
  • Latency check: Is P95 under the SLA threshold?
  • Multi-turn evaluation: Did the agent accomplish the goal across the full interaction?

Rollback triggers:

  • AgentTaskCompletionRate drops below 95%
  • CostPerTask exceeds 2x baseline
  • P95Latency exceeds SLA threshold
  • Eval accuracy drops below minimum threshold

Prompt Versioning

Treat prompts as code. Version-control them alongside your application:
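A minimal registry-plus-flag sketch (the directory layout, flag name, and prompt contents are illustrative):

```python
# In the repo, prompts live next to the code, e.g.:
#   prompts/brain/v1.txt
#   prompts/brain/v2.txt
# Here an in-memory registry stands in for those files.
PROMPTS = {
    ("brain", "v1"): "You are the orchestrator of a data analytics system.",
    ("brain", "v2"): "You are the orchestrator. Always plan before executing.",
}

def load_prompt(agent: str, flags: dict) -> str:
    """Select a prompt version via feature flag, defaulting to v1."""
    version = flags.get(agent + "_prompt_version", "v1")
    return PROMPTS[(agent, version)]
```

Because the prompt text is in git, every behavior change shows up in code review, and a bad prompt rolls back exactly like a bad deploy.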

Use feature flags to A/B test prompt versions in production.


11. Observability and Monitoring

Key Metrics to Track
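The metrics worth alerting on include cost per task, per-step latency, task completion rate, and token usage. One lightweight way to ship them from Lambda or ECS is CloudWatch Embedded Metric Format: structured JSON printed to stdout that CloudWatch turns into real metrics. A sketch (namespace, dimension, and metric names are assumptions):

```python
import json
import time

def emit_task_metrics(cost_usd: float, latency_ms: float, agent: str) -> str:
    """Print an EMF record so cost-per-task and latency become CloudWatch metrics."""
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "AgentSystem",
                "Dimensions": [["Agent"]],
                "Metrics": [
                    {"Name": "CostPerTask", "Unit": "None"},
                    {"Name": "TaskLatency", "Unit": "Milliseconds"},
                ],
            }],
        },
        "Agent": agent,
        "CostPerTask": cost_usd,
        "TaskLatency": latency_ms,
    }
    line = json.dumps(record)
    print(line)  # CloudWatch Logs ingests this line and extracts the metrics
    return line
```

No metrics API calls, no agent-side batching: the log pipeline you already have carries the data.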

Observability Stack

| Layer | Tool | Purpose |
|---|---|---|
| Agent tracing | LangSmith | Token usage, latency, cost per node, trajectory |
| Infrastructure | CloudWatch + ADOT | Container metrics, Lambda duration, error rates |
| Distributed tracing | OpenTelemetry | End-to-end request flow across services |
| Dashboards | CloudWatch GenAI | Pre-built model invocation dashboards |
| Alerts | CloudWatch Alarms | PagerDuty/Slack on threshold breaches |

LangSmith Integration

LangSmith provides agent-specific observability that generic APM tools can’t match:

  • Token usage per node — identify which agent step is most expensive
  • Trajectory visualization — see the exact path through the graph
  • Cost attribution — break down spend by agent, tool, and model
  • Eval drift detection — alert when production quality degrades
  • Insights Agent — AI-powered analysis of your traces to discover patterns and failure modes

12. Security Architecture

IAM Roles Per Agent (Least Privilege)

Every agent type gets its own execution role. The Brain agent should NOT have the same permissions as a SQL Writer worker:
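A sketch of the worker policy described in the note below; the bucket name, table name, and account ID are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "InvokeHaikuOnly",
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0"
    },
    {
      "Sid": "ReadOnlyDataLake",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::analytics-data-lake",
        "arn:aws:s3:::analytics-data-lake/*"
      ]
    },
    {
      "Sid": "CheckpointTableOnly",
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:us-east-1:ACCOUNT_ID:table/agent-checkpoints"
    }
  ]
}
```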

Note: The worker role only has access to Haiku (not Sonnet), read-only S3 access, and limited DynamoDB operations.

Additional Security Measures

  • Secrets Manager for all API keys and credentials with automatic rotation
  • VPC Endpoints for Bedrock, S3, DynamoDB, SQS to keep traffic off public internet
  • PII Detection Middleware — LangGraph 1.0 includes built-in middleware for scanning inputs/outputs
  • Encryption at rest — SSE-KMS for S3 and DynamoDB, TLS 1.2+ for all API calls
  • Audit logging — CloudTrail for all Bedrock invocations and state changes

13. Human-in-the-Loop Patterns

Not all decisions should be automated. Low-confidence results, high-stakes actions, and edge cases need human oversight.

LangGraph Interrupt Pattern

from langgraph.types import interrupt

def sensitive_action_node(state: State):
    """Pause for human approval before executing."""
    result = analyze_risk(state)

    if result.confidence < 0.8:
        # Pause execution -- state saved to checkpoint store
        # No Python process stays alive during the wait
        human_decision = interrupt(dict(
            question="Low confidence result detected. Approve?",
            data=result.summary,
            confidence=result.confidence
        ))

        if human_decision == "reject":
            return dict(status="rejected_by_human")

    return execute_action(state)

How It Works in Production

  1. Agent reaches the interrupt point
  2. State is saved to DynamoDB checkpoint
  3. No compute is running — the system is completely idle
  4. Notification sent to human (Slack, email, dashboard)
  5. Human reviews and sends “resume” with their decision
  6. Agent loads state from checkpoint and continues

With Step Functions, this maps to the .waitForTaskToken pattern with configurable timeout (up to 1 year).


14. Error Handling and Resilience

Retry Configuration

from langgraph.types import RetryPolicy

# Different strategies per node type. APIError, RateLimitError, and
# ThrottleError below stand in for your provider SDK's exception types --
# import the real classes (e.g. from botocore or anthropic) before use.
llm_retry = RetryPolicy(
    max_attempts=3,
    backoff_factor=2.0,
    retry_on=(APIError, TimeoutError, RateLimitError)
)

db_retry = RetryPolicy(
    max_attempts=5,
    backoff_factor=1.5,
    retry_on=(ConnectionError, ThrottleError)
)
)

Fallback Strategy
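When retries are exhausted, fall through an ordered chain of alternatives: the primary worker model, then perhaps a second region, then the Brain model as a last resort. A framework-agnostic sketch (the caller list and ordering are illustrative):

```python
from typing import Callable, Sequence

def with_fallback(callers: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each model caller in order and return the first success."""
    errors = []
    for call in callers:
        try:
            return call(prompt)
        except Exception as exc:  # in production, narrow this to retryable errors
            errors.append(exc)
    raise RuntimeError("all fallbacks exhausted: %r" % errors)
```

Order the chain cheapest-first so the expensive model only pays for the failures, not the baseline traffic.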

Production Resilience Checklist

  • Timeout for every async operation (no unbounded waits)
  • Bounded cycles with hard stops (max_steps counter)
  • State-driven error tracking (errors are typed objects in state)
  • Graceful degradation (partial results better than no results)
  • Dead letter queues for permanently failed tasks
  • Circuit breaker for external service dependencies
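The last item can be sketched as a minimal counter-based circuit breaker (the threshold and reset window are illustrative defaults):

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; probe again after `reset_s`."""
    def __init__(self, threshold: int = 5, reset_s: float = 30.0):
        self.threshold, self.reset_s = threshold, reset_s
        self.failures, self.opened_at = 0, None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_s:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one probe through
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit
        return result
```

Wrapping Bedrock or Athena calls this way stops a degraded dependency from eating your whole retry budget.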

15. Scaling Patterns

Queue-Based Scaling (SQS + ECS)

The recommended pattern for agent workloads:
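The scaling signal is backlog per task: visible SQS messages divided by running ECS tasks. A sketch of the target calculation, using the min/max from the table below (the target of 10 messages per task is an assumption):

```python
import math

def desired_worker_count(visible_messages: int, backlog_per_task: int = 10,
                         min_tasks: int = 2, max_tasks: int = 50) -> int:
    """Size the ECS service so each task owns ~backlog_per_task messages."""
    wanted = math.ceil(visible_messages / backlog_per_task)
    return max(min_tasks, min(max_tasks, wanted))
```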

Target tracking auto-scaling maintains a target backlog-per-task ratio (visible SQS messages divided by running workers) by adding or removing ECS tasks.

Scaling Strategy by Component

| Component | Scaling Signal | Min | Max | Strategy |
|---|---|---|---|---|
| Brain (Lambda) | Invocations | 0 | 1000 | Reserved concurrency |
| Workers (ECS) | SQS backlog per task | 2 | 50 | Target tracking |
| Workers (Spot) | Burst demand | 0 | 20 | Weight-based with Fargate |
| Data queries | Athena workgroup | - | - | Scan budget guardrail |

Predictive Scaling

For workloads with recurring patterns (business-hours traffic), AWS Predictive Scaling uses ML to forecast demand and pre-provisions capacity before the spike, eliminating the 2-3 minute reactive scaling lag.


16. Production Checklist

Before going live, verify every item:

Architecture

  • Brain/Worker model split implemented and tested
  • Context injection working (data dictionary, business rules)
  • Typed contracts between all agents (JSON schemas)
  • 5-layer separation enforced

State and Reliability

  • DynamoDB checkpointer configured with TTL
  • Checkpoint size monitored (state bloat alerts)
  • All external calls have timeouts
  • Retry policies configured per node type
  • Dead letter queues for failed tasks
  • Graceful shutdown handling (SIGTERM)

Cost Controls

  • Token budget limits via middleware
  • Scan budget guardrails on Athena workgroups
  • Cost-per-task monitoring with alerts
  • Prompt caching enabled for repeated system prompts
  • Batch API for non-real-time workloads

Security

  • Least-privilege IAM roles per agent type
  • Secrets Manager for all credentials
  • VPC endpoints for AWS services
  • PII detection middleware enabled
  • Audit logging via CloudTrail

Observability

  • LangSmith tracing with cost attribution
  • CloudWatch dashboards for infrastructure
  • OpenTelemetry for distributed tracing
  • Alert thresholds for latency, cost, error rate
  • Eval drift monitoring in production

CI/CD

  • Agent evaluations as deployment gates
  • Canary deployment with automatic rollback
  • Prompt versioning in git
  • Shadow mode for new agent versions

17. LangGraph vs. Alternatives: When to Choose What

| Framework | Best For | Not Great For |
|---|---|---|
| LangGraph | Production systems needing fine-grained control, conditional logic, durable execution, regulated industries | Quick prototypes, simple workflows |
| CrewAI | Fast prototyping, role-based collaboration, demos | Complex conditional logic, production durability |
| AutoGen | Research, flexible multi-agent conversations | Production reliability |
| Semantic Kernel | .NET enterprise, Azure-native applications | Python-first teams |
| AWS Strands | AWS-native, model-driven (LLM plans autonomously) | When you need explicit execution paths |
| AWS Bedrock Agents | No-code/low-code, managed service | Custom orchestration logic |

Migration path: CrewAI to LangGraph is the most common migration as teams hit CrewAI’s control flow ceiling. Start with CrewAI for validation, move to LangGraph for production.

Industry trend (2026): Convergence toward graph-based orchestration across all frameworks. LangGraph pioneered it; others are following.


18. Resources and References

Official Documentation

Case Studies

Architecture References

Cost and Pricing

Production Wisdom
