Released on February 5, 2026, Claude Opus 4.6 came with a feature that felt more like a research preview than a product announcement: Agent Teams. Multiple Claude Code instances working in parallel, coordinating directly with each other, each owning a piece of a larger task.
I’ve been building with it for six weeks. Here’s what I’ve learned.
What Agent Teams Actually Is
The simplest mental model: Agent Teams lets you spin up a Claude Code session as an orchestrator, which can spawn sub-agents (other Claude Code instances) that work on parallel subtasks. Each sub-agent runs in its own context window. They can message each other directly — not just report back to the orchestrator.
This is different from how most multi-agent frameworks work today. In a typical setup:
- Agent A calls Agent B (or delegates to it)
- Agent B executes, returns result
- Agent A processes and decides next step
Agent Teams allows peer-to-peer coordination:
- Orchestrator spawns Agent A and Agent B
- Agent A discovers it needs something Agent B is working on
- Agent A messages Agent B directly
- Both continue, orchestrator synthesizes
For software development specifically, this maps well onto how engineering teams actually work. Not strictly hierarchical, with a tech lead delegating every decision — but collaborative, with engineers coordinating laterally on shared concerns.
The Adaptive Thinking Architecture
Alongside Agent Teams, Opus 4.6 introduced Adaptive Thinking — and this is the change that affects API integration most directly.
The old model: you set budget_tokens to control how much reasoning the model does. Simple, but blunt. You either paid for maximum reasoning on every call or accepted degraded performance on complex tasks.
The new model: you specify an effort level (low, medium, high, max) and let the model decide how much reasoning the problem actually requires. Adaptive thinking automatically enables interleaved thinking — the model can reason between steps, not just at the start of a response.
```python
import anthropic

client = anthropic.Anthropic()

# Old approach — deprecated on Opus 4.6
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # ⚠️ Deprecated
    },
    messages=[{"role": "user", "content": "Review this PR diff..."}]
)

# New approach — use this
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=8000,
    thinking={
        "type": "adaptive",
        "effort": "high"  # low | medium | high (default) | max
    },
    messages=[{"role": "user", "content": "Review this PR diff..."}]
)
```
The other breaking change: prefilling assistant messages now returns a 400 error. If you’ve been using the assistant prefill pattern for steering outputs, you need to move this to system prompts before deploying Opus 4.6.
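The migration is mechanical. A sketch of the before/after — the prompt text here is illustrative, not a real endpoint call (the `client.messages.create(**request)` call site is unchanged):

```python
# Old pattern (Opus 4.5 and earlier): prefill the assistant turn to
# steer the output shape. On Opus 4.6 this request returns a 400 error.
old_messages = [
    {"role": "user", "content": "List the risks in this diff..."},
    {"role": "assistant", "content": "{"},  # prefill — now rejected
]

# New pattern: move the steering into the system prompt and keep the
# conversation ending on a user turn.
request = {
    "model": "claude-opus-4-6",
    "max_tokens": 1000,
    "system": "Respond with a single JSON object and nothing else.",
    "messages": [{"role": "user", "content": "List the risks in this diff..."}],
}
# client.messages.create(**request)  # call site stays the same
```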
Context Compaction: The Enabler for Long Agent Sessions
One of the quiet but important features in Opus 4.6 is Context Compaction (beta). In long-running agentic workflows, context windows fill up. The traditional behavior: you hit the limit, the session fails, you lose state.
Context Compaction handles this by summarizing older context as the session approaches token limits, preserving the essential state while discarding the verbatim history. For Agent Teams sessions that might run for hours on a large codebase migration, this is the difference between a session that completes and one that fails at the 3-hour mark.
The 1M token context window (beta) helps too, but Context Compaction is what makes 12+ hour agent sessions genuinely viable.
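Compaction semantics aside, the core idea is easy to approximate client-side, which is useful as a fallback while the feature is in beta. A minimal sketch — the thresholds and the `summarize` callable (e.g. a cheap model call) are placeholders, not part of the API:

```python
def compact_history(messages, summarize, max_messages=40, keep_recent=10):
    """Collapse older turns into one summary message once the history
    grows past max_messages; keep the most recent turns verbatim.

    `summarize` is a placeholder callable that turns a list of
    messages into a short text summary (e.g. via a cheap model call).
    """
    if len(messages) <= max_messages:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = summarize(older)
    return [
        {"role": "user", "content": f"[Summary of earlier session]\n{summary}"}
    ] + recent
```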
Real Performance: Terminal-Bench 2.0
The benchmark that matters most for coding agents is Terminal-Bench 2.0, which evaluates performance on agentic coding tasks in a real terminal environment — not isolated coding puzzles, but end-to-end development workflows.
Opus 4.6 scored 65.4% on Terminal-Bench 2.0 — the highest ever recorded. For comparison, the previous best was in the low-50s.
The practical implication: Opus 4.6 can handle complete development workflows, not just code generation snippets. Multi-file changes, test running, error fixing, iterating based on output — the kind of task sequence a developer actually does.
Building With Agent Teams: A Real Example
Here’s a pattern I’ve used successfully for parallel code review across a large PR:
```python
import anthropic

client = anthropic.Anthropic()

def spawn_review_agent(file_path: str, diff_content: str, concern: str) -> str:
    """Run a focused review agent on a specific concern."""
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=4000,
        thinking={"type": "adaptive", "effort": "high"},
        system=f"""You are a specialized code reviewer focused on: {concern}.
Review only what's relevant to your focus area. Be specific and actionable.""",
        messages=[
            {
                "role": "user",
                "content": f"Review this change in {file_path}:\n\n{diff_content}"
            }
        ]
    )
    return response.content[-1].text

def parallel_pr_review(pr_diff: str) -> dict:
    """Orchestrate parallel review across multiple concerns."""
    concerns = [
        ("security", "Security vulnerabilities, injection risks, auth issues"),
        ("performance", "Performance bottlenecks, N+1 queries, memory leaks"),
        ("correctness", "Logic errors, edge cases, error handling gaps"),
        ("maintainability", "Code structure, naming, testability")
    ]

    # In a real Agent Teams setup, these run in parallel Claude Code
    # sessions; here the orchestration pattern is shown sequentially.
    results = {}
    for concern_id, concern_desc in concerns:
        results[concern_id] = spawn_review_agent(
            "pr_diff.txt",
            pr_diff,
            concern_desc
        )
    return results

# Orchestrator synthesizes results
def synthesize_reviews(reviews: dict) -> str:
    synthesis_prompt = "\n\n".join([
        f"## {concern.title()} Review\n{review}"
        for concern, review in reviews.items()
    ])
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=2000,
        thinking={"type": "adaptive", "effort": "medium"},
        messages=[
            {
                "role": "user",
                "content": f"Synthesize these parallel code reviews into a prioritized action list:\n\n{synthesis_prompt}"
            }
        ]
    )
    return response.content[-1].text
```
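The orchestration loop above calls each reviewer one at a time. Outside a real Agent Teams session, a thread pool is a simple way to recover the parallelism, since each call is I/O-bound. A sketch — `review_fn` stands in for `spawn_review_agent` so the pattern is testable in isolation:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_pr_review_threaded(pr_diff: str, concerns, review_fn) -> dict:
    """Run one reviewer per concern concurrently.

    `review_fn(file_path, diff, concern_desc)` stands in for
    spawn_review_agent above; threads are appropriate here because
    each call spends its time waiting on an API response.
    """
    with ThreadPoolExecutor(max_workers=len(concerns)) as pool:
        futures = {
            concern_id: pool.submit(review_fn, "pr_diff.txt", pr_diff, desc)
            for concern_id, desc in concerns
        }
        return {cid: f.result() for cid, f in futures.items()}
```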
The key architectural insight: each sub-agent runs with a focused context — it only knows about the files and concerns relevant to its slice of work. This produces better results than feeding everything to one agent because the reasoning isn’t diluted across unrelated concerns.
The Codebase Migration Case
The use case Anthropic highlights most prominently for Opus 4.6 is large-scale codebase migration, and the performance data supports this framing.
A multi-million-line migration that would typically take weeks of developer time can be orchestrated as an Agent Teams session where:
- An orchestrator agent maps the codebase and creates a migration plan
- Multiple sub-agents handle different modules in parallel
- Agents flag dependencies to the orchestrator when they encounter cross-module concerns
- The orchestrator coordinates merging work and resolving conflicts
The reported result: half the time compared to sequential single-agent approaches.
I’ve tested this at smaller scale — a 200K-line .NET codebase migration from .NET 6 to .NET 9. With a single agent: 8 hours. With Agent Teams (4 parallel sub-agents on different namespaces): 2.5 hours. The time savings are real, but the coordination overhead matters — the orchestrator needs enough context to resolve conflicts intelligently.
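The orchestrator's merge-and-resolve loop can be sketched as a skeleton. Everything here is a placeholder for an agent call — `migrate_module` and `resolve_conflict` are hypothetical callables, and real sub-agents would run the module work in parallel rather than in this sequential loop:

```python
def run_migration(modules, migrate_module, resolve_conflict):
    """Skeleton of the orchestrator loop described above.

    migrate_module(name) -> (patched_files: dict, flagged_deps: list)
    resolve_conflict(path, versions) -> merged file content
    Both callables are placeholders for sub-agent calls.
    """
    merged = {}
    deferred = []
    for name in modules:  # sub-agents would process these in parallel
        patches, flags = migrate_module(name)
        deferred.extend(flags)  # cross-module concerns go back to the orchestrator
        for path, content in patches.items():
            if path in merged and merged[path] != content:
                # Two modules touched the same file differently: resolve.
                merged[path] = resolve_conflict(path, [merged[path], content])
            else:
                merged[path] = content
    return merged, deferred
```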
What Actually Changed for My Workflow
Six weeks in, the changes I actually care about:
Agent Teams is most valuable for tasks with natural parallelism. Code review, parallel module development, multi-format content generation. Less valuable for tasks where everything depends on everything else — sequential reasoning is still the right approach there.
Adaptive thinking is better than budget_tokens. The old approach required tuning budget_tokens per task type, which was trial-and-error. The effort levels map more intuitively to actual task complexity — I use high for code generation, medium for review, low for formatting/extraction tasks.
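That mapping is easy to encode as a small routing table — the task-type names are my own convention, not API values:

```python
# Task-type names are this article's convention; only the
# "type"/"effort" fields are actual API parameters.
EFFORT_BY_TASK = {
    "code_generation": "high",
    "code_review": "medium",
    "formatting": "low",
    "extraction": "low",
}

def thinking_config(task_type: str) -> dict:
    """Build the thinking parameter for a given task type.

    Unknown task types fall back to "high", the model's default.
    """
    return {"type": "adaptive", "effort": EFFORT_BY_TASK.get(task_type, "high")}
```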
The 1M context window changes planning. Feeding an entire codebase into context (for large-scale analysis or migration planning) is now practical, not theoretical. This creates a new category of task that wasn’t feasible before.
Pricing stayed the same ($5/$25 per million tokens). For a capability upgrade this significant, keeping the price flat is notable. It changes the ROI calculation on tasks that were previously at the margin.
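At those rates, back-of-envelope costing is one line (the token counts in the example are illustrative):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate: float = 5.0, out_rate: float = 25.0) -> float:
    """USD cost at per-million-token input/output rates."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# e.g. a review pass with 120K tokens in, 8K out:
# estimate_cost(120_000, 8_000) -> 0.80 USD
```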
Migration Checklist
If you’re on Claude Opus 4.5 or earlier:
□ Replace thinking.budget_tokens → thinking.type: "adaptive", effort: "high"
□ Remove interleaved-thinking-2025-05-14 beta header (now ignored)
□ Move assistant prefill patterns to system prompts (prefill → 400 error)
□ Test context compaction behavior in long-running sessions
□ Update model string to "claude-opus-4-6"
The migration is low-risk — most changes are additive or provide sensible defaults. The prefill change is the only one likely to cause immediate failures, so test that first.
Looking Forward
The trajectory is clear: multi-agent coordination is becoming a first-class concern in how AI-assisted development tools are designed. Agent Teams in Opus 4.6, GitHub Copilot Workspace, Windsurf’s parallel agent sessions — all of these shipped within the same quarter.
The developer’s job is shifting from writing code to orchestrating agents that write code, with the developer focused on architecture, verification, and integration. That’s a different skill set than optimizing LLM prompts. The teams that figure out orchestration patterns — how to split work, how to verify agent output, how to handle agent coordination failures — will have a structural advantage.
That’s the bet worth making today.
Claude Opus 4.6 is available via the Anthropic API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Azure AI Foundry. Agent Teams is in research preview.