Claude SubAgents

Claude SubAgents are isolated Claude instances spawned by a parent (orchestrator) agent to perform bounded, well-defined tasks. They are the primary architectural mechanism for solving context window exhaustion and token cost explosion in production-scale agentic workflows within Claude Code.

The Core Problem They Solve

A single Claude Code session accumulates context linearly: system prompt, CLAUDE.md, memory files, tool outputs, and conversation history all compete for the same finite window (typically 200k tokens, up to 1M+ on higher tiers). As tasks grow in scope:

  • Context Blindness occurs when early instructions or file content fall outside the active window.
  • Auto-Compaction creates lossy summaries that can drop critical technical constraints.
  • Token Costs compound non-linearly because every round-trip carries the full accumulated context.

SubAgents break this pathology by enforcing isolated context boundaries at the agent level.

Architecture

┌──────────────────────────────────────────────┐
│          Orchestrator (Parent Agent)          │
│  Full session context: CLAUDE.md + history    │
│                                              │
│  Dispatches tasks ──► SubAgent A (task 1)    │
│                  ──► SubAgent B (task 2)    │
│                  ──► SubAgent C (task 3)    │
│                                              │
│  ◄── Receives concise summaries only          │
└──────────────────────────────────────────────┘

Key Properties

  1. Isolated Context Window: Each subagent starts fresh — it receives only a focused prompt and the specific files/tools required for its task, not the parent’s accumulated history.
  2. Bounded Scope: One subagent, one discrete objective.
  3. Summarized Return: The subagent returns a concise result to the parent, not a raw context dump, so the parent window does not inherit the subagent’s token usage.
  4. Parallelism: Multiple subagents can execute concurrently on independent tasks.

Dispatch Patterns

1. Task Decomposition

The orchestrator breaks a large feature into atomic sub-tasks and dispatches one subagent per task. Each returns a status; the orchestrator synthesizes the whole.

2. Parallel File Analysis

When multiple large files must be analyzed simultaneously, one subagent per file is spawned. Only summaries flow back to the parent, preventing bulk file content from bloating the parent context.

3. Sandboxed Experimentation

For exploratory changes (“try three approaches”), the orchestrator spawns parallel subagents and promotes the best result — without polluting the main session with failed attempts.

4. Specialist Delegation

The orchestrator delegates to role-specialized subagents:

  • Security Reviewer — loaded with security guidelines only
  • Test Generator — loaded with test framework docs and target module
  • Documentation Writer — loaded with style guide and public API

Token Cost Model

Session TypePer-Call Cost Formula
MonolithicSystem + Memory + Full History + New Input + Output
With SubAgentsMinimal System + Task Prompt + Relevant Files + Output (per subagent)

At scale, subagent architecture can reduce total session token cost by 60–80% for complex multi-file projects, because the parent only pays for summary returns rather than compounding full-context round-trips.

Integration Points

  • Skills: A Skill can internally dispatch subagents for heavy sub-tasks while keeping the Skill’s own context lean.
  • Context Window Diagnostics: Use /context to monitor parent context saturation — that is the trigger for subagent delegation.
  • Plan Mode: Plan Mode can draft a decomposition strategy before the orchestrator dispatches subagents in Act Mode.
  • Spec-Driven Development: Each task in a spec’s task list maps naturally to a subagent invocation.

When NOT to Use SubAgents

  • Simple, single-file tasks: Spawning a subagent adds latency overhead. Use direct tool calls instead.
  • Tightly-coupled atomic edits: If changes must be transactionally consistent across many files, a single agent with careful compaction may be safer.
  • Strict sequential dependencies: SubAgent results arrive asynchronously; if step N+1 depends critically on step N’s exact output, orchestrate sequentially rather than in parallel.

Design Philosophy

SubAgents apply microservices architecture principles to LLM cognition: small, well-defined interfaces between cognitive units prevent the “monolith” failure mode of a single overloaded context window. Each agent does one thing, knows only what it needs to know, and returns a minimal signal — exactly the same contract as a well-designed service boundary.

This mirrors the Specialized Agent Roles pattern (Architect Agent, Implementation Agent, QA Agent) at the microscale of individual Claude Code sessions.

Custom Subagent Configuration

Custom subagents are defined as Markdown files with YAML frontmatter stored in .claude/agents/ (project-scoped) or ~/.claude/agents/ (user-scoped). The file body becomes the system prompt.

---
name: code-reviewer
description: Reviews code quality. Use proactively after code changes.
tools: Read, Grep, Glob, Bash
model: sonnet
memory: project
---
 
You are a senior code reviewer...

Key Frontmatter Fields

FieldPurpose
nameUnique identifier; also the @-mention handle
descriptionRouting signal — Claude reads this to decide when to delegate
toolsAllowlist; inherits all tools if omitted
disallowedToolsDenylist; applied before tools
modelhaiku / sonnet / opus / full model ID / inherit
permissionModedefault, acceptEdits, auto, dontAsk, bypassPermissions, plan
memoryuser, project, or local — enables cross-session persistent knowledge
mcpServersScope MCP servers exclusively to this agent
hooksPreToolUse / PostToolUse / Stop lifecycle hooks
isolationworktree for git-isolated execution
backgroundtrue to always run concurrently

Scope Priority (highest to lowest)

  1. Managed settings (org-wide)
  2. --agents CLI flag (current session)
  3. .claude/agents/ (project-level)
  4. ~/.claude/agents/ (user-level)

Built-in Subagents

Claude Code ships three built-in workers:

  • Explore — read-only, Haiku-powered codebase search
  • Plan — read-only context-gathering during plan mode (cannot spawn sub-subagents)
  • general-purpose — full-tool multi-step operations

Invocation Patterns

  1. Natural language — name the agent; Claude decides whether to delegate
  2. @-mention@code-reviewer look at auth changes — guarantees delegation
  3. claude --agent <name> — entire session runs under that agent’s system prompt and tool restrictions

Fork Mode (Experimental)

Enable CLAUDE_CODE_FORK_SUBAGENT=1 to allow subagents that inherit the full parent conversation context rather than starting fresh. Forks share the parent’s prompt cache (cheaper) but sacrifice context isolation. Use when the subagent would need too much background to reconstruct from scratch.

Cost Routing via Model Selection

Routing tasks to cheaper models is the primary cost lever:

  • Exploration/search → Haiku
  • Balanced analysis → Sonnet
  • Complex reasoning → Opus

References