LLM Observability
LLM Observability is the ability to monitor, trace, and debug the performance of Large Language Models in production.
Key Metrics
- Token Usage: Cost and efficiency tracking.
- Latency: Time to first token and total response time.
- Accuracy (Evals): Using LLMs or heuristic checks to grade the quality of the output.
- Context Integrity: Measuring how much of the provided context was actually utilized or ignored (Context Window management). A minimal instrumentation sketch for the token and latency metrics follows this list.
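As a rough sketch of how the token and latency metrics might be instrumented in Python (the `CallMetrics` shape, the streaming wrapper, and the per-chunk token counting are illustrative assumptions, not a specific vendor API):

```python
import time
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class CallMetrics:
    input_tokens: int = 0
    output_tokens: int = 0
    time_to_first_token: Optional[float] = None
    total_latency: float = 0.0

    def cost_usd(self, in_rate: float, out_rate: float) -> float:
        # Rates are dollars per million tokens; real pricing varies by model.
        return (self.input_tokens * in_rate + self.output_tokens * out_rate) / 1e6

def observe_stream(chunks: Iterator[str], input_tokens: int) -> tuple[str, CallMetrics]:
    """Wrap any streaming completion, recording time to first token and total latency."""
    metrics = CallMetrics(input_tokens=input_tokens)
    start = time.monotonic()
    parts: list[str] = []
    for chunk in chunks:
        if metrics.time_to_first_token is None:
            metrics.time_to_first_token = time.monotonic() - start
        parts.append(chunk)
        metrics.output_tokens += 1  # placeholder: real counts come from the API's usage metadata
    metrics.total_latency = time.monotonic() - start
    return "".join(parts), metrics
```

Real token counts and costs should come from the provider's usage metadata; the per-chunk counter here only stands in for that.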
SubAgent Cost Observability
When using Claude SubAgents, token observability becomes multi-dimensional: you must track both the parent orchestrator’s context and each subagent’s isolated context. Key metrics to monitor, with a tracking sketch after the list:
- Parent Context Saturation: Use `/context` to track when the orchestrator is nearing the delegation threshold (~70–80% of the window).
- Per-Subagent Token Spend: Each subagent spawn has a fixed startup cost (system prompt + task prompt). Measure whether the isolation benefit outweighs the spawn overhead.
- Summary Fidelity: Assess whether subagent summaries preserve enough precision for the parent to synthesize correctly.
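A minimal ledger along these lines can make the saturation and spawn-overhead trade-offs concrete; the field names, the accounting, and the 75% default threshold are assumptions rather than anything Claude reports directly:

```python
from dataclasses import dataclass, field

@dataclass
class SubagentSpawn:
    name: str
    spawn_overhead_tokens: int  # fixed startup cost: system prompt + task prompt
    work_tokens: int            # tokens consumed inside the subagent's isolated window
    summary_tokens: int         # tokens its summary adds back to the parent

@dataclass
class OrchestratorLedger:
    context_window: int
    parent_tokens: int = 0
    spawns: list[SubagentSpawn] = field(default_factory=list)

    def saturation(self) -> float:
        """Fraction of the parent window in use (what /context surfaces)."""
        return self.parent_tokens / self.context_window

    def should_delegate(self, threshold: float = 0.75) -> bool:
        # The ~70-80% delegation threshold mentioned above.
        return self.saturation() >= threshold

    def record_spawn(self, spawn: SubagentSpawn) -> None:
        self.spawns.append(spawn)
        # Only the summary re-enters the parent context; the subagent's
        # work tokens stay in its isolated window.
        self.parent_tokens += spawn.summary_tokens

    def isolation_savings(self, spawn: SubagentSpawn) -> int:
        """Work tokens kept out of the parent window, minus what the spawn itself cost."""
        return spawn.work_tokens - spawn.spawn_overhead_tokens - spawn.summary_tokens
```

The key accounting rule is that only the summary re-enters the parent window; a spawn pays off when `isolation_savings` stays positive.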
Model-Tier Cost Routing
Custom subagents introduce a new observability dimension: model-tier tracking. When routing tasks to Haiku vs. Sonnet vs. Opus, track per-agent model tier to measure whether cost routing decisions are correct (see the audit sketch after this list):
- High-frequency exploration tasks running on Opus signal a misconfigured agent `model:` field.
- Complex reasoning tasks running on Haiku signal under-provisioning.
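One way to surface both failure modes is a routing audit over per-agent telemetry; the task classes and the policy table here are illustrative assumptions:

```python
from enum import Enum

class Tier(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"

# Hypothetical policy: which tiers are acceptable for each task class.
ROUTING_POLICY: dict[str, set[Tier]] = {
    "exploration": {Tier.HAIKU, Tier.SONNET},       # high-frequency, cheap work
    "complex_reasoning": {Tier.SONNET, Tier.OPUS},  # needs stronger models
}

def audit_routing(events: list[tuple[str, str, Tier]]) -> list[str]:
    """Flag (agent, task_class, tier) events that violate the policy:
    exploration on Opus suggests a misconfigured model: field, and
    complex reasoning on Haiku suggests under-provisioning."""
    violations = []
    for agent, task_class, tier in events:
        allowed = ROUTING_POLICY.get(task_class)
        if allowed is not None and tier not in allowed:
            violations.append(f"{agent}: {task_class} ran on {tier.value}")
    return violations

print(audit_routing([
    ("file-scanner", "exploration", Tier.OPUS),     # flagged: too expensive a tier
    ("architect", "complex_reasoning", Tier.OPUS),  # allowed
]))
```

Run over real spawn logs, the violations list becomes a direct measure of how often cost routing decisions are wrong.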
Related: Context Engineering, Claude SubAgents