LLM Observability

LLM Observability is the ability to monitor, trace, and debug the performance of Large Language Models in production.

Key Metrics

  • Token Usage: Cost and efficiency tracking.
  • Latency: Time to first token and total response time.
  • Accuracy (Evals): Using LLMs or heuristic checks to grade the quality of the output.
  • Context Integrity: Measuring how much of the provided context was actually utilized or ignored (Context Window management).
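The first two metrics can be captured in a simple per-call record. A minimal sketch, assuming a hypothetical pricing table and schema (CallMetrics, PRICE_PER_1K, and the rates shown are illustrative, not a real rate card):

```python
from dataclasses import dataclass

@dataclass
class CallMetrics:
    """Per-call observability record (hypothetical schema)."""
    prompt_tokens: int
    completion_tokens: int
    time_to_first_token_s: float   # latency until the first streamed token
    total_latency_s: float         # end-to-end response time

    # Assumed per-1K-token prices for illustration only.
    PRICE_PER_1K = {"prompt": 0.003, "completion": 0.015}

    def cost_usd(self) -> float:
        """Token-usage cost for this call under the assumed price table."""
        return (self.prompt_tokens / 1000 * self.PRICE_PER_1K["prompt"]
                + self.completion_tokens / 1000 * self.PRICE_PER_1K["completion"])
```

Logging one such record per LLM call is enough to aggregate cost and latency percentiles downstream; evals and context-integrity checks typically attach to the same record as additional fields.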

SubAgent Cost Observability

When using Claude SubAgents, token observability becomes multi-dimensional: you must track both the parent orchestrator’s context and each subagent’s isolated context. Key metrics to monitor:

  • Parent Context Saturation: Use /context to track when the orchestrator is nearing delegation threshold (~70–80% window).
  • Per-Subagent Token Spend: Each subagent spawn has a fixed startup cost (system prompt + task prompt). Measure whether the isolation benefit outweighs the spawn overhead.
  • Summary Fidelity: Assess whether subagent summaries preserve enough precision for the parent to synthesize correctly.
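The first two of these checks can be expressed as simple threshold functions. A sketch under stated assumptions: the 0.75 delegation threshold is an arbitrary midpoint of the ~70–80% range above, and the helper names are hypothetical:

```python
def context_saturation(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the orchestrator's context window currently consumed."""
    return used_tokens / window_tokens

def should_delegate(used_tokens: int, window_tokens: int,
                    threshold: float = 0.75) -> bool:
    """Signal delegation when the parent nears the ~70-80% band."""
    return context_saturation(used_tokens, window_tokens) >= threshold

def spawn_worth_it(tokens_kept_out_of_parent: int,
                   spawn_overhead_tokens: int) -> bool:
    """Crude check: isolation benefit must exceed the fixed spawn cost
    (system prompt + task prompt) of starting a subagent."""
    return tokens_kept_out_of_parent > spawn_overhead_tokens
```

Summary fidelity is harder to reduce to a formula; in practice it is measured with evals on the parent's final synthesis rather than on the subagent transcripts.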

Model-Tier Cost Routing

Custom subagents introduce a new observability dimension: model-tier tracking. When routing tasks to Haiku vs. Sonnet vs. Opus, track per-agent model tier to measure whether cost routing decisions are correct:

  • High-frequency exploration tasks running on Opus signal over-provisioning: a misconfigured model: field in the agent definition.
  • Complex reasoning tasks running on Haiku signal under-provisioning.
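Both signals can be checked with a routing audit over per-agent telemetry. A minimal sketch, assuming a hypothetical record schema (the name, model, and task_class keys and the task-class labels are illustrative):

```python
# Relative capability/cost ranking of the model tiers.
TIER_RANK = {"haiku": 0, "sonnet": 1, "opus": 2}

def audit_routing(agents: list[dict]) -> list[tuple[str, str]]:
    """Flag agents whose configured tier mismatches their task profile.

    agents: dicts with 'name', 'model', and 'task_class' keys (assumed schema).
    Returns (agent_name, finding) pairs for each suspect routing decision.
    """
    findings = []
    for a in agents:
        rank = TIER_RANK[a["model"]]
        # Cheap, repetitive work should not run on the most expensive tier.
        if a["task_class"] == "high_frequency_exploration" and rank == TIER_RANK["opus"]:
            findings.append((a["name"], "over-provisioned: exploration task on Opus"))
        # Hard reasoning should not run on the cheapest tier.
        if a["task_class"] == "complex_reasoning" and rank == TIER_RANK["haiku"]:
            findings.append((a["name"], "under-provisioned: reasoning task on Haiku"))
    return findings
```

Run periodically over aggregated usage logs, this kind of audit turns model-tier tracking into actionable reconfiguration: each finding maps directly to an agent definition to edit.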

Related: Context Engineering, Claude SubAgents
