LLM Observability

LLM Observability is the ability to monitor, trace, and debug the performance of Large Language Models in production.

Key Metrics

  • Token Usage: Cost and efficiency tracking.
  • Latency: Time to first token and total response time.
  • Accuracy (Evals): Using LLMs or heuristic checks to grade the quality of the output.
  • Context Integrity: Measuring how much of the provided context was actually utilized or ignored (Context Window management).
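The first two metrics can be captured in a simple per-call record. A minimal sketch, assuming a hypothetical pricing table and schema (CallMetrics, PRICE_PER_1K, and the rates shown are illustrative, not a real rate card):

```python
from dataclasses import dataclass

@dataclass
class CallMetrics:
    """Per-call observability record (hypothetical schema)."""
    prompt_tokens: int
    completion_tokens: int
    time_to_first_token_s: float   # latency until the first streamed token
    total_latency_s: float         # end-to-end response time

    # Assumed per-1K-token prices for illustration only.
    PRICE_PER_1K = {"prompt": 0.003, "completion": 0.015}

    def cost_usd(self) -> float:
        """Token-usage cost for this call under the assumed price table."""
        return (self.prompt_tokens / 1000 * self.PRICE_PER_1K["prompt"]
                + self.completion_tokens / 1000 * self.PRICE_PER_1K["completion"])
```

Logging one such record per LLM call is enough to aggregate cost and latency percentiles downstream; evals and context-integrity checks typically attach to the same record as additional fields.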

SubAgent Cost Observability

When using Claude SubAgents, token observability becomes multi-dimensional: you must track both the parent orchestrator’s context and each subagent’s isolated context. Key metrics to monitor:

  • Parent Context Saturation: Use /context to track when the orchestrator is nearing delegation threshold (~70–80% window).
  • Per-Subagent Token Spend: Each subagent spawn has a fixed startup cost (system prompt + task prompt). Measure whether the isolation benefit outweighs the spawn overhead.
  • Summary Fidelity: Assess whether subagent summaries preserve enough precision for the parent to synthesize correctly.
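The first two of these checks can be expressed as simple threshold functions. A sketch under stated assumptions: the 0.75 delegation threshold is an arbitrary midpoint of the ~70–80% range above, and the helper names are hypothetical:

```python
def context_saturation(used_tokens: int, window_tokens: int) -> float:
    """Fraction of the orchestrator's context window currently consumed."""
    return used_tokens / window_tokens

def should_delegate(used_tokens: int, window_tokens: int,
                    threshold: float = 0.75) -> bool:
    """Signal delegation when the parent nears the ~70-80% band."""
    return context_saturation(used_tokens, window_tokens) >= threshold

def spawn_worth_it(tokens_kept_out_of_parent: int,
                   spawn_overhead_tokens: int) -> bool:
    """Crude check: isolation benefit must exceed the fixed spawn cost
    (system prompt + task prompt) of starting a subagent."""
    return tokens_kept_out_of_parent > spawn_overhead_tokens
```

Summary fidelity is harder to reduce to a formula; in practice it is measured with evals on the parent's final synthesis rather than on the subagent transcripts.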

Model-Tier Cost Routing

Custom subagents introduce a new observability dimension: model-tier tracking. When routing tasks to Haiku vs. Sonnet vs. Opus, track per-agent model tier to measure whether cost routing decisions are correct:

  • High-frequency exploration tasks running on Opus signal over-provisioning: a misconfigured model: field in the agent definition.
  • Complex reasoning tasks running on Haiku signal under-provisioning.
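Both signals can be checked with a routing audit over per-agent telemetry. A minimal sketch, assuming a hypothetical record schema (the name, model, and task_class keys and the task-class labels are illustrative):

```python
# Relative capability/cost ranking of the model tiers.
TIER_RANK = {"haiku": 0, "sonnet": 1, "opus": 2}

def audit_routing(agents: list[dict]) -> list[tuple[str, str]]:
    """Flag agents whose configured tier mismatches their task profile.

    agents: dicts with 'name', 'model', and 'task_class' keys (assumed schema).
    Returns (agent_name, finding) pairs for each suspect routing decision.
    """
    findings = []
    for a in agents:
        rank = TIER_RANK[a["model"]]
        # Cheap, repetitive work should not run on the most expensive tier.
        if a["task_class"] == "high_frequency_exploration" and rank == TIER_RANK["opus"]:
            findings.append((a["name"], "over-provisioned: exploration task on Opus"))
        # Hard reasoning should not run on the cheapest tier.
        if a["task_class"] == "complex_reasoning" and rank == TIER_RANK["haiku"]:
            findings.append((a["name"], "under-provisioned: reasoning task on Haiku"))
    return findings
```

Run periodically over aggregated usage logs, this kind of audit turns model-tier tracking into actionable reconfiguration: each finding maps directly to an agent definition to edit.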

Related: Context Engineering, Claude SubAgents
