Agent Memory Systems

Agent Memory Systems are specialized architectures designed to provide LLMs with persistence, contextual awareness, and the ability to learn from past interactions.

🏗️ The Memory Hierarchy

To mimic human-like cognition, modern agent frameworks (like Supermemory, Mem0, and Letta) implement a tiered memory structure:

TierTechnologyPurpose
Short-termFIFO Buffers (In-memory)Immediate interaction context (Working memory). Fast but volatile.
Long-termVector Databases (Pinecone, Qdrant)Semantic storage of past knowledge. Survives restarts.
EpisodicTimestamped Graph/LogsStores specific events with full context for temporal reasoning.

🚀 Supermemory Architecture

Supermemory is a state-of-the-art implementation that focuses on Vector-First retrieval.

Key Features:

  • Fact Extraction: Automatically identifies and stores “facts” from conversations instead of just raw text.
  • Contradiction Resolution: Detects when new information conflicts with old stored knowledge.
  • Temporal Reasoning: The ability to answer questions about when something happened or the sequence of events.

🛠️ Implementation Patterns

Project Memory (Claude Code)

Claude Code implements a file-based persistent context pattern via claude.md and memory.md.

  • Explicit Context (claude.md): Human-defined instructions, conventions, and roadmaps.
  • Implicit Context (memory.md): AI-learned patterns and observations saved automatically during sessions (Auto-Memory).
  • Hierarchical Loading: Context is scoped (Global vs. Project vs. Sub-folder).
  • Progressive Disclosure: Uses “Skills” to load complex task context on-demand.

SubAgent Context Isolation

A related pattern for managing per-session memory is SubAgent Delegation. Rather than compressing long context into a single session, subagents start fresh with only the information they need. This enforces bounded working memory at the agent boundary — similar to how episodic memory isolates specific events from the long-term store.

Custom Subagent Persistent Memory

Custom subagents can be given a persistent memory directory via the memory: frontmatter field. This maps to a scoped directory where the agent writes and reads a MEMORY.md file across sessions:

ScopeLocationUse case
user~/.claude/agent-memory/<name>/Cross-project learnings
project.claude/agent-memory/<name>/Codebase-specific patterns (version-controllable)
local.claude/agent-memory-local/<name>/Project-specific, not committed

The agent’s system prompt is automatically extended with instructions to read and curate MEMORY.md, and the first 200 lines are injected at startup. This gives subagents institutional memory — recurring patterns, architectural decisions, and codebase idioms — without those facts inflating live session context.

See 2026-05-13-CampusX-Claude-Custom-Subagents for the full custom subagent configuration reference.

MCP Servers as External Memory Sources

MCP servers provide a complementary pattern to file-based memory: live external context injection. Instead of storing facts in MEMORY.md, agents query external systems (databases, GitHub, Notion, Jira) on-demand via MCP:

Memory TypeSourceFreshness
claude.md / memory.mdLocal filesStale until manually updated
Subagent MEMORY.mdScoped local filesStale until agent curates
MCP queryLive external systemReal-time

For example, a GitHub MCP server lets Claude query open issues, PR status, and repository metadata — information that would be impossible to maintain accurately in a static file. Similarly, a Jira MCP server pulls live ticket specs, and a Context7 MCP server fetches current library documentation.

Tradeoff: MCP queries cost tokens per request but provide guaranteed freshness. File-based memory costs tokens at session load but is free thereafter. The optimal strategy combines both: use file-based memory for stable patterns and MCP for volatile external state. See Claude + MCP Explained.

📊 Benchmarking Memory (MemoryBench)

Evaluating memory systems requires looking at three core metrics simultaneously (The Triple Score):

  1. Quality (Accuracy): How reliably can the agent recall and reason?
  2. Latency: How fast is the retrieval? (Measured in ms).
  3. Cost (Tokens): How much context is sent to the LLM? (Measured in tokens).

Source: Ingested from YouTube: Supermemory SOTA