Agent Memory Systems

Agent Memory Systems are specialized architectures designed to provide LLMs with persistence, contextual awareness, and the ability to learn from past interactions.

🏗️ The Memory Hierarchy

To mimic human-like cognition, modern agent frameworks (like Supermemory, Mem0, and Letta) implement a tiered memory structure:

Tier	Technology	Purpose
Short-term	FIFO Buffers (In-memory)	Immediate interaction context (Working memory). Fast but volatile.
Long-term	Vector Databases (Pinecone, Qdrant)	Semantic storage of past knowledge. Survives restarts.
Episodic	Timestamped Graph/Logs	Stores specific events with full context for temporal reasoning.

🚀 Supermemory Architecture

Supermemory is a state-of-the-art implementation that focuses on Vector-First retrieval.

Key Features:

Fact Extraction: Automatically identifies and stores “facts” from conversations instead of just raw text.
Contradiction Resolution: Detects when new information conflicts with old stored knowledge.
Temporal Reasoning: The ability to answer questions about when something happened or the sequence of events.

🛠️ Implementation Patterns

Project Memory (Claude Code)

Claude Code implements a file-based persistent context pattern via claude.md and memory.md.

Explicit Context (claude.md): Human-defined instructions, conventions, and roadmaps.
Implicit Context (memory.md): AI-learned patterns and observations saved automatically during sessions (Auto-Memory).
Hierarchical Loading: Context is scoped (Global vs. Project vs. Sub-folder).
Progressive Disclosure: Uses “Skills” to load complex task context on-demand.

SubAgent Context Isolation

A related pattern for managing per-session memory is SubAgent Delegation. Rather than compressing long context into a single session, subagents start fresh with only the information they need. This enforces bounded working memory at the agent boundary — similar to how episodic memory isolates specific events from the long-term store.

Custom Subagent Persistent Memory

Custom subagents can be given a persistent memory directory via the memory: frontmatter field. This maps to a scoped directory where the agent writes and reads a MEMORY.md file across sessions:

Scope	Location	Use case
`user`	`~/.claude/agent-memory/<name>/`	Cross-project learnings
`project`	`.claude/agent-memory/<name>/`	Codebase-specific patterns (version-controllable)
`local`	`.claude/agent-memory-local/<name>/`	Project-specific, not committed

The agent’s system prompt is automatically extended with instructions to read and curate MEMORY.md, and the first 200 lines are injected at startup. This gives subagents institutional memory — recurring patterns, architectural decisions, and codebase idioms — without those facts inflating live session context.

See 2026-05-13-CampusX-Claude-Custom-Subagents for the full custom subagent configuration reference.

MCP Servers as External Memory Sources

MCP servers provide a complementary pattern to file-based memory: live external context injection. Instead of storing facts in MEMORY.md, agents query external systems (databases, GitHub, Notion, Jira) on-demand via MCP:

Memory Type	Source	Freshness
`claude.md` / `memory.md`	Local files	Stale until manually updated
Subagent `MEMORY.md`	Scoped local files	Stale until agent curates
MCP query	Live external system	Real-time

For example, a GitHub MCP server lets Claude query open issues, PR status, and repository metadata — information that would be impossible to maintain accurately in a static file. Similarly, a Jira MCP server pulls live ticket specs, and a Context7 MCP server fetches current library documentation.

Tradeoff: MCP queries cost tokens per request but provide guaranteed freshness. File-based memory costs tokens at session load but is free thereafter. The optimal strategy combines both: use file-based memory for stable patterns and MCP for volatile external state. See Claude + MCP Explained.

📊 Benchmarking Memory (MemoryBench)

Evaluating memory systems requires looking at three core metrics simultaneously (The Triple Score):

Quality (Accuracy): How reliably can the agent recall and reason?
Latency: How fast is the retrieval? (Measured in ms).
Cost (Tokens): How much context is sent to the LLM? (Measured in tokens).

Source: Ingested from YouTube: Supermemory SOTA

Rakesh's Brain

Explorer

Agent-Memory-Systems