Agent Memory Systems
Agent Memory Systems are specialized architectures designed to provide LLMs with persistence, contextual awareness, and the ability to learn from past interactions.
🏗️ The Memory Hierarchy
To mimic human-like cognition, modern agent frameworks (like Supermemory, Mem0, and Letta) implement a tiered memory structure:
| Tier | Technology | Purpose |
|---|---|---|
| Short-term | FIFO Buffers (In-memory) | Immediate interaction context (Working memory). Fast but volatile. |
| Long-term | Vector Databases (Pinecone, Qdrant) | Semantic storage of past knowledge. Survives restarts. |
| Episodic | Timestamped Graph/Logs | Stores specific events with full context for temporal reasoning. |
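The three tiers in the table can be sketched in a few lines of Python. This is a minimal illustration, not how Supermemory, Mem0, or Letta are built: the class name `TieredMemory` and its methods are hypothetical, and a plain dictionary stands in for a vector database.

```python
from collections import deque
from datetime import datetime, timezone


class TieredMemory:
    """Toy three-tier memory. All names here are illustrative assumptions."""

    def __init__(self, short_term_size=3):
        # Short-term: a bounded deque evicts the oldest item first (FIFO).
        self.short_term = deque(maxlen=short_term_size)
        # Long-term: a dict stands in for a vector database in this sketch.
        self.long_term = {}
        # Episodic: an append-only log of (timestamp, event) pairs.
        self.episodic = []

    def observe(self, message):
        """Record a message in working memory and in the episodic log."""
        self.short_term.append(message)
        self.episodic.append((datetime.now(timezone.utc), message))

    def remember(self, key, fact):
        """Promote a fact to the long-term store."""
        self.long_term[key] = fact


memory = TieredMemory(short_term_size=2)
for msg in ["hello", "my name is Ada", "I use Rust"]:
    memory.observe(msg)
memory.remember("user.name", "Ada")

print(list(memory.short_term))  # the oldest message has been evicted
print(len(memory.episodic))     # every message is still in the log
```

The bounded deque shows why the short-term tier is fast but lossy: once it is full, each new message pushes the oldest one out, while the episodic log keeps the full timestamped history.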
🚀 Supermemory Architecture
Supermemory is a memory implementation, presented in the source video as state-of-the-art, that focuses on vector-first retrieval.
Key Features:
- Fact Extraction: Automatically identifies and stores “facts” from conversations instead of just raw text.
- Contradiction Resolution: Detects when new information conflicts with old stored knowledge.
- Temporal Reasoning: The ability to answer questions about when something happened or the sequence of events.
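The features listed above can be illustrated with a small fact store. This is a sketch under stated assumptions, not Supermemory's actual implementation: `Fact` and `FactStore` are hypothetical names, facts are assumed to have been extracted upstream (for example by an LLM), and the "newest observation wins" rule is just one possible resolution policy.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    subject: str
    attribute: str
    value: str
    observed_at: datetime


class FactStore:
    """Hypothetical store: keeps current facts plus a full history."""

    def __init__(self):
        self._current = {}  # (subject, attribute) -> latest Fact
        self._history = []  # every fact ever added, in insertion order

    def add(self, fact):
        """Store a fact and return the older fact it contradicts, if any."""
        key = (fact.subject, fact.attribute)
        old = self._current.get(key)
        self._history.append(fact)
        conflict = old if old is not None and old.value != fact.value else None
        # Assumed policy: the most recent observation becomes current.
        if old is None or fact.observed_at >= old.observed_at:
            self._current[key] = fact
        return conflict

    def current(self, subject, attribute):
        fact = self._current.get((subject, attribute))
        return fact.value if fact else None

    def timeline(self, subject, attribute):
        """Return (timestamp, value) pairs in chronological order."""
        facts = [f for f in self._history
                 if (f.subject, f.attribute) == (subject, attribute)]
        facts.sort(key=lambda f: f.observed_at)
        return [(f.observed_at, f.value) for f in facts]


store = FactStore()
store.add(Fact("user", "city", "Berlin", datetime(2024, 1, 5)))
conflict = store.add(Fact("user", "city", "Lisbon", datetime(2025, 3, 1)))
```

Here `conflict` holds the superseded "Berlin" fact, `store.current("user", "city")` returns the newer value, and `timeline` supports questions about when something changed.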
🛠️ Implementation Patterns
Project Memory (Claude Code)
Claude Code implements a file-based persistent context pattern via claude.md and memory.md.
- Explicit Context (claude.md): Human-defined instructions, conventions, and roadmaps.
- Implicit Context (memory.md): AI-learned patterns and observations saved automatically during sessions (Auto-Memory).
- Hierarchical Loading: Context is scoped (Global vs. Project vs. Sub-folder).
- Progressive Disclosure: Uses “Skills” to load complex task context on-demand.
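Hierarchical loading can be pictured as collecting context files from the broadest scope to the narrowest. The sketch below is an assumption about how such a loader might work, not Claude Code's actual discovery logic; `collect_context_files` and the demo directory layout are hypothetical.

```python
import tempfile
from pathlib import Path


def collect_context_files(global_dir, project_root, cwd, filename="claude.md"):
    """Return existing context files ordered global -> project -> sub-folder."""
    candidates = [global_dir / filename, project_root / filename]
    # Walk from the project root down to the working directory.
    current = project_root
    for part in cwd.relative_to(project_root).parts:
        current = current / part
        candidates.append(current / filename)
    return [p for p in candidates if p.is_file()]


# Demo on a throwaway directory tree (layout is illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    home, project = root / "home", root / "repo"
    sub = project / "services" / "api"
    sub.mkdir(parents=True)
    home.mkdir()
    (home / "claude.md").write_text("global rules\n")
    (project / "claude.md").write_text("project rules\n")
    (sub / "claude.md").write_text("api rules\n")

    files = collect_context_files(home, project, sub)
    merged = "".join(p.read_text() for p in files)

print(merged)
```

Ordering the files from general to specific lets narrower scopes refine or override broader ones when the merged text is read top to bottom.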
SubAgent Context Isolation
A related pattern for managing per-session memory is SubAgent Delegation. Rather than compressing long context into a single session, subagents start fresh with only the information they need. This enforces bounded working memory at the agent boundary — similar to how episodic memory isolates specific events from the long-term store.
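A minimal sketch of this delegation pattern follows. Everything in it is hypothetical: `Session`, `delegate`, and `fake_llm` are illustrative names, and `fake_llm` is a placeholder that only reports how many messages it received.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    messages: list = field(default_factory=list)


def fake_llm(messages):
    """Placeholder for a model call; reports how much context it saw."""
    return f"done (saw {len(messages)} messages)"


def delegate(parent, task, needed):
    """Run a task in a fresh session and return only its result."""
    # The subagent starts from an empty session: it sees the task and the
    # explicitly selected context, never the parent's full history.
    child = Session(messages=[*needed, task])
    result = fake_llm(child.messages)
    # Only the compact result flows back into the parent's context.
    parent.messages.append(f"[subagent] {result}")
    return result


parent = Session(messages=[f"turn {i}" for i in range(50)])
result = delegate(parent, "Summarize the auth module",
                  needed=["auth code lives in src/auth/"])
print(result)
```

The parent grows by a single message regardless of how much work the subagent did, which is the bounded-working-memory property described above.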
Custom Subagent Persistent Memory
Custom subagents can be given a persistent memory directory via the memory: frontmatter field. This maps to a scoped directory where the agent writes and reads a MEMORY.md file across sessions:
| Scope | Location | Use case |
|---|---|---|
| user | ~/.claude/agent-memory/<name>/ | Cross-project learnings |
| project | .claude/agent-memory/<name>/ | Codebase-specific patterns (version-controllable) |
| local | .claude/agent-memory-local/<name>/ | Project-specific, not committed |
The agent’s system prompt is automatically extended with instructions to read and curate MEMORY.md, and the first 200 lines are injected at startup. This gives subagents institutional memory — recurring patterns, architectural decisions, and codebase idioms — without those facts inflating live session context.
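The scope table and the 200-line startup limit can be expressed as two small helpers. This is an illustrative sketch only: `memory_file` and `startup_excerpt` are hypothetical names, the agent name `reviewer` is made up, and the directory layout is taken from the table above rather than from any tool's source code.

```python
from pathlib import Path

# Scope -> base directory, following the table above.
SCOPE_DIRS = {
    "user": Path.home() / ".claude" / "agent-memory",
    "project": Path(".claude") / "agent-memory",
    "local": Path(".claude") / "agent-memory-local",
}


def memory_file(scope, agent_name):
    """Resolve the MEMORY.md path for a subagent in the given scope."""
    return SCOPE_DIRS[scope] / agent_name / "MEMORY.md"


def startup_excerpt(text, max_lines=200):
    """Return the leading lines that would be injected at startup."""
    return "\n".join(text.splitlines()[:max_lines])


print(memory_file("project", "reviewer"))

long_memory = "\n".join(f"line {i}" for i in range(250))
excerpt = startup_excerpt(long_memory)
print(len(excerpt.splitlines()))
```

Because only the leading lines are injected, it makes sense for an agent to keep its most important notes near the top of the file.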
See 2026-05-13-CampusX-Claude-Custom-Subagents for the full custom subagent configuration reference.
MCP Servers as External Memory Sources
MCP servers provide a complementary pattern to file-based memory: live external context injection. Instead of storing facts in MEMORY.md, agents query external systems (databases, GitHub, Notion, Jira) on-demand via MCP:
| Memory Type | Source | Freshness |
|---|---|---|
| claude.md / memory.md | Local files | Stale until manually updated |
| Subagent MEMORY.md | Scoped local files | Stale until agent curates |
| MCP query | Live external system | Real-time |
For example, a GitHub MCP server lets Claude query open issues, PR status, and repository metadata — information that would be impossible to maintain accurately in a static file. Similarly, a Jira MCP server pulls live ticket specs, and a Context7 MCP server fetches current library documentation.
Tradeoff: MCP queries cost tokens on every request but return current data. File-based memory costs tokens once at session load and adds no per-query cost thereafter. A practical strategy combines both: use file-based memory for stable patterns and MCP for volatile external state. See Claude + MCP Explained.
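The tradeoff can be made concrete with a little arithmetic. The sketch below follows the simplified cost model described above (a one-time load cost for files, a per-request cost for MCP). The token counts are invented for illustration and are not measurements.

```python
def file_memory_tokens(file_tokens):
    """One-time cost: the file is loaded once at session start."""
    return file_tokens


def mcp_tokens(queries, tokens_per_query):
    """Recurring cost: every query pays for its request and response."""
    return queries * tokens_per_query


def break_even_queries(file_tokens, tokens_per_query):
    """Smallest query count at which MCP costs at least as much as the file."""
    return -(-file_tokens // tokens_per_query)  # ceiling division


# Assumed numbers, for illustration only.
FILE_TOKENS = 3000
TOKENS_PER_QUERY = 800

n = break_even_queries(FILE_TOKENS, TOKENS_PER_QUERY)
print(n, mcp_tokens(n, TOKENS_PER_QUERY), file_memory_tokens(FILE_TOKENS))
```

Under these assumed numbers, a session that needs the information fewer than four times is cheaper with MCP, while one that needs it more often is cheaper with a file, provided the file's content has not gone stale.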
📊 Benchmarking Memory (MemoryBench)
Evaluating memory systems requires looking at three core metrics simultaneously (The Triple Score):
- Quality (Accuracy): How reliably can the agent recall and reason?
- Latency: How fast is the retrieval? (Measured in ms).
- Cost (Tokens): How much context is sent to the LLM? (Measured in tokens).
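The three metrics can be collected together in one evaluation loop. This is a rough sketch, not the MemoryBench implementation: `TripleScore` and `evaluate` are hypothetical names, quality is approximated by checking whether the retrieved context contains an expected substring, and tokens are approximated by whitespace-separated words.

```python
import time
from dataclasses import dataclass


@dataclass
class TripleScore:
    accuracy: float    # fraction of cases where the expected text was retrieved
    latency_ms: float  # mean retrieval time in milliseconds
    tokens: float      # mean context size (crude word-count proxy)


def evaluate(retrieve, cases):
    """Score a retrieval function on (question, expected_substring) cases."""
    correct, elapsed_ms, tokens = 0, 0.0, 0
    for question, expected in cases:
        start = time.perf_counter()
        context = retrieve(question)
        elapsed_ms += (time.perf_counter() - start) * 1000
        correct += expected in context
        tokens += len(context.split())  # assumption: one word ~ one token
    n = len(cases)
    return TripleScore(correct / n, elapsed_ms / n, tokens / n)


def toy_retrieve(question):
    """Stand-in memory system that always returns the same context."""
    return "Ada lives in Lisbon"


score = evaluate(toy_retrieve, [
    ("Where does Ada live?", "Lisbon"),
    ("Which language does Ada use?", "Rust"),
])
print(score)
```

Reporting all three numbers side by side matters because each can be improved at the expense of the others, for example by raising accuracy through sending far more context.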
Source: Ingested from YouTube: Supermemory SOTA