Research

Best LLM Agent Memory Tools in 2026: 6 Active + MemGPT History

Mem0, Letta, Zep, Cognee, LangMem, Graphiti for LLM agent memory in 2026, plus MemGPT history. Compared on memory types, OSS license, and integration shape.

[Cover image: bold AGENT MEMORY TOOLS 2026 headline beside a wireframe brain-with-storage diagram]

LLM agents without persistent memory reset state on every session. The 2024 generation often did exactly that; the current generation of memory tools, as of May 2026, can remember user preferences, prior decisions, learned facts, and entity relationships across days and weeks. The six active tools below (plus MemGPT as historical context) cover the common memory shapes that show up in production agent stacks. The differences that matter are memory type coverage (semantic, episodic, entity, procedural), OSS license, backend flexibility, and integration with the broader trace and eval surface.

TL;DR: Best agent memory tool per use case

| Use case | Best pick | Why (one phrase) | Pricing | OSS |
| --- | --- | --- | --- | --- |
| Lightweight semantic memory | Mem0 | Strong dev ergonomics, pluggable stores | OSS free; cloud paid | Apache 2.0 |
| Hierarchical memory tiers | Letta | Server-first, MemGPT successor | Free OSS | Apache 2.0 |
| Memory + history + entity graph | Zep Cloud | All-in-one managed platform | Flex $125/mo, Flex Plus $375/mo, Enterprise custom | Proprietary (legacy Apache-2.0 CE deprecated; current OSS is examples + Graphiti) |
| Knowledge-graph memory | Cognee | Relationship modeling, graph + vector | OSS free; cloud paid | Apache 2.0 |
| LangChain-native memory | LangMem | Tight LangChain integration | Free OSS | MIT |
| Temporal knowledge graph | Graphiti | Bi-temporal modeling, pluggable graph store | Free OSS | Apache 2.0 |
| Historical hierarchical-memory reference | MemGPT (now Letta) | Original MemGPT-paper repo; redirects to Letta | Free OSS | Apache 2.0 |

If you only read one row: pick Mem0 for lightweight semantic memory; pick Letta for hierarchical memory tiers; pick Zep when memory plus entity graph plus session history live in one platform.

What an agent memory tool actually needs

Pick a tool that covers all six surfaces below. If a candidate lacks one, plan for an external service or custom integration.

  1. Memory write API. A clean way to store a fact, a preference, an entity, or a session. Without a clean write API, the team drifts to ad-hoc Postgres.
  2. Memory retrieval API. Semantic similarity, entity lookup, time-windowed recall. The retrieval shape determines what the agent can ask for.
  3. Memory consolidation. Check whether the tool supports deduplication, memory update, deletion/forgetting, and background consolidation; verify with a 1,000-interaction test.
  4. Memory types. Semantic, episodic, entity, procedural. Verify which types the tool supports natively against your workload’s needs.
  5. Backend flexibility. Pluggable vector store (Pinecone, Qdrant, Weaviate, pgvector). Pluggable graph store when entity memory matters.
  6. Trace integration. Every memory operation emits a span. Without span data, debugging a memory miss is guesswork.
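
A minimal sketch of that span shape, assuming a configured OpenTelemetry SDK (`opentelemetry-sdk` installed, exporter wired up); `memory.search` is a hypothetical stand-in for whichever retrieval call your tool exposes:

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent.memory")

def traced_retrieve(memory, query: str, user_id: str):
    # Wrap the retrieval call so a memory miss shows up in the trace
    # with its query, user, and returned fact count attached.
    with tracer.start_as_current_span("memory.retrieve") as span:
        span.set_attribute("memory.query", query)
        span.set_attribute("memory.user_id", user_id)
        results = memory.search(query, user_id=user_id)  # hypothetical client call
        span.set_attribute("memory.fact_count", len(results))
        return results
```

The same wrapper pattern applies to writes and consolidation passes; only the span name and attributes change.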

[Figure: scatter plot, AGENT MEMORY COVERAGE — where each 2026 tool sits; x-axis: memory-type coverage from semantic-only through semantic + episodic to semantic + episodic + entity + procedural; y-axis: openness from closed through hosted-with-OSS-option to fully OSS]

The 7 agent memory tools compared

1. Mem0: Best for lightweight semantic memory

Open source. Apache 2.0. Hosted cloud option.

Use case: Production agents where the primary need is semantic memory with strong dev ergonomics: store a fact, retrieve facts by similarity. Adding or retrieving a memory through Mem0's API is a one-line call.

Architecture: Python and JavaScript SDK plus an optional managed service. Pluggable vector store (Qdrant, Pinecone, Weaviate, Chroma, pgvector). Pluggable LLM and embedding model. Mem0’s current README highlights an ADD-only memory algorithm where extracted facts accumulate over time, with entity linking and hybrid retrieval; verify forgetting/consolidation behavior against the latest API docs before relying on it.
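
A minimal sketch of that one-line ergonomics claim, assuming the OSS `mem0` package with its default local configuration; the search result shape has varied across releases, so verify against current Mem0 docs:

```python
from mem0 import Memory

# Default local config; pass a config dict to target Qdrant, pgvector, etc.
memory = Memory()

# Write: Mem0 extracts facts from the text and stores them per user.
memory.add("I prefer concise answers and I'm allergic to peanuts.", user_id="alice")

# Retrieve: semantic similarity search over stored facts.
results = memory.search("What does this user prefer?", user_id="alice")
hits = results["results"] if isinstance(results, dict) else results  # shape varies by version
for r in hits:
    print(r["memory"])
```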

Pricing: OSS is free. Mem0 cloud starts free for development; paid tiers for production.

OSS status: Apache 2.0.

Best for: Engineering teams that want to add semantic memory to an existing agent without operating a separate memory service. Strong fit for chat assistants, support agents, and copilots where user preference recall matters.

Worth flagging: Memory-as-flat-facts model is simpler than entity-graph memory; complex relationship queries need a different tool. The hosted service is newer than the OSS path; verify retention and data-handling policies.

2. Letta (formerly MemGPT): Best for hierarchical memory tiers

Open source. Apache 2.0. Hosted Letta Cloud.

Use case: Production agents where memory needs explicit tiers: short-term context, working memory, archival memory. Letta is the productized successor to the MemGPT paper and ships a server-first deployment model.

Architecture: Python server with REST API. Each agent has a structured memory state: persona, human, archival, recall. The server handles memory consolidation, paging between memory tiers, and tool-use mediation. Pluggable storage (SQLite by default; Postgres for production).
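
A minimal sketch of the server-first flow, assuming the `letta-client` SDK pointed at a local Letta server; model handles, block labels, and method names follow current Letta docs but should be verified before use:

```python
from letta_client import Letta

# Assumes a Letta server running locally.
client = Letta(base_url="http://localhost:8283")

# Each agent carries structured memory blocks the server persists and pages.
agent = client.agents.create(
    model="openai/gpt-4o-mini",                 # adjust to a model your server serves
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {"label": "human", "value": "Name: Alice. Prefers concise answers."},
        {"label": "persona", "value": "A helpful engineering assistant."},
    ],
)

# Memory tiers live server-side; the client only sends messages.
response = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "What do you remember about me?"}],
)
print(response.messages[-1])
```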

Pricing: Free OSS. Letta Cloud has paid hosted tiers; verify Letta pricing.

OSS status: Apache 2.0.

Best for: Engineering teams that want a server-first memory model with explicit tiers. Strong fit for long-running agents, persistent assistants, and workflows where memory tier discipline matters.

Worth flagging: Server-first deployment is more involved than Mem0’s library-first model. The MemGPT-paper abstractions (persona, human blocks) take a session to internalize. Tool calls are mediated through the server; some teams prefer direct LLM-to-tool flows.

3. Zep Cloud: Best for memory + history + entity graph in one platform

Proprietary managed product. Legacy Apache-2.0 Community Edition deprecated. Current OSS surface is examples/integrations plus Graphiti.

Use case: Production agents that need three memory surfaces in one platform: session history (short-term), semantic memory (long-term facts), and an entity graph (people, projects, organizations). Zep Cloud ships all three with a unified SDK.

Architecture: Managed multi-tenant service. The old Apache-2.0 Community Edition is deprecated under a legacy/ directory in the GitHub repo. The current OSS surface is examples and integrations published alongside the managed product, plus Graphiti as the standalone temporal-graph framework. The entity graph is built from extracted entities and relationships.
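
A minimal sketch of the unified-SDK shape, assuming the `zep-cloud` Python package's session-memory API; Zep's SDK surface has shifted across versions (sessions vs. threads), so treat every name here as indicative and check the current docs:

```python
from zep_cloud.client import Zep
from zep_cloud.types import Message

client = Zep(api_key="YOUR_ZEP_API_KEY")  # placeholder key

session_id = "session-42"

# Write: messages feed session history, fact extraction, and the entity graph.
client.memory.add(
    session_id=session_id,
    messages=[Message(role_type="user", content="I moved Project Atlas to the Berlin office.")],
)

# Retrieve: one call returns recent history plus synthesized context
# (facts and entity relationships) for the next agent turn.
memory = client.memory.get(session_id=session_id)
print(memory.context)
```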

Pricing: Zep Cloud uses credit-based pricing: Flex around $125/month, Flex Plus around $375/month, and Enterprise custom. Verify current credit rates, request quotas, and enterprise terms.

OSS status: Zep Cloud is proprietary managed software; the old Apache-2.0 Community Edition is deprecated under legacy/; current OSS surface is examples/integrations plus Graphiti.

Best for: Teams that want one managed memory platform across short-term, long-term, and entity graph. Strong fit for customer support, healthcare assistants, and CRM-flavored agents.

Worth flagging: No supported self-host path today; managed-only operation may not fit teams with strict data-residency or air-gap requirements. Plan migrations off the deprecated Community Edition before it stops receiving security updates.

4. Cognee: Best for knowledge-graph memory

Open source. Apache 2.0. Hosted cloud option.

Use case: Agents whose memory is relationship-heavy: entities, projects, documents, organizations connected as a knowledge graph. Cognee builds and queries the graph with LLM-extracted entities and relationships.

Architecture: Python SDK that orchestrates document ingestion, entity extraction, graph construction, and retrieval. Pluggable vector store and graph store (Neo4j, NetworkX, Kuzu). LLM-powered entity and relationship extraction.
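
A minimal sketch of the ingest-then-query flow, assuming the `cognee` package with an LLM API key configured via environment variables; the search signature has changed across releases, so verify `query_text` and the available search types against current Cognee docs:

```python
import asyncio
import cognee  # assumes an LLM API key set in the environment for extraction

async def main():
    # Ingest raw text; Cognee extracts entities and relationships from it.
    await cognee.add("Alice leads Project Atlas at Initech, reporting to Bob.")

    # Build the knowledge graph (entity extraction + graph construction).
    await cognee.cognify()

    # Query across the graph and vector index together.
    results = await cognee.search(query_text="Who works on Project Atlas?")
    print(results)

asyncio.run(main())
```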

Pricing: Free OSS. Hosted Cognee tiers available.

OSS status: Apache 2.0.

Best for: Teams whose agents need to reason over relationships: research assistants, investigative agents, knowledge management. Strong fit for workloads where the data has natural graph structure.

Worth flagging: Knowledge-graph extraction is LLM-mediated and not perfect; verify the extraction quality on your data. Heavier setup than Mem0 because of the graph store. Some relationship queries are slower than pure semantic search.

5. LangMem: Best for LangChain-native memory

Open source. MIT.

Use case: Teams already on LangChain or LangGraph who want memory primitives that integrate cleanly with the LangChain runtime. LangMem provides reflection, memory store, and consolidation primitives that plug into existing LangGraph state.

Architecture: Python library that provides memory APIs over configurable storage with native LangGraph store integration. Reflection and summarization run via LLM-powered helpers. Consolidation via background tasks where supported by the storage backend.
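
A minimal sketch of the LangGraph-native shape, assuming the `langmem` package alongside `langgraph`; the namespace templates, store index config, and tool factories follow the LangMem quickstart but should be verified against current releases:

```python
from langgraph.prebuilt import create_react_agent
from langgraph.store.memory import InMemoryStore
from langmem import create_manage_memory_tool, create_search_memory_tool

# The LangGraph store needs an embedding index for semantic recall.
store = InMemoryStore(
    index={"dims": 1536, "embed": "openai:text-embedding-3-small"}
)

# Memory tools read and write the shared store under a per-user namespace.
agent = create_react_agent(
    "openai:gpt-4o-mini",
    tools=[
        create_manage_memory_tool(namespace=("memories", "{user_id}")),
        create_search_memory_tool(namespace=("memories", "{user_id}")),
    ],
    store=store,
)
```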

Pricing: Free OSS.

OSS status: MIT. Part of the LangChain ecosystem.

Best for: LangChain v1 and LangGraph teams that want memory primitives in the same ecosystem.

Worth flagging: Outside LangChain, the library has less value. The memory abstraction is shallower than Letta’s hierarchical model or Zep’s entity graph. For deep memory needs, pair LangMem with another tool.

6. Graphiti: Best for temporal knowledge graphs

Open source. Apache 2.0.

Use case: Agents that need to reason about time: when a fact was true, when it changed, what the user believed at a past moment. Graphiti is a temporal knowledge graph framework with bi-temporal modeling.

Architecture: Python library that supports Neo4j, FalkorDB, Kuzu, and Amazon Neptune backends; Neo4j remains the common quickstart path. Bi-temporal: every fact has a valid-time (when the fact was true in reality) and a transaction-time (when the fact was recorded). Supports time-windowed queries: what did the agent know about user preferences as of last Tuesday?
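
A minimal sketch of episode ingestion and search, assuming the `graphiti-core` package against a local Neo4j instance, with an OpenAI key in the environment for extraction; parameter names follow the Graphiti README but verify against the current release:

```python
import asyncio
from datetime import datetime, timezone
from graphiti_core import Graphiti  # assumes OPENAI_API_KEY set for extraction

async def main():
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    await graphiti.build_indices_and_constraints()

    # Each episode carries a reference time, which feeds the bi-temporal
    # model: when the fact was true vs. when it was recorded.
    await graphiti.add_episode(
        name="preference-update",
        episode_body="Alice now prefers weekly summaries instead of daily digests.",
        source_description="chat session",
        reference_time=datetime.now(timezone.utc),
    )

    # Hybrid semantic + graph search over extracted facts (edges).
    for edge in await graphiti.search("What does Alice prefer?"):
        print(edge.fact)

    await graphiti.close()

asyncio.run(main())
```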

Pricing: Free OSS. Operational cost is the underlying graph infrastructure (Neo4j, FalkorDB, Kuzu, or Neptune).

OSS status: Apache 2.0. Maintained by the Zep team.

Best for: Agents in regulated industries where audit trails matter, financial or legal agents that need point-in-time recall, and research agents that need temporal reasoning.

Worth flagging: Bi-temporal modeling adds complexity. The underlying graph database carries an operational footprint. Many production agents do not need temporal reasoning; for those, simpler memory is the right pick.

7. MemGPT (now Letta): Historical hierarchical-memory reference

Open source. Apache 2.0. Now part of Letta.

Use case: Historical context for the canonical hierarchical-memory abstraction from the MemGPT paper. The original cpacker/MemGPT GitHub repo now redirects to letta-ai/letta, so any production deployment should target Letta directly rather than treating MemGPT as a separate live framework.

Architecture: The original framework treated the LLM context as a virtual memory hierarchy with core, recall, and archival tiers, paged via tool calls. Those abstractions live on inside Letta’s server-first product (covered above as the second pick).

Pricing: Free OSS. For any new deployment, follow the Letta pricing page.

OSS status: Apache 2.0, repo redirects to Letta.

Best for: Reading the original abstractions and the paper. For production memory work, prefer Letta.

Worth flagging: This is included for historical clarity, not as a separately maintained tool. If you find docs or examples pointing at cpacker/MemGPT, treat them as Letta references.

[Figure: four-panel agent-memory observability dashboard — memory operation trace with memory.write and memory.retrieve spans; memory recall accuracy (recall@k for k=1, 3, 5, 10) per dataset; memory size growth over time (facts stored, consolidated, forgotten); entity-graph snippet linking User, Project, Organization, and Document]

Decision framework: pick by constraint

  • Lightweight semantic memory: Mem0.
  • Hierarchical tiers: Letta (the active project; MemGPT is the historical name).
  • Memory + history + entity graph in one: Zep.
  • Knowledge-graph-flavored: Cognee, Graphiti.
  • Temporal reasoning: Graphiti.
  • LangChain-native: LangMem.
  • Academic reference: MemGPT paper (now Letta).

Common mistakes when picking an agent memory tool

  • Treating memory as RAG over chat history. Pure RAG over the conversation transcript misses entity relationships, time, and consolidation. The 2026 tools handle these natively.
  • Skipping consolidation. Memory that grows without bound degrades retrieval. Verify whether the candidate tool supports deduplication, memory update, deletion/forgetting, and background consolidation, then run a 1,000-interaction test before standardizing.
  • Picking on demo recall. Demos use idealized facts and idealized queries. Run a domain reproduction with real interactions and real failure modes.
  • Pricing only the platform fee. Real cost equals platform fee plus vector store cost plus embedding cost plus engineering hours.
  • Underestimating retrieval latency. Memory retrieval adds latency to every agent turn. Budget the p95 latency at production volume.
  • Skipping trace integration. A memory miss without span data is invisible. Wire memory operations into the trace surface.

Recent agent memory updates

| Date | Event | Why it matters |
| --- | --- | --- |
| 2023 | MemGPT paper published (arXiv 2310.08560) | The canonical hierarchical-memory abstraction entered public discourse. |
| 2024-2025 | MemGPT became Letta | The academic project productized into a server-first platform. |
| 2024-2026 | Mem0 grew rapidly on GitHub | Lightweight semantic memory accumulated significant community adoption; verify current star count and release version on the repo. |
| 2025-2026 | Zep deprecated Community Edition and consolidated on Zep Cloud | Self-hosted Community Edition moved to legacy; managed Zep Cloud became the supported path. |
| 2025-2026 | Cognee continued shipping knowledge-graph extraction features | Graph-flavored agent memory has more production references; verify exact release versions and benchmarks before adopting. |
| 2025-2026 | Graphiti shipped bi-temporal modeling and expanded backends | Temporal reasoning over agent memory became a first-class option; backends expanded beyond Neo4j. |

How to actually evaluate this for production

  1. Define a labeled dataset. Agent interactions across sessions where memory matters: user preferences, factual recall, multi-turn reasoning, entity tracking. Hand-label expected memory behavior.

  2. Run candidate tools with the same upstream LLM. Hold prompts and tools constant. Measure recall@k, precision, p95 retrieval latency, and cost per memory operation; a scoring sketch follows this list.

  3. Test consolidation. Run the agent for 1,000+ interactions. Measure how memory size grows, how recall degrades, and how consolidation handles staleness.

  4. Wire to a trace surface. Every memory operation should emit a span. Wire to FutureAGI for span-attached recall scoring, Phoenix for OTel-native tracing, or Langfuse for self-hosted observability.

  5. Cost-adjust at production volume. Real cost equals platform fee plus vector store cost plus embedding cost plus engineering hours. Project 90 days.
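
A minimal sketch of the scoring loop from step 2, in plain Python; `retrieve` is a hypothetical stand-in for whichever candidate tool's search call is under test, assumed to return a list of fact strings:

```python
import statistics
import time

def evaluate(dataset, retrieve, k=5):
    """Score recall@k and p95 retrieval latency over a labeled dataset.

    Each example is assumed to look like:
    {"query": "...", "expected_fact": "..."}.
    """
    hits, latencies = 0, []
    for example in dataset:
        start = time.perf_counter()
        results = retrieve(example["query"])[:k]  # top-k retrieved fact strings
        latencies.append(time.perf_counter() - start)
        if any(example["expected_fact"] in fact for fact in results):
            hits += 1
    return {
        "recall@k": hits / len(dataset),
        "p95_latency_s": statistics.quantiles(latencies, n=100)[94],
    }
```

Run the same loop against every candidate with identical prompts and the same upstream LLM, so only the memory layer varies.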


Frequently asked questions

What is LLM agent memory and why does it matter?
LLM agent memory is the layer that stores and retrieves prior interactions, facts, preferences, and relationships across agent sessions. Without memory, every conversation starts cold. With memory, the agent remembers the user's name, prior decisions, and learned preferences across days and weeks. The basic primitives are short-term (conversation buffer), long-term (semantic memory of facts), episodic (recall of past sessions), and entity (relationships between people, projects, and concepts).
What are the main types of agent memory?
Five types matter in 2026. First, short-term: the rolling conversation buffer within a session. Second, long-term semantic: facts the agent remembers (user prefers concise answers; user is allergic to peanuts). Third, long-term episodic: past sessions retrieved by similarity. Fourth, entity / graph memory: relationships between people, projects, organizations stored as a graph. Fifth, procedural: learned skills and patterns. Production agents use 2-4 of these types in concert; pure RAG over chat history is rarely enough.
Which agent memory tool is best in 2026?
It depends on the memory shape. Mem0 leads on lightweight semantic memory with strong dev ergonomics. Letta (the productized successor to the MemGPT paper) leads on hierarchical memory with explicit memory tiers. Zep Cloud leads on memory + history + entity graph in one managed platform. Cognee leads on knowledge-graph-flavored memory with strong relationship modeling. LangMem leads on tight LangChain integration. Graphiti leads on temporal knowledge graphs with bi-temporal modeling. The original MemGPT repo now redirects to Letta, so treat MemGPT as historical context rather than a separate current option. Match the tool to the memory shape your agent needs.
Are agent memory tools open source?
Most are. Mem0 is Apache 2.0. Letta is Apache 2.0. Cognee is Apache 2.0. Graphiti is Apache 2.0. LangMem is MIT (LangChain ecosystem). The original MemGPT (Apache 2.0) is the academic project that became Letta. Zep's older Community Edition has been deprecated and moved to legacy; the current production offering is managed Zep Cloud, with examples and integrations published under permissive licenses. Verify the LICENSE file before redistributing.
How do agent memory tools integrate with vector databases?
Most agent memory tools sit on top of a vector store: Mem0 supports Qdrant, Pinecone, Weaviate, Chroma, pgvector. Letta uses an embedded SQLite store with pluggable backends. Zep Cloud uses its own optimized backend. Cognee uses a vector store plus a graph store. LangMem provides memory APIs over configurable storage with native LangGraph store integration. Graphiti supports Neo4j, FalkorDB, Kuzu, and Amazon Neptune backends; Neo4j remains the common quickstart path. Pick the memory tool first; the vector or graph store choice often follows.
How do I evaluate an agent memory tool for production?
Define a labeled dataset of agent interactions across sessions where memory matters: user preferences, factual recall, multi-turn reasoning, entity tracking. Run each candidate with the same upstream LLM and the same prompts. Score: recall accuracy (does the memory return the right fact?), precision (does it avoid stale facts?), latency p95 at production volume, and cost per memory operation. Pick the one whose miss-modes match your tolerance; published numbers vary across tools and methodologies, so build your own benchmark before procurement.
How do agent memory pricing models compare?
Mem0 OSS is free; hosted Mem0 cloud has paid tiers. Letta is free OSS with paid Letta Cloud tiers. Cognee is free OSS plus paid hosted tiers. Zep Cloud uses credit-based pricing (Flex around $125/month, Flex Plus around $375/month, Enterprise custom); the older self-hosted Community Edition is deprecated. LangMem is free OSS. Graphiti is free OSS. MemGPT is free OSS but redirects to Letta. Verify against vendor pricing pages; rates change quarterly.
How do I observe agent memory operations in production?
Wrap each memory call (retrieve, write, consolidate) in an OpenTelemetry span carrying the operation type, query, returned fact count, recall score, and latency. [FutureAGI's](https://futureagi.com/) Apache 2.0 traceAI library can be instrumented around memory calls so each retrieve and write becomes a span with custom attributes; Mem0, Letta, Zep, Cognee, LangMem, and Graphiti can all be wrapped this way. Eval functions then score memory recall and precision per span: `turing_flash` returns guardrail-style verdicts at 50-70ms p95, while full eval templates run roughly 1-2 seconds and belong in async or pre-deploy paths.