
How OpenClaw's Memory System Actually Works

db0 team · openclaw · memory · deep-dive

LLMs have no memory.

Every time you open a new conversation, the model knows nothing about who you are, what you're working on, or what you discussed last time. It can answer questions, write code, and analyze documents — but it doesn't remember anything. Every conversation is a first meeting.

OpenClaw's memory system is, at its core, an engineering effort to simulate "remembering" on top of a model that inherently forgets.

This distinction matters. OpenClaw's memory is not a capability of the model — it's a product of system design. That means it works fundamentally differently from human memory. Human memory is weights in a neural network. OpenClaw's memory is files, database indexes, and a set of mechanisms for getting information into the model's view.

This mechanism consists of three subsystems.

OpenClaw Memory: Three Subsystems


Subsystem 1: Where information lives

The first problem of memory is storage. OpenClaw splits information across three storage layers with very different durability.

Permanent storage: Markdown files

OpenClaw's core memory lives in plain text files on disk. These files persist as long as you don't manually delete them — unaffected by restarts, unaffected by session changes.

There are two kinds:

The first is configuration files: SOUL.md defines the agent's identity and personality, AGENTS.md stores workflow rules and operational constraints, USER.md stores stable information about you (your projects, work preferences, communication style), and MEMORY.md stores facts and decisions that should persist across sessions. These files are the agent's "long-term memory" — the most stable layer in the entire system.

The second is log files — daily logs like memory/2026-03-16.md, recording the day's work status, decisions made, and tasks still pending.
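Concretely, a workspace built on these conventions might look like this (a hypothetical layout; only the file names mentioned above are from the system, the rest is illustrative):

```
workspace/
├── SOUL.md          # identity and personality
├── AGENTS.md        # workflow rules and constraints
├── USER.md          # stable facts about the user
├── MEMORY.md        # cross-session facts and decisions
└── memory/
    ├── 2026-03-15.md
    └── 2026-03-16.md   # today's daily log
```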

Semi-permanent storage: Conversation history

Every conversation with the agent is saved in its entirety, in a structured format on disk. What you discussed with the agent today can be resumed tomorrow — that's the value of conversation history.

But "saved on disk" and "visible to the model" are two different things. When conversation history gets long, the system compresses older portions into summaries. The original content is still on disk, but the model can only see the summarized version. The information hasn't disappeared, but the details have been compressed away.

Temporary storage: The context window

This is where the model actually "sees" things. Your messages, the agent's replies, tool call results — everything lives here.

But it's finite. For Claude, this space is roughly 200K tokens — about the length of a medium-sized book. When there's too much content to fit, the system must make tradeoffs.

These three storage layers decrease in durability: files are permanent, conversation history is semi-permanent, the context window is purely temporary.


Subsystem 2: How information enters the model

Information sitting on disk is useless if the model can't see it. The second subsystem answers a practical question: how do you get stored information into the context window so the model can use it?

Bootstrap injection: Auto-loaded every time

When a session starts, a set of workspace files are automatically injected into context — including AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, and USER.md. Which files get injected depends on the session type — MEMORY.md is only auto-loaded in the main private session, not in group conversations.

This is the most reliable way to deliver information, because it requires no judgment — regardless of relevance, it loads every time. This is why information in these files can truly "survive across sessions": even when conversation history gets compressed, these files will be reloaded next time.

But this reliability has a cost: loading every time means these files consume fixed context space. There's a hard limit — each file maxes out at 20,000 characters, and all configuration files together max out at 150,000 characters (roughly 50K tokens). Content beyond the limit is truncated. The system can inject a truncation warning — but if the agent doesn't notice this signal, the second half of the content has effectively vanished.
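The bootstrap step can be sketched as a simple concatenation loop that enforces the two limits described above. This is a minimal illustration, not OpenClaw's actual implementation — the character limits are from the text, but the function name, file headers, and warning format are assumptions:

```python
from pathlib import Path

PER_FILE_LIMIT = 20_000   # characters per bootstrap file (per the text)
TOTAL_LIMIT = 150_000     # characters across all bootstrap files (per the text)

def bootstrap_context(workspace: Path, files: list[str]) -> str:
    """Concatenate bootstrap files into one context block, truncating at the limits."""
    parts, budget = [], TOTAL_LIMIT
    for name in files:
        path = workspace / name
        if not path.exists():
            continue
        text = path.read_text()
        allowed = min(PER_FILE_LIMIT, budget)
        used = min(len(text), allowed)
        if len(text) > allowed:
            # Everything past the limit is dropped; only a warning line marks the cut.
            text = text[:allowed] + f"\n[truncated: {name}]"
        budget -= used
        parts.append(f"## {name}\n{text}")
    return "\n\n".join(parts)
```

Note what this shape implies: truncation is silent unless the agent notices the warning line, and a large early file can eat the shared budget before later files load.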

History rebuild: Resuming prior state

When continuing an existing session, the system reads conversation history from disk and rebuilds it into context. This lets the agent "remember" what happened in this session.

But when conversation history exceeds the context window's capacity, older conversations are compressed into summaries — this is compaction. Summaries are lossy: the general direction is preserved, but details, specific constraints, and exact wording are likely lost in the compression.
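Compaction can be sketched as replacing the oldest messages with a single lossy summary once the history exceeds a token budget. Everything here is illustrative: `summarize` stands in for an LLM call, and the 4-characters-per-token heuristic is a rough rule of thumb, not OpenClaw's tokenizer:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def compact(messages: list[str], budget_tokens: int, summarize) -> list[str]:
    """Fold the oldest messages into a summary until the rest fits the budget."""
    total = sum(estimate_tokens(m) for m in messages)
    dropped = []
    while len(messages) > 1 and total > budget_tokens:
        oldest = messages.pop(0)
        dropped.append(oldest)
        total -= estimate_tokens(oldest)
    if dropped:
        # The originals stay on disk; the model only ever sees this summary.
        messages.insert(0, "[summary] " + summarize(dropped))
    return messages
```

The lossiness is structural: whatever `summarize` omits — exact wording, specific constraints — is gone from the model's view, even though the full transcript still exists on disk.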

Sub-agents: A special case

OpenClaw supports spawning sub-agents for subtasks within a session. Sub-agents use a smaller set of bootstrap files — typically AGENTS.md, TOOLS.md, SOUL.md, IDENTITY.md, and USER.md, but not MEMORY.md, which only the main private session receives.

This means sub-agents don't have access to the accumulated cross-session memories and decisions. If your workflow relies on sub-agents, be aware of what context they can see — you may need to pass information in explicitly through other means.


Subsystem 3: On-demand retrieval

Configuration files load every time — this handles "frequently needed" information. But there's a large amount of information that doesn't need to be present every time: session transcripts from months ago, historical decisions for a specific project, conclusions from a past discussion.

Injecting all of this into context isn't practical, but it should be findable when needed. That's the third subsystem's job: on-demand retrieval.

OpenClaw builds a searchable index over all memory files, supporting two matching modes simultaneously: keyword search for exact terms, and semantic search for meaning-based matches. The agent can use memory_search to query this index and inject relevant historical fragments into the current context.
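A toy illustration of the hybrid idea — exact-term matching blended with a crude bag-of-words overlap standing in for real embedding similarity. This is not memory_search's actual implementation; the scoring functions and weighting are assumptions made for the sketch:

```python
def keyword_score(query: str, doc: str) -> float:
    # Exact matching: fraction of query terms that appear verbatim in the doc.
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

def semantic_score(query: str, doc: str) -> float:
    # Stand-in for embedding similarity: Jaccard overlap of word sets.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0

def memory_search(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Rank memory fragments by a blend of keyword and semantic scores."""
    ranked = sorted(
        docs,
        key=lambda d: keyword_score(query, d) + semantic_score(query, d),
        reverse=True,
    )
    return ranked[:k]
```

Even this toy version shows the shape of the limitation discussed below: both scorers match surface text, so they can surface a fragment mentioning "Alice" and "auth" without representing any relationship between them.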

Daily logs go through this path — they're not bootstrap-injected (otherwise months of logs would overflow the context), but retrieved on demand when needed. This is more of a design convention than a hard guarantee: the system won't automatically push today's log into context; the agent has to actively query for it.

This retrieval system has a prerequisite: only content that has been written to a file can be retrieved. Something said in conversation, if never recorded in any file, is invisible to the retrieval system.

There's an even more easily overlooked prerequisite: the agent must know to actively retrieve. memory_search is a tool exposed to the agent — whether it gets used well depends entirely on the prompt and workflow design. If AGENTS.md doesn't include explicit retrieval rules, the agent won't proactively search its own notes. It will only work with what's already in context. The retrieval system exists, but sits silent.
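What an explicit retrieval rule might look like in AGENTS.md — the wording is illustrative, not an official convention:

```markdown
## Memory retrieval
- Before answering questions about past work, projects, or decisions,
  run memory_search with the key terms from the user's request.
- If a daily log may be relevant, search the memory/ logs for the
  matching date before saying "I don't know."
```

Without a rule like this in a bootstrap file, the tool exists but nothing prompts the agent to reach for it.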

The hybrid retrieval is good enough for "finding relevant content," but has a fundamental limitation: it finds semantically similar text, not relationships between content. You tell the agent "Alice leads the auth team," then ask "who should I talk to about auth permissions" — retrieval can find Alice, and can find auth, but it doesn't understand the structural relationship "Alice manages auth." Information is fragments, not a knowledge graph. This limitation is barely noticeable with small amounts of memory, but as knowledge accumulates, it becomes a real boundary.


The three subsystems together

Viewed together, OpenClaw's memory system is really answering one question: How do you get the right information in front of the model at the right moment?

Permanent files handle "stable, always-needed information." Conversation history handles "continuity within this session." The retrieval system handles "relevant historical content, fetched on demand."

Three subsystems, each with its own role, covering different time spans and usage frequencies.

But this design has a core assumption: information must be actively written to a file to enter the system.

If you say "remember, never do X again" during a conversation — that instruction exists in conversation history. But once the history is compacted, or the session ends and a new one begins, it disappears. Writing it into AGENTS.md is the only way to make that rule truly stick.

This design choice isn't accidental. It reflects an engineering judgment: rather than having the system automatically guess "what's worth remembering," let the user or agent explicitly decide "what needs to be persisted." The cost is a higher usage threshold. The payoff is predictable behavior.


Understanding these three subsystems gives you the foundation for diagnosing problems.

Why does the agent sometimes "forget things"? Why does some information survive across sessions while other information doesn't? Why does the agent seem amnesiac on a new machine? Every one of these questions can be traced back to these three subsystems — and the next article in this series will break down these failure modes in detail.