Chapter 10

Memory Architecture

The 4-tier memory system — session, working, long-term, and semantic memory powered by pgvector.

Memory Architecture — How Agents Remember and Learn

Memory is not recall. Memory is identity. Without persistent memory, an agent is just a very expensive function call. OpenClaw uses files on disk. Flowwink evolved this into a 4-tier PostgreSQL system with vector search.


Think about your best colleague at work. What makes them valuable isn’t just what they know today — it’s that they remember your last conversation, they know how your business operates, they recall what didn’t work last time, and they’ve built up an intuition about what you actually need. That accumulated context is what separates a trusted colleague from a new hire.

An agent without memory is the new hire every single morning. No matter how many tasks it completed yesterday, it wakes up fresh — blank slate, no context, no continuity. Every interaction starts from zero.

Memory is what makes the “digital employee” metaphor real rather than rhetorical. Without it, you don’t have an employee. You have a very capable temp worker who forgets everything at the end of each shift.


The Memory Problem

LLMs are stateless. Each API call starts fresh. Without a memory system, the agent has no continuity — it can’t remember what it did yesterday, what it learned last week, or who it’s talking to.

OpenClaw’s solution (verified from source): Workspace files (AGENTS.md, SOUL.md, TOOLS.md, IDENTITY.md, USER.md, HEARTBEAT.md, MEMORY.md) are injected into every agent turn. Daily memory files (memory/*.md) are accessed on-demand via memory_search and memory_get tools. Simple, no database required, but all injected files consume tokens (capped at 20k per file, 150k total).

Flowwink’s evolution: A 4-tier PostgreSQL system with vector search, designed for multi-tenant business operations where memory must be isolated per tenant via RLS.

L1: Session Memory (ephemeral)
  │  Current conversation
  │  Cleared when session ends

L2: Working Memory (short-term)
  │  Top 20-30 recent entries
  │  Ordered by recency

L3: Long-term Memory (persistent)
  │  All persisted facts and lessons
  │  Full-text searchable

L4: Semantic Memory (vector)
     Embeddings for similarity search
     pgvector cosine similarity

L1: Session Memory

The current conversation history. This is what the LLM sees directly.

Storage: chat_messages table (or in-memory for edge functions)
Access: Linear scan (most recent N messages)
Lifetime: Session duration

Session memory includes:

  • User messages
  • Agent responses
  • Tool calls and results
  • System messages (skill instructions, context injection)

Pruning: When session memory approaches the context limit, older messages are summarized and replaced with a single [SUMMARY] message. Key facts are extracted and saved to L2/L3 before pruning.


L2: Working Memory

The agent’s “top of mind” — recent memories that are most likely relevant.

Storage: agent_memory table
Access: Key/category filter, ordered by updated_at DESC LIMIT 30
Lifetime: Until displaced by newer entries

Working memory is loaded into the system prompt at every reasoning cycle. It’s limited to 30 entries to prevent context bloat.

SELECT * FROM agent_memory
WHERE category IN ('preference', 'context', 'fact')
ORDER BY updated_at DESC
LIMIT 30;
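As a rough illustration of how rows returned by a query like the one above might be turned into a system prompt section, here is a minimal sketch. The MemoryRow shape, the function names, and the output format are assumptions made for this example, not Flowwink's actual code.

```typescript
// Hypothetical row shape; the real agent_memory columns may differ.
interface MemoryRow {
  key: string;
  category: string;
  content: string;
  updated_at: string; // ISO 8601 timestamp
}

// Categories treated as working memory, mirroring the SQL filter above.
const WORKING_CATEGORIES: ReadonlySet<string> = new Set([
  "preference",
  "context",
  "fact",
]);

// Keep only working-memory categories, newest first, capped at `limit`.
function selectWorkingMemory(rows: MemoryRow[], limit = 30): MemoryRow[] {
  return rows
    .filter((row) => WORKING_CATEGORIES.has(row.category))
    .sort((a, b) => Date.parse(b.updated_at) - Date.parse(a.updated_at))
    .slice(0, limit);
}

// Render the selected rows as a plain-text block for the system prompt.
// The heading and bullet format are illustrative choices only.
function buildWorkingMemorySection(rows: MemoryRow[], limit = 30): string {
  const lines = selectWorkingMemory(rows, limit).map(
    (row) => `- [${row.category}] ${row.key}: ${row.content}`
  );
  return ["## Working memory", ...lines].join("\n");
}
```

Capping in code as well as in SQL is a defensive choice: the prompt stays within its budget even if a caller passes in more rows than expected.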

L3: Long-term Memory

All persisted facts, lessons, and learned patterns.

Storage: agent_memory table (full)
Access: Full-text search (PostgreSQL tsvector/tsquery, as in the query below; pg_trgm covers trigram-based fuzzy matching)
Lifetime: Permanent (until explicitly deleted)

Long-term memory is searched when the agent needs specific information that’s not in working memory.

SELECT * FROM agent_memory
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'blog engagement')
ORDER BY updated_at DESC;

L4: Semantic Memory

Vector embeddings for similarity-based retrieval.

Storage: agent_memory.embedding column (768-dimensional vector)
Access: pgvector cosine similarity
Lifetime: Permanent

Semantic memory enables the agent to find relevant memories even when the exact keywords don’t match.

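-- $1 is the query embedding, bound as a vector parameter.
-- <=> is pgvector's cosine distance, so 1 - distance is cosine similarity.
SELECT *, 1 - (embedding <=> $1) AS similarity
FROM agent_memory
WHERE embedding IS NOT NULL
ORDER BY embedding <=> $1
LIMIT 5;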
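To make the ranking concrete: in pgvector, `<=>` returns cosine distance, so `1 - distance` is cosine similarity. The sketch below computes the same quantity in memory and keeps the top matches. It is only meant to build intuition. The function names and the zero-vector handling are assumptions, and in practice the database does this work.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  // Assumption for this sketch: a zero vector is treated as unrelated (0).
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank items by similarity to the query, highest first, keeping the top k.
function topK<T extends { embedding: number[] }>(
  query: number[],
  items: T[],
  k = 5
): Array<T & { similarity: number }> {
  return items
    .map((item) => ({
      ...item,
      similarity: cosineSimilarity(query, item.embedding),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}
```

Because cosine similarity ignores vector length and compares direction only, two memories phrased differently can still score close to each other.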

Embedding providers (auto-fallback):

  1. OpenAI text-embedding-3-small (768d; the model defaults to 1536 dimensions, so this assumes the dimensions parameter is set to 768)
  2. Gemini text-embedding-004 (768d)
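One way the auto-fallback could be structured is sketched below. The EmbeddingProvider type, embedWithFallback, and the injected embed functions are hypothetical stand-ins; the real provider calls are not shown in the source. The dimension check reflects the fact that every provider has to produce vectors that fit the same 768-dimensional column.

```typescript
// Hypothetical provider abstraction; real API clients would sit behind `embed`.
type EmbeddingProvider = {
  name: string;
  embed: (text: string) => Promise<number[]>;
};

const EXPECTED_DIMENSIONS = 768;

// Try each provider in order and return the first valid embedding.
async function embedWithFallback(
  text: string,
  providers: EmbeddingProvider[]
): Promise<{ provider: string; embedding: number[] }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const embedding = await provider.embed(text);
      if (embedding.length !== EXPECTED_DIMENSIONS) {
        throw new Error(
          `expected ${EXPECTED_DIMENSIONS} dimensions, got ${embedding.length}`
        );
      }
      return { provider: provider.name, embedding };
    } catch (err) {
      // Record the failure and move on to the next provider.
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All embedding providers failed: ${failures.join("; ")}`);
}
```

One caveat: vectors from different embedding models are generally not comparable with each other, so a production system would likely also record which provider produced each embedding.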

Memory Categories

Memories are organized by category:

| Category | Purpose | Example |
|---|---|---|
| soul | Agent personality & values | { purpose: "Help businesses grow", tone: "professional but warm" } |
| identity | Agent identity & boundaries | { name: "FlowPilot", role: "Digital Operator", boundaries: ["No financial decisions"] } |
| preference | Learned preferences | preferred_blog_tone: "conversational" |
| context | Operational context | last_campaign_date: "2026-01-15" |
| fact | Learned facts & patterns | lesson:blog_engagement: { data_viz_posts: "3x more engagement" } |
| agents | Operational rules | { direct_action_rules: [...], self_improvement: [...] } |
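The categories above map naturally onto a closed union type. The following is a minimal sketch; MemoryEntry and the helper names are assumptions for illustration, and the actual agent_memory schema may carry more fields.

```typescript
// The six categories listed in the table, as a closed set.
const MEMORY_CATEGORIES = [
  "soul",
  "identity",
  "preference",
  "context",
  "fact",
  "agents",
] as const;

type MemoryCategory = (typeof MEMORY_CATEGORIES)[number];

// Hypothetical entry shape for this example.
interface MemoryEntry {
  key: string;
  category: MemoryCategory;
  content: unknown; // a JSON object or a scalar, as in the table's examples
  importance?: number;
}

// Type guard for validating untrusted input before it is written.
function isMemoryCategory(value: string): value is MemoryCategory {
  return (MEMORY_CATEGORIES as readonly string[]).includes(value);
}

// Group entries by category, e.g. to render one prompt section per category.
function groupByCategory(
  entries: MemoryEntry[]
): Map<MemoryCategory, MemoryEntry[]> {
  const groups = new Map<MemoryCategory, MemoryEntry[]>();
  for (const entry of entries) {
    const bucket = groups.get(entry.category) ?? [];
    bucket.push(entry);
    groups.set(entry.category, bucket);
  }
  return groups;
}
```

Keeping the category list in a single constant means the type and the runtime check cannot drift apart.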

The Memory Lifecycle

Creation → Working → Long-term → Semantic → Retrieval → Influence
   │          │          │           │           │           │
   │          │          │           │           │           └─ Shapes future
   │          │          │           │           │              decisions
   │          │          │           │           └─ Found when
   │          │          │           │              needed
   │          │          │           └─ Embedded for
   │          │          │              similarity search
   │          │          └─ Persisted indefinitely
   │          └─ Top of mind (loaded every cycle)
   └─ Created by: reflection, learning,
      user input, or agent observation

Creation

Memories are created by:

  1. Reflection — The agent analyzes performance and saves learnings
  2. Learning — The flowpilot-learn function distills feedback
  3. User input — The user tells the agent something to remember
  4. Observation — The agent notices a pattern during operation

Embedding

New memories are automatically embedded:

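// In handleMemoryWrite()
const embedding = await generateEmbedding(content);
// Upsert so that rewriting an existing entry updates it instead of duplicating it
const { error } = await supabase.from('agent_memory').upsert({
  key, category, content, embedding, importance
});
// Surface write failures instead of silently dropping the memory
if (error) throw error;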

Retrieval

When the agent needs information:

  1. Check working memory (top 30) — fast, always loaded
  2. If not found, search long-term memory — full-text search
  3. If still not found, search semantic memory — vector similarity
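The three-step lookup above can be sketched as a simple cascade that stops at the first tier returning results. The Tier interface and the injected search functions are hypothetical; in a real system they would wrap the working-memory cache, the full-text query, and the vector query.

```typescript
interface Memory {
  key: string;
  content: string;
}

type TierName = "working" | "long-term" | "semantic";

// Hypothetical tier abstraction: each tier exposes one search function.
interface Tier {
  name: TierName;
  search: (query: string) => Promise<Memory[]>;
}

// Query tiers in order, cheapest first, and stop at the first hit.
async function retrieve(
  query: string,
  tiers: Tier[]
): Promise<{ tier: TierName | null; results: Memory[] }> {
  for (const tier of tiers) {
    const results = await tier.search(query);
    if (results.length > 0) {
      return { tier: tier.name, results };
    }
  }
  // Nothing found in any tier.
  return { tier: null, results: [] };
}
```

The ordering reflects cost: working memory is already loaded, full-text search is one indexed query, and vector search also requires embedding the query first.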

Workspace Files as Memory

Both OpenClaw and Flowwink use the same conceptual categories for agent configuration. The storage mechanism differs:

| Concept | OpenClaw (verified) | Flowwink (DB-driven) | Purpose |
|---|---|---|---|
| Persona | SOUL.md (file, auto-injected) | agent_memory key soul | Persona, boundaries, tone |
| Identity | IDENTITY.md (file, auto-injected) | agent_memory key identity | Agent name, role, emoji |
| Rules | AGENTS.md (file, auto-injected) | agent_memory key agents | Operating instructions, conventions |
| Heartbeat | HEARTBEAT.md (file, auto-injected) | heartbeat_protocol memory key | Protocol text (7-step loop) |
| Tool policy | TOOLS.md (file, auto-injected) | tool_policy memory key | { blocked: string[] } |
| Memory | MEMORY.md (file, auto-injected) | L2/L3/L4 tiers in agent_memory | Persistent facts and lessons |
| Daily notes | memory/*.md (on-demand via tools) | | Day-to-day observations |

Apart from on-demand daily notes and internal system keys, these are loaded into the system prompt at every reasoning cycle. They’re the agent’s “constitution”: the rules it lives by.

Safety: Internal system keys (tool_policy, heartbeat_state) are excluded from LLM context. This prevents infrastructure metadata from polluting the agent’s reasoning.
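A minimal sketch of that exclusion step, assuming memory entries carry a key field and that the internal keys are exactly the two named above. The function name visibleToLLM is hypothetical.

```typescript
// Keys that hold infrastructure state and should never reach the prompt.
const INTERNAL_KEYS: ReadonlySet<string> = new Set([
  "tool_policy",
  "heartbeat_state",
]);

// Drop internal entries before building the LLM context.
function visibleToLLM<T extends { key: string }>(entries: T[]): T[] {
  return entries.filter((entry) => !INTERNAL_KEYS.has(entry.key));
}
```

Filtering at the point where context is assembled, rather than at write time, keeps the internal keys available to the runtime while hiding them from the model.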


Memory Compression

As memory grows, compression becomes necessary:

preCompactionFlush(oldMessages, supabase)

  ├── AI extracts up to 5 discrete facts from old messages
  ├── Each fact saved as agent_memory entry with vector embedding
  └── Old messages replaced with [SUMMARY]

pruneConversationHistory(messages, supabase)

  ├── AI generates summary of old messages
  ├── Replaces old messages with [SUMMARY] message
  └── Keeps recent N messages intact

This ensures the agent doesn’t lose important information when conversations get long.
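The two steps can be combined into a single compaction pass, sketched below. The ChatMessage shape, the CompactionDeps interface, and the compact function are assumptions for illustration; the AI calls and the database write are injected so the flow can be read (and tested) without a model or a Supabase client.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Hypothetical dependencies standing in for the AI calls and the DB write.
interface CompactionDeps {
  extractFacts: (messages: ChatMessage[]) => Promise<string[]>;
  summarize: (messages: ChatMessage[]) => Promise<string>;
  saveFact: (fact: string) => Promise<void>;
}

const MAX_FACTS = 5; // "up to 5 discrete facts", per the outline above

async function compact(
  messages: ChatMessage[],
  keepRecent: number,
  deps: CompactionDeps
): Promise<ChatMessage[]> {
  const keep = Math.max(0, keepRecent);
  if (messages.length <= keep) return messages; // nothing to prune yet

  const cut = messages.length - keep;
  const oldMessages = messages.slice(0, cut);
  const recentMessages = messages.slice(cut);

  // Flush first: persist key facts before the old messages disappear.
  const facts = (await deps.extractFacts(oldMessages)).slice(0, MAX_FACTS);
  for (const fact of facts) {
    await deps.saveFact(fact);
  }

  // Then replace the old messages with one summary message.
  const summary = await deps.summarize(oldMessages);
  return [
    { role: "system", content: `[SUMMARY] ${summary}` },
    ...recentMessages,
  ];
}
```

The order matters: facts are saved before the old messages are replaced, so a failure during summarization does not lose anything that was already extracted.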


The Anti-Patterns

| Anti-Pattern | Problem | Solution |
|---|---|---|
| No memory | Agent starts from zero every session | Implement L2/L3 memory |
| Everything in context | Context window fills up instantly | Lazy loading + budget tiers |
| No categorization | Can’t find relevant memories | Category-based filtering |
| No embedding | Can’t find semantically similar memories | pgvector embedding |
| No compression | Context grows unbounded | Pre-compaction flush |
| No workspace files | Agent has no personality or rules | Soul/identity/agents keys |

Memory is what transforms a stateless function into a persistent entity. Without it, the agent is a goldfish. With it, the agent is a colleague.

Next: how the agent learns and improves through feedback loops.
