AI Assistant Memory Patterns: Building Persistent Context Systems
One of the biggest challenges in building effective AI assistants is managing memory. Unlike humans who naturally remember conversations and learn over time, AI models start fresh with each interaction. Implementing robust memory systems transforms a basic chatbot into a truly intelligent assistant that understands context, remembers preferences, and builds on past interactions.
This guide explores the patterns and techniques for building memory systems for AI assistants, with practical implementations for Claude and OpenClaw.
Understanding AI Memory Challenges
Large language models like Claude have a fundamental constraint: the context window. While modern models support large context windows (200K+ tokens), they still have limits. More importantly, nothing persists between sessions unless explicitly saved.
The Context Window Problem
The context window is like a model's short-term memory. Everything the AI knows during a conversation must fit within this window, including:
- The system prompt
- Conversation history
- Relevant documents or data
- Tool definitions and results
As conversations grow longer, you face difficult choices: truncate history (losing context), summarize (losing detail), or selectively include information (risking relevance).
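As a concrete illustration of the truncation option, here is a minimal sketch. The `countTokens` helper is a rough stand-in for a real tokenizer, not an exact count:

```typescript
interface Message { role: string; content: string; }

// Rough stand-in for a real tokenizer; assumes ~4 characters per token.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drop the oldest messages until the history fits the token budget.
function truncateHistory(messages: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk from newest to oldest so recent context survives.
  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = countTokens(messages[i].content);
    if (used + tokens > maxTokens) break;
    kept.unshift(messages[i]);
    used += tokens;
  }
  return kept;
}
```

The trade-off is visible in the loop: whatever falls outside the budget is gone, which is exactly the "losing context" problem described above.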
Session Isolation
Each API call to Claude is stateless. The model doesn't remember previous conversations unless you explicitly include that history in your request. This means:
- User preferences are forgotten between sessions
- Learned patterns don't persist
- Relationships must be re-established each time
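Because each call is stateless, the application has to replay relevant history itself on every request. A minimal sketch of that pattern — `callModel` here is a hypothetical stand-in for whatever model API you use, not a real SDK call:

```typescript
interface Message { role: 'system' | 'user' | 'assistant'; content: string; }

// Hypothetical stand-in for a real model API call; echoes for demonstration.
async function callModel(messages: Message[]): Promise<string> {
  return `(${messages.length} messages seen)`;
}

const history: Message[] = [];

async function chat(userInput: string): Promise<string> {
  history.push({ role: 'user', content: userInput });
  // The full history is sent on every call; the model itself keeps nothing.
  const reply = await callModel([
    { role: 'system', content: 'You are a helpful assistant.' },
    ...history,
  ]);
  history.push({ role: 'assistant', content: reply });
  return reply;
}
```

If the `history` array is never persisted, everything in it vanishes when the process exits — which is precisely why the explicit memory systems below exist.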
The Solution: Explicit Memory Systems
To create AI assistants with real memory, we must build explicit systems that:
- Capture important information during conversations
- Store it persistently outside the model
- Retrieve relevant memories when needed
- Inject them into future conversations
Let's explore the patterns for achieving this.
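These four steps form a loop around every model call. A high-level sketch of that loop, with the store and the retrieval step left as deliberately naive in-memory placeholders:

```typescript
// Minimal in-memory store standing in for a real persistence layer.
const memoryStore: string[] = [];

// 1. Capture important information during the conversation.
function capture(note: string): void {
  memoryStore.push(note);
}

// 3. Retrieve relevant memories (naive: share a word with the query).
function retrieve(query: string): string[] {
  const words = query.toLowerCase().split(/\s+/);
  return memoryStore.filter(m =>
    words.some(w => m.toLowerCase().includes(w))
  );
}

// 4. Inject retrieved memories into the next prompt.
function inject(basePrompt: string, query: string): string {
  const memories = retrieve(query);
  return memories.length > 0
    ? `${basePrompt}\n\nRelevant memories:\n${memories.map(m => `- ${m}`).join('\n')}`
    : basePrompt;
}
```

Step 2, persistent storage, is the array here; in practice it becomes a file, database, or vector store, as the layers below show.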
Memory Architecture Layers
Effective AI memory systems typically have multiple layers, each serving different purposes.
Working Memory
Working memory holds information relevant to the current conversation. It's temporary but essential for maintaining coherent multi-turn dialogues.
```typescript
interface WorkingMemory {
  conversationId: string;
  messages: Message[];
  activeContext: string[];
  pendingActions: Action[];
  scratchpad: Record<string, unknown>;
}

class WorkingMemoryManager {
  private memory: Map<string, WorkingMemory> = new Map();

  createSession(conversationId: string): WorkingMemory {
    const session: WorkingMemory = {
      conversationId,
      messages: [],
      activeContext: [],
      pendingActions: [],
      scratchpad: {}
    };
    this.memory.set(conversationId, session);
    return session;
  }

  addMessage(conversationId: string, message: Message): void {
    const session = this.memory.get(conversationId);
    if (session) {
      session.messages.push(message);
      this.pruneIfNeeded(session);
    }
  }

  private pruneIfNeeded(session: WorkingMemory): void {
    const MAX_MESSAGES = 50;
    if (session.messages.length > MAX_MESSAGES) {
      // Keep all system messages, then fill the remaining budget
      // with the most recent non-system messages.
      const systemMessages = session.messages.filter(m => m.role === 'system');
      const otherMessages = session.messages.filter(m => m.role !== 'system');
      const recentMessages = otherMessages.slice(-(MAX_MESSAGES - systemMessages.length));
      session.messages = [...systemMessages, ...recentMessages];
    }
  }
}
```
Short-Term Memory
Short-term memory persists across sessions but has a limited lifespan. It's useful for:
- Recent user preferences
- Ongoing tasks and projects
- Temporary notes and reminders
```typescript
interface ShortTermMemory {
  userId: string;
  entries: MemoryEntry[];
  expiresAt: Date;
}

interface MemoryEntry {
  id: string;
  content: string;
  type: 'preference' | 'task' | 'note' | 'context';
  createdAt: Date;
  accessedAt: Date;
  importance: number;
}

class ShortTermMemoryStore {
  private store: Map<string, ShortTermMemory> = new Map();

  async addEntry(
    userId: string,
    entry: Omit<MemoryEntry, 'id' | 'createdAt' | 'accessedAt'>
  ): Promise<string> {
    const memory = this.store.get(userId) ?? this.createMemory(userId);
    const fullEntry: MemoryEntry = {
      ...entry,
      id: generateId(), // assume a UUID helper is available
      createdAt: new Date(),
      accessedAt: new Date()
    };
    memory.entries.push(fullEntry);
    this.enforceLimit(memory);
    return fullEntry.id;
  }

  async getRelevant(userId: string, query: string, limit: number = 10): Promise<MemoryEntry[]> {
    const memory = this.store.get(userId);
    if (!memory) return [];
    // Score entries by relevance and recency
    const scored = memory.entries.map(entry => ({
      entry,
      score: this.calculateRelevance(entry, query)
    }));
    scored.sort((a, b) => b.score - a.score);
    // Update access times for retrieved entries
    const results = scored.slice(0, limit).map(s => s.entry);
    results.forEach(entry => entry.accessedAt = new Date());
    return results;
  }

  private createMemory(userId: string): ShortTermMemory {
    const memory: ShortTermMemory = {
      userId,
      entries: [],
      // Expire after seven days unless refreshed
      expiresAt: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000)
    };
    this.store.set(userId, memory);
    return memory;
  }

  private calculateRelevance(entry: MemoryEntry, query: string): number {
    let score = 0;
    // Text similarity (simplified word overlap)
    const queryWords = query.toLowerCase().split(/\s+/);
    const entryWords = entry.content.toLowerCase().split(/\s+/);
    const overlap = queryWords.filter(w => entryWords.includes(w)).length;
    score += overlap * 10;
    // Recency bonus
    const hoursSinceAccess = (Date.now() - entry.accessedAt.getTime()) / (1000 * 60 * 60);
    score += Math.max(0, 50 - hoursSinceAccess);
    // Importance multiplier
    score *= entry.importance;
    return score;
  }

  private enforceLimit(memory: ShortTermMemory): void {
    const MAX_ENTRIES = 100;
    if (memory.entries.length > MAX_ENTRIES) {
      // Keep the most important, most recently accessed entries
      memory.entries.sort((a, b) => {
        const aScore = a.importance + a.accessedAt.getTime() / 1e12;
        const bScore = b.importance + b.accessedAt.getTime() / 1e12;
        return bScore - aScore;
      });
      memory.entries = memory.entries.slice(0, MAX_ENTRIES);
    }
  }
}
```
Long-Term Memory
Long-term memory stores information indefinitely. It's the foundation of a truly persistent assistant:
- User profiles and preferences
- Historical conversations (summarized)
- Learned facts and relationships
- Skills and procedures
```typescript
interface LongTermMemory {
  userId: string;
  profile: UserProfile;
  facts: Fact[];
  conversations: ConversationSummary[];
  relationships: Relationship[];
}

interface Fact {
  id: string;
  subject: string;
  predicate: string;
  object: string;
  confidence: number;
  source: string;
  createdAt: Date;
  updatedAt: Date;
}

interface ConversationSummary {
  id: string;
  date: Date;
  topics: string[];
  keyPoints: string[];
  decisions: string[];
  followUps: string[];
}

class LongTermMemoryStore {
  private db: Database;

  async storeFact(userId: string, fact: Omit<Fact, 'id' | 'createdAt' | 'updatedAt'>): Promise<void> {
    const existing = await this.findExistingFact(userId, fact);
    if (existing) {
      // Reinforce: bump confidence and timestamp
      await this.db.facts.update(existing.id, {
        confidence: Math.min(1, existing.confidence + 0.1),
        updatedAt: new Date()
      });
    } else {
      await this.db.facts.insert({
        ...fact,
        id: generateId(),
        userId,
        createdAt: new Date(),
        updatedAt: new Date()
      });
    }
  }

  async queryFacts(userId: string, query: string): Promise<Fact[]> {
    // Use embedding-based search for semantic matching
    const embedding = await this.getEmbedding(query);
    return this.db.facts.vectorSearch({
      userId,
      embedding,
      limit: 20,
      minConfidence: 0.5
    });
  }

  async summarizeConversation(conversation: Message[]): Promise<ConversationSummary> {
    // Use Claude to generate a structured summary; the prompt must ask for
    // JSON explicitly, since the result is parsed below.
    const summaryPrompt = `Summarize this conversation as JSON with the keys
"topics", "keyPoints", "decisions", and "followUps" (each an array of strings).

Conversation:
${conversation.map(m => `${m.role}: ${m.content}`).join('\n')}`;
    const summary = await claude.complete(summaryPrompt);
    return JSON.parse(summary);
  }
}
```
Memory Retrieval Strategies
Having stored memories is useless without effective retrieval. The challenge is finding the right memories at the right time.
Keyword-Based Retrieval
The simplest approach matches keywords between the current query and stored memories:
```typescript
// Simple whitespace/punctuation tokenizer; swap in something smarter as needed.
const tokenize = (text: string): string[] => text.split(/\W+/).filter(Boolean);

function keywordRetrieval(query: string, memories: MemoryEntry[]): MemoryEntry[] {
  const queryTokens = tokenize(query.toLowerCase());
  return memories
    .map(memory => {
      const memoryTokens = tokenize(memory.content.toLowerCase());
      const overlap = queryTokens.filter(t => memoryTokens.includes(t));
      return { memory, score: overlap.length / queryTokens.length };
    })
    .filter(result => result.score > 0.1)
    .sort((a, b) => b.score - a.score)
    .map(result => result.memory);
}
```
Semantic Retrieval with Embeddings
For better semantic matching, use embedding vectors:
```typescript
import { EmbeddingModel } from './embeddings';

class SemanticMemoryRetriever {
  private embedder: EmbeddingModel;
  private vectorStore: VectorStore;

  async indexMemory(memory: MemoryEntry): Promise<void> {
    const embedding = await this.embedder.embed(memory.content);
    await this.vectorStore.insert({
      id: memory.id,
      vector: embedding,
      metadata: {
        type: memory.type,
        createdAt: memory.createdAt
      }
    });
  }

  async retrieve(query: string, options: RetrievalOptions = {}): Promise<MemoryEntry[]> {
    const queryEmbedding = await this.embedder.embed(query);
    const results = await this.vectorStore.search({
      vector: queryEmbedding,
      limit: options.limit ?? 10,
      threshold: options.threshold ?? 0.7,
      filter: options.filter
    });
    // loadMemory is async, so resolve all lookups before returning
    return Promise.all(results.map(r => this.loadMemory(r.id)));
  }
}
```
Hybrid Retrieval
Combine multiple strategies for best results:
```typescript
class HybridRetriever {
  async retrieve(query: string, memories: MemoryEntry[]): Promise<MemoryEntry[]> {
    // Get results from multiple strategies
    const keywordResults = await this.keywordRetrieval(query, memories);
    const semanticResults = await this.semanticRetrieval(query);
    const recencyResults = await this.recencyRetrieval(memories);

    // Combine and deduplicate, weighting each strategy
    const combined = new Map<string, { memory: MemoryEntry; score: number }>();

    const merge = (results: MemoryEntry[], weight: number) => {
      results.forEach((m, i) => {
        // Rank-based score: earlier results in a strategy score higher
        const score = weight * (results.length - i) / results.length;
        const existing = combined.get(m.id);
        if (existing) {
          existing.score += score;
        } else {
          combined.set(m.id, { memory: m, score });
        }
      });
    };

    merge(keywordResults, 1.0);
    merge(semanticResults, 0.5);
    merge(recencyResults, 0.25);

    // Sort by combined score
    return Array.from(combined.values())
      .sort((a, b) => b.score - a.score)
      .map(r => r.memory);
  }
}
```
Memory Injection Patterns
Once you've retrieved relevant memories, you need to inject them into the conversation effectively.
System Prompt Injection
Include memories in the system prompt:
```typescript
function buildSystemPrompt(basePrompt: string, memories: MemoryEntry[]): string {
  const memorySection = memories.length > 0
    ? `\n\n## Relevant Memories\n${memories.map(m => `- ${m.content}`).join('\n')}`
    : '';
  return basePrompt + memorySection;
}
```
Context Window Management
Carefully manage what goes into the context window:
```typescript
class ContextWindowManager {
  private readonly maxTokens: number;
  private readonly reservedForResponse: number = 4000;

  constructor(maxTokens: number = 200000) {
    this.maxTokens = maxTokens;
  }

  buildContext(components: ContextComponent[]): string {
    const availableTokens = this.maxTokens - this.reservedForResponse;
    // Sort by priority, highest first
    const sorted = [...components].sort((a, b) => b.priority - a.priority);
    let usedTokens = 0;
    const included: ContextComponent[] = [];
    for (const component of sorted) {
      const tokens = this.countTokens(component.content);
      if (usedTokens + tokens <= availableTokens) {
        included.push(component);
        usedTokens += tokens;
      } else if (component.required) {
        // Try to summarize required components that don't fit
        const summarized = this.summarize(component.content, availableTokens - usedTokens);
        included.push({ ...component, content: summarized });
        usedTokens += this.countTokens(summarized);
      }
    }
    return included.map(c => c.content).join('\n\n');
  }
}
```
Conversation Summarization
When history gets too long, summarize older messages:
```typescript
class ConversationCompressor {
  async compress(messages: Message[], targetTokens: number): Promise<Message[]> {
    const currentTokens = this.countTokens(messages);
    if (currentTokens <= targetTokens) {
      return messages;
    }
    // Keep recent messages intact
    const recentCount = Math.min(10, messages.length);
    const recentMessages = messages.slice(-recentCount);
    const olderMessages = messages.slice(0, -recentCount);
    // Summarize older messages into a single system message
    const summary = await this.summarize(olderMessages);
    return [
      {
        role: 'system',
        content: `[Previous conversation summary: ${summary}]`
      },
      ...recentMessages
    ];
  }

  private async summarize(messages: Message[]): Promise<string> {
    const prompt = `Summarize this conversation concisely, preserving:
- Key decisions and agreements
- Important facts mentioned
- Unresolved questions or tasks

${messages.map(m => `${m.role}: ${m.content}`).join('\n')}`;
    return claude.complete(prompt);
  }
}
```
OpenClaw Memory Implementation
OpenClaw provides built-in support for memory through its file system. Here's how to leverage it effectively.
Memory Files
OpenClaw uses markdown files for persistent memory:
```markdown
<!-- memory/MEMORY.md -->
# Long-Term Memory

## User Profile
- Name: John Smith
- Timezone: America/New_York
- Preferences: Prefers concise responses, technical depth

## Key Facts
- Works at TechCorp as a senior engineer
- Main project: E-commerce platform migration
- Prefers Python over JavaScript

## Recent Decisions
- 2026-02-15: Decided to use PostgreSQL for the new database
- 2026-02-18: Scheduled code review for Friday
```
Daily Memory Files
Track day-to-day context:
```markdown
<!-- memory/2026-02-20.md -->
# February 20, 2026

## Morning Session
- Reviewed pull request #234
- Discussed API design for payment module
- User mentioned deadline is March 1st

## Tasks
- [ ] Follow up on payment provider selection
- [x] Send documentation links

## Notes
- User seemed stressed about deadline
- Prefer shorter meetings this week
```
Memory Update Patterns
Have the AI maintain its own memory:
```typescript
const memoryUpdatePrompt = `After each conversation, update your memory files:

1. If you learned new facts about the user, add them to MEMORY.md
2. If there are tasks or follow-ups, add them to today's daily file
3. If preferences were expressed, note them

Be selective - only store information that will be useful in future conversations.`;
```
Advanced Memory Patterns
Episodic Memory
Store specific events and experiences:
```typescript
interface Episode {
  id: string;
  timestamp: Date;
  participants: string[];
  location?: string;
  event: string;
  outcome: string;
  emotions?: string[];
  importance: number;
}

class EpisodicMemory {
  async recordEpisode(episode: Episode): Promise<void> {
    await this.store.insert(episode);
    // Extract durable facts from the episode into the fact store
    const facts = await this.extractFacts(episode);
    for (const fact of facts) {
      await this.factStore.storeFact(fact);
    }
  }

  async recall(cue: string): Promise<Episode[]> {
    // Find episodes matching the cue
    const embedding = await this.embed(cue);
    return this.store.vectorSearch({
      embedding,
      limit: 5
    });
  }
}
```
Procedural Memory
Remember how to do things:
```typescript
interface Procedure {
  id: string;
  name: string;
  description: string;
  steps: string[];
  prerequisites: string[];
  successRate: number;
  lastUsed: Date;
}

class ProceduralMemory {
  async learnProcedure(name: string, steps: string[]): Promise<void> {
    const existing = await this.find(name);
    if (existing) {
      // Update with new steps, keeping what worked
      existing.steps = this.mergeSteps(existing.steps, steps);
      existing.lastUsed = new Date();
      await this.store.update(existing);
    } else {
      await this.store.insert({
        id: generateId(),
        name,
        description: await this.generateDescription(steps),
        steps,
        prerequisites: await this.inferPrerequisites(steps),
        successRate: 1.0,
        lastUsed: new Date()
      });
    }
  }

  async execute(name: string): Promise<ProcedureResult> {
    const procedure = await this.find(name);
    if (!procedure) throw new Error(`Unknown procedure: ${name}`);
    const result = await this.runSteps(procedure.steps);
    // Exponential moving average of success rate
    procedure.successRate = procedure.successRate * 0.9 + (result.success ? 0.1 : 0);
    await this.store.update(procedure);
    return result;
  }
}
```
Associative Memory
Create connections between memories:
```typescript
interface Association {
  sourceId: string;
  targetId: string;
  type: 'causal' | 'temporal' | 'semantic' | 'emotional';
  strength: number;
}

class AssociativeMemory {
  async associate(memory1: string, memory2: string, type: Association['type']): Promise<void> {
    const existing = await this.findAssociation(memory1, memory2);
    if (existing) {
      existing.strength = Math.min(1, existing.strength + 0.1);
      await this.store.update(existing);
    } else {
      await this.store.insert({
        sourceId: memory1,
        targetId: memory2,
        type,
        strength: 0.5
      });
    }
  }

  async spread(startMemoryId: string, depth: number = 2): Promise<Memory[]> {
    const visited = new Set<string>();
    const results: Memory[] = [];

    // Arrow function so `this` still refers to the AssociativeMemory instance
    const explore = async (memoryId: string, currentDepth: number): Promise<void> => {
      if (currentDepth > depth || visited.has(memoryId)) return;
      visited.add(memoryId);
      const memory = await this.loadMemory(memoryId);
      results.push(memory);
      const associations = await this.getAssociations(memoryId);
      const strongAssociations = associations.filter(a => a.strength > 0.3);
      for (const assoc of strongAssociations) {
        await explore(assoc.targetId, currentDepth + 1);
      }
    };

    await explore(startMemoryId, 0);
    return results;
  }
}
```
Memory Decay and Forgetting
Not all memories should persist forever. Implement forgetting for better performance:
```typescript
class MemoryDecay {
  async applyDecay(memories: MemoryEntry[]): Promise<void> {
    const now = new Date();
    for (const memory of memories) {
      const daysSinceAccess = (now.getTime() - memory.accessedAt.getTime()) / (1000 * 60 * 60 * 24);
      // Exponential decay based on time, softened by importance;
      // clamp at 1 so importance never grows from decay alone
      const decayRate = 0.95;
      const importanceBonus = memory.importance * 0.1;
      const decayFactor = Math.min(1, Math.pow(decayRate, daysSinceAccess) + importanceBonus);
      memory.importance *= decayFactor;
      // Archive memories that fall below the threshold
      if (memory.importance < 0.1) {
        await this.archive(memory);
      }
    }
  }

  async consolidate(): Promise<void> {
    // Periodically merge similar memories
    const allMemories = await this.store.getAll();
    const clusters = await this.clusterSimilar(allMemories);
    for (const cluster of clusters) {
      if (cluster.length > 1) {
        const merged = await this.mergeMemories(cluster);
        await this.store.replace(cluster.map(m => m.id), merged);
      }
    }
  }
}
```
Conclusion
Building effective memory systems for AI assistants requires thoughtful architecture across multiple layers. From working memory that maintains conversation coherence to long-term storage that preserves user preferences and learned facts, each layer serves a crucial role.
Key principles to remember:
- Layer your memory - Different types of information need different storage and retrieval patterns
- Retrieve smartly - Use hybrid approaches combining keywords, semantics, and recency
- Manage context carefully - The context window is precious real estate
- Let memories decay - Not everything needs to be remembered forever
- Build associations - Connected memories are more useful than isolated facts
With these patterns, you can build AI assistants that truly learn and remember, creating more natural and effective interactions over time.