AI Assistant Memory Patterns: Building Persistent Context Systems
One of the biggest challenges in building effective AI assistants is managing memory. Unlike humans who naturally remember conversations and learn over time, AI models start fresh with each interaction. Implementing robust memory systems transforms a basic chatbot into a truly intelligent assistant that understands context, remembers preferences, and builds on past interactions.
This guide explores the patterns and techniques for building memory systems for AI assistants, with practical implementations for Claude and OpenClaw.
Understanding AI Memory Challenges
Large language models like Claude have a fundamental constraint: the context window. While modern models support large context windows (200K+ tokens), they still have limits. More importantly, nothing persists between sessions unless explicitly saved.
The Context Window Problem
The context window is like a model's short-term memory. Everything the AI knows during a conversation must fit within this window, including:
- The system prompt
- Conversation history
- Relevant documents or data
- Tool definitions and results
As conversations grow longer, you face difficult choices: truncate history (losing context), summarize (losing detail), or selectively include information (risking relevance).
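As a concrete illustration of the truncation option, here is a minimal sketch. The `countTokens` helper is a rough stand-in for a real tokenizer, not an exact count:

```typescript
interface Message { role: string; content: string; }

// Rough stand-in for a real tokenizer; assumes ~4 characters per token.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drop the oldest messages until the history fits the token budget.
function truncateHistory(messages: Message[], maxTokens: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  // Walk from newest to oldest so recent context survives.
  for (let i = messages.length - 1; i >= 0; i--) {
    const tokens = countTokens(messages[i].content);
    if (used + tokens > maxTokens) break;
    kept.unshift(messages[i]);
    used += tokens;
  }
  return kept;
}
```

The trade-off is visible in the loop: whatever falls outside the budget is gone, which is exactly the "losing context" problem described above.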
Session Isolation
Each API call to Claude is stateless. The model doesn't remember previous conversations unless you explicitly include that history in your request. This means:
- User preferences are forgotten between sessions
- Learned patterns don't persist
- Relationships must be re-established each time
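Because each call is stateless, the application has to replay relevant history itself on every request. A minimal sketch of that pattern — `callModel` here is a hypothetical stand-in for whatever model API you use, not a real SDK call:

```typescript
interface Message { role: 'system' | 'user' | 'assistant'; content: string; }

// Hypothetical stand-in for a real model API call; echoes for demonstration.
async function callModel(messages: Message[]): Promise<string> {
  return `(${messages.length} messages seen)`;
}

const history: Message[] = [];

async function chat(userInput: string): Promise<string> {
  history.push({ role: 'user', content: userInput });
  // The full history is sent on every call; the model itself keeps nothing.
  const reply = await callModel([
    { role: 'system', content: 'You are a helpful assistant.' },
    ...history,
  ]);
  history.push({ role: 'assistant', content: reply });
  return reply;
}
```

If the `history` array is never persisted, everything in it vanishes when the process exits — which is precisely why the explicit memory systems below exist.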
The Solution: Explicit Memory Systems
To create AI assistants with real memory, we must build explicit systems that:
- Capture important information during conversations
- Store it persistently outside the model
- Retrieve relevant memories when needed
- Inject them into future conversations
Let's explore the patterns for achieving this.
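These four steps form a loop around every model call. A high-level sketch of that loop, with the store and the retrieval step left as deliberately naive in-memory placeholders:

```typescript
// Minimal in-memory store standing in for a real persistence layer.
const memoryStore: string[] = [];

// 1. Capture important information during the conversation.
function capture(note: string): void {
  memoryStore.push(note);
}

// 3. Retrieve relevant memories (naive: share a word with the query).
function retrieve(query: string): string[] {
  const words = query.toLowerCase().split(/\s+/);
  return memoryStore.filter(m =>
    words.some(w => m.toLowerCase().includes(w))
  );
}

// 4. Inject retrieved memories into the next prompt.
function inject(basePrompt: string, query: string): string {
  const memories = retrieve(query);
  return memories.length > 0
    ? `${basePrompt}\n\nRelevant memories:\n${memories.map(m => `- ${m}`).join('\n')}`
    : basePrompt;
}
```

Step 2, persistent storage, is the array here; in practice it becomes a file, database, or vector store, as the layers below show.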
Memory Architecture Layers
Effective AI memory systems typically have multiple layers, each serving different purposes.
Working Memory
Working memory holds information relevant to the current conversation. It's temporary but essential for maintaining coherent multi-turn dialogues.
```typescript
interface WorkingMemory {
  conversationId: string;
  messages: Message[];
  activeContext: string[];
  pendingActions: Action[];
  scratchpad: Record<string, unknown>;
}

class WorkingMemoryManager {
  private memory: Map<string, WorkingMemory> = new Map();

  createSession(conversationId: string): WorkingMemory {
    const session: WorkingMemory = {
      conversationId,
      messages: [],
      activeContext: [],
      pendingActions: [],
      scratchpad: {}
    };
    this.memory.set(conversationId, session);
    return session;
  }

  addMessage(conversationId: string, message: Message): void {
    const session = this.memory.get(conversationId);
    if (session) {
      session.messages.push(message);
      this.pruneIfNeeded(session);
    }
  }

  private pruneIfNeeded(session: WorkingMemory): void {
    const MAX_MESSAGES = 50;
    if (session.messages.length > MAX_MESSAGES) {
      // Keep all system messages, then fill the remaining budget
      // with the most recent non-system messages.
      const systemMessages = session.messages.filter(m => m.role === 'system');
      const otherMessages = session.messages.filter(m => m.role !== 'system');
      const recentMessages = otherMessages.slice(-(MAX_MESSAGES - systemMessages.length));
      session.messages = [...systemMessages, ...recentMessages];
    }
  }
}
```
Short-Term Memory
Short-term memory persists across sessions but has a limited lifespan. It's useful for:
- Recent user preferences
- Ongoing tasks and projects
- Temporary notes and reminders
```typescript
interface ShortTermMemory {
  userId: string;
  entries: MemoryEntry[];
  expiresAt: Date;
}

interface MemoryEntry {
  id: string;
  content: string;
  type: 'preference' | 'task' | 'note' | 'context';
  createdAt: Date;
  accessedAt: Date;
  importance: number;
}

class ShortTermMemoryStore {
  private store: Map<string, ShortTermMemory> = new Map();

  async addEntry(
    userId: string,
    entry: Omit<MemoryEntry, 'id' | 'createdAt' | 'accessedAt'>
  ): Promise<string> {
    const memory = this.store.get(userId) ?? this.createMemory(userId);
    const fullEntry: MemoryEntry = {
      ...entry,
      id: generateId(), // assume a UUID helper is available
      createdAt: new Date(),
      accessedAt: new Date()
    };
    memory.entries.push(fullEntry);
    this.enforceLimit(memory);
    return fullEntry.id;
  }

  async getRelevant(userId: string, query: string, limit: number = 10): Promise<MemoryEntry[]> {
    const memory = this.store.get(userId);
    if (!memory) return [];
    // Score entries by relevance and recency
    const scored = memory.entries.map(entry => ({
      entry,
      score: this.calculateRelevance(entry, query)
    }));
    scored.sort((a, b) => b.score - a.score);
    // Update access times for retrieved entries
    const results = scored.slice(0, limit).map(s => s.entry);
    results.forEach(entry => entry.accessedAt = new Date());
    return results;
  }

  private createMemory(userId: string): ShortTermMemory {
    const memory: ShortTermMemory = {
      userId,
      entries: [],
      // Expire after seven days unless refreshed
      expiresAt: new Date(Date.now() + 7 * 24 * 60 * 60 * 1000)
    };
    this.store.set(userId, memory);
    return memory;
  }

  private calculateRelevance(entry: MemoryEntry, query: string): number {
    let score = 0;
    // Text similarity (simplified word overlap)
    const queryWords = query.toLowerCase().split(/\s+/);
    const entryWords = entry.content.toLowerCase().split(/\s+/);
    const overlap = queryWords.filter(w => entryWords.includes(w)).length;
    score += overlap * 10;
    // Recency bonus
    const hoursSinceAccess = (Date.now() - entry.accessedAt.getTime()) / (1000 * 60 * 60);
    score += Math.max(0, 50 - hoursSinceAccess);
    // Importance multiplier
    score *= entry.importance;
    return score;
  }

  private enforceLimit(memory: ShortTermMemory): void {
    const MAX_ENTRIES = 100;
    if (memory.entries.length > MAX_ENTRIES) {
      // Keep the most important, most recently accessed entries
      memory.entries.sort((a, b) => {
        const aScore = a.importance + a.accessedAt.getTime() / 1e12;
        const bScore = b.importance + b.accessedAt.getTime() / 1e12;
        return bScore - aScore;
      });
      memory.entries = memory.entries.slice(0, MAX_ENTRIES);
    }
  }
}
```
Long-Term Memory
Long-term memory stores information indefinitely. It's the foundation of a truly persistent assistant:
- User profiles and preferences
- Historical conversations (summarized)
- Learned facts and relationships
- Skills and procedures
```typescript
interface LongTermMemory {
  userId: string;
  profile: UserProfile;
  facts: Fact[];
  conversations: ConversationSummary[];
  relationships: Relationship[];
}

interface Fact {
  id: string;
  subject: string;
  predicate: string;
  object: string;
  confidence: number;
  source: string;
  createdAt: Date;
  updatedAt: Date;
}

interface ConversationSummary {
  id: string;
  date: Date;
  topics: string[];
  keyPoints: string[];
  decisions: string[];
  followUps: string[];
}

class LongTermMemoryStore {
  private db: Database;

  async storeFact(userId: string, fact: Omit<Fact, 'id' | 'createdAt' | 'updatedAt'>): Promise<void> {
    const existing = await this.findExistingFact(userId, fact);
    if (existing) {
      // Reinforce: bump confidence and timestamp
      await this.db.facts.update(existing.id, {
        confidence: Math.min(1, existing.confidence + 0.1),
        updatedAt: new Date()
      });
    } else {
      await this.db.facts.insert({
        ...fact,
        id: generateId(),
        userId,
        createdAt: new Date(),
        updatedAt: new Date()
      });
    }
  }

  async queryFacts(userId: string, query: string): Promise<Fact[]> {
    // Use embedding-based search for semantic matching
    const embedding = await this.getEmbedding(query);
    return this.db.facts.vectorSearch({
      userId,
      embedding,
      limit: 20,
      minConfidence: 0.5
    });
  }

  async summarizeConversation(conversation: Message[]): Promise<ConversationSummary> {
    // Use Claude to generate a structured summary; the prompt must ask for
    // JSON explicitly, since the result is parsed below.
    const summaryPrompt = `Summarize this conversation as JSON with the keys
"topics", "keyPoints", "decisions", and "followUps" (each an array of strings).

Conversation:
${conversation.map(m => `${m.role}: ${m.content}`).join('\n')}`;
    const summary = await claude.complete(summaryPrompt);
    return JSON.parse(summary);
  }
}
```
Memory Retrieval Strategies
Having stored memories is useless without effective retrieval. The challenge is finding the right memories at the right time.
Keyword-Based Retrieval
The simplest approach matches keywords between the current query and stored memories:
```typescript
// Simple whitespace/punctuation tokenizer; swap in something smarter as needed.
const tokenize = (text: string): string[] => text.split(/\W+/).filter(Boolean);

function keywordRetrieval(query: string, memories: MemoryEntry[]): MemoryEntry[] {
  const queryTokens = tokenize(query.toLowerCase());
  return memories
    .map(memory => {
      const memoryTokens = tokenize(memory.content.toLowerCase());
      const overlap = queryTokens.filter(t => memoryTokens.includes(t));
      return { memory, score: overlap.length / queryTokens.length };
    })
    .filter(result => result.score > 0.1)
    .sort((a, b) => b.score - a.score)
    .map(result => result.memory);
}
```
Semantic Retrieval with Embeddings
For better semantic matching, use embedding vectors:
```typescript
import { EmbeddingModel } from './embeddings';

class SemanticMemoryRetriever {
  private embedder: EmbeddingModel;
  private vectorStore: VectorStore;

  async indexMemory(memory: MemoryEntry): Promise<void> {
    const embedding = await this.embedder.embed(memory.content);
    await this.vectorStore.insert({
      id: memory.id,
      vector: embedding,
      metadata: {
        type: memory.type,
        createdAt: memory.createdAt
      }
    });
  }

  async retrieve(query: string, options: RetrievalOptions = {}): Promise<MemoryEntry[]> {
    const queryEmbedding = await this.embedder.embed(query);
    const results = await this.vectorStore.search({
      vector: queryEmbedding,
      limit: options.limit ?? 10,
      threshold: options.threshold ?? 0.7,
      filter: options.filter
    });
    // loadMemory is async, so resolve all lookups before returning
    return Promise.all(results.map(r => this.loadMemory(r.id)));
  }
}
```
Hybrid Retrieval
Combine multiple strategies for best results:
```typescript
class HybridRetriever {
  async retrieve(query: string, memories: MemoryEntry[]): Promise<MemoryEntry[]> {
    // Get results from multiple strategies
    const keywordResults = await this.keywordRetrieval(query, memories);
    const semanticResults = await this.semanticRetrieval(query);
    const recencyResults = await this.recencyRetrieval(memories);

    // Combine and deduplicate, weighting each strategy
    const combined = new Map<string, { memory: MemoryEntry; score: number }>();

    const merge = (results: MemoryEntry[], weight: number) => {
      results.forEach((m, i) => {
        // Rank-based score: earlier results in a strategy score higher
        const score = weight * (results.length - i) / results.length;
        const existing = combined.get(m.id);
        if (existing) {
          existing.score += score;
        } else {
          combined.set(m.id, { memory: m, score });
        }
      });
    };

    merge(keywordResults, 1.0);
    merge(semanticResults, 0.5);
    merge(recencyResults, 0.25);

    // Sort by combined score
    return Array.from(combined.values())
      .sort((a, b) => b.score - a.score)
      .map(r => r.memory);
  }
}
```
Memory Injection Patterns
Once you've retrieved relevant memories, you need to inject them into the conversation effectively.
System Prompt Injection
Include memories in the system prompt:
```typescript
function buildSystemPrompt(basePrompt: string, memories: MemoryEntry[]): string {
  const memorySection = memories.length > 0
    ? `\n\n## Relevant Memories\n${memories.map(m => `- ${m.content}`).join('\n')}`
    : '';
  return basePrompt + memorySection;
}
```
Context Window Management
Carefully manage what goes into the context window:
```typescript
class ContextWindowManager {
  private readonly maxTokens: number;
  private readonly reservedForResponse: number = 4000;

  constructor(maxTokens: number = 200000) {
    this.maxTokens = maxTokens;
  }

  buildContext(components: ContextComponent[]): string {
    const availableTokens = this.maxTokens - this.reservedForResponse;
    // Sort by priority, highest first
    const sorted = [...components].sort((a, b) => b.priority - a.priority);
    let usedTokens = 0;
    const included: ContextComponent[] = [];
    for (const component of sorted) {
      const tokens = this.countTokens(component.content);
      if (usedTokens + tokens <= availableTokens) {
        included.push(component);
        usedTokens += tokens;
      } else if (component.required) {
        // Try to summarize required components that don't fit
        const summarized = this.summarize(component.content, availableTokens - usedTokens);
        included.push({ ...component, content: summarized });
        usedTokens += this.countTokens(summarized);
      }
    }
    return included.map(c => c.content).join('\n\n');
  }
}
```
Conversation Summarization
When history gets too long, summarize older messages:
```typescript
class ConversationCompressor {
  async compress(messages: Message[], targetTokens: number): Promise<Message[]> {
    const currentTokens = this.countTokens(messages);
    if (currentTokens <= targetTokens) {
      return messages;
    }
    // Keep recent messages intact
    const recentCount = Math.min(10, messages.length);
    const recentMessages = messages.slice(-recentCount);
    const olderMessages = messages.slice(0, -recentCount);
    // Summarize older messages into a single system message
    const summary = await this.summarize(olderMessages);
    return [
      {
        role: 'system',
        content: `[Previous conversation summary: ${summary}]`
      },
      ...recentMessages
    ];
  }

  private async summarize(messages: Message[]): Promise<string> {
    const prompt = `Summarize this conversation concisely, preserving:
- Key decisions and agreements
- Important facts mentioned
- Unresolved questions or tasks

${messages.map(m => `${m.role}: ${m.content}`).join('\n')}`;
    return claude.complete(prompt);
  }
}
```
OpenClaw Memory Implementation
OpenClaw provides built-in support for memory through its file system. Here's how to leverage it effectively.
Memory Files
OpenClaw uses markdown files for persistent memory:
```markdown
<!-- memory/MEMORY.md -->
# Long-Term Memory

## User Profile
- Name: John Smith
- Timezone: America/New_York
- Preferences: Prefers concise responses, technical depth

## Key Facts
- Works at TechCorp as a senior engineer
- Main project: E-commerce platform migration
- Prefers Python over JavaScript

## Recent Decisions
- 2026-02-15: Decided to use PostgreSQL for the new database
- 2026-02-18: Scheduled code review for Friday
```
Daily Memory Files
Track day-to-day context:
```markdown
<!-- memory/2026-02-20.md -->
# February 20, 2026

## Morning Session
- Reviewed pull request #234
- Discussed API design for payment module
- User mentioned deadline is March 1st

## Tasks
- [ ] Follow up on payment provider selection
- [x] Send documentation links

## Notes
- User seemed stressed about deadline
- Prefer shorter meetings this week
```
Memory Update Patterns
Have the AI maintain its own memory:
```typescript
const memoryUpdatePrompt = `After each conversation, update your memory files:

1. If you learned new facts about the user, add them to MEMORY.md
2. If there are tasks or follow-ups, add them to today's daily file
3. If preferences were expressed, note them

Be selective - only store information that will be useful in future conversations.`;
```
Advanced Memory Patterns
Episodic Memory
Store specific events and experiences:
```typescript
interface Episode {
  id: string;
  timestamp: Date;
  participants: string[];
  location?: string;
  event: string;
  outcome: string;
  emotions?: string[];
  importance: number;
}

class EpisodicMemory {
  async recordEpisode(episode: Episode): Promise<void> {
    await this.store.insert(episode);
    // Extract durable facts from the episode into the fact store
    const facts = await this.extractFacts(episode);
    for (const fact of facts) {
      await this.factStore.storeFact(fact);
    }
  }

  async recall(cue: string): Promise<Episode[]> {
    // Find episodes matching the cue
    const embedding = await this.embed(cue);
    return this.store.vectorSearch({
      embedding,
      limit: 5
    });
  }
}
```
Procedural Memory
Remember how to do things:
```typescript
interface Procedure {
  id: string;
  name: string;
  description: string;
  steps: string[];
  prerequisites: string[];
  successRate: number;
  lastUsed: Date;
}

class ProceduralMemory {
  async learnProcedure(name: string, steps: string[]): Promise<void> {
    const existing = await this.find(name);
    if (existing) {
      // Update with new steps, keeping what worked
      existing.steps = this.mergeSteps(existing.steps, steps);
      existing.lastUsed = new Date();
      await this.store.update(existing);
    } else {
      await this.store.insert({
        id: generateId(),
        name,
        description: await this.generateDescription(steps),
        steps,
        prerequisites: await this.inferPrerequisites(steps),
        successRate: 1.0,
        lastUsed: new Date()
      });
    }
  }

  async execute(name: string): Promise<ProcedureResult> {
    const procedure = await this.find(name);
    if (!procedure) throw new Error(`Unknown procedure: ${name}`);
    const result = await this.runSteps(procedure.steps);
    // Exponential moving average of success rate
    procedure.successRate = procedure.successRate * 0.9 + (result.success ? 0.1 : 0);
    await this.store.update(procedure);
    return result;
  }
}
```
Associative Memory
Create connections between memories:
```typescript
interface Association {
  sourceId: string;
  targetId: string;
  type: 'causal' | 'temporal' | 'semantic' | 'emotional';
  strength: number;
}

class AssociativeMemory {
  async associate(memory1: string, memory2: string, type: Association['type']): Promise<void> {
    const existing = await this.findAssociation(memory1, memory2);
    if (existing) {
      existing.strength = Math.min(1, existing.strength + 0.1);
      await this.store.update(existing);
    } else {
      await this.store.insert({
        sourceId: memory1,
        targetId: memory2,
        type,
        strength: 0.5
      });
    }
  }

  async spread(startMemoryId: string, depth: number = 2): Promise<Memory[]> {
    const visited = new Set<string>();
    const results: Memory[] = [];

    // Arrow function so `this` still refers to the AssociativeMemory instance
    const explore = async (memoryId: string, currentDepth: number): Promise<void> => {
      if (currentDepth > depth || visited.has(memoryId)) return;
      visited.add(memoryId);
      const memory = await this.loadMemory(memoryId);
      results.push(memory);
      const associations = await this.getAssociations(memoryId);
      const strongAssociations = associations.filter(a => a.strength > 0.3);
      for (const assoc of strongAssociations) {
        await explore(assoc.targetId, currentDepth + 1);
      }
    };

    await explore(startMemoryId, 0);
    return results;
  }
}
```
Memory Decay and Forgetting
Not all memories should persist forever. Implement forgetting for better performance:
```typescript
class MemoryDecay {
  async applyDecay(memories: MemoryEntry[]): Promise<void> {
    const now = new Date();
    for (const memory of memories) {
      const daysSinceAccess = (now.getTime() - memory.accessedAt.getTime()) / (1000 * 60 * 60 * 24);
      // Exponential decay based on time, softened by importance;
      // clamp at 1 so importance never grows from decay alone
      const decayRate = 0.95;
      const importanceBonus = memory.importance * 0.1;
      const decayFactor = Math.min(1, Math.pow(decayRate, daysSinceAccess) + importanceBonus);
      memory.importance *= decayFactor;
      // Archive memories that fall below the threshold
      if (memory.importance < 0.1) {
        await this.archive(memory);
      }
    }
  }

  async consolidate(): Promise<void> {
    // Periodically merge similar memories
    const allMemories = await this.store.getAll();
    const clusters = await this.clusterSimilar(allMemories);
    for (const cluster of clusters) {
      if (cluster.length > 1) {
        const merged = await this.mergeMemories(cluster);
        await this.store.replace(cluster.map(m => m.id), merged);
      }
    }
  }
}
```
Conclusion
Building effective memory systems for AI assistants requires thoughtful architecture across multiple layers. From working memory that maintains conversation coherence to long-term storage that preserves user preferences and learned facts, each layer serves a crucial role.
Key principles to remember:
- Layer your memory - Different types of information need different storage and retrieval patterns
- Retrieve smartly - Use hybrid approaches combining keywords, semantics, and recency
- Manage context carefully - The context window is precious real estate
- Let memories decay - Not everything needs to be remembered forever
- Build associations - Connected memories are more useful than isolated facts
With these patterns, you can build AI assistants that truly learn and remember, creating more natural and effective interactions over time.