Building Memory Systems for AI Chatbots: Beyond Context Windows

Every conversation with Claude starts fresh. The model has no memory of your previous chats, your preferences, or the context you've built up over weeks of interactions. For many applications, this is a fundamental limitation.
But you can build memory systems around Claude that give your chatbot the ability to recall past conversations, learn user preferences, and maintain context across sessions. This guide shows you how.
The Memory Problem
Claude's context window is large—up to 200K tokens—but it's not memory. Once a conversation ends, everything in that context is gone. The next conversation starts from zero.
This creates problems for:
- Customer support bots that should remember past tickets
- Personal assistants that need to learn your preferences
- Educational tools that should track learning progress
- Any application where continuity matters
The solution involves building external memory systems that store, index, and retrieve relevant information to inject into Claude's context at the right time.
Types of Memory
Effective chatbot memory systems implement multiple memory types:
Short-term memory holds the current conversation. This is simply the message history you're already passing to Claude.
Working memory stores active context—current goals, recent decisions, ongoing tasks. This persists across multiple conversations within a session or project.
Long-term memory captures facts, preferences, and important events that should be recalled months later. This requires persistent storage and intelligent retrieval.
Episodic memory remembers specific past conversations or events. "Remember when we discussed the marketing strategy last Tuesday?"
Let's implement each of these.
Short-Term Memory: Conversation History
The simplest memory is your message array. But even here, there are optimizations:
```javascript
class ConversationMemory {
  constructor(maxMessages = 20) {
    this.messages = [];
    this.maxMessages = maxMessages;
  }

  addMessage(role, content) {
    this.messages.push({ role, content });
    // Sliding window to stay within context limits
    if (this.messages.length > this.maxMessages) {
      this.messages = this.messages.slice(-this.maxMessages);
    }
  }

  getContext() {
    return this.messages;
  }
}
```
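If you'd rather bound the window by tokens than by message count, a rough character-based estimate works as a first pass. This sketch assumes roughly 4 characters per token, which is only an approximation; swap in a real tokenizer if you need accuracy:

```javascript
// Rough token-budget trim: assumes ~4 characters per token, which is only
// an approximation -- use a real tokenizer when accuracy matters.
function trimToBudget(messages, maxTokens = 4000) {
  const estimate = (msg) => Math.ceil(msg.content.length / 4);
  const kept = [];
  let used = 0;
  // Walk from newest to oldest, keeping messages until the budget is spent
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimate(messages[i]);
    if (used + cost > maxTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```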
For longer conversations, summarize older messages instead of dropping them:
```javascript
async function summarizeOlderMessages(messages) {
  const toSummarize = messages.slice(0, -10);
  const recent = messages.slice(-10);
  const summary = await anthropic.messages.create({
    model: "claude-3-haiku-20240307",
    max_tokens: 500,
    messages: [{
      role: "user",
      content: `Summarize this conversation in 2-3 sentences,
capturing key points and decisions:
${JSON.stringify(toSummarize)}`
    }]
  });
  // The Messages API has no "system" role inside the messages array, so carry
  // the summary as a user turn (or move it into the top-level system prompt)
  return [
    { role: "user", content: `[Previous conversation summary: ${summary.content[0].text}]` },
    ...recent
  ];
}
```
Working Memory: Active Context
Working memory holds information relevant to the current task or session. Store it in a structured format:
```javascript
class WorkingMemory {
  constructor() {
    this.context = {
      currentGoal: null,
      activeProject: null,
      recentDecisions: [],
      pendingTasks: [],
      lastUpdated: null
    };
  }

  update(key, value) {
    this.context[key] = value;
    this.context.lastUpdated = new Date().toISOString();
  }

  toPrompt() {
    return `Current context:
- Goal: ${this.context.currentGoal || 'None set'}
- Project: ${this.context.activeProject || 'None'}
- Recent decisions: ${this.context.recentDecisions.join(', ') || 'None'}
- Pending tasks: ${this.context.pendingTasks.join(', ') || 'None'}`;
  }
}
```
Inject this into your system prompt so Claude always has access to the current working context.
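As a concrete sketch, prompt assembly can be a pure function that combines a base persona with the working-memory snapshot. The `workingContext` shape here mirrors the `WorkingMemory` class above; the function name is illustrative:

```javascript
// Compose the system prompt from a base persona plus the working-memory
// snapshot. Keeping this a pure function makes it trivial to unit-test.
function buildSystemPrompt(basePersona, workingContext) {
  const lines = [
    basePersona,
    "Current context:",
    `- Goal: ${workingContext.currentGoal || "None set"}`,
    `- Project: ${workingContext.activeProject || "None"}`,
    `- Recent decisions: ${workingContext.recentDecisions.join(", ") || "None"}`,
    `- Pending tasks: ${workingContext.pendingTasks.join(", ") || "None"}`,
  ];
  return lines.join("\n");
}
```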
Long-Term Memory: Vector Storage
Long-term memory requires storing information persistently and retrieving it based on semantic similarity. Vector databases excel at this.
Here's an implementation using Pinecone for storage, with a separate embedding model for vectorization:
```javascript
import { Pinecone } from '@pinecone-database/pinecone';
import Anthropic from '@anthropic-ai/sdk';

class LongTermMemory {
  constructor(userId) {
    this.userId = userId;
    this.pinecone = new Pinecone();
    this.index = this.pinecone.index('chatbot-memory');
    this.anthropic = new Anthropic();
  }

  async store(content, metadata = {}) {
    // Generate embedding using an embedding model
    const embedding = await this.getEmbedding(content);
    await this.index.upsert([{
      id: `${this.userId}-${Date.now()}`,
      values: embedding,
      metadata: {
        userId: this.userId,
        content: content,
        timestamp: new Date().toISOString(),
        ...metadata
      }
    }]);
  }

  async recall(query, topK = 5) {
    const queryEmbedding = await this.getEmbedding(query);
    const results = await this.index.query({
      vector: queryEmbedding,
      topK: topK,
      filter: { userId: this.userId },
      includeMetadata: true
    });
    return results.matches.map(match => ({
      content: match.metadata.content,
      timestamp: match.metadata.timestamp,
      relevance: match.score
    }));
  }

  async getEmbedding(text) {
    // Anthropic doesn't offer an embeddings endpoint, so use OpenAI's
    // embedding model (or a provider like Voyage AI or Cohere)
    const response = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'text-embedding-3-small',
        input: text
      })
    });
    const data = await response.json();
    return data.data[0].embedding;
  }
}
```
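Pinecone handles similarity search server-side, but a local cosine-similarity helper is handy for reranking recalled memories or deduplicating near-identical entries before storing them. A minimal version:

```javascript
// Cosine similarity between two equal-length vectors. Pure math, no API
// calls -- useful for local reranking or deduplication of embeddings.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```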
Automatic Memory Extraction
Don't manually decide what to remember—let Claude extract important information automatically:
```javascript
async extractMemories(conversation) {
  const response = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1000,
    messages: [{
      role: "user",
      content: `Analyze this conversation and extract any facts, preferences,
or important information worth remembering about the user.
Format as JSON: { "memories": [{ "type": "fact|preference|event",
"content": "...", "importance": 1-10 }] }
Conversation: ${JSON.stringify(conversation)}`
    }]
  });

  // response.content is an array of content blocks; the JSON lives in the
  // first block's text
  const memories = JSON.parse(response.content[0].text);
  for (const memory of memories.memories) {
    if (memory.importance >= 7) {
      await this.longTerm.store(memory.content, {
        type: memory.type,
        importance: memory.importance
      });
    }
  }
}
```
Run this after each conversation to build up the user's memory profile.
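One practical caveat: the model's reply is not guaranteed to be bare JSON. It may wrap the payload in a markdown code fence or add commentary. A defensive parse (a hypothetical helper, not part of any SDK) keeps a malformed reply from crashing the pipeline:

```javascript
// Defensively extract the first JSON object from model output, tolerating
// markdown fences and surrounding commentary. Returns null instead of throwing.
function parseModelJson(text) {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end === -1 || end < start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null;
  }
}
```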
Episodic Memory: Conversation Archives
Sometimes you need to recall specific past conversations, not just facts extracted from them:
```javascript
class EpisodicMemory {
  constructor(userId) {
    this.userId = userId;
    // Any store with vector search works here -- e.g. SQLite with the
    // sqlite-vec extension, or Postgres with pgvector
    this.db = new Database('episodes.db');
  }

  async storeEpisode(conversation, summary) {
    await this.db.insert('episodes', {
      userId: this.userId,
      timestamp: new Date().toISOString(),
      summary: summary,
      fullConversation: JSON.stringify(conversation),
      // Reuse the same embedding helper as LongTermMemory
      embedding: await this.getEmbedding(summary)
    });
  }

  async recallEpisode(query) {
    // Semantic search over episode summaries; vector_distance is a
    // placeholder -- substitute your database's vector search syntax
    const episodes = await this.db.query(`
      SELECT summary, timestamp, fullConversation
      FROM episodes
      WHERE userId = ?
      ORDER BY vector_distance(embedding, ?) ASC
      LIMIT 3
    `, [this.userId, await this.getEmbedding(query)]);
    return episodes;
  }
}
```
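Whatever store you use, recalled episodes still need to be folded into the prompt. A small formatting helper (hypothetical, matching the row shape returned by `recallEpisode`) keeps that consistent and prevents one verbose episode from crowding out the rest of the context:

```javascript
// Format recalled episodes for injection into the system prompt. Expects
// rows shaped like EpisodicMemory.recallEpisode's results; truncates long
// summaries so a single episode can't dominate the context.
function formatEpisodes(episodes, maxSummaryChars = 300) {
  if (episodes.length === 0) return "No relevant past conversations.";
  return episodes
    .map(ep => {
      const summary = ep.summary.length > maxSummaryChars
        ? ep.summary.slice(0, maxSummaryChars) + "..."
        : ep.summary;
      return `[${ep.timestamp}] ${summary}`;
    })
    .join("\n");
}
```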
Putting It Together
Here's how all the memory systems work together in a complete chatbot:
```javascript
class MemoryEnhancedChatbot {
  constructor(userId) {
    this.conversation = new ConversationMemory();
    this.working = new WorkingMemory();
    this.longTerm = new LongTermMemory(userId);
    this.episodic = new EpisodicMemory(userId);
    this.anthropic = new Anthropic();
  }

  async chat(userMessage) {
    // Add to conversation memory
    this.conversation.addMessage('user', userMessage);

    // Retrieve relevant long-term memories
    const memories = await this.longTerm.recall(userMessage);

    // Build context-enhanced prompt
    const systemPrompt = `You are a helpful assistant with memory.
${this.working.toPrompt()}
Relevant memories about this user:
${memories.map(m => `- ${m.content}`).join('\n')}`;

    // Get response from Claude
    const response = await this.anthropic.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      system: systemPrompt,
      messages: this.conversation.getContext()
    });

    // Store response in conversation memory
    const assistantMessage = response.content[0].text;
    this.conversation.addMessage('assistant', assistantMessage);

    // Extract and store new memories without blocking the response;
    // catch so a failed extraction can't become an unhandled rejection
    this.extractMemories(this.conversation.messages.slice(-4))
      .catch(err => console.error('Memory extraction failed:', err));

    return assistantMessage;
  }
}
```
Memory Hygiene
Long-term memory needs maintenance:
Decay old memories. Reduce the importance score of memories over time. Delete memories that fall below a threshold.
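One possible decay rule is exponential: halve each memory's importance every N days and expire anything that falls below a threshold. The half-life and threshold values here are illustrative defaults, not recommendations:

```javascript
// Exponential decay: a memory's importance halves every `halfLifeDays`.
// Returns memories to keep (with updated scores) and memories to expire.
function decayMemories(memories, { halfLifeDays = 90, threshold = 2, now = Date.now() } = {}) {
  const keep = [], expire = [];
  for (const m of memories) {
    const ageDays = (now - new Date(m.timestamp).getTime()) / 86_400_000;
    const decayed = m.importance * Math.pow(0.5, ageDays / halfLifeDays);
    if (decayed < threshold) expire.push(m);
    else keep.push({ ...m, importance: decayed });
  }
  return { keep, expire };
}
```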
Consolidate duplicates. When similar memories accumulate, merge them into single, stronger memories.
Respect privacy. Let users view, edit, and delete their memories. Build memory management into your UI.
```javascript
async consolidateMemories() {
  const allMemories = await this.longTerm.getAllMemories();
  // Cluster similar memories (clusterBySimilarity and mergeMemories are
  // sketched helpers -- e.g. cluster by embedding distance, then merge
  // with a summarization call to Claude)
  const clusters = await this.clusterBySimilarity(allMemories);
  for (const cluster of clusters) {
    if (cluster.length > 1) {
      // Merge cluster into single memory
      const merged = await this.mergeMemories(cluster);
      await this.longTerm.store(merged);
      // Delete originals
      for (const memory of cluster) {
        await this.longTerm.delete(memory.id);
      }
    }
  }
}
```
Production Considerations
When building memory systems for real users:
- Encrypt stored memories with user-specific keys
- Implement GDPR compliance—users can export and delete their data
- Set memory quotas to prevent storage bloat
- Cache frequent recalls to reduce database load
- Test recall quality with diverse queries
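For the caching point, even a small in-process TTL cache keyed by the query string cuts repeated vector lookups. This sketch assumes a single-process deployment; use Redis or similar when running multiple instances:

```javascript
// Minimal TTL cache for recall results, keyed by query string.
// Entries older than ttlMs are treated as misses.
class RecallCache {
  constructor(ttlMs = 60_000) {
    this.ttlMs = ttlMs;
    this.entries = new Map();
  }

  get(query, now = Date.now()) {
    const entry = this.entries.get(query);
    if (!entry || now - entry.storedAt > this.ttlMs) return null;
    return entry.value;
  }

  set(query, value, now = Date.now()) {
    this.entries.set(query, { value, storedAt: now });
  }
}
```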
Building memory into AI chatbots transforms them from stateless tools into persistent assistants that grow with their users. Start with simple conversation history, add long-term storage for key facts, and evolve toward full episodic memory as your application matures.
The result is an AI that doesn't just respond—it remembers.