docs: add conversational memory design document

Outlines persistent memory system for making the bot a real conversational
participant that knows people and remembers past interactions. Uses existing
UserNotes column for permanent profiles and a new UserMemory table for
expiring context with LLM-assigned lifetimes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed 2026-02-26 12:41:28 -05:00 · commit d652c32063 (parent 196f8c8ae5)

# Conversational Memory Design
## Goal
Make the bot a real conversational participant that knows people, remembers past interactions, can answer general questions, and gives input based on accumulated context. People should be able to ask it questions and get thoughtful answers informed by who they are and what's happened before.
## Design Decisions
- **Memory approach**: Structured memory tables in existing MSSQL database
- **Learning mode**: Both passive (observing chat via sentiment analysis) and active (direct conversations)
- **Knowledge scope**: General knowledge + server/people awareness (no web search)
- **Permanent memory**: Stored in existing `UserState.UserNotes` column (repurposed as LLM-maintained profile)
- **Expiring memory**: New `UserMemory` table for transient context with LLM-assigned expiration
## Database Changes
### Repurposed: `UserState.UserNotes`
No schema change needed. The column already exists as `NVARCHAR(MAX)` and currently stores up to 10 timestamped observation lines. It will be repurposed as an LLM-maintained **permanent profile summary** — a compact paragraph of durable facts about a user.
Example content:
```
GTA Online grinder (rank 400+, wants to hit 500), sarcastic humor, works night shifts, hates battle royales. Has a dog named Rex. Banters with the bot, usually tries to get roasted. Been in the server since early 2024.
```
The LLM rewrites this field as a whole when new permanent facts emerge, rather than appending timestamped lines.
### New Table: `UserMemory`
Stores expiring memories — transient context that's relevant for days or weeks but not forever.
```sql
CREATE TABLE UserMemory (
    Id         BIGINT IDENTITY(1,1) PRIMARY KEY,
    UserId     BIGINT NOT NULL,
    Memory     NVARCHAR(500) NOT NULL,
    Topics     NVARCHAR(200) NOT NULL,  -- comma-separated tags
    Importance NVARCHAR(10) NOT NULL,   -- low, medium, high
    ExpiresAt  DATETIME2 NOT NULL,
    Source     NVARCHAR(20) NOT NULL,   -- 'chat' or 'passive'
    CreatedAt  DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    INDEX IX_UserMemory_UserId (UserId),
    INDEX IX_UserMemory_ExpiresAt (ExpiresAt)
)
```
Example rows:
| Memory | Topics | Importance | ExpiresAt | Source |
|--------|--------|------------|-----------|--------|
| Frustrated about losing ranked matches in Warzone | warzone,fps,frustration | medium | +7d | passive |
| Said they're quitting Warzone for good | warzone,fps | high | +30d | chat |
| Drunk tonight, celebrating Friday | personal,celebration | low | +1d | chat |
| Excited about GTA DLC dropping next week | gta,dlc | medium | +7d | passive |
## Memory Extraction
### From Direct Conversations (ChatCog)
After the bot sends a chat reply, a **fire-and-forget background task** calls the triage LLM to extract memories from the conversation. This does not block the reply.
New LLM tool definition:
```python
MEMORY_EXTRACTION_TOOL = {
    "type": "function",
    "function": {
        "name": "extract_memories",
        "parameters": {
            "type": "object",
            "properties": {
                "memories": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "memory": {
                                "type": "string",
                                "description": "A concise fact or observation worth remembering."
                            },
                            "topics": {
                                "type": "array",
                                "items": {"type": "string"},
                                "description": "Topic tags for retrieval (e.g., 'gta', 'personal', 'warzone')."
                            },
                            "expiration": {
                                "type": "string",
                                "enum": ["1d", "3d", "7d", "30d", "permanent"],
                                "description": "How long this memory stays relevant. Use 'permanent' for stable facts about the person."
                            },
                            "importance": {
                                "type": "string",
                                "enum": ["low", "medium", "high"],
                                "description": "How important this memory is for future interactions."
                            }
                        },
                        "required": ["memory", "topics", "expiration", "importance"]
                    },
                    "description": "Memories to store. Only include genuinely new or noteworthy information."
                },
                "profile_update": {
                    "type": ["string", "null"],
                    "description": "If a permanent fact was learned, provide the full updated profile summary incorporating the new info. Null if no profile changes needed."
                }
            },
            "required": ["memories"]
        }
    }
}
```
The extraction prompt receives:
- The conversation that just happened (from `_chat_history`)
- The user's current profile (`UserNotes`)
- Instructions to only extract genuinely new information
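A sketch of how that request payload could be assembled. The prompt text and history shape are assumptions; `MEMORY_EXTRACTION_PROMPT` is the constant proposed in the integration table:

```python
# Abbreviated placeholder; the real system prompt would be longer.
MEMORY_EXTRACTION_PROMPT = "You extract memories from Discord conversations..."

def build_extraction_messages(history: list[dict], profile: str) -> list[dict]:
    """Build the chat-completions payload for the background extraction call.
    history entries are assumed to be {'author': ..., 'content': ...} dicts."""
    convo = "\n".join(f"{t['author']}: {t['content']}" for t in history)
    return [
        {"role": "system", "content": MEMORY_EXTRACTION_PROMPT},
        {"role": "user", "content": (
            f"Current profile:\n{profile or '(none)'}\n\n"
            f"Conversation:\n{convo}\n\n"
            "Extract only genuinely new information."
        )},
    ]
```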
### From Passive Observation (SentimentCog)
The existing `note_update` field from analysis results currently feeds `DramaTracker.update_user_notes()`. This will be enhanced:
- If `note_update` contains a durable fact (the LLM can flag this), rewrite the `UserNotes` profile
- If it is a transient observation, insert it into `UserMemory` with a 7d default expiration
- The analysis tool's `note_update` field description is updated so the LLM indicates whether the note is permanent or transient
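The routing decision itself is small. In this sketch, `is_permanent` stands in for the new flag the analysis tool would gain, and the helper name is illustrative:

```python
from datetime import datetime, timedelta, timezone

def route_note_update(user_id: int, note: str, is_permanent: bool) -> tuple:
    """Send durable facts toward the UserNotes profile rewrite,
    transient observations into UserMemory with the 7d default."""
    if is_permanent:
        # Real code: feed into the LLM profile rewrite for UserNotes.
        return ("profile", user_id, note)
    expires = datetime.now(timezone.utc) + timedelta(days=7)
    return ("memory", user_id, note, expires)
```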
## Memory Retrieval at Chat Time
When building context for a chat reply, memories are pulled in layers and injected as a structured block:
### Layer 1: Profile (always included)
```python
profile = user_state.user_notes # permanent profile summary
```
### Layer 2: Recent Expiring Memories (last 5 by CreatedAt)
```sql
SELECT TOP 5 Memory, Topics, CreatedAt
FROM UserMemory
WHERE UserId = ? AND ExpiresAt > SYSUTCDATETIME()
ORDER BY CreatedAt DESC
```
### Layer 3: Topic-Matched Memories
Extract keywords from the current message, match against `Topics` column:
```sql
SELECT TOP 5 Memory, Topics, CreatedAt
FROM UserMemory
WHERE UserId = ? AND ExpiresAt > SYSUTCDATETIME()
  AND (Topics LIKE '%gta%' OR Topics LIKE '%warzone%') -- dynamic from message keywords
ORDER BY
    CASE Importance WHEN 'high' THEN 3 WHEN 'medium' THEN 2 ELSE 1 END DESC,
    CreatedAt DESC
```
```
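Since the topic filter is built from user-derived keywords, it should use parameter placeholders rather than interpolating keywords into the SQL text. A sketch with pyodbc-style `?` parameters (helper name illustrative), ranking importance via a `CASE` so that 'high' sorts first:

```python
def topic_query(user_id: int, keywords: list[str]) -> tuple[str, list]:
    """Return (sql, params) for the topic-matched memory lookup.
    One LIKE placeholder per keyword; keywords never touch the SQL text."""
    likes = " OR ".join("Topics LIKE ?" for _ in keywords)
    sql = (
        "SELECT TOP 5 Memory, Topics, CreatedAt FROM UserMemory "
        "WHERE UserId = ? AND ExpiresAt > SYSUTCDATETIME() "
        f"AND ({likes}) "
        "ORDER BY CASE Importance WHEN 'high' THEN 3 "
        "WHEN 'medium' THEN 2 ELSE 1 END DESC, CreatedAt DESC"
    )
    return sql, [user_id] + [f"%{kw}%" for kw in keywords]
```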
### Layer 4: Channel Bias
If in a game channel (e.g., `#gta-online`), add the game name as a topic filter to boost relevant memories.
### Injected Context Format
```
[What you know about {username}:]
Profile: GTA grinder (rank 400+), sarcastic, works night shifts, hates BRs. Banters with the bot.
Recent: Said they're quitting Warzone (2 days ago) | Excited about GTA DLC (yesterday)
Relevant: Mentioned trying to hit rank 500 in GTA (3 weeks ago)
```
Target: ~200-350 tokens of memory context per chat interaction (see Token Budget).
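Assembling the block from the layers is straightforward string work. A sketch, assuming the layer helpers return `(memory_text, age_description)` pairs:

```python
def build_memory_block(username: str, profile: str,
                       recent: list[tuple[str, str]],
                       relevant: list[tuple[str, str]]) -> str:
    """Render the injected context block; empty layers are omitted entirely."""
    lines = [f"[What you know about {username}:]"]
    if profile:
        lines.append(f"Profile: {profile}")
    if recent:
        lines.append("Recent: " + " | ".join(f"{m} ({age})" for m, age in recent))
    if relevant:
        lines.append("Relevant: " + " | ".join(f"{m} ({age})" for m, age in relevant))
    return "\n".join(lines)
```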
## Memory Maintenance
### Pruning (daily background task)
```sql
DELETE FROM UserMemory WHERE ExpiresAt < SYSUTCDATETIME()
```
Also enforce a per-user cap (50 memories). When exceeded, delete oldest low-importance memories first:
```sql
-- Delete excess memories beyond the cap. Rows are ranked so high-importance,
-- recent memories sort first; everything past position 50 is deleted, so the
-- oldest low-importance memories go first.
DELETE FROM UserMemory
WHERE Id IN (
    SELECT Id FROM UserMemory
    WHERE UserId = ?
    ORDER BY
        CASE Importance WHEN 'high' THEN 3 WHEN 'medium' THEN 2 ELSE 1 END DESC,
        CreatedAt DESC
    OFFSET 50 ROWS
)
```
### Profile Consolidation
When a `permanent` memory is extracted, the LLM provides an updated `profile_update` string that incorporates the new fact into the existing profile. This replaces `UserNotes` directly — no separate consolidation task needed.
## Integration Changes
| File | Changes |
|------|---------|
| `utils/database.py` | Add `UserMemory` table creation in schema. Add CRUD: `save_memory()`, `get_recent_memories()`, `get_memories_by_topics()`, `prune_expired_memories()`, `prune_excess_memories()`. Update `save_user_state()` (no schema change needed). |
| `utils/llm_client.py` | Add `extract_memories()` method with `MEMORY_EXTRACTION_TOOL`. Add `MEMORY_EXTRACTION_PROMPT` for the extraction system prompt. |
| `utils/drama_tracker.py` | `update_user_notes()` changes from appending timestamped lines to replacing the full profile string when a profile update is provided. Keep backward compat for non-profile note_updates during transition. |
| `cogs/chat.py` | At chat time: query DB for memories, build memory context block, inject into prompt. After reply: fire-and-forget memory extraction task. |
| `cogs/sentiment/` | Route `note_update` from analysis into `UserMemory` table (expiring) or `UserNotes` profile update (permanent). |
| `bot.py` | Start daily memory pruning background task on bot ready. |
## What Stays the Same
- In-memory `_chat_history` deque (10 turns per channel) for immediate conversation coherence
- All existing moderation/analysis logic
- Mode system and personality prompts (memory context is additive)
- `UserState` table schema (no changes)
- Existing DramaTracker hydration flow
## Token Budget
Per chat interaction:
- Profile summary: ~50-100 tokens
- Recent memories (5): ~75-125 tokens
- Topic-matched memories (5): ~75-125 tokens
- **Total memory context: ~200-350 tokens**
Memory extraction call (background, triage model): ~500 input tokens, ~200 output tokens per conversation.