How Memory Works
Memory in QUI is not just chat history — it's a semantic memory system that stores, organizes, and retrieves knowledge from your character's interactions. Each memory is indexed with vector embeddings for similarity search, categorized by type, and connected to other memories through an association graph.
Two Layers of Memory
Conversation History (Messages)
When you chat with a character in Strings, both your messages and the character's responses are stored in the Memory Service. These messages belong to their conversation (String) and are loaded when you open that conversation.
Semantic Memory (Extracted Knowledge)
Beyond raw messages, the memory system extracts and retains meaningful information. Memories are tagged with a kind that determines how they're stored and how long they persist:
| Kind | What It Stores | Decay Rate |
|---|---|---|
| fact | Verified information, learned knowledge | Never decays |
| self | Character's self-knowledge, preferences | Never decays |
| preference | User preferences, stated likes/dislikes | Very slow decay |
| event | Things that happened, interactions | Moderate decay |
| general | General context and information | Moderate decay |
| fleeting | Temporary observations, passing thoughts | Fast decay |
Facts and self-knowledge are permanent. Fleeting memories fade quickly unless reinforced by repeated access.
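The decay behavior above can be sketched as simple exponential decay keyed by kind. The half-life values and function names below are illustrative assumptions, not QUI's actual internals:

```python
# Hypothetical half-lives (in days) per memory kind; None means the
# memory never decays. These numbers are made up for illustration.
DECAY_HALF_LIFE_DAYS = {
    "fact": None,        # never decays
    "self": None,        # never decays
    "preference": 365.0, # very slow decay
    "event": 30.0,       # moderate decay
    "general": 30.0,     # moderate decay
    "fleeting": 1.0,     # fast decay
}

def retained_strength(kind: str, age_days: float, base: float = 1.0) -> float:
    """Strength of a memory after age_days, given its kind's half-life."""
    half_life = DECAY_HALF_LIFE_DAYS[kind]
    if half_life is None:
        return base  # permanent kinds keep full strength
    return base * 0.5 ** (age_days / half_life)
```

Under this model a `fleeting` memory is at half strength after one day, while a `fact` keeps full strength indefinitely; "reinforced by repeated access" would correspond to resetting `age_days`.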
Importance Scoring
Every memory has an importance score from 0.0 to 1.0:
- 1.0 — critical information (starred messages, key facts)
- 0.5 — default importance for normal conversation
- 0.0 — trivial information likely to be cleaned up
Importance affects how long a memory survives consolidation and how prominently it appears in search results. You can manually boost importance by starring messages in Strings.
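One plausible way to model the scoring rules above (the clamping, the star boost, and a relevance/importance blend for ranking) is shown below; the weights and function names are assumptions for illustration:

```python
def effective_importance(current: float, starred: bool) -> float:
    """Starring pins a memory at maximum importance; otherwise clamp to [0, 1]."""
    return 1.0 if starred else max(0.0, min(1.0, current))

def rank_score(similarity: float, importance: float) -> float:
    """Blend semantic relevance with importance when ordering search results.
    The 70/30 split is an illustrative choice, not a documented constant."""
    return 0.7 * similarity + 0.3 * importance
```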
How Memory Is Retrieved
When a character responds to your message, the Memory Service builds context by searching for relevant memories:
1. Semantic search — your message is converted to a vector embedding and compared against stored memories by similarity
2. Association following — memories linked to relevant results are also retrieved (see Association Graph)
3. Token budgeting — retrieved memories are packed into the available context window, prioritized by relevance and importance
4. Context assembly — the formatted memory context is included in the character's system prompt
The result: your character references relevant past conversations and learned knowledge naturally, without you needing to remind it.
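The search-and-budget steps can be sketched as follows. This is a minimal, self-contained model — the field names (`vec`, `tokens`) and the similarity-times-importance ranking are assumptions, not the service's real schema:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_context(query_vec, memories, token_budget):
    """Rank memories by relevance weighted by importance, then pack
    as many as fit into the token budget, best first."""
    ranked = sorted(
        memories,
        key=lambda m: cosine(query_vec, m["vec"]) * m["importance"],
        reverse=True,
    )
    picked, used = [], 0
    for m in ranked:
        if used + m["tokens"] <= token_budget:
            picked.append(m["text"])
            used += m["tokens"]
    return "\n".join(picked)
```

A real implementation would also follow association edges from the top hits before budgeting, but the shape — rank, then greedily pack — is the same.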
Awareness Control
In Strings, the Awareness slider (0-100) controls how much memory context is included per message. Lower awareness = fewer memories retrieved (faster, cheaper). Higher awareness = richer context (more relevant but more tokens).
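A simple way to picture the slider is as a linear map from awareness to a memory token budget. The linear curve and the 2000-token ceiling here are illustrative assumptions:

```python
def awareness_to_budget(awareness: int, max_tokens: int = 2000) -> int:
    """Map the 0-100 Awareness slider to a token budget for memory context.
    Assumes a linear scale and a hypothetical 2000-token ceiling."""
    awareness = max(0, min(100, awareness))
    return awareness * max_tokens // 100
```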
Memory Levels (Compression Tiers)
Memories exist at different compression levels:
| Level | Name | Description |
|---|---|---|
| 0 | Raw | Original messages and stored content — unprocessed |
| 1 | Consolidated | Summaries created by compressing related L0 memories |
| 2+ | Further compressed | Higher-level summaries of L1 content |
Raw memories accumulate over time. The Cortex service processes them into consolidated summaries, creating a hierarchical knowledge structure. This is similar to how human memory works — specific details fade while important patterns and knowledge are retained.
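Structurally, consolidation folds a batch of lower-level memories into one higher-level summary record. The sketch below assumes a dict schema and passes the actual summarization in as a callable standing in for the LLM call the real service would make:

```python
def consolidate(raw_memories, summarize):
    """Fold a batch of same-level memories into one record a level up,
    keeping provenance links back to the sources."""
    level = max(m["level"] for m in raw_memories)
    return {
        "level": level + 1,
        "text": summarize([m["text"] for m in raw_memories]),
        # The summary inherits the highest importance among its sources.
        "importance": max(m["importance"] for m in raw_memories),
        "sources": [m["id"] for m in raw_memories],
    }
```

Applying `consolidate` to Level 0 records yields Level 1 summaries; applying it again to those yields Level 2+, giving the hierarchy in the table above.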
See Cortex Modes for how consolidation works.
Duplicate Detection
The memory system detects near-duplicate content using vector similarity. When you store a memory that is very similar to an existing one (>95% similarity), it's flagged as a possible duplicate. This prevents the same information from cluttering the memory graph.
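In essence this is a similarity threshold check against existing embeddings — a minimal sketch, assuming cosine similarity and the 0.95 cutoff from the text:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def is_near_duplicate(new_vec, existing_vecs, threshold=0.95):
    """Flag the new memory if any stored embedding exceeds the threshold."""
    return any(cosine(new_vec, v) > threshold for v in existing_vecs)
```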
What's Next
- Cortex Modes — 8 consolidation modes that process and organize memories
- Association Graph — how memories link together and get smarter over time