How Memory Works

Memory in QUI is not just chat history — it's a semantic memory system that stores, organizes, and retrieves knowledge from your character's interactions. Each memory is indexed with vector embeddings for similarity search, categorized by type, and connected to other memories through an association graph.


Two Layers of Memory

Conversation History (Messages)

When you chat with a character in Strings, both your messages and the character's responses are stored in the Memory Service. These messages belong to their conversation (String) and are loaded when you open that conversation.

Semantic Memory (Extracted Knowledge)

Beyond raw messages, the memory system extracts and retains meaningful information. Memories are tagged with a kind that determines how they're stored and how long they persist:

Kind       | What It Stores                            | Decay Rate
-----------|-------------------------------------------|----------------
fact       | Verified information, learned knowledge   | Never decays
self       | Character's self-knowledge, preferences   | Never decays
preference | User preferences, stated likes/dislikes   | Very slow decay
event      | Things that happened, interactions        | Moderate decay
general    | General context and information           | Moderate decay
fleeting   | Temporary observations, passing thoughts  | Fast decay

Facts and self-knowledge are permanent. Fleeting memories fade quickly unless reinforced by repeated access.
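One simple way to model per-kind decay is exponential decay with a per-kind half-life. The half-life values below are illustrative assumptions (QUI's actual decay rates and curve are not documented here); only the ordering — permanent, very slow, moderate, fast — comes from the table above.

```python
# Hypothetical half-lives in days per memory kind. None means the kind
# never decays. These specific numbers are assumptions for illustration.
DECAY_HALF_LIFE_DAYS = {
    "fact": None,
    "self": None,
    "preference": 365.0,  # very slow decay
    "event": 30.0,        # moderate decay
    "general": 30.0,      # moderate decay
    "fleeting": 1.0,      # fast decay
}

def retention(kind: str, age_days: float) -> float:
    """Fraction of a memory's strength remaining after age_days,
    assuming exponential decay with a per-kind half-life."""
    half_life = DECAY_HALF_LIFE_DAYS[kind]
    if half_life is None:
        return 1.0  # permanent kinds never lose strength
    return 0.5 ** (age_days / half_life)
```

Under this model a fleeting memory is at half strength after a day, while a fact is untouched after years; reinforcement by repeated access would correspond to resetting or extending the age.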


Importance Scoring

Every memory has an importance score from 0.0 to 1.0:

  • 1.0 — critical information (starred messages, key facts)
  • 0.5 — default importance for normal conversation
  • 0.0 — trivial information likely to be cleaned up

Importance affects how long a memory survives consolidation and how prominently it appears in search results. You can manually boost importance by starring messages in Strings.


How Memory Is Retrieved

When a character responds to your message, the Memory Service builds context by searching for relevant memories:

  1. Semantic search — your message is converted to a vector embedding and compared against stored memories by similarity
  2. Association following — memories linked to relevant results are also retrieved (see Association Graph)
  3. Token budgeting — retrieved memories are packed into the available context window, prioritized by relevance and importance
  4. Context assembly — the formatted memory context is included in the character's system prompt

The result: your character references relevant past conversations and learned knowledge naturally, without you needing to remind it.
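The four steps above can be sketched end to end. Everything here is a simplified illustration: the `Memory` shape, the top-3 cutoff for association following, the tokens-per-character estimate, and the relevance-plus-importance packing order are assumptions, not the Memory Service's actual internals.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based hashing so memories can be deduplicated
class Memory:
    text: str
    embedding: list[float]
    importance: float = 0.5
    links: list["Memory"] = field(default_factory=list)  # association edges

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def build_context(query_emb, memories, token_budget,
                  estimate_tokens=lambda m: len(m.text) // 4):
    # 1. Semantic search: score every memory by similarity to the query.
    scored = sorted(memories, key=lambda m: cosine(query_emb, m.embedding),
                    reverse=True)
    # 2. Association following: also pull in memories linked to the top hits.
    top = scored[:3]
    candidates = list(dict.fromkeys(top + [l for m in top for l in m.links]))
    # 3. Token budgeting: pack by relevance + importance until budget runs out.
    candidates.sort(key=lambda m: cosine(query_emb, m.embedding) + m.importance,
                    reverse=True)
    picked, used = [], 0
    for m in candidates:
        cost = estimate_tokens(m)
        if used + cost <= token_budget:
            picked.append(m)
            used += cost
    # 4. Context assembly: format the memories for the system prompt.
    return "\n".join(f"- {m.text}" for m in picked)
```

Note how a memory that never matched the query directly ("owns a teapot", say) can still enter the context because it is linked to a memory that did.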

Awareness Control

In Strings, the Awareness slider (0-100) controls how much memory context is included per message. Lower awareness = fewer memories retrieved (faster, cheaper). Higher awareness = richer context (more relevant but more tokens).
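A plausible way to wire the slider into retrieval is to scale the token budget for memory context. The linear mapping and the 2000-token ceiling below are assumptions; QUI may use a different curve or cap.

```python
def memory_budget(awareness: int, max_tokens: int = 2000) -> int:
    """Map the 0-100 Awareness slider to a token budget for memory context.
    Linear scaling and the max_tokens default are illustrative assumptions."""
    awareness = max(0, min(100, awareness))  # clamp out-of-range values
    return max_tokens * awareness // 100
```

At awareness 0 no memory context is retrieved at all; at 100 the full budget is spent on memories.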


Memory Levels (Compression Tiers)

Memories exist at different compression levels:

Level | Name               | Description
------|--------------------|------------------------------------------------------
0     | Raw                | Original messages and stored content, unprocessed
1     | Consolidated       | Summaries created by compressing related L0 memories
2+    | Further compressed | Higher-level summaries of L1 content

Raw memories accumulate over time. The Cortex service processes them into consolidated summaries, creating a hierarchical knowledge structure. This is similar to how human memory works — specific details fade while important patterns and knowledge are retained.
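In outline, a consolidation step takes a batch of related level-N memories and produces one level-N+1 summary that records its sources. The `summarize` callable below stands in for whatever model call Cortex actually makes; the record shape is an assumption for illustration.

```python
def consolidate(raw_memories: list[str], level: int, summarize) -> dict:
    """Compress related memories at `level` into one summary at `level + 1`.
    `summarize` is a stand-in for the actual summarization step."""
    return {
        "level": level + 1,
        "text": summarize(raw_memories),
        "sources": raw_memories,  # provenance: which memories were compressed
    }
```

Applying this repeatedly yields the hierarchy in the table: L0 details feed L1 summaries, which can in turn be compressed into L2+.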

See Cortex Modes for how consolidation works.


Duplicate Detection

The memory system detects near-duplicate content using vector similarity. When you store a memory that is very similar to an existing one (>95% similarity), it's flagged as a possible duplicate. This prevents the same information from cluttering the memory graph.
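The duplicate check described above amounts to comparing embeddings with cosine similarity against a threshold. The threshold value comes from the text; the implementation below is a sketch, not QUI's actual code.

```python
def is_duplicate(a: list[float], b: list[float], threshold: float = 0.95) -> bool:
    """Flag two embeddings as near-duplicates when their cosine
    similarity exceeds the threshold (0.95 per the docs above)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    similarity = dot / (na * nb) if na and nb else 0.0
    return similarity > threshold
```

Identical or near-identical embeddings trip the flag; unrelated (near-orthogonal) ones do not.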


What's Next

  • Cortex Modes — 8 consolidation modes that process and organize memories
  • Association Graph — how memories link together and get smarter over time
Updated on Mar 21, 2026