How Memory Works

Memory in QUI is not just chat history — it's a semantic memory system that stores, organizes, and retrieves knowledge from your character's interactions. Each memory is indexed with vector embeddings for similarity search, categorized by type, and connected to other memories through an association graph.


Two Layers of Memory

Conversation History (Messages)

When you chat with a character in Strings, both your messages and the character's responses are stored in the Memory Service. These messages belong to their conversation (String) and are loaded when you open that conversation.

Semantic Memory (Extracted Knowledge)

Beyond raw messages, the memory system extracts and retains meaningful information. Memories are tagged with a kind that determines how they're stored and how long they persist:

Kind       | What It Stores                            | Decay Rate
-----------|-------------------------------------------|----------------
fact       | Verified information, learned knowledge   | Never decays
self       | Character's self-knowledge, preferences   | Never decays
preference | User preferences, stated likes/dislikes   | Very slow decay
event      | Things that happened, interactions        | Moderate decay
general    | General context and information           | Moderate decay
fleeting   | Temporary observations, passing thoughts  | Fast decay

Facts and self-knowledge are permanent. Fleeting memories fade quickly unless reinforced by repeated access.
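One simple way to model per-kind decay is exponential decay with a per-kind half-life. The half-life values below are illustrative assumptions (QUI's actual decay rates and curve are not documented here); only the ordering — permanent, very slow, moderate, fast — comes from the table above.

```python
# Hypothetical half-lives in days per memory kind. None means the kind
# never decays. These specific numbers are assumptions for illustration.
DECAY_HALF_LIFE_DAYS = {
    "fact": None,
    "self": None,
    "preference": 365.0,  # very slow decay
    "event": 30.0,        # moderate decay
    "general": 30.0,      # moderate decay
    "fleeting": 1.0,      # fast decay
}

def retention(kind: str, age_days: float) -> float:
    """Fraction of a memory's strength remaining after age_days,
    assuming exponential decay with a per-kind half-life."""
    half_life = DECAY_HALF_LIFE_DAYS[kind]
    if half_life is None:
        return 1.0  # permanent kinds never lose strength
    return 0.5 ** (age_days / half_life)
```

Under this model a fleeting memory is at half strength after a day, while a fact is untouched after years; reinforcement by repeated access would correspond to resetting or extending the age.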


Importance Scoring

Every memory has an importance score from 0.0 to 1.0:

  • 1.0 — critical information (starred messages, key facts)
  • 0.5 — default importance for normal conversation
  • 0.0 — trivial information likely to be cleaned up

Importance affects how long a memory survives consolidation and how prominently it appears in search results. You can manually boost importance by starring messages in Strings.


How Memory Is Retrieved

When a character responds to your message, the Memory Service builds context by searching for relevant memories:

  1. Semantic search — your message is converted to a vector embedding and compared against stored memories by similarity
  2. Association following — memories linked to relevant results are also retrieved (see Association Graph)
  3. Token budgeting — retrieved memories are packed into the available context window, prioritized by relevance and importance
  4. Context assembly — the formatted memory context is included in the character's system prompt

The result: your character references relevant past conversations and learned knowledge naturally, without you needing to remind it.
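The four steps above can be sketched end to end. Everything here is a simplified illustration: the `Memory` shape, the top-3 cutoff for association following, the tokens-per-character estimate, and the relevance-plus-importance packing order are assumptions, not the Memory Service's actual internals.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based hashing so memories can be deduplicated
class Memory:
    text: str
    embedding: list[float]
    importance: float = 0.5
    links: list["Memory"] = field(default_factory=list)  # association edges

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def build_context(query_emb, memories, token_budget,
                  estimate_tokens=lambda m: len(m.text) // 4):
    # 1. Semantic search: score every memory by similarity to the query.
    scored = sorted(memories, key=lambda m: cosine(query_emb, m.embedding),
                    reverse=True)
    # 2. Association following: also pull in memories linked to the top hits.
    top = scored[:3]
    candidates = list(dict.fromkeys(top + [l for m in top for l in m.links]))
    # 3. Token budgeting: pack by relevance + importance until budget runs out.
    candidates.sort(key=lambda m: cosine(query_emb, m.embedding) + m.importance,
                    reverse=True)
    picked, used = [], 0
    for m in candidates:
        cost = estimate_tokens(m)
        if used + cost <= token_budget:
            picked.append(m)
            used += cost
    # 4. Context assembly: format the memories for the system prompt.
    return "\n".join(f"- {m.text}" for m in picked)
```

Note how a memory that never matched the query directly ("owns a teapot", say) can still enter the context because it is linked to a memory that did.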

Awareness Control

In Strings, the Awareness slider (0-100) controls how much memory context is included per message. Lower awareness = fewer memories retrieved (faster, cheaper). Higher awareness = richer context (more relevant but more tokens).
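A plausible way to wire the slider into retrieval is to scale the token budget for memory context. The linear mapping and the 2000-token ceiling below are assumptions; QUI may use a different curve or cap.

```python
def memory_budget(awareness: int, max_tokens: int = 2000) -> int:
    """Map the 0-100 Awareness slider to a token budget for memory context.
    Linear scaling and the max_tokens default are illustrative assumptions."""
    awareness = max(0, min(100, awareness))  # clamp out-of-range values
    return max_tokens * awareness // 100
```

At awareness 0 no memory context is retrieved at all; at 100 the full budget is spent on memories.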


Memory Levels (Compression Tiers)

Memories exist at different compression levels:

Level | Name               | Description
------|--------------------|------------------------------------------------------
0     | Raw                | Original messages and stored content, unprocessed
1     | Consolidated       | Summaries created by compressing related L0 memories
2+    | Further compressed | Higher-level summaries of L1 content

Raw memories accumulate over time. The Cortex service processes them into consolidated summaries, creating a hierarchical knowledge structure. This is similar to how human memory works — specific details fade while important patterns and knowledge are retained.
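In outline, a consolidation step takes a batch of related level-N memories and produces one level-N+1 summary that records its sources. The `summarize` callable below stands in for whatever model call Cortex actually makes; the record shape is an assumption for illustration.

```python
def consolidate(raw_memories: list[str], level: int, summarize) -> dict:
    """Compress related memories at `level` into one summary at `level + 1`.
    `summarize` is a stand-in for the actual summarization step."""
    return {
        "level": level + 1,
        "text": summarize(raw_memories),
        "sources": raw_memories,  # provenance: which memories were compressed
    }
```

Applying this repeatedly yields the hierarchy in the table: L0 details feed L1 summaries, which can in turn be compressed into L2+.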

See Cortex Modes for how consolidation works.


Duplicate Detection

The memory system detects near-duplicate content using vector similarity. When you store a memory that is very similar to an existing one (>95% similarity), it's flagged as a possible duplicate. This prevents the same information from cluttering the memory graph.
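The duplicate check described above amounts to comparing embeddings with cosine similarity against a threshold. The threshold value comes from the text; the implementation below is a sketch, not QUI's actual code.

```python
def is_duplicate(a: list[float], b: list[float], threshold: float = 0.95) -> bool:
    """Flag two embeddings as near-duplicates when their cosine
    similarity exceeds the threshold (0.95 per the docs above)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    similarity = dot / (na * nb) if na and nb else 0.0
    return similarity > threshold
```

Identical or near-identical embeddings trip the flag; unrelated (near-orthogonal) ones do not.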


What's Next

  • Cortex Modes — 8 consolidation modes that process and organize memories
  • Association Graph — how memories link together and get smarter over time
Updated on Mar 21, 2026