# Architecture Overview
QUI.IS is a privacy-first federated AI platform built on four layers: a central hub for authentication and billing, a local gateway per device, a collection of specialized services that power AI capabilities, and user-facing applications.
## The Layer Model

- **Cloud Layer:** Central Hub — authentication, billing, federation relay
- **Local Layer:** QUI Core — your local gateway, service orchestration
- **Service Layer:** 14+ specialized services — characters, memory, reasoning, tools, consciousness
- **App Layer:** Strings, ThinkThing — user-facing applications

Each layer talks only to the layers directly above and below it.
## Central Hub (Mothership)

The cloud component. It handles:
- Authentication — user accounts, device trust, 2FA
- Billing — prepaid balance, usage tracking, cost settlement
- LLM Proxy — routes LLM calls to providers (Anthropic, OpenAI, Google, X) and meters them for billing
- Federation Relay — stores and forwards encrypted messages between instances
The hub never sees your conversations. It handles money and identity. Message content is encrypted end-to-end between instances.
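The relay's store-and-forward role can be sketched as follows. The class and method names here are hypothetical, not the hub's real API, but the key property holds: the relay only ever handles routing metadata plus an opaque ciphertext blob it cannot decrypt.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class RelayEnvelope:
    """What the hub actually sees: routing metadata and an opaque blob."""
    sender: str
    recipient: str
    ciphertext: bytes  # encrypted end-to-end before it reaches the relay

class FederationRelay:
    """Minimal store-and-forward sketch (hypothetical API, not the real hub)."""

    def __init__(self) -> None:
        self._queues: dict[str, list[RelayEnvelope]] = defaultdict(list)

    def store(self, env: RelayEnvelope) -> None:
        # The relay never decrypts; it only queues the blob for the recipient.
        self._queues[env.recipient].append(env)

    def forward(self, recipient: str) -> list[RelayEnvelope]:
        # Deliver everything queued for this instance, then clear the queue.
        msgs, self._queues[recipient] = self._queues[recipient], []
        return msgs
```

Because decryption keys live only on the communicating instances, a compromised hub yields metadata at most, never message content.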
## QUI Core (Local Gateway)

Runs on your device and handles:
- Service discovery — knows what services are running and where
- Local authentication — device fingerprinting, trust scores, JWT tokens
- LLM routing — forwards AI requests to the hub (or to local models)
- Dashboard — the web interface you interact with
## Core Services
14+ specialized services, each responsible for one capability:
| Service | What It Does |
|---|---|
| Qui Anima | Character management, LLM routing, tool triggers |
| Memory | Semantic memory with vector search |
| Cortex | Memory consolidation (8 modes) |
| M2M | Inter-character messaging and federation |
| Autothink | 14 thinking strategies |
| Thalamus | Event routing, channel webhooks |
| ThinkThing | Visual workflow builder (143+ nodes) |
| Qonscious | Consciousness state machine |
| FractalMind | Recursive multi-directional thinking |
| Qleph | Relational micro-language engine |
| Voice | Text-to-speech and speech-to-text |
| Terminal | Shell command execution |
| MCP Gateway | 165+ external tool integrations |
| Qllama | Local LLM model hosting |
## The LLM Call Chain
Every AI response in QUI follows the same path:
Your message → Anima (loads character context) → QUI Core (routes request) → Central Hub (billing) → LLM Provider → response streamed back
For local models, QUI Core detects the model prefix and routes directly to Qllama, bypassing the central hub entirely.
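The routing decision reduces to a prefix check. The prefixes and both endpoint URLs below are assumptions for illustration; only the bypass behavior itself comes from the text above.

```python
QLLAMA_URL = "http://localhost:11434"   # hypothetical local Qllama endpoint
HUB_URL = "https://hub.example"         # hypothetical central hub endpoint
LOCAL_PREFIXES = ("qllama/", "local/")  # assumed local-model naming convention

def route_llm_request(model: str) -> str:
    """Return the endpoint an LLM request should be sent to.

    Local models bypass the central hub entirely (no billing hop, no
    cloud round-trip); everything else goes through the hub's LLM proxy.
    """
    if model.startswith(LOCAL_PREFIXES):
        return QLLAMA_URL
    return HUB_URL
```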
## Zero-Knowledge Principle
The architecture enforces privacy by design:
- Your conversations are stored locally in the Memory Service on your device
- Your character configurations live in the Anima database on your device
- Your files are stored in the Memory Service database on your device
- The central hub handles authentication, billing, and message relay — but never sees conversation content
- Federated messages are encrypted before leaving your instance
The only data that reaches the cloud is authentication credentials, billing transactions, and encrypted federation messages the hub cannot read. Conversation content flows only between your device and your LLM providers.
## Resilience
QUI uses a fail-open design — if an optional service is temporarily unavailable, your core AI interactions keep working. Only QUI Core and Qui Anima are essential for chat. Other services (memory, reasoning, consciousness, tools) gracefully degrade without breaking your conversation.
After initial installation, services start automatically and stay running across reboots.
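Fail-open degradation amounts to wrapping each optional service call and substituting an empty result on failure. A minimal sketch, with a hypothetical Memory call standing in for any optional service:

```python
def recall_memories(query: str) -> list[str]:
    """Hypothetical call to the optional Memory service (down in this sketch)."""
    raise ConnectionError("memory service unavailable")

def build_context(query: str) -> dict:
    """Fail-open: optional services degrade to empty results, not errors."""
    try:
        memories = recall_memories(query)
    except ConnectionError:
        memories = []  # chat continues without long-term memory
    return {"query": query, "memories": memories}
```

The same pattern applies to reasoning, consciousness, and tool services: an outage shrinks the context the character sees, but the chat request itself still succeeds.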