# LLM Providers
Every character in Qui Anima needs an LLM (Large Language Model) to generate responses. QUI supports four cloud providers plus local model hosting — you choose which provider and model each character uses.
## Supported Providers
| Provider | Models Include | Strengths |
|---|---|---|
| Anthropic | Claude Opus, Sonnet, Haiku | Strong reasoning, safety-conscious, long context windows |
| OpenAI | GPT-4o, GPT-4o Mini, GPT-4 Turbo | Broad capability, strong tool use, wide ecosystem |
| Google | Gemini Pro, Gemini Flash | Cost-effective, multimodal, fast inference |
| X (Grok) | Grok models | Real-time information access |
## Local Models (Qllama)
QUI also supports local LLM hosting via Qllama (an Ollama wrapper). Local models run entirely on your hardware:
- No internet required — fully offline operation
- No billing cost — runs on your GPU/CPU at no per-token charge
- Privacy — data never leaves your machine
- GPU recommended — local models are slow on CPU-only systems
See Local Models for setup details.
## Selecting a Provider Per Character
Each character has its own provider and model selection, configured in the core Anima node:
1. Open the Visual Builder
2. Click the core Anima node (center of canvas)
3. Select the LLM Provider from the dropdown
4. Select the specific Model from the available models for that provider
Different characters can use different providers. A simple helper character might use a fast, cheap model (Haiku, GPT-4o Mini, Gemini Flash) while a complex analyst uses a powerful one (Opus, GPT-4o, Gemini Pro).
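Conceptually, each character just carries its own provider and model pair. The sketch below is illustrative only: selection actually happens in the Visual Builder dropdowns, and the model identifiers shown are assumptions, not exact QUI model names.

```python
# Illustrative only: QUI sets this through Visual Builder dropdowns, not code.
# Provider and model identifiers below are assumptions, not exact QUI names.
characters = {
    # Simple helper: fast, economical model.
    "greeter": {"provider": "anthropic", "model": "claude-haiku"},
    # Complex analyst: powerful (and pricier) model.
    "analyst": {"provider": "openai", "model": "gpt-4o"},
    # Privacy-sensitive character: fully offline via Qllama.
    "archivist": {"provider": "qllama", "model": "local-llama3"},
}

for name, cfg in characters.items():
    print(f"{name}: {cfg['provider']} / {cfg['model']}")
```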
## The LLM Call Chain
When your character generates a response, the request flows through a billing-safe chain:
Your character (Anima) → QUI Core (local gateway) → Billing verification → LLM Provider
At each step:
- Anima assembles the full character context — identity, personality, memory, knowledge, tool descriptions — into a complete prompt
- QUI Core routes the request. For cloud models, it forwards to the central billing hub. For local models (model name starts with `local-`), it routes directly to Qllama on your machine.
- Billing verification — the hub checks your balance, reserves the estimated cost, caps output tokens to what you can afford, then forwards to the provider. After the response, it charges actual usage and refunds any overage.
- LLM Provider generates the response
You don't need to manage API keys or provider authentication. The system handles this through proxy keys that are generated and cached automatically.
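The simplified Python sketch below mirrors that flow. The function names, the flat per-token price, and the 4096-token default cap are all assumptions made for illustration; only the `local-` prefix check and the reserve, cap, charge, and refund sequence come from the chain described above.

```python
# Simplified sketch of the billing-safe call chain. Function names, the
# flat per-token price, and the 4096-token cap are assumptions; the
# "local-" prefix check and reserve/cap/charge/refund steps mirror the docs.

def call_qllama(model: str, prompt: str) -> str:
    """Stand-in for the local Qllama host (no billing involved)."""
    return f"[{model}] local response"

def call_provider(model: str, prompt: str, max_output: int) -> tuple[str, int]:
    """Stand-in for a cloud provider call; returns (response, tokens used)."""
    return f"[{model}] cloud response", min(max_output, 250)

def route_request(model: str, prompt: str, balance_usd: float) -> str:
    # Local models route straight to Qllama and never touch the billing hub.
    if model.startswith("local-"):
        return call_qllama(model, prompt)

    # Cloud models pass through billing verification first.
    price_per_token = 0.00002                  # assumed flat output price
    affordable = int(balance_usd / price_per_token)
    max_output = min(4096, affordable)         # cap output to what you can afford
    reserved = max_output * price_per_token    # reserve the estimated cost

    response, used = call_provider(model, prompt, max_output)

    # Charge actual usage and refund the unused part of the reservation.
    actual = used * price_per_token
    print(f"reserved ${reserved:.4f}, charged ${actual:.4f}, "
          f"refunded ${reserved - actual:.4f}")
    return response

print(route_request("local-llama3", "Hello", balance_usd=0.0))
print(route_request("gpt-4o-mini", "Hello", balance_usd=5.0))
```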
## Cost Awareness
LLM costs depend on:
- Model tier — larger models cost more per token
- Input tokens — everything sent to the LLM (system prompt, memory, knowledge, conversation history)
- Output tokens — the response generated
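As a rule of thumb, per-request cost is input tokens × input price plus output tokens × output price. The per-token prices in the sketch below are placeholders (check the Model Pricing tab for real rates); it shows how trimming memory context shrinks the input side of the bill.

```python
# Placeholder per-token prices; check the Model Pricing tab for real rates.

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """cost = input_tokens * input_price + output_tokens * output_price"""
    return input_tokens * input_price + output_tokens * output_price

# Memory-heavy prompt (lots of context) with a short reply:
print(estimate_cost(8_000, 300, input_price=3e-6, output_price=15e-6))  # 0.0285
# Same reply after dialing down memory awareness (fewer input tokens):
print(estimate_cost(2_000, 300, input_price=3e-6, output_price=15e-6))  # 0.0105
```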
### Cost Control Levers
| Control | Where to Set | Effect |
|---|---|---|
| Model selection | Core node in Visual Builder | Cheaper models = lower cost per token |
| Token Limits | Token Limits node in Visual Builder | Caps maximum output per response |
| Memory awareness | Awareness slider in Strings | Controls how much memory context is included (less = fewer input tokens) |
| Spending limits | Billing tab in QUI Core dashboard | Daily/monthly caps across all characters |
| Local models | Switch to a `local-*` model | Zero billing cost (runs on your hardware) |
### Quick Cost Comparison
| Tier | Example Models | Relative Cost |
|---|---|---|
| Economy | Haiku, GPT-4o Mini, Gemini Flash | $ |
| Standard | Sonnet, GPT-4o, Gemini Pro | $$ |
| Premium | Opus | $$$ |
| Free | Local models via Qllama | — |
Check the Model Pricing tab in your QUI Core dashboard for exact per-token pricing across all providers.
## Provider-Specific Notes
### Anthropic
- Claude models support very long context windows (up to 200K tokens on some models)
- Strong at following complex instructions and maintaining character consistency
### OpenAI
- GPT-4o is multimodal (text + image input)
- Avoid o1 and o3 model families — they handle temperature and system messages differently and may cause errors
### Google
- Gemini Flash is one of the cheapest options with acceptable quality
- Gemini models support multimodal input
### X (Grok)
- Access to real-time information
- Model names start with `grok` or `xai`
## Switching Providers
You can change a character's provider and model at any time in the Visual Builder. The change takes effect on the next message. Conversation history and memory are unaffected — they belong to the character, not the provider.
Tip: Try different models for the same character to find the best quality-to-cost ratio for your use case. A character's personality and capabilities remain consistent across providers — only the underlying LLM changes.