# LLM Providers
Every character in Qui Anima needs an LLM (Large Language Model) to generate responses. QUI supports four cloud providers plus local model hosting — you choose which provider and model each character uses.
## Supported Providers
| Provider | Models Include | Strengths |
|---|---|---|
| Anthropic | Claude Opus, Sonnet, Haiku | Strong reasoning, safety-conscious, long context windows |
| OpenAI | GPT-4o, GPT-4o Mini, GPT-4 Turbo | Broad capability, strong tool use, wide ecosystem |
| Google | Gemini Pro, Gemini Flash | Cost-effective, multimodal, fast inference |
| X (Grok) | Grok models | Real-time information access |
## Local Models (Qllama)
QUI also supports local LLM hosting via Qllama (an Ollama wrapper). Local models run entirely on your hardware:
- No internet required — fully offline operation
- No billing cost — runs on your GPU/CPU at no per-token charge
- Privacy — data never leaves your machine
- GPU recommended — local models are slow on CPU-only systems
See Local Models for setup details.
## Selecting a Provider Per Character
Each character has its own provider and model selection, configured in the core Anima node:
1. Open the Visual Builder
2. Click the core Anima node (center of canvas)
3. Select the LLM Provider from the dropdown
4. Select the specific Model from the available models for that provider
Different characters can use different providers. A simple helper character might use a fast, cheap model (Haiku, GPT-4o Mini, Gemini Flash) while a complex analyst uses a powerful one (Opus, GPT-4o, Gemini Pro).
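Conceptually, each character just carries its own provider and model pair. The sketch below is illustrative only: selection actually happens in the Visual Builder dropdowns, and the model identifiers shown are assumptions, not exact QUI model names.

```python
# Illustrative only: QUI sets this through Visual Builder dropdowns, not code.
# Provider and model identifiers below are assumptions, not exact QUI names.
characters = {
    # Simple helper: fast, economical model.
    "greeter": {"provider": "anthropic", "model": "claude-haiku"},
    # Complex analyst: powerful (and pricier) model.
    "analyst": {"provider": "openai", "model": "gpt-4o"},
    # Privacy-sensitive character: fully offline via Qllama.
    "archivist": {"provider": "qllama", "model": "local-llama3"},
}

for name, cfg in characters.items():
    print(f"{name}: {cfg['provider']} / {cfg['model']}")
```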
## The LLM Call Chain
When your character generates a response, the request flows through a billing-safe chain:
Your character (Anima) → QUI Core (local gateway) → Billing verification → LLM Provider
At each step:
- Anima assembles the full character context — identity, personality, memory, knowledge, tool descriptions — into a complete prompt
- QUI Core routes the request. For cloud models, it forwards to the central billing hub. For local models (model name starts with `local-`), it routes directly to Qllama on your machine.
- Billing verification — the hub checks your balance, reserves the estimated cost, caps output tokens to what you can afford, then forwards to the provider. After the response, it charges actual usage and refunds any overage.
- LLM Provider generates the response
You don't need to manage API keys or provider authentication. The system handles this through proxy keys that are generated and cached automatically.
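The simplified Python sketch below mirrors that flow. The function names, the flat per-token price, and the 4096-token default cap are all assumptions made for illustration; only the `local-` prefix check and the reserve, cap, charge, and refund sequence come from the chain described above.

```python
# Simplified sketch of the billing-safe call chain. Function names, the
# flat per-token price, and the 4096-token cap are assumptions; the
# "local-" prefix check and reserve/cap/charge/refund steps mirror the docs.

def call_qllama(model: str, prompt: str) -> str:
    """Stand-in for the local Qllama host (no billing involved)."""
    return f"[{model}] local response"

def call_provider(model: str, prompt: str, max_output: int) -> tuple[str, int]:
    """Stand-in for a cloud provider call; returns (response, tokens used)."""
    return f"[{model}] cloud response", min(max_output, 250)

def route_request(model: str, prompt: str, balance_usd: float) -> str:
    # Local models route straight to Qllama and never touch the billing hub.
    if model.startswith("local-"):
        return call_qllama(model, prompt)

    # Cloud models pass through billing verification first.
    price_per_token = 0.00002                  # assumed flat output price
    affordable = int(balance_usd / price_per_token)
    max_output = min(4096, affordable)         # cap output to what you can afford
    reserved = max_output * price_per_token    # reserve the estimated cost

    response, used = call_provider(model, prompt, max_output)

    # Charge actual usage and refund the unused part of the reservation.
    actual = used * price_per_token
    print(f"reserved ${reserved:.4f}, charged ${actual:.4f}, "
          f"refunded ${reserved - actual:.4f}")
    return response

print(route_request("local-llama3", "Hello", balance_usd=0.0))
print(route_request("gpt-4o-mini", "Hello", balance_usd=5.0))
```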
## Cost Awareness
LLM costs depend on:
- Model tier — larger models cost more per token
- Input tokens — everything sent to the LLM (system prompt, memory, knowledge, conversation history)
- Output tokens — the response generated
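As a rule of thumb, per-request cost is input tokens × input price plus output tokens × output price. The per-token prices in the sketch below are placeholders (check the Model Pricing tab for real rates); it shows how trimming memory context shrinks the input side of the bill.

```python
# Placeholder per-token prices; check the Model Pricing tab for real rates.

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """cost = input_tokens * input_price + output_tokens * output_price"""
    return input_tokens * input_price + output_tokens * output_price

# Memory-heavy prompt (lots of context) with a short reply:
print(estimate_cost(8_000, 300, input_price=3e-6, output_price=15e-6))  # 0.0285
# Same reply after dialing down memory awareness (fewer input tokens):
print(estimate_cost(2_000, 300, input_price=3e-6, output_price=15e-6))  # 0.0105
```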
### Cost Control Levers
| Control | Where to Set | Effect |
|---|---|---|
| Model selection | Core node in Visual Builder | Cheaper models = lower cost per token |
| Token Limits | Token Limits node in Visual Builder | Caps maximum output per response |
| Memory awareness | Awareness slider in Strings | Controls how much memory context is included (less = fewer input tokens) |
| Spending limits | Billing tab in QUI Core dashboard | Daily/monthly caps across all characters |
| Local models | Switch to a `local-*` model | Zero billing cost (runs on your hardware) |
### Quick Cost Comparison
| Tier | Example Models | Relative Cost |
|---|---|---|
| Economy | Haiku, GPT-4o Mini, Gemini Flash | $ |
| Standard | Sonnet, GPT-4o, Gemini Pro | $$ |
| Premium | Opus | $$$ |
| Free | Local models via Qllama | — |
Check the Model Pricing tab in your QUI Core dashboard for exact per-token pricing across all providers.
## Provider-Specific Notes
### Anthropic
- Claude models support very long context windows (up to 200K tokens on some models)
- Strong at following complex instructions and maintaining character consistency
### OpenAI
- GPT-4o is multimodal (text + image input)
- Avoid o1 and o3 model families — they handle temperature and system messages differently and may cause errors
### Google
- Gemini Flash is one of the cheapest options with acceptable quality
- Gemini models support multimodal input
### X (Grok)
- Access to real-time information
- Model names start with `grok` or `xai`
## Switching Providers
You can change a character's provider and model at any time in the Visual Builder. The change takes effect on the next message. Conversation history and memory are unaffected — they belong to the character, not the provider.
Tip: Try different models for the same character to find the best quality-to-cost ratio for your use case. A character's personality and capabilities remain consistent across providers — only the underlying LLM changes.