LLM Providers

Every character in Qui Anima needs an LLM (Large Language Model) to generate responses. QUI supports four cloud providers plus local model hosting — you choose which provider and model each character uses.


Supported Providers

  • Anthropic (Claude Opus, Sonnet, Haiku): strong reasoning, safety-conscious, long context windows
  • OpenAI (GPT-4o, GPT-4o Mini, GPT-4 Turbo): broad capability, strong tool use, wide ecosystem
  • Google (Gemini Pro, Gemini Flash): cost-effective, multimodal, fast inference
  • X (Grok models): real-time information access

Local Models (Qllama)

QUI also supports local LLM hosting via Qllama (an Ollama wrapper). Local models run entirely on your hardware:

  • No internet required — fully offline operation
  • No billing cost — runs on your GPU/CPU at no per-token charge
  • Privacy — data never leaves your machine
  • GPU recommended — local models are slow on CPU-only systems

See Local Models for setup details.


Selecting a Provider Per Character

Each character has its own provider and model selection, configured in the core Anima node:

  1. Open the Visual Builder
  2. Click the core Anima node (center of canvas)
  3. Select the LLM Provider from the dropdown
  4. Select the specific Model from the available models for that provider

Different characters can use different providers. A simple helper character might use a fast, cheap model (Haiku, GPT-4o Mini, Gemini Flash) while a complex analyst uses a powerful one (Opus, GPT-4o, Gemini Pro).
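In QUI this pairing is set per character in the Visual Builder, not in code, but the idea of matching model tier to character complexity can be sketched as a simple mapping. The character names and model identifiers below are purely illustrative assumptions, not QUI's actual configuration format:

```python
# Illustrative only: QUI configures this in the core Anima node of the
# Visual Builder. The mapping just shows tier-to-role matching.
characters = {
    "helper":  {"provider": "anthropic", "model": "claude-haiku"},  # fast, cheap
    "analyst": {"provider": "anthropic", "model": "claude-opus"},   # powerful
    "scout":   {"provider": "local",     "model": "local-llama"},   # free, offline
}

for name, cfg in characters.items():
    print(f"{name}: {cfg['provider']} / {cfg['model']}")
```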


The LLM Call Chain

When your character generates a response, the request flows through a billing-safe chain:

Your character (Anima) → QUI Core (local gateway) → Billing verification → LLM Provider

At each step:

  1. Anima assembles the full character context — identity, personality, memory, knowledge, tool descriptions — into a complete prompt
  2. QUI Core routes the request. For cloud models, it forwards to the central billing hub. For local models (model name starts with local-), it routes directly to Qllama on your machine.
  3. Billing verification — the hub checks your balance, reserves the estimated cost, caps output tokens to what you can afford, then forwards to the provider. After the response, it charges actual usage and refunds any overage.
  4. LLM Provider generates the response
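The routing and billing steps above can be sketched in a few lines. Everything here is a toy model under stated assumptions: `route_request`, `BillingHub`, and the 50%-of-estimate "actual" cost are invented for illustration and are not QUI's real API; the only detail taken from the docs is that local models are identified by the `local-` name prefix and skip billing entirely:

```python
# Toy sketch of the billing-safe call chain. Names are hypothetical.

def is_local(model: str) -> bool:
    """Per the docs, local models are identified by the 'local-' prefix."""
    return model.startswith("local-")

class BillingHub:
    """Stand-in for the central billing hub: reserve, then settle."""
    def __init__(self, balance: float):
        self.balance = balance

    def reserve(self, estimated: float) -> float:
        # Reserve the estimated cost up front, capped by what you can afford.
        reserved = min(estimated, self.balance)
        self.balance -= reserved
        return reserved

    def settle(self, reserved: float, actual: float) -> None:
        # Charge actual usage; refund any unused part of the reservation.
        self.balance += max(reserved - actual, 0.0)

def route_request(model: str, hub: BillingHub, estimated: float) -> str:
    if is_local(model):
        return f"[qllama:{model}] response"   # routed locally, no billing
    reserved = hub.reserve(estimated)
    if reserved == 0.0:
        raise RuntimeError("insufficient balance")
    actual = reserved * 0.5                   # pretend the call cost half the estimate
    hub.settle(reserved, actual)
    return f"[{model}] response"

hub = BillingHub(balance=1.00)
route_request("gpt-4o", hub, estimated=0.10)
print(round(hub.balance, 2))  # 0.95: charged 0.05 actual, 0.05 refunded
```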

You don't need to manage API keys or provider authentication. The system handles this through proxy keys that are generated and cached automatically.


Cost Awareness

LLM costs depend on:

  • Model tier — larger models cost more per token
  • Input tokens — everything sent to the LLM (system prompt, memory, knowledge, conversation history)
  • Output tokens — the response generated
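These three factors combine into simple arithmetic. The rates below are made-up placeholders (check the Model Pricing tab in your QUI Core dashboard for real per-token pricing); the sketch only shows how input and output tokens are priced separately:

```python
# Hypothetical rates for illustration; real pricing varies by model tier.

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Cost in dollars; rates are dollars per token."""
    return input_tokens * input_rate + output_tokens * output_rate

# Example: 3,000 input tokens (prompt + memory + history) and 500 output
# tokens, at assumed rates of $3 / $15 per million tokens.
cost = request_cost(3_000, 500, 3e-6, 15e-6)
print(f"${cost:.4f}")  # $0.0165
```

Note that input tokens usually dominate for characters with large memory or knowledge context, which is why the Awareness slider is an effective cost lever.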

Cost Control Levers

  • Model selection (core node in Visual Builder): cheaper models = lower cost per token
  • Token limits (Token Limits node in Visual Builder): caps maximum output per response
  • Memory awareness (Awareness slider in Strings): controls how much memory context is included (less = fewer input tokens)
  • Spending limits (Billing tab in QUI Core dashboard): daily/monthly caps across all characters
  • Local models (switch to a local-* model): zero billing cost (runs on your hardware)

Quick Cost Comparison

  • Economy ($): Haiku, GPT-4o Mini, Gemini Flash
  • Standard ($$): Sonnet, GPT-4o, Gemini Pro
  • Premium ($$$): Opus
  • Free: local models via Qllama

Check the Model Pricing tab in your QUI Core dashboard for exact per-token pricing across all providers.


Provider-Specific Notes

Anthropic

  • Claude models support very long context windows (up to 200K tokens on some models)
  • Strong at following complex instructions and maintaining character consistency

OpenAI

  • GPT-4o is multimodal (text + image input)
  • Avoid o1 and o3 model families — they handle temperature and system messages differently and may cause errors

Google

  • Gemini Flash is one of the cheapest options with acceptable quality
  • Gemini models support multimodal input

X (Grok)

  • Access to real-time information
  • Model names start with grok or xai

Switching Providers

You can change a character's provider and model at any time in the Visual Builder. The change takes effect on the next message. Conversation history and memory are unaffected — they belong to the character, not the provider.

Tip: Try different models for the same character to find the best quality-to-cost ratio for your use case. A character's personality and capabilities remain consistent across providers — only the underlying LLM changes.

Updated on Mar 21, 2026