Billing
QUI uses a prepaid balance system. You add funds to your account, and each AI interaction deducts from your balance based on actual token usage. Billing is handled transparently — you interact with your characters and the costs are tracked automatically.
How Billing Works
Every time a character generates a response using a cloud LLM provider, the system:
- Reserves the estimated cost before the call
- Sends the request to the LLM provider (Anthropic, OpenAI, Google, or X)
- Charges the actual token usage after the response completes
- Refunds the difference if the actual cost was less than the estimate
You never pay more than your actual usage. The reserve-and-refund cycle prevents overspending while letting calls go through without delay.
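The reserve-and-refund flow above can be sketched in a few lines. This is an illustrative model only (the function and field names are hypothetical, not QUI's actual internals):

```python
# Sketch of the reserve-and-refund billing flow described above.
# All names here are hypothetical illustrations, not real QUI APIs.

def run_llm_call(balance, estimated_cost, send_request):
    """Reserve an estimate, call the provider, then settle at actual cost."""
    if balance["available"] < estimated_cost:
        raise RuntimeError("insufficient balance")
    balance["available"] -= estimated_cost    # 1. reserve the estimate
    response, actual_cost = send_request()    # 2. send request to the provider
    refund = estimated_cost - actual_cost     # 3-4. charge actual usage,
    balance["available"] += refund            #      refund any difference
    return response

balance = {"available": 5.00}
reply = run_llm_call(balance, 0.10, lambda: ("Hello!", 0.04))
# Net deduction is the actual $0.04, not the $0.10 estimate.
```

If the actual cost exceeds the estimate, the "refund" is negative and the extra amount is deducted, so the balance always reflects real usage.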
Adding Funds
To add funds:
- Open the Billing tab in your QUI Core dashboard sidebar
- Click Add Funds
- Choose an amount
- Complete the payment
Your balance updates immediately and is available for use across all characters and services.
[Screenshot: Billing tab showing balance and Add Funds button]
Checking Your Balance
The Billing tab shows:
- Current balance — your available funds
- Usage history — a breakdown of costs by date, character, and provider
- Usage by app — which applications (Strings, Concierge, etc.) are using your balance
Spending Limits
You can set spending limits to control costs:
- Daily limit — maximum spend per day
- Monthly limit — maximum spend per month
When a limit is reached, cloud LLM calls are paused until the next period. Local models (via Qllama) are not affected by spending limits since they run on your hardware at no cost.
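The gating behavior can be sketched as a simple check. This is an assumed model of the rule described above, with hypothetical parameter names:

```python
# Sketch of the spending-limit gate (hypothetical names, not QUI internals).
# Cloud calls are paused once a cap is hit; local Qllama calls skip the
# check entirely because they incur no billing cost.

def can_call(provider, spent_today, spent_month, daily_cap, monthly_cap):
    if provider == "qllama":                  # local model: limits don't apply
        return True
    if daily_cap is not None and spent_today >= daily_cap:
        return False                          # paused until the next day
    if monthly_cap is not None and spent_month >= monthly_cap:
        return False                          # paused until the next month
    return True
```

A `None` cap means no limit is set for that period.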
Understanding Costs
LLM costs depend on two factors:
- Token count — how many tokens are in the input (your message, character context, memory) and output (the response)
- Model pricing — each model has different rates per 1K tokens
You can compare model pricing in the Model Pricing tab of your dashboard. Generally:
- Fast/cheap models (Haiku, GPT-4o Mini, Gemini Flash) — best for casual conversation and quick answers
- Powerful models (Opus, GPT-4o, Gemini Pro) — better for complex reasoning, longer outputs, and detailed analysis
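The two-factor cost formula works out like this. The rates below are made-up illustrations, not actual QUI or provider pricing; check the Model Pricing tab for current rates:

```python
# Illustrative cost arithmetic: tokens in and out, each billed at the
# model's per-1K-token rate. Rates here are examples, not real pricing.

def estimate_cost(input_tokens, output_tokens,
                  input_rate_per_1k, output_rate_per_1k):
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

# A 2,000-token prompt (message + character context + memory) with a
# 500-token response at $0.003 in / $0.015 out per 1K tokens:
cost = estimate_cost(2000, 500, 0.003, 0.015)
# 2.0 * 0.003 + 0.5 * 0.015 = $0.0135
```

Note that input tokens include the character's context and memory, not just your message, which is why long-running conversations cost more per turn.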
Each character can use a different model. Choose based on the character's purpose — a simple helper doesn't need the most expensive model.
BYOK (Bring Your Own Key)
If you prefer to use your own API keys from LLM providers:
- Switch to BYOK mode in your account settings
- Enter your API keys for each provider you want to use
- The platform deducts only the platform fee from your prepaid balance — the provider bills you directly for token usage
In BYOK mode, your balance covers the platform fee only. The provider charges appear on your own account with that provider.
Note: BYOK is all-or-nothing per account. You cannot use BYOK for one provider and managed billing for another. Your balance and usage reports remain consolidated regardless of mode.
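The difference between the two modes comes down to what your prepaid balance covers. A minimal sketch, assuming a hypothetical `platform_fee` value (the actual fee structure is whatever QUI publishes):

```python
# Sketch of what each billing mode deducts from the prepaid balance.
# Names and fee values are hypothetical illustrations.

def balance_deduction(mode, token_cost, platform_fee):
    if mode == "byok":
        return platform_fee              # provider bills token_cost directly
    return token_cost + platform_fee     # managed: balance covers both

# Same call, two modes, assuming $0.04 in tokens and a $0.01 platform fee:
managed = balance_deduction("managed", 0.04, 0.01)  # $0.05 from balance
byok = balance_deduction("byok", 0.04, 0.01)        # $0.01 from balance
```

In BYOK mode the remaining $0.04 appears on your provider invoice instead, so your total cost is similar; the difference is who bills you for the tokens.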
Cost-Saving Tips
- Use fast models for simple tasks — Haiku, GPT-4o Mini, and Gemini Flash cost a fraction of larger models
- Set token limits per character — each character has a configurable maximum output token count
- Use local models for experimentation — Qllama runs on your hardware with zero billing cost
- Monitor the Model Pricing tab — provider prices change; check periodically for better options
- Set spending limits — prevent unexpected costs with daily and monthly caps