Billing
QUI uses a prepaid balance system. You add funds to your account, and each AI interaction deducts from your balance based on actual token usage. Billing is handled transparently — you interact with your characters and the costs are tracked automatically.
How Billing Works
Every time a character generates a response using a cloud LLM provider, the system:
- Reserves the estimated cost before the call
- Sends the request to the LLM provider (Anthropic, OpenAI, Google, or X)
- Charges the actual token usage after the response completes
- Refunds the difference if the actual cost was less than the estimate
You never pay more than your actual usage. The reserve-and-refund cycle prevents overspending while letting calls go through without delay.
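The reserve-and-refund flow above can be sketched in a few lines. This is an illustrative model only (the function and field names are hypothetical, not QUI's actual internals):

```python
# Sketch of the reserve-and-refund billing flow described above.
# All names here are hypothetical illustrations, not real QUI APIs.

def run_llm_call(balance, estimated_cost, send_request):
    """Reserve an estimate, call the provider, then settle at actual cost."""
    if balance["available"] < estimated_cost:
        raise RuntimeError("insufficient balance")
    balance["available"] -= estimated_cost    # 1. reserve the estimate
    response, actual_cost = send_request()    # 2. send request to the provider
    refund = estimated_cost - actual_cost     # 3-4. charge actual usage,
    balance["available"] += refund            #      refund any difference
    return response

balance = {"available": 5.00}
reply = run_llm_call(balance, 0.10, lambda: ("Hello!", 0.04))
# Net deduction is the actual $0.04, not the $0.10 estimate.
```

If the actual cost exceeds the estimate, the "refund" is negative and the extra amount is deducted, so the balance always reflects real usage.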
Adding Funds
To add funds:
- Open the Billing tab in your QUI Core dashboard sidebar
- Click Add Funds
- Choose an amount
- Complete the payment
Your balance updates immediately and is available for use across all characters and services.
[Screenshot: Billing tab showing balance and Add Funds button]
Checking Your Balance
The Billing tab shows:
- Current balance — your available funds
- Usage history — a breakdown of costs by date, character, and provider
- Usage by app — which applications (Strings, Concierge, etc.) are using your balance
Spending Limits
You can set spending limits to control costs:
- Daily limit — maximum spend per day
- Monthly limit — maximum spend per month
When a limit is reached, cloud LLM calls are paused until the next period. Local models (via Qllama) are not affected by spending limits since they run on your hardware at no cost.
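The gating behavior can be sketched as a simple check. This is an assumed model of the rule described above, with hypothetical parameter names:

```python
# Sketch of the spending-limit gate (hypothetical names, not QUI internals).
# Cloud calls are paused once a cap is hit; local Qllama calls skip the
# check entirely because they incur no billing cost.

def can_call(provider, spent_today, spent_month, daily_cap, monthly_cap):
    if provider == "qllama":                  # local model: limits don't apply
        return True
    if daily_cap is not None and spent_today >= daily_cap:
        return False                          # paused until the next day
    if monthly_cap is not None and spent_month >= monthly_cap:
        return False                          # paused until the next month
    return True
```

A `None` cap means no limit is set for that period.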
Understanding Costs
LLM costs depend on two factors:
- Token count — how many tokens are in the input (your message, character context, memory) and output (the response)
- Model pricing — each model has different rates per 1K tokens
You can compare model pricing in the Model Pricing tab of your dashboard. Generally:
- Fast/cheap models (Haiku, GPT-4o Mini, Gemini Flash) — best for casual conversation and quick answers
- Powerful models (Opus, GPT-4o, Gemini Pro) — better for complex reasoning, longer outputs, and detailed analysis
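The two-factor cost formula works out like this. The rates below are made-up illustrations, not actual QUI or provider pricing; check the Model Pricing tab for current rates:

```python
# Illustrative cost arithmetic: tokens in and out, each billed at the
# model's per-1K-token rate. Rates here are examples, not real pricing.

def estimate_cost(input_tokens, output_tokens,
                  input_rate_per_1k, output_rate_per_1k):
    return (input_tokens / 1000) * input_rate_per_1k \
         + (output_tokens / 1000) * output_rate_per_1k

# A 2,000-token prompt (message + character context + memory) with a
# 500-token response at $0.003 in / $0.015 out per 1K tokens:
cost = estimate_cost(2000, 500, 0.003, 0.015)
# 2.0 * 0.003 + 0.5 * 0.015 = $0.0135
```

Note that input tokens include the character's context and memory, not just your message, which is why long-running conversations cost more per turn.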
Each character can use a different model. Choose based on the character's purpose — a simple helper doesn't need the most expensive model.
BYOK (Bring Your Own Key)
If you prefer to use your own API keys from LLM providers:
- Switch to BYOK mode in your account settings
- Enter your API keys for each provider you want to use
- The platform deducts only the platform fee from your prepaid balance — the provider bills you directly for token usage
In BYOK mode, your balance covers the platform fee only. The provider charges appear on your own account with that provider.
Note: BYOK is all-or-nothing per account. You cannot use BYOK for one provider and managed billing for another. Your balance and usage reports remain consolidated regardless of mode.
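The difference between the two modes comes down to what your prepaid balance covers. A minimal sketch, assuming a hypothetical `platform_fee` value (the actual fee structure is whatever QUI publishes):

```python
# Sketch of what each billing mode deducts from the prepaid balance.
# Names and fee values are hypothetical illustrations.

def balance_deduction(mode, token_cost, platform_fee):
    if mode == "byok":
        return platform_fee              # provider bills token_cost directly
    return token_cost + platform_fee     # managed: balance covers both

# Same call, two modes, assuming $0.04 in tokens and a $0.01 platform fee:
managed = balance_deduction("managed", 0.04, 0.01)  # $0.05 from balance
byok = balance_deduction("byok", 0.04, 0.01)        # $0.01 from balance
```

In BYOK mode the remaining $0.04 appears on your provider invoice instead, so your total cost is similar; the difference is who bills you for the tokens.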
Cost-Saving Tips
- Use fast models for simple tasks — Haiku, GPT-4o Mini, and Gemini Flash cost a fraction of larger models
- Set token limits per character — each character has a configurable maximum output token count
- Use local models for experimentation — Qllama runs on your hardware with zero billing cost
- Monitor the Model Pricing tab — provider prices change; check periodically for better options
- Set spending limits — prevent unexpected costs with daily and monthly caps