Voice
The Voice service adds text-to-speech (TTS) and speech-to-text (STT) capabilities to your characters. Characters can speak their responses aloud and listen to voice input.
Text-to-Speech (Kokoro TTS)
Kokoro is a lightweight, high-quality TTS engine that runs locally. When voice is enabled on a character, you can click the speak button on any message to hear it read aloud.
Streaming TTS
For longer messages, the Voice service uses streaming TTS: it splits the text into sentences and synthesizes each one independently. Audio starts playing as soon as the first sentence is ready (typically 1-2 seconds), rather than waiting for the entire message to be synthesized.
This sentence-by-sentence approach provides low-latency audio output even for long responses.
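The sentence-by-sentence idea can be illustrated with a minimal sketch. This is not the Voice service's actual code: the names `split_sentences`, `stream_tts`, and `fake_synthesize` are hypothetical, the regex splitter is a naive stand-in for whatever sentence segmentation the service uses, and the synthesizer is a placeholder for a real TTS call.

```python
import re
from typing import Callable, Iterator, List

# Naive rule (an assumption): a sentence ends at ., ! or ? followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences, dropping empty fragments."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def stream_tts(text: str, synthesize: Callable[[str], bytes]) -> Iterator[bytes]:
    """Yield one audio chunk per sentence.

    Because this is a generator, the first chunk is available as soon as the
    first sentence has been synthesized; later sentences are only synthesized
    when the consumer asks for the next chunk.
    """
    for sentence in split_sentences(text):
        yield synthesize(sentence)


def fake_synthesize(sentence: str) -> bytes:
    """Placeholder for a real TTS engine call (hypothetical)."""
    return sentence.encode("utf-8")


if __name__ == "__main__":
    message = "Hello there. How are you? Fine!"
    for chunk in stream_tts(message, fake_synthesize):
        # A real player would start playback here instead of printing.
        print(len(chunk), "bytes ready for playback")
```

The generator is what keeps latency low in this sketch: playback of the first sentence can begin while the remaining sentences are still waiting to be synthesized.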
Voice Selection
Multiple voices are available. Configure the voice per character in the Visual Builder's Voice node settings:
- Select from available Kokoro voices
- Adjust playback speed
Speech-to-Text (Whisper)
Whisper (via faster-whisper) transcribes spoken audio into text. When voice is enabled, you can record a voice message instead of typing.
The transcribed text is sent as a regular message to the character. Language detection is automatic.
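For readers curious what transcription with faster-whisper looks like in general, here is a minimal sketch using that library's public `WhisperModel` API. It is not the Voice service's internal code: the function names, the `"base"` model size, and the way segments are joined are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def join_segments(segments: Iterable) -> str:
    """Concatenate the .text of each transcribed segment into one message."""
    parts = (seg.text.strip() for seg in segments)
    return " ".join(p for p in parts if p)


def transcribe_file(path: str, model_size: str = "base",
                    device: str = "auto") -> Tuple[str, str]:
    """Return (transcribed_text, detected_language_code) for an audio file.

    Requires the faster-whisper package; it is imported lazily so that
    join_segments can be used without it installed.
    """
    from faster_whisper import WhisperModel

    # device="auto" lets the library pick a GPU when one is usable.
    model = WhisperModel(model_size, device=device)
    # No language argument is passed, so the model detects it from the audio.
    segments, info = model.transcribe(path)
    return join_segments(segments), info.language
```

Calling `transcribe_file("recording.wav")` (a hypothetical file name) would return the text along with a language code such as `"en"`, which mirrors the automatic language detection described above.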
Using Voice in Strings
When a character has the Voice node enabled:
- A speak button appears on each message. Click it to hear the character's response.
- A microphone button appears in the input bar. Click it to record voice input.
- Speaking state is indicated with a pulsing animation on the speak button
Voice features are available alongside text, so you can mix voice and text input freely.
Enabling Voice
- Add the Voice node to your character's canvas in the Visual Builder
- Configure voice selection and speed in the node settings
- Voice features become available in Strings for conversations with that character
Voice processing runs entirely on your local machine. No audio is sent to cloud services.
Note: TTS quality and speed depend on your hardware. GPU acceleration significantly improves synthesis speed.