Chatterbox Text to Speech
Text to speech workflow using Chatterbox
Chatterbox
TTS
Nodes & Models
WorkflowGraphics
LoadAudio
SaveAudio
PreviewAudio
ChatterboxTTS
ChatterboxVC
Chatterbox TTS is an open-source text-to-speech and voice-cloning system. It turns text into natural-sounding speech, clones a voice from a few seconds of reference audio, and gives fine control over emotion and intensity.
What it does
Converts text into high-quality speech with controls for pitch, speed, and emotion (from neutral to highly dramatic).
Performs zero-shot voice cloning: given a short reference clip (around 5 seconds), it mimics that voice without any separate training step.
Supports multilingual output (around 22 languages) and can keep a cloned voice consistent across languages, which suits dubbing and localization.
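The zero-shot cloning step above can be sketched in Python. This is a minimal sketch, assuming the upstream `chatterbox` package (`ChatterboxTTS.from_pretrained`, `generate` with an `audio_prompt_path` argument) and `torchaudio` are installed. The helper names `build_generate_kwargs` and `synthesize` are hypothetical and are not part of the library or of this workflow.

```python
from typing import Optional


def build_generate_kwargs(text: str, reference_clip: Optional[str] = None) -> dict:
    """Assemble keyword arguments for a ChatterboxTTS.generate call (hypothetical helper)."""
    if not text.strip():
        raise ValueError("text must not be empty")
    kwargs = {"text": text}
    if reference_clip is not None:
        # Assumption: the reference voice is passed as `audio_prompt_path`.
        kwargs["audio_prompt_path"] = reference_clip
    return kwargs


def synthesize(text: str, out_path: str,
               reference_clip: Optional[str] = None, device: str = "cpu") -> str:
    """Generate speech, optionally in the voice of `reference_clip`, and save it as a WAV file."""
    # Imported lazily so the helper above works without the heavy dependencies.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device=device)
    wav = model.generate(**build_generate_kwargs(text, reference_clip))
    torchaudio.save(out_path, wav, model.sr)
    return out_path
```

Calling `synthesize("Hello there.", "out.wav", reference_clip="speaker.wav")` would roughly correspond to wiring LoadAudio into ChatterboxTTS and then SaveAudio in the workflow. With no reference clip, the model falls back to its built-in voice.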
Voice change and control
Works as a voice changer by cloning a target voice and then speaking any input text in that style, with adjustable accent, pacing, and emotional intensity.
Provides an explicit "exaggeration" (intensity) parameter so you can dial emotion and expressiveness up or down programmatically.
Includes watermarking/provenance options (PerTh) in some deployments so synthetic audio can be detected and tracked responsibly.
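One way to explore the intensity control is to render the same line at several exaggeration levels and compare the results by ear. The sketch below assumes that `ChatterboxTTS.generate` accepts `exaggeration` and `cfg_weight` keyword arguments, as in the upstream `chatterbox` package. The chosen levels, the output file naming, and the helper names are illustrative assumptions only.

```python
import os


def expressiveness_sweep(levels=(0.25, 0.5, 0.75), cfg_weight=0.5):
    """Return one settings dict per exaggeration level (hypothetical helper)."""
    settings = []
    for level in levels:
        if level < 0:
            raise ValueError("exaggeration level must be non-negative")
        settings.append({"exaggeration": level, "cfg_weight": cfg_weight})
    return settings


def render_sweep(text, reference_clip, out_dir, device="cpu"):
    """Render `text` once per setting and return the written file paths."""
    # Imported lazily so the settings helper is usable without the model installed.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    model = ChatterboxTTS.from_pretrained(device=device)
    paths = []
    for s in expressiveness_sweep():
        wav = model.generate(text, audio_prompt_path=reference_clip, **s)
        path = os.path.join(out_dir, f"exaggeration_{s['exaggeration']:.2f}.wav")
        torchaudio.save(path, wav, model.sr)
        paths.append(path)
    return paths
```

Keeping the settings in a plain list makes it easy to add or remove levels without touching the rendering loop.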
How it’s typically used
Via web UIs where you paste text, choose or clone a voice, adjust emotion and pacing, and download the audio.
As a self‑hosted or API‑based engine for agents, NPCs, audiobooks, podcasts, accessibility tools, or localized dubbing.

