
VibeVoice: Single-Speaker Text to Speech

VibeVoice

Generates in about 28 seconds

Nodes & Models

LoadTextFromFileNode
VibeVoiceSingleSpeakerNode
LoadAudio
PreviewAudio

HOW IT WORKS

Step 1. Add your text
Type or paste the script you want spoken into the text box, or load it from a .txt file. Punctuation guides the pacing and pauses, so write it the way you want it read.

Step 2. Add a reference voice
Upload a short, clean audio clip of the voice you want. The model listens to its tone and timbre and reads your text in that voice.
Works great with: 10 to 30 second clips · one speaker · little to no background noise

Step 3. Hit run and download
You get back an audio clip of your script spoken in the reference voice. Preview it in the workflow, then download.
Ready for: Premiere · Audacity · CapCut · any editor

First time? Leave every setting as-is. The defaults (VibeVoice-Large · 20 steps · CFG 1.3 · fixed seed) are the right starting point for almost everyone.


RECOMMENDED SETTINGS

Quick-start guide. Find the goal that matches yours and copy the settings.

  • Standard narration (most people) — Start here: VibeVoice-Large · 20 steps · CFG 1.3 · fixed seed. The right starting point for almost everyone.

  • The voice sounds off or robotic — Give it a cleaner reference. 10 to 30 seconds of clear, single-speaker audio with no music. The model copies what it hears, so the sample quality sets the ceiling.

  • Want a faster render — Switch to the smaller 1.5B model or lower the step count. Speed goes up, with a small drop in polish.

  • Delivery drifts or rushes — Lower the temperature below 0.95 for steadier, more predictable pacing. Raise it for a more expressive, varied read.

  • It strays from the script — Nudge the CFG scale above 1.3. It pushes the model to stick closer to your text and the reference voice.

  • Reproduce a take you liked — Keep the seed fixed. The same text, voice, and seed give you the same read again.

  • Unexpected background music — Use a dry reference clip. VibeVoice is content-aware and can carry over music or ambience that is present in the voice sample.
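The quick-start guide above can be read as a set of overrides on top of the defaults. The sketch below is only an illustration of that idea: the `TtsSettings` class, its field names, the `VibeVoice-1.5B` label, the placeholder seed, and the specific override values (10 steps, temperature 0.8, CFG 1.5) are all assumptions, not the actual inputs of the VibeVoice node. The 0.95 temperature default is inferred from the "below 0.95" advice rather than stated outright.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TtsSettings:
    """Hypothetical settings container; field names are illustrative only."""
    model: str = "VibeVoice-Large"
    steps: int = 20
    cfg_scale: float = 1.3
    temperature: float = 0.95  # assumed default, inferred from the guide
    seed: int = 42             # arbitrary placeholder; any fixed value works


DEFAULTS = TtsSettings()

# Goal -> overrides applied on top of the defaults.
# The exact numbers are assumptions chosen to show the direction of each change.
PRESETS = {
    "standard": {},
    "faster": {"model": "VibeVoice-1.5B", "steps": 10},
    "steadier": {"temperature": 0.8},
    "stick_to_script": {"cfg_scale": 1.5},
}


def settings_for(goal: str) -> TtsSettings:
    """Return the defaults with the overrides for the given goal applied."""
    return replace(DEFAULTS, **PRESETS[goal])
```

Calling `settings_for("faster")` changes only the model and step count and leaves the seed fixed, so a take stays reproducible while you tune a single setting at a time.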

Text: Paste your script straight in or load a .txt file. Keep sentences clear and punctuated. Commas and periods become pauses, so they shape how the read flows.
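Because punctuation shapes the pauses, it can help to scan a script for long stretches with no pause marks before running it. The following is a rough sketch of such a check, not part of the workflow: the function name and the 25-word threshold are assumptions, and the word count is whitespace-based, so it is only meaningful for English text.

```python
import re

MAX_WORDS_WITHOUT_PAUSE = 25  # assumed threshold; tune to taste


def long_unpunctuated_runs(
    script: str, max_words: int = MAX_WORDS_WITHOUT_PAUSE
) -> list[str]:
    """Return stretches of text longer than max_words with no pause mark."""
    # Split on common pause punctuation (ASCII and full-width) and line breaks.
    chunks = re.split(r"[,.;:!?，。；：！？\n]+", script)
    # Keep only the chunks whose whitespace-separated word count is too high.
    return [chunk.strip() for chunk in chunks if len(chunk.split()) > max_words]
```

Any stretch the function returns is a candidate for an extra comma or a sentence break.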


USE CASES

🎙️ Podcasters & Audio Creators
Turn a script into narration in one consistent voice. For a two-way conversation, reach for the multi-speaker workflow instead.

📚 Audiobooks & Long-form
Narrate long text in a single voice. VibeVoice is built for long-form audio and holds the voice steady across long passages.

🎬 Video & E-learning
Generate voiceover for tutorials, explainers, and course modules without booking a recording session.

♿ Accessibility & Drafts
Read documents aloud, or rough-cut a voiceover to time out a video before you bring in talent.


WHAT WORKS BEST / WHAT TO AVOID

✅ Works great

  • A clean, single-speaker reference clip

  • English or Chinese text

  • Natural punctuation for pacing

  • Clear, well-recorded voice samples

⚠️ May produce weaker results

  • Noisy or music-backed reference audio

  • Languages other than English or Chinese

  • Long run-on sentences with no punctuation

  • A reference clip with more than one voice
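The 10 to 30 second guideline for reference clips is easy to check locally before uploading. Here is a minimal sketch using Python's standard `wave` module, assuming the clip is an uncompressed WAV file. The function names are hypothetical, and the check covers duration only: it cannot detect background noise, music, or a second speaker.

```python
import wave

MIN_SECONDS = 10.0  # lower bound of the recommended clip length
MAX_SECONDS = 30.0  # upper bound of the recommended clip length


def clip_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def in_recommended_range(seconds: float) -> bool:
    """True if the duration falls within the 10 to 30 second guideline."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For example, `in_recommended_range(clip_seconds("reference.wav"))` tells you whether a local file (the filename here is a placeholder) meets the length guideline.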


NEW TO COMFYUI?

Start with the free ComfyUI for Beginners Course on Floyo. Sixteen short videos take you from zero to running your own AI workflows. No setup headaches, no jargon, clear hands-on lessons. Watch the course, then run any workflow here in your browser.

👉 Watch the free ComfyUI for Beginners Course →


FAQ

What is VibeVoice? VibeVoice is an open-source text-to-speech model from Microsoft Research, released under the MIT license. It is built for expressive, long-form speech and uses a next-token diffusion design to produce natural prosody and steady voice consistency across long passages. This workflow runs the single-speaker setup.

Can VibeVoice clone a voice from a sample? Yes. VibeVoice reads a short reference clip and synthesizes your text in that voice, picking up the tone and timbre from the audio you provide. Only use voices you have the right to use. Cloning a real person's voice without their explicit, recorded consent is not a permitted use of the model.

What is the difference between VibeVoice-Large and the 1.5B model? VibeVoice-Large is the higher-quality model and the default here, with the most natural and consistent output. The 1.5B model is smaller and faster, with a small trade in polish. Start with Large, and switch to 1.5B when you want a quicker render or are working on lighter hardware.

What languages does VibeVoice support? VibeVoice is trained on English and Chinese. Text in other languages is unsupported and can come out unintelligible, so keep your scripts to English or Chinese for reliable results.

Can I use VibeVoice audio commercially? Yes. VibeVoice is MIT licensed, which permits commercial use, and audio you generate on Floyo carries full commercial rights. The exception is voice cloning: you must have explicit consent to use a real person's voice, and impersonation is not permitted.

How do I run VibeVoice online? You can run VibeVoice online through Floyo. No installation, no setup, no GPU to rent. Open the workflow in your browser, paste your text, add a reference voice, and hit run. Free to try.


WHY FLOYO?

Floyo is the only platform with team collaboration for ComfyUI in the browser. You run workflows with no install. You share run history, assets, and models across your team. You pay only when you generate. Floyo supports open-source and closed-source models.

A creator runs a narration and likes the voice. A teammate opens that exact run from shared history and keeps going. No file handoffs. No version confusion.

For studios and enterprise teams, Floyo adds private workspaces, pooled resources, and a team usage dashboard. Other ComfyUI cloud tools run for one person at a time. Floyo runs for the whole team, with transparent per-generation costs.


Ready to try it? Paste your text, add a reference voice, and run it. The settings are already set.

Launch Workflow, Free

Questions? Watch the free course or check the FAQ above.
