Change Emotion using IndexTTS2

Clone any voice and change the emotional delivery. Upload audio, type new text, adjust emotions, and get speech with different feelings using the same voice.

Audio to Audio

voice clone

326

floyoofficial

Nodes & Models

ComfyUI Official

LoadAudio

IndexTTS2EmotionFromText

IndexTTS2EmotionVector

IndexTTS2Simple

IndexTTS2SaveAudio

Voice cloning with precise emotion control using IndexTTS2.

Upload any audio sample to clone the voice, then type new text and choose the emotional delivery. Control emotions with sliders for happy, sad, angry, surprised, and more, or describe emotions in plain text. Get the same voice saying different words with completely different feelings.

Three ways to control emotion: manual sliders, text descriptions, or reference audio. No training needed for voice cloning.

How do you clone voices with emotion control using IndexTTS2?

Upload your voice sample, type the new text, set emotion levels with sliders or text descriptions, and IndexTTS2 generates speech that matches the voice but with your chosen emotional tone.

Source Audio: Upload any clear speech sample; 5 to 30 seconds works best. The quality and clarity of this audio determine how well the voice is cloned. A single speaker works better than multiple voices.

Text to Speak: Type what you want the cloned voice to say. Keep it under 200 words for best quality. The model handles different languages but works best when the text is in the same language as your source audio.

Emotion Vectors: Want precise control? Use the sliders. Set happy to 0.8 for a cheerful delivery. Try sad at 1.0 for a somber tone. Mix emotions like surprised (0.5) plus happy (0.3) for excitement. Keep the combined total of all emotion values under 1.5 to avoid artifacts.

Emotion from Text: Prefer natural descriptions? Type "slightly nervous but confident" or "angry and frustrated." The model converts your description into emotion vectors automatically. This works well for complex emotional states.

Emotion Audio: Got reference audio with the perfect emotion? Upload it as emotion_audio. The model extracts the emotional tone and applies it to your voice clone.
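The guidelines above (clip length, text length, and the combined emotion total) can be checked before you run the workflow. The following is a minimal sketch, not part of the Floyo workflow or the IndexTTS2 nodes: the function names, the dictionary of emotion values, and the assumption that your sample is an uncompressed WAV file are all hypothetical choices made for illustration.

```python
import wave

# Thresholds taken from the guidelines above.
MIN_SECONDS, MAX_SECONDS = 5.0, 30.0   # suggested source clip length
MAX_WORDS = 200                        # suggested text length
MAX_EMOTION_TOTAL = 1.5                # combined emotion values should stay under this


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def preflight_warnings(duration_seconds, text, emotions):
    """Collect warnings; an empty list means every guideline is met.

    `emotions` is a hypothetical mapping such as {"happy": 0.8},
    mirroring the slider values described above.
    """
    warnings = []
    if not MIN_SECONDS <= duration_seconds <= MAX_SECONDS:
        warnings.append(
            f"source audio is {duration_seconds:.1f}s; 5 to 30 seconds works best"
        )
    word_count = len(text.split())
    if word_count > MAX_WORDS:
        warnings.append(f"text has {word_count} words; keep it under {MAX_WORDS}")
    total = sum(emotions.values())
    if total >= MAX_EMOTION_TOTAL:
        warnings.append(
            f"emotion values sum to {total:.2f}; keep the total under {MAX_EMOTION_TOTAL}"
        )
    return warnings


# Example: the surprised-plus-happy mix described above, with a 12-second clip.
mix = {"surprised": 0.5, "happy": 0.3}
print(preflight_warnings(12.0, "Hello there, thanks for listening.", mix))
```

For a real sample, pass `wav_duration_seconds("your_sample.wav")` as the first argument. Other formats such as MP3 would need a different reader, which this sketch does not cover.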

What is voice cloning with emotion control good for?

It suits content creators, voice actors, and developers who need one voice to deliver content in different emotional tones without multiple recording sessions.

Content creators can generate character voices for storytelling with consistent vocal identity but varied emotions. Voice actors can prototype different emotional deliveries before recording. Game developers can create NPC dialogue with emotional range from single voice samples.

Audiobook narrators can adjust emotional delivery for different characters or scenes. Podcast creators can maintain vocal consistency while expressing different moods across episodes.

The catch: it works best with clear, single-speaker audio samples. Background noise or multiple voices in the source audio reduce cloning quality. It is not suitable for real-time conversation.

FAQ

What audio quality works best for voice cloning?

Clear, single-speaker recordings with minimal background noise. 5 to 30 seconds of speech provides enough voice data for cloning.

Can I mix multiple emotions at once?

Yes, use the emotion vector sliders to blend emotions. Try happy (0.4) plus surprised (0.3) for excitement. Keep the combined total under 1.5.

Does this work with different languages?

The model handles multiple languages but performs best when the source audio and the target text are in the same language.

What if the emotion sounds too strong?

Lower the emotion values. Try 0.3 to 0.6 instead of 1.0 for more subtle emotional shifts.

How do I run IndexTTS2 voice cloning online?

You can run IndexTTS2 voice cloning online through Floyo. No installation, no setup. Open the workflow in your browser, upload your audio, and hit run.
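The FAQ advice on lowering emotion values when the delivery sounds too strong can be sketched as a small helper that scales a whole blend at once instead of adjusting each slider by hand. This is an illustrative assumption rather than anything the workflow provides: the `soften` name, the emotion dictionary, and the 0.0 to 1.0 clamp (inferred from the slider values quoted on this page) are all hypothetical.

```python
def soften(emotions, factor):
    """Scale every emotion value by `factor`, clamped to an assumed 0.0-1.0 slider range.

    `emotions` is a hypothetical mapping of emotion name to slider value.
    Results are rounded to two decimals to match typical slider precision.
    """
    return {
        name: round(max(0.0, min(1.0, value * factor)), 2)
        for name, value in emotions.items()
    }


# Example: halve a full-strength "sad" setting to land in the subtler 0.3-0.6 range.
print(soften({"sad": 1.0}, 0.5))
```

Scaling all values by the same factor keeps the relative balance of a blend while reducing its overall intensity.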
