Qwen3 ASR: Transcribe Audio

Upload audio and Qwen3's ASR engine returns the transcript, word-level timing for SRT subtitles, and an optional translation to English. The language is auto-detected.

Tags: asr, audio, qwen, speech to text, subtitles, transcription

Generates in about 31 secs

floyoofficial

Nodes & Models

ComfyUI Official

Qwen3TTSEngineNode

LoadAudio

MarkdownNote

UnifiedASRTranscribeNode

PreviewAny

Qwen3 audio transcription with optional English translation.

Upload an audio file, pick transcribe or translate, and get back clean text. Keep the forced aligner on and you also get timing data you can use to build an SRT subtitle file.

Auto language detection means you don't need to know what's in the audio before running it.

How do you transcribe audio with Qwen3?

Load your audio file, leave the engine on its 0.6B default, and run it. Qwen3 detects the language and returns the text. For SRT subtitles, keep the forced aligner on and the workflow outputs word-level timing alongside the transcript. To get English instead of the source language, switch the task to translate.

Audio file: Drop in any audio file in a common format such as MP3 or WAV. There is no hard length limit because the workflow chunks longer files automatically.

Task (transcribe or translate): Want the original words written out? Use transcribe. Want the audio rendered in English no matter the source language? Switch to translate. The translate target defaults to English, but you can set a different target language if you need one.

Forced aligner (asr_use_forced_aligner): On by default. Leave it on if you need timing data for subtitles or word-level timestamps. Turn it off if you only want raw text and the fastest possible run.

Language: Auto by default, so the engine detects the language itself. Lock it to a specific language if your audio is mixed and Qwen3 keeps switching between languages.

Model size (0.6B): The default. Small, fast, and accurate enough for most spoken audio. Handles podcasts, interviews, and meeting recordings without issue.

Chunk size and overlap: Defaults are 30 second chunks with a 2 second overlap. Long audio gets split into pieces and stitched back together. The 2 second overlap stops words from getting cut at chunk boundaries. Most people never touch these.
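To make the chunking defaults concrete, here is a minimal Python sketch of how a long file could be divided into overlapping spans. It is an illustration only, not the workflow's actual code: the function name `chunk_spans` is hypothetical, and it assumes each chunk is 30 seconds long and shares its last 2 seconds with the next chunk, so a new chunk starts every 28 seconds.

```python
def chunk_spans(duration_s, chunk_s=30.0, overlap_s=2.0):
    """Return (start, end) spans in seconds that cover the whole clip.

    Assumption: consecutive chunks share `overlap_s` seconds, so each
    new chunk starts `chunk_s - overlap_s` seconds after the previous one.
    """
    if chunk_s <= overlap_s:
        raise ValueError("chunk size must be larger than the overlap")
    step = chunk_s - overlap_s
    spans = []
    start = 0.0
    while start < duration_s:
        # The final chunk is clipped to the end of the audio.
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        if end >= duration_s:
            break
        start += step
    return spans


# A 70 second clip yields three spans; neighbours overlap by 2 seconds.
print(chunk_spans(70))
```

Under these assumptions, a one hour recording is split into 129 spans, which is why you do not need to cut the file up yourself.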

What is Qwen3 ASR good for?

Transcribing podcasts, interviews, meetings, and voice notes when you want both the text and the timing data in one pass. The ability to switch between transcribe and translate makes it useful for working across languages without bringing a second tool into your pipeline.

Solid for: podcast transcripts, video subtitles via SRT export with the forced aligner, interview transcription where you need timestamps for citation, and turning foreign-language audio into English text.

Less ideal for: noisy field recordings or heavily overlapping speakers. ASR models still struggle with both.

The translate task is a one-shot rendering of the audio into the target language. It isn't running a separate translation pass after transcription, so the output reads as direct translated speech rather than a strict word-for-word match.

FAQ

Does Qwen3 ASR support timestamps for SRT subtitles?

Yes. Keep the asr_use_forced_aligner option on (it's the default) and the workflow returns timing data alongside the text. You can use that timing data to build an SRT file with start and end times for each word or segment.
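As a rough illustration of turning word-level timing into SRT cues, the Python sketch below groups words into short subtitle lines. The input shape, a list of `(text, start_seconds, end_seconds)` tuples, is an assumption made for this example, as are the helper names `srt_timestamp` and `words_to_srt`; the workflow's real timing output may be structured differently.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Build SRT text from (text, start_s, end_s) tuples.

    Assumption: words arrive in time order. Each cue holds up to
    `max_words` words, spanning the first word's start to the last
    word's end.
    """
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        start, end = group[0][1], group[-1][2]
        text = " ".join(w[0] for w in group)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text}\n"
        )
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)


sample = [("Hello", 0.0, 0.4), ("world", 0.5, 0.9)]
print(words_to_srt(sample))
```

Grouping by a fixed word count is the simplest option. Splitting on pauses or punctuation usually reads better on screen, but it needs more logic than fits in a short sketch.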

What languages does Qwen3 ASR transcribe?

Set language to Auto and the engine detects it. Qwen3 handles dozens of languages out of the box. If you have mixed-language audio and want it to stay in one, set the language manually instead of Auto.

Can Qwen3 transcribe and translate audio at the same time?

The task setting picks one mode per run. Transcribe gives you the source language text. Translate renders the audio directly into your target language (English by default). Run it twice if you want both outputs.

How long can audio files be for Qwen3 ASR?

There is no hard limit. The workflow chunks audio into 30 second pieces with a 2 second overlap and stitches the result back together. You don't need to split files yourself before running.

How do you run Qwen3 ASR online?

You can run Qwen3 ASR online through Floyo. No installation, no setup. Open the workflow in your browser, upload your audio, and hit run. Free to try.
