Whisper STT
Create text from speech using Whisper STT
AILab
Audio to Text
Speech to Text
STT
Transcribe
Nodes & Models
LoadAudio
WhisperSTT
ShowText|pysssss
ShowText|pysssss
Whisper STT from AILab is a speech-to-text (automatic speech recognition) system built around OpenAI's Whisper model that converts spoken audio into written text.
What it is
General-purpose automatic speech recognition (ASR) model that handles multilingual speech recognition, speech translation into English, and language identification in a single network.
In the AILab/ComfyUI context, exposed as a Whisper STT node that takes audio input and outputs a text STRING for downstream nodes.
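The node contract described above (AUDIO in, STRING out) can be sketched using ComfyUI's custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`). This is not AILab's implementation. The class name `WhisperSTTSketch`, the `task` input, the category, and the injectable `transcriber` hook are all hypothetical. The actual Whisper call is left as a plug-in point so the interface can be shown without assuming a particular Whisper package.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical hook: (waveform, sample_rate, task) -> text.
# Swap in a real Whisper backend here; nothing below depends on which one.
Transcriber = Callable[[Any, int, str], str]


def _missing_backend(waveform: Any, sample_rate: int, task: str) -> str:
    raise NotImplementedError("plug in a Whisper backend")


class WhisperSTTSketch:
    """Illustrative ComfyUI-style node: AUDIO in, STRING out (not AILab's code)."""

    RETURN_TYPES = ("STRING",)  # a single text output for downstream nodes
    FUNCTION = "run"            # ComfyUI calls the method named here
    CATEGORY = "audio"          # hypothetical menu category

    def __init__(self, transcriber: Optional[Transcriber] = None) -> None:
        # ComfyUI instantiates nodes without arguments, so the hook defaults
        # to a placeholder that fails loudly until a backend is supplied.
        self.transcriber = transcriber or _missing_backend

    @classmethod
    def INPUT_TYPES(cls) -> Dict[str, Dict[str, Tuple[Any, ...]]]:
        return {
            "required": {
                "audio": ("AUDIO",),
                # A list as the first element makes ComfyUI render a dropdown.
                "task": (["transcribe", "translate"],),
            }
        }

    def run(self, audio: Dict[str, Any], task: str) -> Tuple[str]:
        # ComfyUI's AUDIO type is a dict holding a waveform and its sample rate.
        text = self.transcriber(audio["waveform"], audio["sample_rate"], task)
        # Whisper output often starts with a space; trim before passing it on.
        return (text.strip(),)
```

A custom-node package would register such a class by listing it in its `NODE_CLASS_MAPPINGS` dict. The returned one-element tuple is what lets a ShowText node, or any other STRING consumer, pick up the text.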
Key features
Robust transcription on noisy, real-world audio thanks to training on ~680,000 hours of diverse multilingual data.
Supports many languages plus optional direct translation to English from non-English speech.
Provides timestamps, language detection, and task control (transcribe vs. translate) through special tokens/options.
In ComfyUI/AILab, the node accepts common audio formats and returns plain text ready for subtitles, prompting, or logging.
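The special tokens behind task control can be illustrated with a small sketch. Whisper's decoder is conditioned on a prefix of `<|startoftranscript|>`, a language token such as `<|de|>`, a task token (`<|transcribe|>` or `<|translate|>`), and optionally `<|notimestamps|>`. When timestamps are enabled, times are emitted as tokens on a 0.02 s grid within each 30 s window. Language detection works by reading the model's probabilities over the language tokens at the position right after `<|startoftranscript|>`. The helper names below are hypothetical and only build token strings. They do not call any Whisper API.

```python
# Special tokens Whisper uses to condition its decoder.
# The helpers are illustrative string builders, not a library API.
SOT = "<|startoftranscript|>"
NO_TIMESTAMPS = "<|notimestamps|>"
TASKS = ("transcribe", "translate")


def decoder_prompt(language: str, task: str = "transcribe", timestamps: bool = False) -> str:
    """Build the prefix that fixes language, task, and timestamp mode."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")
    tokens = [SOT, f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Without this token the model interleaves timestamp tokens with text.
        tokens.append(NO_TIMESTAMPS)
    return "".join(tokens)


def timestamp_token(seconds: float) -> str:
    """Quantize a time inside the 30 s window to the 0.02 s timestamp grid."""
    clamped = min(max(seconds, 0.0), 30.0)
    return f"<|{round(clamped / 0.02) * 0.02:.2f}|>"


print(decoder_prompt("de", "translate"))
# <|startoftranscript|><|de|><|translate|><|notimestamps|>
```

Swapping `<|transcribe|>` for `<|translate|>` is the entire difference between same-language transcription and translation into English. Both tasks share one network.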
Best-fit use cases
Generating subtitles or transcripts for recorded voice, podcasts, lectures, and tutorials.
Voice-driven prompting or control in ComfyUI/AILab, where spoken commands are turned into text prompts or parameters.
Multilingual meeting notes and interview transcription, including translation to English when needed.
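For the subtitle use case, timed segments have to be rendered into a subtitle format such as SRT. This sketch assumes segments shaped like the `segments` list returned by the openai-whisper package's `transcribe()`: dicts with `start`, `end` (in seconds), and `text`. A plain STRING output may not carry timing, so this applies only where segment timestamps are available. The function names are hypothetical.

```python
from typing import Dict, List, Union

# Assumed segment shape: {"start": float, "end": float, "text": str}.
Segment = Dict[str, Union[float, str]]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Render numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = srt_time(float(seg["start"]))
        end = srt_time(float(seg["end"]))
        # Whisper segment text often has a leading space; strip it per cue.
        cues.append(f"{index}\n{start} --> {end}\n{str(seg['text']).strip()}\n")
    return "\n".join(cues)


print(segments_to_srt([
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " Welcome to the lecture."},
]))
```

SRT uses a comma before the milliseconds. Converting to WebVTT would mainly mean switching that to a period and adding a `WEBVTT` header line.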

