Whisper STT

Create text from speech using Whisper STT

AILab

Audio to Text

Speech to Text

STT

Transcribe

floyoofficial

Nodes & Models

ComfyUI Official

WorkflowGraphics

LoadAudio

WhisperSTT

dspy_nodes

ShowText|pysssss

ComfyUI-Custom-Scripts

Whisper STT from AILab is a speech‑to‑text (automatic speech recognition) system built around OpenAI’s Whisper model that converts spoken audio into written text.

What it is

General‑purpose ASR model that handles multilingual speech recognition, speech translation to English, and language identification in one network.
In the AILab/ComfyUI context, it is exposed as a Whisper STT node that takes audio input and outputs a text STRING for downstream nodes.
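The node chain described above can be sketched as an API-format ComfyUI graph. This is only an illustration: the class names LoadAudio, WhisperSTT, and ShowText|pysssss come from the node list on this page, while the input names on the WhisperSTT and ShowText nodes, the output slot index 0, and the file name speech.wav are assumptions that may not match the actual nodes.

```python
import json


def build_stt_graph(audio_filename: str) -> dict:
    """Build an API-format graph: LoadAudio -> WhisperSTT -> ShowText."""
    return {
        # Loads the uploaded audio file.
        "1": {"class_type": "LoadAudio", "inputs": {"audio": audio_filename}},
        # Assumption: the STT node receives the upstream AUDIO
        # under an input named "audio" (link = [source node id, output slot]).
        "2": {"class_type": "WhisperSTT", "inputs": {"audio": ["1", 0]}},
        # Assumption: the display node reads the STRING under an input named "text".
        "3": {"class_type": "ShowText|pysssss", "inputs": {"text": ["2", 0]}},
    }


graph = build_stt_graph("speech.wav")  # hypothetical file name
print(json.dumps(graph, indent=2))
```

Each link is a two-item list of the source node id and its output slot, which is how the text STRING would reach any downstream node.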

Key features

Robust transcription on noisy, real‑world audio thanks to training on ~680,000 hours of diverse multilingual data.
Supports many languages plus optional direct translation to English from non‑English speech.
Provides timestamps, language detection, and task control (transcribe vs. translate) through special tokens/options.
In ComfyUI/AILab nodes, it accepts common audio formats and returns plain text ready for subtitles, prompting, or logging.
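For readers who want to see the task control and timestamps outside ComfyUI, here is a minimal sketch using the open-source openai-whisper Python package. It is not necessarily what the Whisper STT node runs internally; the helper names run_whisper and summarize and the "base" model size are illustrative choices.

```python
def run_whisper(audio_path: str, task: str = "transcribe", model_name: str = "base") -> dict:
    """Transcribe (or translate to English) an audio file with openai-whisper."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    # Imported lazily so summarize() below can be used without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    # With no language given, Whisper detects it from the audio.
    return model.transcribe(audio_path, task=task)


def summarize(result: dict) -> dict:
    """Pull out the plain text, detected language, and segment timestamps."""
    return {
        "language": result.get("language"),
        "text": result["text"].strip(),
        "timestamps": [(seg["start"], seg["end"]) for seg in result.get("segments", [])],
    }
```

Calling summarize(run_whisper("speech.wav", task="translate")) would return English text for non-English speech, assuming the package and its audio dependencies are installed and speech.wav exists.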

Best‑fit use cases

Generating subtitles or transcripts for recorded voice, podcasts, lectures, and tutorials.
Voice‑driven prompting or control in ComfyUI/AILab, where spoken commands are turned into text prompts or parameters.
Multilingual meeting notes and interview transcription, including translation to English when needed.
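For the subtitle use case, timestamped segments can be turned into SRT text with a few lines of code. The sketch below assumes each segment is a dict with "start" and "end" in seconds plus a "text" field, which is the shape the open-source openai-whisper package returns; the Whisper STT node itself outputs only a plain STRING, so the segments would have to come from elsewhere.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


example = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 5.0, "text": " Welcome to the lecture."},
]
print(segments_to_srt(example))
```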
