Qwen 3.5 9B for Open Source LLM and VLM

Run Qwen 3.5 9B in ComfyUI as a text-only LLM or as a vision language model. Attach an image or a video, write your prompt, and get text back.

Tags: image to text, llm, open source, qwen, text generation, vlm

Generates in about 8 seconds

floyoofficial

Nodes & Models

ComfyUI Official

CLIPLoader

qwen3.5_9b_bf16.safetensors

PrimitiveStringMultiline

LoadImage

TextGenerate

PreviewAny

Run Qwen 3.5 9B inside ComfyUI. Use it as a text-only LLM, or turn on the image or video input and use it as a vision language model.

Write a prompt and get text back. Attach an image and the model can describe what's in it, answer questions about it, or read text out of it. Attach a video and it can describe the action, summarize what happens across the clip, or answer questions about specific moments.

Output is plain text. Use it on its own, or feed the result into another step in your workflow.

Image or video (optional)

Off by default. Want pure LLM mode? Leave it bypassed. Want to analyze a picture? Enable the image input and upload your file. Want to describe or summarize a clip? Enable the video input and drop in a short video. Same model, three modes.

Load the Qwen 3.5 9B model, write your prompt, and run. The image and video inputs are optional and bypassed by default. Skip them for code, writing, or any text-only task. Enable the image input for vision Q&A, or the video input to describe what's happening in a clip. Adjust temperature and sampling if you want a different output style.
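If you'd rather see the graph as data, here is a minimal Python sketch of how the nodes listed for this workflow could be wired in ComfyUI's API-style JSON, where each node id maps to a class_type and its inputs. The node class names and the model filename come from this page. The input names (clip_name, value, clip, prompt, image, source), the node ids, and the build_workflow helper are assumptions for illustration, and the real nodes may require more inputs (for example the temperature and sampling settings), so check everything against the exported workflow before relying on it. The video input is left out because this page doesn't name the video node.

```python
import json


def build_workflow(prompt_text, model_file="qwen3.5_9b_bf16.safetensors", image_name=None):
    """Return an API-style graph: node id -> {"class_type", "inputs"}.

    Input names below are assumed, not confirmed by the workflow page.
    A link to another node's output is written as [node_id, output_index].
    """
    graph = {
        # Loads the Qwen 3.5 9B weights.
        "1": {"class_type": "CLIPLoader", "inputs": {"clip_name": model_file}},
        # Holds the prompt text.
        "2": {"class_type": "PrimitiveStringMultiline", "inputs": {"value": prompt_text}},
        # Runs the model on the prompt (and on the image, if one is linked).
        "3": {"class_type": "TextGenerate", "inputs": {"clip": ["1", 0], "prompt": ["2", 0]}},
        # Shows the generated text.
        "4": {"class_type": "PreviewAny", "inputs": {"source": ["3", 0]}},
    }
    if image_name is not None:
        # VLM mode: add the image loader and link it into the generator.
        graph["5"] = {"class_type": "LoadImage", "inputs": {"image": image_name}}
        graph["3"]["inputs"]["image"] = ["5", 0]
    return graph


# Text-only (LLM) graph versus image (VLM) graph.
llm_graph = build_workflow("Write Python code that reverses a string")
vlm_graph = build_workflow("Describe this image in detail", image_name="photo.png")

# Serialized form, wrapped under a "prompt" key as a local ComfyUI server expects.
payload = json.dumps({"prompt": vlm_graph})
```

Leaving image_name unset mirrors keeping the image input bypassed in the UI; passing a filename mirrors enabling it. Whether the hosted version of this workflow accepts the same payload is not stated on this page.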

Prompt

Write what you want. "Write Python code that..." for code. "Describe this image in detail" with an image attached for captions. "What's happening in this video?" with a clip attached for video summaries. "What's in this picture?" for visual Q&A. The example shipped with the workflow asks for Java calculator code, so swap it out for whatever you need.

"What is Qwen 3.5 9B good for in ComfyUI?"

The vision mode is the more interesting half. Caption a folder of training images for a LoRA. Read text out of a screenshot. Describe a reference photo and use the result as the prompt for an image model later in the chain. Feed it a short clip and get a written description of the action, the scene, or specific moments. Useful for video tagging, draft captions, content review, or pulling prompts out of existing footage.
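For the LoRA captioning case, a rough Python sketch of the batch loop might look like this. It assumes the common convention of saving each caption as a .txt file next to its image, which this page doesn't specify. The caption_image callable is hypothetical: it stands in for whatever runs this workflow on one image and returns the text.

```python
from pathlib import Path
from typing import Callable, List

# File types treated as images; adjust to match your dataset.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def caption_folder(folder, caption_image: Callable[[Path], str], overwrite: bool = False) -> List[Path]:
    """Write one caption file per image and return the files written.

    caption_image is a placeholder for a call into the workflow.
    Existing caption files are kept unless overwrite is True.
    """
    written = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files, including earlier captions
        sidecar = path.with_suffix(".txt")
        if sidecar.exists() and not overwrite:
            continue  # don't clobber captions you've already edited by hand
        sidecar.write_text(caption_image(path).strip() + "\n", encoding="utf-8")
        written.append(sidecar)
    return written
```

Skipping existing captions by default means you can rerun the loop after adding new images without losing manual edits.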

FAQ

What's the difference between LLM and VLM mode in Qwen 3.5 9B?

LLM mode means the model only sees text. You write a prompt, it writes a response. VLM mode adds image or video input. The same Qwen 3.5 9B can see what's in a picture or clip and answer questions about it, describe it, or read text from it. Switch modes by enabling the LoadImage node or the video input node.

Can Qwen 3.5 9B describe a video?

Yes. Enable the video input, upload a short clip, and prompt with something like "Describe what's happening in this video" or "Summarize the main action." The model samples frames and reads them as a sequence. Good for tagging clips, drafting captions, or pulling prompts out of reference footage.
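To get a feel for what sampling frames means, here is a small Python sketch that picks evenly spaced frame indices from a clip. This page doesn't say how the workflow actually chooses frames, so uniform spacing is an assumption, and sample_frame_indices is a hypothetical helper, not part of ComfyUI.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples evenly spaced frame indices, in playback order.

    Each index is the midpoint of one of num_samples equal slices of the clip.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        return list(range(total_frames))  # short clip: use every frame
    return [((2 * i + 1) * total_frames) // (2 * num_samples) for i in range(num_samples)]


print(sample_frame_indices(100, 4))  # [12, 37, 62, 87]
```

Because only a subset of frames is read under this scheme, a very brief event can fall between samples, which is one reason to keep clips short when asking about specific moments.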
