Capybara for Text to Image

Create unique images using Capybara

Capybara

Text2Image

Generates in about 1 min 40 secs

floyoofficial

Nodes & Models

ComfyUI Official

RandomNoise
KSamplerSelect
MarkdownNote
UNETLoader: capybara_v0.1.safetensors (Ver Private, Comm Use)
VAELoader: hunyuanvideo15_vae_fp16.safetensors (Ver Private, Comm Use)
DualCLIPLoader: qwen_2.5_vl_7b.safetensors (Ver Private, Comm Use), byt5_small_glyphxl_fp16.safetensors (Ver Private, Comm Use)
WorkflowGraphics
BasicScheduler
ModelSamplingSD3
CLIPTextEncode
CFGGuider
SamplerCustomAdvanced
VAEDecode
AddLabel
PreviewImage

ComfyUI-Easy-Use

easy positive
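
To make the data flow between these nodes easier to follow, here is a minimal Python sketch of how they might connect. The wiring is an assumption based on how these built-in ComfyUI nodes are usually chained, not an export of this workflow. The annotation nodes (MarkdownNote, WorkflowGraphics, AddLabel) are left out, and so is the initial latent source, since the page does not list one. The names UPSTREAM and execution_order are illustrative only.

```python
from graphlib import TopologicalSorter

# Maps each node type to the node types it is ASSUMED to read from.
# This is a typical wiring for these nodes, not the workflow's actual graph.
UPSTREAM = {
    "ModelSamplingSD3": {"UNETLoader"},
    "CLIPTextEncode": {"DualCLIPLoader", "easy positive"},
    "CFGGuider": {"ModelSamplingSD3", "CLIPTextEncode"},
    "BasicScheduler": {"ModelSamplingSD3"},
    "SamplerCustomAdvanced": {
        "RandomNoise",
        "CFGGuider",
        "KSamplerSelect",
        "BasicScheduler",
    },
    "VAEDecode": {"SamplerCustomAdvanced", "VAELoader"},
    "PreviewImage": {"VAEDecode"},
}


def execution_order(upstream):
    """Return one valid evaluation order in which every node follows its inputs."""
    return list(TopologicalSorter(upstream).static_order())


if __name__ == "__main__":
    for step, node in enumerate(execution_order(UPSTREAM), start=1):
        print(f"{step:2d}. {node}")
```

Under these assumptions the loaders and input nodes come first, the sampler runs once its noise, guider, sampler choice, and sigmas are available, and PreviewImage is always last.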

Capybara is a unified visual generation model that handles text‑to‑image, image editing, and video tasks. This workflow uses it for text‑to‑image, generating high‑quality still images from prompts.

What it is

A 14B diffusion‑transformer model (built on HunyuanVideo 1.5) that supports T2I, T2V, I2I, and V2V in one architecture, with custom ComfyUI nodes.
For text‑to‑image, you give a natural‑language prompt and it generates 720p‑class images with strong realism and style flexibility.

Key features (text to image)

Handles complex scenes (multiple characters, detailed environments) while keeping good global composition.
Supports instruction‑like prompts (“cinematic close‑up,” “anime style,” “studio product shot”) thanks to its unified semantic/vision transformer design.
Recommended settings are around 720p with roughly 50 steps for best quality; acceleration LoRAs can reduce the step count for faster renders.
Tight ComfyUI integration via official templates like “Capybara: Text to Image,” so you can drop it into existing node graphs easily.
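
As a rough illustration of the recommended settings, the sketch below holds them in a small Python structure and picks 720p-class dimensions for other aspect ratios. The names T2ISettings and size_for_aspect are hypothetical, and snapping dimensions to multiples of 16 is an assumption (a common constraint for latent diffusion models), not something the page states for Capybara.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class T2ISettings:
    width: int = 1280   # 720p is 1280x720
    height: int = 720
    steps: int = 50     # "~50 steps for best quality" per the text above


def size_for_aspect(aspect_w, aspect_h, target_pixels=1280 * 720, multiple=16):
    """Return (width, height) near target_pixels at the given aspect ratio.

    Both sides are rounded to the nearest `multiple` (assumed to be 16).
    """
    scale = (target_pixels / (aspect_w * aspect_h)) ** 0.5
    width = round(aspect_w * scale / multiple) * multiple
    height = round(aspect_h * scale / multiple) * multiple
    return width, height


if __name__ == "__main__":
    base = T2ISettings()
    w, h = size_for_aspect(1, 1)
    square = replace(base, width=w, height=h)
    # Hypothetical reduced step count for use with an acceleration LoRA;
    # the right value depends on the LoRA and is not given on this page.
    fast = replace(base, steps=8)
    print(base, square, fast, sep="\n")
```

A 16:9 request returns 1280x720 and a square request returns 960x960, so the pixel budget stays close to 720p regardless of shape.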

Best use cases

Cinematic keyframes and concept art from detailed text briefs (characters, lighting, camera language).
Stylized or realistic illustrations for thumbnails, posters, and social content when you don’t need separate models for video.
Unified pipelines where you might later extend a still image into motion (I2V/T2V) using the same Capybara model family.
