

AI IMAGE GENERATION & EDITING

Run LongCat on Floyo

Meituan's 6B parameter bilingual image model with industry-leading Chinese text rendering, photorealistic output, and instruction-based editing. Outperforms models 3-4x its size. Open source.

Run Meituan's LongCat Image through ComfyUI in your browser. No API key, no installs, no local GPU.

Parameters

6B

Text Rendering

Chinese + English (SOTA)

Modes

Generate + Edit

Benchmark

#2 open-source (T2I-CoreBench)

Try LongCat Now → Browse All Models

No installation. Runs in browser. Updated April 2026.

What do you get?

LongCat Image is Meituan's 6B parameter open-source bilingual (Chinese-English) foundation model for image generation and editing. It uses a hybrid MM-DiT and Single-DiT architecture with a Qwen2.5-VL-7B text/vision encoder, and it ranks #2 among all open-source models on T2I-CoreBench, surpassed only by the 32B FLUX2.dev. It delivers industry-leading Chinese text rendering with superior accuracy on common and rare characters, and photorealistic output that rivals 20B+ parameter competitors. Generation and editing share the same architecture, and a 10x faster Edit-Turbo distilled variant is included. Available as ComfyUI nodes on Floyo.

LONGCAT WORKFLOWS ON FLOYO

LongCat for Text to Image

LongCat-Image-Edit - Instruction Image Editing

What is LongCat?

LongCat Image is a 6B parameter open-source text-to-image and image editing model from Meituan, one of China's largest technology companies. Released December 5, 2025, with the technical report published December 8, 2025. It is designed as a bilingual (Chinese-English) foundation model that solves three problems most open-source models struggle with: accurate multilingual text rendering, photorealism at small parameter counts, and unified generation and editing in one architecture.

With only 6B parameters, LongCat outperforms models 2-4x its size. On T2I-CoreBench, it ranks #2 among all open-source models, surpassed only by the 32B-parameter FLUX2.dev. It beats Qwen-Image-20B and HunyuanImage-3.0 (80B parameters) on text rendering and photorealism benchmarks. This efficiency comes from the hybrid MM-DiT architecture and a training pipeline with three progressive stages plus RLHF alignment.

Chinese text rendering is LongCat's strongest differentiator. Most image models garble non-Latin scripts. LongCat renders common Chinese characters with high accuracy and achieves industry-leading coverage of the Chinese dictionary, including rare and complex characters. English text rendering is also strong. The Qwen2.5-VL-7B encoder provides deep understanding of both languages.

The editing variant (LongCat-Image-Edit) uses the same architecture for instruction-based editing. Describe what you want to change in natural language, and the model applies the edit while preserving composition and lighting. The Edit-Turbo variant distills this to 10x speed. Editing consistency across multiple rounds is a specific design goal.

On Floyo, LongCat runs through native ComfyUI nodes on H100 NVL GPUs. Two workflows cover text-to-image generation and instruction-based image editing. No model downloads, no local setup.

What can you create with LongCat?

LongCat covers text-to-image generation, instruction-based image editing, bilingual poster and banner design, product photography with embedded text, UI mockups, marketing assets with Chinese and English copy, and multi-round iterative editing with consistent lighting and textures. The model is designed for production use where text accuracy and photorealism both matter.

Capability | What It Does | Use Case
Chinese Text Rendering | Industry-leading accuracy for common and rare Chinese characters. Stable rendering of complex typography, signs, and calligraphy. | Chinese marketing, bilingual posters, signage, menus
Photorealistic Generation | Generates images with believable lighting, depth, and textures that rival 20B+ parameter models despite being only 6B parameters. | Product photography, editorial images, hero images
Instruction Editing | Describe edits in natural language. The model applies them while preserving composition, lighting, and texture consistency across rounds. | Client revisions, iterative design, post-production
Bilingual Prompting | Write prompts in Chinese, English, or mixed language. The Qwen2.5-VL-7B encoder understands both natively. | Multilingual teams, localized content, cross-market assets
Poster and Banner Design | Generate production-ready posters with embedded text. Wrap in-image copy in double quotes for best results. | Ad creatives, event banners, social graphics, e-commerce
Pipeline Integration | Chain with video models in ComfyUI. Generate with LongCat, animate with Wan 2.7, add voiceover with Fish Audio S2. Or use LongCat Edit to modify outputs from other image models. | Multi-model workflows, end-to-end production

What are LongCat's key features?

LongCat's feature set targets a specific gap: most open-source image models produce nice pictures but fail at text rendering, especially non-Latin scripts. LongCat was designed from the ground up for bilingual text accuracy alongside photorealism. The unified generation-and-editing architecture means you don't need separate models for creation and revision.

Industry-Leading Chinese Text Rendering

LongCat renders common Chinese characters with superior accuracy and stability compared to all other open-source models. Its dictionary coverage extends to rare and complex characters that most models cannot handle at all. This comes from the Qwen2.5-VL-7B encoder, which understands Chinese text at a semantic level, plus a training pipeline specifically designed to optimize text rendering quality.

6B Parameter Efficiency

At 6B parameters, LongCat is significantly smaller than competitors like Qwen-Image (20B) and HunyuanImage 3.0 (80B MoE). It still outperforms them on text rendering and photorealism benchmarks. This means lower VRAM, faster inference, and reduced deployment costs. On T2I-CoreBench, it ranks #2 among all open-source models, behind only the 32B FLUX2.dev.
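As a rough back-of-envelope sketch (our estimate, not an official figure): 6B parameters stored in bf16 take about 6 × 2 = 12 GB for the diffusion weights, plus roughly 14 GB for the 7B Qwen2.5-VL encoder, before activations or any offloading. A 20B model needs about 40 GB for its weights alone on the same basis, which is why the smaller model fits on far more common hardware.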

Unified Generation and Editing

The same hybrid MM-DiT architecture powers both text-to-image generation and instruction-based editing. The Qwen2.5-VL-7B encoder provides a unified conditional space that handles both tasks. Generate an image, then edit it with natural language instructions. Lighting, textures, and composition stay consistent across edit rounds.
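The FAQ below notes that Diffusers support was added, so the generate-then-edit round trip can also be sketched in plain Python. The snippet is a minimal, unofficial sketch: the repository ids and the generic pipeline classes are assumptions for illustration, so check the HuggingFace model cards for the exact entry points and recommended settings.

```python
import torch
from diffusers import DiffusionPipeline, AutoPipelineForImage2Image

# Text-to-image. Repo id assumed to mirror the GitHub name; verify on HuggingFace.
t2i = DiffusionPipeline.from_pretrained(
    "meituan-longcat/LongCat-Image", torch_dtype=torch.bfloat16
).to("cuda")

# In-image text goes in double quotes, per the prompting tip in the FAQ.
image = t2i(prompt='A storefront at dusk with a neon sign that reads "开放"').images[0]
image.save("storefront.png")

# Instruction-based edit. Pipeline class and repo id are assumptions for illustration.
edit = AutoPipelineForImage2Image.from_pretrained(
    "meituan-longcat/LongCat-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

revised = edit(
    prompt='Change the neon sign to read "营业中" and keep everything else identical',
    image=image,
).images[0]
revised.save("storefront_edit.png")
```

On Floyo the same loop runs through the two ComfyUI workflows, so no local setup like this is required.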

Edit-Turbo (10x Speed)

Released February 3, 2026, Edit-Turbo is the distilled version of LongCat-Image-Edit. It achieves a 10x speedup while maintaining the editing quality of the full model. For iterative workflows where you need fast turnaround on client revisions, this is the variant to use.

RLHF-Aligned Quality

Training uses curated reward models during the RL phase to align outputs with human aesthetic preferences. This is on top of the three-stage training pipeline (pre-training on diverse data, mid-training on higher-quality data, SFT on the highest-quality examples). The result is photorealistic output with strong instruction adherence.

Most Comprehensive Open-Source Ecosystem

Meituan releases not just model weights but the entire training pipeline: pre-training, mid-training, post-training checkpoints, and the full toolchain. This is the most complete open-source release in the image generation space. Researchers can reproduce results, modify training, and extend the model with full visibility into how it was built.

How does LongCat compare to other image models?

LongCat ranks #2 on T2I-CoreBench with 6B parameters, behind only the 32B FLUX2.dev. It leads all open-source models on Chinese text rendering. Z-Image Turbo leads on inference speed (8 steps). Qwen-Image leads on raw parameter scale (20B). FLUX Kontext leads on the editing ecosystem. LongCat's edge: best Chinese text rendering, strongest efficiency-to-quality ratio, and the most complete open-source release.

Model | Parameters | Chinese Text | Edit Mode | T2I-CoreBench
LongCat | 6B | SOTA (common + rare) | Yes (+ Edit-Turbo) | #2 open-source
FLUX2.dev | 32B | Moderate | Via Kontext | #1 open-source
Z-Image Turbo | 6B | Good (EN + CN) | No | High
Qwen-Image | 20B | Strong | Via Qwen-Edit | High
Source: LongCat-Image Technical Report (arXiv:2512.07584), T2I-CoreBench results (December 2025), Meituan GitHub, and third-party benchmark comparisons as of April 2026.

How does LongCat work?

LongCat uses a hybrid MM-DiT and Single-DiT architecture, similar to FLUX's design, paired with the Qwen2.5-VL-7B vision-language model as its text encoder. The VLM encoder provides a unified conditional space that handles both generation (from text) and editing (from text + image) in the same architecture. This dual-use design is what lets one model serve both tasks.

The training pipeline has four stages. Pre-training on a large, diverse dataset establishes the model's foundational understanding. Mid-training narrows to higher-quality data. Supervised fine-tuning (SFT) focuses on the highest-quality examples. Finally, RLHF alignment uses curated reward models to push outputs toward human aesthetic preferences, photorealism, and text accuracy.

Chinese text rendering quality comes from two sources. The Qwen2.5-VL-7B encoder understands Chinese characters at a semantic level, not as pixel patterns. And the training data includes heavily curated examples of Chinese typography with progressively tighter quality filters across the three training stages. The combination produces consistent character rendering that other models achieve only inconsistently.

On Floyo, LongCat runs through native ComfyUI nodes on H100 NVL GPUs. The text-to-image workflow generates images from prompts. The editing workflow takes a source image plus an instruction and applies the modification. Both share the same model weights. You can chain LongCat with other ComfyUI nodes in the same workflow for complete production pipelines.

Frequently Asked Questions

Common questions about running LongCat on Floyo.

Is LongCat free to use on Floyo?

You can start on Floyo's free plan. To keep using the service beyond the free tier, upgrade to a paid Floyo plan. LongCat is open-source, so there is no additional API cost beyond your Floyo plan.

How do I run LongCat without installing anything?

Open Floyo in your browser, search "LongCat" in the template library, and pick the text-to-image or image editing workflow. Click Run, write your prompt, and generate. Floyo handles the GPU, ComfyUI environment, and model weights. No local install, no Python setup.

Who made LongCat?

Meituan's LongCat Team. Meituan is one of China's largest technology companies. LongCat-Image weights were released December 5, 2025. The technical report was published December 8, 2025 on arXiv (2512.07584). Edit-Turbo (10x faster editing) was released February 3, 2026. Full Diffusers support was added December 16, 2025.

How does LongCat compare to Z-Image Turbo?

Both are 6B parameter models with strong efficiency. Z-Image Turbo leads on speed (8-step inference, sub-second on enterprise GPUs). LongCat leads on Chinese text rendering accuracy and has a dedicated editing variant. Z-Image Turbo is Apache 2.0 licensed. Both are available on Floyo and can be used in the same pipeline for different tasks.

Can LongCat render Chinese text in images?

Yes. This is LongCat's strongest feature. It achieves industry-leading accuracy and dictionary coverage for Chinese characters, including rare and complex ones. Wrap in-image text in double quotes for best results (e.g., a neon sign that reads "开放"). English text rendering is also strong.
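A couple of illustrative prompts (written for this page, not taken from the official docs) that follow the double-quote convention:

A minimalist poster on a white wall, the headline reads "夏日特惠" with the subtitle "Summer Sale" underneath (text-to-image workflow)

Replace the sign text with "欢迎光临" and keep the lighting and composition unchanged (image editing workflow)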

Can I combine LongCat with other AI models in one workflow?

Yes. Floyo runs ComfyUI, which lets you chain multiple models. Generate with LongCat, animate with Wan 2.7 or Kling Omni, add voiceover with Fish Audio S2 or Chatterbox. Or generate with another image model and refine with LongCat Edit. All in one pipeline.

Can I use LongCat output commercially?

LongCat weights and training code are open source. Check the specific license on the HuggingFace model card for commercial usage terms. The model was built on FLUX-style architecture and uses the Qwen2.5-VL encoder, so downstream license obligations may apply. Review the license before commercial deployment.

What is Edit-Turbo?

Edit-Turbo is the distilled version of LongCat-Image-Edit, released February 3, 2026. It achieves a 10x speedup while maintaining the editing quality of the full model. For iterative workflows where fast turnaround matters, Edit-Turbo is the recommended variant.

Try LongCat on Floyo

6B parameter bilingual image generation and editing with industry-leading Chinese text rendering and photorealistic output. Run it in your browser.

Try LongCat Now → Browse All Models

Related Reading

AI Ad Creatives for Social and Web

Character and Concept Design on Floyo

Top AI Models on Floyo

Last updated: April 2026. Specs from LongCat-Image Technical Report (arXiv:2512.07584), Meituan GitHub (meituan-longcat/LongCat-Image), T2I-CoreBench results, HuggingFace model cards, WaveSpeedAI documentation, and fal.ai model listings.

LongCat for Text to Image

LongCat

Text2Image

Create cool images using LongCat.

LongCat-Image-Edit - Instruction Image Editing

concept art

consistency

image to image

longcat-image-edit

portrait

style transfer

Upload one image, write an instruction, and LongCat-Image-Edit rewrites the parts you describe while keeping the rest identical. Bilingual prompts, 8 steps.
