Sulphur 2 - Text to Video
Generate video with synced audio from a text prompt using Sulphur 2, a stylized cinematic finetune built on Lightricks' LTX 2.3 video model.
animation
film production
image to video
lora
ltx 2
text to video
video generation
Nodes & Models
LTXVAudioVAELoader
ltx-2.3/ltx-2.3-22b-dev.safetensors
PrimitiveInt
LatentUpscaleModelLoader
ltx-2.3-spatial-upscaler-x2-1.0.safetensors
KSamplerSelect
PrimitiveBoolean
LoraLoaderModelOnly
ltx-2.3-22b-distilled-lora-384.safetensors
sulphur_lora_rank_768.safetensors
ltx-2.3-22b-distilled-lora-1.1_fro90_ceil72_condsafe.safetensors
VAELoader
taeltx2_3.safetensors
ManualSigmas
RandomNoise
CheckpointLoaderSimple
ltx-2.3/ltx-2.3-22b-dev.safetensors
LTXAVTextEncoderLoader
gemma_3_12B_it_fp4_mixed.safetensors
ltx-2.3/ltx-2.3-22b-dev.safetensors
PrimitiveStringMultiline
ComfyMathExpression
LTX2SamplingPreviewOverride
Reroute
ResizeImageMaskNode
CLIPTextEncode
LTXVEmptyLatentAudio
PathchSageAttentionKJ
EmptyLTXVLatentVideo
LTXVConditioning
LTXVPreprocess
LTXVImgToVideoInplace
LTXVConcatAVLatent
CFGGuider
LTXVScheduler
SamplerCustomAdvanced
LTXVSeparateAVLatent
LTXVCropGuides
LTXVLatentUpsampler
VAEDecodeTiled
LTXVAudioVAEDecode
VHS_VideoCombine
Sulphur 2 is a community finetune of LTX 2.3 that generates video and synced audio from a single text prompt.
Type a scene description, set the length, and hit run. Output is around 10 seconds of cinematic motion with a sharper, more expressive style than base LTX 2.3. Audio comes baked in from the same generation pass.
Sulphur 2 is open source and uncensored. It is built by SulphurAI, and the weights are hosted on Hugging Face.
How do you use Sulphur 2 for text-to-video?
Type your scene into the Prompt box, set resolution and length, then hit run. Sulphur 2 generates video and audio together in one pass. Defaults are tuned for short cinematic clips at 1366x768. The inputs worth touching first are the prompt, the T2V switch, and the length.
Prompt: Write the scene like a cinematographer's shot list: action, subject, camera move, environment, mood. Sulphur 2 inherits LTX 2.3's prompt adherence, so detail pays off. Keep it under 200 words and stay literal. Vague prompts produce vague motion.
Switch to Text to Video? On by default. Leave it on for pure text-to-video. Flip it off to feed in an input image and run image-to-video instead, where Sulphur 2 animates from your starting frame.
Length and Frame Rate: Default is 241 frames at 24 fps, around 10 seconds. Want shorter clips for fast iteration? Drop the length to 121 frames (about 5 seconds). Frame count must follow LTX's rule of (multiple of 8) + 1.
Width and Height: Default is 1366x768 widescreen. Need vertical for social? Try 768x1366. LTX 2.3 was trained on portrait data natively, so 9:16 holds up. Both dimensions must be divisible by 32. Note that the default width of 1366 is not itself a multiple of 32; the nearest valid widths are 1344 and 1376.
CFG: 3.6 by default in the second sampling pass. Want stronger prompt adherence and crisper subjects? Push to 5.0 to 6.0. Want softer, more painterly motion? Drop to 2.5 to 3.0. The catch: high CFG can flatten motion.
Negative Prompt: The default ("pc game, video game, cartoon, childish, ugly") steers output away from amateur or game-engine looks. Add anything else you want to avoid, but keep it short. Long negatives water down the positive prompt.
Seed: Set to fixed when comparing how prompt changes affect the same scene. Set to randomize when exploring variations. Try a few seeds on the same prompt before tweaking anything else.
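The frame-count and dimension rules above are easy to get wrong by hand, so here is a minimal Python sketch for checking values before you type them into the workflow. The helper names (`snap_frames`, `frames_for_seconds`, `snap_dim`) are hypothetical and are not part of ComfyUI or LTX; they only encode the two rules stated on this page, (multiple of 8) + 1 for frames and divisibility by 32 for width and height. How the workflow itself treats a non-conforming value such as 1366 is not documented here, so the snapping behavior is an assumption for illustration.

```python
FRAME_STEP = 8   # frame count must be (multiple of 8) + 1
DIM_STEP = 32    # width and height must be divisible by 32


def snap_frames(frames: int) -> int:
    """Round a requested frame count to the nearest value of the form 8n + 1."""
    n = max(0, (frames - 1 + FRAME_STEP // 2) // FRAME_STEP)
    return n * FRAME_STEP + 1


def frames_for_seconds(seconds: float, fps: int = 24) -> int:
    """Pick a valid frame count close to the requested clip duration."""
    return snap_frames(int(round(seconds * fps)))


def snap_dim(pixels: int) -> int:
    """Round a width or height to the nearest multiple of 32 (at least 32)."""
    return max(DIM_STEP, (pixels + DIM_STEP // 2) // DIM_STEP * DIM_STEP)


if __name__ == "__main__":
    print(frames_for_seconds(10))   # 241, the workflow default
    print(frames_for_seconds(5))    # 121, the suggested fast-iteration length
    print(1366 % DIM_STEP == 0)     # False: the default width is not a multiple of 32
    print(snap_dim(1366), snap_dim(768))  # 1376 768
```

Rounding to the nearest valid value (instead of always rounding down) keeps the clip length and aspect ratio as close as possible to what you asked for. If you would rather never exceed the requested size, swap the rounding for floor division.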
What is Sulphur 2 good for?
Stylized cinematic shorts where motion and mood matter more than precision. Music video sequences, atmospheric scenes, character-driven shots, and uncensored creative work the base LTX 2.3 won't touch. Sulphur 2 trades some of LTX 2.3's literal prompt following for stronger expression and bolder camera movement.
Community feedback on Sulphur 2 calls out more cinematic motion and stylized output compared to vanilla LTX 2.3. Useful when you want a clip that feels directed, not generated.
Reach for it when: you're making mood-driven content, you need uncensored output the base model won't generate, or LTX 2.3 is producing motion that feels too clean or too flat.
Reach for base LTX 2.3 instead when: you need maximum prompt fidelity for a product shot, the scene is documentary-realistic, or audio quality is the priority. Sulphur 2 keeps LTX 2.3's audio capability but doesn't improve on it.
FAQ
What is Sulphur 2? Sulphur 2 is an open-source finetune of Lightricks' LTX 2.3 video model, built by SulphurAI. It generates video and synced audio from text or image inputs, with a focus on stylized motion and uncensored output. Weights are at SulphurAI/Sulphur-2-base on Hugging Face.
How is Sulphur 2 different from LTX 2.3? Sulphur 2 is a finetune, not a separate architecture. It runs on the same LTX 2.3 base (22B parameters, DiT-based, audio-video in one pass) but produces more cinematic, expressive motion and is uncensored. LTX 2.3 alone is more literal and conservative.
What length and resolution can Sulphur 2 generate? Sulphur 2 inherits LTX 2.3's limits: clips up to around 20 seconds, native resolutions up to 4K, and 24 or 48 fps options. This workflow defaults to 1366x768 at 24 fps for 241 frames. Width and height must be divisible by 32, and the frame count must follow (multiple of 8) + 1.
Does Sulphur 2 generate audio with the video? Yes. LTX 2.3 generates synchronized video and audio in one pass, and Sulphur 2 keeps that capability. Ambience, action sounds, and rhythmic cues come out aligned with motion. Quality varies. Speech is harder than ambient sound.
How do you run Sulphur 2 online? You can run Sulphur 2 online through Floyo. No installation, no setup. Open the workflow in your browser, type your prompt, and hit run. Free to try.
Sulphur 2 by SulphurAI. Built on LTX 2.3 by Lightricks.