The release features a number of new image and video pipelines, a new caching method, a new training script, new kernels-powered attention backends, and more. It is packed with a lot of new stuff, so make sure you read the release notes fully 🚀
## New image pipelines
- Flux2: Flux2 is the latest-generation image generation and editing model from Black Forest Labs. It can take multiple input images as references, making it versatile for different use cases.
- Z-Image: Z-Image is a best-in-class image generation model in the 6B-parameter regime. Thanks to @JerryWu-code in #12703.
- QwenImage Edit Plus: It’s an upgrade of QwenImage Edit and is capable of taking multiple input images as references. It can act as both a generation and an editing model. Thanks to @naykun for contributing in #12357.
- Bria FIBO: FIBO is trained on structured JSON captions up to 1,000+ words and designed to understand and control different visual parameters such as lighting, composition, color, and camera settings, enabling precise and reproducible outputs. Thanks to @galbria for contributing this in #12545.
- Kandinsky Image Lite: Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters). Thanks to @leffff for contributing this in #12664.
- ChronoEdit: ChronoEdit reframes image editing as a video generation task, using input and edited images as start/end frames to leverage pretrained video models with temporal consistency. A temporal reasoning stage introduces reasoning tokens to ensure physically plausible edits and visualize the editing trajectory. Thanks to @zhangjiewu for contributing this in #12593.
## New video pipelines
- Sana-Video: Sana-Video is a fast and efficient video generation model, equipped to handle long video sequences, thanks to its incorporation of linear attention. Thanks to @lawrence-cj for contributing this in #12634.
- Kandinsky 5: Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger models and offers the best understanding of Russian concepts in the open-source ecosystem. Thanks to @leffff for contributing this in #12478.
- Hunyuan 1.5: HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs.
- Wan Animate: Wan-Animate is a state-of-the-art character animation and replacement video model based on Wan2.1. Given a reference character image and a driving motion video, it can either animate the character with motion from the driving video or replace the character in the video with the reference character.
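Sana-Video's efficiency, mentioned above, comes from linear attention: instead of materializing the full `QKᵀ` attention matrix, it computes `φ(K)ᵀV` once and reuses it for every query, making cost linear in sequence length. A minimal NumPy sketch of the mechanism (an illustrative toy using the common `elu(x) + 1` feature map, not Sana-Video's actual implementation):

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Toy linear attention: O(n * d^2) instead of O(n^2 * d).

    q, k: (n, d) queries/keys; v: (n, d_v) values.
    Uses the feature map phi(x) = elu(x) + 1 to keep scores positive.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                  # (d, d_v): summarize keys/values once
    z = q @ k.sum(axis=0)         # (n,): per-query normalizer
    return (q @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, d))
out = linear_attention(q, k, v)

# Sanity check against the equivalent quadratic form:
# row-normalize phi(q) @ phi(k).T, then multiply by v.
phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
scores = phi(q) @ phi(k).T
ref = (scores / scores.sum(axis=1, keepdims=True)) @ v
assert np.allclose(out, ref, atol=1e-5)
```

The associativity trick (`(φ(Q)φ(K)ᵀ)V = φ(Q)(φ(K)ᵀV)`) is what lets long video sequences stay tractable.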
## New kernels-powered attention backends
The `kernels` library helps you save a lot of time by providing pre-built kernel interfaces for various environments and accelerators. This release features three new kernels-powered attention backends:

- Flash Attention 3 (+ its `varlen` variant)
- Flash Attention 2 (+ its `varlen` variant)
- SAGE
This means that if any of the above backends is supported by your development environment, you can skip the manual process of building the corresponding kernels and just use:
```python
# Make sure you have `kernels` installed: `pip install kernels`.
# You can choose "flash_hub" or "sage_hub", too.
pipe.transformer.set_attention_backend("_flash_3_hub")
```

For more details, check out the documentation.
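Conceptually, backend selection like this is a dispatch from a string key to an attention implementation. A toy registry sketch of the pattern (names and structure here are hypothetical illustrations, not Diffusers internals):

```python
from typing import Callable, Dict

# Hypothetical registry mapping backend names to attention callables.
_ATTENTION_BACKENDS: Dict[str, Callable] = {}

def register_backend(name: str):
    """Decorator that records an implementation under a string key."""
    def wrap(fn: Callable) -> Callable:
        _ATTENTION_BACKENDS[name] = fn
        return fn
    return wrap

@register_backend("native")
def native_attention(q, k, v):
    return "native", (q, k, v)

@register_backend("_flash_3_hub")
def flash3_hub_attention(q, k, v):
    # In the real library, this would invoke a pre-built kernel
    # fetched via the `kernels` library instead of this stub.
    return "flash3", (q, k, v)

def dispatch(backend: str, q, k, v):
    try:
        fn = _ATTENTION_BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"Unknown backend {backend!r}; available: {sorted(_ATTENTION_BACKENDS)}"
        )
    return fn(q, k, v)

name, _ = dispatch("_flash_3_hub", 1, 2, 3)
```

The payoff of the registry shape is that adding a backend is one registration, and an unsupported key fails loudly with the list of valid options.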
## TaylorSeer cache
TaylorSeer is now supported in Diffusers, delivering up to 3x speedups with little to no quality compromise. Thanks to @toilaluan for contributing this in #12648.
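The core idea behind TaylorSeer-style caching is to skip the expensive model call at some denoising steps and instead extrapolate the model's output from recently cached steps using a finite-difference Taylor expansion. A first-order NumPy sketch of the idea (illustrative only, not Diffusers' implementation):

```python
import numpy as np

def taylor_predict(f_prev, f_curr, t_prev, t_curr, t_next):
    """First-order Taylor extrapolation of a cached feature trajectory.

    Approximates f(t_next) ≈ f(t_curr) + f'(t_curr) * (t_next - t_curr),
    with the derivative estimated by a finite difference over cached steps.
    """
    deriv = (f_curr - f_prev) / (t_curr - t_prev)
    return f_curr + deriv * (t_next - t_curr)

# For a feature that evolves linearly in t, first-order prediction is exact.
t = np.array([0.0, 1.0, 2.0])
f = lambda t: 3.0 * t + 1.0            # stand-in for an expensive model output
pred = taylor_predict(f(t[0]), f(t[1]), t[0], t[1], t[2])
assert np.isclose(pred, f(t[2]))       # skipped the "model call" at t=2
```

Because diffusion features tend to change smoothly across adjacent timesteps, such extrapolation stays close to the true output, which is where the speedup with negligible quality loss comes from.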
## New training script
Our Flux.2 integration features a LoRA fine-tuning script that you can check out here. We provide a number of optimizations to help make it run on consumer GPUs.
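LoRA fine-tuning, as used in this script, freezes the base weight `W` and learns only a low-rank update `ΔW = (alpha / r) * B @ A`, with rank `r` far smaller than the weight dimensions, which is what makes training feasible on consumer GPUs. A minimal NumPy sketch of the parameterization (illustrative, not the script's actual code):

```python
import numpy as np

d_out, d_in, r, alpha = 64, 64, 4, 8

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, init to zero

def lora_forward(x):
    # Base path plus low-rank update, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero, LoRA starts as an exact no-op on the base model.
assert np.allclose(lora_forward(x), W @ x)

full = d_out * d_in          # parameters in the full weight
lora = r * (d_in + d_out)    # trainable parameters under LoRA
```

Here trainable parameters drop from `d_out * d_in` to `r * (d_in + d_out)`, an 8x reduction even in this tiny example; for real transformer weights the savings are far larger.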
## Misc
- Reusing `AttentionMixin`: Making certain compatible models subclass `AttentionMixin` helped us get rid of 2K LoC. Going forward, users can expect more such refactorings that will help make the library leaner and simpler. Check out #12463.
- Diffusers backend in SGLang: sgl-project/sglang#14112
- We started the Diffusers MVP program to work with talented community members who will help us improve the library across multiple fronts. Check out the link for more information.
## All commits
- remove unneeded checkpoint imports. by @sayakpaul in #12488
- [tests] fix clapconfig for text backbone in audioldm2 by @sayakpaul in #12490
- ltx0.9.8 (without IC lora, autoregressive sampling) by @yiyixuxu in #12493
- [docs] Attention checks by @stevhliu in #12486
- [CI] Check links by @stevhliu in #12491
- [ci] xfail more incorrect transformer imports. by @sayakpaul in #12455
- [tests] introduce `VAETesterMixin` to consolidate tests for slicing and tiling by @sayakpaul in #12374
- docs: cleanup of runway model by @EazyAl in #12503
- Kandinsky 5 is finally in Diffusers! by @leffff in #12478
- Remove Qwen Image Redundant RoPE Cache by @dg845 in #12452
- Raise warning instead of error when imports are missing for custom code by @DN6 in #12513
- Fix: Use incorrect temporary variable key when replacing adapter name… by @FeiXie8 in #12502
- [docs] Organize toctree by modality by @stevhliu in #12514
- styling issues. by @sayakpaul in #12522
- Add Photon model and pipeline support by @DavidBert in #12456
- purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet by @Vaibhavs10 in #12497
- Prx by @DavidBert in #12525
- [core] `AutoencoderMixin` to abstract common methods by @sayakpaul in #12473
- Kandinsky5 No cfg fix by @asomoza in #12527
- Fix: Add _skip_keys for AutoencoderKLWan by @yiyixuxu in #12523
- [CI] xfail the test_wuerstchen_prior test by @sayakpaul in #12530
- [tests] Test attention backends by @sayakpaul in #12388
- fix CI bug for kandinsky3_img2img case by @kaixuanliu in #12474
- Fix MPS compatibility in get_1d_sincos_pos_embed_from_grid #12432 by @Aishwarya0811 in #12449
- Handle deprecated transformer classes by @DN6 in #12517
- fix constants.py to use `upper()` by @sayakpaul in #12479
- HunyuanImage21 by @yiyixuxu in #12333
- Loose the criteria tolerance appropriately for Intel XPU devices by @kaixuanliu in #12460
- Deprecate Stable Cascade by @DN6 in #12537
- [chore] Move guiders experimental warning by @sayakpaul in #12543
- Fix Chroma attention padding order and update docs to use `lodestones/Chroma1-HD` by @josephrocca in #12508
- Add AITER attention backend by @lauri9 in #12549
- Fix small inconsistency in output dimension of "_get_t5_prompt_embeds" function in sd3 pipeline by @alirezafarashah in #12531
- Kandinsky 5 10 sec (NABLA suport) by @leffff in #12520
- Improve pos embed for Flux.1 inference on Ascend NPU by @gameofdimension in #12534
- support latest few-step wan LoRA. by @sayakpaul in #12541
- [Pipelines] Enable Wan VACE to run since single transformer by @DN6 in #12428
- fix crash if tiling mode is enabled by @sywangyi in #12521
- Fix typos in kandinsky5 docs by @Meatfucker in #12552
- [ci] don't run sana layerwise casting tests in CI. by @sayakpaul in #12551
- Bria fibo by @galbria in #12545
- Avoiding graph break by changing the way we infer dtype in vae.decoder by @ppadjinTT in #12512
- [Modular] Fix for custom block kwargs by @DN6 in #12561
- [Modular] Allow custom blocks to be saved to `local_dir` by @DN6 in #12381
- Fix Stable Diffusion 3.x pooled prompt embedding with multiple images by @friedrich in #12306
- Fix custom code loading in Automodel by @DN6 in #12571
- [modular] better warn message by @yiyixuxu in #12573
- [tests] add tests for flux modular (t2i, i2i, kontext) by @sayakpaul in #12566
- [modular]pass hub_kwargs to load_config by @yiyixuxu in #12577
- ulysses enabling in native attention path by @sywangyi in #12563
- Kandinsky 5.0 Docs fixes by @leffff in #12582
- [docs] sort doc by @sayakpaul in #12586
- [LoRA] add support for more Qwen LoRAs by @linoytsaban in #12581
- [Modular] Allow ModularPipeline to load from revisions by @DN6 in #12592
- Add optional precision-preserving preprocessing for examples/unconditional_image_generation/train_unconditional.py by @turian in #12596
- [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference by @lawrence-cj in #12584
- Fix overflow and dtype handling in rgblike_to_depthmap (NumPy + PyTorch) by @MohammadSadeghSalehi in #12546
- [Modular] Some clean up for Modular tests by @DN6 in #12579
- feat: enable attention dispatch for huanyuan video by @DefTruth in #12591
- fix the crash in Wan-AI/Wan2.2-TI2V-5B-Diffusers if CP is enabled by @sywangyi in #12562
- [CI] Push test fix by @DN6 in #12617
- add ChronoEdit by @zhangjiewu in #12593
- [modular] wan! by @yiyixuxu in #12611
- [CI] Fix typo in uv install by @DN6 in #12618
- fix: correct import path for load_model_dict_into_meta in conversion scripts by @yashwantbezawada in #12616
- Fix Context Parallel validation checks by @DN6 in #12446
- [Modular] Clean up docs by @DN6 in #12604
- Fix: update type hints for Tuple parameters across multiple files to support variable-length tuples by @cesaryuan in #12544
- [CI] Remove unittest dependency from `testing_utils.py` by @DN6 in #12621
- Fix rotary positional embedding dimension mismatch in Wan and SkyReels V2 transformers by @charchit7 in #12594
- fix copies by @yiyixuxu in #12637
- Add MLU Support. by @a120092009 in #12629
- fix dispatch_attention_fn check by @yiyixuxu in #12636
- [modular] add tests for qwen modular by @sayakpaul in #12585
- ArXiv -> HF Papers by @qgallouedec in #12583
- [docs] Update install instructions by @stevhliu in #12626
- [modular] add a check by @yiyixuxu in #12628
- Improve docstrings and type hints in scheduling_amused.py by @delmalih in #12623
- [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) by @dg845 in #12526
- adjust unit tests for `test_save_load_float16` by @kaixuanliu in #12500
- skip autoencoderdl layerwise casting memory by @sayakpaul in #12647
- [utils] Update check_doc_toc by @stevhliu in #12642
- [docs] AutoModel by @stevhliu in #12644
- Improve docstrings and type hints in scheduling_ddim.py by @delmalih in #12622
- Improve docstrings and type hints in scheduling_ddpm.py by @delmalih in #12651
- [Modular] Add Custom Blocks guide to doc by @DN6 in #12339
- Improve docstrings and type hints in scheduling_euler_discrete.py by @delmalih in #12654
- Update Wan Animate Docs by @dg845 in #12658
- Rope in float32 for mps or npu compatibility by @DavidBert in #12665
- [PRX pipeline]: add 1024 resolution ratio bins by @DavidBert in #12670
- SANA-Video Image to Video pipeline `SanaImageToVideoPipeline` support by @lawrence-cj in #12634
- [CI] Make CI logs less verbose by @DN6 in #12674
- Revert `AutoencoderKLWan`'s `dim_mult` default value back to list by @dg845 in #12640
- [CI] Temporarily pin transformers by @DN6 in #12677
- [core] Refactor hub attn kernels by @sayakpaul in #12475
- [CI] Fix indentation issue in workflow files by @DN6 in #12685
- [CI] Fix failing Pipeline CPU tests by @DN6 in #12681
- Improve docstrings and type hints in scheduling_pndm.py by @delmalih in #12676
- Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet by @pratim4dasude in #12649
- Improve docstrings and type hints in scheduling_lms_discrete.py by @delmalih in #12678
- Add FluxLoraLoaderMixin to Fibo pipeline by @SwayStar123 in #12688
- bugfix: fix chrono-edit context parallel by @DefTruth in #12660
- [core] support sage attention + FA2 through `kernels` by @sayakpaul in #12439
- [i8n-pt] Fix grammar and expand Portuguese documentation by @cdutr in #12598
- Fix variable naming typos in community FluxControlNetFillInpaintPipeline by @sqhuang in #12701
- fix typo in docs by @lawrence-cj in #12675
- Add Support for Z-Image Series by @JerryWu-code in #12703
- let's go Flux2 🚀 by @sayakpaul in #12711
- Update script names in README for Flux2 training by @anvilarth in #12713
- [lora]: Fix Flux2 LoRA NaN test by @sayakpaul in #12714
- [docs] Correct flux2 links by @sayakpaul in #12716
- [docs] put autopipeline after overview and hunyuanimage in images by @sayakpaul in #12548
- Improve docstrings and type hints in scheduling_dpmsolver_multistep.py by @delmalih in #12710
- Support unittest for Z-image ⚡️ by @JerryWu-code in #12715
- [chore] remove torch.save from remnant code. by @sayakpaul in #12717
- Enable regional compilation on z-image transformer model by @sayakpaul in #12736
- Fix examples not loading LoRA adapter weights from checkpoint by @SurAyush in #12690
- [Modular] Add single file support to Modular by @DN6 in #12383
- fix type-check for z-image transformer by @DefTruth in #12739
- Hunyuanvideo15 by @yiyixuxu in #12696
- [Docs] Update Imagen Video paper link in schedulers by @delmalih in #12724
- Improve docstrings and type hints in scheduling_heun_discrete.py by @delmalih in #12726
- Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py by @delmalih in #12766
- fix FLUX.2 context parallel by @DefTruth in #12737
- Rename BriaPipeline to BriaFiboPipeline in documentation by @galbria in #12758
- Update bria_fibo.md with minor fixes by @sayakpaul in #12731
- [feat]: implement "local" caption upsampling for Flux.2 by @sayakpaul in #12718
- Add ZImage LoRA support and integrate into ZImagePipeline by @CalamitousFelicitousness in #12750
- Add support for Ovis-Image by @DoctorKey in #12740
- Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim. by @JerryWu-code in #12770
- Fixes #12673. `record_stream` in group offloading is not working properly by @KimbingNg in #12721
- [core] start varlen variants for attn backend kernels. by @sayakpaul in #12765
- [core] reuse `AttentionMixin` for compatible classes by @sayakpaul in #12463
- Deprecate `upcast_vae` in SDXL based pipelines by @DN6 in #12619
- Kandinsky 5.0 Video Pro and Image Lite by @leffff in #12664
- Fix: leaf_level offloading breaks after delete_adapters by @adi776borate in #12639
- [tests] fix hunuyanvideo 1.5 offloading tests. by @sayakpaul in #12782
- [Z-Image] various small changes, Z-Image transformer tests, etc. by @sayakpaul in #12741
- Z-Image-Turbo `from_single_file` by @hlky in #12756
- Update attention_backends.md to format kernels by @sayakpaul in #12757
- Improve docstrings and type hints in scheduling_unipc_multistep.py by @delmalih in #12767
- fix spatial compression ratio error for AutoEncoderKLWan doing tiled encode by @jerry2102 in #12753
- [lora] support more ZImage LoRAs by @sayakpaul in #12790
- PRX Set downscale_freq_shift to 0 for consistency with internal implementation by @DavidBert in #12791
- Fix broken group offloading with block_level for models with standalone layers by @rycerzes in #12692
- [Docs] Add Z-Image docs by @asomoza in #12775
- move kandisnky docs. by @sayakpaul (direct commit on v0.36.0-release)
- [docs] minor fixes to kandinsky docs by @sayakpaul in #12797
- Improve docstrings and type hints in scheduling_deis_multistep.py by @delmalih in #12796
- [Feat] TaylorSeer Cache by @toilaluan in #12648
- Update the TensorRT-ModelOPT to Nvidia-ModelOPT by @jingyu-ml in #12793
- add post init for safty checker by @jiqing-feng in #12794
- [HunyuanVideo1.5] support step-distilled by @yiyixuxu in #12802
- Add ZImageImg2ImgPipeline by @CalamitousFelicitousness in #12751
- Release: v0.36.0-release by @sayakpaul (direct commit on v0.36.0-release)
## Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @yiyixuxu
- ltx0.9.8 (without IC lora, autoregressive sampling) (#12493)
- Fix: Add _skip_keys for AutoencoderKLWan (#12523)
- HunyuanImage21 (#12333)
- [modular] better warn message (#12573)
- [modular]pass hub_kwargs to load_config (#12577)
- [modular] wan! (#12611)
- fix copies (#12637)
- fix dispatch_attention_fn check (#12636)
- [modular] add a check (#12628)
- Hunyuanvideo15 (#12696)
- [HunyuanVideo1.5] support step-distilled (#12802)
- @leffff
- @dg845
- @DN6
- Raise warning instead of error when imports are missing for custom code (#12513)
- Handle deprecated transformer classes (#12517)
- Deprecate Stable Cascade (#12537)
- [Pipelines] Enable Wan VACE to run since single transformer (#12428)
- [Modular] Fix for custom block kwargs (#12561)
- [Modular] Allow custom blocks to be saved to `local_dir` (#12381)
- Fix custom code loading in Automodel (#12571)
- [Modular] Allow ModularPipeline to load from revisions (#12592)
- [Modular] Some clean up for Modular tests (#12579)
- [CI] Push test fix (#12617)
- [CI] Fix typo in uv install (#12618)
- Fix Context Parallel validation checks (#12446)
- [Modular] Clean up docs (#12604)
- [CI] Remove unittest dependency from `testing_utils.py` (#12621)
- [Modular] Add Custom Blocks guide to doc (#12339)
- [CI] Make CI logs less verbose (#12674)
- [CI] Temporarily pin transformers (#12677)
- [CI] Fix indentation issue in workflow files (#12685)
- [CI] Fix failing Pipeline CPU tests (#12681)
- [Modular] Add single file support to Modular (#12383)
- Deprecate `upcast_vae` in SDXL based pipelines (#12619)
- @DavidBert
- @galbria
- @lawrence-cj
- @zhangjiewu
- add ChronoEdit (#12593)
- @delmalih
- Improve docstrings and type hints in scheduling_amused.py (#12623)
- Improve docstrings and type hints in scheduling_ddim.py (#12622)
- Improve docstrings and type hints in scheduling_ddpm.py (#12651)
- Improve docstrings and type hints in scheduling_euler_discrete.py (#12654)
- Improve docstrings and type hints in scheduling_pndm.py (#12676)
- Improve docstrings and type hints in scheduling_lms_discrete.py (#12678)
- Improve docstrings and type hints in scheduling_dpmsolver_multistep.py (#12710)
- [Docs] Update Imagen Video paper link in schedulers (#12724)
- Improve docstrings and type hints in scheduling_heun_discrete.py (#12726)
- Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py (#12766)
- Improve docstrings and type hints in scheduling_unipc_multistep.py (#12767)
- Improve docstrings and type hints in scheduling_deis_multistep.py (#12796)
- @pratim4dasude
- Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet (#12649)
- @JerryWu-code
- @CalamitousFelicitousness
- @DoctorKey
- Add support for Ovis-Image (#12740)