The release features a number of new image and video pipelines, a new caching method, a new training script, new kernels-powered attention backends, and more. It is packed with a lot of new stuff, so make sure you read the release notes fully 🚀
## New image pipelines
- Flux2: Flux2 is the latest-generation image generation and editing model from Black Forest Labs. It can take multiple input images as references, making it versatile for different use cases.
- Z-Image: Z-Image is a best-in-class image generation model in the 6B-parameter regime. Thanks to @JerryWu-code in #12703.
- QwenImage Edit Plus: It’s an upgrade of QwenImage Edit and is capable of taking multiple input images as references. It can act as both a generation and an editing model. Thanks to @naykun for contributing in #12357.
- Bria FIBO: FIBO is trained on structured JSON captions up to 1,000+ words and designed to understand and control different visual parameters such as lighting, composition, color, and camera settings, enabling precise and reproducible outputs. Thanks to @galbria for contributing this in #12545.
- Kandinsky Image Lite: Kandinsky 5.0 Image Lite is a lightweight image generation model (6B parameters). Thanks to @leffff for contributing this in #12664.
- ChronoEdit: ChronoEdit reframes image editing as a video generation task, using input and edited images as start/end frames to leverage pretrained video models with temporal consistency. A temporal reasoning stage introduces reasoning tokens to ensure physically plausible edits and visualize the editing trajectory. Thanks to @zhangjiewu for contributing this in #12593.
## New video pipelines
- Sana-Video: Sana-Video is a fast and efficient video generation model, equipped to handle long video sequences, thanks to its incorporation of linear attention. Thanks to @lawrence-cj for contributing this in #12634.
- Kandinsky 5: Kandinsky 5.0 T2V Lite is a lightweight video generation model (2B parameters) that ranks #1 among open-source models in its class. It outperforms larger models and offers the best understanding of Russian concepts in the open-source ecosystem. Thanks to @leffff for contributing this in #12478.
- Hunyuan 1.5: HunyuanVideo-1.5 is a lightweight yet powerful video generation model that achieves state-of-the-art visual quality and motion coherence with only 8.3 billion parameters, enabling efficient inference on consumer-grade GPUs.
- Wan Animate: Wan-Animate is a state-of-the-art character animation and replacement video model based on Wan2.1. Given a reference character image and a driving motion video, it can either animate the character with motion from the driving video or replace the character in the video with the reference character.
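Sana-Video's efficiency, mentioned above, comes from linear attention: instead of materializing the full `QKᵀ` attention matrix, it computes `φ(K)ᵀV` once and reuses it for every query, making cost linear in sequence length. A minimal NumPy sketch of the mechanism (an illustrative toy using the common `elu(x) + 1` feature map, not Sana-Video's actual implementation):

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Toy linear attention: O(n * d^2) instead of O(n^2 * d).

    q, k: (n, d) queries/keys; v: (n, d_v) values.
    Uses the feature map phi(x) = elu(x) + 1 to keep scores positive.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                  # (d, d_v): summarize keys/values once
    z = q @ k.sum(axis=0)         # (n,): per-query normalizer
    return (q @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, d))
out = linear_attention(q, k, v)

# Sanity check against the equivalent quadratic form:
# row-normalize phi(q) @ phi(k).T, then multiply by v.
phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
scores = phi(q) @ phi(k).T
ref = (scores / scores.sum(axis=1, keepdims=True)) @ v
assert np.allclose(out, ref, atol=1e-5)
```

The associativity trick (`(φ(Q)φ(K)ᵀ)V = φ(Q)(φ(K)ᵀV)`) is what lets long video sequences stay tractable.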
## New kernels-powered attention backends
The `kernels` library helps you save a lot of time by providing pre-built kernel interfaces for various environments and accelerators. This release features three new kernels-powered attention backends:

- Flash Attention 3 (+ its `varlen` variant)
- Flash Attention 2 (+ its `varlen` variant)
- SAGE
This means that if any of the above backends is supported by your development environment, you can skip the manual process of building the corresponding kernels and just use:
```python
# Make sure you have `kernels` installed: `pip install kernels`.
# You can choose "flash_hub" or "sage_hub", too.
pipe.transformer.set_attention_backend("_flash_3_hub")
```

For more details, check out the documentation.
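Conceptually, backend selection like this is a dispatch from a string key to an attention implementation. A toy registry sketch of the pattern (names and structure here are hypothetical illustrations, not Diffusers internals):

```python
from typing import Callable, Dict

# Hypothetical registry mapping backend names to attention callables.
_ATTENTION_BACKENDS: Dict[str, Callable] = {}

def register_backend(name: str):
    """Decorator that records an implementation under a string key."""
    def wrap(fn: Callable) -> Callable:
        _ATTENTION_BACKENDS[name] = fn
        return fn
    return wrap

@register_backend("native")
def native_attention(q, k, v):
    return "native", (q, k, v)

@register_backend("_flash_3_hub")
def flash3_hub_attention(q, k, v):
    # In the real library, this would invoke a pre-built kernel
    # fetched via the `kernels` library instead of this stub.
    return "flash3", (q, k, v)

def dispatch(backend: str, q, k, v):
    try:
        fn = _ATTENTION_BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"Unknown backend {backend!r}; available: {sorted(_ATTENTION_BACKENDS)}"
        )
    return fn(q, k, v)

name, _ = dispatch("_flash_3_hub", 1, 2, 3)
```

The payoff of the registry shape is that adding a backend is one registration, and an unsupported key fails loudly with the list of valid options.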
## TaylorSeer cache
TaylorSeer is now supported in Diffusers, delivering up to 3x speedups with little to no quality compromise. Thanks to @toilaluan for contributing this in #12648.
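The core idea behind TaylorSeer-style caching is to skip the expensive model call at some denoising steps and instead extrapolate the model's output from recently cached steps using a finite-difference Taylor expansion. A first-order NumPy sketch of the idea (illustrative only, not Diffusers' implementation):

```python
import numpy as np

def taylor_predict(f_prev, f_curr, t_prev, t_curr, t_next):
    """First-order Taylor extrapolation of a cached feature trajectory.

    Approximates f(t_next) ≈ f(t_curr) + f'(t_curr) * (t_next - t_curr),
    with the derivative estimated by a finite difference over cached steps.
    """
    deriv = (f_curr - f_prev) / (t_curr - t_prev)
    return f_curr + deriv * (t_next - t_curr)

# For a feature that evolves linearly in t, first-order prediction is exact.
t = np.array([0.0, 1.0, 2.0])
f = lambda t: 3.0 * t + 1.0            # stand-in for an expensive model output
pred = taylor_predict(f(t[0]), f(t[1]), t[0], t[1], t[2])
assert np.isclose(pred, f(t[2]))       # skipped the "model call" at t=2
```

Because diffusion features tend to change smoothly across adjacent timesteps, such extrapolation stays close to the true output, which is where the speedup with negligible quality loss comes from.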
## New training script
Our Flux.2 integration features a LoRA fine-tuning script that you can check out here. We provide a number of optimizations to help make it run on consumer GPUs.
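LoRA fine-tuning, as used in this script, freezes the base weight `W` and learns only a low-rank update `ΔW = (alpha / r) * B @ A`, with rank `r` far smaller than the weight dimensions, which is what makes training feasible on consumer GPUs. A minimal NumPy sketch of the parameterization (illustrative, not the script's actual code):

```python
import numpy as np

d_out, d_in, r, alpha = 64, 64, 4, 8

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, init to zero

def lora_forward(x):
    # Base path plus low-rank update, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero, LoRA starts as an exact no-op on the base model.
assert np.allclose(lora_forward(x), W @ x)

full = d_out * d_in          # parameters in the full weight
lora = r * (d_in + d_out)    # trainable parameters under LoRA
```

Here trainable parameters drop from `d_out * d_in` to `r * (d_in + d_out)`, an 8x reduction even in this tiny example; for real transformer weights the savings are far larger.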
## Misc
- Reusing `AttentionMixin`: Making certain compatible models subclass `AttentionMixin` helped us get rid of 2K LoC. Going forward, users can expect more such refactorings that will help make the library leaner and simpler. Check out #12463.
- Diffusers backend in SGLang: sgl-project/sglang#14112
- We started the Diffusers MVP program to work with talented community members who will help us improve the library across multiple fronts. Check out the link for more information.
## All commits
- remove unneeded checkpoint imports. by @sayakpaul in #12488
- [tests] fix clapconfig for text backbone in audioldm2 by @sayakpaul in #12490
- ltx0.9.8 (without IC lora, autoregressive sampling) by @yiyixuxu in #12493
- [docs] Attention checks by @stevhliu in #12486
- [CI] Check links by @stevhliu in #12491
- [ci] xfail more incorrect transformer imports. by @sayakpaul in #12455
- [tests] introduce `VAETesterMixin` to consolidate tests for slicing and tiling by @sayakpaul in #12374
- docs: cleanup of runway model by @EazyAl in #12503
- Kandinsky 5 is finally in Diffusers! by @leffff in #12478
- Remove Qwen Image Redundant RoPE Cache by @dg845 in #12452
- Raise warning instead of error when imports are missing for custom code by @DN6 in #12513
- Fix: Use incorrect temporary variable key when replacing adapter name… by @FeiXie8 in #12502
- [docs] Organize toctree by modality by @stevhliu in #12514
- styling issues. by @sayakpaul in #12522
- Add Photon model and pipeline support by @DavidBert in #12456
- purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet by @Vaibhavs10 in #12497
- Prx by @DavidBert in #12525
- [core] `AutoencoderMixin` to abstract common methods by @sayakpaul in #12473
- Kandinsky5 No cfg fix by @asomoza in #12527
- Fix: Add _skip_keys for AutoencoderKLWan by @yiyixuxu in #12523
- [CI] xfail the test_wuerstchen_prior test by @sayakpaul in #12530
- [tests] Test attention backends by @sayakpaul in #12388
- fix CI bug for kandinsky3_img2img case by @kaixuanliu in #12474
- Fix MPS compatibility in get_1d_sincos_pos_embed_from_grid #12432 by @Aishwarya0811 in #12449
- Handle deprecated transformer classes by @DN6 in #12517
- fix constants.py to use `upper()` by @sayakpaul in #12479
- HunyuanImage21 by @yiyixuxu in #12333
- Loose the criteria tolerance appropriately for Intel XPU devices by @kaixuanliu in #12460
- Deprecate Stable Cascade by @DN6 in #12537
- [chore] Move guiders experimental warning by @sayakpaul in #12543
- Fix Chroma attention padding order and update docs to use `lodestones/Chroma1-HD` by @josephrocca in #12508
- Add AITER attention backend by @lauri9 in #12549
- Fix small inconsistency in output dimension of "_get_t5_prompt_embeds" function in sd3 pipeline by @alirezafarashah in #12531
- Kandinsky 5 10 sec (NABLA suport) by @leffff in #12520
- Improve pos embed for Flux.1 inference on Ascend NPU by @gameofdimension in #12534
- support latest few-step wan LoRA. by @sayakpaul in #12541
- [Pipelines] Enable Wan VACE to run since single transformer by @DN6 in #12428
- fix crash if tiling mode is enabled by @sywangyi in #12521
- Fix typos in kandinsky5 docs by @Meatfucker in #12552
- [ci] don't run sana layerwise casting tests in CI. by @sayakpaul in #12551
- Bria fibo by @galbria in #12545
- Avoiding graph break by changing the way we infer dtype in vae.decoder by @ppadjinTT in #12512
- [Modular] Fix for custom block kwargs by @DN6 in #12561
- [Modular] Allow custom blocks to be saved to `local_dir` by @DN6 in #12381
- Fix Stable Diffusion 3.x pooled prompt embedding with multiple images by @friedrich in #12306
- Fix custom code loading in Automodel by @DN6 in #12571
- [modular] better warn message by @yiyixuxu in #12573
- [tests] add tests for flux modular (t2i, i2i, kontext) by @sayakpaul in #12566
- [modular]pass hub_kwargs to load_config by @yiyixuxu in #12577
- ulysses enabling in native attention path by @sywangyi in #12563
- Kandinsky 5.0 Docs fixes by @leffff in #12582
- [docs] sort doc by @sayakpaul in #12586
- [LoRA] add support for more Qwen LoRAs by @linoytsaban in #12581
- [Modular] Allow ModularPipeline to load from revisions by @DN6 in #12592
- Add optional precision-preserving preprocessing for examples/unconditional_image_generation/train_unconditional.py by @turian in #12596
- [SANA-Video] Adding 5s pre-trained 480p SANA-Video inference by @lawrence-cj in #12584
- Fix overflow and dtype handling in rgblike_to_depthmap (NumPy + PyTorch) by @MohammadSadeghSalehi in #12546
- [Modular] Some clean up for Modular tests by @DN6 in #12579
- feat: enable attention dispatch for huanyuan video by @DefTruth in #12591
- fix the crash in Wan-AI/Wan2.2-TI2V-5B-Diffusers if CP is enabled by @sywangyi in #12562
- [CI] Push test fix by @DN6 in #12617
- add ChronoEdit by @zhangjiewu in #12593
- [modular] wan! by @yiyixuxu in #12611
- [CI] Fix typo in uv install by @DN6 in #12618
- fix: correct import path for load_model_dict_into_meta in conversion scripts by @yashwantbezawada in #12616
- Fix Context Parallel validation checks by @DN6 in #12446
- [Modular] Clean up docs by @DN6 in #12604
- Fix: update type hints for Tuple parameters across multiple files to support variable-length tuples by @cesaryuan in #12544
- [CI] Remove unittest dependency from `testing_utils.py` by @DN6 in #12621
- Fix rotary positional embedding dimension mismatch in Wan and SkyReels V2 transformers by @charchit7 in #12594
- fix copies by @yiyixuxu in #12637
- Add MLU Support. by @a120092009 in #12629
- fix dispatch_attention_fn check by @yiyixuxu in #12636
- [modular] add tests for qwen modular by @sayakpaul in #12585
- ArXiv -> HF Papers by @qgallouedec in #12583
- [docs] Update install instructions by @stevhliu in #12626
- [modular] add a check by @yiyixuxu in #12628
- Improve docstrings and type hints in scheduling_amused.py by @delmalih in #12623
- [WIP]Add Wan2.2 Animate Pipeline (Continuation of #12442 by tolgacangoz) by @dg845 in #12526
- adjust unit tests for `test_save_load_float16` by @kaixuanliu in #12500
- skip autoencoderdl layerwise casting memory by @sayakpaul in #12647
- [utils] Update check_doc_toc by @stevhliu in #12642
- [docs] AutoModel by @stevhliu in #12644
- Improve docstrings and type hints in scheduling_ddim.py by @delmalih in #12622
- Improve docstrings and type hints in scheduling_ddpm.py by @delmalih in #12651
- [Modular] Add Custom Blocks guide to doc by @DN6 in #12339
- Improve docstrings and type hints in scheduling_euler_discrete.py by @delmalih in #12654
- Update Wan Animate Docs by @dg845 in #12658
- Rope in float32 for mps or npu compatibility by @DavidBert in #12665
- [PRX pipeline]: add 1024 resolution ratio bins by @DavidBert in #12670
- SANA-Video Image to Video pipeline `SanaImageToVideoPipeline` support by @lawrence-cj in #12634
- [CI] Make CI logs less verbose by @DN6 in #12674
- Revert `AutoencoderKLWan`'s `dim_mult` default value back to list by @dg845 in #12640
- [CI] Temporarily pin transformers by @DN6 in #12677
- [core] Refactor hub attn kernels by @sayakpaul in #12475
- [CI] Fix indentation issue in workflow files by @DN6 in #12685
- [CI] Fix failing Pipeline CPU tests by @DN6 in #12681
- Improve docstrings and type hints in scheduling_pndm.py by @delmalih in #12676
- Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet by @pratim4dasude in #12649
- Improve docstrings and type hints in scheduling_lms_discrete.py by @delmalih in #12678
- Add FluxLoraLoaderMixin to Fibo pipeline by @SwayStar123 in #12688
- bugfix: fix chrono-edit context parallel by @DefTruth in #12660
- [core] support sage attention + FA2 through `kernels` by @sayakpaul in #12439
- [i8n-pt] Fix grammar and expand Portuguese documentation by @cdutr in #12598
- Fix variable naming typos in community FluxControlNetFillInpaintPipeline by @sqhuang in #12701
- fix typo in docs by @lawrence-cj in #12675
- Add Support for Z-Image Series by @JerryWu-code in #12703
- let's go Flux2 🚀 by @sayakpaul in #12711
- Update script names in README for Flux2 training by @anvilarth in #12713
- [lora]: Fix Flux2 LoRA NaN test by @sayakpaul in #12714
- [docs] Correct flux2 links by @sayakpaul in #12716
- [docs] put autopipeline after overview and hunyuanimage in images by @sayakpaul in #12548
- Improve docstrings and type hints in scheduling_dpmsolver_multistep.py by @delmalih in #12710
- Support unittest for Z-image ⚡️ by @JerryWu-code in #12715
- [chore] remove torch.save from remnant code. by @sayakpaul in #12717
- Enable regional compilation on z-image transformer model by @sayakpaul in #12736
- Fix examples not loading LoRA adapter weights from checkpoint by @SurAyush in #12690
- [Modular] Add single file support to Modular by @DN6 in #12383
- fix type-check for z-image transformer by @DefTruth in #12739
- Hunyuanvideo15 by @yiyixuxu in #12696
- [Docs] Update Imagen Video paper link in schedulers by @delmalih in #12724
- Improve docstrings and type hints in scheduling_heun_discrete.py by @delmalih in #12726
- Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py by @delmalih in #12766
- fix FLUX.2 context parallel by @DefTruth in #12737
- Rename BriaPipeline to BriaFiboPipeline in documentation by @galbria in #12758
- Update bria_fibo.md with minor fixes by @sayakpaul in #12731
- [feat]: implement "local" caption upsampling for Flux.2 by @sayakpaul in #12718
- Add ZImage LoRA support and integrate into ZImagePipeline by @CalamitousFelicitousness in #12750
- Add support for Ovis-Image by @DoctorKey in #12740
- Fix TPU (torch_xla) compatibility Error about tensor repeat func along with empty dim. by @JerryWu-code in #12770
- Fixes #12673. `record_stream` in group offloading is not working properly by @KimbingNg in #12721
- [core] start varlen variants for attn backend kernels. by @sayakpaul in #12765
- [core] reuse `AttentionMixin` for compatible classes by @sayakpaul in #12463
- Deprecate `upcast_vae` in SDXL based pipelines by @DN6 in #12619
- Kandinsky 5.0 Video Pro and Image Lite by @leffff in #12664
- Fix: leaf_level offloading breaks after delete_adapters by @adi776borate in #12639
- [tests] fix hunuyanvideo 1.5 offloading tests. by @sayakpaul in #12782
- [Z-Image] various small changes, Z-Image transformer tests, etc. by @sayakpaul in #12741
- Z-Image-Turbo `from_single_file` by @hlky in #12756
- Update attention_backends.md to format kernels by @sayakpaul in #12757
- Improve docstrings and type hints in scheduling_unipc_multistep.py by @delmalih in #12767
- fix spatial compression ratio error for AutoEncoderKLWan doing tiled encode by @jerry2102 in #12753
- [lora] support more ZImage LoRAs by @sayakpaul in #12790
- PRX Set downscale_freq_shift to 0 for consistency with internal implementation by @DavidBert in #12791
- Fix broken group offloading with block_level for models with standalone layers by @rycerzes in #12692
- [Docs] Add Z-Image docs by @asomoza in #12775
- move kandisnky docs. by @sayakpaul (direct commit on v0.36.0-release)
- [docs] minor fixes to kandinsky docs by @sayakpaul in #12797
- Improve docstrings and type hints in scheduling_deis_multistep.py by @delmalih in #12796
- [Feat] TaylorSeer Cache by @toilaluan in #12648
- Update the TensorRT-ModelOPT to Nvidia-ModelOPT by @jingyu-ml in #12793
- add post init for safty checker by @jiqing-feng in #12794
- [HunyuanVideo1.5] support step-distilled by @yiyixuxu in #12802
- Add ZImageImg2ImgPipeline by @CalamitousFelicitousness in #12751
- Release: v0.36.0-release by @sayakpaul (direct commit on v0.36.0-release)
## Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @yiyixuxu
- ltx0.9.8 (without IC lora, autoregressive sampling) (#12493)
- Fix: Add _skip_keys for AutoencoderKLWan (#12523)
- HunyuanImage21 (#12333)
- [modular] better warn message (#12573)
- [modular]pass hub_kwargs to load_config (#12577)
- [modular] wan! (#12611)
- fix copies (#12637)
- fix dispatch_attention_fn check (#12636)
- [modular] add a check (#12628)
- Hunyuanvideo15 (#12696)
- [HunyuanVideo1.5] support step-distilled (#12802)
- @leffff
- @dg845
- @DN6
- Raise warning instead of error when imports are missing for custom code (#12513)
- Handle deprecated transformer classes (#12517)
- Deprecate Stable Cascade (#12537)
- [Pipelines] Enable Wan VACE to run since single transformer (#12428)
- [Modular] Fix for custom block kwargs (#12561)
- [Modular] Allow custom blocks to be saved to `local_dir` (#12381)
- Fix custom code loading in Automodel (#12571)
- [Modular] Allow ModularPipeline to load from revisions (#12592)
- [Modular] Some clean up for Modular tests (#12579)
- [CI] Push test fix (#12617)
- [CI] Fix typo in uv install (#12618)
- Fix Context Parallel validation checks (#12446)
- [Modular] Clean up docs (#12604)
- [CI] Remove unittest dependency from `testing_utils.py` (#12621)
- [Modular] Add Custom Blocks guide to doc (#12339)
- [CI] Make CI logs less verbose (#12674)
- [CI] Temporarily pin transformers (#12677)
- [CI] Fix indentation issue in workflow files (#12685)
- [CI] Fix failing Pipeline CPU tests (#12681)
- [Modular] Add single file support to Modular (#12383)
- Deprecate `upcast_vae` in SDXL based pipelines (#12619)
- @DavidBert
- @galbria
- @lawrence-cj
- @zhangjiewu
- add ChronoEdit (#12593)
- @delmalih
- Improve docstrings and type hints in scheduling_amused.py (#12623)
- Improve docstrings and type hints in scheduling_ddim.py (#12622)
- Improve docstrings and type hints in scheduling_ddpm.py (#12651)
- Improve docstrings and type hints in scheduling_euler_discrete.py (#12654)
- Improve docstrings and type hints in scheduling_pndm.py (#12676)
- Improve docstrings and type hints in scheduling_lms_discrete.py (#12678)
- Improve docstrings and type hints in scheduling_dpmsolver_multistep.py (#12710)
- [Docs] Update Imagen Video paper link in schedulers (#12724)
- Improve docstrings and type hints in scheduling_heun_discrete.py (#12726)
- Improve docstrings and type hints in scheduling_euler_ancestral_discrete.py (#12766)
- Improve docstrings and type hints in scheduling_unipc_multistep.py (#12767)
- Improve docstrings and type hints in scheduling_deis_multistep.py (#12796)
- @pratim4dasude
- Community Pipeline: FluxFillControlNetInpaintPipeline for FLUX Fill-Based Inpainting with ControlNet (#12649)
- @JerryWu-code
- @CalamitousFelicitousness
- @DoctorKey
- Add support for Ovis-Image (#12740)