Conversation

@dg845
Collaborator

@dg845 dg845 commented Oct 9, 2025

What does this PR do?

This PR removes self.rope_cache in QwenEmbedRope, so that RoPE frequency caching happens in exactly one place: the functools.lru_cache decorator on _compute_video_freqs. It also changes the maxsize argument of lru_cache from None to 128, bounding the cache so it cannot grow without limit and cause OOM errors.

Fixes #12401
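The caching change above can be sketched as follows. This is a minimal, hypothetical stand-in for QwenEmbedRope (the real class computes actual RoPE frequency tensors); the point is only the bounded lru_cache replacing a hand-rolled dict:

```python
from functools import lru_cache

class RopeEmbedSketch:
    """Toy stand-in for QwenEmbedRope: frequencies are memoized per
    (frame, height, width, idx) key by lru_cache alone, with no extra
    self.rope_cache dict."""

    @lru_cache(maxsize=128)  # bounded cache instead of an unbounded one
    def _compute_video_freqs(self, frame, height, width, idx=0):
        # Placeholder for the real RoPE frequency computation.
        return [(frame, height, width, idx)]

emb = RopeEmbedSketch()
a = emb._compute_video_freqs(1, 32, 32)
b = emb._compute_video_freqs(1, 32, 32)
assert a is b  # second call is a cache hit, same object returned
```

With maxsize=128, least-recently-used entries are evicted once the cache fills, which is what prevents the OOM behavior reported in the linked issue.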

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul
@naykun
@chenxiao111222

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@sayakpaul sayakpaul left a comment


@naykun could you also give it a look?

@dg845
Collaborator Author

dg845 commented Oct 9, 2025

One thing to note is that the current PR implementation (as of 52cf252) isn't quite equivalent to the implementation on main when not compiling, since frame is part of the cache key for the lru_cache but not part of the cache key for self.rope_cache.

@naykun do you think changing the cache key to include frame makes sense? If frame is the same for most calls to _compute_video_freqs, then I think the two implementations are approximately equivalent.
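The cache-key difference can be illustrated with a toy example. The key format and return values here are hypothetical stand-ins, not the actual diffusers code; the point is only that a dict keyed without frame can return a stale entry, while lru_cache keys on every positional argument:

```python
from functools import lru_cache

# Old-style approach: a dict keyed on (height, width, idx) only,
# so calls differing only in frame collide on one cache entry.
rope_cache = {}

def old_freqs(frame, height, width, idx):
    key = f"{height}_{width}_{idx}"  # frame omitted from the key
    if key not in rope_cache:
        rope_cache[key] = (frame, height, width, idx)
    return rope_cache[key]

# New-style approach: lru_cache keys on ALL positional arguments,
# including frame, so entries for different frames stay distinct.
@lru_cache(maxsize=128)
def new_freqs(frame, height, width, idx):
    return (frame, height, width, idx)

print(old_freqs(1, 32, 32, 0))  # (1, 32, 32, 0)
print(old_freqs(2, 32, 32, 0))  # stale hit: still (1, 32, 32, 0)
print(new_freqs(2, 32, 32, 0))  # (2, 32, 32, 0)
```

If frame is constant across calls (as it is for image workloads), the two schemes behave identically; they only diverge when frame varies.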

@sayakpaul
Member

I think the frame-based notation is there to allow easy extension to videos. For images, this shouldn't matter at all, because idx is always constant and img_shapes is supplied just once from the pipeline:

img_shapes: Optional[List[Tuple[int, int, int]]] = None,

(example)

> do you think changing the cache key to include frame makes sense? If frame is the same for most calls to _compute_video_freqs, then I think the two implementations are approximately equivalent.

As for including frame in the rope_cache key, though, I'm not sure how much of a readability compromise that would be.

@dg845
Collaborator Author

dg845 commented Oct 17, 2025

Gentle ping @naykun to take a look :).

@naykun
Contributor

naykun commented Oct 19, 2025

Hi everyone!
Apologies for missing the notifications earlier.

As @sayakpaul mentioned, we’re currently keeping the frame parameter in place to maintain future compatibility with video-related features. For text-to-image and image editing scenarios, the frame will always be 1, and the idx will vary depending on the input conditions or output images. The changes in this PR won’t affect that logic or impact caching efficiency.

That said, I’d like to double-check with @sayakpaul: using lru_cache won’t interfere with compilation in any edge cases, right?

@sayakpaul
Member

> That said, I’d like to double-check with @sayakpaul: using lru_cache won’t interfere with compilation in any edge cases, right?

At least for the current image-based workloads, it won't. We cannot fully compile the model without graph breaks anyway, because:
https://github.com/huggingface/diffusers/blob/23ebbb4bc81a17ebea17cb7cb94f301199e49a7f/src/diffusers/models/transformers/transformer_qwenimage.py#L185C9-L186C37

But that's fine because we can still apply regional compilation and obtain speedups from the QwenImage family.
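Regional compilation here means compiling the repeated transformer blocks individually rather than the whole forward pass, so graph breaks outside the blocks don't matter. A minimal sketch, using a hypothetical TinyTransformer rather than the actual QwenImage model (backend="eager" keeps the demo toolchain-free; drop it for real speedups):

```python
import torch
import torch.nn as nn

class TinyTransformer(nn.Module):
    """Hypothetical stand-in for a diffusers transformer: a stack of
    structurally identical blocks that dominate the runtime."""

    def __init__(self, dim=16, depth=3):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return x

model = TinyTransformer()

# Regional compilation: compile each repeated block on its own instead
# of wrapping the full model, sidestepping graph breaks elsewhere.
for i, block in enumerate(model.blocks):
    model.blocks[i] = torch.compile(block, backend="eager")

out = model(torch.randn(2, 16))
print(out.shape)  # torch.Size([2, 16])
```

Because all blocks share one structure, the compiled artifact is reused across them, which also keeps compile time low compared to whole-model compilation.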

@dg845 dg845 merged commit 7853bfb into huggingface:main Oct 20, 2025
10 of 11 checks passed
@sayakpaul
Member

@dg845 could you also run the compilation tests for QwenImage? Sorry, I forgot to mention it.

class QwenImageTransformerCompileTests(TorchCompileTesterMixin, unittest.TestCase):

@dg845
Collaborator Author

dg845 commented Oct 20, 2025

Running

RUN_SLOW=1 RUN_COMPILE=1 pytest tests/models/transformers/test_models_transformer_qwenimage.py::QwenImageTransformerCompileTests

gives no errors (at least locally, on a DGX machine).


Merging this pull request closes: Redundant lru_cache in QwenEmbedRope