-
Notifications
You must be signed in to change notification settings - Fork 6.4k
Remove Qwen Image Redundant RoPE Cache #12452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Qwen Image Redundant RoPE Cache #12452
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@naykun could you also give it a look?
|
One thing to note is that the current PR implementation (in 52cf252) isn't quite equivalent to the implementation in @naykun do you think changing the cache key to include |
|
I think the use of a
For the |
|
Gentle ping @naykun to take a look :). |
|
Hi everyone! As @sayakpaul mentioned, we’re currently keeping the frame parameter in place to maintain future compatibility with video-related features. For text-to-image and image editing scenarios, the frame will always be 1, and the idx will vary depending on the input conditions or output images. The changes in this PR won’t affect that logic or impact caching efficiency. That said, I’d like to double-check with @sayakpaul: using lru_cache won’t interfere with compilation in any edge cases, right? |
At least for the current image-based stuff, it won't. We cannot fully compile the model without having graph-breaks, anyway, because: But that's fine because we can still apply regional compilation and obtain speedups from the QwenImage family. |
|
@dg845 could you also run the compilation tests for QwenImage? Sorry, I forgot to mention it.
|
|
Running RUN_SLOW=1 RUN_COMPILE=1 pytest tests/models/transformers/test_models_transformer_qwenimage.py::QwenImageTransformerCompileTestsgives no errors (at least locally on DGX). |
What does this PR do?
This PR removes
self.rope_cacheinQwenEmbedRope, so that RoPE frequency caching is only done once through thefunctools.lru_cachedecorator on_compute_video_freqs. It also changes themaxsizeargument forlru_cachefromNoneto128to prevent the cache from causing OOM errors.Fixes #12401
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sayakpaul
@naykun
@chenxiao111222