Add LTX 2.0 Video Pipelines #12915
Conversation
LTX 2.0 Vocoder Implementation
LTX 2.0 Video VAE Implementation
…into ltx-2-transformer
sayakpaul left a comment:
Some small comments.
tests/models/autoencoders/test_models_autoencoder_kl_ltx2_audio.py
num_rope_elems = num_pos_dims * 2

# 3. Create a 1D grid of frequencies for RoPE
freqs_dtype = torch.float64 if self.double_precision else torch.float32
(nit): we could keep `self.freqs_dtype` inside the init to avoid computing it multiple times.
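For illustration, a minimal sketch of what that suggestion could look like, using a hypothetical module name and constructor signature (not the PR's actual code):

```python
import torch
from torch import nn


class LTX2RotaryPosEmbed(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, dim: int, double_precision: bool = False):
        super().__init__()
        self.dim = dim
        # Pick the RoPE frequency dtype once at construction time ...
        self.freqs_dtype = torch.float64 if double_precision else torch.float32

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        # ... instead of re-evaluating the conditional on every forward pass.
        return torch.arange(self.dim // 2, dtype=self.freqs_dtype, device=coords.device)
```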
video_cross_attn_rotary_emb = self.cross_attn_rope(video_coords[:, 0:1, :], device=hidden_states.device)
audio_cross_attn_rotary_emb = self.cross_attn_audio_rope(
    audio_coords[:, 0:1, :], device=audio_hidden_states.device
)
(nit): would be nice to have a comment explaining the indexing going on there.
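The indexing pattern itself is easy to demonstrate in isolation; the sketch below only assumes the coordinate tensors are laid out as (batch, num_axes, seq_len), which should be checked against the actual model code:

```python
import torch

# Hypothetical shapes for illustration: if coords are (batch, num_axes, seq_len),
# e.g. axes = (frame, height, width), then [:, 0:1, :] keeps only the first axis
# while preserving the tensor's rank (unlike [:, 0, :], which would drop a dim).
video_coords = torch.zeros(2, 3, 16)
temporal_coords = video_coords[:, 0:1, :]
print(temporal_coords.shape)  # torch.Size([2, 1, 16])
```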
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Initial implementation of LTX 2.0 latent upsampling pipeline
* Add new LTX 2.0 spatial latent upsampler logic
* Add test script for LTX 2.0 latent upsampling
* Add option to enable VAE tiling in upsampling test script
* Get latent upsampler working with video latents
* Fix typo in BlurDownsample
* Add latent upsample pipeline docstring and example
* Remove deprecated pipeline VAE slicing/tiling methods
* make style and make quality
* When returning latents, return unpacked and denormalized latents for T2V and I2V
* Add model_cpu_offload_seq for latent upsampling pipeline

Co-authored-by: Daniel Gu <dgu8957@gmail.com>
yiyixuxu left a comment:
thanks!
Merging as the CI failures are unrelated.
In this example, an `image` is not being passed to the pipeline. Should be:
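The original suggestion isn't reproduced here; presumably the fix is just to forward the loaded image into the pipeline call, along the lines of the hedged fragment below (the exact argument name depends on the final LTX 2.0 pipeline signature):

```python
# Hypothetical fix: pass the loaded image into the pipeline call.
output = pipe(image=image, prompt=prompt)
```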
What does this PR do?
This PR adds pipelines for the LTX 2.0 video generation model (code, weights). LTX 2.0 is an audio-video foundation model that generates videos with synced audio; it supports generation tasks such as text-to-video (T2V), text-image-to-video (TI2V), and more.
An example usage script for I2V is as follows:
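The script itself isn't reproduced here. Below is a minimal sketch of what diffusers-style I2V usage typically looks like; the checkpoint id, call arguments, and output handling are placeholders and should be taken from the PR's docs/model card rather than from this sketch:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video, load_image

# Placeholder checkpoint id; use the repo id referenced in the PR / model card.
pipe = DiffusionPipeline.from_pretrained("Lightricks/LTX-2", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # needed in practice, see the memory note below

image = load_image("path/to/first_frame.png")
prompt = "A coastal town at sunset, waves rolling onto the shore"

output = pipe(
    image=image,
    prompt=prompt,
    num_inference_steps=40,
    generator=torch.Generator().manual_seed(0),
)
# LTX 2.0 also generates synced audio; how it is returned/exported depends on
# the final pipeline API, so only the video frames are handled here.
export_to_video(output.frames[0], "ltx2_i2v_sample.mp4", fps=24)
```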
Note that LTX 2.0 video generation uses a lot of memory; CPU offloading is necessary even on an A100 with 80 GB of VRAM (assuming no memory optimizations other than bf16 inference are used).
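Continuing from the `pipe` object in the sketch above, the usual diffusers memory levers look roughly like this (whether the LTX 2.0 video VAE exposes `enable_tiling()` is an assumption based on other diffusers video VAEs):

```python
# Offload submodules to CPU between forward passes to reduce peak VRAM.
pipe.enable_model_cpu_offload()

# Decode the video latents tile by tile instead of all at once
# (assumes the LTX 2.0 VAE exposes enable_tiling(), like other diffusers video VAEs).
pipe.vae.enable_tiling()
```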
Here is an I2V sample from the above:
ltx2_i2v_sample.mp4
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@yiyixuxu
@sayakpaul
@ofirbb