Gary-ChenJL
Contributor

Adds support for Wan2.1 control model

   control model support for Wan2.1
Contributor

Summary of Changes

Hello @Gary-ChenJL, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly extends the fastvideo library by introducing a comprehensive video-to-video (V2V) generation pipeline. It specifically adds support for the Wan2.1 control model, allowing users to generate new videos conditioned on an existing input video. The changes encompass new configuration classes for the specialized CLIP encoder, a dedicated stage for encoding video inputs into latent space, and robust handling of video file loading and preprocessing, alongside an illustrative example script.

Highlights

  • Video-to-Video Pipeline Introduction: A new WanVideoToVideoPipeline is added, enabling video-to-video generation capabilities within the fastvideo library.
  • Wan2.1 Control Model Integration: Explicit support for the Wan2.1 control model is introduced, including specific CLIP vision encoder configurations (WAN2_1ControlCLIPVisionConfig) and pipeline configurations (WANV2VConfig).
  • Enhanced Video Input Handling: The system now supports direct video inputs via a new video_path parameter, with load_video updated to extract FPS and handle video preprocessing (resampling, resizing) for V2V tasks.
  • Dedicated Video VAE Encoding: A VideoVAEEncodingStage is implemented to efficiently encode input video frames into latent space, which is crucial for the V2V generation process.
  • Customizable CLIP Encoder Behavior: The CLIP attention mechanism is made more flexible, allowing conditional scaling and causal attention based on model configuration, which is utilized by the new control model.
  • New Example Script: An example script (basic_wan2_2_Fun.py) is provided to demonstrate the usage of the new video-to-video pipeline with the Wan2.1-Fun-1.3B-Control-Diffusers model.
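
The resampling step mentioned under "Enhanced Video Input Handling" is not shown in this thread. As a rough illustration only, one common way to resample a clip to a target frame rate is to pick source frame indices by nearest-earlier timestamp. The function name `resample_frame_indices` and its parameters are hypothetical and are not taken from the PR:

```python
from typing import List


def resample_frame_indices(num_frames: int, src_fps: float,
                           dst_fps: float) -> List[int]:
    """Pick source frame indices that approximate dst_fps playback.

    Hypothetical helper: the PR's actual resampling logic may differ.
    """
    if num_frames <= 0:
        return []
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")

    # Keep the clip duration constant while changing the frame count.
    duration_s = num_frames / src_fps
    num_out = max(1, round(duration_s * dst_fps))

    # Each output frame advances src_fps / dst_fps source frames.
    step = src_fps / dst_fps
    return [min(num_frames - 1, int(i * step)) for i in range(num_out)]
```

Downsampling skips source frames and upsampling repeats them. Interpolating between frames would be a different technique and is not covered by this sketch.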
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a video-to-video (V2V) pipeline for the Wan2.1 control model. The changes include new pipeline configurations, stages for handling video inputs, and updates to support the V2V workflow. My review focuses on correctness, maintainability, and potential bugs. I've identified several critical issues, including a resource leak from un-deleted temporary files, typos that will lead to ImportError and AttributeError, and a confusing typo in a configuration parameter (is_casual vs is_causal). I've also suggested improvements for code clarity, style, and efficiency, such as simplifying a complex preprocessing function and improving type hints.
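
A temporary-file leak of the kind flagged above is commonly avoided by tying cleanup to a context manager, so the file is removed even when decoding raises. This is a minimal sketch, assuming the video bytes are already in memory. `temporary_video_file` is a hypothetical helper and not code from this PR:

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_video_file(data: bytes, suffix: str = ".mp4") -> Iterator[str]:
    """Write data to a temp file and always delete it on exit."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # fdopen takes ownership of the descriptor and closes it.
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        # Runs on normal exit and when the body raises.
        if os.path.exists(path):
            os.remove(path)
```

A caller would wrap the decode step in `with temporary_video_file(payload) as path:` so that no separate `was_tempfile_created` bookkeeping is needed.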

@SolitaryThinker SolitaryThinker added the go Trigger Buildkite CI label Oct 9, 2025
Comment on lines 185 to 226
pil_images = []
original_fps = None

try:
    if video_path.endswith(".gif"):
        gif = PIL.Image.open(video_path)
        try:
            # GIF FPS estimation
            if hasattr(gif, 'info') and 'duration' in gif.info:
                duration_ms = gif.info['duration']
                if duration_ms > 0:
                    original_fps = 1000.0 / duration_ms

            while True:
                pil_images.append(gif.copy())
                gif.seek(gif.tell() + 1)
        except EOFError:
            pass
    else:
        try:
            imageio.plugins.ffmpeg.get_exe()
        except AttributeError:
            raise AttributeError(
                "`Unable to find an ffmpeg installation on your machine. Please install via `pip install imageio-ffmpeg"
            ) from None

        with imageio.get_reader(video_path) as reader:
            try:
                original_fps = reader.get_meta_data().get('fps', None)
            except:
                # Fallback: try to get from format-specific metadata
                try:
                    original_fps = reader.get_meta_data().get('source_size', {}).get('fps', None)
                except:
                    pass

            for frame in reader:
                pil_images.append(PIL.Image.fromarray(frame))
finally:
    # Clean up temporary file if it was created
    if was_tempfile_created and os.path.exists(video_path):
        os.remove(video_path)
Collaborator

I know the original code was not the best, but could you clean this up and remove all of these try except blocks?
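
One way to drop the nested try/except blocks is to turn the FPS lookups into explicit checks on the metadata mappings, so there is nothing left to catch. This is a sketch under assumptions: the helper names `gif_fps` and `reader_fps` are hypothetical, and they take plain mappings such as `gif.info` or the result of `reader.get_meta_data()`. The `source_size` fallback is omitted on the assumption that it does not carry a frame rate, which should be checked against imageio's metadata:

```python
from typing import Any, Mapping, Optional


def gif_fps(info: Mapping[str, Any]) -> Optional[float]:
    """Estimate FPS from a GIF's per-frame duration in milliseconds."""
    duration_ms = info.get("duration")
    if isinstance(duration_ms, (int, float)) and duration_ms > 0:
        return 1000.0 / duration_ms
    return None  # Missing or non-positive duration: FPS unknown.


def reader_fps(meta: Mapping[str, Any]) -> Optional[float]:
    """Read a positive 'fps' entry from reader metadata, if present."""
    fps = meta.get("fps")
    if isinstance(fps, (int, float)) and fps > 0:
        return float(fps)
    return None
```

With helpers like these, the only exception handling left in the loader is the `EOFError` that ends the GIF frame loop, which can also be avoided by iterating with `PIL.ImageSequence.Iterator`.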

Contributor Author

refactored

    pil_images = convert_method(pil_images)

-    return pil_images
+    return pil_images, original_fps
Collaborator

There are a few other places using this util method. We should either update those places or add a flag to return the FPS.
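
The flag option might look like the minimal sketch below. `return_fps` is a hypothetical parameter name, and `_read_frames` is a stand-in for the real decoding logic. Neither is confirmed by this thread:

```python
from typing import Any, List, Optional, Tuple, Union

Frames = List[Any]


def _read_frames(video_path: str) -> Tuple[Frames, Optional[float]]:
    """Placeholder decoder: returns dummy frames and a dummy FPS."""
    return ["frame0", "frame1"], 24.0


def load_video(
    video_path: str,
    return_fps: bool = False,
) -> Union[Frames, Tuple[Frames, Optional[float]]]:
    """Return frames only by default so existing callers keep working."""
    frames, fps = _read_frames(video_path)
    if return_fps:
        # New callers opt in to the (frames, fps) tuple.
        return frames, fps
    return frames
```

Defaulting the flag to `False` leaves existing call sites untouched, at the cost of a return type that depends on an argument. Updating every caller to unpack a tuple avoids that, but touches more code.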

Contributor Author

refactored, now backward compatible

Labels

go Trigger Buildkite CI

2 participants