3 changes: 2 additions & 1 deletion .ci/ignore_treon_docker.txt
@@ -78,4 +78,5 @@ notebooks/qwen2.5-omni-chatbot/qwen2.5-omni-chatbot.ipynb
notebooks/intern-video2-classiciation/intern-video2-classification.ipynb
notebooks/flex.2-image-generation/flex.2-image-generation.ipynb
notebooks/wan2.1-text-to-video/wan2.1-text-to-video.ipynb
notebooks/ace-step-music-generation/ace-step-music-generation.ipynb
notebooks/ace-step-music-generation/ace-step-music-generation.ipynb
notebooks/wan2.2-text-image-to-video/wan2.2-text-image-to-video.ipynb
6 changes: 6 additions & 0 deletions .ci/skipped_notebooks.yml
@@ -566,3 +566,9 @@
        - macos-13
        - ubuntu-22.04
        - windows-2022
- notebook: notebooks/wan2.2-text-image-to-video/wan2.2-text-image-to-video.ipynb
  skips:
    - os:
        - macos-13
        - ubuntu-22.04
        - windows-2022
1 change: 1 addition & 0 deletions .ci/spellcheck/.pyspelling.wordlist.txt
@@ -162,6 +162,7 @@ CTC
CTM
CUDA
CustomEncoderWav
customizable
CVF
CVPR
CNNs
31 changes: 31 additions & 0 deletions notebooks/wan2.2-text-image-to-video/README.md
@@ -0,0 +1,31 @@
# Text-Image to Video generation with Wan2.2 and OpenVINO

Wan2.2 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. It is a major upgrade over Wan2.1 and includes the following features:

- **Effective MoE Architecture**: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By splitting the denoising process across timesteps among specialized expert models, it enlarges the overall model capacity while keeping the computational cost the same.

- **Cinematic-level Aesthetics**: Wan2.2 incorporates meticulously curated aesthetic data, complete with detailed labels for lighting, composition, contrast, color tone, and more. This allows for more precise and controllable cinematic style generation, facilitating the creation of videos with customizable aesthetic preferences.

- **Complex Motion Generation**: Compared to Wan2.1, Wan2.2 is trained on significantly more data, with 65.6% more images and 83.2% more videos. This expansion notably improves the model's generalization across multiple dimensions such as motion, semantics, and aesthetics, achieving top performance among open-source and closed-source models.

- **Efficient High-Definition Hybrid TI2V**: Wan2.2 open-sources a 5B model built with the advanced Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P resolution and 24 fps, and it can run on consumer-grade graphics cards such as the RTX 4090. It is one of the fastest 720P@24fps models currently available and can serve both industrial and academic users.
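
As a rough illustration of what the 16×16×4 compression ratio in the last bullet means in practice, the sketch below computes the latent grid for a given clip size. It assumes the ratio reads as 16× along height, 16× along width, and 4× along time, and that the first frame is encoded on its own (a causal-VAE convention that this README does not state). The function name `latent_shape` and the 1280×704, 121-frame clip are illustrative choices; the 832×480, 20-frame clip mirrors the values used in `gradio_helper.py`.

```python
def latent_shape(height: int, width: int, num_frames: int,
                 spatial: int = 16, temporal: int = 4) -> tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a video clip."""
    if height % spatial or width % spatial:
        raise ValueError("height and width must be multiples of the spatial factor")
    # Assumption: the first frame maps to its own latent frame and every
    # following group of `temporal` frames maps to one more latent frame.
    latent_frames = 1 + (num_frames - 1) // temporal
    return latent_frames, height // spatial, width // spatial


# Number of pixel positions summarized by one latent position.
positions_per_latent = 16 * 16 * 4

print(positions_per_latent)          # 1024
print(latent_shape(704, 1280, 121))  # (31, 44, 80)
print(latent_shape(832, 480, 20))    # (5, 52, 30)
```

The smaller the latent grid, the fewer positions the diffusion transformer has to process per step, which is the main reason a high compression ratio helps on consumer hardware.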

You can find more details about the model in the [model card](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers) and the [original repository](https://github.com/Wan-Video/Wan2.2).
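
The timestep split described in the MoE bullet above can be pictured with a toy scheduler. This is only a sketch of the general idea (one expert per range of denoising timesteps, exactly one expert evaluated per step, so per-step cost does not grow with the number of experts). The two-expert layout, the expert names, and the `boundary` value are assumptions for illustration and are not taken from the Wan2.2 code.

```python
from typing import Callable, Dict, List, Tuple

# A toy "expert": takes the current latent and a timestep, returns a new latent.
Expert = Callable[[float, int], float]


def pick_expert(timestep: int, boundary: int) -> str:
    """Route noisy (large) timesteps to one expert and cleaner ones to the other."""
    return "high_noise" if timestep >= boundary else "low_noise"


def denoise(latent: float, timesteps: List[int],
            experts: Dict[str, Expert], boundary: int) -> Tuple[float, List[str]]:
    """Run a denoising loop in which exactly one expert is called per step."""
    used = []
    for t in timesteps:
        name = pick_expert(t, boundary)
        latent = experts[name](latent, t)
        used.append(name)
    return latent, used


# Stand-ins for real denoising networks (hypothetical arithmetic, not real models).
experts = {
    "high_noise": lambda x, t: x * 0.5,
    "low_noise": lambda x, t: x * 0.25,
}

final, used = denoise(8.0, [900, 700, 400, 100], experts, boundary=500)
print(used)   # ['high_noise', 'high_noise', 'low_noise', 'low_noise']
print(final)  # 0.125
```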

<img width="962" height="1118" alt="image" src="https://github.com/user-attachments/assets/8bc4a9ca-9036-4efb-8738-4417db9f3164" />

## Notebook contents
This tutorial consists of the following steps:
- Prerequisites
- Convert and Optimize model
- Run inference pipeline
- Interactive inference

This tutorial shows how to convert, optimize, and run the Wan2.2 model for text-image-to-video generation using OpenVINO.

## Installation instructions
This is a self-contained example that relies solely on its own code.<br>
We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
For details, please refer to the [Installation Guide](../../README.md).

<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/wan2.2-text-image-to-video/README.md" />
71 changes: 71 additions & 0 deletions notebooks/wan2.2-text-image-to-video/gradio_helper.py
@@ -0,0 +1,71 @@
import gradio as gr
import torch
from diffusers.utils import export_to_video, load_image
import numpy as np
import requests
from PIL import Image
from io import BytesIO

MAX_SEED = np.iinfo(np.int32).max

# Download the example input image at import time (raw GitHub content URL).
raw_url = "https://raw.githubusercontent.com/Wan-Video/Wan2.2/main/examples/i2v_input.JPG"
response = requests.get(raw_url, timeout=60)
response.raise_for_status()  # fail early instead of passing an error page to PIL
img = Image.open(BytesIO(response.content))
img.save("i2v_input.jpg")


def make_demo(pipeline):
    def generate_video(prompt, negative_prompt, image, guidance_scale=1.0, seed=42, progress=gr.Progress(track_tqdm=True)):
        image = load_image(image)
        output = pipeline(
            image=image,
            prompt=prompt,
            negative_prompt=negative_prompt,
            height=832,
            width=480,
            num_frames=20,
            guidance_scale=guidance_scale,
            num_inference_steps=4,
            # The slider value may arrive as a float; manual_seed expects an int.
            generator=torch.Generator().manual_seed(int(seed)),
        ).frames[0]

        # 20 frames at 10 fps gives a 2-second clip.
        video_path = "output.mp4"
        export_to_video(output, video_path, fps=10)
        return video_path

    iface = gr.Interface(
        fn=generate_video,
        inputs=[
            gr.Textbox(label="Prompt", placeholder="Enter your video prompt here"),
            gr.Textbox(label="Negative Prompt", placeholder="Optional negative prompt", value=""),
            gr.Image(label="Input Image", type="pil"),
            gr.Slider(
                label="Guidance scale",
                minimum=0.0,
                maximum=20.0,
                step=0.1,
                value=1.0,
            ),
            gr.Slider(
                label="Seed",
                minimum=0,
                maximum=MAX_SEED,
                step=1,
                value=42,
            ),
        ],
        outputs=gr.Video(label="Generated Video"),
        title="Wan2.2-TI2V-5B OpenVINO Video Generator",
        flagging_mode="never",
        examples=[
            [
                "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside.",
                "",
                "i2v_input.jpg",
                5.0,
                42,
            ],
        ],
    )
    return iface
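
`make_demo` only needs `pipeline` to be a callable that accepts the keyword arguments used in `generate_video` and returns an object whose `frames[0]` is a sequence of frames. The stand-in below is a hypothetical test double (the names `FakePipeline` and `FakeOutput` are not part of this PR) that could be used to check the UI wiring without loading the real model. It assumes that `export_to_video` accepts a list of float arrays in [0, 1] with shape (height, width, 3).

```python
from dataclasses import dataclass
from typing import Any, List

import numpy as np


@dataclass
class FakeOutput:
    # One list of frames per generated video, mirroring `.frames[0]` in generate_video.
    frames: List[List[np.ndarray]]


class FakePipeline:
    """Hypothetical stand-in that mimics the call made inside generate_video."""

    def __init__(self) -> None:
        self.last_call: dict = {}

    def __call__(self, *, image: Any, prompt: str, negative_prompt: str = "",
                 height: int, width: int, num_frames: int,
                 guidance_scale: float = 1.0, num_inference_steps: int = 4,
                 generator: Any = None) -> FakeOutput:
        # Record the arguments so a test can inspect what the UI passed in.
        self.last_call = {
            "prompt": prompt,
            "height": height,
            "width": width,
            "num_frames": num_frames,
            "guidance_scale": guidance_scale,
            "num_inference_steps": num_inference_steps,
        }
        # Uniform gray frames with values in [0, 1].
        video = [np.full((height, width, 3), 0.5, dtype=np.float32)
                 for _ in range(num_frames)]
        return FakeOutput(frames=[video])


fake = FakePipeline()
out = fake(image=None, prompt="a cat on a surfboard", height=832, width=480, num_frames=20)
print(len(out.frames[0]), out.frames[0][0].shape)  # 20 (832, 480, 3)

# With gradio_helper.py importable, the same object could be passed to the UI:
# demo = make_demo(fake)
# demo.launch()
```

Because the stub records its last call, it also makes it easy to confirm that slider values such as the guidance scale reach the pipeline unchanged.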