openvinotoolkit · openvino-dev-samples · Dec 10, 2025 · Dec 11, 2025
diff --git a/.ci/ignore_treon_docker.txt b/.ci/ignore_treon_docker.txt
@@ -78,4 +78,5 @@ notebooks/qwen2.5-omni-chatbot/qwen2.5-omni-chatbot.ipynb
 notebooks/intern-video2-classiciation/intern-video2-classification.ipynb
 notebooks/flex.2-image-generation/flex.2-image-generation.ipynb
 notebooks/wan2.1-text-to-video/wan2.1-text-to-video.ipynb
-notebooks/ace-step-music-generation/ace-step-music-generation.ipynb
+notebooks/ace-step-music-generation/ace-step-music-generation.ipynb
+notebooks/wan2.2-text-image-to-video/wan2.2-text-image-to-video.ipynb
diff --git a/.ci/skipped_notebooks.yml b/.ci/skipped_notebooks.yml
@@ -566,3 +566,9 @@
         - macos-13
         - ubuntu-22.04
         - windows-2022
+- notebook: notebooks/wan2.2-text-image-to-video/wan2.2-text-image-to-video.ipynb
+  skips:
+    - os:
+        - macos-13
+        - ubuntu-22.04
+        - windows-2022
diff --git a/.ci/spellcheck/.pyspelling.wordlist.txt b/.ci/spellcheck/.pyspelling.wordlist.txt
@@ -162,6 +162,7 @@ CTC
 CTM
 CUDA
 CustomEncoderWav
+customizable
 CVF
 CVPR
 CNNs

diff --git a/notebooks/wan2.2-text-image-to-video/README.md b/notebooks/wan2.2-text-image-to-video/README.md
@@ -0,0 +1,31 @@
+# Text-Image to Video generation with Wan2.2 and OpenVINO
+
+Wan2.2 is a comprehensive and open suite of video foundation models that pushes the boundaries of video generation. Wan2.2 is a major upgrade to Wan2.1 and includes the following features:
+
+- **Effective MoE Architecture**: Wan2.2 introduces a Mixture-of-Experts (MoE) architecture into video diffusion models. By splitting the denoising process across timesteps among specialized expert models, it enlarges the overall model capacity while keeping the per-step computational cost the same, because only one expert is active at each step.
+
+- **Cinematic-level Aesthetics**: Wan2.2 incorporates meticulously curated aesthetic data, complete with detailed labels for lighting, composition, contrast, color tone, and more. This allows for more precise and controllable cinematic style generation, facilitating the creation of videos with customizable aesthetic preferences.
+
+- **Complex Motion Generation**: Compared to Wan2.1, Wan2.2 is trained on significantly more data, with 65.6% more images and 83.2% more videos. This expansion notably improves the model's generalization across motion, semantics, and aesthetics; the authors report top performance among both open-source and closed-source models.
+
+- **Efficient High-Definition Hybrid TI2V**: Wan2.2 open-sources a 5B model built with the Wan2.2-VAE, which achieves a compression ratio of 16×16×4. This model supports both text-to-video and image-to-video generation at 720P resolution and 24 fps, and can run on consumer-grade graphics cards such as the NVIDIA RTX 4090. According to the authors, it is one of the fastest 720P@24fps models currently available and is suited to both industrial and academic use.
+
+You can find more details about the model in the [model card](https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B-Diffusers) and the [original repository](https://github.com/Wan-Video/Wan2.2).
+
+<img width="962" height="1118" alt="image" src="https://github.com/user-attachments/assets/8bc4a9ca-9036-4efb-8738-4417db9f3164" />
+
+## Notebook contents
+This tutorial consists of the following steps:
+- Prerequisites
+- Convert and Optimize model
+- Run inference pipeline
+- Interactive inference
+
+In this tutorial, we show how to convert, optimize, and run the Wan2.2 model for Text-Image to Video generation using OpenVINO.
+
+## Installation instructions
+This is a self-contained example that relies solely on its own code.<br>
+We recommend running the notebook in a virtual environment. You only need a Jupyter server to start.
+For details, please refer to the [Installation Guide](../../README.md).
+
+<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=5b5a4db0-7875-4bfb-bdbd-01698b5b1a77&file=notebooks/wan2.2-text-image-to-video/README.md" />
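The timestep-based MoE routing described in the README can be illustrated with a minimal sketch. It assumes a two-expert design in which every timestep at or above a fixed boundary goes to a "high-noise" expert and every later timestep goes to a "low-noise" expert. The names `pick_expert` and `denoise`, the boundary value, and the toy update rule are hypothetical and do not reproduce the Wan2.2 or OpenVINO code; latents are plain floats so the example runs without an ML framework.

```python
from typing import Callable, List

# A denoiser maps (latent, timestep) to a noise prediction.
# Latents are plain floats here so the sketch has no framework dependency.
Denoiser = Callable[[float, int], float]


def pick_expert(t: int, boundary: int, high_noise: Denoiser, low_noise: Denoiser) -> Denoiser:
    """Route early (noisier) timesteps to one expert and later timesteps to the other.

    The fixed boundary is an assumption made for illustration.
    """
    return high_noise if t >= boundary else low_noise


def denoise(latent: float, timesteps: List[int], boundary: int,
            high_noise: Denoiser, low_noise: Denoiser) -> float:
    """Run a toy denoising loop in which exactly one expert is evaluated per step."""
    for t in timesteps:
        expert = pick_expert(t, boundary, high_noise, low_noise)
        # Toy update rule (hypothetical); a real pipeline delegates this to a scheduler.
        latent = latent - 0.1 * expert(latent, t)
    return latent
```

Because only one expert runs at each step, the cost of a step matches that of a single expert, while the total parameter count spans both experts.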
diff --git a/notebooks/wan2.2-text-image-to-video/gradio_helper.py b/notebooks/wan2.2-text-image-to-video/gradio_helper.py
@@ -0,0 +1,71 @@
+import gradio as gr
+import torch
+from diffusers.utils import export_to_video, load_image
+import numpy as np
+import requests
+from PIL import Image
+from io import BytesIO
+
+MAX_SEED = np.iinfo(np.int32).max
+
+# Download the example input image from the Wan2.2 repository (raw GitHub content URL)
+raw_url = "https://raw.githubusercontent.com/Wan-Video/Wan2.2/main/examples/i2v_input.JPG"
+response = requests.get(raw_url, timeout=30)
+img = Image.open(BytesIO(response.content))
+img.save("i2v_input.jpg")
+
+
+def make_demo(pipeline):
+    def generate_video(prompt, negative_prompt, image, guidance_scale=1.0, seed=42, progress=gr.Progress(track_tqdm=True)):
+        image = load_image(image)
+        output = pipeline(
+            image=image,
+            prompt=prompt,
+            negative_prompt=negative_prompt,
+            height=832,
+            width=480,
+            num_frames=20,
+            guidance_scale=guidance_scale,
+            num_inference_steps=4,
+            generator=torch.Generator().manual_seed(int(seed)),  # slider values may arrive as floats; manual_seed needs an int
+        ).frames[0]
+
+        video_path = "output.mp4"
+        export_to_video(output, video_path, fps=10)
+        return video_path
+
+    iface = gr.Interface(
+        fn=generate_video,
+        inputs=[
+            gr.Textbox(label="Prompt", placeholder="Enter your video prompt here"),
+            gr.Textbox(label="Negative Prompt", placeholder="Optional negative prompt", value=""),
+            gr.Image(label="Input Image", type="pil"),
+            gr.Slider(
+                label="Guidance scale",
+                minimum=0.0,
+                maximum=20.0,
+                step=0.1,
+                value=1.0,
+            ),
+            gr.Slider(
+                label="Seed",
+                minimum=0,
+                maximum=MAX_SEED,
+                step=1,
+                value=42,
+            ),
+        ],
+        outputs=gr.Video(label="Generated Video"),
+        title="Wan2.2-TI2V-5B OpenVINO Video Generator",
+        flagging_mode="never",
+        examples=[
+            [
+                "Summer beach vacation style, a white cat wearing sunglasses sits on a surfboard. The fluffy-furred feline gazes directly at the camera with a relaxed expression. Blurred beach scenery forms the background featuring crystal-clear waters, distant green hills, and a blue sky dotted with white clouds. The cat assumes a naturally relaxed posture, as if savoring the sea breeze and warm sunlight. A close-up shot highlights the feline's intricate details and the refreshing atmosphere of the seaside.",
+                "",
+                "i2v_input.jpg",
+                5.0,
+                42,
+            ],
+        ],
+    )
+    return iface
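`generate_video` requests 832×480 frames and `num_frames=20`. The sketch below relates such parameters to the 16×16×4 Wan2.2-VAE compression ratio stated in the README. It assumes a causal VAE layout in which the first frame is encoded on its own and each following group of 4 frames maps to one latent frame, so exactly representable frame counts have the form 4k + 1. `latent_shape` is a hypothetical helper, not part of the pipeline's API.

```python
from typing import Tuple

# Compression factors taken from the README: 16x16 spatial, 4x temporal.
SPATIAL = 16
TEMPORAL = 4


def latent_shape(num_frames: int, height: int, width: int,
                 spatial: int = SPATIAL, temporal: int = TEMPORAL) -> Tuple[int, int, int]:
    """Return (latent_frames, latent_height, latent_width) for a video request.

    Assumes a causal temporal layout: 1 latent frame for the first video frame,
    plus 1 latent frame for every further group of `temporal` frames.
    """
    if height % spatial or width % spatial:
        raise ValueError(f"height and width must be multiples of {spatial}, got {height}x{width}")
    if num_frames < 1 or (num_frames - 1) % temporal:
        raise ValueError(f"num_frames - 1 must be divisible by {temporal}, got {num_frames}")
    return ((num_frames - 1) // temporal + 1, height // spatial, width // spatial)


print(latent_shape(21, 832, 480))  # 6 latent frames on a 52x30 grid
```

Under this assumption, `num_frames=20` does not map exactly onto the latent grid (21 would); whether the pipeline passed to `make_demo` adjusts such a value is not visible in the helper itself.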