docs/blog/posts/multimodal-gemini.md (+82, -13)
@@ -22,14 +22,20 @@ In this post, we'll explore how to use Google's Gemini model with Instructor to

## Setting Up the Environment

First, install the required dependencies:

```bash
pip install "instructor[google-genai]" pydantic
```

<!-- more -->

Make sure you have your Google API key set as an environment variable:

```bash
export GOOGLE_API_KEY=your_api_key_here
```
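
If you want to fail fast with a clear message when the key is missing, a small helper like the following works (a sketch; the helper name is ours, not part of instructor):

```python
import os


def require_api_key(var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from the environment, raising early if it is unset."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before running the examples")
    return key
```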

## Defining Our Data Models

We'll use Pydantic to define our data models for tourist destinations and recommendations:
@@ -49,32 +55,37 @@ class Recommendations(BaseModel):
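
The hunk above elides most of the model code; for reference, here is a minimal sketch of the Pydantic models this post works with (matching the names used in the complete example later in the post):

```python
from pydantic import BaseModel


class TouristDestination(BaseModel):
    name: str
    description: str
    location: str


class Recommendations(BaseModel):
    # Free-text reasoning field the model fills in before committing to answers
    chain_of_thought: str
    description: str
    destinations: list[TouristDestination]
```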

## Initializing the Gemini Client

Next, set up the Gemini client using Instructor's unified provider interface:

```python
import instructor

client = instructor.from_provider(
    "google/gemini-2.0-flash-exp",
    async_client=False,
)
```

## Uploading and Processing the Video

To analyze a video, first upload it using `VideoWithGenaiFile`:

```python
# Upload the video and wait for processing
video = instructor.VideoWithGenaiFile.from_new_genai_file("./takayama.mp4")
```

Then, process the video and extract recommendations:

```python
resp = client.messages.create(
    messages=[
        {
            "role": "user",
            "content": [
                "What places do they recommend in this video?",
                video,
            ],
        }
    ],
    response_model=Recommendations,
)
```
@@ -217,9 +228,67 @@ To address these limitations and expand the capabilities of our video analysis system

By addressing these challenges and exploring these new directions, we can create a more comprehensive and nuanced video analysis system, opening up even more possibilities for applications in travel, education, and beyond.

## Complete Working Example

Here's a complete, runnable example that you can use with your own videos:

```python
import instructor
from pydantic import BaseModel


class TouristDestination(BaseModel):
name: str
description: str
location: str


class Recommendations(BaseModel):
chain_of_thought: str
description: str
destinations: list[TouristDestination]


def analyze_video(video_path: str):
# Initialize the client
client = instructor.from_provider(
"google/gemini-2.0-flash-exp",
async_client=False,
)

# Upload the video
video = instructor.VideoWithGenaiFile.from_new_genai_file(video_path)

# Extract structured data
recommendations = client.messages.create(
messages=[
{
"role": "user",
"content": [
"What tourist destinations do they recommend in this video?",
video,
],
}
],
response_model=Recommendations,
)

return recommendations


# Usage
results = analyze_video("./travel_video.mp4")
for dest in results.destinations:
print(f"{dest.name} - {dest.location}")
print(f" {dest.description}\n")
```

For a more detailed example with proper error handling and formatting, check out our [video extraction example](https://github.com/instructor-ai/instructor/tree/main/examples/video-extraction-gemini).

## Related Documentation
- [Multimodal Concepts](../../concepts/multimodal.md) - Working with images, video, and audio
- [Google Integration](../../integrations/google.md) - Complete Gemini setup guide
- [Video Extraction Example](https://github.com/instructor-ai/instructor/tree/main/examples/video-extraction-gemini) - Complete working example with video

## See Also
- [OpenAI Multimodal](openai-multimodal.md) - Compare multimodal approaches
examples/video-extraction-gemini/README.md (+85, -0)
@@ -0,0 +1,85 @@
# Video Analysis with Gemini 2.5 Pro

This example demonstrates how to use Google's Gemini 2.5 Pro model with Instructor to analyze videos and extract structured information about tourist destinations.

## Features

- Upload videos to Gemini API using `VideoWithGenaiFile`
- Extract structured recommendations using Pydantic models
- Support for analyzing travel content and tourist destinations
- Type-safe structured outputs

## Requirements

```bash
pip install instructor google-genai pydantic
```

## Setup

1. Get your Google API key from [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Set the environment variable:

```bash
export GOOGLE_API_KEY=your_api_key_here
```

## Usage

Run the script with a path to your video file:

```bash
python run.py path/to/your/video.mp4
```

Example with a travel video:

```bash
python run.py takayama_travel.mp4
```

## How It Works

The example:

1. Uploads your video file to the Gemini API using `VideoWithGenaiFile.from_new_genai_file()`
2. Sends a prompt asking for tourist destination recommendations
3. Uses Instructor to parse the response into structured Pydantic models
4. Returns a list of destinations with names, descriptions, and locations
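
Step 3 is worth unpacking: Instructor validates the model's raw output against your Pydantic schema. That validation step can be illustrated without calling the API at all (the JSON payload below is invented for illustration):

```python
import json

from pydantic import BaseModel


class TouristDestination(BaseModel):
    name: str
    description: str
    location: str


# Hypothetical raw JSON, standing in for what the model returns
raw = '{"name": "Takayama Old Town", "description": "Preserved Edo-era streets", "location": "Gifu, Japan"}'

# Validation: malformed or incomplete payloads raise a ValidationError here
dest = TouristDestination.model_validate(json.loads(raw))
print(dest.location)  # Gifu, Japan
```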

## Output Structure

The analysis returns:

- **chain_of_thought**: Detailed reasoning about the video content
- **description**: Overall summary of the video
- **destinations**: List of tourist destinations, each with:
- name: Name of the destination
- description: What makes it interesting
- location: Where it's located

## Supported Video Formats

Gemini supports the following video formats:
- MP4
- MPEG
- MOV
- AVI
- FLV
- MPG
- WebM
- WMV
- 3GPP
- QuickTime

## Notes

- Video files are uploaded to Google's servers for processing
- Large videos may take longer to upload and process
- The API automatically waits for the upload to complete before processing

## Related Examples

- [Multimodal Gemini Guide](../../docs/blog/posts/multimodal-gemini.md)
- [Image Analysis with Gemini](../vision/)
- [PDF Processing with Gemini](../../docs/blog/posts/chat-with-your-pdf-with-gemini.md)
examples/video-extraction-gemini/run.py (+110, -0)
@@ -0,0 +1,110 @@
"""
Video Analysis with Gemini 2.5 Pro

This example demonstrates how to use Gemini 2.5 Pro with Instructor to analyze videos
and extract structured information. We'll process a video and extract tourist destinations
mentioned in it.

Requirements:
pip install instructor google-genai pydantic

Usage:
export GOOGLE_API_KEY=your_api_key_here

python run.py path/to/your/video.mp4
"""

import instructor
from pydantic import BaseModel
import sys


class TouristDestination(BaseModel):
"""Represents a tourist destination mentioned in the video."""

name: str
description: str
location: str


class VideoRecommendations(BaseModel):
"""Structured output containing recommendations from the video."""

chain_of_thought: str
description: str
destinations: list[TouristDestination]


def analyze_video(video_path: str):
"""
Analyze a video and extract tourist destination recommendations.

Args:
video_path: Path to the video file to analyze

Returns:
VideoRecommendations object containing structured data
"""
client = instructor.from_provider(
"google/gemini-2.5-pro",
async_client=False,
)

print(f"Uploading video: {video_path}")
video = instructor.VideoWithGenaiFile.from_new_genai_file(video_path)
print(f"Video uploaded successfully: {video.source}")

print("Analyzing video content...")
recommendations = client.messages.create(
messages=[
{
"role": "user",
"content": [
"What tourist destinations and places do they recommend in this video? "
"Provide a detailed analysis including the name, description, and location of each place.",
video,
],
}
],
response_model=VideoRecommendations,
)

return recommendations


def main():
"""Main function to run the video analysis."""
if len(sys.argv) < 2:
print("Usage: python run.py <path_to_video>")
print("Example: python run.py travel_video.mp4")
sys.exit(1)

video_path = sys.argv[1]

try:
results = analyze_video(video_path)

print("\n" + "=" * 80)
print("VIDEO ANALYSIS RESULTS")
print("=" * 80)

print(f"\nOverview: {results.description}")
print(f"\nAnalysis: {results.chain_of_thought}")

print(f"\nDestinations Found: {len(results.destinations)}")
print("-" * 80)

for i, dest in enumerate(results.destinations, 1):
print(f"\n{i}. {dest.name}")
print(f" Location: {dest.location}")
print(f" Description: {dest.description}")

print("\n" + "=" * 80)

except Exception as e:
print(f"Error analyzing video: {e}")
sys.exit(1)


if __name__ == "__main__":
main()
instructor/__init__.py (+3, -1)
@@ -1,7 +1,7 @@
import importlib.util

from .mode import Mode
from .processing.multimodal import Image, Audio
from .processing.multimodal import Image, Audio, Video, VideoWithGenaiFile

from .dsl import (
CitationMixin,
@@ -38,6 +38,8 @@
"Instructor",
"Image",
"Audio",
"Video",
"VideoWithGenaiFile",
"from_openai",
"from_litellm",
"from_provider",