feat(multimodal): add Video support for Gemini with structured outputs #1851
base: main
- Add Video and VideoWithGenaiFile classes to multimodal.py
- Support video file upload and processing with Gemini API
- Export Video classes from instructor package
- Update multimodal-gemini blog post with current API
- Add complete video extraction example with Gemini 2.5 Pro
- Update autodetect_media to handle video files
- Support multiple video formats (MP4, MOV, WebM, etc.)

Resolves #1850

Co-Authored-By: Jason Liu <jason@jxnl.co>
Caution
Changes requested ❌
Reviewed everything up to 6b0236d in 2 minutes and 18 seconds.
- Reviewed 722 lines of code in 5 files
- Skipped 0 files when reviewing
- Skipped posting 8 draft comments, listed below
1. instructor/processing/multimodal.py:540
- Draft comment:
Avoid using 'assert' for MIME type validation; use explicit exception raising so that error checks aren’t skipped in optimized mode.
- Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
2. instructor/processing/multimodal.py:557
- Draft comment:
Avoid using 'assert' for MIME type checks in from_path(); prefer raising a ValueError for clearer error handling.
- Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
3. instructor/processing/multimodal.py:611
- Draft comment:
Ensure 'self.data' is not None before performing base64 decoding in to_genai() to avoid runtime errors.
- Reason this comment was not posted:
Confidence changes required: 80% <= threshold 85%
4. instructor/__init__.py:4
- Draft comment:
Typographical Notice: The new import includes VideoWithGenaiFile. Considering the commit title mentions Gemini, please verify if the class name should be VideoWithGeminiFile for consistency.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85% The comment makes an assumption based on a commit title without full context. Looking at the code, there's a clear distinction between Gemini and Genai providers. VideoWithGenaiFile likely refers to Google's Genai specifically, not Gemini. The name appears intentional and correct. Additionally, suggesting renames based on commit messages rather than code context violates our review rules. I could be wrong about the distinction between Genai and Gemini - they might be the same thing with different naming conventions. The commit message might have important context I'm missing. Even if Genai and Gemini are related, the codebase clearly uses both terms distinctly in different places. We should trust the author's naming choice without more context. Delete this comment. The name appears intentional and correct within the codebase's naming patterns, and suggesting renames based on commit messages rather than code violates our review rules.
5. instructor/__init__.py:42
- Draft comment:
Typo check: The identifier "VideoWithGenaiFile" might be intended to be "VideoWithGenAIFile" for consistent capitalization. Please verify if this naming is correct.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85% While the comment points out a potential naming inconsistency, without seeing the actual multimodal.py file where this class is defined, we can't be certain if this is actually inconsistent with the codebase's conventions. The comment is also just asking for verification rather than definitively stating a problem. Many codebases use "ai" lowercase in their naming. I might be too quick to dismiss this - inconsistent capitalization across a codebase can lead to confusion. Also, "AI" is very commonly capitalized in most codebases. However, one of our key rules is to not make speculative comments or ask for verification. This comment starts with "Please verify" which is explicitly against our rules. The comment should be deleted because it's asking for verification rather than definitively pointing out an issue, and we don't have enough context to be certain about the codebase's naming conventions.
6. instructor/processing/multimodal.py:47
- Draft comment:
Typo detection: The MIME type 'video/mov' may be incorrect. Typically, the official MIME type for QuickTime movies is 'video/quicktime'. Consider removing or correcting 'video/mov' to avoid duplication with 'video/quicktime'.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85% While 'video/quicktime' is indeed the official MIME type for QuickTime movies, 'video/mov' is also commonly used in practice. Having both provides better compatibility. The comment is technically correct about the official standard, but removing 'video/mov' could break compatibility with systems that use this unofficial but common MIME type. The comment focuses on technical correctness but ignores practical compatibility concerns. Also, having both MIME types isn't necessarily harmful. While the comment is technically accurate about the official MIME type, keeping both variants improves real-world compatibility without causing issues. The comment should be deleted because removing 'video/mov' could reduce compatibility, and having both MIME types doesn't cause problems.
7. instructor/processing/multimodal.py:508
- Draft comment:
Typo in the autodetect_safely docstring: the parameter type is written as 'Union[str,path]'. Consider adding a space after the comma, e.g., 'Union[str, Path]'.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 85% While the comment is technically correct about the formatting, it's a very minor issue that doesn't affect functionality. The review rules emphasize focusing on substantive issues and not making purely informative comments. Additionally, this is part of new code being added in the diff, but it's such a minor formatting issue that it doesn't warrant a comment. The comment is accurate and points to a real inconsistency in formatting. Consistent formatting can make code more readable and maintainable. While code consistency is valuable, this is an extremely minor formatting issue that doesn't impact code understanding or functionality. The review rules specifically say not to make comments that are obvious or unimportant. The comment should be deleted as it addresses a trivial formatting issue that doesn't meaningfully impact code quality or understanding.
8. instructor/processing/multimodal.py:622
- Draft comment:
Inconsistent terminology: The docstring for from_new_genai_file mentions 'uploading to Gemini API' while the integration and other parts refer to 'GenAI'. Consider standardizing the terminology for clarity.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 85% While consistency in terminology is generally good, this seems like a minor issue. The terms Gemini and GenAI both refer to Google's AI offerings and are commonly used interchangeably. The code still functions correctly regardless of which term is used. The comment doesn't point out a bug or suggest an important improvement. The comment does identify a real inconsistency in terminology. Having consistent terminology could make the code more maintainable and easier to understand. However, both terms are widely recognized in the context of Google's AI services. This is a very minor stylistic issue that doesn't impact functionality or clarity significantly. The comment should be deleted as it points out a minor stylistic issue that doesn't meaningfully impact code quality or clarity.
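Draft comments 1 to 3 above ask for explicit exceptions instead of 'assert' and for a None check before base64 decoding. A minimal sketch of that pattern follows, assuming hypothetical helper names (validate_video_mime_type, decode_video_data) and an illustrative MIME set; neither is taken from the PR's multimodal.py, whose actual code is not shown here.

```python
import base64
from typing import Optional

# Hypothetical subset for illustration only; the PR's real
# VALID_VIDEO_MIME_TYPES may contain different entries.
VALID_VIDEO_MIME_TYPES = {"video/mp4", "video/quicktime", "video/webm"}


def validate_video_mime_type(media_type: str) -> str:
    """Raise ValueError rather than assert, so the check still runs under `python -O`."""
    if media_type not in VALID_VIDEO_MIME_TYPES:
        raise ValueError(
            f"Unsupported video MIME type: {media_type!r}. "
            f"Expected one of {sorted(VALID_VIDEO_MIME_TYPES)}."
        )
    return media_type


def decode_video_data(data: Optional[str]) -> bytes:
    """Decode base64 video data, failing with a clear message when data is missing."""
    if data is None:
        raise ValueError("Video has no inline data to decode.")
    return base64.b64decode(data)
```

Assertions are stripped when Python runs with -O, which is why the sketch raises ValueError for input validation and reserves 'assert' for internal invariants.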
Workflow ID: wflow_tPoxs9xWjADvEK7C
        VideoRecommendations object containing structured data
    """
    client = instructor.from_provider(
        "google/gemini-2.0-flash-exp",
Model version mismatch: the title mentions Gemini 2.5 Pro but the provider string is 'google/gemini-2.0-flash-exp'. Ensure consistency.
Suggested change:
-        "google/gemini-2.0-flash-exp",
+        "google/gemini-2.5-pro",
Deploying with
| Status | Name | Latest Commit | Updated (UTC) |
|---|---|---|---|
| ✅ Deployment successful! View logs | instructor | 6b0236d | Oct 16 2025, 12:54 PM |
feat(multimodal): add Video support for Gemini with structured outputs
Summary
This PR adds full video support to Instructor for use with Google's Gemini models, addressing issue #1850. The implementation includes:
- Video and VideoWithGenaiFile classes to handle video file upload and processing
- Updated multimodal-gemini blog post (uses from_provider instead of deprecated from_gemini)
- Video extraction example in examples/video-extraction-gemini/
The Video class follows the same pattern as existing multimodal classes (Image, Audio, PDF) and supports:
- VideoWithGenaiFile.from_new_genai_file()
Review & Testing Checklist for Human
- Run examples/video-extraction-gemini/run.py with a real video file to verify the upload flow works
- Verify that gemini-2.0-flash-exp (or appropriate model) actually supports video inputs - the blog post title mentions "2.5 Pro" but code uses "2.0-flash-exp"
- Verify that VideoWithGenaiFile.from_new_genai_file() correctly uploads videos to Gemini API (modeled after PDFWithGenaiFile but untested)
- Verify that all VALID_VIDEO_MIME_TYPES are actually supported by Gemini
Recommended Test Plan
- Set the GOOGLE_API_KEY environment variable
- Run python examples/video-extraction-gemini/run.py sample_video.mp4
Notes
Session: https://app.devin.ai/sessions/68339918c75c44ac8f6e3813937a0517
Requested by: Jason Liu (@jxnl)
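The checklist above asks reviewers to confirm which video MIME types are supported before running the example against a real file. One way to check a file locally first is an extension lookup, sketched below. The mapping and the function name guess_video_mime_type are hypothetical and only illustrate the idea; they are not the PR's autodetect_media logic or its VALID_VIDEO_MIME_TYPES list.

```python
from pathlib import Path

# Hypothetical extension table for illustration; the formats named in the PR
# are MP4, MOV and WebM, but the real list in multimodal.py may be longer.
EXTENSION_TO_MIME = {
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".webm": "video/webm",
}


def guess_video_mime_type(path: str) -> str:
    """Map a file extension to a video MIME type, or raise ValueError if unknown."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_TO_MIME[suffix]
    except KeyError:
        raise ValueError(f"Unrecognized video extension: {suffix!r}") from None


if __name__ == "__main__":
    print(guess_video_mime_type("sample_video.mp4"))
```

A fixed table is used here instead of the standard mimetypes module because the latter also consults system files, so its answers can vary between machines.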
Important
Adds video support for Gemini models with new Video classes, integrating into the multimodal pipeline and updating documentation and examples.
- Adds Video and VideoWithGenaiFile classes in multimodal.py for video file handling and processing.
- Updates multimodal-gemini.md to use from_provider instead of from_gemini.
- Adds an example in examples/video-extraction-gemini/.
- Adds Video and VideoWithGenaiFile to __init__.py exports.
This description was created by Ellipsis for 6b0236d. It will automatically update as commits are pushed.