This is NOT an official implementation of DyCoke. For the official implementation, please refer to this repo.
Compared to the official implementation, this repo integrates DyCoke with more recent VLMs such as Gemma3 and InternVL3.
To use this repo, install the following two key packages in a venv with Python >= 3.10:
torch==2.5.1
transformers==4.53.0
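
To confirm that the active venv matches these pins, a small check along the following lines may help. This is only a sketch and not part of the repo: `REQUIRED` and `find_mismatches` are hypothetical names, and the comparison assumes that a local build tag such as `+cu121` on the installed torch version should be ignored.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the requirements listed above.
REQUIRED = {"torch": "2.5.1", "transformers": "4.53.0"}


def find_mismatches(required, get_version=version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `get_version` defaults to importlib.metadata.version and is injectable
    so the function can be exercised without the packages installed.
    """
    mismatches = {}
    for name, expected in required.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            found = None  # the package is not installed at all
        # Assumption: ignore local build tags such as "+cu121" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(REQUIRED)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found}")
    if not problems:
        print("All pinned packages match.")
```

Running the file inside the venv prints one line per mismatched or missing package.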
For a quick demo of DyCoke with Gemma3, run:
python test_gemma3_with_dycoke.py --video resources/example_video.mp4 --prompt "Explain the video." --use_dycoke
DyCoke can be turned off by removing the --use_dycoke argument. Please note that, to avoid OOM errors, I have configured utils/video_reader.py to select every 12th frame.
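
The idea of keeping every 12th frame can be sketched as below. This is an illustration only, not the code in utils/video_reader.py: `select_every_nth` is a hypothetical helper, and starting at frame 0 is an assumption.

```python
def select_every_nth(frames, step=12):
    """Keep frames 0, step, 2*step, ... from an indexable sequence of frames."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return frames[::step]


if __name__ == "__main__":
    frame_indices = list(range(100))  # stand-in for 100 decoded frames
    kept = select_every_nth(frame_indices)
    print(kept)  # [0, 12, 24, 36, 48, 60, 72, 84, 96]
```

A larger step lowers memory use further at the cost of temporal detail; a smaller step does the opposite.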
With DyCoke enabled, the Gemma3-4b-it model runs at ~37 tokens/sec, whereas vanilla Gemma3-4b-it runs at ~12 tokens/sec.
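
For reference, throughput figures like these can be computed as generated tokens divided by wall-clock time. The sketch below shows one way to do that; how the numbers above were actually measured is not stated here, and `tokens_per_second` and `measure` are hypothetical helpers.

```python
import time


def tokens_per_second(num_new_tokens, elapsed_seconds):
    """Throughput as newly generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_new_tokens / elapsed_seconds


def measure(generate_fn):
    """Time generate_fn(), which must return the number of new tokens produced."""
    start = time.perf_counter()
    num_new_tokens = generate_fn()
    return tokens_per_second(num_new_tokens, time.perf_counter() - start)


if __name__ == "__main__":
    # Ratio of the two throughput figures reported above.
    speedup = 37 / 12
    print(f"{speedup:.2f}x")  # 3.08x
```

Taken at face value, the two reported figures correspond to roughly a 3x speedup.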
| With DyCoke | Without DyCoke |
| --- | --- |
| ![]() | ![]() |