johncalvinroberts
Contributor

Hello! This PR fixes a small issue leading to audio/video sync drift when using fastConcatMP4 to concatenate mp4s.

The problem: if the input MP4 streams passed to fastConcatMP4 contain internal timestamp offsets that don't start at 0, A/V drift accumulates during concatenation, so the video no longer matches the audio. This affects both audio and video: the output of fastConcatMP4 has the wrong duration, shows A/V drift, and drops frames because of the incorrectly calculated durations.

This fixes it by normalizing sample timestamps within each stream to start from 0 before applying concatenation offsets, ensuring the correct duration is calculated. This also synchronizes audio timing to the video timeline to maintain correct a/v sync.
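
To illustrate the normalization step described above, here is a minimal sketch, not the actual diff in this PR. The `Sample` shape and the helper names `normalizeToZero` and `concatStreams` are hypothetical, and the sketch assumes every stream uses the same timescale and that shifting `cts` and `dts` by the stream's smallest `dts` is an acceptable way to rebase it. It does not cover aligning audio to the video timeline.

```typescript
// Hypothetical sample shape for illustration; not WebAV's real type.
interface Sample {
  cts: number; // composition timestamp, in timescale units
  dts: number; // decode timestamp, in timescale units
  duration: number; // sample duration, in timescale units
}

// Rebase one stream so its earliest decode timestamp becomes 0.
// Assumption: cts and dts are shifted by the same amount, which
// preserves each sample's composition offset (cts - dts).
function normalizeToZero(samples: Sample[]): Sample[] {
  if (samples.length === 0) return [];
  const base = samples.reduce((min, s) => Math.min(min, s.dts), Infinity);
  return samples.map((s) => ({ ...s, cts: s.cts - base, dts: s.dts - base }));
}

// Concatenate streams: normalize each one first, then apply the running
// offset, which advances by the summed sample durations of each stream.
function concatStreams(streams: Sample[][]): Sample[] {
  const out: Sample[] = [];
  let offset = 0;
  for (const stream of streams) {
    let streamDuration = 0;
    for (const s of normalizeToZero(stream)) {
      out.push({ ...s, cts: s.cts + offset, dts: s.dts + offset });
      streamDuration += s.duration;
    }
    offset += streamDuration;
  }
  return out;
}

// Two streams whose internal timestamps do not start at 0.
const first: Sample[] = [
  { cts: 2000, dts: 2000, duration: 1000 },
  { cts: 3000, dts: 3000, duration: 1000 },
];
const second: Sample[] = [
  { cts: 500, dts: 500, duration: 1000 },
  { cts: 1500, dts: 1500, duration: 1000 },
];
const joined = concatStreams([first, second]);
// dts values of the joined stream: 0, 1000, 2000, 3000
console.log(joined.map((s) => s.dts));
```

Without the call to `normalizeToZero`, the 2000-unit and 500-unit starting offsets would be carried into the output timeline on top of the concatenation offset.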

I don't have a repro case I can share right now but can provide if needed.

@hughfenghen
Collaborator

Thanks for the fix.
Please provide sample files that can reproduce the issue.

@johncalvinroberts
Contributor Author

johncalvinroberts commented Jul 29, 2025

Thanks for your response!! @hughfenghen

To demo a reproduction of the bug, please follow these steps.

  1. Firstly, clone this repro: https://github.com/johncalvinroberts/webav-concat-repro
  2. Run it locally and choose one of the examples (Bars example is the most reliable way to repro)
  3. Click "concatenate videos"
  4. To see the audio drift, click "Sync play both" to play the original video alongside the video concatenated with fastConcatMP4. You can also play the concatenated video alone to see the A/V drift (watch consonants like p or b).
  5. Download the two videos (concatenated.mp4 + original.mp4)
  6. There's a bash script in the repository with an ffprobe command to compare the two videos. Run it against the downloaded videos, like: sh ./check_lengths.sh ./original.mp4 ./concatenated.mp4
  7. Inspect the output to see the discrepancies between the downloaded concatenated video and the original, like so:
File                 | Duration (frames)         | Audio Duration  | A/V Diff
-------------------- | ------------------------- | --------------- | ------------
concatenated.mp4     |   56400.0ms (1410 frames) |   56872.0ms     |   +472.0ms ⚠️
original.mp4         |   56720.0ms (1418 frames) |   56746.0ms     |    +26.0ms ✅

So, as you can see, the file concatenated with WebAV is missing 8 frames (1410 vs. 1418): its video track is 320 ms shorter and its audio track is 126 ms longer than in the original video.
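
The figures in the table can be cross-checked with a few lines of arithmetic. This is only a sketch over the numbers reported above; the `ProbeRow` shape and the `avDiffMs` helper are hypothetical and are not part of the repro script.

```typescript
// Hypothetical row shape mirroring the columns of the table above.
interface ProbeRow {
  videoMs: number; // video duration in milliseconds
  frames: number; // number of video frames
  audioMs: number; // audio duration in milliseconds
}

const concatenated: ProbeRow = { videoMs: 56400, frames: 1410, audioMs: 56872 };
const original: ProbeRow = { videoMs: 56720, frames: 1418, audioMs: 56746 };

// A/V diff as shown in the last column: audio duration minus video duration.
const avDiffMs = (row: ProbeRow): number => row.audioMs - row.videoMs;

// Frame duration implied by the original file: 56720 ms / 1418 frames.
const frameMs = original.videoMs / original.frames;

const missingFrames = original.frames - concatenated.frames;
const videoShortfallMs = original.videoMs - concatenated.videoMs;
const audioExcessMs = concatenated.audioMs - original.audioMs;

console.log(avDiffMs(concatenated)); // 472
console.log(avDiffMs(original)); // 26
console.log(frameMs); // 40
console.log(missingFrames); // 8
console.log(videoShortfallMs); // 320
console.log(audioExcessMs); // 126
```

The 320 ms of missing video is exactly 8 frames at 40 ms per frame, so the shorter video duration and the dropped frames are the same discrepancy seen two ways.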

My fix addresses this by normalizing sample timestamps within each input stream to start from 0 before applying concatenation offsets. The root cause was that input MP4 streams contain internal timestamp offsets that don't start at 0, which were being preserved and accumulated during concatenation, causing progressive A/V drift. The normalization eliminates these internal offsets while maintaining proper synchronization between audio and video tracks.

@hughfenghen hughfenghen merged commit cdffcbd into WebAV-Tech:main Aug 1, 2025
2 checks passed
@github-actions github-actions bot mentioned this pull request Aug 1, 2025