This repository provides diarization labels for the AMI and AliMeeting datasets, obtained via forced alignment, which were used in our ASRU 2025 paper "Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?". The labels were created with the Montreal Forced Aligner (MFA) using pretrained models.
Note that the provided labels are not perfect due to alignment errors, transcription ambiguity, out-of-vocabulary words, etc. Nevertheless, labels obtained via forced alignment are commonly regarded as ground truth in VAD research (e.g., [Kraljevski+, Interspeech 2025] and [Tan+, Computer Speech & Language 2020]).
## AMI

- The original dataset can be downloaded from the official website.
- The train/val/eval split follows the full-corpus ASR partition.
- We used the `only_words` setup, which keeps only the intervals annotated as words and discards vocal sounds. Please refer to BUTSpeechFIT/AMI-diarization-setup for more details.
- For forced alignment, we first used English (US) MFA G2P model v3.0.0 to generate pronunciations for OOV words, and then used English MFA acoustic model v3.1.0 and English (US) MFA dictionary v3.1.0 for alignment.
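As a rough illustration, the two-step procedure above (G2P for OOV words, then alignment with a pretrained acoustic model and dictionary) can be sketched with the MFA command-line interface. This is a hypothetical sketch, not the exact commands used for this repository: the corpus and output paths are placeholders, and the same pattern applies to AliMeeting with the corresponding Mandarin models.

```shell
# Download the pretrained models named in this README
# (pin the versions listed above when reproducing).
mfa model download g2p english_us_mfa
mfa model download acoustic english_mfa
mfa model download dictionary english_us_mfa

# Step 1: generate pronunciations for OOV words with the G2P model.
# "corpus/ami" and "oov_pronunciations.txt" are placeholder paths.
mfa g2p corpus/ami english_us_mfa oov_pronunciations.txt

# Step 2: align the corpus using the dictionary (extended with the
# G2P output) and the pretrained acoustic model.
mfa align corpus/ami english_us_mfa english_mfa alignments/ami
```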
## AliMeeting

- The original dataset can be downloaded from OpenSLR.
- For forced alignment, we first used Mandarin (China) MFA G2P model v3.0.0 to generate pronunciations for OOV words, and then used Mandarin MFA acoustic model v3.0.0 and Mandarin (China) MFA dictionary v3.0.0 for alignment.
## Citation

If you use our annotations, please cite our paper below.
```bibtex
@inproceedings{horiguchi_asru2025,
  author    = {Horiguchi, Shota and Tawara, Naohiro and Ashihara, Takanori and Ando, Atsushi and Delcroix, Marc},
  title     = {Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?},
  booktitle = {IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year      = {2025},
  month     = {Dec},
}
```