+In December 2022, our lab and Wordsense (an affiliated company with Looloo Technology) released Thonburian Whisper as a part of Huggingface's Whisper fine-tuning event. When we first released the model, our goal was to address the challenges that vanilla Whisper models faced with Thai speech. Through careful combination of audio datasets, strategic augmentation, and improved segmentation, we achieved significant improvements in Word Error Rate (WER) across different model sizes. The distilled models proved particularly interesting, achieving strong performance with less than 1,500 hours of audio data. We recently publish our model at [ICNLSP 2024](https://aclanthology.org/2024.icnlsp-1.17/) where we have grown community using our model on [Github](https://github.com/biodatlab/thonburian-whisper).
0 commit comments