A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A PyTorch-based Speech Toolkit
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
SincNet is a neural architecture for efficiently processing raw audio samples.
This project uses a variety of advanced voiceprint recognition models such as EcapaTdnn, ResNetSE, ERes2Net, and CAM++, and more models may be supported in the future. It also supports the MelSpectrogram and Spectrogram data preprocessing methods.
In defence of metric learning for speaker recognition
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
An open-source implementation of a sequence-to-sequence based speech processing engine
🔈 Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
Unofficial reimplementation of ECAPA-TDNN for speaker recognition (EER=0.86 on Vox1_O when trained only on Vox2)
Angular penalty loss functions in PyTorch (ArcFace, SphereFace, Additive Margin, CosFace)
Speaker diarization by uis-rnn and speaker embedding by vgg-speaker-recognition
This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.
Aims to create a comprehensive voice toolkit for training, testing, and deploying speaker verification systems.
The SpeechBrain project aims to build a novel speech toolkit fully based on PyTorch. With SpeechBrain users can easily create speech processing systems, ranging from speech recognition (both HMM/DNN and end-to-end), speaker recognition, speech enhancement, speech separation, multi-microphone speech processing, and many others.
A desktop application that uses AI to translate voice between languages in real time, while preserving the speaker's tone and emotion.
Voiceprint recognition implemented with TensorFlow
Deep speaker embeddings in PyTorch, including x-vectors. Code used in this work: https://arxiv.org/abs/2007.16196