Research on Multimodal LLMs and Speech AI
Pinned
NVIDIA-NeMo/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
NeMo_VoiceTextBlender
NAACL 2025 main conference: "VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning"
speech-model-compression
A collection of papers related to speech model compression