Open-source framework for intelligent speech interaction
A text-to-speech, speech-to-text and speech-to-speech library
Large Audio Language Model built for natural interactions
Multi-modal large language model designed for audio understanding
Controllable & emotion-expressive zero-shot TTS
Open speech-to-speech models and pipelines by Hugging Face
Transforming Multimodal Content into Captivating Multilingual Audio
Framework for building real-time voice and multimodal AI agents
Tokenizer-Free TTS for Multilingual Speech Generation
Translate a video from one language to another and embed the dubbed audio
Offline Text To Speech synthesis for Python
Capable of understanding text, audio, vision, video
Open Source Speech Language Model
A fast TTS architecture with conditional flow matching
Industrial-level controllable zero-shot text-to-speech system
A Systematic Framework for Interactive World Modeling
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Synchronized Translation for Videos
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Instant voice cloning by MIT and MyShell. Audio foundation model
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Interface for OuteTTS models
A high-quality rapid TTS voice cloning model
A TTS model capable of generating ultra-realistic dialogue
Towards Human-Sounding Speech