Synchronized Translation for Videos
Foundational model for human-like, expressive TTS
Framework for building real-time multimodal voice AI agent apps
Capable of understanding text, audio, vision, video
A lightweight text-to-speech model with zero-shot voice cloning
MARS5 speech model (TTS) from CAMB.AI
Framework for building real-time voice and multimodal AI agents
SOTA discrete acoustic codec models with 40/75 tokens per second
Voice Recognition to Text Tool
Qwen3-ASR is an open-source series of ASR models
Free, high-quality text-to-speech API endpoint to replace OpenAI
A nearly live implementation of OpenAI's Whisper
As little as 1 minute of voice data can be used to train a good TTS model
Real-time voice interactive digital human
Controllable & emotion-expressive zero-shot TTS
The official Python SDK for the ElevenLabs API
A speech-text foundation model for real-time dialogue
Generate audiobooks from e-books, voice cloning & 1107+ languages
Offline inference engine for art, real-time voice conversations
Official PyTorch Implementation
Converts text to speech in real time
Speakr is a personal, self-hosted web application
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Repo of Qwen2-Audio chat & pretrained large audio language model
Open source AI VTuber platform with voice chat and Live2D avatars