A sound cloning tool with a web interface, using your voice
Management of Yandex Station and other smart home devices
NLP Cloud serves high-performance pre-trained or custom models for NER
Interface for OuteTTS models
Framework for building neural networks
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Open-source industrial-grade ASR models
Repo of Qwen2-Audio chat & pretrained large audio language model
Instant voice cloning by MIT and MyShell. Audio foundation model
Generate audiobooks from e-books
A GPT-4o-level MLLM for Vision, Speech and Multimodal Live Streaming
An opinionated CLI to transcribe audio files with Whisper on-device
SoTA open-source TTS
High-quality multi-lingual text-to-speech library by MyShell.ai
Reading book source
LLM-based reinforcement learning audio editing model
Automatically translates the text of a video based on a subtitle file
Bailing is a voice dialogue robot similar to GPT-4o
Scalable generative AI framework built for researchers and developers
Chat with it via text and voice
Official PyTorch Implementation
Han Language Processing
Open source AI VTuber platform with voice chat and Live2D avatars
Conversational voice AI agents
Towards Human-Level Text-to-Speech through Style Diffusion