A text-to-speech, speech-to-text and speech-to-speech library
Large Audio Language Model built for natural interactions
The Triton Inference Server provides an optimized cloud
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Free, high-quality text-to-speech API endpoint to replace OpenAI
A lightweight text-to-speech model with zero-shot voice cloning
Document Image Parsing via Heterogeneous Anchor Prompting
Oobabooga - The definitive Web UI for local AI, with powerful features
Capable of understanding text, audio, vision, video
The official Python SDK for the ElevenLabs API
WhatsApp MCP server enabling AI access to chats and messaging
StreamSpeech is a seamless model for offline speech recognition
Towards Human-Sounding Speech
Qwen3-Omni is a natively end-to-end, omni-modal LLM
One-click deployment (including offline integration package)
Tokenizer-Free TTS for Multilingual Speech Generation
Execute SQL queries and manage databases seamlessly with Timeplus
Converts text to speech in real time
Data manipulation and transformation for audio signal processing
A nearly-live implementation of OpenAI's Whisper
Provides convenient access to the Anthropic REST API from any Python 3
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Open source AI wearable platform for recording and summarizing speech
Controllable & emotion-expressive zero-shot TTS