Large Audio Language Model built for natural interactions
A text-to-speech, speech-to-text and speech-to-speech library
The Triton Inference Server provides an optimized cloud
Convert files and web content into clean, usable Markdown easily
Easy-to-use Speech Toolkit including Self-Supervised Learning model
The Python library for real-time communication
Free, high-quality text-to-speech API endpoint to replace OpenAI
A lightweight text-to-speech model with zero-shot voice cloning
Oobabooga - The definitive Web UI for local AI, with powerful features
Document Image Parsing via Heterogeneous Anchor Prompting
The official Python SDK for the ElevenLabs API
A React-based starter app for using the Live API over websockets
WhatsApp MCP server enabling AI access to chats and messaging
Capable of understanding text, audio, vision, video
StreamSpeech is a seamless model for offline speech recognition
Access to Anthropic's safety-first language model APIs
An HTML5 video player with a parser that saves traffic
Towards Human-Sounding Speech
Open source text-to-speech tool, supports extra-long text
Execute SQL queries and manage databases seamlessly with Timeplus
Converts text to speech in real time
One-click deployment (including offline integration package)
Qwen3-Omni is a natively end-to-end, omni-modal LLM
This SDK is now deprecated; use the new unified Google GenAI SDK
A nearly-live implementation of OpenAI's Whisper