Large Audio Language Model built for natural interactions
A text-to-speech, speech-to-text and speech-to-speech library
The Triton Inference Server provides an optimized cloud
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Free, high-quality text-to-speech API endpoint to replace OpenAI
Convert files and web content into clean, usable Markdown easily
A lightweight text-to-speech model with zero-shot voice cloning
The Python library for real-time communication
Document Image Parsing via Heterogeneous Anchor Prompting
Oobabooga - The definitive Web UI for local AI, with powerful features
A React-based starter app for using the Live API over WebSockets
Access to Anthropic's safety-first language model APIs
WhatsApp MCP server enabling AI access to chats and messaging
Capable of understanding text, audio, vision, video
The official Python SDK for the ElevenLabs API
StreamSpeech is a seamless model for offline speech recognition
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Towards Human-Sounding Speech
Tokenizer-Free TTS for Multilingual Speech Generation
An HTML5 video player with a parser that saves traffic
A nearly-live implementation of OpenAI's Whisper
Open source text-to-speech tool, supports extra-long text
Converts text to speech in real time
One-click deployment (including offline integration package)
AI-powered MCP server for desktop file and terminal automation