A text-to-speech, speech-to-text and speech-to-speech library
Large Audio Language Model built for natural interactions
Free, high-quality text-to-speech API endpoint to replace OpenAI
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Convert files and web content into clean, usable Markdown easily
A lightweight text-to-speech model with zero-shot voice cloning
The Python library for real-time communication
Oobabooga - The definitive Web UI for local AI, with powerful features
Document Image Parsing via Heterogeneous Anchor Prompting
Capable of understanding text, audio, vision, video
A react-based starter app for using the Live API over websockets
Access to Anthropic's safety-first language model APIs
WhatsApp MCP server enabling AI access to chats and messaging
The official Python SDK for the ElevenLabs API
Towards Human-Sounding Speech
Qwen3-Omni is a natively end-to-end, omni-modal LLM
StreamSpeech is a seamless model for offline speech recognition
Tokenizer-Free TTS for Multilingual Speech Generation
An HTML5 video player with a parser that saves traffic
A nearly-live implementation of OpenAI's Whisper
AI-powered MCP server for desktop file and terminal automation
Open source text-to-speech tool, supports extra-long text
One-click deployment (including offline integration package)
Converts text to speech in real time
Cross-platform, customizable ML solutions