A text-to-speech, speech-to-text and speech-to-speech library
Large Audio Language Model built for natural interactions
The Triton Inference Server provides an optimized cloud
Swing Music is a beautiful, self-hosted music player
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Free, high-quality text-to-speech API endpoint to replace OpenAI
Self-hosted game stream host for Moonlight
Streaming Real-time Audio-Driven Avatar Generation
A speech-text foundation model for real-time dialogue
A lightweight text-to-speech model with zero-shot voice cloning
Automated Music Discovery and Collection Manager
Document Image Parsing via Heterogeneous Anchor Prompting
Oobabooga - The definitive Web UI for local AI, with powerful features
Capable of understanding text, audio, vision, video
The official Python SDK for the ElevenLabs API
WhatsApp MCP server enabling AI access to chats and messaging
StreamSpeech is a seamless model for offline speech recognition
Towards Human-Sounding Speech
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Tokenizer-Free TTS for Multilingual Speech Generation
One-click deployment (including offline integration package)
GenAI Processors is a lightweight Python library
Execute SQL queries and manage databases seamlessly with Timeplus
Converts text to speech in real time
A nearly-live implementation of OpenAI's Whisper