AudioMuse-AI is an Open Source Dockerized environment
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Generate audiobooks from EPUBs, PDFs and text with captions
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Generate audiobooks from e-books, voice cloning & 1107+ languages
Synchronized Translation for Videos
A nearly-live implementation of OpenAI's Whisper
Implementation of the AudioLM audio generation model in PyTorch
Fast multimodal LLM for real-time voice interaction and AI apps
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Capable of understanding text, audio, vision, video
Open speech-to-speech models and pipelines by Hugging Face
SOTA discrete acoustic codec models with 40/75 tokens per second
Framework for building real-time voice and multimodal AI agents
Instant voice cloning by MIT and MyShell. Audio foundation model
Oobabooga - The definitive Web UI for local AI, with powerful features
SOTA Open Source TTS
Free, high-quality text-to-speech API endpoint to replace OpenAI
Data manipulation and transformation for audio signal processing
Sample code and notebooks for Generative AI on Google Cloud
Automatic Speech Recognition with Word-level Timestamps
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
AI video generator optimized for low-VRAM and older GPUs
Interface for OuteTTS models
An Open Source implementation of Notebook LM with more flexibility