A speech-text foundation model for real-time dialogue
48 kHz stereo neural audio codec for general audio
Python Audio Analysis Library: Feature Extraction, Classification
A lightweight audio-to-MIDI converter with pitch bend detection
Music player and music library manager for Linux, Windows, and macOS
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Open-source multi-speaker long-form text-to-speech model
Dumb downloader that scrapes the web
Multilingual speech recognition and audio understanding model
Multimodal Diffusion with Representation Alignment
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Generate audiobooks from e-books, voice cloning & 1107+ languages
Generate audiobooks from EPUBs, PDFs and text with captions
Swing Music is a beautiful, self-hosted music player
A nearly-live implementation of OpenAI's Whisper
Musician-oriented Linux distro
Download videos from almost any website
AudioMuse-AI is an Open Source Dockerized environment
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Synchronized Translation for Videos
Cross-platform GUI tool for downloading videos from Bilibili sites
Implementation of the AudioLM audio generation model in PyTorch
Clone a voice in 5 seconds to generate arbitrary speech in real time
Comprehensive Gradio WebUI for audio processing
Download videos from websites like YouTube and many others