The Chrome OS Virtual Machine Monitor
Transform a cold separation into a warm Skill
Implementation of the AudioLM audio generation model in PyTorch
Fast multimodal LLM for real-time voice interaction and AI apps
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Capable of understanding text, audio, vision, and video
Open speech-to-speech models and pipelines by Hugging Face
SOTA discrete acoustic codec models with 40/75 tokens per second
Chinese Financial Trading Framework Based on Multi-Agent LLM
Framework for building real-time voice and multimodal AI agents
Download videos from websites like YouTube and many others
Instant voice cloning by MIT and MyShell. Audio foundation model
Oobabooga - The definitive Web UI for local AI, with powerful features
SOTA Open Source TTS
Free, high-quality text-to-speech API endpoint to replace OpenAI
PersonaPlex code
An open source digital image forensic toolset
Speakr is a personal, self-hosted web application
Streaming Real-time Audio-Driven Avatar Generation
Data manipulation and transformation for audio signal processing
Sample code and notebooks for Generative AI on Google Cloud
Automatic Speech Recognition with Word-level Timestamps
Intelligent automation and multi-agent orchestration for Claude Code
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Interface for OuteTTS models