Capable of understanding text, audio, vision, and video
An open-source music player with simple UI
Fast multimodal LLM for real-time voice interaction and AI apps
Open speech-to-speech models and pipelines: an AI toolkit by Hugging Face
SOTA discrete acoustic codec models with 40/75 tokens per second
Instant voice cloning by MIT and MyShell. Audio foundation model
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
AI video generator optimized for low-VRAM and older GPUs
Oobabooga - The definitive Web UI for local AI, with powerful features
Framework for building real-time voice and multimodal AI agents
Free, high-quality text-to-speech API endpoint to replace OpenAI
SOTA Open Source TTS
Data manipulation and transformation for audio signal processing
Video player for improving the quality of hand-drawn images
Automatic Speech Recognition with Word-level Timestamps
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Sample code and notebooks for Generative AI on Google Cloud
PersonaPlex code
Speakr is a personal, self-hosted web application
Streaming Real-time Audio-Driven Avatar Generation
Interface for OuteTTS models
Automatically translates the text of a video based on a subtitle file
Robust Speech Recognition via Large-Scale Weak Supervision
The music player of today
Trying to be a robust, user-friendly and hackable music player