Trying to be a robust, user-friendly and hackable music player
Offline text-to-speech synthesis for Python
An open source implementation of NotebookLM with more flexibility
Label Studio is a multi-type data labeling and annotation tool
A web UI for easy subtitle generation using the Whisper model
Unified web UI for training and running open models locally
Generate blog articles from video or audio
Open source AI model for generating full songs from lyrics prompts
A Systematic Framework for Interactive World Modeling
Multimodal-Driven Architecture for Customized Video Generation
MARS5 speech model (TTS) from CAMB.AI
Controllable & emotion-expressive zero-shot TTS
The official Python SDK for the ElevenLabs API
Converts text to speech in real time
Use Microsoft Edge's online text-to-speech service from Python
Convert various image, audio and video formats from your context menu.
Unofficial Python API and agentic skill for Google NotebookLM
A general fine-tuning kit geared toward image/video/audio diffusion
The most powerful and modular diffusion model GUI, API and backend
Voice Recognition to Text Tool
GenAI Processors is a lightweight Python library
One-click deployment (including offline integration package)
A TTS model capable of generating ultra-realistic dialogue
Python inference and LoRA trainer package for the LTX-2 audio–video
A sound cloning tool with a web interface, using your voice