An Open Source implementation of Notebook LM with more flexibility
Interface for OuteTTS models
Multi-user UI tool for managing and running Stable Diffusion workflows
Robust Speech Recognition via Large-Scale Weak Supervision
Automatically translates the text of a video based on a subtitle file
Label Studio is a multi-type data labeling and annotation tool
AI tool converting video/audio into structured documents instantly
Open source AI model for generating full songs from lyrics prompts
Unified web UI for training and running open models locally
Offline Text To Speech synthesis for Python
Generate blog articles from video or audio
Unofficial Python API and agentic skill for Google NotebookLM
A Systematic Framework for Interactive World Modeling
MARS5 speech model (TTS) from CAMB.AI
A general fine-tuning kit geared toward image/video/audio diffusion
The most powerful and modular diffusion model GUI, API and backend
EPUB to audiobook converter, optimized for Audiobookshelf
Helps scientists define testable, modular, self-documenting dataflows
ChatGPT interface with better UI
Controllable & emotion-expressive zero-shot TTS
The official Python SDK for the ElevenLabs API
Converts text to speech in realtime
Voice Recognition to Text Tool
Use Microsoft Edge's online text-to-speech service from Python
A PyTorch-based Speech Toolkit