Official MiniMax Model Context Protocol (MCP) server
A robust, efficient, low-latency speech-to-text library
A generative speech model for daily dialogue
Reading book source
Generate audiobooks from e-books, voice cloning & 1107+ languages
Offline Text To Speech synthesis for Python
A minimalist command line knowledge base manager
A simple tool for reading in poorly redacted documents
Generating Immersive, Explorable, and Interactive 3D Worlds
Converts text to speech in real time
The behavior guidance framework for customer-facing LLM agents
EPUB to audiobook converter, optimized for Audiobookshelf
Automatic Speech Recognition with Word-level Timestamps
Use Microsoft Edge's online text-to-speech service from Python
Clone a voice in 5 seconds to generate arbitrary speech in real time
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
CLIP: predict the most relevant text snippet given an image
Python library and CLI tool to interface with Google Translate
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A TTS that fits in your CPU (and pocket)
ComfyUI wrapper nodes for HunyuanVideo
Implementation of Imagen, Google's Text-to-Image Neural Network
Qwen-Image is a powerful image generation foundation model
Python bindings for MuPDF's rendering library