A PyTorch-based Speech Toolkit
Open Source Speech Language Model
Python inference and LoRA trainer package for the LTX-2 audio–video
One-click deployment (including offline integration package)
A TTS model capable of generating ultra-realistic dialogue
A sound cloning tool with a web interface, using your voice
Scalable data preprocessing and curation toolkit for LLMs
Industrial-level controllable zero-shot text-to-speech system
The official Python library for the Groq API
Open source AI wearable platform for recording and summarizing speech
A high-quality rapid TTS voice cloning model
An extremely simple tool for separating vocals and background music
Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model
Code and models for the ICML 2024 paper NExT-GPT
Streamlines and simplifies prompt design for both developers
A fast TTS architecture with conditional flow matching
A Python tool that uses GPT-4, FFmpeg, and OpenCV
ImageBind: One Embedding Space to Bind Them All
High-Quality Voice Cloning TTS for 600+ Languages
Data Infrastructure providing an approach to multimodal AI workloads
Qwen3-ASR is an open-source series of ASR models
Open source codebase for Scale Agentex
Python library and CLI tool to interface with Google Translate
State-of-the-art TTS model under 25MB
Document Image Parsing via Heterogeneous Anchor Prompting