Open-source framework for intelligent speech interaction
Multi-modal large language model designed for audio understanding
Controllable & emotion-expressive zero-shot TTS
Capable of understanding text, audio, vision, and video
A Systematic Framework for Interactive World Modeling
Open Source Speech Language Model
Industrial-level controllable zero-shot text-to-speech system
State-of-the-art TTS model under 25 MB
Qwen3-TTS is an open-source series of TTS models
GLM-4-Voice | End-to-End Chinese-English Conversational Model
A Conversational Speech Generation Model
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)