GLM-TTS is a text-to-speech synthesis system built on large language model technology. It aims for high-quality, expressive, and controllable speech, with features such as emotion modulation and zero-shot voice cloning. It uses a two-stage architecture: a generative LLM first converts text into intermediate speech-token sequences, and a flow-based neural model then converts those tokens into audio waveforms, which preserves rich prosody and voice character even for unseen speakers. A multi-reward reinforcement learning framework jointly optimizes voice similarity, emotional expressiveness, pronunciation, and intelligibility, with the goal of output that can rival commercial options in naturalness and expressiveness. GLM-TTS also supports phoneme-level control and hybrid text + phoneme input, giving developers precise control over pronunciation, which is critical for multilingual text and polyphone-rich languages.
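
To make the hybrid text + phoneme input described above more concrete, here is a minimal sketch of how such input could be split into plain-text and phoneme segments before synthesis. The curly-brace markup, the `Segment` type, and the `parse_hybrid` function are all hypothetical and chosen for illustration; they are not taken from GLM-TTS, whose actual input format may differ.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: phonemes are written inline inside curly braces, e.g. "{R EH1 D}".
# This markup is hypothetical and is not the GLM-TTS input format.
_PHONEME_SPAN = re.compile(r"\{([^{}]*)\}")


@dataclass(frozen=True)
class Segment:
    kind: str   # either "text" or "phonemes"
    value: str  # the raw text, or space-separated phoneme symbols


def parse_hybrid(line: str) -> list[Segment]:
    """Split a hybrid input line into ordered text and phoneme segments."""
    segments: list[Segment] = []
    cursor = 0
    for match in _PHONEME_SPAN.finditer(line):
        # Plain text that precedes this phoneme span, if any.
        text = line[cursor:match.start()].strip()
        if text:
            segments.append(Segment("text", text))
        # Normalize whitespace between phoneme symbols.
        phonemes = " ".join(match.group(1).split())
        if phonemes:
            segments.append(Segment("phonemes", phonemes))
        cursor = match.end()
    # Plain text after the last phoneme span, if any.
    tail = line[cursor:].strip()
    if tail:
        segments.append(Segment("text", tail))
    return segments
```

With this sketch, `parse_hybrid("I {R EH1 D} the book")` yields a text segment, a phoneme segment that pins the pronunciation of the ambiguous word, and a final text segment, in that order.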

Features

  • Zero-shot voice cloning from short prompt audio
  • Multi-reward reinforcement learning for expressive prosody
  • Two-stage LLM + Flow-based audio generation pipeline
  • Support for phoneme-level control and hybrid inputs
  • High-quality synthesis comparable to commercial TTS
  • Streaming real-time speech synthesis
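
As a rough illustration of the multi-reward idea in the list above, the sketch below folds several per-sample scores into one scalar reward using a normalized weighted sum. The listing does not say how GLM-TTS combines its rewards, so the weighted sum, the `combine_rewards` function, and the example scores and weights are all assumptions made for illustration.

```python
def combine_rewards(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted average of the named reward scores.

    ASSUMPTION: a normalized weighted sum is only one possible way to merge
    several rewards; it is not confirmed to be what GLM-TTS does.
    """
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"no score provided for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weight each score, then normalize so the result stays on the score scale.
    weighted = sum(weights[name] * scores[name] for name in weights)
    return weighted / total_weight


# Hypothetical scores in [0, 1] for one synthesized utterance.
example_scores = {
    "similarity": 0.8,
    "emotion": 0.6,
    "pronunciation": 1.0,
    "intelligibility": 0.9,
}
# Equal weights, also hypothetical.
example_weights = {name: 1.0 for name in example_scores}
example_reward = combine_rewards(example_scores, example_weights)
```

Raising one weight relative to the others would push training toward that objective at the expense of the rest, which is the trade-off any scheme that merges rewards into a single number has to manage.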

License

Apache License V2.0

Additional Project Details

Programming Language

Python

Related Categories

Python Text to Speech Software, Python AI Models

Registered

2026-01-20