Industrial-level controllable zero-shot text-to-speech system
State-of-the-art discrete acoustic codec models at 40 or 75 tokens per second
Conditional Variational Autoencoder with Adversarial Learning
Implementation of a Transformer-based neural network
Google's Tacotron-2 TensorFlow implementation