python voice synthesis free download

NVIDIA NeMo

Toolkit for conversational AI

NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...

Downloads: 2 This Week

Last Update: 2026-03-23


Video Diffusion - Pytorch

Implementation of Video Diffusion Models

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch. It uses a special space-time factored U-net, extending generation from 2D images to 3D videos. 14k for difficult moving mnist (converging much faster and better than NUWA) - wip. Any new developments for text-to-video synthesis will be centralized at...

Downloads: 0 This Week

Last Update: 2024-05-03


KoboldCpp

Run GGUF models easily with a UI or API. One File. Zero Install.

KoboldCpp is easy-to-use AI text-generation software for GGML and GGUF models, inspired by the original KoboldAI. It's a single self-contained distributable that builds on llama.cpp and adds many powerful features.

Downloads: 486 This Week

Last Update: 17 hours ago


DALL-E 2 - Pytorch

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch. The main novelty seems to be an extra layer of indirection with the prior network (whether it is an autoregressive transformer or a diffusion network), which predicts an image embedding based on the text embedding from CLIP. Specifically, this repository will only build out the diffusion prior network, as it is the best performing variant (but which incidentally involves a causal transformer as...

Downloads: 1 This Week

Last Update: 2023-10-19


VALL-E

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

We introduce a language modeling approach for text to speech synthesis (TTS). Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech which is hundreds of times larger than existing systems....

Downloads: 0 This Week

Last Update: 2023-04-14


Amiga Memories

A walk along memory lane

Amiga Memories is a project (started & released in 2013) that aims to make video programmes that can be published on the internet. The images and sound produced by Amiga Memories are 100% automatically generated. The generator itself is implemented in Squirrel, the 3D rendering is done on GameStart 3D. An Amiga Memories video is mostly based on a narrative. The purpose of the script is to define the spoken and written content. The spoken text will be read by a voice synthesizer (Text To...

Downloads: 0 This Week

Last Update: 2023-03-22


NÜWA - Pytorch

Implementation of NÜWA, attention network for text to video synthesis

Implementation of NÜWA, state of the art attention network for text-to-video synthesis, in Pytorch. It also contains an extension into video and audio generation, using a dual decoder approach. It seems as though a diffusion-based method has taken the new throne for SOTA. However, I will continue on with NUWA, extending it to use multi-headed codes + hierarchical causal transformer. I think that direction is untapped for improving on this line of work. In the paper, they also present a way...

Downloads: 0 This Week

Last Update: 2023-03-22


Point-E

Point cloud diffusion for 3D model synthesis

point-e is the official repository for Point-E, a generative model developed by OpenAI that produces 3D point clouds from textual (or image) prompts. Its principal advantage is speed: it can generate 3D assets in just 1–2 minutes on a single GPU, which is significantly faster than many competing text-to-3D models. The model works via a two-stage diffusion approach: first, it uses a text → image diffusion network to produce a synthetic 2D view consistent with the prompt; then a second...

Downloads: 0 This Week

Last Update: 2025-10-02


CIPS-3D

3D-aware GANs based on NeRF (arXiv)

This repository contains the code of the paper, CIPS-3D: A 3D-Aware Generator of GANs Based on Conditionally-Independent Pixel Synthesis. The problem of mirror symmetry refers to the sudden change of the direction of the bangs near the yaw angle of pi/2. We propose to use an auxiliary discriminator to solve this problem. Note that in the initial stage of training, the auxiliary discriminator must dominate the generator more than the main discriminator...

Downloads: 0 This Week

Last Update: 2023-03-21


GANformer

Generative Adversarial Transformers

This is an implementation of the GANformer model, a novel and efficient type of transformer, explored for the task of image generation. The network employs a bipartite structure that enables long-range interactions across the image while maintaining linear computational efficiency, so it can readily scale to high-resolution synthesis. The model iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of...

Downloads: 0 This Week

Last Update: 2023-03-22


PyTorch pretrained BigGAN

PyTorch implementation of BigGAN with pretrained weights

An op-for-op PyTorch reimplementation of DeepMind's BigGAN model, which was released with the paper Large Scale GAN Training for High Fidelity Natural Image Synthesis, with the pre-trained weights from DeepMind. This PyTorch implementation of BigGAN is provided with the pretrained 128x128, 256x256 and 512x512 models by DeepMind. We also provide the scripts used to download and convert these models from the...

Downloads: 0 This Week

Last Update: 2023-03-21


Search Results for "python voice synthesis"

Showing 11 open source projects for "python voice synthesis"

