audio synthesis free download

Showing 629 open source projects for "audio synthesis"

1

Step-Audio

Open-source framework for intelligent speech interaction

Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and produces speech accordingly, enabling natural dialogue, voice cloning, and expressive speech synthesis. ...

Downloads: 0 This Week

Last Update: 2026-03-16
2

MLX-Audio

A text-to-speech, speech-to-text and speech-to-speech library

...The project provides a straightforward CLI (mlx_audio.tts.generate) as well as a Python API for programmatic generation of audio, including parameters for voice choice, speed, language hints, output format, and sample rate. It includes examples such as audiobook generation to demonstrate long-form synthesis and joined audio segments. On top of that, MLX-Audio offers a modern web interface powered by FastAPI, with real-time waveform and 3D visualizations, file upload, and audio management.

Downloads: 15 This Week

Last Update: 2026-03-30
3

SuperCollider

Audio server, programming language, and IDE for sound synthesis

SuperCollider is a platform for audio synthesis and algorithmic composition, used by musicians, artists, and researchers working with sound. It is free and open source software available for Windows, macOS, and Linux. scsynth, a real-time audio server, forms the core of the platform. It features 400+ unit generators (“UGens”) for analysis, synthesis, and processing.
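
As a rough illustration of the unit-generator idea, the Python sketch below chains a sine oscillator and a percussive envelope and multiplies them sample by sample, loosely analogous to `SinOsc.ar(440) * EnvGen.kr(Env.perc(0.01, 0.2))` in sclang. It is not SuperCollider code and does not use scsynth. The names `sin_osc`, `perc_env`, and `render` are hypothetical, the 48 kHz sample rate is an assumption, and the envelope uses linear segments rather than SuperCollider's curved ones. Real UGens run at audio or control rate and process blocks of samples; this sketch handles one sample at a time for clarity.

```python
import math
from typing import Iterator

SAMPLE_RATE = 48_000  # assumed sample rate in Hz


def sin_osc(freq: float, sr: int = SAMPLE_RATE) -> Iterator[float]:
    """Hypothetical sine-oscillator unit generator: yields one sample per step."""
    phase = 0.0
    step = 2.0 * math.pi * freq / sr
    while True:
        yield math.sin(phase)
        phase = (phase + step) % (2.0 * math.pi)


def perc_env(attack: float, release: float, sr: int = SAMPLE_RATE) -> Iterator[float]:
    """Hypothetical percussive envelope: linear rise, linear fall, then silence."""
    a = max(1, int(attack * sr))
    r = max(1, int(release * sr))
    for i in range(a):
        yield i / a
    for i in range(r):
        yield 1.0 - i / r
    while True:
        yield 0.0


def render(n: int, *ugens: Iterator[float]) -> list[float]:
    """Multiply the outputs of several unit generators sample by sample."""
    out = []
    for _ in range(n):
        value = 1.0
        for ugen in ugens:
            value *= next(ugen)
        out.append(value)
    return out


# A 440 Hz tone shaped by a 10 ms attack / 200 ms release envelope, half a second long.
samples = render(SAMPLE_RATE // 2, sin_osc(440.0), perc_env(0.01, 0.2))
```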

Downloads: 3 This Week

Last Update: 2025-11-24
4

Fun Audio Chat

Large Audio Language Model built for natural interactions

Fun Audio Chat is an interactive, voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can simply speak and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. The system...

Downloads: 1 This Week

Last Update: 2026-02-27
5

Step-Audio 2

Multi-modal large language model designed for audio understanding

Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. ...

Downloads: 0 This Week

Last Update: 2026-03-16
6

Voicebox

The open-source voice synthesis studio powered by Qwen3-TTS

Voicebox is a local-first voice synthesis studio that aims to bring professional, DAW-like voice generation workflows to a desktop app while keeping models and voice data entirely on your machine. It positions itself as an open-source alternative to cloud voice platforms by emphasizing privacy, offline use, and freedom from subscriptions or usage caps. The tool supports downloading voice models, cloning voices from short audio samples, and generating speech locally, then organizing the results using studio-oriented editing concepts. ...

Downloads: 72 This Week

Last Update: 2026-03-17
7

FluidSynth

Software synthesizer based on the SoundFont 2 specifications

FluidSynth is a real-time software synthesizer based on the SoundFont 2 specifications and is widely distributed. FluidSynth itself has no graphical user interface, but thanks to its powerful API, many applications use it, and it has even found its way onto embedded systems and into some mobile apps.
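
A SoundFont 2 instrument is built from recorded samples, each tagged with the MIDI key at which it was recorded (its root key). To play a different note, a synthesizer reads the sample faster or slower by the equal-tempered ratio 2^((note - root)/12). The Python sketch below shows that core idea with linear interpolation. It is not FluidSynth's API or code (FluidSynth also offers higher-order interpolation and applies envelopes, filters, and modulators), and the function names and the 8 kHz toy sample are assumptions.

```python
import math


def playback_ratio(note: int, root_key: int) -> float:
    """Equal-tempered pitch ratio between a MIDI note and the sample's root key."""
    return 2.0 ** ((note - root_key) / 12.0)


def resample_linear(sample: list[float], ratio: float) -> list[float]:
    """Read through `sample`, advancing `ratio` input samples per output sample (linear interpolation)."""
    out = []
    pos = 0.0
    last = len(sample) - 1
    while pos < last:
        i = int(pos)
        frac = pos - i
        out.append(sample[i] * (1.0 - frac) + sample[i + 1] * frac)
        pos += ratio
    return out


# A toy "recorded" sample: one second of a 261.63 Hz sine at an assumed 8 kHz rate, root key 60 (middle C).
SR = 8000
recorded = [math.sin(2 * math.pi * 261.63 * n / SR) for n in range(SR)]

# Playing note 72 (one octave up) reads the sample twice as fast, so it lasts about half as long.
octave_up = resample_linear(recorded, playback_ratio(72, 60))
```

Because pitch and duration change together in this scheme, sample-based instruments commonly map several samples to different key ranges so that no single sample is stretched too far.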

Downloads: 46 This Week

Last Update: 2026-02-21
8

Furnace

A multi-system chiptune tracker compatible with DefleMask modules

Furnace is a powerful multi-system chiptune tracker that enables users to compose music using the sound chips of classic computers, consoles, and arcade hardware. It supports an extensive range of sound chips, including FM, wavetable, and sample-based systems, making it one of the most versatile trackers available. The software runs on multiple operating systems and can be used both as a standalone application and as a development tool for retro-style audio production. Its interface is inspired by traditional tracker software, allowing precise control over note sequences, effects, and instrument parameters. ...
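
FM synthesis, one of the chip families Furnace targets, builds timbres by letting one oscillator (the modulator) vary the phase of another (the carrier): y(t) = sin(2π·f_c·t + I·sin(2π·f_m·t)), where I is the modulation index. The Python sketch below renders that textbook two-operator form. It is not Furnace code, the function name `fm_tone` and its parameters are hypothetical, and real FM chips add envelopes, operator feedback, and fixed-point lookup tables that are omitted here.

```python
import math


def fm_tone(carrier_hz: float, ratio: float, index: float, n: int, sr: int = 44_100) -> list[float]:
    """Two-operator FM: a modulator sine shifts the phase of a carrier sine."""
    modulator_hz = carrier_hz * ratio
    out = []
    for k in range(n):
        t = k / sr
        mod = index * math.sin(2 * math.pi * modulator_hz * t)
        out.append(math.sin(2 * math.pi * carrier_hz * t + mod))
    return out


bell = fm_tone(220.0, ratio=3.5, index=5.0, n=4410)  # non-integer ratio: inharmonic, bell-like
pure = fm_tone(220.0, ratio=1.0, index=0.0, n=4410)  # index 0 reduces to a plain sine
```

Raising `index` adds more sidebands and brightens the sound, while a non-integer `ratio` places those sidebands off the harmonic series.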

Downloads: 5 This Week

Last Update: 6 days ago
9

Sonic Pi

Sonic Pi is your free code-based music creation and performance tool

Sonic Pi is a new kind of musical instrument. Instead of strumming strings or whacking things with sticks, you write code, live. Sonic Pi is a complete open source programming environment originally designed to explore and teach programming concepts within schools through the process of creating new sounds. In addition to being an engaging education resource, it has evolved into an extremely powerful and performance-ready live coding instrument suitable for professional artists and DJs. ...

Downloads: 20 This Week

Last Update: 2025-06-26
10

Faust

Functional programming language for signal processing

Faust (Functional Audio Stream) is a functional programming language for sound synthesis and audio processing with a strong focus on the design of synthesizers, musical instruments, audio effects, etc. Faust targets high-performance signal processing applications and audio plug-ins for a variety of platforms and standards. The core component of Faust is its compiler.
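
Faust programs are built by composing signal processors, for example with the sequential operator `:` and the recursive (feedback) operator `~`. A one-pole smoother is commonly written along the lines of `*(1-b) : + ~ *(b)`, which computes y[n] = (1 - b)·x[n] + b·y[n-1]. The Python sketch below imitates that compositional style with ordinary functions. It is not how the Faust compiler works (Faust compiles such expressions to efficient C++ and other targets), and `seq`, `gain`, `feedback_add`, and `one_pole` are hypothetical names.

```python
from typing import Callable

Signal = list[float]
Processor = Callable[[Signal], Signal]


def seq(*stages: Processor) -> Processor:
    """Sequential composition, in the spirit of Faust's ':' operator."""
    def run(x: Signal) -> Signal:
        for stage in stages:
            x = stage(x)
        return x
    return run


def gain(g: float) -> Processor:
    """Multiply every sample by a constant, like Faust's *(g)."""
    return lambda x: [g * s for s in x]


def feedback_add(b: float) -> Processor:
    """y[n] = x[n] + b * y[n-1]: an adder whose output is fed back through a one-sample delay and a gain."""
    def run(x: Signal) -> Signal:
        y, prev = [], 0.0
        for s in x:
            prev = s + b * prev
            y.append(prev)
        return y
    return run


def one_pole(b: float) -> Processor:
    """One-pole lowpass smoother: y[n] = (1 - b) * x[n] + b * y[n-1]."""
    return seq(gain(1.0 - b), feedback_add(b))


# Step response: rises toward 1.0 as 1 - b**(n + 1).
step_response = one_pole(0.9)([1.0] * 50)
```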

Downloads: 16 This Week

Last Update: 2026-03-20
11

Overtone

Collaborative programmable music

Overtone is an open-source audio environment designed to explore new musical ideas from synthesis and sampling to instrument building, live coding and collaborative jamming. We combine the powerful SuperCollider audio engine with Clojure, a state-of-the-art Lisp, to create an intoxicating interactive sonic experience. Synchronize your visuals and noise with ease.

Downloads: 7 This Week

Last Update: 2024-11-07
12

GLM-TTS

Controllable & emotion-expressive zero-shot TTS

GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture in which a generative LLM first converts text into intermediate speech token sequences and a flow-based neural model then converts those tokens into natural audio waveforms, enabling rich prosody and voice character even for unseen speakers. ...

Downloads: 4 This Week

Last Update: 4 days ago
13

Hugging Face - Speech To Speech

Open speech-to-speech models and pipelines by Hugging Face toolkit AI

This project from Hugging Face focuses on enabling direct speech-to-speech processing using modern machine learning models. It provides tools and reference implementations that allow audio input to be transformed into audio output without requiring an intermediate text representation. Hugging Face - Speech To Speech builds on recent advances in speech modeling, combining components such as speech recognition, translation, and synthesis into unified pipelines. It is designed to help researchers and developers experiment with multilingual and cross-lingual voice applications. ...

Downloads: 3 This Week

Last Update: 2026-03-18
14

Podcastfy.ai

Transforming Multimodal Content into Captivating Multilingual Audio

Podcastfy is an open-source Python package that transforms multimodal content (text, images) into engaging, multilingual audio conversations using GenAI. Input content includes websites, PDFs, and YouTube videos, as well as images. Unlike UI-based tools focused primarily on note-taking or research synthesis (e.g. NotebookLM), Podcastfy focuses on programmatic, bespoke generation of engaging conversational transcripts and audio from a multitude of multimodal sources, enabling customization and scale.

Downloads: 6 This Week

Last Update: 2024-11-16
15

RHVoice

Free open source speech synthesizer for Russian and other languages

RHVoice is a free and open-source multilingual speech synthesizer. Its developers hope to give more visually impaired people the ability to use a good, free synthesized voice that reads in their native language through their screen reader. We are especially interested in supporting those languages for which there are currently no good voices that could be used with a screen reader. The creator of RHVoice, Olga Yakovleva, is blind herself. Many of the contributors to the RHVoice project, both...

Downloads: 45 This Week

Last Update: 2026-03-31
16

Pipecat

Framework for building real-time voice and multimodal AI agents

Pipecat is an open source Python framework designed for building real-time voice and multimodal conversational AI agents. It provides developers with tools to orchestrate complex pipelines that combine speech recognition, language models, audio processing, and speech synthesis into a cohesive conversational system. Pipecat focuses on low-latency interactions so voice conversations with AI feel natural and responsive during live use. Pipecat allows applications to integrate multiple AI services and transports, enabling flexible deployment across different environments and communication channels. ...
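
The core idea of a streaming voice pipeline is that stages (speech recognition, language model, speech synthesis) run concurrently and pass data along as soon as it is ready, rather than waiting for the previous stage to finish a whole utterance. The asyncio sketch below wires three placeholder stages together with queues. It is not Pipecat's API, and `run_stage`, `fake_stt`, `fake_llm`, and `fake_tts` are hypothetical stand-ins for real services.

```python
import asyncio
from typing import Awaitable, Callable

Stage = Callable[[str], Awaitable[str]]


async def run_stage(fn: Stage, inbox: asyncio.Queue, outbox: asyncio.Queue) -> None:
    """Pull items from inbox, transform them, and push results downstream; None marks end of stream."""
    while (item := await inbox.get()) is not None:
        await outbox.put(await fn(item))
    await outbox.put(None)


async def fake_stt(audio_chunk: str) -> str:  # placeholder for a speech-recognition service
    return f"text({audio_chunk})"


async def fake_llm(text: str) -> str:  # placeholder for a language-model call
    return f"reply({text})"


async def fake_tts(reply: str) -> str:  # placeholder for a speech-synthesis service
    return f"audio({reply})"


async def run_pipeline(chunks: list[str]) -> list[str]:
    stages = [fake_stt, fake_llm, fake_tts]
    queues = [asyncio.Queue() for _ in range(len(stages) + 1)]
    tasks = [
        asyncio.create_task(run_stage(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for chunk in chunks:
        await queues[0].put(chunk)
    await queues[0].put(None)  # signal end of input
    results = []
    while (item := await queues[-1].get()) is not None:
        results.append(item)
    await asyncio.gather(*tasks)
    return results


outputs = asyncio.run(run_pipeline(["chunk1", "chunk2"]))
```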

Downloads: 5 This Week

Last Update: 2026-03-28
17

VoxCPM2

Tokenizer-Free TTS for Multilingual Speech Generation

...The system is trained on massive multilingual datasets, enabling support for dozens of languages and dialects while maintaining high fidelity and realism in generated audio. VoxCPM stands out for its ability to perform voice cloning with minimal input, capturing not only the speaker’s timbre but also nuanced features such as rhythm, accent, and emotional delivery. It also introduces voice design capabilities, allowing users to generate entirely new voices from natural language descriptions without requiring reference audio.

Downloads: 0 This Week

Last Update: 7 hours ago
18

pyVideoTrans

Translate the video from one language to another and embed dubbing

pyVideoTrans is an ambitious open-source multimedia processing project that assembles speech recognition, subtitle generation, AI translation, voice synthesis, and video assembly into a unified pipeline for converting videos from one language to another with embedded dubbing and captions. At its core it runs speech-to-text models to transcribe audio tracks, translates the resulting text into a target language using local or cloud-based translation engines, synthesizes new speech to match the translated subtitles, and then merges that speech back into the video, creating a fully localized media file. ...
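
Subtitle generation in pipelines like this typically turns timed transcription segments into a subtitle file. The sketch below renders hypothetical `(start, end, text)` segments in the widely used SubRip (.srt) format, with numbered cues and `HH:MM:SS,mmm` timestamps. It is not pyVideoTrans code; `srt_timestamp` and `to_srt` are illustrative names, and the segment data is made up.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as numbered SRT cues separated by blank lines."""
    cues = []
    for i, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)


srt = to_srt([(0.0, 2.5, "Hello."), (2.5, 61.25, "Translated line.")])
```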

Downloads: 37 This Week

Last Update: 2026-03-27
19

pyttsx3

Offline Text To Speech synthesis for python

...On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility. The library exposes a simple but flexible API for controlling voice selection, speaking rate, volume, and other synthesis parameters from Python code. It supports both a high-level speak convenience function and a lower-level engine object with event hooks, queuing, and saving output to audio files. The repository includes examples and documentation that show how to adjust properties dynamically, persist synthesized output, and integrate pyttsx3 into GUIs or background services.

Downloads: 23 This Week

Last Update: 2025-11-28
20

Matcha-TTS

A fast TTS architecture with conditional flow matching

Matcha-TTS is a non-autoregressive neural text-to-speech architecture that uses conditional flow matching to generate speech quickly while maintaining natural quality. It models speech as an ODE-based generative process, and conditional flow matching lets it reach high-quality audio in only a few synthesis steps, which greatly reduces latency compared to score-matching diffusion approaches. The model is fully probabilistic, so it can generate diverse realizations of the same text while still sounding stable and intelligible. The repository provides an end-to-end TTS pipeline: a PyTorch/Lightning training stack, configuration files, pre-trained checkpoints, a command-line interface, and a Gradio app for interactive testing. ...
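
Flow matching trains a network to predict a velocity field v(x, t, conditioning) and then generates by integrating the ODE dx/dt = v from Gaussian noise at t = 0 to data at t = 1. With the straight (optimal-transport) path x_t = (1 - t)·x_0 + t·x_1, the conditional target field is (x_1 - x)/(1 - t), and trajectories that are nearly straight can be integrated accurately with only a few Euler steps. The toy below substitutes that analytic field toward a fixed, made-up target for the neural network a real model learns, so it only illustrates the sampling loop. It is not Matcha-TTS code, and the names and values are hypothetical.

```python
import random


def euler_sample(target: list[float], n_steps: int, seed: int = 0) -> list[float]:
    """Integrate dx/dt = v(x, t) from t=0 (Gaussian noise) to t=1 with a fixed number of Euler steps.

    v is the analytic conditional flow-matching field for a straight path toward `target`,
    standing in for the learned network a real model would use.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # x(0) ~ N(0, I)
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        v = [(x1 - xi) / (1.0 - t) for xi, x1 in zip(x, target)]
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


mel_frame = [0.5, -1.0, 2.0, 0.25]  # hypothetical target features
approx = euler_sample(mel_frame, n_steps=4)
```

With the exact field every Euler step stays on the straight line, so even two steps reach the target; a learned field is only approximately straight, which is why output quality still depends on the step count in practice.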

Downloads: 15 This Week

Last Update: 2025-11-28
21

Qwen2.5-Omni

Capable of understanding text, audio, vision, video

...Very strong benchmark performance across modalities (audio understanding, speech recognition, image/video reasoning), often matching or outperforming single-modality models of similar scale. Real-time streaming responses, including natural speech synthesis (text-to-speech), and chunked inputs for low-latency interaction.
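
Chunked input lets a model start processing audio before the speaker has finished. A minimal sketch of the buffering side, assuming a flat stream of samples, groups the stream into fixed-size chunks. It is not Qwen2.5-Omni code, `chunk_stream` is a hypothetical helper, and the 16 kHz rate and 0.5 s chunk length are assumptions.

```python
from typing import Iterable, Iterator


def chunk_stream(samples: Iterable[float], chunk_size: int) -> Iterator[list[float]]:
    """Group an incoming sample stream into fixed-size chunks; the final chunk may be shorter."""
    buf: list[float] = []
    for s in samples:
        buf.append(s)
        if len(buf) == chunk_size:
            yield buf  # hand off a full chunk as soon as it is available
            buf = []
    if buf:
        yield buf


# 2 s of assumed 16 kHz audio split into 0.5 s chunks.
chunks = list(chunk_stream([0.0] * 32_000, chunk_size=8_000))
```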

Downloads: 0 This Week

Last Update: 2025-09-23
22

TADA

Open Source Speech Language Model

...This approach can support applications such as conversational AI, speech synthesis, multimodal language modeling, and speech understanding systems. The project explores ways to treat speech and text as integrated data streams rather than separate pipelines, enabling more coherent interactions between language and audio. Because it operates as a generative framework, TADA can be used for research into advanced speech-language models and multimodal artificial intelligence systems.

Downloads: 0 This Week

Last Update: 2026-03-24
23

IndexTTS2

Industrial-level controllable zero-shot text-to-speech system

IndexTTS is a modern, zero-shot text-to-speech (TTS) system engineered to deliver high-quality, natural-sounding speech synthesis with few requirements and strong voice-cloning capabilities. It builds on state-of-the-art models such as XTTS and other modern neural TTS backbones, improving them with a conformer-based speech conditional encoder and upgrading the decoder to a high-quality vocoder (BigVGAN2), leading to clearer and more natural audio output.

Downloads: 11 This Week

Last Update: 2025-11-27
24

Riffusion App

Stable diffusion for real-time music generation (web app)

Riffusion App Hobby is an open-source interactive web application that enables real-time music generation using Stable Diffusion models adapted for audio synthesis. Unlike traditional music generation tools, it treats audio as spectrogram images and applies diffusion techniques to generate continuous sound transitions, allowing users to create evolving musical loops and compositions. The application is built with modern web technologies, including Next.js, React, and three.js, providing a responsive and visually engaging interface for experimentation. ...
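
Treating audio as an image means working with a spectrogram: a grid whose rows are short time frames and whose columns are frequency bins, with each cell holding the magnitude of that frequency in that frame. The sketch below computes such a grid with a Hann window and a naive DFT in plain Python. It is not Riffusion code; the function name, 8 kHz rate, and frame/hop sizes are assumptions, and practical systems usually add a mel frequency scale, log scaling, and a phase-reconstruction step to turn generated images back into audio.

```python
import cmath
import math


def magnitude_spectrogram(signal: list[float], frame: int = 64, hop: int = 32) -> list[list[float]]:
    """STFT magnitudes: one row per frame, one column per frequency bin (0 .. frame/2)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]  # periodic Hann window
    rows = []
    for start in range(0, len(signal) - frame + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame)]
        row = []
        for k in range(frame // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame) for n in range(frame))
            row.append(abs(acc))
        rows.append(row)
    return rows


SR = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SR) for n in range(512)]  # 1 kHz test tone
spec = magnitude_spectrogram(tone)
```

With a 64-sample frame at 8 kHz each bin spans 125 Hz, so the 1 kHz test tone peaks in bin 8 of every frame.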

Downloads: 2 This Week

Last Update: 2026-03-18
25

HY-World 1.5

A Systematic Framework for Interactive World Modeling

...It blends advanced reasoning with multimodal synthesis, enabling agents to describe scenes, generate context-appropriate responses, and contribute to narrative or gameplay flows. The underlying framework typically supports large-context state tracking across extended interactions, blending temporal and spatial multimodal signals.

Downloads: 10 This Week

Last Update: 2026-03-24