Search Results for "audio source separation" - Page 3


Showing 965 open source projects for "audio source separation"

Filter: Python

1

Crosvm

The Chrome OS Virtual Machine Monitor

crosvm (ChromeOS Virtual Machine Monitor) is a secure, lightweight virtual machine monitor built on top of the Linux KVM hypervisor. Developed for ChromeOS, it is designed to isolate and execute Linux and Android guests efficiently while maintaining strong security boundaries. Unlike general-purpose emulators such as QEMU, crosvm avoids full hardware emulation and focuses on modern paravirtualized I/O using the virtio standard, reducing complexity and attack surface. Written in Rust, it...

Downloads: 13 This Week

Last Update: 2026-04-12
2

colleague-skill

Transform a cold separation into a warm Skill

colleague-skill is a specialized agent skill designed to simulate a collaborative teammate within AI-driven workflows, enabling agents to behave more like human colleagues in problem-solving scenarios. The project focuses on enhancing interaction quality by introducing role-based behavior, contextual awareness, and cooperative task execution. It allows agents to provide suggestions, feedback, and alternative approaches, mimicking real-world collaboration dynamics. The system likely...

Downloads: 7 This Week

Last Update: 2026-04-06
3

AudioLM - Pytorch

Implementation of the AudioLM audio generation model in PyTorch

Implementation of AudioLM, a Language Modeling Approach to Audio Generation out of Google Research, in PyTorch. It also extends the work with T5 conditioning via classifier-free guidance. This enables text-to-audio or TTS, which the paper does not offer. Yes, this means VALL-E can be trained from this repository; it is essentially the same. This repository now also contains an MIT-licensed version of SoundStream. It is also compatible with EnCodec; however, be aware that it...

Downloads: 3 This Week

Last Update: 2025-01-12
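Classifier-free guidance, mentioned above for the T5 text conditioning, blends a conditioned prediction with an unconditioned one and pushes the result away from the unconditioned output. The sketch below shows only that blending step on plain lists of logits; `cfg_logits` is a hypothetical helper for illustration, not the repository's API, and the numbers are made up.

```python
def cfg_logits(cond, uncond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    if len(cond) != len(uncond):
        raise ValueError("logit vectors must have the same length")
    # scale = 1 reproduces the conditional logits; scale > 1 amplifies the effect of the condition
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]

cond = [2.0, 0.5, -1.0]    # hypothetical logits from a text-conditioned pass
uncond = [1.0, 1.0, 0.0]   # hypothetical logits from the same model with the condition dropped
print(cfg_logits(cond, uncond, 3.0))  # [4.0, -0.5, -3.0]
```

Larger guidance scales usually follow the text prompt more closely at some cost to diversity, which is why the scale is typically exposed as a tunable parameter.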
4

Ultravox

Fast multimodal LLM for real-time voice interaction and AI apps

Ultravox is an open source multimodal large language model designed specifically for real-time voice-based interactions. It is built to process both text and spoken audio directly, eliminating the need for a separate speech recognition stage and enabling more seamless conversational experiences. Ultravox works by combining text prompts with encoded audio inputs, allowing it to understand spoken language alongside written instructions in a unified pipeline.

Downloads: 2 This Week

Last Update: 2026-03-18
5

Real-Time Voice Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder. In the first stage, short audio clips are converted into a fixed-dimensional speaker embedding that...

Downloads: 11 This Week

Last Update: 2026-03-09
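The first SV2TTS stage turns a clip of any length into a fixed-dimensional speaker embedding. A common way to get a fixed size from a variable number of analysis windows is to L2-normalize each window's vector, average them, and renormalize the mean. The toy sketch below assumes the per-window vectors already exist (in the real pipeline they come from a trained speaker encoder network, which is not reproduced here); `utterance_embedding` is a hypothetical name.

```python
import math

def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def utterance_embedding(window_vectors):
    """Collapse any number of per-window vectors into one fixed-size, unit-length embedding."""
    if not window_vectors:
        raise ValueError("need at least one window")
    dim = len(window_vectors[0])
    # normalize each window vector, average them, then renormalize the mean
    normed = [l2_normalize(v) for v in window_vectors]
    mean = [sum(v[i] for v in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)

# toy 2-D "window vectors"; real encoders output hundreds of dimensions
short_clip = [[3.0, 4.0], [4.0, 3.0]]
long_clip = [[3.0, 4.0], [4.0, 3.0], [1.0, 1.0], [0.0, 2.0]]
print(len(utterance_embedding(short_clip)), len(utterance_embedding(long_clip)))  # 2 2
```

Both clips yield embeddings of the same size and unit length, which is what lets the synthesizer condition on them regardless of how long the reference audio was.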
6

Qwen2.5-Omni

Capable of understanding text, audio, vision, video

Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and as natural speech, streamed in real time. It uses a “Thinker-Talker” architecture and introduces innovations for aligning modalities over time (for example, synchronizing video and audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...

Downloads: 1 This Week

Last Update: 2025-09-23
7

Hugging Face - Speech To Speech

Open speech-to-speech models and pipelines by Hugging Face

This project from Hugging Face focuses on enabling direct speech-to-speech processing using modern machine learning models. It provides tools and reference implementations that allow audio input to be transformed into audio output without requiring an intermediate text representation. Hugging Face - Speech To Speech builds on recent advances in speech modeling, combining components such as speech recognition, translation, and synthesis into unified pipelines. It is designed to help...

Downloads: 2 This Week

Last Update: 2026-03-18
8

WavTokenizer

SOTA discrete acoustic codec models with 40/75 tokens per second

WavTokenizer is a state-of-the-art discrete acoustic codec designed specifically for audio language modeling, capable of compressing 24 kHz audio into just 40 or 75 tokens per second while preserving high perceptual quality. It is built to represent speech, music, and general audio with extremely low bitrate, making it ideal as a front-end for large audio language models like GPT-4o and similar architectures. The model uses a single-quantizer design together with temporal compression to...

Downloads: 0 This Week

Last Update: 2025-11-28
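The token rates quoted above translate directly into how much audio each token covers and how many tokens a clip costs a downstream language model. The sketch below does that arithmetic for 24 kHz audio at 40 and 75 tokens per second. With a single quantizer there is one token per frame; the 4096-entry codebook used for the bitrate column is an assumption for illustration, and `codec_budget` is a hypothetical helper.

```python
import math

SAMPLE_RATE_HZ = 24_000

def codec_budget(tokens_per_second, codebook_size, seconds):
    """Back-of-the-envelope budget for a single-quantizer codec."""
    samples_per_token = SAMPLE_RATE_HZ / tokens_per_second
    bits_per_token = math.log2(codebook_size)  # one index into the codebook per token
    return {
        "samples_per_token": samples_per_token,
        "tokens": tokens_per_second * seconds,
        "bitrate_bps": tokens_per_second * bits_per_token,
    }

for rate in (40, 75):
    print(rate, codec_budget(rate, codebook_size=4096, seconds=10))
```

At 40 tokens per second each token spans 600 samples (25 ms), and a 10-second clip costs 400 tokens; at 75 tokens per second each token spans 320 samples and the same clip costs 750 tokens. Under the assumed codebook size that is about 480 and 900 bits per second.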
9

TradingAgents

Chinese Financial Trading Framework Based on Multi-Agent LLM

TradingAgents-CN is a Chinese-enhanced, multi-agent LLM framework aimed at building financial analysis and trading-oriented workflows, with an emphasis on collaboration between specialized agents rather than a single monolithic prompt. It organizes market-related tasks into roles and stages so different agents can contribute research, reasoning, aggregation, and decision support in a structured pipeline. The project is oriented toward practical usage, including a stack that can be run in a...

Downloads: 8 This Week

Last Update: 5 days ago
10

Pipecat

Framework for building real-time voice and multimodal AI agents

Pipecat is an open source Python framework designed for building real-time voice and multimodal conversational AI agents. It provides developers with tools to orchestrate complex pipelines that combine speech recognition, language models, audio processing, and speech synthesis into a cohesive conversational system. Pipecat focuses on low-latency interactions so voice conversations with AI feel natural and responsive during live use.

Downloads: 10 This Week

Last Update: 5 days ago
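The low-latency idea behind chaining recognition, a language model, and synthesis is that each stage consumes the previous stage's output as a stream instead of waiting for the whole utterance. The sketch below illustrates that shape with plain `asyncio` async generators; it does not use Pipecat's own classes, and every stage (`fake_stt`, `fake_llm`, `fake_tts`, `mic`) is a hypothetical stub standing in for a real service.

```python
import asyncio

async def mic(chunks):
    # hypothetical audio source: chunks arrive over time
    for chunk in chunks:
        await asyncio.sleep(0)
        yield chunk

async def fake_stt(audio_chunks):
    # hypothetical speech recognizer: one transcript per audio chunk
    async for chunk in audio_chunks:
        yield f"text({chunk})"

async def fake_llm(texts):
    # hypothetical language model: one reply per transcript
    async for text in texts:
        yield f"reply to {text}"

async def fake_tts(replies):
    # hypothetical synthesizer: one audio frame per reply
    async for reply in replies:
        yield f"audio[{reply}]"

async def run_pipeline(chunks):
    # each stage pulls from the previous one, so output can start before input ends
    stream = fake_tts(fake_llm(fake_stt(mic(chunks))))
    return [frame async for frame in stream]

print(asyncio.run(run_pipeline(["a", "b"])))
```

Because the stages are lazily chained generators, the first synthesized frame is produced as soon as the first chunk has passed through every stage, rather than after all input has been collected.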
11

Unrud Video Downloader

Download videos from websites like YouTube and many others

Video Downloader is a desktop application designed to simplify the process of downloading videos from various online platforms through a user-friendly graphical interface. Built on top of yt-dlp, it abstracts the complexity of command-line tools and provides an accessible way for users to retrieve video and audio content. The application supports a wide range of features, including downloading entire playlists, handling private or password-protected content, and automatically selecting...

Downloads: 16 This Week

Last Update: 2026-04-09
12

OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model

OpenVoice is a versatile instant voice cloning system that can replicate a speaker’s tone color from just a short audio clip and then generate speech in multiple languages. It is designed not only to match the timbre of the reference voice, but also to give granular control over style parameters such as emotion, accent, rhythm, pauses, and intonation. The model supports cross-lingual and even zero-shot cross-lingual voice cloning, so a speaker recorded in one language can be made to speak...

Downloads: 21 This Week

Last Update: 2025-11-28
13

Text Generation Web UI

Oobabooga - The definitive Web UI for local AI, with powerful features

A Gradio web UI for running large language models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA. Dropdown menu for switching between models. Notebook mode that resembles OpenAI's playground. Chat mode for conversation and role playing. Instruct mode compatible with Alpaca and Open Assistant formats. Nice HTML output for GPT-4chan. Markdown output for GALACTICA, including LaTeX rendering. Custom chat characters. Advanced chat features (send images, get audio responses with TTS)....

Downloads: 60 This Week

Last Update: 4 days ago
14

Fish Speech

SOTA Open Source TTS

Fish Speech is a state-of-the-art open-source text-to-speech project that has evolved into the OpenAudio series of advanced TTS models. The repository hosts the code and tooling for training, fine-tuning, and serving high-quality TTS, while the current flagship models (OpenAudio-S1 and S1-mini) are distributed via Fish Audio’s playground and Hugging Face. The models are evaluated with Seed TTS metrics and achieve exceptionally low word and character error rates, indicating strong intelligibility and alignment between text and audio. ...

Downloads: 13 This Week

Last Update: 2025-11-28
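Word and character error rates, the intelligibility metrics cited above, are edit distances between a reference text and a hypothesis, divided by the reference length. For TTS evaluation the hypothesis is typically an ASR transcript of the synthesized audio; that transcription step is outside this sketch, and the helper names below are hypothetical rather than taken from the Seed TTS evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution, or match at no cost
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

def cer(reference, hypothesis):
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # one substitution in six words
print(cer("kitten", "sitting"))                               # three edits over six characters
```

CER is often preferred for languages such as Chinese, where whitespace does not separate words, which is why TTS benchmarks frequently report both.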
15

OpenAI-Compatible Edge-TTS API

Free, high-quality text-to-speech API endpoint to replace OpenAI

OpenAI-Compatible Edge-TTS API is a local, OpenAI-compatible text-to-speech API that uses edge-tts—Microsoft Edge’s online TTS service—as the backend. The project emulates the /v1/audio/speech endpoint used by OpenAI, so any client that can talk to the OpenAI TTS API can be redirected to this service with minimal changes. It exposes parameters for input text, voice selection, audio format, and playback speed, mirroring the OpenAI interface while mapping popular OpenAI voice names to...

Downloads: 2 This Week

Last Update: 2025-11-28
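Since the service emulates OpenAI's `/v1/audio/speech` endpoint, a client request is a JSON POST carrying the input text, voice, audio format, and speed. The sketch below only builds such a request with the standard library and does not send it. The base URL, port, and API key are placeholders you would replace with your own deployment's values, and whether the server honors the `model` field is an assumption.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5050"  # assumption: wherever your instance is listening
API_KEY = "your_api_key_here"       # placeholder: use the key your server is configured with

def build_speech_request(text, voice="alloy", fmt="mp3", speed=1.0, model="tts-1"):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,            # assumption: may be ignored by an edge-tts backend
        "input": text,
        "voice": voice,            # an OpenAI voice name, mapped to an Edge voice by the service
        "response_format": fmt,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )

req = build_speech_request("Hello from a local TTS server.")
# With the server running, urllib.request.urlopen(req).read() should return the audio bytes.
```

The same request shape is what an existing OpenAI client would send, which is why pointing such a client's base URL at the local service is usually the only change needed.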
16

PersonaPlex

PersonaPlex code

...PersonaPlex also supports persona and voice control, allowing developers to define the role and speaking style of the agent using text prompts and voice conditioning, making it suitable for applications like customized voice assistants, interactive character agents, or domain-specific conversational tools. Internally, it processes continuous audio streams in a hybrid input format so that speech understanding and generation occur jointly.

Downloads: 1 This Week

Last Update: 2026-03-02
17

Sherloq

An open source digital image forensic toolset

Sherloq is a research-oriented toolkit designed for digital image forensics, providing an integrated environment to experiment with algorithms for image analysis and tampering detection. Rather than functioning as an automated decision-making system, it serves as a companion tool for researchers, enthusiasts, and students who want to explore forensic techniques from scientific literature and workshops. The project emphasizes transparency and community collaboration, contrasting with...

Downloads: 7 This Week

Last Update: 3 days ago
18

Speakr

Speakr is a personal, self-hosted web application

...Behind the scenes, Speakr leverages modern TTS engines and streaming audio technologies to deliver smooth and responsive speech generation without noticeable delay. The project is built with extensibility in mind, enabling developers to add custom voices, integrate additional languages, and tailor the backend for different hardware or cloud environments. It also supports saving generated audio as downloadable files so users can reuse the speech outputs in other projects, presentations, or media content.

Downloads: 0 This Week

Last Update: 2026-03-18
19

LiveAvatar

Streaming Real-time Audio-Driven Avatar Generation

LiveAvatar is an open-source research and implementation project that provides a unified framework for real-time, streaming, interactive avatar video generation driven by audio and other control signals. It implements techniques from state-of-the-art diffusion-based avatar modeling to support infinite-length continuous video generation with low latency, enabling interactive AI avatars that maintain continuity and realism over extended sessions.

Downloads: 0 This Week

Last Update: 2026-04-08
20

TorchAudio

Data manipulation and transformation for audio signal processing

The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch...

Downloads: 2 This Week

Last Update: 2026-02-17
21

Generative AI

Sample code and notebooks for Generative AI on Google Cloud

Generative AI is a comprehensive collection of code samples, notebooks, and demo applications designed to help developers build generative-AI workflows on the Vertex AI platform. It spans multiple modalities—text, image, audio, search (RAG/grounding) and more—showing how to integrate foundation models like the Gemini family into cloud projects. The README emphasises getting started with prompts, datasets, environments and sample apps, making it ideal for both experimentation and...

Downloads: 8 This Week

Last Update: 2 days ago
22

WhisperX

Automatic Speech Recognition with Word-level Timestamps

WhisperX is an advanced speech recognition system built on top of OpenAI’s Whisper model, designed to improve transcription accuracy and timing precision for long-form audio. It addresses key limitations of standard Whisper implementations by introducing voice activity detection and forced alignment techniques to produce word-level timestamps. The system enables batched inference, significantly increasing transcription speed while maintaining high accuracy. It is particularly effective for...

Downloads: 15 This Week

Last Update: 2026-04-06
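Voice activity detection is what makes batched inference possible: speech regions are grouped into chunks that fit Whisper's 30-second input window, and the chunks are then transcribed in parallel. The sketch below shows a simplified greedy merge of VAD segments into such chunks. It is not WhisperX's implementation; in particular it does not split a single over-long segment, which a full "cut & merge" step would also handle. `merge_segments` is a hypothetical name.

```python
MAX_CHUNK_S = 30.0  # Whisper processes audio in 30-second windows

def merge_segments(segments, max_len=MAX_CHUNK_S):
    """Greedily merge (start, end) speech segments into chunks spanning at most max_len seconds."""
    chunks = []
    for start, end in sorted(segments):
        if chunks and end - chunks[-1][0] <= max_len:
            # extending the current chunk keeps it within the window
            chunks[-1] = (chunks[-1][0], max(chunks[-1][1], end))
        else:
            chunks.append((start, end))
    return chunks

speech = [(0, 10), (12, 25), (27, 40), (41, 50)]  # hypothetical VAD output, in seconds
print(merge_segments(speech))  # [(0, 25), (27, 50)]
```

Each resulting chunk can be sent to the model as one item of a batch; word-level timestamps are then recovered separately by forced alignment.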
23

Claude Code Plugins

Intelligent automation and multi-agent orchestration for Claude Code

Claude Code Plugins is a lightweight framework designed to define, manage, and execute AI agents in a modular and extensible way, typically focusing on orchestrating tasks using large language models and tool integrations. The project provides abstractions for building agents that can interpret instructions, execute commands, and interact with external systems in a structured workflow. It emphasizes simplicity and composability, allowing developers to define agent behaviors through reusable...

Downloads: 6 This Week

Last Update: 5 days ago
24

VoxCPM

TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning

VoxCPM is a tokenizer-free text-to-speech system that models speech in a continuous space, aiming for extremely realistic, context-aware synthesis and true-to-life zero-shot voice cloning. Instead of converting speech into discrete tokens, it uses an end-to-end diffusion-autoregressive architecture built on the MiniCPM-4 backbone, combining hierarchical language modeling, finite scalar quantization (FSQ), and local Diffusion Transformers. This design helps decouple semantic and acoustic...

Downloads: 58 This Week

Last Update: 2026-04-08
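Finite scalar quantization (FSQ), one of the components named above, replaces a learned codebook with per-dimension rounding: each latent dimension is squashed into a bounded range and snapped to one of a few fixed levels. The sketch below is a forward-pass-only illustration of that general idea. It restricts itself to odd level counts, omits the straight-through gradient used during training, and is not VoxCPM's code; `fsq_quantize` is a hypothetical name.

```python
import math

def fsq_quantize(z, levels):
    """FSQ forward pass: bound each dimension with tanh, then round to one of L levels."""
    codes, values = [], []
    for x, n_levels in zip(z, levels):
        if n_levels < 3 or n_levels % 2 == 0:
            raise ValueError("this sketch assumes an odd number of levels >= 3")
        half = (n_levels - 1) / 2
        q = round(math.tanh(x) * half)   # integer in [-half, half]
        codes.append(int(q + half))      # shifted to 0..n_levels-1 for use as an index
        values.append(q / half)          # quantized value back in [-1, 1]
    return codes, values

codes, values = fsq_quantize([0.0, 2.0, -0.3], levels=[5, 5, 3])
print(codes, values)  # [2, 4, 1] [0.0, 1.0, 0.0]
```

The implicit codebook size is the product of the per-dimension level counts (here 5 × 5 × 3 = 75), so no codebook vectors need to be learned or kept from collapsing.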
25

OuteTTS

Interface for OuteTTS models

OuteTTS is an interface library for running OuteTTS text-to-speech models across a range of backends, making it easier to deploy the same model on different hardware and runtimes. It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines. The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face...

Downloads: 1 This Week

Last Update: 2025-11-28