Search Results for "audio source separation" - Page 4

Sort By:

Showing 965 open source projects for "audio source separation"

View related business solutions

Python Clear Filters & Widen Search

Airlock Digital - Application Control (Allowlisting) Made Simple
Airlock Digital delivers an easy-to-manage and scalable application control solution to protect endpoints with confidence.

For organizations seeking the most effective way to prevent malware and ransomware in their environments. It has been designed to provide scalable, efficient endpoint security for organizations with even the most diverse architectures and rigorous compliance requirements. Built by practitioners for the world’s largest and most secure organizations, Airlock Digital delivers precision Application Control & Allowlisting for the modern enterprise.

Learn More
Rev Your Digital Product Delivery Engine
Enterprise-grade platform designed to connect strategy, planning, and execution across digital product development and software delivery

Planview links your technology vision directly to teams' daily work, providing complete visibility and control over your digital product delivery ecosystem.

Learn More
1

Open Notebook

An Open Source implementation of Notebook LM with more flexibility

Open Notebook is an open-source, privacy-focused alternative to Google’s Notebook LM that gives users full control over their research and AI workflows. Designed to be self-hosted, it ensures complete data sovereignty by keeping your content local or within your own infrastructure. The platform supports 16+ AI providers—including OpenAI, Anthropic, Ollama, Google, and LM Studio—allowing flexible model choice and cost optimization.

Downloads: 39 This Week

Last Update: 6 hours ago
See Project
2

OuteTTS

Interface for OuteTTS models

OuteTTS is an interface library for running OuteTTS text-to-speech models across a range of backends, making it easier to deploy the same model on different hardware and runtimes. It provides a high-level Interface API that wraps model configuration, speaker handling, and audio generation so you can focus on integrating speech into your application rather than wiring up low-level engines. The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face...

Downloads: 1 This Week

Last Update: 2025-11-28
See Project
3

DoWhy

DoWhy is a Python library for causal inference

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks. Much like machine learning libraries have done for prediction, DoWhy is a Python library that aims to spark causal thinking and analysis. DoWhy provides a wide variety of algorithms for effect estimation, causal structure learning, diagnosis of causal...

Downloads: 1 This Week

Last Update: 2025-11-03
See Project
4

Whisper

Robust Speech Recognition via Large-Scale Weak Supervision

OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented...

Downloads: 69 This Week

Last Update: 2025-06-26
See Project
The leading LMS solution for mission critical learning needs
it takes the modern learning environment to workforce enablement and beyond.

Streamline and integrate your complex learning, compliance, content monetization, and external training capabilities while keeping your people safe and delivering profits with Seertech’s LMS solution.

Learn More
5

Auto Synced & Translated Dubs

Automatically translates the text of a video based on a subtitle file

Auto-Synced-Translated-Dubs is a toolchain that automatically translates and re-dubs videos using AI voices while keeping the new speech aligned to the original timing via subtitle files. It assumes you have a human-made SRT (or similar) subtitle file; the script then uses translation services such as Google Cloud or DeepL to generate translated subtitle tracks in one or more target languages. Using the timestamps of each subtitle line, it computes the required duration of each spoken...

Downloads: 2 This Week

Last Update: 2025-11-28
See Project
6

StableSwarmUI

Multi-user UI for managing and running Stable Diffusion workflows tool

StableSwarmUI is a web-based interface designed to manage and coordinate Stable Diffusion image generation workflows in a multi-user environment. It focuses on enabling multiple users to interact with shared resources, making it suitable for collaborative or server-based deployments. It provides a centralized system where users can submit, monitor, and manage generation tasks through a browser interface. It abstracts much of the complexity involved in running diffusion models by offering a...

Downloads: 4 This Week

Last Update: 2026-03-18
See Project
7

FeelUOwn

Trying to be a robust, user-friendly and hackable music player

FeelUOwn is a user-friendly, and hackable music player.

Downloads: 1 This Week

Last Update: 2026-03-02
See Project
8

Label Studio

Label Studio is a multi-type data labeling and annotation tool

...Detect objects on image, bboxes, polygons, circular, and keypoints supported. Partition image into multiple segments. Use ML models to pre-label and optimize the process. Label Studio is an open-source data labeling tool. It lets you label data types like audio, text, images, videos, and time series with a simple and straightforward UI and export to various model formats. It can be used to prepare raw data or improve existing training data to get more accurate ML models. The frontend part of Label Studio app lies in the frontend/ folder and written in React JSX. ...

Downloads: 30 This Week

Last Update: 2026-03-13
See Project
9

Tauon

The music player of today

Tauon is a modern, streamlined music player app that's packed with features! An emphasis on playlists and drag-and-drop importing puts you in control of your music library. Faded volume control, 24-bit FLAC support, and gapless playback provide the ultimate listening experience. Excellent CUE sheet support, an original smart playlist system, and network playback from koel or Airsonic servers. Last.fm, Listenbrainz, and Maloja scribbling. MPRIS2 support for desktop integration. Tauon is a...

Downloads: 6 This Week

Last Update: 2026-04-05
See Project
Discover the power of eDiscovery for law firms.
Streamline your legal processes and ensure compliance with our eDiscovery company.

DWR eDiscovery allows legal professionals to process, analyze, review, and produce documents that are relevant to litigation and other legal disclosure obligations. Our tools allow easy ingestion and analysis of client and opposing party documents using a comprehensive set of document review features including AI search, keyword search, keyword highlighting, metadata filtering, marking documents, privilege log management, redactions, and a range of analysis tools to help users best understand their document corpus.

Learn More
10

AI-Media2Doc

AI tool converting video/audio into structured documents instantly

AI-Media2Doc is a web-based application that uses large language models to convert video and audio content into structured, readable documents in a single workflow. It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not...

Downloads: 4 This Week

Last Update: 2026-03-18
See Project
11

YuE

Open source AI model for generating full songs from lyrics prompts

YuE is an open source project that provides a foundation model designed for full-song music generation using artificial intelligence. It focuses on transforming text inputs such as lyrics and genre prompts into complete musical compositions that include both vocal and instrumental tracks. Unlike many shorter audio generators, the model is capable of producing songs that last several minutes while maintaining coherent musical structure and alignment with the provided lyrics. ...

Downloads: 4 This Week

Last Update: 5 days ago
See Project
12

Unsloth Studio

Unified web UI for training and running open models locally

Unsloth Studio is a web-based interface for running and training AI models locally with a unified and user-friendly experience. It allows users to work with a wide range of models for text, audio, vision, embeddings, and more without relying heavily on cloud infrastructure. Built on top of the Unsloth framework, it focuses on high-performance training with reduced VRAM usage and faster speeds compared to traditional methods. The platform supports fine-tuning, pretraining, and reinforcement...

Downloads: 19 This Week

Last Update: 2026-04-08
See Project
13

pyttsx3

Offline Text To Speech synthesis for python

pyttsx3 is an offline text-to-speech library for Python that wraps native speech engines instead of calling cloud APIs. It is designed to work entirely without an internet connection, making it suitable for local automation, kiosks, accessibility tools, and embedded applications. On Windows it uses SAPI5, on Linux it typically uses eSpeak or eSpeak-NG, and on macOS it can use NSSpeechSynthesizer or AVSpeechSynthesizer, giving it broad cross-platform compatibility. The library exposes a...

Downloads: 17 This Week

Last Update: 2025-11-28
See Project
14

BlogWizard

Generate blog articles from video or audio

BlogWizard is a demo/utility project built on top of Groq’s LLM infrastructure that converts video or audio content into well-structured blog posts, enabling creators to repurpose multimedia content into text — useful for SEO, accessibility, or reaching audiences that prefer reading. The tool uses transcription (e.g. via Whisper) to extract text from audio/video, then runs an LLM-based generation pipeline to transform that content into coherent, readable blog-format posts — with sections,...

Downloads: 0 This Week

Last Update: 2025-12-19
See Project
15

notebooklm-py

Unofficial Python API and agentic skill for Google NotebookLM

...Its goal is to provide programmatic access not just to standard notebook operations, but also to many capabilities that are either limited or unavailable in the web interface, making it especially useful for automation and custom pipelines. The project covers notebook management, source ingestion, conversational querying, research workflows, and sharing controls, while also enabling the generation of a wide range of study and media artifacts. These outputs include audio overviews, videos, slide decks, infographics, quizzes, flashcards, reports, data tables, and mind maps, with configurable formats and export options.

Downloads: 7 This Week

Last Update: 2026-03-17
See Project
16

HY-World 1.5

A Systematic Framework for Interactive World Modeling

HY-WorldPlay is a Hunyuan AI project focusing on immersive multimodal content generation and interaction within virtual worlds or simulated environments. It aims to empower AI agents with the capability to both understand and generate multimedia content — including text, audio, image, and potentially 3D or game-world elements — enabling lifelike dialogue, environmental interpretations, and responsive world behavior. The platform targets use cases in digital entertainment, game worlds,...

Downloads: 18 This Week

Last Update: 4 days ago
See Project
17

MARS5

MARS5 speech model (TTS) from CAMB.AI

...To control speaker identity, MARS5 uses a short reference audio clip, typically between 2 and 12 seconds, from which it learns the voice characteristics. It supports two main inference modes: shallow clone, which is faster and only needs the reference audio, and deep clone, which additionally uses the transcript of the reference audio to increase similarity and naturalness at the cost of more computation.

Downloads: 0 This Week

Last Update: 2025-11-28
See Project
18

SimpleTuner

A general fine-tuning kit geared toward image/video/audio diffusion

SimpleTuner is an open-source toolkit designed to simplify the fine-tuning of modern diffusion models for generating images, video, and audio. The project focuses on providing a clear and understandable training environment for researchers, developers, and artists who want to customize generative AI models without navigating complex machine learning pipelines.

Downloads: 1 This Week

Last Update: 5 days ago
See Project
19

ComfyUI

The most powerful and modular diffusion model GUI, api and backend

The most powerful and modular diffusion model is GUI and backend. This UI will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart-based interface. We are a team dedicated to iterating and improving ComfyUI, supporting the ComfyUI ecosystem with tools like node manager, node registry, cli, automated testing, and public documentation. Open source AI models will win in the long run against closed models and we are only at the beginning. Our core mission...

Downloads: 140 This Week

Last Update: 2 days ago
See Project
20

EPUB to Audiobook Converter

EPUB to audiobook converter, optimized for Audiobookshelf

EPUB to Audiobook Converter is a tool designed to convert EPUB ebooks into chaptered audiobooks, optimized specifically for Audiobookshelf servers. It reads each chapter from an EPUB file, generates audio using a chosen text-to-speech backend, and outputs separate MP3 files with chapter titles preserved as metadata to make navigation easier. The project supports multiple TTS providers, including Microsoft Azure TTS, EdgeTTS, OpenAI TTS, local Piper, and Kokoro via an OpenAI-compatible...

Downloads: 16 This Week

Last Update: 2026-02-02
See Project
21

Hamilton DAGWorks

Helps scientists define testable, modular, self-documenting dataflow

Hamilton is a lightweight Python library for directed acyclic graphs (DAGs) of data transformations. Your DAG is portable; it runs anywhere Python runs, whether it's a script, notebook, Airflow pipeline, FastAPI server, etc. Your DAG is expressive; Hamilton has extensive features to define and modify the execution of a DAG (e.g., data validation, experiment tracking, remote execution). To create a DAG, write regular Python functions that specify their dependencies with their parameters. As...

Downloads: 0 This Week

Last Update: 2025-10-11
See Project
22

GLM-TTS

Controllable & emotion-expressive zero-shot TTS

GLM-TTS is an advanced text-to-speech synthesis system built on large language model technologies that focuses on producing high-quality, expressive, and controllable spoken output, including features like emotion modulation and zero-shot voice cloning. It uses a two-stage architecture where a generative LLM first converts text into intermediate speech token sequences and then a Flow-based neural model converts those tokens into natural audio waveforms, enabling rich prosody and voice...

Downloads: 3 This Week

Last Update: 2026-04-10
See Project
23

ElevenLabs Python

The official Python SDK for the ElevenLabs API

elevenlabs-python is the official Python SDK for the ElevenLabs API, giving developers a convenient way to access ElevenLabs’ high-quality, lifelike voices. The library wraps the HTTP API into a typed Python client, so you can perform text-to-speech, streaming, voice cloning, voice management, and agents-related operations with simple method calls. It exposes ElevenLabs’ main models such as Eleven Multilingual v2, Eleven Flash v2.5, and Eleven Turbo v2.5, each targeting different trade-offs...

Downloads: 3 This Week

Last Update: 6 days ago
See Project
24

RealtimeTTS

Converts text to speech in realtime

RealtimeTTS is a low-latency text-to-speech library built for real-time applications such as voice chat with LLMs, assistants, and interactive tools. It is designed around a streaming model: you can feed it text incrementally (for example, as an LLM responds) and get audio output almost immediately, which keeps end-to-end latency very low. The library is engine-agnostic and plugs into a wide range of cloud and local TTS systems, including OpenAI, ElevenLabs, Azure, Coqui, Piper, StyleTTS2,...

Downloads: 4 This Week

Last Update: 2026-03-28
See Project
25

ChatGPT Clone

ChatGPT interface with better UI

ChatGPT Clone demonstrates a ChatGPT-style conversational interface wired to large-language-model backends, packaged so developers can self-host and extend. The goal is to replicate the core chat UX—message history, streaming tokens, code blocks, and system prompts—while letting you plug in different provider APIs or local models. It showcases a clean separation between the web client and the message orchestration layer so you can experiment with prompts, roles, and memory strategies. The...

Downloads: 3 This Week

Last Update: 2025-11-07
See Project