audio/visual free download

Showing 19117 open source projects for "audio/visual"

1

visual-explainer

Agent skill + prompt templates that generate rich HTML pages

visual-explainer is an AI-oriented agent skill that converts complex terminal or analytical output into polished, human-readable HTML reports designed for quick comprehension and sharing. The project includes prompt templates and automation logic that enable coding agents to generate visual summaries such as diff reviews, architecture overviews, plan audits, and structured data tables.

Downloads: 1 This Week

Last Update: 2026-03-09
2

Step-Audio

Open-source framework for intelligent speech interaction

Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and produces speech accordingly, enabling natural dialogue, voice cloning, and expressive speech synthesis. ...

Downloads: 5 This Week

Last Update: 2026-03-16
3

Visual Blocks

Visual Blocks for ML is a Google visual programming framework

...Because everything lives in the browser, sharing is as simple as exporting a project or link, and collaborators can experiment without installing toolchains. For educators and product teams alike, Visual Blocks reduces the distance from idea to interactive proof-of-concept by turning ML diagrams.

Downloads: 0 This Week

Last Update: 2026-02-17
4

Visual Regression Tracker

Backend and Frontend application for tracking differences via image

Open source, self-hosted solution for visual testing and for managing the results of visual testing. The service receives images, performs pixel-by-pixel comparisons against the previously accepted baseline, and returns immediate results in order to catch unexpected changes. Use the provided libraries to integrate with existing automated suites by adding assertions based on image comparison.
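
The pixel-by-pixel baseline comparison described above can be illustrated with a minimal sketch. This is not the project's actual code or API: the function names `diff_ratio` and `matches_baseline`, the nested-list image representation, and the `tolerance` and `threshold` parameters are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical in-memory image: rows of (R, G, B) tuples.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def diff_ratio(baseline: Image, candidate: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels whose channels differ by more than `tolerance`."""
    same_shape = len(baseline) == len(candidate) and all(
        len(a) == len(b) for a, b in zip(baseline, candidate)
    )
    if not same_shape:
        return 1.0  # assumption: a size mismatch counts as a full difference

    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0

    changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for pixel_a, pixel_b in zip(row_a, row_b):
            # A pixel counts as changed if any channel exceeds the tolerance.
            if any(abs(ca - cb) > tolerance for ca, cb in zip(pixel_a, pixel_b)):
                changed += 1
    return changed / total


def matches_baseline(
    baseline: Image, candidate: Image, threshold: float = 0.0, tolerance: int = 0
) -> bool:
    """Accept the candidate when the share of changed pixels is within `threshold`."""
    return diff_ratio(baseline, candidate, tolerance) <= threshold
```

In a test suite, an assertion such as `assert matches_baseline(baseline, screenshot, threshold=0.01)` would fail the run whenever more than 1% of pixels drift from the accepted baseline.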

Downloads: 0 This Week

Last Update: 2026-01-09
5

Kimi-Audio

Audio foundation model excelling in audio understanding

Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one system, enabling developers to build rich, multimodal audio applications without stitching together disparate components. ...

Downloads: 2 This Week

Last Update: 2026-01-27
6

MLX-Audio

A text-to-speech, speech-to-text and speech-to-speech library

...The project provides a straightforward CLI (mlx_audio.tts.generate) as well as a Python API for programmatic generation of audio, including parameters for voice choice, speed, language hints, output format, and sample rate. It includes examples such as audiobook generation to demonstrate long-form synthesis and joined audio segments. On top of that, MLX-Audio offers a modern web interface powered by FastAPI, with real-time waveform and 3D visualizations, file upload, and audio management.

Downloads: 2 This Week

Last Update: 2026-03-30
7

Visual Studio Code

Modern IDE and code editor from Microsoft for Mac, Windows, and Linux

Visual Studio Code is updated monthly with new features and bug fixes. You can download it for Windows, macOS, and Linux on Visual Studio Code's website. To get the latest releases every day, install the Insiders build. Debug code right from the editor. Launch or attach to your running apps and debug with breakpoints, call stacks, and an interactive console.

1 Review

Downloads: 73 This Week

Last Update: 18 hours ago
8

Qwen2-Audio

Repo of Qwen2-Audio chat & pretrained large audio language model

Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive, voice-only input) and Audio Analysis (audio plus text instructions), with both base and instruction-tuned models.

Downloads: 0 This Week

Last Update: 2025-09-23
9

Qwen-Audio

Chat & pretrained large audio language model proposed by Alibaba Cloud

Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and output text. There is also an instruction-tuned version called Qwen-Audio-Chat which supports conversational interaction (multi-round), audio + text input, creative tasks and reasoning over audio.

Downloads: 2 This Week

Last Update: 2025-09-23
10

Visual Boy Advance - M

Emulator for the Game Boy, Game Boy Color, and Game Boy Advance

Visual Boy Advance - M (VBA-M) is an open-source emulator designed to run Game Boy, Game Boy Color, and Game Boy Advance games on modern systems. It is a continuation and improvement of the original Visual Boy Advance project, with enhanced accuracy, performance, and compatibility. VBA-M supports multiple platforms, making it accessible across Windows, macOS, and Linux environments.

Downloads: 53 This Week

Last Update: 2025-10-26
11

C# for Visual Studio Code

C# support for Visual Studio Code (powered by OmniSharp)

Welcome to the C# extension for Visual Studio Code! This extension provides the following features inside VS Code. Lightweight development tools for .NET Core. Great C# editing support, including Syntax Highlighting, IntelliSense, Go to Definition, Find All References, etc. Debugging support for .NET Core (CoreCLR). Note: Mono debugging is not supported. Desktop CLR debugging has limited support.

Downloads: 52 This Week

Last Update: 2026-03-17
12

Obsidian Visual Skills Pack

Generate Canvas, Excalidraw, and Mermaid diagrams from text

LLM-TLDR is a Python-based tool designed to dramatically reduce the amount of code a large language model needs to read by extracting the essential structure and context from a codebase and presenting only the most relevant parts to the model. Traditional approaches often dump entire files into a model’s context, which quickly exceeds token limits; LLM-TLDR instead indexes project structure, traces dependencies, and summarizes code in a way that preserves semantic relevance while shrinking...

Downloads: 0 This Week

Last Update: 2026-02-12
13

1D Visual Tokenization and Generation

This repo contains the code for 1D tokenizer and generator

The 1D Visual Tokenization and Generation project from ByteDance introduces a novel “one-dimensional” tokenizer designed for images: instead of representing images with large grids of 2D tokens (as in many prior generative/image-modeling systems), it compresses images into as few as 32 discrete tokens (or more, optionally) — thereby achieving a very compact, efficient representation that drastically speeds up generation and reconstruction while retaining strong fidelity.

Downloads: 0 This Week

Last Update: 2025-12-02
14

Audio Priority Bar

A native macOS menu bar app for managing audio device priorities

Audio Priority Bar is a lightweight macOS utility that gives users precise control over how audio output is prioritized across different apps and devices, filling a gap in the system audio stack that Apple doesn’t natively expose. Once installed, it places an always-accessible control in the menu bar that lets you assign priority levels to individual audio sources so that more important sounds (like alerts, calls, or music) can override or duck less important ones (like background noise or game audio). ...

Downloads: 1 This Week

Last Update: 2026-02-03
15

Cypress Visual Regression

Module for adding visual regression testing to Cypress

Downloads: 0 This Week

Last Update: 2026-01-26
16

Fun Audio Chat

Large Audio Language Model built for natural interactions

Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. ...

Downloads: 0 This Week

Last Update: 2026-02-27
17

Step-Audio-EditX

LLM-based Reinforcement Learning audio edit model

Step-Audio-EditX is an open-source, 3 billion-parameter audio model from StepFun AI designed to make expressive and precise editing of speech and audio as easy as text editing. Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” (via a dual-codebook tokenizer) — combining a linguistic token stream and a semantic (prosody/emotion/style) token stream — thereby abstracting audio editing into high-level token operations. ...

Downloads: 0 This Week

Last Update: 2026-04-09
18

Step-Audio 2

Multi-modal large language model designed for audio understanding

Step-Audio 2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation. Unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio 2 processes raw audio, reasons about semantic and paralinguistic content (such as emotion, speaker characteristics, and non-verbal cues), and can generate contextually appropriate responses, potentially including generated or transformed audio output. ...

Downloads: 0 This Week

Last Update: 2026-03-16
19

LTX-2.3

Official Python inference and LoRA trainer package

LTX-2.3 is an open-source multimodal artificial intelligence foundation model developed by Lightricks for generating synchronized video and audio from prompts or other inputs. Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while simultaneously producing corresponding audio elements such as speech, music, ambient sound, or effects. ...

Downloads: 191 This Week

Last Update: 2026-04-13
20

Prettier Formatter for Visual Studio

Visual Studio Code extension for Prettier

Prettier is an opinionated code formatter. It enforces a consistent style by parsing your code and re-printing it with its own rules that take the maximum line length into account, wrapping code when necessary. To ensure that this extension is used over other extensions you may have installed, be sure to set it as the default formatter in your VS Code settings. This setting can be set for all languages or by a specific language. If you want to disable Prettier on a particular language you...

Downloads: 12 This Week

Last Update: 2026-03-16
21

C/C++ for Visual Studio Code

Repository for the Microsoft C/C++ extension for VS Code

The C/C++ extension adds language support for C/C++ to Visual Studio Code, including features such as IntelliSense and debugging. C/C++ support for Visual Studio Code is provided by a Microsoft C/C++ extension to enable cross-platform C and C++ development on Windows, Linux, and macOS. C++ is a compiled language, meaning your program's source code must be translated (compiled) before it can be run on your computer.

Downloads: 74 This Week

Last Update: 5 days ago
22

Butterchurn

Butterchurn is a WebGL implementation of the Milkdrop Visualizer

Butterchurn is a WebGL-based music visualization engine that recreates the classic MilkDrop visualizer experience entirely in the browser using modern web technologies. It is designed to render complex, real-time audio-reactive graphics that respond dynamically to music input, producing highly immersive and fluid visual effects. The engine uses GPU acceleration through WebGL to achieve high performance, allowing it to handle intricate shader-based visualizations without overwhelming system resources. Butterchurn supports preset-based rendering, enabling users to load, customize, and switch between a wide variety of visual styles that evolve over time with the audio signal. ...

Downloads: 0 This Week

Last Update: 2026-04-08
23

LatentSync

Taming Stable Diffusion for Lip Sync

...The system leverages a U-Net diffusion backbone, with cross-attention of audio embeddings (via an audio encoder) and reference video frames to guide generation, and applies a set of loss functions (temporal, perceptual, sync-net based) to enforce lip-sync accuracy, visual fidelity, and temporal consistency. Over versions, LatentSync has improved temporal stability and lowered resource requirements — making inference more practical (e.g. 8 GB VRAM for earlier versions, somewhat higher for latest models).

Downloads: 6 This Week

Last Update: 2025-12-02
24

HunyuanVideo-Foley

Multimodal Diffusion with Representation Alignment

HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. It produces high-quality 48 kHz audio output suitable for professional use. ...

Downloads: 3 This Week

Last Update: 2025-09-28
25

S&box

s&box is a modern game engine, built on Valve's Source 2

...Built on a cutting-edge game engine, s&box allows creators to prototype, build, and share interactive game modes, tools, and environments using C#, JavaScript, and visual scripting, promoting accessible content creation for developers of varying skill levels. The platform emphasizes multiplayer and community experiences, giving creators direct control over networking, physics, rendering, and audio without needing to build those systems from scratch. With real-time recompilation and fast iteration loops, developers can see changes instantly, speeding up the creative process dramatically compared to traditional engines. ...

Downloads: 164 This Week

Last Update: 6 days ago