Showing 19117 open source projects for "audio/visual"

  • 1
    visual-explainer

    Agent skill + prompt templates that generate rich HTML pages

    visual-explainer is an AI-oriented agent skill that converts complex terminal or analytical output into polished, human-readable HTML reports designed for quick comprehension and sharing. The project includes prompt templates and automation logic that enable coding agents to generate visual summaries such as diff reviews, architecture overviews, plan audits, and structured data tables.
    Downloads: 1 This Week
  • 2
    Step-Audio

    Open-source framework for intelligent speech interaction

    Step-Audio is a unified, open-source framework aimed at building intelligent speech systems that combine both comprehension and generation: it integrates large language models (LLMs) with speech input/output to handle not only semantic understanding but also rich vocal characteristics like tone, style, dialect, emotion, and prosody. The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and produces speech accordingly, enabling natural dialogue, voice cloning, and expressive speech synthesis. ...
    Downloads: 5 This Week
  • 3
    Visual Blocks

    Visual Blocks for ML is a Google visual programming framework

    ...Because everything lives in the browser, sharing is as simple as exporting a project or link, and collaborators can experiment without installing toolchains. For educators and product teams alike, Visual Blocks reduces the distance from idea to interactive proof-of-concept by turning ML diagrams.
    Downloads: 0 This Week
  • 4
    Visual Regression Tracker

    Backend and Frontend application for tracking differences via image

    Open source, self-hosted solution for running visual tests and managing their results. The service receives images, compares each one pixel by pixel with its previously accepted baseline, and returns immediate results so that unexpected changes are caught. Use the implemented libraries to integrate with existing automated suites by adding assertions based on image comparison.
    Downloads: 0 This Week
  • 5
    Kimi-Audio

    Audio foundation model excelling in audio understanding

    Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one system, enabling developers to build rich, multimodal audio applications without stitching together disparate components. ...
    Downloads: 2 This Week
  • 6
    MLX-Audio

    A text-to-speech, speech-to-text and speech-to-speech library

    ...The project provides a straightforward CLI (mlx_audio.tts.generate) as well as a Python API for programmatic generation of audio, including parameters for voice choice, speed, language hints, output format, and sample rate. It includes examples such as audiobook generation to demonstrate long-form synthesis and joined audio segments. On top of that, MLX-Audio offers a modern web interface powered by FastAPI, with real-time waveform and 3D visualizations, file upload, and audio management.
    Downloads: 2 This Week
  • 7
    Visual Studio Code

    Modern IDE and code editor from Microsoft for Mac, Windows, and Linux

    Visual Studio Code is updated monthly with new features and bug fixes. You can download it for Windows, macOS, and Linux on Visual Studio Code's website. To get the latest releases every day, install the Insiders build. Debug code right from the editor. Launch or attach to your running apps and debug with breakpoints, call stacks, and an interactive console.
    Downloads: 73 This Week
  • 8
    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    Qwen2-Audio is a large audio-language model by Alibaba Cloud, part of the Qwen series. It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive, voice-only input) and Audio Analysis (audio plus text instructions). Both base and instruction-tuned models are available.
    Downloads: 0 This Week
  • 9
    Qwen-Audio

    Chat & pretrained large audio language model proposed by Alibaba Cloud

    Qwen-Audio is a large audio-language model developed by Alibaba Cloud, built to accept various types of audio input (speech, natural sounds, music, singing) along with text input, and output text. There is also an instruction-tuned version called Qwen-Audio-Chat which supports conversational interaction (multi-round), audio + text input, creative tasks and reasoning over audio.
    Downloads: 2 This Week
  • 10
    Visual Boy Advance - M

    Emulator for the Game Boy, Game Boy Color, and Game Boy Advance

    Visual Boy Advance - M (VBA-M) is an open-source emulator designed to run Game Boy, Game Boy Color, and Game Boy Advance games on modern systems. It is a continuation and improvement of the original Visual Boy Advance project, with enhanced accuracy, performance, and compatibility. VBA-M supports multiple platforms, making it accessible across Windows, macOS, and Linux environments.
    Downloads: 53 This Week
  • 11
    C# for Visual Studio Code

    C# support for Visual Studio Code (powered by OmniSharp)

    Welcome to the C# extension for Visual Studio Code! This extension provides the following features inside VS Code. Lightweight development tools for .NET Core. Great C# editing support, including Syntax Highlighting, IntelliSense, Go to Definition, Find All References, etc. Debugging support for .NET Core (CoreCLR). Note: Mono debugging is not supported. Desktop CLR debugging has limited support.
    Downloads: 52 This Week
  • 12
    Obsidian Visual Skills Pack

    Generate Canvas, Excalidraw, and Mermaid diagrams from text

    LLM-TLDR is a Python-based tool designed to dramatically reduce the amount of code a large language model needs to read by extracting the essential structure and context from a codebase and presenting only the most relevant parts to the model. Traditional approaches often dump entire files into a model’s context, which quickly exceeds token limits; LLM-TLDR instead indexes project structure, traces dependencies, and summarizes code in a way that preserves semantic relevance while shrinking...
    Downloads: 0 This Week
  • 13
    1D Visual Tokenization and Generation

    This repo contains the code for 1D tokenizer and generator

    The 1D Visual Tokenization and Generation project from ByteDance introduces a novel “one-dimensional” tokenizer designed for images: instead of representing images with large grids of 2D tokens (as in many prior generative/image-modeling systems), it compresses images into as few as 32 discrete tokens (or more, optionally) — thereby achieving a very compact, efficient representation that drastically speeds up generation and reconstruction while retaining strong fidelity.
    Downloads: 0 This Week
  • 14
    Audio Priority Bar

    A native macOS menu bar app for managing audio device priorities

    Audio Priority Bar is a lightweight macOS utility that gives users precise control over how audio output is prioritized across different apps and devices, filling a gap in the system audio stack that Apple doesn’t natively expose. Once installed, it places an always-accessible control in the menu bar that lets you assign priority levels to individual audio sources so that more important sounds (like alerts, calls, or music) can override or duck less important ones (like background noise or game audio). ...
    Downloads: 1 This Week
  • 15
    Cypress Visual Regression

    Module for adding visual regression testing to Cypress.
    Downloads: 0 This Week
  • 16
    Fun Audio Chat

    Large Audio Language Model built for natural interactions

    Fun Audio Chat is an interactive voice-first conversational AI platform designed to let users engage in natural spoken dialogue with large language models in real time, turning speech into context-aware responses while maintaining a smooth back-and-forth experience. It combines speech recognition, audio processing, and AI generation so users can speak simply and receive spoken replies, enabling applications such as virtual assistants, voice bots, and hands-free chat interfaces. ...
    Downloads: 0 This Week
  • 17
    Step-Audio-EditX

    LLM-based Reinforcement Learning audio edit model

    Step-Audio-EditX is an open-source, 3 billion-parameter audio model from StepFun AI designed to make expressive and precise editing of speech and audio as easy as text editing. Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” (via a dual-codebook tokenizer) — combining a linguistic token stream and a semantic (prosody/emotion/style) token stream — thereby abstracting audio editing into high-level token operations. ...
    Downloads: 0 This Week
  • 18
    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. ...
    Downloads: 0 This Week
  • 19
    LTX-2.3

    Official Python inference and LoRA trainer package

    LTX-2.3 is an open-source multimodal artificial intelligence foundation model developed by Lightricks for generating synchronized video and audio from prompts or other inputs. Unlike most earlier video generation systems that only produced silent clips, LTX-2 combines video and audio generation in a unified architecture capable of producing coherent audiovisual scenes. The model uses a diffusion-transformer-based architecture designed to generate high-fidelity visual frames while simultaneously producing corresponding audio elements such as speech, music, ambient sound, or effects. ...
    Downloads: 191 This Week
  • 20
    Prettier Formatter for Visual Studio

    Visual Studio Code extension for Prettier

    Prettier is an opinionated code formatter. It enforces a consistent style by parsing your code and re-printing it with its own rules that take the maximum line length into account, wrapping code when necessary. To ensure that this extension is used over other extensions you may have installed, be sure to set it as the default formatter in your VS Code settings. This setting can be set for all languages or by a specific language. If you want to disable Prettier on a particular language you...
    Downloads: 12 This Week
  • 21
    C/C++ for Visual Studio Code

    Repository for the Microsoft C/C++ extension for VS Code

    The C/C++ extension adds language support for C/C++ to Visual Studio Code, including features such as IntelliSense and debugging. C/C++ support for Visual Studio Code is provided by a Microsoft C/C++ extension to enable cross-platform C and C++ development on Windows, Linux, and macOS. C++ is a compiled language, meaning your program's source code must be translated (compiled) before it can be run on your computer.
    Downloads: 74 This Week
  • 22
    Butterchurn

    Butterchurn is a WebGL implementation of the Milkdrop Visualizer

    Butterchurn is a WebGL-based music visualization engine that recreates the classic MilkDrop visualizer experience entirely in the browser using modern web technologies. It is designed to render complex, real-time audio-reactive graphics that respond dynamically to music input, producing highly immersive and fluid visual effects. The engine uses GPU acceleration through WebGL to achieve high performance, allowing it to handle intricate shader-based visualizations without overwhelming system resources. Butterchurn supports preset-based rendering, enabling users to load, customize, and switch between a wide variety of visual styles that evolve over time with the audio signal. ...
    Downloads: 0 This Week
  • 23
    LatentSync

    Taming Stable Diffusion for Lip Sync

    ...The system leverages a U-Net diffusion backbone, with cross-attention of audio embeddings (via an audio encoder) and reference video frames to guide generation, and applies a set of loss functions (temporal, perceptual, sync-net based) to enforce lip-sync accuracy, visual fidelity, and temporal consistency. Over versions, LatentSync has improved temporal stability and lowered resource requirements — making inference more practical (e.g. 8 GB VRAM for earlier versions, somewhat higher for latest models).
    Downloads: 6 This Week
  • 24
    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. It produces high-quality 48 kHz audio output suitable for professional use. ...
    Downloads: 3 This Week
  • 25
    S&box

    s&box is a modern game engine, built on Valve's Source 2

    ...Built on a cutting-edge game engine, s&box allows creators to prototype, build, and share interactive game modes, tools, and environments using C#, JavaScript, and visual scripting, promoting accessible content creation for developers of varying skill levels. The platform emphasizes multiplayer and community experiences, giving creators direct control over networking, physics, rendering, and audio without needing to build those systems from scratch. With real-time recompilation and fast iteration loops, developers can see changes instantly, speeding up the creative process dramatically compared to traditional engines. ...
    Downloads: 164 This Week