Showing 474 open source projects for "visual"

View related business solutions
  • AestheticsPro Medical Spa Software Icon
    AestheticsPro Medical Spa Software

    Our new software release will dramatically improve your medspa business performance while enhancing the customer experience

    AestheticsPro is the most complete Aesthetics Software on the market today. HIPAA Cloud Compliant with electronic charting, integrated POS, targeted marketing and results driven reporting; AestheticsPro delivers the tools you need to manage your medical spa business. It is our mission To Provide an All-in-One Cutting Edge Software to the Aesthetics Industry.
    Learn More
  • The Most Powerful Software Platform for EHSQ and ESG Management Icon
    The Most Powerful Software Platform for EHSQ and ESG Management

    Addresses the needs of small businesses and large global organizations with thousands of users in multiple locations.

    Choose from a complete set of software solutions across EHSQ that address all aspects of top performing Environmental, Health and Safety, and Quality management programs.
    Learn More
  • 1
    Self-Operating Computer

    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    ...Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen. The framework supports features like Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting to enhance visual grounding capabilities. It is designed to be compatible with macOS, Windows, and Linux (with X server installed), and is released under the MIT license.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 2
    GalTransl

    GalTransl

    Automated translation solution for visual novels

    GalTransl is an automated translation system specifically designed for visual novels, particularly those in the “galgame” genre, leveraging large language models to streamline and enhance the translation process. It integrates support for multiple advanced LLM providers such as GPT-4, Claude, DeepSeek, and other models, enabling high-quality, context-aware translations that go beyond traditional machine translation approaches.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    Unredact

    Unredact

    A simple tool for reading in poorly redacted documents

    ...Unlike traditional optical character recognition (OCR), which only reads visible text, Unredact focuses on inferring missing content where redaction has been applied by analyzing surrounding context, font characteristics, and linguistic patterns to produce candidate reconstructions. It accepts a variety of input formats, automatically identifies redacted regions, and then generates text suggestions that are presented alongside visual overlays so users can choose or refine outputs.
    Downloads: 13 This Week
    Last Update:
    See Project
  • 4
    Moondream

    Moondream

    Tiny vision language model

    ...It serves as both a playground for the author’s artistic curiosity and a resource for other creative coders interested in generative art techniques. The repository may include shaders, canvas/WebGL code, visual demos, and utilities that demonstrate how mathematical functions or noise patterns can be harnessed for compelling visuals.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Award-Winning Medical Office Software Designed for Your Specialty Icon
    Award-Winning Medical Office Software Designed for Your Specialty

    Succeed and scale your practice with cloud-based, data-backed, AI-powered healthcare software.

    RXNT is an ambulatory healthcare technology pioneer that empowers medical practices and healthcare organizations to succeed and scale through innovative, data-backed, AI-powered software.
    Learn More
  • 5
    Depth Anything 3

    Depth Anything 3

    Recovering the Visual Space from Any Views

    Depth Anything 3 is a research-driven project that brings accurate and dense depth estimation to any input image or video, enabling foundational understanding of 3D structure from 2D visual content. Designed to work across diverse scenes, lighting conditions, and image types, it uses advanced neural networks trained on large, heterogeneous datasets, producing depth maps that reveal scene depth relationships and object surfaces with strong fidelity. The model can be applied to photography, AR/VR content creation, robotics perception, and 3D reconstruction workflows, making it versatile across industries and research domains. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 6
    Minegrub

    Minegrub

    A Grub Theme in the style of Minecraft!

    A Grub Theme in the style of Minecraft. A Grub theme inspired by Minecraft, adding visual enhancements to the bootloader.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 7
    CogVLM

    CogVLM

    A state-of-the-art open visual language model

    CogVLM is an open-source visual–language model suite—and its GUI-oriented sibling CogAgent—aimed at image understanding, grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    PySpur

    PySpur

    Visual tool for building, testing, and deploying AI agent workflows

    ...By offering a visual representation of workflows, PySpur makes it easier to debug interactions between components and identify failures in complex pipelines. It supports iterative experimentation, allowing developers to rapidly improve agents without rebuilding systems from scratch. PySpur also enables deployment of finalized workflows after testing, making it suitable for both development and production use.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    ViMax

    ViMax

    Director, Screenwriter, Producer, and Video Generator All-in-One

    ViMax is an open-source framework for performing large-scale multi-modal vision-language modeling and reasoning by combining powerful image encoders with advanced language models to solve complex visual tasks. It integrates components like visual encoders, cross-modal fusion techniques, and reasoning modules so that users can go beyond simple captioning or classification to perform tasks such as visual question answering, multi-image inference, and structured scene understanding. ViMax’s design accommodates large image sets and supports retrieval augmentation, enabling it to work with external image databases, supplementary metadata, and semantic search to enhance context awareness. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Skillfully - The future of skills based hiring Icon
    Skillfully - The future of skills based hiring

    Realistic Workplace Simulations that Show Applicant Skills in Action

    Skillfully transforms hiring through AI-powered skill simulations that show you how candidates actually perform before you hire them. Our platform helps companies cut through AI-generated resumes and rehearsed interviews by validating real capabilities in action. Through dynamic job specific simulations and skill-based assessments, companies like Bloomberg and McKinsey have cut screening time by 50% while dramatically improving hire quality.
    Learn More
  • 10
    Qwen-Image-Layered

    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editablity

    Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image encodings alone. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    ManiSkill

    ManiSkill

    SAPIEN Manipulation Skill Framework

    ...Developed by Hao Su Lab, it focuses on robotic manipulation with diverse, high-quality 3D tasks designed to challenge perception, control, and planning in robotics. ManiSkill provides both low-level control and visual observation spaces for realistic learning scenarios.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 12
    Book2_Beauty-of-Data-Visualization

    Book2_Beauty-of-Data-Visualization

    Machine Learning, Criticism and Correction

    Book2_Beauty-of-Data-Visualization is an open educational project that teaches the principles and techniques of effective data visualization using Python and modern plotting libraries. The repository focuses on both the technical and aesthetic aspects of visual analytics, helping learners understand how to communicate data clearly and persuasively. It includes practical examples that demonstrate how different chart types reveal patterns, trends, and distributions in real datasets. The material emphasizes visual storytelling and design thinking alongside coding implementation. By combining theory with hands-on plotting exercises, the book helps readers build both analytical and presentation skills. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Barfi

    Barfi

    A Python visual Flow Based Programming library

    A Python visual Flow-Based Programming library that integrates into your existing workflow. Barfi is a Flow-Based Programming environment that provides a graphical programming interface. It is integratable into your existing Python workflows. A schema is built using barfi.Blocks. Then the schema is executed with barfi.ComputeEngine. Each barfi.Block has some properties that enable the FBP and schema building.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    WiFi DensePose

    WiFi DensePose

    Turn WiFi signals into real-time human pose estimation and detection

    ...The repository includes components for data processing, model inference, and real-time visualization, making it suitable for research and experimental deployments. Its architecture emphasizes performance and reproducibility, allowing developers to explore non-visual motion capture systems using accessible hardware. Overall, WiFi DensePose functions as an advanced research-grade toolkit for WiFi-based human sensing and pose estimation.
    Downloads: 63 This Week
    Last Update:
    See Project
  • 15
    HunyuanWorld 1.0

    HunyuanWorld 1.0

    Generating Immersive, Explorable, and Interactive 3D Worlds

    ...The architecture integrates panoramic proxy generation, semantic layering, and hierarchical 3D reconstruction to produce high-quality scene-scale 3D worlds from both text and images. HunyuanWorld-1.0 surpasses existing open-source methods in visual quality and geometric consistency, demonstrated by superior scores in BRISQUE, NIQE, Q-Align, and CLIP metrics.
    Downloads: 10 This Week
    Last Update:
    See Project
  • 16
    Watermark-Removal

    Watermark-Removal

    Machine learning image inpainting task that removes watermarks

    Watermark-Removal repository is a machine learning project focused on removing visible watermarks from digital images using deep learning and image inpainting techniques. The system analyzes an image containing a watermark and attempts to reconstruct the underlying visual content so that the watermark is removed while preserving the original appearance of the image. The project uses neural network models inspired by research in contextual attention and gated convolution, which are methods commonly applied to image restoration tasks. Through these techniques, the model learns to identify regions of the image affected by the watermark and generate realistic replacements for the missing visual information. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    Janus

    Janus

    Unified Multimodal Understanding and Generation Models

    Janus is a sophisticated open-source project from DeepSeek AI that aims to unify both visual understanding and image generation in a single model architecture. Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Wan2.1

    Wan2.1

    Wan2.1: Open and Advanced Large-Scale Video Generative Model

    Wan2.1 is a foundational open-source large-scale video generative model developed by the Wan team, providing high-quality video generation from text and images. It employs advanced diffusion-based architectures to produce coherent, temporally consistent videos with realistic motion and visual fidelity. Wan2.1 focuses on efficient video synthesis while maintaining rich semantic and aesthetic detail, enabling applications in content creation, entertainment, and research. The model supports text-to-video and image-to-video generation tasks with flexible resolution options suitable for various GPU hardware configurations. ...
    Downloads: 96 This Week
    Last Update:
    See Project
  • 19
    Book4_Power-of-Matrix

    Book4_Power-of-Matrix

    Book_4_Matrix Power | The Iris Book: From Addition, Subtraction

    Book4_Power-of-Matrix is an open educational repository that forms part of the Visualize-ML book series, focusing on explaining matrix mathematics and linear algebra concepts through visual and intuitive methods. The project is designed to help readers progress from basic arithmetic toward machine learning fundamentals by building a strong conceptual understanding of vectors, matrices, and their operations. It combines explanatory text, diagrams, and Python examples to bridge theory and practical computation. The material emphasizes geometric interpretation and visual reasoning, which makes abstract linear algebra topics more accessible to beginners and self-learners. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    LTX-2

    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries, resource loaders, utilities for texture and buffer handling, and integration points for native event loops and input systems. ...
    Downloads: 77 This Week
    Last Update:
    See Project
  • 21
    Screenshot to Code

    Screenshot to Code

    A neural network that transforms a design mock-up into static websites

    Screenshot-to-code is a tool or prototype that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code representations, likely generating layouts, HTML, CSS, or markup from image inputs. It is part of a research/proof-of-concept domain in UI automation and image-to-UI code generation. Mapping visual design to code constructs. Code/UI layout (HTML, CSS, or markup). Examples/demo scripts showing “image UI code”.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    VOID

    VOID

    Video Object and Interaction Deletion

    VOID is an advanced AI video processing system developed by Netflix that focuses on removing objects from videos while preserving the physical and visual realism of the surrounding environment. Unlike traditional inpainting methods that only erase pixels or simple artifacts, VOID models the full interaction dynamics between objects and their environment, including shadows, reflections, and even physical consequences such as movement or balance changes. Built on top of transformer-based architectures and fine-tuned for video inpainting tasks, the system uses interaction-aware mask conditioning to ensure temporal consistency across frames. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 23
    VideoRAG

    VideoRAG

    "VideoRAG: Chat with Your Videos

    VideoRAG is a retrieval-augmented generation (RAG) framework tailored for video content that enables AI systems to answer questions, summarize, and reason over long videos by combining visual embeddings with contextual search. The system works by first breaking video into clips, extracting visual and audio-textual features, and indexing them into embeddings, then using an LLM with a retriever to pull relevant segments on demand. When a user query is received, VideoRAG locates semantically relevant moments in the video using the embedding index, retrieves associated clips or transcripts, and feeds them to a generative model to produce accurate, grounded answers or summaries. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    firerpa LAMDA

    firerpa LAMDA

    The most powerful Android RPA agent framework

    lamda is an Android RPA agent framework that provides visual remote desktop control and automation at scale, geared toward testing, automation validation, and device management. It exposes a clean UI to monitor and interact with connected devices and includes tooling to script actions reliably across apps and OS versions. The project emphasizes low-friction setup and powerful control primitives so teams can move from interactive validation to repeatable automation.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25
    DeepWiki Open

    DeepWiki Open

    AI-Powered Wiki Generator for GitHub/Gitlab/Bitbucket Repositories

    ...Users can enter a repository URL and the system will clone the project, build semantic embeddings of its codebase, extract architecture and relationships, generate human-readable documentation, and produce visual diagrams to help explain complex code structure. DeepWiki’s output turns raw repositories into interactive, web-style wikis complete with navigable sections, diagrams, and contextual explanations, making it easier for developers and collaborators to understand unfamiliar code. It includes an “Ask” feature that lets users query the generated wiki using RAG-style retrieval, enabling interactive question-answering and exploration.
    Downloads: 3 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB