Showing 118 open source projects for "computer vision"

  • 1
    Raster Vision

    Open source framework for deep learning satellite and aerial imagery

    Raster Vision is an open source framework for Python developers building computer vision models on satellite, aerial, and other large imagery sets (including oblique drone imagery). There is built-in support for chip classification, object detection, and semantic segmentation using PyTorch. Raster Vision allows engineers to quickly and repeatably configure pipelines that go through core components of a machine learning workflow: analyzing training data, creating training chips, training models, creating predictions, evaluating models, and bundling the model files and configuration for easy deployment. ...
    Downloads: 0 This Week
  • 2
    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    The Self-Operating Computer Framework is an innovative system that enables multimodal models to autonomously operate a computer by interpreting the screen and executing mouse and keyboard actions to achieve specified objectives. This framework is compatible with various multimodal models and currently integrates with GPT-4o, o1, Gemini Pro Vision, Claude 3, and LLaVa.
    Downloads: 8 This Week
  • 3
    Vision Transformer Pytorch

    Implementation of Vision Transformer, a simple way to achieve SOTA

    This repository provides a from-scratch, minimalist implementation of the Vision Transformer (ViT) in PyTorch, focusing on the core architectural pieces needed for image classification. It breaks down the model into patch embedding, positional encoding, multi-head self-attention, feed-forward blocks, and a classification head so you can understand each component in isolation. The code is intentionally compact and modular, which makes it easy to tinker with hyperparameters, depth, width, and...
    Downloads: 11 This Week
  • 4
    Kornia

    Open Source Differentiable Computer Vision Library

    ...With Kornia we fill the gap between classical and deep computer vision by implementing standard and advanced vision algorithms for AI. Our libraries and initiatives always follow the community's needs.
    Downloads: 9 This Week
  • 5
    AskUI Vision Agent

    Enable AI to control your desktop, mobile and HMI devices

    ...The repository presents a feature overview, sample media, and frequent release notes, which show ongoing improvements such as CORS checks and other operational tweaks. The broader AskUI documentation covers the Python Vision Agent along with suite services and inference APIs, indicating a productized ecosystem rather than a single library. Community-curated lists also recognize Vision Agent as part of the broader “GUI agents” landscape, placing it among other computer-use agents.
    Downloads: 9 This Week
  • 6
    Computer vision projects

    computer vision projects | Fun AI projects related to computer vision

    Computer vision projects is an open-source collection of computer vision projects and experiments that demonstrate practical applications of modern AI techniques in image processing, robotics, and real-time visual analysis. The repository includes multiple demonstration systems implemented using languages such as Python and C++, covering topics ranging from object detection to embedded vision systems.
    Downloads: 2 This Week
  • 7
    Phi-3-MLX

    Phi-3.5 for Mac: Locally-run Vision and Language Models

    Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.
    Downloads: 7 This Week
  • 8
    Albumentations

    Fast image augmentation library and an easy-to-use wrapper

    Albumentations is a Python library for fast and flexible image augmentation, a computer vision tool that boosts the performance of deep convolutional neural networks. It efficiently implements a rich variety of image transform operations that are optimized for performance, while providing a concise yet powerful augmentation interface for different computer vision tasks, including object classification, segmentation, and detection. ...
    Downloads: 1 This Week
  • 9
    CleanVision

    Automatically find issues in image datasets

    ...CleanVision helps you automatically identify common types of data issues lurking in image datasets. This package currently detects issues in the raw images themselves, making it a useful tool for any computer vision task such as: classification, segmentation, object detection, pose estimation, keypoint detection, generative modeling, etc.
    Downloads: 4 This Week
  • 10
    TorchIO

    Medical imaging toolkit for deep learning

    ...Transforms include typical computer vision operations such as random affine transformations and also domain-specific ones such as simulation of intensity artifacts due to MRI magnetic field inhomogeneity.
    Downloads: 8 This Week
  • 11
    MESHROOM

    3D reconstruction software

    ...Photography is the projection of a 3D scene onto a 2D plane, losing depth information. The goal of photogrammetry is to reverse this process. The dense modeling of the scene is the result of chaining two computer vision-based pipelines, “Structure-from-Motion” (SfM) and “Multi View Stereo” (MVS). Features include fusion of multi-bracketed LDR images into HDR; alignment of panorama images; support for fisheye optics, with automatic estimation or manual editing of the fisheye circle; use of motorized-head files; and easy integration into your render farm system. ...
    Downloads: 105 This Week
  • 12
    R1-V

    Witness the aha moment of VLM with less than $3

    R1-V is an initiative aimed at enhancing the generalization capabilities of Vision-Language Models (VLMs) through Reinforcement Learning in Visual Reasoning (RLVR). The project focuses on building a comprehensive framework that emphasizes algorithm enhancement, efficiency optimization, and task diversity to achieve general vision-language intelligence and visual/GUI agents. The team's long-term goal is to contribute impactful open-source research in this domain.
    Downloads: 0 This Week
  • 13
    SAM 2

    The repository provides code for running inference with SAM 2

    SAM2 is a next-generation version of the Segment Anything Model (SAM), designed to improve performance, generalization, and efficiency in promptable image segmentation tasks. It retains the core promptable interface—accepting points, boxes, or masks—but incorporates architectural and training enhancements to produce higher-fidelity masks, better boundary adherence, and robustness to complex scenes. The updated model is optimized for faster inference and lower memory use, enabling real-time...
    Downloads: 8 This Week
  • 14
    SAHI

    A lightweight vision library for performing large-scale object detection

    A lightweight vision library for performing large-scale object detection and instance segmentation. Object detection and instance segmentation are by far the most important application areas in computer vision. However, detecting small objects and running inference on large images remain major issues in practical usage. SAHI helps developers overcome these real-world problems with many vision utilities.
    Downloads: 0 This Week
  • 15
    X-AnyLabeling

    Effortless data labeling with AI support from Segment Anything

    X-AnyLabeling is an open-source data annotation platform designed to streamline the process of labeling datasets for computer vision and multimodal AI applications. The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks. It supports labeling tasks across images and videos and enables developers to prepare training datasets for tasks such as object detection, segmentation, classification, tracking, and pose estimation. ...
    Downloads: 61 This Week
  • 16
    Colossal-AI

    Making large AI models cheaper, faster and more accessible

    The Transformer architecture has improved the performance of deep learning models in domains such as computer vision and natural language processing. Better performance comes with larger model sizes, which run up against the memory wall of current accelerator hardware such as GPUs. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine. There is an urgent demand to train models in a distributed environment. ...
    Downloads: 0 This Week
  • 17
    torchvision

    Datasets, transforms and models specific to Computer Vision

    The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. We recommend Anaconda as the Python package management system. Torchvision currently supports the following image backends: Pillow (the default); Pillow-SIMD, a much faster drop-in replacement for Pillow with SIMD, which is used as the default if installed; and accimage, which, if installed, can be activated by calling torchvision.set_image_backend('accimage'). It also supports libpng, which can be installed via conda (conda install libpng) or any of the package managers for Debian-based and RHEL-based Linux distributions, and libjpeg, which can be installed via conda (conda install jpeg) or any of the same package managers. ...
    Downloads: 2 This Week
  • 18
    FiftyOne

    The open-source tool for building high-quality datasets

    The open-source tool for building high-quality datasets and computer vision models. Nothing hinders the success of machine learning systems more than poor-quality data. And without the right tools, improving a model can be time-consuming and inefficient. FiftyOne supercharges your machine learning workflows by enabling you to visualize datasets and interpret models faster and more effectively. Improving data quality and understanding your model’s failure modes are the most impactful ways to boost the performance of your model. ...
    Downloads: 9 This Week
  • 19
    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    ...Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.
    Downloads: 9 This Week
  • 20
    hCaptcha Challenger

    Gracefully face hCaptcha challenges with multimodal LLMs

    hCaptcha Challenger is an open-source automation framework designed to solve hCaptcha verification challenges using computer vision models and multimodal reasoning techniques. The project integrates machine learning models capable of analyzing visual captcha tasks and identifying the correct responses required to pass the verification process. Instead of relying on third-party captcha-solving services or browser scripts, the system operates independently by using pretrained neural networks that can classify images, detect objects, and interpret spatial relationships. ...
    Downloads: 7 This Week
  • 21
    supervision

    We write your reusable computer vision tools

    We write your reusable computer vision tools. Whether you need to load your dataset from your hard drive, draw detections on an image or video, or count how many detections are in a zone, you can count on us.
    Downloads: 6 This Week
  • 22
    Datasets

    Hub of ready-to-use datasets for ML models

    Datasets is a library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. We also feature a deep integration with the Hugging Face Hub, allowing you to easily load and share a dataset with the wider NLP community. ...
    Downloads: 6 This Week
  • 23
    Advanced AI explainability for PyTorch

    Advanced AI Explainability for computer vision

    pytorch-grad-cam is an open-source library that provides advanced explainable AI techniques for interpreting the predictions of deep learning models used in computer vision. The project implements Grad-CAM and several related visualization methods that highlight the regions of an image that most strongly influence a neural network’s decision. These visualization techniques allow developers and researchers to better understand how convolutional neural networks and transformer-based vision models make predictions. ...
    Downloads: 0 This Week
  • 24
    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    MetaCLIP is a research codebase accompanying an ICLR 2024 Spotlight paper on CLIP data curation. Rather than changing the CLIP model or its training objective, it focuses on how the image-text training data is curated: captions are matched against metadata entries, and the matched pairs are balanced to flatten the distribution over those entries. The repository provides curation and training code, the metadata, and the resulting training data distribution. ...
    Downloads: 0 This Week
  • 25
    Screenshot to Code

    A neural network that transforms a design mock-up into static websites

    Screenshot-to-code is a proof-of-concept tool that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code representations such as layouts, HTML, CSS, or other markup. It belongs to the research area of UI automation and image-to-UI code generation. The repository covers mapping visual designs to code constructs, generating code/UI layout (HTML, CSS, or markup), and example/demo scripts showing image-to-UI-code conversion.
    Downloads: 2 This Week