OmniParser is a comprehensive method for parsing user interface screenshots into structured elements, significantly enhancing the ability of multimodal models like GPT-4V to generate actions accurately grounded in the corresponding regions of the interface. It reliably identifies interactable icons within user interfaces and understands the semantics of various elements in a screenshot, associating intended actions with the correct screen regions. To achieve this, OmniParser curates an interactable icon detection dataset of 67,000 unique screenshot images labeled with bounding boxes of interactable icons derived from DOM trees. Additionally, a collection of 7,000 icon-description pairs is used to fine-tune a caption model that extracts the functional semantics of detected elements. Evaluations on benchmarks such as SeeClick, Mind2Web, and AITW demonstrate that OmniParser outperforms GPT-4V baselines, even when using only screenshot inputs without additional information.
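To make "structured elements" concrete, the following is a minimal sketch of how detected elements (a bounding box plus a functional description) might be represented and turned into a click target. The names `ParsedElement`, `format_for_prompt`, and `ground_action` are hypothetical illustrations, not OmniParser's actual API or output schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ParsedElement:
    """One detected UI element (hypothetical schema, not OmniParser's own)."""
    element_id: int
    bbox: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    description: str                         # functional caption, e.g. "Search button"
    interactable: bool = True

    def center(self) -> Tuple[float, float]:
        """Return the midpoint of the bounding box, a natural click target."""
        x_min, y_min, x_max, y_max = self.bbox
        return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def format_for_prompt(elements: List[ParsedElement]) -> str:
    """List interactable elements as numbered lines a model can refer to by ID."""
    return "\n".join(
        f"[{e.element_id}] {e.description} at {e.bbox}"
        for e in elements
        if e.interactable
    )


def ground_action(elements: List[ParsedElement], chosen_id: int) -> Optional[Tuple[float, float]]:
    """Map an element ID chosen by the model back to screen coordinates."""
    for e in elements:
        if e.element_id == chosen_id:
            return e.center()
    return None  # the model referred to an ID that was never detected


if __name__ == "__main__":
    elements = [
        ParsedElement(0, (10, 20, 50, 60), "Search button"),
        ParsedElement(1, (0, 0, 100, 10), "Title text", interactable=False),
    ]
    print(format_for_prompt(elements))
    print(ground_action(elements, 0))
```

The design idea in this sketch is that the model only has to pick an element ID from the text listing, and the coordinates are recovered from the detected box rather than predicted directly.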

Features

  • Parse user interface screenshots into structured and easy-to-understand elements
  • Examples available
  • Enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface
  • Ensure the V2 weights are downloaded into the weights folder
  • Model Weights License
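As a rough illustration of the weights check mentioned in the list above, the sketch below reports which expected subfolders of a weights directory are missing or empty. The function name `missing_weight_dirs` and the subfolder names in the usage example are assumptions made for illustration; consult the project's own instructions for the actual layout.

```python
from pathlib import Path
from typing import Iterable, List


def missing_weight_dirs(weights_root: str, expected: Iterable[str]) -> List[str]:
    """Return the expected subfolders that are absent or contain no files."""
    root = Path(weights_root)
    missing = []
    for name in expected:
        sub = root / name
        # A folder counts as missing if it does not exist or has no entries.
        if not sub.is_dir() or not any(sub.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical subfolder names, used only for this example.
    problems = missing_weight_dirs("weights", ["detector", "captioner"])
    if problems:
        print("Missing or empty weight folders:", ", ".join(problems))
    else:
        print("All expected weight folders are present.")
```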

License

Creative Commons Attribution License

Additional Project Details

Operating Systems

Windows

Programming Language

Python

Related Categories

Python Agentic AI Tool, Python AI Agent Frameworks, Python AI Agents

Registered

2025-02-18