At Reworkd, we iterated on all these problems across tens of thousands of real web tasks to build a powerful perception system for web agents: Tarsier. In the video below, we use Tarsier to provide webpage perception for a minimalistic GPT-4 LangChain web agent.

Tarsier visually tags interactable elements on a page with a bracketed ID, e.g. [23]. This gives an LLM a mapping between elements and IDs that it can act on (e.g. CLICK [23]). We define interactable elements as buttons, links, or input fields that are visible on the page; Tarsier can also tag all textual elements if you pass tag_text_elements=True.

Furthermore, we've developed an OCR algorithm that converts a page screenshot into a whitespace-structured string (almost like ASCII art) that even an LLM without vision can understand. This is critical because current vision-language models still lack the fine-grained representations needed for web interaction tasks.
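The tag-to-element mapping described above can be illustrated with a small sketch. This is not Tarsier's own API: the `parse_action` helper, the `TagMap` type, the `TYPE` verb, and the XPath strings are all hypothetical, chosen only to show how a command such as CLICK [23] might be resolved back to a page element.

```python
import re
from typing import Dict, Tuple

# Hypothetical mapping from a tag ID to an element locator (XPath strings here).
TagMap = Dict[int, str]

# Matches commands such as "CLICK [23]" or "TYPE [24] some text".
# The TYPE verb is an assumption; the source only mentions CLICK.
ACTION_RE = re.compile(r"^\s*(CLICK|TYPE)\s+\[(\d+)\](?:\s+(.*))?$")


def parse_action(command: str, tag_map: TagMap) -> Tuple[str, str, str]:
    """Resolve an LLM command to a (verb, locator, argument) triple."""
    match = ACTION_RE.match(command)
    if match is None:
        raise ValueError(f"unrecognized action: {command!r}")
    verb = match.group(1)
    tag_id = int(match.group(2))
    argument = (match.group(3) or "").strip()
    if tag_id not in tag_map:
        raise KeyError(f"no element tagged [{tag_id}]")
    return verb, tag_map[tag_id], argument


# Illustrative data: two tagged elements on an imaginary page.
tag_map = {23: "//button[@id='submit']", 24: "//input[@name='q']"}
print(parse_action("CLICK [23]", tag_map))
print(parse_action("TYPE [24] tarsier monkey", tag_map))
```

In a real agent, the returned locator would be handed to a browser automation library to perform the action. Rejecting unknown tag IDs early keeps a hallucinated ID from silently turning into a wrong click.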

Features

  • Vision utilities for web interaction agents
  • OCR support via Google Vision and Microsoft Azure
  • Documentation available
  • Effortlessly extract web data at scale
  • Reworkd automates your entire web data pipeline, end-to-end
  • It scans websites, generates code, runs extractors, validates results, and outputs data

Categories

Web Services

License

MIT License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Web Services Software

Registered

2024-09-20