web scraper free download

Showing 40 open source projects for "web scraper"

1

shot-scraper

A command-line utility for taking automated screenshots of websites

shot-scraper is a command-line utility for taking automated screenshots of web pages using a headless browser engine. After installation, a single command can capture a full-page screenshot of a URL and save it to a file, making it ideal for documentation, monitoring, and visual regression tasks. Under the hood it uses a modern browser (installed via a one-time shot-scraper install step) and exposes options for viewport size, full-page versus clipped screenshots, and device emulation. ...

Downloads: 0 This Week

Last Update: 2026-02-01
See Project
2

Scraper of Death

Scraper of Death is a web scraper with multiple scraping methods: Requests + BeautifulSoup (fast, lightweight) and Selenium (JavaScript support, dynamic content).

Downloads: 3 This Week

Last Update: 2026-02-19
See Project
3

CyberScraper 2077

A Powerful web scraper powered by LLM | OpenAI, Gemini & Ollama

CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.

Downloads: 0 This Week

Last Update: 2026-01-20
See Project
4

html-metadata

MetaData html scraper and parser for Node.js (supports Promises

The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...

Downloads: 1 This Week

Last Update: 2025-04-30
See Project
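The kind of extraction html-metadata describes can be sketched in a few lines. The following is only an illustration in Python using the standard-library html.parser, not the Node.js library's own API; the names MetadataParser and extract_metadata and the "general" / "openGraph" output keys are assumptions made for this example. It covers just the title tag, the meta description, and Open Graph tags.

```python
from html.parser import HTMLParser


class MetadataParser(HTMLParser):
    """Collects the <title> text, the meta description, and Open Graph tags."""

    def __init__(self):
        super().__init__()
        # Output layout is an assumption for this sketch.
        self.metadata = {"general": {}, "openGraph": {}}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            content = attrs.get("content")
            if content is None:
                return
            prop = attrs.get("property") or ""
            if prop.startswith("og:"):
                # Open Graph: <meta property="og:title" content="...">
                self.metadata["openGraph"][prop[3:]] = content
            elif attrs.get("name") == "description":
                self.metadata["general"]["description"] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            general = self.metadata["general"]
            general["title"] = general.get("title", "") + data


def extract_metadata(html):
    """Parse an HTML string and return the collected metadata."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Hypothetical sample page
SAMPLE = """<html><head>
<title>Example page</title>
<meta name="description" content="A short summary.">
<meta property="og:title" content="Example">
<meta property="og:type" content="article">
</head><body></body></html>"""

result = extract_metadata(SAMPLE)
```

Standards such as JSON-LD or Dublin Core would need their own handlers; this sketch leaves them out.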
5

JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you...

Downloads: 0 This Week

Last Update: 2024-09-29
See Project
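The "single spreadsheet with no duplicates" idea in the JobFunnel entry can be sketched as a merge step keyed on a posting identifier. This is not JobFunnel's implementation; the field names, the merge_jobs function, and the choice of an "id" column as the duplicate key are all assumptions for illustration.

```python
import csv
import io

# Hypothetical column layout for the master CSV.
FIELDS = ["id", "title", "company"]


def merge_jobs(master_csv, new_jobs):
    """Append new postings to the master CSV text, skipping known ids."""
    rows = list(csv.DictReader(io.StringIO(master_csv)))
    seen = {row["id"] for row in rows}
    for job in new_jobs:
        if job["id"] not in seen:
            seen.add(job["id"])
            rows.append(job)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


master = "id,title,company\n1,Dev,Acme\n"
scraped = [
    {"id": "1", "title": "Dev", "company": "Acme"},     # already present
    {"id": "2", "title": "Ops", "company": "Initech"},  # new posting
]
merged = merge_jobs(master, scraped)
```

Running the same merge nightly leaves the file unchanged whenever a scrape returns only postings already recorded.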
6

ScrapeGraphAI

Python scraper based on AI

Extracts content from websites and local documents using LLMs. ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 12 This Week

Last Update: 5 days ago
See Project
7

dude uncomplicated data extraction

dude uncomplicated data extraction: A simple framework

Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, aims to let you build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run your scraper from the terminal by supplying URLs, the output filename of your choice, and the paths to your Python scripts to the dude scrape command.

Downloads: 0 This Week

Last Update: 2024-03-02
See Project
8

Ulixee Hero

The web browser built for scraping

It's the first modern headless browser designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning fast rendering. Emulators make it easy to disguise...

Downloads: 5 This Week

Last Update: 2025-09-08
See Project
9

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...

Downloads: 14 This Week

Last Update: 2026-03-31
See Project
10

Crawl4AI

Open-source LLM Friendly Web Crawler & Scraper

Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
11

MDCx

Movie metadata scraper and organizer for media libraries and NFO

MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...

Downloads: 11 This Week

Last Update: 2026-03-10
See Project
12

crwlr

Library for Rapid (Web) Crawler and Scraper Development

This library provides a kind of framework and many ready-to-use building blocks, called steps, for building your own crawlers and scrapers. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, the two go hand in hand, which is why this library supports and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those documents as well. A crawler...

Downloads: 9 This Week

Last Update: 2026-01-05
See Project
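The crawler definition in the crwlr entry (load a document, then follow its links to load more) can be sketched as a breadth-first loop. This is an illustrative Python sketch, not crwlr's PHP API; the crawl function, the caller-supplied fetch callback, and the in-memory SITE dictionary are assumptions that stand in for real HTTP requests.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: load a page, queue its unseen links, repeat.

    `fetch` maps a URL to HTML text, or None if the page cannot be loaded.
    """
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Hypothetical in-memory site used instead of network access
SITE = {
    "https://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "https://example.test/a": '<a href="/b">B</a> <a href="/">home</a>',
    "https://example.test/b": '<a href="/missing">gone</a>',
}

pages = crawl("https://example.test/", SITE.get)
```

The seen set is what keeps the loop from revisiting pages that link back to each other.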
13

ai-scrapper

🚀 Discover AI Web Scraper! 🚀 Tired of copying and pasting data from websites? I developed a desktop application with Electron and Gemini AI to extract structured data easily and efficiently! 🤖✨

1 Review

Downloads: 5 This Week

Last Update: 2025-05-31
See Project
14

linkedin2username

Generate probable usernames from LinkedIn company employee lists

...This process helps security researchers, penetration testers, and investigators perform reconnaissance by building potential username lists for further security testing or OSINT analysis. Unlike tools that rely on official APIs, linkedin2username operates as a pure web scraper and therefore does not require API keys. The script uses Selenium to automate browser interactions and perform searches within LinkedIn to gather employee data.

Downloads: 1 This Week

Last Update: 2026-03-07
See Project
15

FungiRegEx

FungiRegEx

This tool is a web-based search engine for regular expressions in proteomes. All the information is obtained from the JGI (Joint Genome Institute) database through a scraper for all the available species, so the tool only considers fungal organisms. This version uses React JS on the front end and NodeJS + Express on the back end. Full documentation is available at: https://victormiguelterronmacias.slite.page/p/J7BJU3hXhd72EJ/FungiRegEx-Software-documentation If you want to buy me a coffee: https://www.paypal.com/donate/?...

Downloads: 0 This Week

Last Update: 2023-09-04
See Project
16

Goutte

Goutte, a simple PHP Web Scraper

Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...

Downloads: 0 This Week

Last Update: 2023-04-01
See Project
17

Tholian Stealth

Secure, Peer-to-Peer, Private and Automateable Web Browser

Tholian Stealth is an open-source privacy-focused web browser and automation platform designed to combine secure browsing, web scraping, and proxy functionality into a unified system. It aims to prioritize user privacy and autonomy by minimizing tracking, blocking unnecessary requests, and restricting potentially harmful web technologies such as JavaScript execution. The platform operates as both a browser and a network service, capable of acting as a proxy, scraper, and content filtering system for other applications. ...

Downloads: 0 This Week

Last Update: 2026-03-17
See Project
18

AutoScraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.

Downloads: 2 This Week

Last Update: 2023-04-12
See Project
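The example-driven workflow in the AutoScraper entry (give a page plus sample values, learn rules, reuse them on new pages) can be sketched as follows. This is a deliberately simplified illustration and not AutoScraper's actual algorithm or API: a "rule" here is assumed to be just a (tag, class) pair, and learn_rules, apply_rules, and the sample pages are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ElementCollector(HTMLParser):
    """Records (tag, class, text) for each element that directly holds text."""

    def __init__(self):
        super().__init__()
        self._stack = []    # currently open elements as (tag, class)
        self.elements = []  # collected (tag, class, text) triples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self._stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            tag, cls = self._stack[-1]
            self.elements.append((tag, cls, text))


def collect(html):
    parser = ElementCollector()
    parser.feed(html)
    parser.close()
    return parser.elements


def learn_rules(html, wanted):
    """Return the (tag, class) pairs of elements whose text is a sample value."""
    return {(tag, cls) for tag, cls, text in collect(html) if text in wanted}


def apply_rules(html, rules):
    """Return the text of every element matching a learned rule."""
    return [text for tag, cls, text in collect(html) if (tag, cls) in rules]


TRAINING_PAGE = """
<ul>
  <li><span class="name">Widget</span> <span class="price">$5</span></li>
  <li><span class="name">Gadget</span> <span class="price">$7</span></li>
</ul>"""

NEW_PAGE = """
<ul>
  <li><span class="name">Doohickey</span> <span class="price">$9</span></li>
</ul>"""

rules = learn_rules(TRAINING_PAGE, ["Widget"])
similar = apply_rules(TRAINING_PAGE, rules)  # similar elements on the same page
fresh = apply_rules(NEW_PAGE, rules)         # same rules on a new page
```

A single sample value is enough here because both product names share the same tag and class; real pages usually need more robust rules than this.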
19

mlscraper

ML-based HTML scraper that learns extraction rules from examples

...It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages. Once trained, the generated scraper can process new pages and return the extracted data in structured formats such as dictionaries or lists. This approach simplifies web scraping tasks by shifting the focus from rule-writing to example-based training. Internally, the project processes HTML documents, identifies relevant elements in the DOM, and builds extraction logic based on statistical or heuristic analysis of the training samples. ...

Downloads: 5 This Week

Last Update: 4 days ago
See Project
20

SecretAgent

The web scraper that's nearly impossible to block

SecretAgent is a headless browser that’s nearly impossible to detect. It achieves this by emulating real users. It also has powerful auto-replay functionality that lets you create and debug scripts in record-setting time.

Downloads: 0 This Week

Last Update: 2023-08-14
See Project
21

soup

Web Scraper in Go, similar to BeautifulSoup

soup is a small web scraper package for Go, with an interface highly similar to that of BeautifulSoup. Each returned struct exposes three fields: Pointer, a pointer to the current HTML node; NodeValue, the current HTML node's value (the tag name for an ElementNode, or the text for a TextNode); and Error, which holds an error if one occurs and is nil otherwise.

Downloads: 0 This Week

Last Update: 2023-01-25
See Project
22

X-RAY

The next web scraper, see through the <html> noise

Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...

Downloads: 0 This Week

Last Update: 2021-10-05
See Project
23

django-dynamic-scraper

Creating Scrapy scrapers via the Django admin interface

Django Dynamic Scraper (DDS) is an app for Django built on top of the scraping framework Scrapy. While preserving many of the features of Scrapy, it lets you dynamically create and manage spiders via the Django admin interface. With Django Dynamic Scraper (DDS) you can define your Scrapy scrapers dynamically via the Django admin interface and save your scraped items in the database you defined for your Django project. Since it simplifies things, DDS is not usable for all kinds of scrapers, but...

Downloads: 0 This Week

Last Update: 2022-09-05
See Project
24

google-play-scraper

Node.js scraper to get data from Google Play

Node.js module to scrape application data from the Google Play store. Retrieves the full detail of an application. Retrieves a list of applications from one of the collections at Google Play. Retrieves a list of apps that match the given search term. Returns the list of applications by the given developer name. Given a string, returns up to five suggestions to complete a search query term. Retrieves a page of reviews for a specific application. Returns a list of similar apps to the...

Downloads: 0 This Week

Last Update: 2022-03-22
See Project
25

WebExtractServer

WebExtractServer, used with WebExtractLte, for web browsers

Browse data fetched by WebExtractLte directly in your browser. Designed to be used with Webscraper (webscraper.io), a third-party web scraper tool available as a plugin for Chrome and Firefox.

Downloads: 0 This Week

Last Update: 2019-04-29
See Project