web scraping free download

Showing 176 open source projects for "web scraping"

View related business solutions

Data management solutions for confident marketing
For companies wanting a complete Data Management solution that is native to Salesforce

Verify, deduplicate, manipulate, and assign records automatically to keep your CRM data accurate, complete, and ready for business.

Learn More
The AI workplace management platform
Plan smart spaces, connect teams, manage assets, and get insights with the leading AI-powered operating system for the built world.

By combining AI workflows, predictive intelligence, and automated insights, OfficeSpace gives leaders a complete view of how their spaces are used and how people work. Facilities, IT, HR, and Real Estate teams use OfficeSpace to optimize space utilization, enhance employee experience, and reduce portfolio costs with precision.

Learn More
1

Web Scraping for Laravel

Laravel adapter for Roach, the complete web scraping toolkit for PHP

This is the Laravel adapter for Roach, the complete web scraping toolkit for PHP. Easily integrate Roach into any Laravel application. The Laravel adapter mostly provides the necessary container bindings for the various services Roach uses, as well as making certain configuration options available via a config file. The Laravel adapter of Roach registers a few Artisan commands to make out development experience as pleasant as possible.

Downloads: 4 This Week

Last Update: 2025-03-21
See Project
2

The Web MCP

A powerful Model Context Protocol (MCP) server

Bright Data’s Web MCP server gives AI assistants robust, real-time web capabilities through an MCP interface designed to avoid blocks, rate limits, and CAPTCHAs. It presents search, crawl, navigate, and extraction tools that agents can call directly, replacing brittle scraping prompts with typed operations. The README markets it as a “gateway” to the live web so assistants don’t fall back to stale training data.

Downloads: 6 This Week

Last Update: 2026-03-29
See Project
3

spider_collection

Collection of Python web scraping scripts for data extraction tasks

spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages. ...

Downloads: 1 This Week

Last Update: 5 days ago
See Project
4

wombat

Lightweight Ruby DSL for scraping structured data from web pages

Wombat is a lightweight web crawling and scraping library written in Ruby that focuses on extracting structured data from web pages using a concise domain-specific language (DSL). It is designed to simplify the process of defining how information should be collected from HTML documents without requiring large amounts of scraping boilerplate code. Developers can declare the data fields they want and specify selectors or rules for retrieving them, allowing Wombat to parse and return structured results. ...

Downloads: 5 This Week

Last Update: 6 days ago
See Project
Next-Gen Encryption for Post-Quantum Security | CLEAR by Quantum Knight
Lock Down Any Resource, Anywhere, Anytime

CLEAR by Quantum Knight is a FIPS-140-3 validated encryption SDK engineered for enterprises requiring top-tier security. Offering robust post-quantum cryptography, CLEAR secures files, streaming media, databases, and networks with ease across over 30 modern platforms. Its compact design, smaller than a single smartphone image, ensures maximum efficiency and low energy consumption.

Learn More
5

kimuraframework

AI-first Ruby framework for building fast, flexible web scraping spide

Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance. ...

Downloads: 1 This Week

Last Update: 2026-04-02
See Project
6

skycaiji

Open source web scraping system for automated data collection tasks

...SkyCaiji also supports automated workflows that continuously gather data and process it based on defined collection rules. Its architecture enables users to build scalable web scraping pipelines that can run unattended once configured.

Downloads: 2 This Week

Last Update: 14 hours ago
See Project
7

crawler

Collection of JS reverse engineering examples for web scraping study

crawler is a collection of web scraping and JavaScript reverse engineering examples designed for learning how modern websites protect their data and how those protections can be analyzed. It contains many case studies that demonstrate how to analyze and replicate request parameters, cookies, and encryption logic used by real websites. Each directory in the project focuses on a specific target service or scenario, showing how browser network requests and JavaScript code can be studied to reproduce API calls programmatically. ...

Downloads: 4 This Week

Last Update: 4 days ago
See Project
8

Roach

The complete web scraping toolkit for PHP

Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well.

Downloads: 6 This Week

Last Update: 2025-03-21
See Project
9

Deep Research Web UI

AI-powered research assistant that performs iterative, deep research

Deep Research Web UI is an AI-powered research assistant interface designed to automate complex, multi-step information gathering workflows through a combination of search engines, web scraping, and large language models. It operates as a front-end system for deep research agents that iteratively refine queries, retrieve information from multiple sources, and synthesize structured outputs into coherent reports.

Downloads: 1 This Week

Last Update: 2026-03-19
See Project
Outbound sales software
Unified cloud-based platform for dialing, emailing, appointment scheduling, lead management and much more.

Adversus is an outbound dialing solution that helps you streamline your call strategies, automate manual processes, and provide valuable insights to improve your outbound workflows and efficiency.

Learn More
10

rvest

Simple web scraping for R

rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, inspired by libraries like beautiful soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you’re respecting the robots.txt and not hammering the site with too many requests.

Downloads: 0 This Week

Last Update: 2025-08-29
See Project
11

X-Crawl

Flexible Node.js AI-assisted crawler library

A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.

Downloads: 7 This Week

Last Update: 2025-04-06
See Project
12

ScrapeGraphAI

Python scraper based on AI

Extracting content from websites and local documents using LLM. ScrapeGraphAI is a web scraping python library that uses LLM and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 12 This Week

Last Update: 5 days ago
See Project
13

Robin

AI-powered tool for dark web OSINT search and investigation

...Robin also performs scraping of discovered pages through Tor sessions, allowing users to gather additional context from dark web sites while maintaining the required network routing. By integrating AI models, the platform can interpret results, highlight key information, and produce summaries that help analysts understand findings faster. The project provides a modular architecture separating search, scraping, and AI processing components so it can be extended with new data sources.

Downloads: 18 This Week

Last Update: 2026-03-31
See Project
14

Geziyor

Blazing fast Go framework for web crawling and data scraping tasks

Geziyor is a high-performance web crawling and web scraping framework built for the Go programming language. It is designed to help developers crawl websites and extract structured information from web pages efficiently. It focuses on speed and scalability, allowing large numbers of requests to be processed concurrently. Geziyor supports use cases such as data mining, monitoring web content, and automated testing workflows.

Downloads: 2 This Week

Last Update: 6 days ago
See Project
15

Ulixee Hero

The web browser built for scraping

It's the first modern headless browsers designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS allowing you bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning fast rendering. Emulators make it easy to disguise...

Downloads: 5 This Week

Last Update: 2025-09-08
See Project
16

Scrapling

An adaptive Web Scraping framework

Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques.

Downloads: 4 This Week

Last Update: 16 hours ago
See Project
17

Python-Spider

Python3 web crawler practice

Python-Spider is a repository intended to teach or provide examples for writing web spiders / crawlers in Python — part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginners or intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. ...

Downloads: 1 This Week

Last Update: 2025-12-08
See Project
18

crawlee

A web scraping and browser automation library for Node.js

Crawlee is a web scraping and browser automation library. It helps you build reliable crawlers. Fast. Crawlee won't fix broken selectors for you (yet), but it helps you build and maintain your crawlers faster. When a website adds JavaScript rendering, you don't have to rewrite everything, only switch to one of the browser crawlers. When you later find a great API to speed up your crawls, flip the switch back.

Downloads: 7 This Week

Last Update: 2026-02-06
See Project
19

Parsera

Lightweight library for scraping web-sites with LLMs

Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.

Downloads: 5 This Week

Last Update: 2025-10-08
See Project
20

MDCx

Movie metadata scraper and organizer for media libraries and NFO

...It also supports image processing tasks such as downloading and cropping artwork used by media centers. It includes several interfaces, allowing users to operate it through a graphical desktop application, a browser-based web interface, or command-line utilities depending on their workflow. Its architecture separates core scraping logic from the user interfaces, allowing the same metadata processing system to be reused across different modes.

Downloads: 11 This Week

Last Update: 2026-03-10
See Project
21

DotnetSpider

Lightweight .NET framework for fast web crawling and data scraping

DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...

Downloads: 4 This Week

Last Update: 2026-03-10
See Project
22

Actors MCP Server

Model Context Protocol (MCP) Server for Apify's Actors

The Apify Actors MCP Server is a Model Context Protocol (MCP) server that enables AI assistants to interact with Apify Actors. This integration allows AI models to utilize various web scraping and automation tools provided by Apify, facilitating tasks such as data extraction and web automation.

Downloads: 9 This Week

Last Update: 6 days ago
See Project
23

Python Code Tutorials

The Python Code Tutorials

Python Code Tutorials is a large educational repository that aggregates programming tutorials from the “The Python Code” website into a structured collection of Python projects and learning materials. The repository covers a wide range of programming topics including cybersecurity, networking, web scraping, machine learning, GUI development, and automation scripts. Each tutorial typically includes complete Python code examples and explanations that demonstrate how to build real tools and applications step by step. Many tutorials focus on practical implementations such as building network scanners, web scraping tools, object detection systems, and automation utilities using Python libraries. ...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
24

Firecrawl MCP Server

Adds powerful web scraping and search to Cursor and Claude

firecrawl-mcp-server is the official MCP integration for Firecrawl that brings high-recall web scraping, crawling, and search into IDEs and agent runtimes. It exposes tools for single-page scrape, multi-URL batch jobs, site discovery, and search enrichment, returning cleaned, structured content suitable for downstream LLM reasoning. The server is designed to run with Firecrawl’s hosted API or self-hosted deployments, making it flexible for enterprise data-governance requirements. ...

Downloads: 5 This Week

Last Update: 2025-10-08
See Project
25

HeadlessX

The undetected self-hosted browser automation platform

HeadlessX is an open-source, self-hosted browser automation platform designed to run headless browsers for tasks such as web scraping, automation, and testing. The system provides a centralized service that allows developers to programmatically control browser sessions and extract data from websites through a structured API. It is built using modern technologies including Node.js, Next.js, TypeScript, and Playwright, and uses a specialized browser engine called Camoufox based on Firefox. ...

Downloads: 6 This Week

Last Update: 2026-03-25
See Project