Search Results for "web scraper extractor"

Showing 107 open source projects for "web scraper extractor"

1

shot-scraper

A command-line utility for taking automated screenshots of websites

shot-scraper is a command-line utility for taking automated screenshots of web pages using a headless browser engine. After installation, a single command can capture a full-page screenshot of a URL and save it to a file, making it ideal for documentation, monitoring, and visual regression tasks. Under the hood it uses a modern browser (installed via a one-time shot-scraper install step) and exposes options for viewport size, full-page versus clipped screenshots, and device emulation. ...

Downloads: 1 This Week

Last Update: 2026-02-01
See Project
2

Article Extractor

Extract the main article from a given URL with Node.js

A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.

Downloads: 1 This Week

Last Update: 2025-09-04
See Project
3

Google Maps Extractor

Free Google Maps Extractor (With Email) | Google Maps Scraper

A fast, efficient, free Google Maps extractor for business leads. This Google Maps scraper extracts phone numbers, emails, locations, and social media profiles, then exports to CSV. Visit: https://gmplus.io/

Downloads: 3 This Week

Last Update: 2025-04-12
See Project
4

CommunityScrapers

This is a public repository containing scrapers

Stash Community Scrapers is a large open-source collection of metadata extraction tools designed to work with the Stash media management platform, enabling automated scraping of content information from various online sources. The repository contains hundreds of scraper definitions written primarily in YAML and Python, each tailored to extract structured metadata such as titles, performers, tags, and media details from specific websites. These scrapers integrate directly into Stash, allowing...

Downloads: 2 This Week

Last Update: 2026-04-14
See Project
5

CyberScraper 2077

A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama

CyberScraper 2077 is not just another web scraping tool – it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini and LocalLLM Models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.

Downloads: 2 This Week

Last Update: 2026-01-20
See Project
6

Scraper of Death

Scraper of Death is a web scraper with multiple scraping methods: Requests + BeautifulSoup (fast, lightweight) and Selenium (JavaScript support, dynamic content).

Downloads: 3 This Week

Last Update: 2026-02-19
See Project
7

JobFunnel

Scrape job websites into a single spreadsheet with no duplicates.

An automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you...

Downloads: 1 This Week

Last Update: 2024-09-29
See Project
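
The "single spreadsheet with no duplicates" idea can be illustrated with a small Python sketch. This is not JobFunnel's actual implementation: the `merge_jobs` function, the column names, and the choice of `(title, company, url)` as the duplicate key are all assumptions made for illustration.

```python
import csv
import io

def merge_jobs(master_csv, new_rows, key_fields=("title", "company", "url")):
    """Append new job rows to CSV text, skipping rows whose key already exists.

    Hypothetical helper: the key fields are an assumed definition of "duplicate".
    """
    reader = csv.DictReader(io.StringIO(master_csv))
    fieldnames = reader.fieldnames or list(key_fields)
    rows = list(reader)

    # Remember every key already present in the master file.
    seen = {tuple(row[k] for k in key_fields) for row in rows}

    for row in new_rows:
        key = tuple(row[k] for k in key_fields)
        if key not in seen:
            seen.add(key)
            rows.append(row)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

master = "title,company,url\nData Engineer,Acme,https://example.com/1\n"
scraped = [
    {"title": "Data Engineer", "company": "Acme", "url": "https://example.com/1"},
    {"title": "Web Scraper Dev", "company": "Globex", "url": "https://example.com/2"},
]
merged = merge_jobs(master, scraped)
```

Running the same merge nightly leaves existing rows untouched and appends only postings whose key has not been seen before.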
8

html-metadata

Metadata HTML scraper and parser for Node.js (supports Promises...

The aim of this library is to be a comprehensive source for extracting all HTML-embedded metadata. Currently, it supports Schema.org microdata using a third-party library, a native BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS implementation, and some general metadata that doesn't belong to a particular standard (for instance, the content of the title tag, or meta description tags). Planned is support for RDFa, AGLS, and other yet unheard-of...

Downloads: 0 This Week

Last Update: 2025-04-30
See Project
9

Web Spider, Web Crawler, Email Extractor

Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

The Files section includes WebCrawlerMySQL.jar, which supports a MySQL connection. A free web spider and crawler that extracts information from the web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed.
- Free web spider, parser, extractor, and crawler
- Extracts emails, phone numbers, and custom text from the web
- Exports to an Excel file
- Saves data to Derby and MySQL databases
- Written in Java, cross-platform
Also see the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/
Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk

Downloads: 9 This Week

Last Update: 2025-11-23
See Project
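
Regex-based extraction of emails and phone numbers, as described above, can be sketched as follows. The project itself is written in Java; this Python version is only an illustration, and the patterns, the `extract_contacts` function, and the sample text are assumptions rather than the project's own code. The patterns are deliberately simple and will miss or over-match some real-world formats.

```python
import re

# Simplified patterns (assumptions): real email and phone formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def extract_contacts(text):
    """Return unique emails and phone-like strings in order of first appearance."""
    # dict.fromkeys removes duplicates while preserving order.
    emails = list(dict.fromkeys(EMAIL_RE.findall(text)))
    phones = list(dict.fromkeys(PHONE_RE.findall(text)))
    return {"emails": emails, "phones": phones}

sample = (
    "Contact sales@example.com or support@example.org, "
    "call +1 (555) 010-0199. sales@example.com"
)
result = extract_contacts(sample)
```

In a crawler, `extract_contacts` would be applied to the text of each downloaded page before the results are stored.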
10

ScrapeGraphAI

Python scraper based on AI

Extract content from websites and local documents using LLMs. ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.

Downloads: 3 This Week

Last Update: 3 days ago
See Project
11

dude uncomplicated data extraction

dude uncomplicated data extraction: A simple framework

Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, aims to let you build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha, so expect breaking changes. You can run your scraper from the terminal by supplying URLs, the output filename of your choice, and the paths to your Python scripts to the dude scrape command.

Downloads: 0 This Week

Last Update: 2024-03-02
See Project
12

Crawl4AI

Open-source LLM Friendly Web Crawler & Scraper

Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
13

Ulixee Hero

The web browser built for scraping

It's the first modern headless browser designed specifically for scraping instead of just automated testing. Hero provides access to the W3C DOM specification without the need for Puppeteer's complicated evaluate callbacks and multi-context switching. We've recreated a fully compliant DOM directly in NodeJS, allowing you to bypass the headaches of previous scraper tools. The powerful Chrome engine sits under the hood, allowing for lightning-fast rendering. Emulators make it easy to disguise...

Downloads: 0 This Week

Last Update: 2025-09-08
See Project
14

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...

Downloads: 1 This Week

Last Update: 2026-03-31
See Project
15

MDCx

Movie metadata scraper and organizer for media libraries and NFO

MDCx is an open source media metadata scraping and organization tool designed to automate the process of collecting detailed information for movie files. It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports...

Downloads: 4 This Week

Last Update: 2026-03-10
See Project
16

Media Extractor

Google Chrome extension designed to intercept and download media files

# Media Extractor

Media Extractor is a Google Chrome extension designed to intercept and download media files directly from websites. The extension monitors network requests in the browser and allows users to download detected media files such as video and audio streams.

## Key Features

- Intercepts media files from web pages
- Supports video and audio downloads
- Works directly inside Google Chrome
- Simple and intuitive interface
- No external tools required

## Use Cases

- Downloading embedded video content
- Saving audio streams from websites
- Analyzing media network requests
- Offline media access

## Download

- Chrome Web Store (if published): https://github.com/exxellengames/Media-Extractor/releases

## Official Website

- EN: https://exxellengames.great-site.net/en/
- RU: https://exxellengames.great-site.net/ru/

## Author

exxellengames

Downloads: 0 This Week

Last Update: 2025-12-31
See Project
17

Trafilatura

Python & command-line tool to gather text on the Web

...The extractor tries to strike a balance between limiting noise (precision) and including all valid parts (recall). It also has to be robust and reasonably fast; it runs in production on millions of documents.

Downloads: 0 This Week

Last Update: 2024-12-03
See Project
18

crwlr

Library for Rapid (Web) Crawler and Scraper Development

This library provides a framework along with many ready-to-use building blocks, called steps, for building your own crawlers and scrapers. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, the two go hand in hand, which is why this library supports and combines both. A (web) crawler is a program that (down)loads documents and follows the links in them to load those documents as well. A crawler...

Downloads: 0 This Week

Last Update: 2026-01-05
See Project
19

newpipeextractor

Library for extracting streaming site data without official APIs

...It handles many low-level tasks involved in web data extraction, including parsing responses, managing platform-specific logic, and handling errors, allowing developers to focus on implementing application features rather than scraping mechanics. Each supported service is implemented through its own extractor components that conform to a common interface, enabling consistent access to data across different platforms.

Downloads: 1 This Week

Last Update: 2026-04-10
See Project
20

Snap Lens File Extractor

Online file extractor for the Snapchat lens file format

Browser-based JavaScript online file extractor, parser, unpacker, and ZIP file converter. Reads and unpacks the Snap Camera / Snapchat Lens file format (lens.lns / *.lns). Snap Lens Tool: https://snap-lens-tool.sourceforge.io Snap Lens File Format: https://snap-lens-file-format.sourceforge.io

Downloads: 0 This Week

Last Update: 2025-03-03
See Project
21

java-pdf-table-extractor-lib

Java PDF table extraction library

The command-line application is an example of how to use the Java library. The library is based on the PDFBox library and works by analyzing the layout of each selected PDF page and looking for table structure patterns. After calling the library (passing the PDF filename and the page range), the result is a List<PdfTextElement>. PdfTextElement is an interface that has two implementations: a basic text element (for text outside the tables) and PdfTextTabulaElement (for table structures). That...

Downloads: 0 This Week

Last Update: 2025-09-12
See Project
22

ConsoleWebScraper

It allows you to input a URL and it will scrape the HTML content...

...After the application has successfully completed its operation, the results will be saved on your desktop in a folder named "WebScrapperProject". Note: This is a basic web scraper and may not work with all websites, especially those that rely heavily on JavaScript for rendering content or have measures in place to prevent scraping. Author: Bohdan Harabadzhyu. License: This project is licensed under the terms of the GNU General Public License v3.0 (GPL-3.0); see the LICENSE file for details.

Downloads: 0 This Week

Last Update: 2025-01-21
See Project
23

ai-scrapper

🚀 Discover AI Web Scraper! 🚀 Tired of copying and pasting data from websites? I developed a desktop application with Electron and Gemini AI to extract structured data easily and efficiently! 🤖✨

1 Review

Downloads: 0 This Week

Last Update: 2025-05-31
See Project
24

Media Dock

Extract and download media from any website with ease and speed.

...It operates securely without tracking your activity and ensures smooth performance without slowing down your browsing experience. Perfect for students, content creators, and professionals, Media Extractor empowers users to access web media effortlessly.

Downloads: 1 This Week

Last Update: 2025-09-14
See Project
25

scraper-with-chatgpt

It is a powerful data scraping tool that helps you extract information from various online sources. Easily collect data from Google SERP, Maps, Shopify, Zillow, and more. With a user-friendly interface, you can scrape and save data in JSON or Excel formats. Unlock insights from the web effortlessly with scrape-it.cloud API.

Downloads: 1 This Week

Last Update: 2023-08-28
See Project

© 2026 Slashdot Media. All Rights Reserved.