Search Results for "python crawler" - Page 2

Showing 70 open source projects for "python crawler"

1

dirhunt

Web crawler that finds hidden web directories without brute force

Dirhunt is an open source security tool designed to discover web directories and analyze website structures without relying on brute-force techniques. Instead of sending large numbers of guess-based requests, it operates as a specialized crawler that intelligently explores websites to identify accessible or hidden directories. Dirhunt can detect directories that expose “Index Of” listings, which may reveal files and other resources that were not intended to be publicly visible. It can also...

Downloads: 1 This Week

Last Update: 2026-03-11
See Project
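
As a rough illustration of the "Index Of" detection described above, the sketch below flags a page whose title begins with "Index of", which is how many web servers title their auto-generated directory listings. This is not Dirhunt's actual implementation; the `TitleParser` class and `looks_like_index_listing` function are hypothetical names used only for this example.

```python
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_index_listing(html: str) -> bool:
    """Heuristic (an assumption): listings have a title starting with 'Index of'."""
    parser = TitleParser()
    parser.feed(html)
    return re.match(r"\s*index of\b", parser.title, re.IGNORECASE) is not None
```

A real tool would likely combine several signals; a title check alone can miss customized listings.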
2

DecryptLogin

Python library providing APIs for automated website login workflows

DecryptLogin is a Python library designed to simplify automated login processes for many popular websites by providing ready-to-use APIs that simulate authentication behavior. It focuses on implementing login mechanisms through HTTP requests, allowing developers to programmatically authenticate with supported services without manually replicating complex login flows. It includes modules that handle different authentication modes such as PC login, mobile login, and QR code login depending on...

Downloads: 0 This Week

Last Update: 5 days ago
See Project
3

AnimeGAN

A simple PyTorch Implementation of Generative Adversarial Networks

...The images are not clean; some outliers can be observed, which degrades the quality of the generated images. Anime-style images of 126 tags are collected from danbooru.donmai.us using the crawler tool gallery-dl. The images are then processed by the anime face detector python-animeface. The resulting dataset contains ~143,000 anime faces. Note that some of the tags may no longer be meaningful after cropping, e.g. the cropped face images under the 'uniform' tag may not contain visible parts of uniforms.

Downloads: 0 This Week

Last Update: 2023-03-21
See Project
4

grab-site

Web crawler for archiving and backing up sites into WARC archives

grab-site is an open source web crawling tool designed to archive and back up websites by recursively downloading their content. It works by taking a starting URL and systematically following links across the site, capturing pages and resources and saving them into WARC archive files for long-term preservation. Internally, the crawler uses a fork of the wpull engine to fetch and process web pages efficiently during large-scale crawls. grab-site includes a built-in dashboard that displays...

Downloads: 0 This Week

Last Update: 24 hours ago
See Project
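
The recursive link-following described above can be sketched as a breadth-first crawl that stays on the starting host. This is only a minimal sketch, not grab-site's or wpull's code: the `crawl` function, the `fetch` callable, and the in-memory `SITE` dictionary (standing in for real HTTP requests to the made-up host `example.test`) are all assumptions for illustration, and nothing is written to WARC here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl of one host; fetch(url) returns HTML or None."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop fragments before de-duplicating.
            target = urljoin(url, href).split("#")[0]
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return pages


# Hypothetical three-page site used instead of real network access.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="http://other.test/">x</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="b#top">B</a>',
    "http://example.test/b": "leaf",
}
pages = crawl("http://example.test/", SITE.get)
```

Passing the fetcher in as a parameter keeps the traversal logic testable without touching the network.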
5

pspider

Simple Python framework for building multithreaded web crawlers

...Its modular design also makes it easier to extend the framework with additional features or integrate it into existing Python projects.

Downloads: 1 This Week

Last Update: 4 days ago
See Project
6

instagram-profilecrawl

Instagram profile crawler that extracts posts, tags, and stats

...It also provides scripts for downloading images from crawled profiles and logging statistics into CSV files for tracking metrics like followers, likes, and comments. Authentication is optional, meaning the crawler can access public profile data without logging in.

Downloads: 3 This Week

Last Update: 4 days ago
See Project
7

ReconSpider

Most Advanced Open Source Intelligence (OSINT) Framework

ReconSpider is an advanced Open Source Intelligence (OSINT) framework for scanning IP addresses, emails, websites, and organizations and finding information about them from different sources. ReconSpider can be used by infosec researchers, penetration testers, bug hunters, and cyber crime investigators to find deep information about their target. ReconSpider aggregates all the raw data, visualizes it on a dashboard, and facilitates alerting and monitoring on the data. ReconSpider also combines the...

Downloads: 6 This Week

Last Update: 2022-11-25
See Project
8

CEF Python

Python bindings for the Chromium Embedded Framework (CEF)

CEF Python is an open source project founded by Czarek Tomczak in 2012 to provide Python bindings for the Chromium Embedded Framework (CEF). The Chromium project focuses mainly on Google Chrome application development, while CEF focuses on facilitating embedded browser use cases in third-party applications. Many applications use the CEF control, and there are more than 100 million CEF instances installed around the world. There are...

Downloads: 11 This Week

Last Update: 2022-05-03
See Project
9

lxspider

Educational Python web scraping case collection for many sites

lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms,...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
10

BotSlayer

BotSlayer Community Edition

BotSlayer is an application that helps track and detect potential manipulation of information spreading on Twitter. The tool is developed by the Observatory on Social Media at Indiana University, the same lab that brought you Botometer and Hoaxy. BotSlayer is not a tool to detect and remove likely social bots from your list of Twitter followers or friends. For that purpose, check out Botometer. If you just want to visualize the spread of some piece of information, consider Hoaxy....

Downloads: 0 This Week

Last Update: 2023-07-13
See Project
11

ECommerceCrawlers

Collection of Python ecommerce and website crawler example projects

ECommerceCrawlers is a collection of practical Python web crawler projects designed to gather data from a variety of ecommerce platforms, websites, and online services. It aggregates many independent crawler examples created by contributors and organized into separate subprojects that target specific sites or data sources. These examples demonstrate how to build and operate web scrapers capable of collecting structured information such as product listings, news content, job postings, social media data, and other publicly available web data. ...

Downloads: 7 This Week

Last Update: 11 hours ago
See Project
12

Photon

Incredibly fast crawler designed for OSINT

Photon is an extremely fast web crawler built specifically for OSINT and reconnaissance use cases. It is designed to extract URLs, endpoints, files, and other intelligence artifacts from target websites with minimal overhead. The crawler prioritizes speed and breadth, making it suitable for mapping web attack surfaces and discovering hidden resources. Photon is commonly used during early reconnaissance phases to build a comprehensive inventory of reachable assets.

Downloads: 5 This Week

Last Update: 2026-03-03
See Project
13

ShadowSocksShare

Python ShadowSocks framework

This project obtains shared ss(r) accounts by crawling ss(r) sharing websites, parses each account and verifies its connectivity, then redistributes the accounts and generates a subscription link. Almost all the accounts crawled so far came from Google Plus, which was scheduled to close on April 2, 2019. If you are building your own website, please keep an eye on the updates of this project and redeploy using the latest source code.

Downloads: 0 This Week

Last Update: 2022-11-09
See Project
14

mzitu

Python crawler that downloads image galleries and analyzes titles

mzitu is a Python-based web crawling project designed to automatically download and organize image galleries from a specific photography site. It demonstrates how to build a scraper that navigates gallery pages, retrieves image links, and saves the images locally in a structured directory layout. It focuses on automating the collection of large sets of images by programmatically parsing page content and iterating through gallery entries. mzitu also includes a simple analysis script that...

Downloads: 3 This Week

Last Update: 4 days ago
See Project
15

WeChatSogou

Python library to crawl and retrieve data from WeChat accounts

WechatSogou is an open source Python library designed to retrieve data from WeChat official accounts by using the Sogou WeChat search service as its data source. It provides developers with a programmatic way to search for public accounts and collect article information without manually browsing the search interface. It functions as a crawler interface that sends requests to the search engine, retrieves results, and converts the returned pages into structured data that can be used in applications or analysis pipelines. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
16

pyspider

A powerful Spider(Web Crawler) system in Python

pyspider is a powerful spider (web crawler) system in Python. Components are connected by a message queue. Every component, including the message queue, runs in its own process or thread and is replaceable. That means that when processing is slow, you can run many processor instances and make full use of multiple CPUs, or deploy to multiple machines. This architecture makes pyspider really fast.

Downloads: 0 This Week

Last Update: 2021-03-31
See Project
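
The queue-connected design described above can be illustrated with a small sketch in which several processor workers pull tasks from one queue and push results to another. This is not pyspider's code: `run_pipeline` and its parameters are hypothetical, and threads are used only for brevity (in CPython, CPU-bound work would need separate processes to use multiple CPUs).

```python
import queue
import threading


def run_pipeline(tasks, process, workers=4):
    """Fan tasks out to several processor threads connected by queues."""
    task_q = queue.Queue()
    result_q = queue.Queue()

    def processor():
        while True:
            item = task_q.get()
            if item is None:  # sentinel: no more work for this worker
                break
            result_q.put(process(item))

    threads = [threading.Thread(target=processor) for _ in range(workers)]
    for t in threads:
        t.start()
    for task in tasks:
        task_q.put(task)
    for _ in threads:
        task_q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()

    # All workers have finished, so the result queue can be drained safely.
    results = []
    while not result_q.empty():
        results.append(result_q.get())
    return results


results = run_pipeline(["a", "bb", "ccc"], len, workers=2)
```

Because workers only talk to the queues, adding more processor instances is a matter of raising `workers`; result order is not guaranteed.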
17

haipproxy

Distributed proxy IP pool for web crawlers using Scrapy and Redis

...Its architecture supports distributed deployment, allowing multiple crawler workers and validators to run across different machines.

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
18

gain

Asyncio-based Python framework for building fast web crawling spiders

Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...

Downloads: 1 This Week

Last Update: 4 days ago
See Project
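
To give a feel for the concurrent, asyncio-based crawling described above, here is a minimal sketch using only the standard library. It does not use Gain's API: `crawl_all` and `fake_fetch` are hypothetical names, and `fake_fetch` simulates network latency with `asyncio.sleep` instead of issuing real requests through aiohttp.

```python
import asyncio


async def fake_fetch(url):
    """Stand-in for an HTTP request (an assumption, no network involved)."""
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def crawl_all(urls, fetch, limit=5):
    """Fetch all URLs concurrently, with at most `limit` requests in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            return url, await fetch(url)

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


urls = [f"http://example.test/{i}" for i in range(10)]
pages = asyncio.run(crawl_all(urls, fake_fetch))
```

The semaphore caps concurrency so a large URL list does not open an unbounded number of connections at once.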
19

Toapi

Convert websites into structured APIs automatically with Python tool

Toapi is a Python library designed to transform ordinary websites into usable API services. Instead of building a traditional web crawler that collects and stores data before exposing it through an API, Toapi simplifies the process by allowing developers to define data structures that automatically generate an API layer from existing web pages. It works by parsing HTML content from a source site and mapping selected elements into structured data that can be returned as JSON through API endpoints. ...

Downloads: 1 This Week

Last Update: 3 days ago
See Project
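
The idea of mapping selected HTML elements to structured JSON, as described above, might look roughly like the following. This sketch does not use Toapi's API: `FieldExtractor`, `page_to_json`, and the schema format of `(tag, css_class)` pairs are assumptions made for illustration, and nested elements with the same tag are not handled.

```python
import json
from html.parser import HTMLParser


class FieldExtractor(HTMLParser):
    """Collects the text of every <tag class="css_class"> element."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self._capturing = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._capturing = True
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.values[-1] += data


def page_to_json(html, schema):
    """Map each schema field to the matching element texts and return JSON."""
    record = {}
    for field, (tag, css_class) in schema.items():
        extractor = FieldExtractor(tag, css_class)
        extractor.feed(html)
        record[field] = [v.strip() for v in extractor.values]
    return json.dumps(record)


HTML = (
    '<h2 class="title">First post</h2><span class="author">ann</span>'
    '<h2 class="title">Second post</h2>'
)
payload = page_to_json(HTML, {"titles": ("h2", "title"), "authors": ("span", "author")})
```

An API endpoint would then fetch the source page on request and return `payload` as its response body.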
20

diskover

File system crawler and disk space usage software

diskover is a file system crawler and disk space usage tool that uses Elasticsearch to index your file metadata. diskover crawls and indexes your files on a local computer or on a remote storage server over network mounts. diskover helps manage your storage by identifying old and unused files, and it gives better insight into data change ("hotfiles"), file duplication ("dupes"), and wasted space. It is designed to help deal with managing large amounts of data growth and provide detailed storage...

Downloads: 0 This Week

Last Update: 2020-05-16
See Project
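
Identifying old files by crawling a directory tree, as described above, can be sketched with the standard library. This is not diskover's implementation and does not involve Elasticsearch: `find_stale_files` and its parameters are hypothetical, and "old" is assumed here to mean a modification time earlier than a cutoff.

```python
import os
import time


def find_stale_files(root, max_age_days, now=None):
    """Return (path, size_bytes) for files not modified in max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                stale.append((path, info.st_size))
    return stale
```

For example, `find_stale_files("/srv/data", 365)` would list files untouched for a year under a hypothetical `/srv/data` mount.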
21

de:Code

Top-down dungeon crawler

de:Code is a 3rd-person top-down dungeon crawler. The world is procedurally generated using the file structure of the user's hard drive. The game will use a mixture of different genres including steampunk, fantasy, medieval, and modern. The user will have to travel down 4 main paths, each progressively harder than the last, and each will have more than one genre conflicting inside. Each main path (connected by a central hub) will get harder as the user progresses down, finally reaching a unique...

Downloads: 0 This Week

Last Update: 2016-10-12
See Project
22

sitecheck

Modular web site spider for web developers.

More than just a link checker, sitecheck is a website spider (also known as a crawler) which can assist with SEO by testing an entire site plus both inbound links from search engines and outbound links to other sites for the following issues: looping redirects (HTTP 301/302), broken links (HTTP 404), server errors (HTTP 500), spelling mistakes, low readability scores (using the Flesch Reading Ease test), missing/empty/duplicate meta tags, duplicate content, slow page speed, W3C validation...

1 Review

Downloads: 0 This Week

Last Update: 2014-10-04
See Project
23

Domain Analyzer Security Tool

Finds all the security information for a given domain name

Domain analyzer is a security analysis tool which automatically discovers and reports information about the given domain. Its main purpose is to analyze domains in an unattended way.

Downloads: 2 This Week

Last Update: 2016-11-26
See Project
24

SauceWalk Proxy Helper

Enumeration and automation of file discovery for your sec tools.

SauceWalk is a freeware (.exe) / open source (.py) tool for aiding in the enumeration of web application structure. It consists of two parts: a local executable (walk.exe) and a remote agent. Walk.exe iterates through the local files and folders of your target web application (for example, a local copy of WordPress) and generates requests via your favourite proxy (for example, Burp Suite) against a given target URL. The remote agent can be used to identify target files and folders on a live...

Downloads: 0 This Week

Last Update: 2013-09-24
See Project
25

Ancient World Of Generica

A text combat game currently transitioning to a dungeon crawler/roguelike

This is a little game I have been programming for a while in Python and compiled with PyInstaller. It is a very early alpha with only a few features in the first internet release version. These include:
- Rudimentary saving and loading
- Weapons, armor, and potions (but no inventory to store them in as of yet)
- A modding interface for the above
- Basic text combat and morale implemented for basic monster logic
- Basic procedural generation for the above
- Classes and races
This at the...

Downloads: 0 This Week

Last Update: 2015-06-27
See Project
