Search Results for "web crawler source code"

1852 projects for "web crawler source code" with 1 filter applied:

Filter: BSD

1

crawler

Collection of JS reverse engineering examples for web scraping study

crawler is a collection of web scraping and JavaScript reverse engineering examples designed for learning how modern websites protect their data and how those protections can be analyzed. It contains many case studies that demonstrate how to analyze and replicate request parameters, cookies, and encryption logic used by real websites. Each directory in the project focuses on a specific target service or scenario, showing how browser network requests and JavaScript code can be studied to reproduce API calls programmatically. ...

Downloads: 0 This Week

Last Update: 2 days ago
2

AI-Crawler

Crawl a website starting from a URL, find relevant pages

AI Crawler is an experimental AI-powered web crawling and data extraction tool that uses natural language prompts to guide the discovery and retrieval of relevant information across websites. Unlike traditional web scrapers that rely on static selectors and manual scripting, it uses AI to dynamically identify and prioritize pages based on user intent, making it more flexible and resilient to changes in website structure.

Downloads: 1 This Week

Last Update: 2026-04-02
3

Spatie Crawler

An easy to use, powerful crawler implemented in PHP

Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.

Downloads: 2 This Week

Last Update: 4 days ago
4

GPT Crawler

Crawl a site to generate knowledge files to create your own custom GPT

GPT Crawler is an open-source tool designed to automatically crawl websites and generate structured knowledge that can be used to build AI assistants and retrieval systems. It focuses on extracting high-quality textual content from web pages and preparing it in formats suitable for embedding, indexing, or fine-tuning workflows. The project is especially useful for teams that want to turn documentation sites or knowledge bases into conversational AI backends without building custom scrapers from scratch. ...

Downloads: 5 This Week

Last Update: 2026-03-02
5

tumblr-crawler

Python crawler to download photos and videos from Tumblr blogs

tumblr-crawler is an open source Python-based utility designed to download media content from Tumblr blogs. It provides a script that automatically retrieves photos and videos from specified Tumblr sites and saves them locally for offline access. Users can specify one or multiple blogs to crawl by editing a configuration file or by passing parameters through the command line. Once executed, the script fetches media from the Tumblr API and stores the downloaded files in folders named after...

Downloads: 2 This Week

Last Update: 3 days ago
6

Python API for JMComic

Python crawler and API for downloading JMComic albums and images

JMComic-Crawler-Python is a Python library and crawler framework designed to programmatically access and download comic content from the JMComic platform. It provides a structured API that allows developers to retrieve albums, chapters, and images using simple Python code while handling the necessary network requests and data processing behind the scenes.

Downloads: 1 This Week

Last Update: 2026-04-07
7

Every Code

Local AI coding agent CLI with multi-agent orchestration tools

Every Code (often referred to simply as Code) is a fast, local AI-powered coding agent designed to run directly in the terminal environment. It is a community-driven fork of the Codex CLI, with a strong emphasis on improving real-world developer ergonomics and workflows. Every Code enhances the traditional coding assistant model by introducing multi-agent orchestration, allowing multiple AI agents to collaborate, compare solutions, and refine outputs in parallel. It supports integration with...

Downloads: 23 This Week

Last Update: 2 days ago
8

Phoenix Code Editor

Phoenix is a modern open-source Code Editor for the web

Phoenix is a modern open-source and free software code editor for the web, built for the browser.

Downloads: 14 This Week

Last Update: 2026-01-19
9

dxy-covid-19-crawler

Realtime crawler for COVID-19 outbreak statistics from DXY data

DXY-COVID-19-Crawler is a Python-based project designed to collect real-time COVID-19 infection data from the public dataset provided by Ding Xiang Yuan (DXY). The crawler periodically retrieves pandemic statistics and stores them in a database so that historical changes in the outbreak can be preserved and analyzed later. It was created to make up-to-date infection data more accessible for developers, researchers, and analysts who wanted to build visualizations or conduct data analysis...

Downloads: 4 This Week

Last Update: 3 days ago
10

Heritrix

Internet Archive's open-source, web-scale, web crawler project

Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.

Downloads: 3 This Week

Last Update: 2026-04-06
11

Screenshot to Code

A neural network that transforms a design mock-up into static websites

Screenshot-to-code is a tool or prototype that attempts to convert UI screenshots (e.g., of mobile or web UIs) into code representations, likely generating layouts, HTML, CSS, or markup from image inputs. It belongs to the research and proof-of-concept work on UI automation and image-to-UI code generation. It maps visual designs to code constructs, outputs code or UI layout (HTML, CSS, or markup), and includes example and demo scripts showing the image-to-UI-code conversion.

Downloads: 2 This Week

Last Update: 2025-09-26
12

FEAPDER

Powerful Python crawler framework for scalable web scraping tasks

feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based...

Downloads: 0 This Week

Last Update: 2026-03-10
13

Spider

High-performance Rust web crawler and scraper for large-scale data

Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large...

Downloads: 6 This Week

Last Update: 2026-03-31
14

whatsapp-web.js

WhatsApp library for NodeJS that connects through the browser app

A WhatsApp client library for NodeJS that connects through the WhatsApp Web browser app. Programmatically control WhatsApp whether you're running user or business accounts. It uses Puppeteer to run a real instance of WhatsApp Web to avoid getting blocked. whatsapp-web.js connects to an official version of WhatsApp Web under the hood, reducing ban risks. The object-oriented approach makes it easy to get...

Downloads: 14 This Week

Last Update: 2026-01-30
15

Laravel Web Tinker

Tinker in your browser

Artisan's tinker command is a great way to tinker with your application in the terminal. Unfortunately running a few lines of code, making edits, and copy/pasting code can be bothersome. Wouldn't it be great to tinker in the browser? This package will add a route to your application where you can tinker to your heart's content. In case light hurts your eyes, there's a dark mode too.

Downloads: 2 This Week

Last Update: 2026-02-21
16

Python Code Tutorials

The Python Code Tutorials

Python Code Tutorials is a large educational repository that aggregates programming tutorials from the website “The Python Code” into a structured collection of Python projects and learning materials. The repository covers a wide range of programming topics including cybersecurity, networking, web scraping, machine learning, GUI development, and automation scripts. Each tutorial typically includes complete Python code examples and explanations that demonstrate how to build real tools and...

Downloads: 0 This Week

Last Update: 2026-03-11
17

katana

Fast CLI web crawler for discovering endpoints in modern web apps

Katana is an open source command-line web crawling and spidering framework developed by ProjectDiscovery. It is designed to efficiently crawl websites and web applications in order to discover endpoints, resources, and other useful information that may not be easily visible through manual browsing. Katana focuses on speed and automation, making it suitable for use in security reconnaissance workflows and automated pipelines. Katana supports both standard HTTP crawling and headless browser...

Downloads: 21 This Week

Last Update: 2026-03-10
18

claude-code-transcripts

Tools for publishing transcripts for Claude Code sessions

claude-code-transcripts is a command-line utility that takes session files exported from Claude Code (in JSON or JSONL format) and turns them into clean, navigable HTML transcripts that can be viewed in any modern web browser. It is designed to make the often dense and verbose outputs from AI coding sessions easier to read, share, and archive by breaking conversations into paginated, annotated pages with navigable timelines of prompts and responses. Users can run this tool locally or fetch...

Downloads: 1 This Week

Last Update: 2026-01-30
19

Proton Web Clients

Monorepo hosting the Proton web clients

Proton Web Clients is a monorepo hosting the web applications for Proton’s suite of privacy-focused services, including the core Proton Mail webmail interface and related web apps like Proton Calendar, Proton Drive, Proton Account, Proton VPN, Proton Pass, and other connected tools. It consolidates all web client code, shared modules, dependencies, and development tooling into a single repository, enabling unified maintenance, consistency of design patterns, and efficient evolution of...

Downloads: 2 This Week

Last Update: 4 days ago
20

The Apache Struts web framework

Mirror of Apache Struts

The Apache Struts web framework is a free open-source solution for creating Java web applications. Web applications differ from conventional websites in that web applications can create a dynamic response. Many websites deliver only static pages. A web application can interact with databases and business logic engines to customize a response. Web applications based on JavaServer Pages sometimes commingle database code, page design code, and control flow code. ...

Downloads: 0 This Week

Last Update: 2025-10-01
21

fess

Open source enterprise search server for websites, files, and data

...Fess includes a built-in crawler that can collect content from sources such as databases, CSV files, and shared storage, making it suitable for centralized knowledge discovery. It supports indexing and searching across many document formats including office documents, PDFs, and compressed archives. It also provides a web-based administrative interface that allows administrators to configure crawling targets, manage indexing tasks, and adjust search settings from a graphical dashboard.

Downloads: 6 This Week

Last Update: 2 days ago
22

douyin

Open source Douyin crawler for collecting and downloading public data

DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages.

Downloads: 5 This Week

Last Update: 2026-03-13
23

spider_collection

Collection of Python web scraping scripts for data extraction tasks

spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. It gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages...

Downloads: 2 This Week

Last Update: 3 days ago
24

Playwright Skill for Claude Code

Claude Code Skill for browser automation with Playwright

...The system supports a wide range of use cases, including testing web applications, validating user interfaces, automating workflows, and extracting data from websites. One of its key advantages is its ability to generate custom Playwright code tailored to each request, allowing flexible and context-aware automation.

Downloads: 4 This Week

Last Update: 2026-03-17
25

X-Crawl

Flexible Node.js AI-assisted crawler library

A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.

Downloads: 3 This Week

Last Update: 2025-04-06