1820 projects for "python web crawler" with 1 filter applied:

  • 1
    crawler

    Collection of JS reverse engineering examples for web scraping study

    crawler is a collection of web scraping and JavaScript reverse engineering examples designed for learning how modern websites protect their data and how those protections can be analyzed. It contains many case studies that demonstrate how to analyze and replicate request parameters, cookies, and encryption logic used by real websites. Each directory in the project focuses on a specific target service or scenario, showing how browser network requests and JavaScript code can be studied to reproduce API calls programmatically. ...
    Downloads: 0 This Week
  • 2
    tumblr-crawler

    Python crawler to download photos and videos from Tumblr blogs

    tumblr-crawler is an open source Python-based utility designed to download media content from Tumblr blogs. It provides a script that automatically retrieves photos and videos from specified Tumblr sites and saves them locally for offline access. Users can specify one or multiple blogs to crawl by editing a configuration file or by passing parameters through the command line.
    Downloads: 2 This Week
  • 3
    Weibo Crawler

    Python crawler for collecting and downloading Sina Weibo user data

    weibo-crawler is a Python-based data collection tool designed to retrieve information from Sina Weibo user accounts. It automates the process of gathering posts, user profile details, and engagement metrics from one or more target accounts. weibo-crawler can extract comprehensive information about users, including profile attributes such as nickname, follower count, following count, and account metadata.
    Downloads: 0 This Week
  • 4
    Python API for JMComic

    Python crawler and API for downloading JMComic albums and images

    JMComic-Crawler-Python is a Python library and crawler framework designed to programmatically access and download comic content from the JMComic platform. It provides a structured API that allows developers to retrieve albums, chapters, and images using simple Python code while handling the necessary network requests and data processing behind the scenes.
    Downloads: 3 This Week
  • 5
    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 5 This Week
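
The crawl-depth control mentioned above can be illustrated with a minimal sketch. It is written in Python rather than PHP and does not use Spatie Crawler's API; the `LINKS` table and the `crawl` function are hypothetical stand-ins for a real site and a real fetcher.

```python
from collections import deque

# Hypothetical in-memory "site": each URL maps to the links found on that page.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
    "/b/1": [],
    "/a/1/x": [],
}


def crawl(start, max_depth, get_links=LINKS.get):
    """Breadth-first crawl that does not follow links beyond max_depth."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth == max_depth:
            continue  # pages at the depth limit are visited but not expanded
        for link in get_links(url) or []:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


pages = crawl("/", max_depth=2)
```

With `max_depth=2`, the page `/a/1/x` (three links away from the start) is never visited.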
  • 6
    AI-Crawler

    Crawl a website starting from a URL, find relevant pages

    AI Crawler is an experimental AI-powered web crawling and data extraction tool that uses natural language prompts to guide the discovery and retrieval of relevant information across websites. Unlike traditional web scrapers that rely on static selectors and manual scripting, it uses AI to dynamically identify and prioritize pages based on user intent, making it more flexible and resilient to changes in website structure.
    Downloads: 1 This Week
  • 7
    Python-Spider

    Python3 web crawler practice

    ...As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, and possibly concurrency or scheduling for crawling multiple pages, along with techniques for handling common web-scraping issues. For people who want hands-on practice building scrapers, collecting data, or learning web programming in Python, this repository serves as a teaching reference or starting point.
    Downloads: 0 This Week
  • 8
    GPT Crawler

    Crawl a site to generate knowledge files to create your own custom GPT

    GPT Crawler is an open-source tool designed to automatically crawl websites and generate structured knowledge that can be used to build AI assistants and retrieval systems. It focuses on extracting high-quality textual content from web pages and preparing it in formats suitable for embedding, indexing, or fine-tuning workflows. The project is especially useful for teams that want to turn documentation sites or knowledge bases into conversational AI backends without building custom scrapers from scratch. ...
    Downloads: 4 This Week
  • 9
    dxy-covid-19-crawler

    Realtime crawler for COVID-19 outbreak statistics from DXY data

    DXY-COVID-19-Crawler is a Python-based project designed to collect real-time COVID-19 infection data from the public dataset provided by Ding Xiang Yuan (DXY). The crawler periodically retrieves pandemic statistics and stores them in a database so that historical changes in the outbreak can be preserved and analyzed later. It was created to make up-to-date infection data more accessible for developers, researchers, and analysts who wanted to build visualizations or conduct data analysis during the early stages of the pandemic. ...
    Downloads: 8 This Week
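
The "retrieve periodically, store every result" pattern described above might look roughly like the following sketch. It is not the project's code: `fetch_stats` is a hypothetical stub that returns fixed numbers instead of calling DXY, and the table layout is an assumption made for illustration.

```python
import sqlite3
import time


def fetch_stats():
    """Hypothetical stand-in for a real HTTP request; returns fixed numbers."""
    return {"region": "example", "confirmed": 10, "recovered": 2}


def store_snapshot(conn, stats, fetched_at):
    # Append-only insert: earlier rows are never overwritten,
    # so the history of the statistics is preserved.
    conn.execute(
        "INSERT INTO snapshots (fetched_at, region, confirmed, recovered) "
        "VALUES (?, ?, ?, ?)",
        (fetched_at, stats["region"], stats["confirmed"], stats["recovered"]),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # a real crawler would use a file or a server
conn.execute(
    "CREATE TABLE snapshots "
    "(fetched_at REAL, region TEXT, confirmed INTEGER, recovered INTEGER)"
)

for _ in range(3):  # a real crawler would loop indefinitely with a longer pause
    store_snapshot(conn, fetch_stats(), time.time())
    time.sleep(0.01)

row_count = conn.execute("SELECT COUNT(*) FROM snapshots").fetchone()[0]
```

Because each poll adds a timestamped row instead of updating one in place, later analysis can reconstruct how the numbers changed over time.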
  • 10
    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    ...It also integrates monitoring and alerting capabilities to help developers track crawler performance and detect issues during execution. feapder includes browser rendering support for handling dynamic web pages and provides mechanisms for large-scale data deduplication during crawling.
    Downloads: 0 This Week
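
The listing does not say how feapder implements deduplication, so the following is only a generic sketch of the idea, not feapder's API. The `fingerprint` function and `SeenFilter` class are hypothetical names; a large-scale crawler would more likely keep fingerprints in an external store or a Bloom filter than in an in-process set.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def fingerprint(url):
    """Hash a normalized form of the URL so equivalent URLs collide."""
    parts = urlsplit(url)
    # Sort query parameters and drop the fragment before hashing.
    query = urlencode(sorted(parse_qsl(parts.query)))
    normalized = urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class SeenFilter:
    """Remembers fingerprints of URLs that have already been scheduled."""

    def __init__(self):
        self._seen = set()

    def is_new(self, url):
        fp = fingerprint(url)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


seen = SeenFilter()
urls = [
    "https://Example.com/page?b=2&a=1",
    "https://example.com/page?a=1&b=2#top",  # same page, different spelling
    "https://example.com/other",
]
unique = [u for u in urls if seen.is_new(u)]
```

Here the second URL is dropped because it normalizes to the same fingerprint as the first.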
  • 11
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages. ...
    Downloads: 2 This Week
  • 12
    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.
    Downloads: 4 This Week
  • 13
    FastAPI Python

    FastAPI framework, high performance, easy to learn, fast to code

    FastAPI is a modern, fast (high-performance), production-ready web framework for building APIs with Python, based on standard Python type hints.
    Downloads: 6 This Week
  • 14
    douyin

    Open source Douyin crawler for collecting and downloading public data

    DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It also integrates with the Aria2 download utility to enable large-scale downloading of videos and images associated with collected content. ...
    Downloads: 6 This Week
  • 15
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents.
    Downloads: 6 This Week
  • 16
    Tabby Web

    An SSH/Telnet/Serial client in your browser

    Tabby Web brings a modern terminal experience to the browser by pairing a web UI with a backend gateway that brokers TCP connections over WebSockets. It aims to deliver an experience similar to the desktop Tabby terminal—sessions, profiles, and rich configuration—while being accessible anywhere through a login. The architecture splits concerns: a Django-based control plane manages users, auth, and configuration, while a gateway service handles network transport so browser clients can reach...
    Downloads: 1 This Week
  • 17
    python-socketio

    Python Socket.IO server and client

    python-socketio is a robust Python library that implements the Socket.IO protocol, enabling real-time, bidirectional communication between web clients and servers. It works with multiple asynchronous frameworks such as asyncio, eventlet, and gevent, so developers can choose the concurrency model that best fits their application needs while still using a consistent API.
    Downloads: 0 This Week
  • 18
    autocrawler

    Multiprocess Selenium crawler for downloading images by keywords

    ...Users provide search terms through a simple keyword file, and the crawler organizes downloaded images into directories for each keyword. It can download either thumbnails or full resolution images and supports multiple image formats such as JPG, GIF, and PNG. It also includes configuration options such as headless mode, download limits, proxy usage, and thread count to customize crawling behavior.
    Downloads: 1 This Week
  • 19
    katana

    Fast CLI web crawler for discovering endpoints in modern web apps

    Katana is an open source command-line web crawling and spidering framework developed by ProjectDiscovery. It is designed to efficiently crawl websites and web applications in order to discover endpoints, resources, and other useful information that may not be easily visible through manual browsing. Katana focuses on speed and automation, making it suitable for use in security reconnaissance workflows and automated pipelines. Katana supports both standard HTTP crawling and headless browser...
    Downloads: 26 This Week
  • 20
    Amazing-Python-Scripts

    Curated collection of Amazing Python scripts

    Amazing-Python-Scripts is a collaborative repository that collects a wide variety of Python scripts designed to demonstrate practical programming techniques and automation tasks. The project includes scripts ranging from beginner-level utilities to more advanced applications involving machine learning, data processing, and system automation. Its goal is to provide developers with useful coding examples that can solve everyday problems, automate repetitive tasks, or serve as learning exercises. ...
    Downloads: 3 This Week
  • 21
    Python Code Tutorials

    The Python Code Tutorials

    Python Code Tutorials is a large educational repository that aggregates programming tutorials from “The Python Code” website into a structured collection of Python projects and learning materials. The repository covers a wide range of programming topics including cybersecurity, networking, web scraping, machine learning, GUI development, and automation scripts.
    Downloads: 0 This Week
  • 22
    Comprehensive Python Cheatsheet

    Comprehensive Python Cheatsheet

    ...It covers a broad range of topics including data structures, control flow, functions, object-oriented programming, standard library usage, and common patterns. The repository includes both web and printable versions, allowing users to access the material in multiple formats depending on their workflow. Because it is continuously maintained, the cheatsheet reflects modern Python usage and practical conventions. Overall, it serves as a fast lookup companion for everyday Python development.
    Downloads: 0 This Week
  • 23
    python-whatsapp-bot

    Build AI WhatsApp Bots with Pure Python

    python-whatsapp-bot is an open-source framework that demonstrates how to build AI-powered WhatsApp bots using pure Python and the official WhatsApp Cloud API. The project provides a practical implementation of a messaging automation system using the Flask web framework to handle webhook events and process incoming messages in real time.
    Downloads: 3 This Week
  • 24
    Python 100 Days

    Python - From Novice to Master in 100 Days

    Python-100-Days is a comprehensive, practice-first learning roadmap by Luo Hao that spans 100 days from absolute Python basics to professional, production-grade skills. It starts with foundational syntax, control flow, data structures, and functions, then advances through object-oriented programming, file I/O, exceptions, and modules. The middle sections focus on real-world Python applications, including working with CSV, Excel, Word, PowerPoint, PDFs, images, email/SMS, and regular expressions. ...
    Downloads: 2 This Week
  • 25
    fess

    Open source enterprise search server for websites, files, and data

    ...Fess is built on top of OpenSearch and offers an integrated solution for crawling, indexing, and searching documents from websites, file systems, and various data stores. Fess includes a built-in crawler that can collect content from sources such as databases, CSV files, and shared storage, making it suitable for centralized knowledge discovery. It supports indexing and searching across many document formats including office documents, PDFs, and compressed archives. It also provides a web-based administrative interface that allows administrators to configure crawling targets, manage indexing tasks, and adjust search settings from a graphical dashboard.
    Downloads: 7 This Week