Search Results for "python web crawler" - Page 3

Showing 3519 open source projects for "python web crawler"

  • 1
    Pholcus

    Distributed high-concurrency crawler software written in pure golang

    Pholcus is high-concurrency crawler software written in pure Go that supports distributed operation and is intended only for programming learning and research. It supports three operating modes (stand-alone, server, and client) and three operating interfaces (Web, GUI, and command line). It offers simple and flexible rules, concurrent batch tasks, and rich output methods (mysql/mongodb/kafka/csv/excel, etc.). It also supports horizontal and vertical grabbing modes, along with advanced functions such as simulated login and task suspension and cancellation. ...
    Downloads: 2 This Week
  • 2
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website....
    Downloads: 2 This Week
  • 3
    diskover-community

    Open source file indexing & storage analytics powered by Elasticsearch

    Diskover Community Edition is an open source file system indexing and storage analytics platform designed to help organizations understand and manage large volumes of file data. It crawls file systems and indexes metadata using Elasticsearch, enabling fast search, analysis, and organization of files stored across different storage systems. It allows administrators and users to explore file structures, monitor storage usage, and gain insights into how data is distributed across...
    Downloads: 1 This Week
  • 4
    Roach

    The complete web scraping toolkit for PHP

    Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well.
    Downloads: 2 This Week
  • 5
    python-fxxk-spider

    Collection of 100+ Python web scraping projects and crawler examples

    python-fxxk-spider is a curated collection of Python web scraping and crawler projects gathered in a single repository for reference and learning. It aggregates many independent scraping examples that target a wide range of websites, online services, and public data sources. Instead of being a single crawler tool, it functions as a catalog of ready-made Python spider implementations that demonstrate different scraping techniques. python-fxxk-spider includes scrapers for social media, e-commerce platforms, job listings, music services, video platforms, and various content sites. ...
    Downloads: 3 This Week
  • 6
    PaSa

    An advanced paper search agent powered by large language models

    PaSa is an open-source “paper search agent” built around large language models (LLMs), designed to automate the process of academic literature retrieval with human-like decision making. Instead of simply translating a query into keywords and returning a flat list of matching papers, PaSa uses a dual-agent architecture (Crawler + Selector) that can iteratively search, read, analyze, and filter academic publications — simulating how a researcher might dig through citation networks, expand...
    Downloads: 0 This Week
  • 7
    REST APIs with Flask and Python

    Projects and e-book for our course, REST APIs with Flask and Python

    A full course to teach you how to use Flask and Python to make REST APIs using multiple Flask extensions and PostgreSQL. Learn Flask, Docker, PostgreSQL, and more. Build professional-grade REST APIs with Python. No more outdated tutorials. Use Python 3.10+ and the latest versions of every Flask extension and library. Run your apps in Docker, host your code with Git, write documentation with Swagger, and test your APIs while developing. Learn how to perform user authentication using JWTs and...
    Downloads: 0 This Week
  • 8
    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 2 This Week
  • 9
    BotCity Framework Core Python

    BotCity Framework - Python

    Recognize and interact with UI elements using a state-of-the-art computer vision module. Operate any user interface regardless of technology or platform (desktop, web, terminal). BotCity is a platform to develop, deploy, manage, and maintain automation. Automations can be developed in Python or Java using market-standard open-source libraries. Develop, deploy, manage, and scale your Automation Ops with an all-in-one platform that provides a task queue, runtime environment management, reports, alerts, logs, and much more.
    Downloads: 1 This Week
  • 10
    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX parser (powered by golang.org/x/net/html). Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most useful resource URLs (pics, videos, audios, forms, etc.). Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted). Scan depth (limited by starting host and path; 0 by default) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for...
    Downloads: 10 This Week
  • 11
    web-eval-agent MCP Server

    An MCP server that autonomously evaluates web applications

    web-eval-agent is a Model Context Protocol (MCP) server that spins up a browser-use–capable debugging agent to autonomously run and evaluate web apps straight from your editor. It’s positioned as a “let the coding agent debug itself” companion: the agent launches the app, navigates flows, captures evidence, and iterates on failures without manual copy-pasting of logs. The repository focuses on developer ergonomics, exposing typed MCP tools so clients like Claude Desktop can start sessions,...
    Downloads: 2 This Week
  • 12
    Python Core 50 Courses

    Structured learning path that organizes Python fundamentals

    Python-Core-50-Courses is a structured learning path that organizes Python fundamentals into 50 digestible lessons designed for steady, incremental progress. The curriculum starts with the basics—syntax, variables, data types, and control flow—then advances to functions, modules, object-oriented programming, and common standard-library utilities.
    Downloads: 0 This Week
  • 13
    DeerFlow

    Deep Research framework, combining language models with tools

    DeerFlow is an open-source, community-driven “deep research” framework / multi-agent orchestration platform developed by ByteDance. It aims to combine the reasoning power of large language models (LLMs) with automated tool-use — such as web search, web crawling, Python execution, and data processing — to enable complex, end-to-end research workflows. Instead of a monolithic AI assistant, DeerFlow defines multiple specialized agents (e.g. “planner,” “searcher,” “coder,” “report generator”) that collaborate in a structured workflow, allowing tasks like literature reviews, data gathering, data analysis, code execution, and final report generation to be largely automated. ...
    Downloads: 393 This Week
  • 14
    XX-Net

    A web proxy tool

    XX-Net is an easy-to-use, anti-censorship web proxy tool from China. It includes GAE_proxy and X-Tunnel, with support for multiple platforms.
    Downloads: 61 This Week
  • 15
    FastAPI

    FastAPI framework, high performance, easy to learn, fast to code

    FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints. Great editor support. Completion everywhere. Less time debugging. Designed to be easy to use and learn. Less time reading docs. Minimize code duplication. Multiple features from each parameter declaration. Fewer bugs. Get production-ready code. With automatic interactive documentation.
    Downloads: 47 This Week
  • 16
    Flet

    Flet enables developers to easily build realtime web and mobile apps

    ...With Flet you just write a monolith stateful app in Python only and get a multi-user, real-time Single-Page Application (SPA). To start developing with Flet, you just need your favorite IDE or text editor. With no SDKs, no thousands of dependencies, no complex tooling, Flet has a built-in web server with assets hosting and desktop clients.
    Downloads: 83 This Week
  • 17
    Sunshine

    Self-hosted game stream host for Moonlight

    Sunshine is an open-source self‑hosted cloud gaming server that implements NVIDIA’s GameStream protocol. Compatible with Moonlight clients across platforms, it supports low‑latency streaming via software or hardware encoding (AMD/Intel/NVIDIA) and offers a browser‑based control UI for pairing.
    Downloads: 927 This Week
  • 18
    Wfuzz

    Web application fuzzer

    Wfuzz provides a framework to automate web application security assessments and can help you secure your web applications by finding and exploiting their vulnerabilities. Wfuzz is based on a simple concept: it replaces any reference to the FUZZ keyword with the value of a given payload. A payload in Wfuzz is a source of data. This simple concept allows any input to be injected into any field of an HTTP request, making it possible to perform complex web security attacks in different web...
    Downloads: 27 This Week
  • 19
    NiceGUI

    Create web-based user interfaces with Python

    NiceGUI is a Python-based UI framework that enables developers to create interactive web applications using only Python code. It abstracts away the complexities of HTML, CSS, and JavaScript, allowing for rapid development of web interfaces directly from Python scripts. NiceGUI is suitable for building dashboards, control panels, and other web-based tools, especially in contexts like robotics and data visualization.
    Downloads: 3 This Week
  • 20
    Pyodide

    Pyodide is a Python distribution for the browser and Node.js

    Pyodide brings the Python runtime to the browser by compiling Python and its scientific libraries to WebAssembly. It allows developers to run Python code directly in web browsers without a server, supporting packages like NumPy, Pandas, and Matplotlib. Pyodide opens up new possibilities for interactive data analysis, scientific computing, and educational tools in web environments, all while integrating seamlessly with JavaScript.
    Downloads: 4 This Week
  • 21
    AWS X-Ray SDK for Python

    AWS X-Ray SDK for the Python programming language

    The AWS X-Ray SDK for Python is compatible with Python 2.7, 3.4, 3.5, 3.6, 3.7, 3.8, and 3.9. By default, the SDK generates no-op trace and entity IDs for unsampled requests and secure random trace and entity IDs for sampled requests. To generate secure random trace and entity IDs for all requests, sampled or unsampled (this applies to the use case of injecting trace IDs into logs), set the AWS_XRAY_NOOP_ID environment variable to False. ...
    Downloads: 1 This Week
  • 22
    Gemini-API

    Reverse-engineered Python API for Google Gemini web app

    Gemini-API is a community-created asynchronous Python wrapper for the web interface of Google’s Gemini models (formerly Bard). It is the result of reverse-engineering the Gemini web app and exposing its functionality through a programmatic API. This enables developers to incorporate Gemini into Python applications, scripts, bots, or tools without relying solely on official SDKs. The wrapper supports streaming responses, model selection, and handling of the web-based authentication/session mechanisms used by Google’s interface. ...
    Downloads: 6 This Week
  • 23
    Responder

    A familiar HTTP Service Framework for Python

    ...Class-based views without inheritance. ASGI framework, the future of Python web services. WebSocket support! The ability to mount any ASGI / WSGI app at a subroute. f-string syntax route declaration. Mutable response object passed into each view. No need to return anything. Background tasks spawned off in a ThreadPoolExecutor. GraphQL (with GraphiQL) support! OpenAPI schema generation, with interactive documentation!
    Downloads: 5 This Week
  • 24
    uvicorn

    An ASGI web server, for Python

    Uvicorn is an ASGI web server implementation for Python. Until recently Python has lacked a minimal low-level server/application interface for async frameworks. The ASGI specification fills this gap, and means we're now able to start building a common set of tooling usable across all async frameworks. Uvicorn currently supports HTTP/1.1 and WebSockets.
    Downloads: 6 This Week
  • 25
    Pyxel

    A retro game engine for Python

    ...Pyxel's specifications and APIs are inspired by PICO-8 and TIC-80. Pyxel is open source and free to use. Let's start making a retro game with Pyxel! Runs on Windows, Mac, Linux, and Web. Using the Pyxel Web Launcher or custom elements for HTML, you can run Pyxel in a web browser without any installation work. Pyxel supports a dedicated application distribution file format (Pyxel application file) that works across platforms. 8 music tracks that can combine arbitrary sounds.
    Downloads: 22 This Week