Showing 86 open source projects for "python web crawler"

  • 1
    Spatie Crawler

    An easy to use, powerful crawler implemented in PHP

    Spatie Crawler is a PHP library that allows developers to crawl websites and extract information efficiently. It can be used for web scraping, link checking, or automated testing of web pages. The library is simple to use and supports customizable crawling strategies, including controlling crawl depth and handling redirects. It’s suitable for building crawlers that navigate large or dynamically generated websites.
    Downloads: 6 This Week
  • 2
    Python-Spider

    Python3 web crawler practice

    ...As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe concurrency or scheduling to crawl multiple pages, and techniques to handle common web-scraping issues. For people wanting to get hands-on with building scrapers, collecting data, or learning how to navigate web programming in Python, this repository acts as a didactic reference or starting point.
    Downloads: 0 This Week
  • 3
    Best-of Web Development with Python

    A ranked list of awesome python libraries for web development

    ...If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! A ranked list of awesome Python libraries for web development, updated weekly.
    Downloads: 11 This Week
  • 4
    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.
    Downloads: 6 This Week
  • 5
    Best-of Python

    A ranked list of awesome Python open-source libraries

    ...If you would like to add or update projects, feel free to open an issue, submit a pull request, or directly edit the projects.yaml. Contributions are very welcome! Ranked list of awesome Python libraries for web development. Correctly generate plurals, ordinals, indefinite articles; convert numbers. Libraries for loading, collecting, and extracting data from a variety of data sources and formats. Libraries for data batch- and stream-processing, workflow automation, job scheduling, and other data pipeline tasks.
    Downloads: 2 This Week
  • 6
    Selenium-python Helium

    Selenium-python but lighter: Helium is the best Python library

    Under the hood, Helium forwards each call to Selenium. The difference is that Helium's API is much more high-level. In Selenium, you need to use HTML IDs, XPaths and CSS selectors to identify web page elements. Helium on the other hand lets you refer to elements by user-visible labels. As a result, Helium scripts are typically 30-50% shorter than similar Selenium scripts. What's more, they are easier to read and more stable with respect to changes in the underlying web page. Selenium-python is great for web automation. Helium makes it easier to use. ...
    Downloads: 0 This Week
  • 7
    My Python Eggs

    Python Examples

    My Python Eggs, commonly associated with the geekcomputers Python repository, is a large collection of practical Python scripts and small programs created primarily for experimentation, automation, and educational purposes. Rather than being a single cohesive application, it functions as a repository of utilities that demonstrate how Python can be used to solve everyday problems and automate repetitive tasks.
    Downloads: 9 This Week
  • 8
    Comprehensive Python Cheatsheet

    Comprehensive Python Cheatsheet

    ...It covers a broad range of topics including data structures, control flow, functions, object-oriented programming, standard library usage, and common patterns. The repository includes both web and printable versions, allowing users to access the material in multiple formats depending on their workflow. Because it is continuously maintained, the cheatsheet reflects modern Python usage and practical conventions. Overall, it serves as a fast lookup companion for everyday Python development.
    Downloads: 0 This Week
  • 9
    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 4 This Week
  • 10
    PyMySQL

    MySQL client library for Python

    PyMySQL is a 100% Python implementation of the MySQL client protocol, allowing Python applications to connect to MySQL and MariaDB databases without requiring binary extensions. It supports standard DB‑API 2.0 features, such as cursors, transactions, and parameterized queries. PyMySQL is versatile for web applications, scripts, and tools, offering compatibility with ORMs like SQLAlchemy and frameworks like Django.
    Downloads: 6 This Week
  • 11
    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 7 This Week
  • 12
    WeasyPrint

    The awesome document factory

    WeasyPrint is a smart solution helping people to create PDF documents. You can generate gorgeous statistical reports, invoices, tickets, and anything else you want, as long as you have some web design skills! Design your documents just as you design your websites! WeasyPrint follows the widely used HTML and CSS specifications from the W3C. You can use your usual web tools, languages and frameworks, but for print. Creating high-quality digital documents requires features that you love to use as...
    Downloads: 25 This Week
  • 13
    WTForms

    A flexible forms validation and rendering library for Python

    WTForms is a flexible forms validation and rendering library for Python web development. It can work with whatever web framework and template engine you choose. It supports data validation, CSRF protection, internationalization (I18N), and more. There are various community libraries that provide closer integration with popular frameworks.
    Downloads: 3 This Week
  • 14
    Tortoise ORM

    Familiar asyncio ORM for python, built with relations in mind

    Tortoise ORM is an easy-to-use asyncio ORM (Object Relational Mapper) for Python, inspired by Django's ORM. It is designed to work with asynchronous frameworks, providing a simple and familiar API for interacting with databases. Tortoise ORM supports various relational databases and is suitable for building high-performance web applications.
    Downloads: 3 This Week
  • 15
    notebooker

    Productionise & schedule your Jupyter Notebooks

    Productionise and schedule your Jupyter Notebooks, just as interactively as you wrote them. Notebooker is a webapp which can execute and parametrise Jupyter Notebooks as soon as they have been committed to git. The results are stored in MongoDB and searchable via the web interface, essentially turning your Jupyter Notebook into a production-style web-based report in a few clicks.
    Downloads: 3 This Week
  • 16
    redis-py

    Redis Python client

    redis-py is the official Python client for interacting with Redis, the in-memory data structure store. It supports all Redis commands and data types, making it easy to build caching, messaging, or real-time analytics features in Python applications. With both synchronous and asyncio support, redis-py is suited for modern Python projects and integrates smoothly into web frameworks, task queues, and backend services.
    Downloads: 0 This Week
  • 17
    MediaManager

    A modern selfhosted media management system for your media library

    ...It is designed for ease of deployment with Docker, supports standardized metadata sources such as TMDB and TVDB, and integrates OAuth/OIDC for secure authentication. Users can browse, search, and manage their media with a responsive web frontend while developers benefit from a clean codebase that uses Python and modern web technologies. Its holistic approach toward acquisition, tracking, and library maintenance reduces duplication, improves media discovery workflows, and simplifies long-term management of large media collections.
    Downloads: 1 This Week
  • 18
    Werkzeug

    The comprehensive WSGI web application library

    Werkzeug is a comprehensive WSGI web application library. It began as a simple collection of various utilities for WSGI applications and has become one of the most advanced WSGI utility libraries. Werkzeug doesn’t enforce any dependencies. It is up to the developer to choose a template engine, database adapter, and even how to handle requests. Includes an interactive debugger that allows inspecting stack traces and source code in the browser with an interactive interpreter for any frame in...
    Downloads: 13 This Week
  • 19
    Metaflow

    A framework for real-life data science

    Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.
    Downloads: 4 This Week
  • 20
    Helium

    Lighter web automation with Python

    Helium is a Python library built on top of Selenium to make browser automation more intuitive and human-friendly. It replaces verbose boilerplate code with natural language-like API calls such as click("Login") or write("hello", into="Name"). Helium manages browser setup, waits, and teardown, enabling quick development of scripts for testing, scraping, or task automation without requiring deep Selenium knowledge.
    Downloads: 0 This Week
  • 21
    List of independent blogs in Chinese

    List of independent blogs in Chinese

    List of independent blogs in Chinese is a curated open repository that aggregates and maintains a large list of independent Chinese-language blogs across technology, design, and personal knowledge domains. The project aims to promote the independent blogging ecosystem by making it easier for readers to discover high-quality personal sites outside major content platforms. It is community-driven, allowing contributors to submit and update blog entries so the directory remains current and...
    Downloads: 2 This Week
  • 22
    claude-code-transcripts

    Tools for publishing transcripts for Claude Code sessions

    claude-code-transcripts is a command-line utility that takes session files exported from Claude Code (in JSON or JSONL format) and turns them into clean, navigable HTML transcripts that can be viewed in any modern web browser. It is designed to make the often dense and verbose outputs from AI coding sessions easier to read, share, and archive by breaking conversations into paginated, annotated pages with navigable timelines of prompts and responses. Users can run this tool locally or fetch...
    Downloads: 5 This Week
  • 23
    Awesome Free ChatGPT

    List of free ChatGPT mirror sites, continuously updated

    This is a curated directory of freely accessible ChatGPT-style services and mirror sites that offer AI chatbot interfaces without login or payment requirements. Resources often support multiple models like GPT-4, Claude, Gemini, and more. Data collected from multiple independent sites with descriptions and tags. Includes services with image upload and drawing capabilities. Aggregates free, no-login-required ChatGPT-like web services. Continually updated mirror list to maintain availability.
    Downloads: 0 This Week
  • 24
    Playwright for .NET

    .NET version of the Playwright testing and automation library

    Playwright for .NET is the official language port of Playwright, the library to automate Chromium, Firefox and WebKit with a single API. Playwright is built to enable cross-browser web automation that is ever-green, capable, reliable and fast. Cross-browser. Playwright supports all modern rendering engines including Chromium, WebKit, and Firefox. Cross-platform. Test on Windows, Linux, and macOS, locally or on CI, headless or headed. Cross-language. Use the Playwright API in TypeScript, JavaScript, Python, .NET, Java. ...
    Downloads: 7 This Week
  • 25
    jQuery Terminal

    JavaScript library for creating web-based terminals

    jQuery Terminal is a JavaScript library for creating command-line interpreters in your applications. You can use it to create interactive web-based terminal applications on your website, with commands that you define yourself, either on the server or in the browser's JavaScript. It can automatically call a JSON-RPC service when the user types a command. Alternatively, you can provide an object with methods; each method will be invoked on the user's command (the python command can create a Python interpreter). ...
    Downloads: 5 This Week