Showing 24 open source projects for "python web crawler"

  • 1
    katana

    Fast CLI web crawler for discovering endpoints in modern web apps

    Katana is an open source command-line web crawling and spidering framework developed by ProjectDiscovery. It is designed to efficiently crawl websites and web applications in order to discover endpoints, resources, and other useful information that may not be easily visible through manual browsing. Katana focuses on speed and automation, making it suitable for use in security reconnaissance workflows and automated pipelines. Katana supports both standard HTTP crawling and headless browser...
    Downloads: 21 This Week
  • 2
    Pholcus

    Distributed high-concurrency crawler software written in pure golang

    Pholcus is a high-concurrency crawler written in pure Go that supports distributed operation; it is intended only for programming study and research. It supports three operating modes (stand-alone, server, and client) and three interfaces (Web, GUI, and command line), with simple and flexible rules, concurrent batch tasks, and rich output methods (mysql/mongodb/kafka/csv/excel, etc.). It also supports horizontal and vertical crawling modes, along with advanced functions such as simulated login and task suspension and cancellation. ...
    Downloads: 2 This Week
  • 3
    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html). Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most useful resource URLs (pics, videos, audios, forms, etc.). Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted). Scan depth (limited by starting host and path; 0 by default) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for...
    Downloads: 10 This Week
  • 4
    Scope Sentry

    Cyberspace asset mapping and vulnerability scanning platform

    ScopeSentry is an open source cybersecurity tool designed for cyberspace asset mapping and automated security analysis. It helps security researchers and penetration testers discover, monitor, and analyze internet-facing assets belonging to a target scope. ScopeSentry combines multiple reconnaissance and vulnerability assessment capabilities such as subdomain enumeration, port scanning, directory scanning, and sensitive information detection. ScopeSentry can automatically identify assets and...
    Downloads: 3 This Week
  • 5
    Grafana

    Leading open-source visualization and observability platform

    Grafana OSS is the leading open-source platform for visualization and observability. It enables teams to query, visualize, alert on, and explore telemetry data from multiple sources in a single interface. With support for 100+ data source plugins—including Prometheus, Loki, Elasticsearch, InfluxDB, SQL/NoSQL databases, and OpenTelemetry—Grafana helps teams correlate metrics, logs, and traces across applications and infrastructure. Users can build interactive dashboards with rich...
    Downloads: 26 This Week
  • 6
    Gobuster

    Directory/File, DNS and VHost busting tool written in Go

    Gobuster is a tool used to brute-force. This project was born out of the necessity to have something that didn't have a fat Java GUI (console FTW), something that did not do recursive brute force, something that allowed me to brute force folders and multiple extensions at once, something that compiled to native on multiple platforms, something that was faster than an interpreted script (such as Python), and something that didn't require a runtime. Provides several modes, like the classic...
    Downloads: 39 This Week
  • 7
    WhatsApp MCP Server

    WhatsApp MCP server enabling AI access to chats and messaging

    ...It acts as a bridge between WhatsApp and large language models, allowing controlled access to messages, chats, and contacts. whatsapp-mcp is composed of two main components: a Go-based bridge that connects to the WhatsApp Web API and stores data locally, and a Python-based MCP server that exposes tools for AI interaction. All message data is stored in a local SQLite database and is only accessed when explicitly requested through defined tools, giving users control over how their data is used. It supports both sending and receiving messages, including various media types such as images, audio, videos, and documents. ...
    Downloads: 2 This Week
  • 8
    WeKnora

    LLM framework for document understanding and semantic retrieval

    WeKnora is an open source framework developed for deep document understanding and semantic information retrieval using large language models. It focuses on analyzing complex and heterogeneous documents by combining multiple processing stages such as multimodal document parsing, vector indexing, and intelligent retrieval. It follows the Retrieval-Augmented Generation (RAG) paradigm, where relevant document segments are retrieved and used by language models to generate accurate, context-aware...
    Downloads: 4 This Week
  • 9
    Vibium

    Browser automation for AI agents and humans

    ...It integrates a single lightweight binary that manages browser lifecycle, implements a WebDriver BiDi proxy, and exposes a Model Context Protocol (MCP) server so language models or automation clients can control browser behavior without complex setup. This design makes it ideal for AI agents that need to interact with the web, perform tasks, or simulate human interactions in a browser environment, and it also works well for traditional testing and automation workflows. Vibium strikes a balance between AI-native capabilities and conventional developer usability by offering language bindings and client APIs for JavaScript and Python.
    Downloads: 2 This Week
  • 10
    OSV.dev

    Open source vulnerability DB and triage service

    osv.dev (Open Source Vulnerabilities) is Google’s open source platform and API for aggregating, managing, and analyzing vulnerability data across multiple ecosystems. It powers the osv.dev website, providing a unified, queryable database of vulnerabilities that map directly to open source packages and versions. The system hosts vulnerability data for ecosystems such as PyPI, npm, Go, Maven, and Debian, among others. The platform includes a web UI, API, and a Go-based dependency scanner...
    Downloads: 3 This Week
  • 11
    Klavis AI

    MCP integration platforms for AI agents to use tools at any scale

    Klavis AI is a Y Combinator X25-backed open-source infrastructure platform that enables AI agents to reliably connect with external tools and services at scale through Model Context Protocol (MCP). Founded by ex-Google DeepMind and ex-Lyft engineers, Klavis provides 50+ production-ready MCP servers with enterprise OAuth support for GitHub, Slack, Gmail, Salesforce, Linear, Notion, and more. The flagship product Strata solves tool overload through progressive discovery, achieving +13% higher...
    Downloads: 0 This Week
  • 12
    PC-Gui

    Lightweight PC-Gui framework for AI, typewriter stream Gemini-like

    ...PC-GUI helps you meet strong market demands by building compact, powerful, commercial-grade applications with a simple and stable tech stack. We adopt a "backend-first approach" to desktop development: a stable Go backend (net/http) powers a standard web frontend (HTML/CSS/JS), coupled with encrypted SQLite storage for an extremely lightweight and high-performance design. Key Advantages: ✅ Zero runtime dependencies—a single Go binary, no WebView2/Python/Node.js installations required. ✅ Modern UI via HTML—fast templating, AI-friendly styling. ✅ Simple async streaming for AI output vs. complex callbacks elsewhere. ...
    Downloads: 1 This Week
  • 13
    Crawlab

    Distributed web crawler admin platform for spiders management

    Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, and PHP, and various web crawler frameworks including Scrapy, Puppeteer, and Selenium. Use docker-compose for a one-click start-up; that way, you don't even have to configure a MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. ...
    Downloads: 5 This Week
  • 14
    chatgpt-web

    Privatized web program based on ChatGPT3.5 API

    ...There are more than 20 parameter examples in the document, such as AI chatbot, product name generation, python code fixer, etc.
    Downloads: 2 This Week
  • 15
    crawlergo

    Headless Chrome crawler for collecting URLs for vulnerability scans

    crawlergo is a browser-based web crawler designed to collect URLs and request data that can be used by web vulnerability scanning tools. It uses a Chrome headless environment to render web pages and observe behavior during the DOM rendering stage in order to capture as many accessible endpoints as possible. By monitoring the page lifecycle and interacting with web elements, the crawler automatically triggers JavaScript events and navigational actions that would normally occur during real user interaction. ...
    Downloads: 0 This Week
  • 16
    PhoenixC2

    Command & Control-Framework created for collaboration in python3

    PhoenixC2 is a command & control framework. The purpose of this software is to aid red teamers and penetration testers in their operations by providing a way to manage hacked devices.
    Downloads: 0 This Week
  • 17
    Hakrawler

    Fast Go web crawler for discovering URLs and web app endpoints

    hakrawler is a lightweight command-line web crawler built in Go that is designed to quickly discover URLs, endpoints, and assets within web applications. It is primarily used during the reconnaissance phase of security testing, bug bounty hunting, and penetration testing. It works by automatically crawling web pages and extracting links, JavaScript file locations, and other resources that may reveal additional attack surface or hidden functionality. hakrawler is implemented as a simple and efficient crawler using the Gocolly library, which allows it to perform fast and concurrent crawling of web pages. ...
    Downloads: 0 This Week
  • 18
    Glazier

    A tool for automating the installation of Windows OS

    ...Its extensibility makes it easy for administrators to create custom actions using Python or PowerShell, enabling tailored automation for diverse enterprise environments. Designed for engineers, Glazier emphasizes repeatability, maintainability, and transparency in Windows.
    Downloads: 0 This Week
  • 19
    gocrawl

    Polite concurrent web crawler library for Go with flexible hooks

    gocrawl is a lightweight web crawling library written in the Go programming language that enables developers to build custom web crawlers and data extraction tools. gocrawl focuses on providing a minimal yet powerful crawling engine that can be easily extended and adapted for different web scraping or indexing tasks. It is designed to be polite when accessing websites by respecting crawling rules such as robots.txt policies and applying crawl delays for each host. It executes requests...
    Downloads: 0 This Week
  • 20
    proxypool

    Proxy crawler that aggregates, tests, and serves usable proxy nodes

    ...The behavior of the crawler and the sources it scans can be configured through configuration files, enabling users to customize how nodes are gathered and maintained. It also supports scheduled crawling to continuously update the proxy list and keep the pool current with newly discovered nodes.
    Downloads: 11 This Week
  • 21
    Rendora

    Dynamic server-side rendering using headless Chrome

    Rendora is a dynamic renderer that provides zero-configuration server-side rendering, mainly to web crawlers, in order to effortlessly improve SEO for websites developed in modern JavaScript frameworks such as React.js, Vue.js, Angular.js, etc. Rendora works totally independently of your frontend and backend stacks. Rendora can be seen as a reverse HTTP proxy server sitting between your backend server (e.g. Node.js/Express.js, Python/Django, etc.) and potentially your frontend proxy server (e.g. nginx, traefik, apache, etc.), or even directly facing the outside world, that actually does nothing but transport requests and responses as they are, except when it detects whitelisted requests according to the config. ...
    Downloads: 0 This Week
  • 22
    NTK RTMP SERVER

    Naam Tamilar Web TV Live Streamer

    Naam Tamilar RTMP Server. This project has been released as open source for future use by the Naam Tamilar political party, as a contribution to the party and in case I cannot support them in the long term. I thought of sharing this source code so that in future it may be helpful to the community and the party, and so that other software developers can help them upgrade it. This source is forked from https://github.com/arut/nginx-rtmp-module and modified with multiple broadcast...
    Downloads: 1 This Week
  • 23
    alpaca

    Given a web API, generate client libraries in node, php, python, etc.

    API libraries powered and created by Alpaca. Tired of maintaining API libraries in different languages for your website API? This is for you. Do you have an API for your website but no API libraries for whatever reason? This is for you. Are you planning to build an API for your website and develop API libraries? This is for you. You define your API according to the format given below, and alpaca builds the API libraries along with their documentation. All you have to do is publish them to...
    Downloads: 0 This Week
  • 24
    turkdevops.github.io

    Turkish Developer Operations SourceForge

    "TurkDevOps SourForce" Greetings to everyone. We thank "SourceForge" for supporting our organization and helping us, in recognition of our interest in and contributions to open source for developer teams. Please subscribe for updates, and always stay safe. We are open to exchanging all kinds of ideas and comments through the mailing lists and the discussion forum. In our communities, our hospitable solidarity is met with respect and tolerance; in a professional environment, conduct...
    Downloads: 0 This Week