Search Results for "web crawler source code" - Page 3

Showing 4125 open source projects for "web crawler source code"

  • 1
    fess

    Open source enterprise search server for websites, files, and data

    ...Fess includes a built-in crawler that can collect content from sources such as databases, CSV files, and shared storage, making it suitable for centralized knowledge discovery. It supports indexing and searching across many document formats including office documents, PDFs, and compressed archives. It also provides a web-based administrative interface that allows administrators to configure crawling targets, manage indexing tasks, and adjust search settings from a graphical dashboard.
    Downloads: 1 This Week
  • 2
    Pholcus

    Distributed high-concurrency crawler software written in pure golang

    Pholcus is a high-concurrency crawler written in pure Go that supports distributed operation and is intended only for programming study and research. It supports three operating modes (stand-alone, server, and client) and three interfaces (Web, GUI, and command line), with simple and flexible rules, concurrent batch tasks, and rich output methods (mysql/mongodb/kafka/csv/excel, etc.). It also supports horizontal and vertical grabbing modes, and a series of advanced...
    Downloads: 1 This Week
  • 3
    Python-Spider

    Python3 web crawler practice

    Python-Spider is a repository intended to teach or provide examples for writing web spiders / crawlers in Python — part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginners or intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe...
    Downloads: 0 This Week
  • 4
    Playwright Skill for Claude Code

    Claude Code Skill for browser automation with Playwright

    ...The system supports a wide range of use cases, including testing web applications, validating user interfaces, automating workflows, and extracting data from websites. One of its key advantages is its ability to generate custom Playwright code tailored to each request, allowing flexible and context-aware automation.
    Downloads: 2 This Week
  • 5
    Fluent UI Web

    Collection of utilities and components for building web applications

    A collection of UX frameworks for creating beautiful, cross-platform apps that share code, design, and interaction behavior. Build for one platform or for all. Everything you need is here. Build your own apps using the same open source components we do, with accessibility, internationalization, and performance included. From tutorials to a fun collection of API references, find what you need to design and develop your own Fluent experience.
    Downloads: 1 This Week
  • 6
    miniblink49

    Lighter, faster browser kernel of blink to integrate HTML UI in apps

    miniblink is an open source, single-file browser control based on Chromium, and currently the smallest known one. Through its exported pure C interface, a browser control can be created in a few lines of code. It can be called from C++, C#, Delphi, and other languages.
    Downloads: 8 This Week
  • 7
    Mongoose Embedded Web Server

    An embedded web server

    Mongoose is a networking library for C/C++. It implements event-driven, non-blocking APIs for TCP, UDP, HTTP, WebSocket, and MQTT. It is designed for connecting devices and bringing them online. On the market since 2004 and used by a vast number of open source and commercial products, it even runs on the International Space Station! Mongoose makes embedded network programming fast, robust, and easy. Cross-platform, works on Linux/UNIX, macOS, Windows, Android, FreeRTOS, etc. Supported embedded...
    Downloads: 8 This Week
  • 8
    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 0 This Week
  • 9
    autocrawler

    Multiprocess Selenium crawler for downloading images by keywords

    AutoCrawler is a Python-based image crawling tool designed to automatically download large numbers of images from search engines using automated browser interaction. It uses Selenium and a Chrome browser driver to navigate image search pages and collect image sources based on keywords provided by the user. AutoCrawler supports multiprocess and multithreaded downloading, which allows it to retrieve images faster by running several tasks simultaneously. Users provide search terms through a...
    Downloads: 2 This Week
  • 10
    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html). Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most useful resource URLs (pics, videos, audios, forms, etc.). Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted). Scan depth (limited by starting host and path, by default - 0) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for...
    Downloads: 8 This Week
  • 11
    Bot Framework Web Chat

    A highly-customizable web-based client for Azure Bot Services

    This repository contains code for the Bot Framework Web Chat component. The Bot Framework Web Chat component is a highly-customizable web-based client for the Bot Framework V4 SDK. The Bot Framework SDK v4 enables developers to model conversation and build sophisticated bot applications. This repo is part of the Microsoft Bot Framework, a comprehensive framework for building enterprise-grade conversational AI experiences.
    Downloads: 1 This Week
  • 12
    Agregore Browser

    A minimal browser for the distributed web (Desktop version)

    A minimal web browser for the distributed web. Web Extension support. Built-in Markdown/Gemini/JSON rendering extension. Built-in QR code scanner and generator extension: generate a QR code for the current page, scan a QR code from the browser action window, or right-click a link or image to generate a QR code for it. Built-in ad blocker (uBlock Origin). Built-in support for creating web archives via ArchiveWeb.page.
    Downloads: 29 This Week
  • 13
    mslearn-tailspin-spacegame-web

    Code used in Microsoft Learn modules to support Azure DevOps

    The Tailspin Space Game Web project is a sample application created by Microsoft as part of its learning resources. It’s a web-based game application used in Microsoft Learn modules and documentation to demonstrate concepts such as Azure App Services, continuous integration and delivery (CI/CD) pipelines, and DevOps practices with GitHub Actions and Azure Pipelines. The project is intentionally lightweight and easy to deploy so learners can quickly experiment with cloud deployment, testing,...
    Downloads: 1 This Week
  • 14
    ADAMANT Messenger Progressive Web App

    ADAMANT decentralized Messenger, progressive web application

    A messaging application client for the ADAMANT blockchain. ADAMANT is a decentralized anonymous messenger based on a blockchain system. It is independent of any government or corporation, and even of its developers, thanks to a distributed network infrastructure built on open-source code. The ADAMANT blockchain system belongs to its users. Nobody can control, block, deactivate, restrict or censor accounts. Users take full responsibility for their content, messages, media, and goals and...
    Downloads: 2 This Week
  • 15
    IntelliJ Community Edition

    IntelliJ IDEA & IntelliJ Platform

    IntelliJ Community is the open source upstream of JetBrains’ IntelliJ IDEA, forming the core of a powerful, extensible, and intelligent development environment. It provides foundational features like a robust editor with code completion, syntax highlighting, refactoring tools, version control integrations, terminal, debugger, and plugin architecture. Since it’s open, community developers can contribute to language supports, UI tweaks, and platform enhancements.
    Downloads: 1,847 This Week
  • 16
    PaSa

    An advanced paper search agent powered by large language models

    PaSa is an open-source “paper search agent” built around large language models (LLMs), designed to automate the process of academic literature retrieval with human-like decision making. Instead of simply translating a query into keywords and returning a flat list of matching papers, PaSa uses a dual-agent architecture (Crawler + Selector) that can iteratively search, read, analyze, and filter academic publications — simulating how a researcher might dig through citation networks, expand...
    Downloads: 0 This Week
  • 17
    Blockly

    The web-based visual programming editor

    The Blockly library adds an editor to your app that represents coding concepts as interlocking blocks. It outputs syntactically correct code in the programming language of your choice. Custom blocks may be created to connect to your own application. Blockly in a browser allows web pages to include a visual code editor for any of Blockly's five supported programming languages, or your own. Blockly plugins are self-contained pieces of code that add functionality to Blockly. Blockly codelabs...
    Downloads: 90 This Week
  • 18
    AWS Toolkit for Visual Studio Code

    Local Lambda debug, CodeWhisperer, SAM/CFN syntax, etc.

    The AWS Toolkit extension for Visual Studio Code enables you to interact with Amazon Web Services (AWS). Try the AWS Code Sample Catalog to start coding with the AWS SDK. The AWS Explorer provides access to the AWS services that you can work with when using the Toolkit. To see the AWS Explorer, choose the AWS icon in the Activity bar. The Developer Tools panel is a section for developer-focused tooling curated for working in an IDE. The Developer Tools panel can be found underneath the AWS...
    Downloads: 2 This Week
  • 19
    Sleeper OS Website (Open source)
    Welcome to the source code of the official "Sleeper OS" website! This repository houses the web content for Sleeper OS without including certain proprietary elements. What will you find? You'll discover (almost) all content, including documentation ("docs") and the underlying HTML, CSS, and JS code. Important note: this website uses Bootstrap as its primary framework. Contributors are encouraged to have prior knowledge of Bootstrap before contributing. ...
    Downloads: 0 This Week
  • 20
    DeerFlow

    Deep Research framework, combining language models with tools

    DeerFlow is an open-source, community-driven “deep research” framework / multi-agent orchestration platform developed by ByteDance. It aims to combine the reasoning power of large language models (LLMs) with automated tool-use — such as web search, web crawling, Python execution, and data processing — to enable complex, end-to-end research workflows. Instead of a monolithic AI assistant, DeerFlow defines multiple specialized agents (e.g.
    Downloads: 535 This Week
  • 21
    webrpc

    webrpc is a schema-driven approach to writing backend services

    webrpc is a schema-driven approach to writing backend servers for the Web. Write your server's API interface in a schema format of RIDL or JSON, and then run webrpc-gen to generate the networking source code for your server and client apps. From the schema, webrpc-gen will generate application-based class types/interfaces, JSON encoders, and networking code. In doing so, it's able to generate fully functioning and typed client libraries to communicate with your server. ...
    Downloads: 18 This Week
  • 22
    Roach

    The complete web scraping toolkit for PHP

    Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP. Roach doesn’t depend on a specific framework. Instead, you can use the core...
    Downloads: 2 This Week
  • 23
    Monaco Editor

    A browser based code editor

    Monaco Editor is the rich, browser-based code editor that powers Visual Studio Code, providing advanced editing capabilities as a standalone embeddable library for web applications. Models are at the heart of Monaco editor. It's what you interact with when managing content. A model represents a file that has been opened. This could represent a file that exists on a file system, but it doesn't have to. For example, the model holds the text content, determines the language of the content, and...
    Downloads: 15 This Week
  • 24
    FastAPI

    FastAPI framework, high performance, easy to learn, fast to code

    FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints. Great editor support. Completion everywhere. Less time debugging. Designed to be easy to use and learn. Less time reading docs. Minimize code duplication. Multiple features from each parameter declaration. Fewer bugs. Get production-ready code. With automatic interactive documentation. Based on (and fully compatible with) the open standards for APIs: OpenAPI...
    Downloads: 47 This Week
  • 25
    Next.js

    The React Framework

    Next.js is the React framework for lightweight apps, static websites, pre-rendered apps and more. It solves the most common problems associated with building a complete web application with React, such as those involving code bundling and transforming, production automizations, page rendering and having to write server-side code. Next.js offers a best in class “Developer Experience” through such capabilities as pre-rendering, single command static exporting, automatic code-splitting, hot...
    Downloads: 40 This Week