Showing 13 open source projects for "web scraping"

View related business solutions
  • Easy management of simple and complex projects Icon
    Easy management of simple and complex projects

    We help different businesses become digital, manage projects, teams, communicate effectively and control tasks online.

    Plan more projects with Worksection. Use Gantt chart and Kanban boards to organize your projects, get your team onboard and assign tasks and due dates.
    Learn More
  • Employees get more done with Rippling Icon
    Employees get more done with Rippling

    Streamline your business with an all-in-one platform for HR, IT, payroll, and spend management.

    Effortlessly manage the entire employee lifecycle, from hiring to benefits administration. Automate HR tasks, ensure compliance, and streamline approvals. Simplify IT with device management, software access, and compliance monitoring, all from one dashboard. Enjoy timely payroll, real-time financial visibility, and dynamic spend policies. Rippling empowers your business to save time, reduce costs, and enhance efficiency, allowing you to focus on growth. Experience the power of unified management with Rippling today.
    Learn More
  • 1
    spider_collection

    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    crawler

    crawler

    Collection of JS reverse engineering examples for web scraping study

    crawler is a collection of web scraping and JavaScript reverse engineering examples designed for learning how modern websites protect their data and how those protections can be analyzed. It contains many case studies that demonstrate how to analyze and replicate request parameters, cookies, and encryption logic used by real websites. Each directory in the project focuses on a specific target service or scenario, showing how browser network requests and JavaScript code can be studied to reproduce API calls programmatically. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    DotnetSpider

    DotnetSpider

    Lightweight .NET framework for fast web crawling and data scraping

    DotnetSpider is a web crawling and data extraction framework built on the .NET Standard platform. It is designed to help developers create efficient and scalable crawlers for collecting structured data from websites. It provides a high-level API that simplifies the process of defining spiders, managing requests, and extracting content from web pages. Developers can create custom spiders by extending base classes and configuring pipelines that handle downloading, parsing, and storing collected data. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    UI.Vision RPA

    UI.Vision RPA

    Open-Source RPA Software (formerly Kantu)

    The UI Vision RPA software is the tool for visual process automation, codeless UI test automation, web scraping and screen scraping. Automate tasks on Windows, Mac and Linux. The UI Vision RPA core is open-source with enterprise security. The free and open-source browser extension can be extended with local apps for desktop UI automation. UI.Vision RPA's computer-vision visual UI testing commands allow you to write automated visual tests with UI.Vision RPA - this makes UI.Vision RPA the first and only Chrome and Firefox extension (and Selenium IDE) that has "👁👁 eyes". ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • Silverware is an enterprise-grade hospitality platform built for hotels, resorts, and complex multi-venue operations. Icon
    Silverware is an enterprise-grade hospitality platform built for hotels, resorts, and complex multi-venue operations.

    Silverware powers high-end hospitality environments

    Silverware is built for hotel, resort, and multi-venue hospitality operators who need enterprise-grade control, deep integrations, and always-on reliability to run complex operations at scale.
    Learn More
  • 5
    BrowserBox

    BrowserBox

    Remote isolated browser API for security

    Remote isolated browser API for security, automation visibility and interactivity. Run-on our cloud, or bring your own. Full scope double reverse web proxy with a multi-tab, mobile-ready browser UI frontend. Plus co-browsing, advanced adaptive streaming, secure document viewing and more! But only in the Pro version. BrowserBox is a full-stack component for a web browser that runs on a remote server, with a UI you can embed on the web. BrowserBox lets your provide controllable access to web...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 6
    douyin

    douyin

    Open source Douyin crawler for collecting and downloading public data

    DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It also integrates with the Aria2 download utility to enable large-scale downloading of videos and images associated with collected content. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 7
    newpipeextractor

    newpipeextractor

    Library for extracting streaming site data without official APIs

    ...It handles many low-level tasks involved in web data extraction, including parsing responses, managing platform-specific logic, and handling errors, allowing developers to focus on implementing application features rather than scraping mechanics. Each supported service is implemented through its own extractor components that conform to a common interface, enabling consistent access to data across different platforms.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    ScrapBot 1.40 64bits

    ScrapBot 1.40 64bits

    Task automation software for accessing and manipulating website data.

    ScrapBot is a task automation software that allows you to access, authenticate, extract, and insert data on any website. The software utilizes JavaScript to execute tasks, eliminating the need for server or additional software installations. The system can control the accessed webpage through JavaScript, and the entire navigation can be viewed in the program window. The main.js script runs in a separate frame from the navigation frame but can access all page content without any restrictions.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Tholian Stealth

    Tholian Stealth

    Secure, Peer-to-Peer, Private and Automateable Web Browser

    Tholian Stealth is an open-source privacy-focused web browser and automation platform designed to combine secure browsing, web scraping, and proxy functionality into a unified system. It aims to prioritize user privacy and autonomy by minimizing tracking, blocking unnecessary requests, and restricting potentially harmful web technologies such as JavaScript execution. The platform operates as both a browser and a network service, capable of acting as a proxy, scraper, and content filtering system for other applications. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Secure Online Fax and Business Text Messaging Service Icon
    Secure Online Fax and Business Text Messaging Service

    Elevate your business communications with secure SMS and fax solutions.

    Send and receive SMS and fax online, from email, app or with our developer friendly SMS & fax API. HIPAA compliant & ISO 27001 certified. Outstanding value and 5-star service.
    Learn More
  • 10
    serverless-chrome

    serverless-chrome

    Run headless Chrome/Chromium on AWS Lambda

    ...Serverless Chrome takes care of building and bundling the Chrome binaries and making sure Chrome is running when your serverless function executes. In addition, this project also provides a few example services for common patterns (e.g. taking a screenshot of a page, printing to PDF, some scraping, etc.). Why? Because it's neat. It also opens up interesting possibilities for using the Chrome DevTools Protocol (and tools like Chromeless or Puppeteer) in serverless architectures and doing testing/CI, web-scraping, pre-rendering, etc. You must configure your AWS credentials either by defining AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environmental variables, or using an AWS profile.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    X-RAY

    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    pxer

    pxer

    Pixiv crawler userscript for downloading artwork and galleries easily

    Pxer is an open source tool designed to help users collect and download artwork from the Pixiv illustration platform. It is implemented primarily in client-side JavaScript and runs directly in the browser through a userscript environment, allowing it to integrate seamlessly with Pixiv pages. Pxer provides functionality to crawl and gather images, artwork metadata, and other related content from supported Pixiv pages. It is designed to be accessible even for users who are not developers,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 13
    datalus
    PHP web API designed to simplify object handling(loading, saving, querying, displaying, and editing), abstract the data from its display structure, and layout and allow the target data to be delivered to any supported format without special logic.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB