Showing 12 open source projects for "web scraping"

  • 1
    newpipeextractor

    Library for extracting streaming site data without official APIs

    ...It handles many low-level tasks involved in web data extraction, including parsing responses, managing platform-specific logic, and handling errors, allowing developers to focus on implementing application features rather than scraping mechanics. Each supported service is implemented through its own extractor components that conform to a common interface, enabling consistent access to data across different platforms.
    Downloads: 1 This Week
  • 2
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text-processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 3 This Week
  • 3
    lxspider

    Educational Python web scraping case collection for many sites

    lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms, social media services, content sites, research databases, and information portals. ...
    Downloads: 0 This Week
  • 4
    jBrowserDriver

    A programmable, embeddable web browser driver

    jBrowserDriver is a programmable, embeddable web browser driver compatible with the Selenium WebDriver specification, implemented in pure Java and based on WebKit.
    Downloads: 0 This Week
  • 5
    GitGet

    Ever wanted to download only a part of a Git repository?

    Ever wanted to download only a part of a Git repository? Just paste the URL of the repo you want to download, then sit back and enjoy. This simple Java application uses web scraping to download only the files you need, saving bandwidth and disk space.
    Downloads: 0 This Week
  • 6
    Gecco

    Lightweight Java web crawler framework with jQuery-style extraction

    Gecco is a lightweight web crawler framework written in Java that simplifies the process of building web scraping applications. It is designed to make crawler development straightforward by allowing developers to extract page elements using jQuery-style selectors rather than complex parsing logic. It integrates several well-known Java libraries and frameworks, including tools for HTTP requests, HTML parsing, JSON processing, and application development.
    Downloads: 0 This Week
  • 7
    DSTK - DataScience ToolKit

    DSTK - DataScience ToolKit for All of Us

    DSTK - DataScience ToolKit is free, open-source software for statistical analysis, data visualization, text analysis, and predictive analytics. A newer version with a smaller file size can be found at: https://sourceforge.net/projects/dstk3/ It is designed to be straightforward and easy to use, and familiar to SPSS users. While JASP offers more statistical features, DSTK aims to be a broad workbench that also includes text analysis and predictive analytics features. Of course you may specify...
    Downloads: 1 This Week
  • 8
    Simple-Scrape is a simple web-scraping library that allows for programmatic access to HTML code. No further techniques are needed and the library is very compact and thus easy to use.
    Downloads: 0 This Week
  • 9
    ...webStraktor relies on the Apache HttpClient for retrieving content via the HTTP protocol. It adheres to the Robots Exclusion Protocol and it can be configured to operate in an anonymous way by connecting to the predominant types of web proxy servers. webStraktor extends the functionality of web crawlers, spiders or bots by integrating scraping and crawling capabilities.
    Downloads: 0 This Week
  • 10
    Aracnis is a Java-based framework for building distributed web spiders. These spiders can be used for a variety of tasks, such as screen scraping and link integrity checking.
    Downloads: 0 This Week
  • 11
    Switchboard is a conceptual-level interface to many web- and network-related functions (SOAP, REST, XML parsing, screen scraping, FTP, network sniffing), designed for the Processing environment.
    Downloads: 0 This Week
  • 12
    Jice is a Java-based screen scraping and parsing utility used by developers to extract specific content from any document publicly accessible via the HTTP protocol.
    Downloads: 0 This Week
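
Entry 9 above mentions adherence to the Robots Exclusion Protocol. As a rough illustration of what such a check involves, here is a minimal sketch using Python's standard urllib.robotparser module. It is not webStraktor's own code (which is Java and not shown in this listing); the helper name is_allowed, the sample rules, the user agent "example-bot", and the example.com URLs are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url.

    Hypothetical helper for illustration only; a real crawler would usually
    download robots.txt from the target site before calling this.
    """
    parser = RobotFileParser()
    # parse() accepts the robots.txt content as an iterable of lines,
    # so no network access is needed for this sketch.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Sample rules (an assumption): every agent is barred from /private/.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for path in ("/index.html", "/private/data.html"):
        url = "https://example.com" + path
        print(url, is_allowed(SAMPLE_ROBOTS, "example-bot", url))
```

A scraper would typically run a check like this before each request and skip any URL for which it returns False.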