Showing 125 open source projects for "data processing"

  • 1
    fluentbit

    Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX

    Fluent Bit is a super-fast, lightweight, and highly scalable logging and metrics processor and forwarder. It is the preferred choice for cloud and containerized environments. A robust, lightweight, and portable architecture for high throughput with low CPU and memory usage from any data source to any destination. Proven across distributed cloud and container environments. Highly available with I/O handlers to store data for disaster recovery. Granular management of data parsing and routing....
    Downloads: 12 This Week
  • 2
    Acl

    A powerful server and network library, including coroutines

    The Acl (Advanced C/C++ Library) project is a powerful multi-platform network communication library and service framework, supporting LINUX, WIN32, Solaris, FreeBSD, MacOS, AndroidOS, iOS. Many applications written with Acl run on Linux, Windows, iPhone and Android devices and serve billions of users. The Acl project includes several important modules, including network communication, server framework, application protocols, multiple coders, etc. The common protocols such as...
    Downloads: 14 This Week
  • 3
    Jimp

    An image processing library written entirely in JavaScript for Node

    An image processing library for Node written entirely in JavaScript, with zero native dependencies. If you're using this library with TypeScript, the method of importing differs slightly from JavaScript. Instead of using require, you must import it with the ES6 default import scheme. If you're using a web bundler (webpack, rollup, parcel), you can benefit from using the module build of jimp. Using the module build will allow your bundler to understand your code better and exclude things you aren't...
    Downloads: 5 This Week
  • 4
    Tesla

    The flexible HTTP client library for Elixir

    The flexible HTTP client library for Elixir, with support for middleware and multiple adapters. Tesla is an HTTP client loosely based on Faraday. It embraces the concept of middleware when processing the request/response cycle. Define a module with use Tesla and choose from a variety of middleware. Tesla is built around the concept of composable middleware. This is very similar to how Plug Router works. All HTTP functions, such as Tesla.get/3 and Tesla.post/4, can take a dynamic client as the...
    Downloads: 1 This Week
  • 5
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    ...In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.
    Downloads: 2 This Week
  • 6
    Google Mobile Ads Unity Plugin

    Unity Plugin for the Google Mobile Ads SDK

    ...The plugin provides a C# interface for requesting ads that is used by C# scripts in your Unity project. You can help improve the Google Mobile Ads Unity plugin by opting in to sending usage data to Google. The data collected is general information about how you are using the plugin (such as ad unit creation and processing errors).
    Downloads: 7 This Week
  • 7
    geckodriver

    WebDriver for Firefox

    geckodriver is an implementation of WebDriver, and WebDriver can be used for widely different purposes. How you invoke geckodriver largely depends on your use case. If you are using geckodriver through Selenium, you must ensure that you have version 3.11 or greater. Because geckodriver implements the W3C WebDriver standard and not the same Selenium wire protocol older drivers are using, you may experience incompatibilities and migration problems when making the switch from FirefoxDriver to...
    Downloads: 75 This Week
  • 8
    Qualitis

    Qualitis is a one-stop data quality management platform

    Qualitis is a data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve various data quality problems caused by data processing. Based on Spring Boot, Qualitis submits quality model tasks to the Linkis platform. It provides functions such as data quality model construction, data quality model execution, data quality verification, data quality report generation, and so on. ...
    Downloads: 0 This Week
  • 9
    Python API for JMComic

    Python crawler and API for downloading JMComic albums and images

    ...It provides a structured API that allows developers to retrieve albums, chapters, and images using simple Python code while handling the necessary network requests and data processing behind the scenes. It supports both web-based and mobile API interfaces, enabling flexible interaction with the platform depending on the available endpoints. Its architecture includes components for configuration management, download orchestration, and client communication, allowing users to automate the retrieval of manga chapters or entire albums. ...
    Downloads: 3 This Week
  • 10
    douyin

    Open source Douyin crawler for collecting and downloading public data

    DouyinCrawler is an open source data collection tool designed to gather publicly available information from the Douyin platform. It demonstrates how to build a Python-based web crawler combined with a graphical interface and command line functionality. It allows users to collect data from various types of Douyin content, including user profiles, videos, hashtags, and music pages. DouyinCrawler supports both automated scraping and batch operations to process multiple targets efficiently. It...
    Downloads: 6 This Week
  • 11
    syslog-ng

    Log management solution that improves the performance of SIEM

    syslog-ng is the log management solution that improves the performance of your SIEM solution by reducing the amount and improving the quality of data feeding your SIEM. With syslog-ng Store Box, you can find the answer. Search billions of logs in seconds using full text queries with Boolean operators to pinpoint critical logs. syslog-ng Store Box provides secure, tamper-proof storage and custom reporting to demonstrate compliance. syslog-ng can deliver data from a wide variety of sources to...
    Downloads: 12 This Week
  • 12
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website....
    Downloads: 4 This Week
  • 13
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    Spider is a high-performance web crawler and web scraping library written in Rust that enables developers to crawl and index websites efficiently. It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. ...
    Downloads: 6 This Week
  • 14
    MDCx

    Movie metadata scraper and organizer for media libraries and NFO

    ...It retrieves metadata from multiple online sources and applies it to local media collections, helping users maintain structured and well-organized libraries. MDCx can download information such as titles, cast data, artwork, and other metadata, then generate standardized NFO files compatible with media management systems. It also supports image processing tasks such as downloading and cropping artwork used by media centers. It includes several interfaces, allowing users to operate it through a graphical desktop application, a browser-based web interface, or command-line utilities depending on their workflow. ...
    Downloads: 10 This Week
  • 15
    Trafilatura

    Python & command-line tool to gather text on the Web

    Trafilatura is a Python package and command-line tool designed to gather text on the Web. It includes discovery, extraction and text-processing components. Its main applications are web crawling, downloads, scraping, and extraction of main texts, metadata and comments. It aims at staying handy and modular: no database is required, the output can be converted to various commonly used formats. Going from raw HTML to essential parts can alleviate many problems related to text quality, first by avoiding the noise caused by recurring elements (headers, footers, links/blogroll etc.) and second by including information such as author and date in order to make sense of the data. ...
    Downloads: 0 This Week
  • 16
    Shelf

    Web server middleware for Dart

    ...Map server logic into a simple function: a single argument for the request, the response is the return value. Trivially mix and match synchronous and asynchronous processing. Flexibility to return a simple string or a byte stream with the same model. An adapter must handle all errors from the handler, including the handler returning a null response. It should print each error to the console if possible, then act as though the handler returned a 500 response. The adapter may include body data for the 500 response, but this body data must not include information about the error that occurred. ...
    Downloads: 0 This Week
  • 17
    BrowserOS

    Agentic browser; privacy-first alternative to ChatGPT Atlas

    BrowserOS is an open-source, agentic web browser built on a Chromium base that integrates AI agents directly into the browsing experience. Rather than offering only standard browsing, it places AI at the core: you can connect your own API keys (e.g., OpenAI, Anthropic, Google Gemini) or run local models (e.g., via Ollama) so that your browsing data and automation stay on your machine; privacy and control are emphasized throughout. The interface remains familiar to users of...
    Downloads: 16 This Week
  • 18
    diskover-community

    Open source file indexing & storage analytics powered by Elasticsearch

    Diskover Community Edition is an open source file system indexing and storage analytics platform designed to help organizations understand and manage large volumes of file data. It crawls file systems and indexes metadata using Elasticsearch, enabling fast search, analysis, and organization of files stored across different storage systems. It allows administrators and users to explore file structures, monitor storage usage, and gain insights into how data is distributed across...
    Downloads: 1 This Week
  • 19
    RESTinio

    HTTP/WebSocket server C++14 library

    ...Async request handling. Cannot get the response data immediately? That's ok, store the request handle somewhere and/or pass it to another execution context and get back to it when the data is ready.
    Downloads: 7 This Week
  • 20
    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. QueryList supports common data...
    Downloads: 0 This Week
  • 21
    Snoop Project

    This is the most powerful software taking into account CIS location

    ...Snoop is a research work (own database / closed bugbounty) in the field of searching and processing public data on the Internet. In terms of specialized search, Snoop is able to compete with traditional search engines.
    Downloads: 0 This Week
  • 22
    Tweepy

    Twitter for Python

    An easy-to-use Python library for accessing the Twitter API. You can also use Git to clone the repository from GitHub to install the latest development version. The easiest way to install the latest version from PyPI is by using pip. Twitter requires all requests to use OAuth for authentication. The API class provides access to all of the Twitter RESTful API methods. Each method can accept various parameters and return responses. When we invoke an API method most of the time returned back to...
    Downloads: 2 This Week
  • 23
    Aimeos headless distribution

    Aimeos cloud-native, API-first ecommerce headless distribution

    Aimeos Headless is an open-source headless eCommerce distribution built on top of the Laravel framework, designed to provide a fast and scalable API-driven commerce backend. The project exposes a comprehensive REST and GraphQL API that allows developers to build custom storefronts or commerce applications using any frontend technology. Because the platform follows a headless architecture, it separates the commerce logic from the presentation layer, enabling developers to build web, mobile,...
    Downloads: 4 This Week
  • 24
    news-please

    Python tool for crawling and extracting structured data from news site

    ...It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a site. It combines several established technologies and libraries to perform web crawling and content extraction, enabling reliable processing across a wide range of news sources. Developers can use the software either as a standalone command line application or integrate it into their own Python applications through its library interface. Extracted article data can be stored in different formats and systems, including JSON files or database-backed storage solutions.
    Downloads: 0 This Week
  • 25
    SingleFile

    Web Extension for saving a copy of a complete web page in a single file

    Web Extension for Firefox/Chrome/MS Edge and CLI tool to save a faithful copy of an entire web page in a single HTML file. SingleFile is a Web Extension (and a CLI tool) compatible with Chrome, Firefox (Desktop and Mobile), Microsoft Edge, Vivaldi, Brave, Waterfox, Yandex Browser, and Opera. It helps you to save a complete web page into a single HTML file. Wait until the page is fully loaded. Click on the SingleFile button in the extension toolbar to save the page. You can click again on the...
    Downloads: 13 This Week