Search Results for "data processing" - Page 2

Showing 1673 open source projects for "data processing"

View related business solutions
  • ERP Software To Simplify Your Manufacturing Icon
    ERP Software To Simplify Your Manufacturing

    From quote to cash and with AI in mind, our ERP software will become the most valuable asset at your company.

    Global Shop Solutions AI-integrated ERP software provides the applications needed to deliver a quality part on time, every time from quote to cash and everything in between, including shop management, scheduling, inventory, accounting, quality control, CRM and 25 more.
    Learn More
  • A warehouse and inventory management software that scales with your business. Icon
    A warehouse and inventory management software that scales with your business.

    For leading 3PLs and high-volume brands searching for an advanced WMS

    Logiwa is a leader in cloud-native fulfillment technology, revolutionizing high-volume fulfillment for third-party logistics (3PLs), B2B and B2C fulfillment networks, and direct-to-consumer brands. Our flagship product, Logiwa IO, is an advanced Fulfillment Management System (FMS) designed to scale operations in the digital era. Logiwa elevates digital warehousing to new heights, ensuring dynamic and efficient fulfillment processes. Our commitment to AI-driven technology, combined with a focus on customer-centricity, equips businesses to adeptly navigate and excel in rapidly changing market landscapes. Discover the future of smart fulfillment and how you can fulfill brilliantly with Logiwa IO.
    Learn More
  • 1
    fluentbit

    fluentbit

    Fast and Lightweight Logs and Metrics processor for Linux, BSD, OSX

    Fluent Bit is a super-fast, lightweight, and highly scalable logging and metrics processor and forwarder. It is the preferred choice for cloud and containerized environments. A robust, lightweight, and portable architecture for high throughput with low CPU and memory usage from any data source to any destination. Proven across distributed cloud and container environments. Highly available with I/O handlers to store data for disaster recovery. Granular management of data parsing and routing....
    Downloads: 10 This Week
    Last Update:
    See Project
  • 2
    GLM

    GLM

    OpenGL Mathematics (GLM)

    ...This project isn't limited to GLSL features. An extension system, based on the GLSL extension conventions, provides extended capabilities: matrix transformations, quaternions, data packing, random numbers, noise, etc. This library works perfectly with OpenGL but it also ensures interoperability with other third party libraries and SDK. It is a good candidate for software rendering (raytracing / rasterisation), image processing, physics simulations and any development context that requires a simple and convenient mathematics library. ...
    Downloads: 81 This Week
    Last Update:
    See Project
  • 3
    Pathway

    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM

    ...Unlike traditional batch processing frameworks, Pathway continuously updates the results of your data logic as new events arrive, functioning more like a database that reacts in real-time. It supports Python, integrates with modern data tools, and offers a deterministic dataflow model to ensure reproducibility and correctness.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Awesome Fraud Detection Research Papers

    Awesome Fraud Detection Research Papers

    A curated list of data mining papers about fraud detection

    A curated list of data mining papers about fraud detection from several conferences.
    Downloads: 0 This Week
    Last Update:
    See Project
  • PeerGFS PEER Software - File Sharing and Collaboration Icon
    PeerGFS PEER Software - File Sharing and Collaboration

    One Solution to Simplify File Management and Orchestration Across Edge, Data Center, and Cloud Storage

    PeerGFS is a software-only solution developed to solve file management/file replication challenges in multi-site, multi-platform, and hybrid multi-cloud environments.
    Learn More
  • 5
    ThingsBoard

    ThingsBoard

    Device management, data collection, processing and visualization

    ...Define relations between your devices, assets, customers or any other entities. Collect and store telemetry data in a scalable and fault-tolerant way. Visualize your data with built-in or custom widgets and flexible dashboards. Share dashboards with your customers. Define data processing rule chains. Transform and normalize your device data. Raise alarms on incoming telemetry events, attribute updates, device inactivity, and user actions.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    Pachyderm

    Pachyderm

    Data-Centric Pipelines and Data Versioning

    ...Pachyderm provides a powerful solution to optimize data processing, MLOps, and ML Lifecycles.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Siddhi Core Libraries

    Siddhi Core Libraries

    Stream Processing and Complex Event Processing Engine

    Fully open source, cloud-native, scalable, micro streaming, and complex event processing system capable of building event-driven applications for use cases such as real-time analytics, data integration, notification management, and adaptive decision-making. Event processing logic can be written using Streaming SQL queries via graphical and source editors, to capture events from diverse data sources, process and analyze them, integrate with multiple services and data stores, and publish output to various endpoints in real time. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    The Grand Complete Data Science Guide

    The Grand Complete Data Science Guide

    Data Science Guide With Videos And Materials

    The Grand Complete Data Science Materials is a repository curated by a data-science educator that aggregates a wide range of learning resources — from basic programming and math foundation to advanced topics in machine learning, deep learning, natural language processing, computer vision, and deployment practices — into a structured, centralized collection aimed at learners seeking a comprehensive path to data science mastery.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    TeXworks

    TeXworks

    A simple interface for working with TeX documents

    TeXworks is a free and simple working environment for authoring TeX (LaTeX, ConTeXt and XeTeX) documents. Inspired by Dick Koch's award-winning TeXShop program for Mac OS X, it makes entry into the TeX world easier for those using desktop operating systems other than OS X. It provides an integrated, easy-to-use environment for users on other platforms particularly GNU/Linux and Windows and features a clean, simple interface accessible to casual and non-technical users.
    Downloads: 107 This Week
    Last Update:
    See Project
  • Planfix: Manage Projects, Team's Tasks and Business Processes Icon
    Planfix: Manage Projects, Team's Tasks and Business Processes

    All-in-One Enterprise-Level Software is Now Available for SMB

    Planfix is like a souped-up business process management system for folks who really know their stuff. It's built to help you dive deeper and gives you more options than your run-of-the-mill project and task management systems. Best part? Even small businesses and non-profits can get in on the action.
    Learn More
  • 10
    Diffgram

    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    ...Training Data is the art of supervising machines through data. This includes the activities of annotation, which produces structured data; ready to be consumed by a machine learning model. Annotation is required because raw media is considered to be unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases including computer vision, natural language processing and speech recognition.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    QSV

    QSV

    Blazing-fast Data-Wrangling toolkit

    qsv is a fast, command-line CSV data toolkit written in Rust that extends the capabilities of xsv. It’s designed to make working with CSV files at scale easy and efficient, offering over 40 powerful subcommands for tasks like querying, sampling, splitting, deduplicating, and more. qsv is ideal for data engineers, analysts, and developers who need high-performance CSV manipulation on the command line.
    Downloads: 87 This Week
    Last Update:
    See Project
  • 12
    HDF5

    HDF5

    Official HDF5® Library Repository

    HDF5 (Hierarchical Data Format v5) is a widely-used data management library and file format for storing large and complex scientific data sets efficiently.
    Downloads: 22 This Week
    Last Update:
    See Project
  • 13
    SageMaker Spark Container

    SageMaker Spark Container

    Docker image used to run data processing workloads

    Apache Spark™ is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    DALI

    DALI

    A GPU-accelerated library containing highly optimized building blocks

    The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provides a collection of highly optimized building blocks for loading and processing image, video and audio data. It can be used as a portable drop-in replacement for built-in data loaders and data iterators in popular deep learning frameworks.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Fondant

    Fondant

    Production-ready data processing made easy and shareable

    Fondant is a modular, pipeline-based framework designed to simplify the preparation of large-scale datasets for training machine learning models, especially foundation models. It offers an end-to-end system for ingesting raw data, applying transformations, filtering, and formatting outputs—all while remaining scalable and traceable. Fondant is designed with reproducibility in mind and supports containerized steps using Docker, making it easy to share and reuse data processing components. It’s built for use in research and production, empowering data scientists to streamline dataset curation and preprocessing workflows efficiently.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Apache Beam

    Apache Beam

    Unified programming model for Batch and Streaming

    Apache Beam is an open source, unified programming model to define both batch and streaming data-parallel processing pipelines, as well as certain language-specific SDKs for constructing pipelines and Runners. These pipelines are executed on one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam is especially useful for Embarrassingly Parallel data processing tasks, and caters to the different needs and backgrounds of end users, SDK writers and runner writers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Addax

    Addax

    Addax is a versatile open-source ETL tool

    Addax is a data integration and ETL (Extract, Transform, Load) tool designed for high-performance data migration tasks. It simplifies the process of moving data between different systems and formats.
    Downloads: 12 This Week
    Last Update:
    See Project
  • 18
    Flink CDC

    Flink CDC

    Flink CDC is a streaming data integration tool

    Apache Flink CDC is a distributed data integration tool that captures data changes in real-time from various databases. It leverages Change Data Capture (CDC) technology to stream data changes into Apache Flink, enabling real-time analytics and data processing. Flink CDC simplifies data pipeline development with its declarative YAML configurations.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Miller

    Miller

    Miller is like awk, sed, cut, join, and sort for name-indexed data

    Miller is like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed. With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed. Then, on the fly, you can add new fields which are functions of existing fields, drop fields, sort, aggregate statistically, pretty-print, and more. Miller operates on key-value-pair data while the...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 20
    Unstructured.IO

    Unstructured.IO

    Open source libraries and APIs to build custom preprocessing pipelines

    The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. The use cases of unstructured revolve around streamlining and optimizing the data processing workflow for LLMs. unstructured modular bricks and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and is efficient in transforming unstructured data into structured outputs.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    DataDreamer

    DataDreamer

    DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models

    DataDreamer is a tool designed to assist in the generation and manipulation of synthetic data for various applications, including testing and machine learning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    ChatLab

    ChatLab

    Local-first AI chat analysis tool for insights from conversation data

    ...ChatLab emphasizes a local-first approach, meaning all chat data is processed and stored on the user’s device rather than being uploaded to external servers. It supports large-scale datasets through streaming parsing and multi-worker processing, allowing it to handle millions of messages efficiently. ChatLab also includes visualization features that present trends, activity patterns, and interaction metrics in a clear and accessible format.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 23
    GridDB

    GridDB

    GridDB is a next-generation open source database

    ...Multi-model architecture capable of supporting various data stores with time-series data-oriented and pluggable data stores for efficient real-time processing and management of huge amounts of time-series data at high frequency. Various architectural innovations, such as in-memory orientation with "memory as the main unit and disk as the secondary unit" and event-driven design with minimal overhead, have been incorporated to achieve processing capabilities that can handle petabyte-scale applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    lxml

    lxml

    The lxml XML toolkit for Python

    A Python library for efficient XML and HTML processing, known for speed and compatibility. The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. The latest release works with all CPython versions from 3.6 to 3.12. See the introduction for more information about the...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 25
    iLovePDF Api

    iLovePDF Api

    iLovePDF Rest Api - PHP Library

    Develop and automate PDF processing tasks like Compress PDF, merging PDF, Split PDF, converting Office to PDF, PDF to JPG, Images to PDF, adding Page Numbers, Rotate PDF, Unlocking PDF, stamping a Watermark, and Repair PDF. Each one with several settings to get your desired results. Strong infrastructure to offer the best-dedicated processing power. You might know us from ilovepdf.com where we process millions of PDFs daily. We offer a simple and concise API Reference and Guide as well as...
    Downloads: 13 This Week
    Last Update:
    See Project