Showing 2016 open source projects for "jpk data processing"

  • 1
    Data-Juicer

    Data processing for and with foundation models

    Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.
    Downloads: 1 This Week
  • 2
    Polymarket Data

    Polymarket Data Retriever that fetches, processes, and structures data

    Polymarket Data is a comprehensive data engineering pipeline designed to collect, process, and structure trading activity from the Polymarket prediction market ecosystem into analyzable datasets. The system operates as a multi-stage pipeline that integrates data from both off-chain APIs and on-chain event sources, enabling users to reconstruct full trading activity including markets, order events, and executed trades. It begins by fetching market metadata such as questions, outcomes, and...
    Downloads: 9 This Week
  • 3
    Data Formulator

    Create rich visualizations with AI

    To create rich visualizations, data analysts often need to iterate between data processing and chart specification. Doing so requires not only proficiency in data transformation and visualization tools but also effort to manage a branching history of many different versions of data and charts. Recent LLM-powered AI systems have greatly improved the visualization authoring experience, for example by lowering manual data transformation barriers through LLMs' code generation ability. ...
    Downloads: 0 This Week
  • 4
    Synthetic Data Generator

    SDG is a specialized framework

    ...It also includes a data processing module capable of handling different data types, preprocessing columns, managing missing values, and converting formats automatically before model training.
    Downloads: 3 This Week
  • 5
    Agentic Data Scientist

    An end-to-end Data Scientist

    ...Each agent is designed to independently call functions, interact with data sources, and adapt to uncertainties during processing, enabling iterative refinement of models without manual coordination. The framework supports interoperability with existing data tools and libraries, letting the agents leverage libraries like pandas, scikit-learn, and visualization frameworks to perform real computations rather than mock demonstrations.
    Downloads: 1 This Week
  • 6
    NYC Taxi Data

    Import public NYC taxi and for-hire vehicle (Uber, Lyft)

    The nyc-taxi-data repository is a rich dataset and exploratory project around New York City taxi trip records. It collects and preprocesses large-scale trip datasets (fares, pickup/dropoff, timestamps, locations, passenger counts) to enable data analysis, modeling, and visualization efforts. The project includes scripts and notebooks for cleaning and filtering the raw data, memory-efficient processing for large CSV/Parquet files, and aggregation workflows (e.g. trips per hour, heatmaps of pickups/dropoffs). ...
    Downloads: 2 This Week
  • 7
    Arroyo

    Distributed stream processing engine in Rust

    Arroyo is a distributed stream processing engine written in Rust, designed to efficiently perform stateful computations on streams of data. Unlike traditional batch processing, streaming engines can operate on both bounded and unbounded sources, emitting results as soon as they are available.
    Downloads: 4 This Week
  • 8
    go-streams

    A lightweight stream processing library for Go

    go-streams provides a simple and concise DSL for building data pipelines. In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next. The elements of a pipeline are often executed in parallel or in a time-sliced fashion.
    Downloads: 3 This Week
  • 9
    Kapacitor

    Open source framework for processing, monitoring, and alerting

    Open source framework for processing, monitoring, and alerting on time series data. Kapacitor is a real-time data processing engine for monitoring and alerting, specifically designed to work with time-series data from InfluxDB.
    Downloads: 1 This Week
  • 10
    LAStools

    Efficient tools for LiDAR processing

    LAStools is a collection of efficient, multi-core, scriptable tools for processing LiDAR data. It supports various formats, including LAS, LAZ, Terrasolid BIN, and ESRI Shapefiles, providing a comprehensive suite for LiDAR data management and analysis.
    Downloads: 24 This Week
  • 11
    Numaflow

    Kubernetes-native platform to run massively parallel data/streaming

    Numaflow is a Kubernetes-native tool for running massively parallel stream processing. A Numaflow Pipeline is implemented as a Kubernetes custom resource and consists of one or more source, data processing, and sink vertices. Numaflow installs in a few minutes and is easier and cheaper to use for simple data processing applications than a full-featured stream processing platform.
    Downloads: 3 This Week
  • 12
    CyberChef

    A web app for encryption, encoding, compression and data analysis

    CyberChef, developed by GCHQ, is a versatile web application dubbed the "Cyber Swiss Army Knife." It enables users to perform a wide array of operations on data, including encryption, encoding, compression, and analysis, all within a browser interface.
    Downloads: 41 This Week
  • 13
    MeshLab

    The open source mesh processing system

    ...VCG can be used as a stand-alone large-scale automated mesh processing pipeline, while MeshLab makes it easy to experiment with its algorithms interactively. MeshLab is an open source system for processing and editing 3D triangular meshes. It provides a set of tools for editing, cleaning, healing, inspecting, rendering, texturing, and converting meshes. It offers features for processing raw data produced by 3D digitization tools and devices and for preparing models for 3D printing.
    Downloads: 36 This Week
  • 14
    Pathway

    Python ETL framework for stream processing, real-time analytics, LLM

    ...Unlike traditional batch processing frameworks, Pathway continuously updates the results of your data logic as new events arrive, functioning more like a database that reacts in real-time. It supports Python, integrates with modern data tools, and offers a deterministic dataflow model to ensure reproducibility and correctness.
    Downloads: 6 This Week
  • 15
    Bytewax

    Python Stream Processing

    ...Bytewax is a Python framework and Rust distributed processing engine that uses a dataflow computational model to provide parallelizable stream processing and event processing capabilities similar to Flink, Spark, and Kafka Streams. You can use Bytewax for a variety of workloads, from moving data à la Kafka Connect all the way to advanced online machine learning workloads. Bytewax is not limited to streaming applications but excels anywhere that data can be distributed at the input and output.
    Downloads: 2 This Week
  • 16
    pdfcpu

    A PDF processor written in Go

    pdfcpu is a PDF processing library written in Go that supports encryption. It provides both an API and a CLI. All versions up to PDF 1.7 (ISO-32000) are supported. This is an effort to build a comprehensive PDF processing library in Go from the ground up. Over time pdfcpu aims to support the standard range of PDF processing features and also any interesting use cases that may present themselves along the way. The main focus lies on strong support for batch processing and scripting via a...
    Downloads: 15 This Week
  • 17
    ExtractThinker

    ExtractThinker is a Document Intelligence library for LLMs

    ExtractThinker is a tool designed to facilitate the extraction and analysis of information from various data sources, aiding in data processing and knowledge discovery.
    Downloads: 5 This Week
  • 18
    jq

    Lightweight and flexible command-line JSON processor

    jq is like sed for JSON data - you can use it to slice, filter, map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. jq is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine of the same type, and expect it to work. jq can mangle the data format that you have into the one that you want with very little effort, and the program to do so is often shorter and simpler...
    Downloads: 102 This Week
  • 19
    Serial Studio

    Multi-purpose serial data visualization & processing

    Serial Studio is a simple, multi-platform, and multi-purpose serial data visualization program that allows embedded developers to visualize, analyze, and present data generated from their projects and devices while avoiding the need to write project-specific visualization software. Over my many CanSat-based competitions, I found myself writing and maintaining separate ground station software for each program. However, I decided that it would be easier and more sustainable to define one...
    Downloads: 26 This Week
  • 20
    ThingsBoard

    Device management, data collection, processing and visualization

    ...Define relations between your devices, assets, customers or any other entities. Collect and store telemetry data in a scalable and fault-tolerant way. Visualize your data with built-in or custom widgets and flexible dashboards. Share dashboards with your customers. Define data processing rule chains. Transform and normalize your device data. Raise alarms on incoming telemetry events, attribute updates, device inactivity, and user actions.
    Downloads: 8 This Week
  • 21
    Airborne Data Processing and Analysis

    Software to process and analyze airborne measurements.

    The Airborne Data Processing and Analysis (ADPAA) package is an open-source software package containing a collection of programs and scripts to process and analyze data from in-situ instruments deployed on airborne platforms. The ADPAA package was started to process data on the North Dakota Citation Research Aircraft but has been used to process data on many airborne platforms.
    Downloads: 7 This Week
  • 22
    Reactor Core

    Non-Blocking Reactive Foundation for the JVM

    Reactor Core is a foundational library for building reactive applications in Java, providing a powerful API for asynchronous, non-blocking programming.
    Downloads: 7 This Week
  • 23
    Apache Spark

    A unified analytics engine for large-scale data processing

    Apache Spark is a unified engine for large-scale data processing, offering APIs for batch jobs, streaming, machine learning, and graph computation. It builds on resilient distributed datasets (RDDs) and the newer DataFrame/Dataset abstractions to provide fault-tolerant, in-memory computation across clusters. Spark’s execution engine handles scheduling, shuffles, caching, and data locality so users can focus on transformations rather than infrastructure plumbing. ...
    Downloads: 7 This Week
  • 24
    Siddhi Core Libraries

    Stream Processing and Complex Event Processing Engine

    Fully open source, cloud-native, scalable, micro streaming, and complex event processing system capable of building event-driven applications for use cases such as real-time analytics, data integration, notification management, and adaptive decision-making. Event processing logic can be written using Streaming SQL queries via graphical and source editors, to capture events from diverse data sources, process and analyze them, integrate with multiple services and data stores, and publish output to various endpoints in real time. ...
    Downloads: 1 This Week
  • 25
    LOTUS

    AI-Powered Data Processing: Use LOTUS to process all of your datasets

    LOTUS is an open-source framework and query engine designed to enable efficient processing of structured and unstructured datasets using large language models. The system provides a declarative programming model that allows developers to express complex AI data operations using high-level commands rather than manually orchestrating model calls. It offers a Python interface with a Pandas-like API, making it familiar for data scientists and engineers already working with data analysis libraries. ...
    Downloads: 3 This Week