data processing free download

Showing 396 open source projects for "data processing"


Filter: Java

1

ThingsBoard

Device management, data collection, processing and visualization

...Define relations between your devices, assets, customers or any other entities. Collect and store telemetry data in a scalable and fault-tolerant way. Visualize your data with built-in or custom widgets and flexible dashboards. Share dashboards with your customers. Define data processing rule chains. Transform and normalize your device data. Raise alarms on incoming telemetry events, attribute updates, device inactivity, and user actions.

Downloads: 16 This Week

Last Update: 2026-03-30
2

Siddhi Core Libraries

Stream Processing and Complex Event Processing Engine

Fully open source, cloud-native, scalable, micro streaming, and complex event processing system capable of building event-driven applications for use cases such as real-time analytics, data integration, notification management, and adaptive decision-making. Event processing logic can be written using Streaming SQL queries via graphical and source editors, to capture events from diverse data sources, process and analyze them, integrate with multiple services and data stores, and publish output to various endpoints in real time. ...

Downloads: 3 This Week

Last Update: 2025-03-05
3

Reactor Core

Non-Blocking Reactive Foundation for the JVM

Reactor Core is a foundational library for building reactive applications in Java, providing a powerful API for asynchronous, non-blocking programming.

Downloads: 7 This Week

Last Update: 2026-03-10
4

ZXing

Barcode scanning library for Java, Android

ZXing or “Zebra Crossing” is an open source multi-format 1D/2D barcode image processing library that’s been implemented in Java, and also comes with ports to other languages. It currently supports the following formats: UPC-A and UPC-E, EAN-8 and EAN-13, Code 39, Code 93, Code 128, ITF, Codabar, RSS-14 (all variants), RSS Expanded (most variants), QR Code, Data Matrix, Aztec ('beta' quality), PDF 417 ('alpha' quality), and MaxiCode. ZXing is made up of several modules, including a core image decoding library, JavaSE-specific client code, and Android client Barcode Scanner. ...

Downloads: 74 This Week

Last Update: 2025-11-12
5

Apache Beam

Unified programming model for Batch and Streaming

Apache Beam is an open source, unified programming model for defining both batch and streaming data-parallel processing pipelines, together with a set of language-specific SDKs for constructing pipelines and Runners for executing them. These pipelines are executed on one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Beam is especially useful for embarrassingly parallel data processing tasks, and caters to the different needs and backgrounds of end users, SDK writers, and runner writers.

Downloads: 1 This Week

Last Update: 2026-03-30
6

Dolphin Scheduler

A distributed and extensible workflow scheduler platform

Apache DolphinScheduler is a distributed and extensible workflow scheduler platform with powerful visual DAG interfaces, dedicated to solving complex job dependencies in data pipelines and providing various types of jobs out of the box. It has a decentralized multi-master and multi-worker architecture with built-in high availability and overload handling. All process definition operations are visualized, so key information about a process is visible at a glance, and deployment takes one click. ...

Downloads: 6 This Week

Last Update: 2026-03-01
7

Addax

Addax is a versatile open-source ETL tool

Addax is a data integration and ETL (Extract, Transform, Load) tool designed for high-performance data migration tasks. It simplifies the process of moving data between different systems and formats.

Downloads: 14 This Week

Last Update: 2026-04-03
8

Flink CDC

Flink CDC is a streaming data integration tool

Apache Flink CDC is a distributed data integration tool that captures data changes in real-time from various databases. It leverages Change Data Capture (CDC) technology to stream data changes into Apache Flink, enabling real-time analytics and data processing. Flink CDC simplifies data pipeline development with its declarative YAML configurations.

Downloads: 2 This Week

Last Update: 2026-03-29
9

KCloud‑Platform‑IoT

KCloud-Platform-IoT is a comprehensive open-source IoT management platform built with Spring Cloud and Vue.js. It supports device registration, data collection, rule-based processing, and dashboard visualization. Designed for scalability and modularity, the platform is ideal for managing large IoT fleets in industrial or smart city environments.

Downloads: 7 This Week

Last Update: 2 days ago
10

Apache Hudi

Upserts, Deletes And Incremental Processing on Big Data

Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Hudi manages the storage of large analytical datasets on DFS (Cloud stores, HDFS or any Hadoop FileSystem compatible storage). Apache Hudi is a transactional data lake platform that brings database and data warehouse capabilities to the data lake. Hudi reimagines slow old-school batch data processing with a powerful new incremental processing framework for low latency minute-level analytics. Hudi provides efficient upserts, by mapping a given hoodie key (record key + partition path) consistently to a file id, via an indexing mechanism. ...

Downloads: 0 This Week

Last Update: 2025-12-18
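
The Hudi description above mentions that upserts work by mapping a key (record key + partition path) consistently to a file id through an index. A minimal sketch of that idea in plain Java follows. It is not Hudi's API: the class `UpsertIndexSketch`, its methods, and the in-memory maps standing in for files are all hypothetical, and giving every new key its own file is a simplification made only to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy illustration of key-to-file indexing for upserts; not Hudi's API. */
public class UpsertIndexSketch {
    // Index: "partitionPath/recordKey" -> id of the file that holds the record.
    private final Map<String, String> index = new HashMap<>();
    // Simulated storage: file id -> (record key -> value).
    private final Map<String, Map<String, String>> files = new HashMap<>();
    private int nextFileId = 0;

    /** Inserts a new record, or updates it in the file it was first written to. */
    public String upsert(String partitionPath, String recordKey, String value) {
        String compositeKey = partitionPath + "/" + recordKey;
        // A known key always routes to the same file; a new key is assigned a new file.
        String fileId = index.computeIfAbsent(compositeKey, k -> "file-" + (nextFileId++));
        files.computeIfAbsent(fileId, f -> new HashMap<>()).put(recordKey, value);
        return fileId;
    }

    /** Looks a record up through the index; returns null if the key is unknown. */
    public String read(String partitionPath, String recordKey) {
        String fileId = index.get(partitionPath + "/" + recordKey);
        return fileId == null ? null : files.get(fileId).get(recordKey);
    }

    public static void main(String[] args) {
        UpsertIndexSketch table = new UpsertIndexSketch();
        String first = table.upsert("2024/01", "user-1", "a");
        String second = table.upsert("2024/01", "user-1", "b"); // update, same file
        System.out.println(first.equals(second));
        System.out.println(table.read("2024/01", "user-1"));
    }
}
```

Because the index lookup decides the target file, an update touches only the file that already holds the record rather than rewriting the whole dataset.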
11

eXist-db

eXist Native XML Database and Application Platform

eXist-db is an open-source, native XML database and application platform that provides a powerful environment for storing, querying, and managing XML documents. It is designed for complex data management needs, offering XQuery, XSLT, and RESTful web services for interacting with structured data.

Downloads: 6 This Week

Last Update: 2026-03-05
12

Logstash

Centralize, transform and stash your data

Logstash is a server-side data processing pipeline that dynamically ingests data from numerous sources, transforms it, and ships it to your favorite “stash” regardless of format or complexity. It supports and ingests data of all shapes, sizes and sources, dynamically transforms and prepares this data, and transports it to the output of your choice. Logstash is extensible, with over 200 plugins available to let you create and configure your pipeline how you choose.

Downloads: 9 This Week

Last Update: 6 days ago
13

PULSAR

Distributed pub-sub messaging system

Apache Pulsar is a cloud-native, distributed messaging and streaming platform originally created at Yahoo! and now a top-level Apache Software Foundation project. Easy to deploy, lightweight compute process, developer-friendly APIs, no need to run your own stream processing engine. Run in production at Yahoo! scale for over 5 years, with millions of messages per second across millions of topics. Expand capacity seamlessly to hundreds of nodes. Low publish latency (< 5ms) at scale with strong durability guarantees. Configurable replication between data centers across multiple geographic regions. Built from the ground up as a multi-tenant system. ...

Downloads: 0 This Week

Last Update: 2026-03-31
14

Apache Sedona

Cluster computing framework for processing large-scale geospatial data

Apache Sedona™ is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines. According to our benchmark and third-party research papers, Sedona runs 2X - 10X faster than other Spark-based geospatial data systems on computation-intensive query workloads. ...

Downloads: 1 This Week

Last Update: 2026-01-05
15

Spring Batch

Spring Batch is a framework for writing batch applications using Java

A lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that will enable extremely high-volume and high...

Downloads: 9 This Week

Last Update: 2026-03-18
16

Qualitis

Qualitis is a one-stop data quality management platform

Qualitis is a data quality management platform that supports quality verification, notification, and management for various data sources. It is used to solve various data quality problems caused by data processing. Based on Spring Boot, Qualitis submits quality model tasks to the Linkis platform. It provides functions such as data quality model construction, data quality model execution, data quality verification, data quality report generation, and so on. ...

Downloads: 3 This Week

Last Update: 2025-10-17
17

Apache Flink

Stream processing framework with powerful stream

Apache Flink is a distributed engine for stateful computations over data streams and batches, designed for low-latency processing at scale. Its core runtime executes dataflow graphs with fine-grained backpressure and checkpointing, allowing applications to recover consistently from failures. Flink’s event-time model and watermarks enable accurate out-of-order processing, windowing, and complex time semantics that typical real-time systems struggle with.

Downloads: 1 This Week

Last Update: 2025-11-27
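
The Flink description above refers to event time, watermarks, and windowing for out-of-order data. The sketch below illustrates the concept in plain Java and does not use Flink's API. The class `WatermarkSketch`, its parameters, and the rule "watermark = largest timestamp seen minus a fixed out-of-orderness bound" are assumptions chosen for illustration, and timestamps are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy event-time tumbling-window counter; a concept sketch, not Flink's API. */
public class WatermarkSketch {
    private final long windowSize;        // window length, in ms
    private final long maxOutOfOrderness; // how far behind an event may arrive, in ms
    private final TreeMap<Long, Integer> openWindows = new TreeMap<>(); // window start -> count
    private long watermark = Long.MIN_VALUE;

    public WatermarkSketch(long windowSize, long maxOutOfOrderness) {
        this.windowSize = windowSize;
        this.maxOutOfOrderness = maxOutOfOrderness;
    }

    /** Processes one event; returns "start=count" for each window the watermark closed. */
    public List<String> onEvent(long eventTime) {
        long windowStart = eventTime - (eventTime % windowSize);
        // Count the event only if its window has not already been closed.
        if (windowStart + windowSize > watermark) {
            openWindows.merge(windowStart, 1, Integer::sum);
        }
        // The watermark trails the largest timestamp seen and never moves backwards.
        watermark = Math.max(watermark, eventTime - maxOutOfOrderness);

        List<String> closed = new ArrayList<>();
        // Emit every window whose end is at or before the watermark, oldest first.
        while (!openWindows.isEmpty() && openWindows.firstKey() + windowSize <= watermark) {
            Map.Entry<Long, Integer> w = openWindows.pollFirstEntry();
            closed.add(w.getKey() + "=" + w.getValue());
        }
        return closed;
    }

    public static void main(String[] args) {
        WatermarkSketch counter = new WatermarkSketch(10, 5);
        long[] events = {1, 7, 3, 12, 16, 4}; // 3 is out of order; 4 arrives too late
        for (long t : events) {
            System.out.println(t + " -> " + counter.onEvent(t));
        }
    }
}
```

In this run the event at time 3 arrives after 7 but is still counted, because the watermark has not yet passed the end of its window. The event at time 4 arrives after the window has been emitted and is dropped.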
18

FIT Framework

An enterprise-level AI development framework

FIT Framework is an open-source infrastructure designed to support the development, training, and evaluation of machine learning and AI models through a modular and scalable architecture. It aims to streamline the lifecycle of AI systems by providing standardized components for data processing, model training, evaluation, and deployment. The framework is particularly useful for research and production environments where reproducibility and consistency are critical, as it enforces structured workflows and configurable pipelines. It supports experimentation with different models and datasets, allowing developers to iterate quickly while maintaining clear organization of results and configurations. ...

Downloads: 7 This Week

Last Update: 2026-03-19
19

Stanford CoreNLP

Stanford CoreNLP, a Java suite of core NLP tools

CoreNLP is your one-stop shop for natural language processing in Java! CoreNLP enables users to derive linguistic annotations for text, including token and sentence boundaries, parts of speech, named entities, numeric and time values, dependency and constituency parses, coreference, sentiment, quote attributions, and relations. CoreNLP currently supports 6 languages: Arabic, Chinese, English, French, German, and Spanish.

Downloads: 5 This Week

Last Update: 2025-06-07
20

Hazelcast

Open-source distributed computation and storage platform

...You can deploy it at any scale from small edge devices to a large cluster of cloud instances. A cluster of Hazelcast nodes shares both the data storage and the computational load, which can dynamically scale up and down. When you add new nodes to the cluster, the data is automatically rebalanced across the cluster, and currently running computational tasks (known as jobs) snapshot their state and scale with processing guarantees.

Downloads: 12 This Week

Last Update: 2025-10-15
21

Kestra

Kestra is an infinitely scalable orchestration and scheduling platform

Build reliable workflows, blazingly fast, deploy in just a few clicks. Kestra is an open-source, event-driven orchestrator that simplifies data operations and improves collaboration between engineers and business users. By bringing Infrastructure as Code best practices to data pipelines, Kestra allows you to build reliable workflows and manage them with confidence. Thanks to the declarative YAML interface for defining orchestration logic, everyone who benefits from analytics can participate...

Downloads: 2 This Week

Last Update: 5 days ago
22

Apache InLong

Apache InLong - a one-stop integration framework for massive data

Apache InLong is a one-stop integration framework for massive data that provides automatic, secure and reliable data transmission capabilities. InLong supports both batch and stream data processing at the same time, which offers great power to build data analysis, modeling and other real-time applications based on streaming data. InLong (应龙) is a divine beast in Chinese mythology who guides the river into the sea, and it is regarded as a metaphor of the InLong system for reporting data streams. ...

Downloads: 1 This Week

Last Update: 2025-11-13
23

Genie

Distributed Big Data Orchestration Service

Genie is a completely open source distributed job orchestration engine developed by Netflix. Genie provides RESTful APIs to run a variety of big data jobs like Hadoop, Pig, Hive, Spark, Presto, Sqoop and more. It also provides APIs for managing the metadata of many distributed processing clusters and the commands and applications which run on them.

Downloads: 0 This Week

Last Update: 2025-08-05
24

Data Crow

The ultimate cataloguer

Data Crow allows you to use the standard movie & video (divx, xvid, DVD, Blu-ray, etc), book (and eBooks), images, board games, comic books, games & software, music (mp3 and other music files) cataloguing modules. Besides these modules, which you can change to fit your requirements, you can create new modules (want to catalogue your stamps, equipment, or anything else?). The GUI is skinnable. Reporting (using JasperReports and their community edition JasperSoft Developer Studio), loan...

57 Reviews

Downloads: 299 This Week

Last Update: 2026-03-11
25

Google Cloud Dataflow Template Pipelines

Cloud Dataflow Google-provided templates for solving data tasks

DataflowTemplates is the source repository for Google-provided Dataflow templates that are intended to solve large-scale in-cloud data processing tasks without requiring users to build everything from scratch in a full development environment. The repository is centered on templated pipelines powered by Google Cloud Dataflow and Apache Beam, making it easier to run common integration and movement jobs such as data import, export, backup, restore, and bulk API operations. ...

Downloads: 7 This Week

Last Update: 7 days ago