Browse free open source Data Pipeline tools and projects for Linux below.
Pentaho offers a comprehensive data integration and analytics platform.
Real-time, incremental ETL library for ML with record-level depend
Conduit streams data between data stores. Kafka Connect replacement
lakeFS - Git-like capabilities for your object storage
Backstage is an open platform for building developer portals
SeaTunnel is a distributed, high-performance data integration platform
Making DAG construction easier
Mirror of Apache Kafka
Privacy and Security focused Segment-alternative, in Golang
The open standard for data logging
Build, run, and manage data pipelines for integrating data
StarRocks is a next-gen sub-second MPP database for full analytics
Build data pipelines, the easy way
AutoGluon: AutoML for Image, Text, and Tabular Data
A distributed and extensible workflow scheduler platform
A lightweight stream processing library for Go
Next-Generation Event Processing Platform
A ranked list of awesome Python open-source libraries
A fast script language for Go
Light-weight, flexible, expressive statistical data testing library
Open-source data observability for analytics engineers
Kestra is an infinitely scalable orchestration and scheduling platform
Design, automate, operate and publish data pipelines at scale
Open Source Data Orchestration for the Cloud
Automated Tool for Optimized Modelling