Search Results for "python data analysis"


OS

Linux 29
Windows 27
Mac 24
BSD 9
ChromeOS 8

Category

Business 31
- Data Management 31
- Enterprise 3
Scientific/Engineering 8
Database 6
Formats and Protocols 3
Internet 3
Software Development 3
System 3
Artificial Intelligence 1
Communications 1
Education 1
Printing 1

License

OSI-Approved Open Source 26
Creative Commons Attribution License 1
Other License 1

Translations

English 5
Catalan 1
Chinese (Simplified) 1
French 1
German 1
Italian 1
Javanese 1
Korean 1
Polish 1
Portuguese 1
Romanian 1
Russian 1
Spanish 1

Programming Language

Python 12
Java 9
Go 2
Unix Shell 2
C# 1
JavaScript 1
PHP 1
R 1

Status

Production/Stable 8
Beta 4
Mature 2
Pre-Alpha 1

Showing 31 open source projects for "python data analysis"

1

reticulate

R Interface to Python

reticulate is an R package from Posit that creates seamless interoperability between R and Python. It lets you call Python modules, classes, and functions from within R, automatically translating between R and Python data structures. Useful for combining Python tooling with R projects, data analysis, and RMarkdown reports.

Downloads: 0 This Week

Last Update: 2026-02-13
See Project
2

Recap

Recap tracks and transforms schemas across your whole application

Recap is a schema language and multi-language toolkit to track and transform schemas across your whole application. Your data passes through web services, databases, message brokers, and object stores. Recap describes these schemas in a single language, regardless of which system your data passes through. Recap schemas can be defined in YAML, TOML, JSON, XML, or any other compatible language.

Downloads: 9 This Week

Last Update: 2025-12-30
See Project
3

Dagster

An orchestration platform for the development, production

Dagster is an orchestration platform for the development, production, and observation of data assets. Dagster as a productivity platform: With Dagster, you can focus on running tasks, or you can identify the key assets you need to create using a declarative approach. Embrace CI/CD best practices from the get-go: build reusable components, spot data quality issues, and flag bugs early. Dagster as a robust orchestration engine: Put your pipelines into production with a robust...

Downloads: 25 This Week

Last Update: 2026-04-09
See Project
4

Airbyte

Data integration platform for ELT pipelines from APIs, databases

We believe that only an open-source solution to data movement can cover the long tail of data sources while empowering data engineers to customize existing connectors. Our ultimate vision is to help you move data from any source to any destination. Airbyte already provides the largest catalog of 300+ connectors for APIs, databases, data warehouses, and data lakes. Moving critical data with Airbyte is as easy and reliable as flipping on a switch. Our teams process more than 300 billion rows...

Downloads: 11 This Week

Last Update: 2025-10-15
See Project
5

Apache DevLake

Apache DevLake is an open-source dev data platform

Apache DevLake is an open-source dev data platform that ingests, analyzes, and visualizes the fragmented data from DevOps tools to extract insights for engineering excellence, developer experience, and community growth. Apache DevLake is designed for developer teams looking to make better sense of their development process and to bring a more data-driven approach to their own practices. You can ask Apache DevLake many questions regarding your development process. Just connect and query. Your...

Downloads: 9 This Week

Last Update: 2026-03-12
See Project
6

CellTypist

A tool for semi-automatic cell type classification, harmonization

CellTypist is an automated tool for cell type classification, harmonization, and integration. Classification: transfer cell type labels from the reference to the query dataset. Harmonization: match and harmonize cell types defined by independent datasets. Integration: integrate cells and cell types with supervision from harmonization. CellTypist recapitulates cell type structure and biology of independent datasets. Regularised linear models with Stochastic Gradient Descent provide a fast and...

Downloads: 0 This Week

Last Update: 2025-06-25
See Project
7

harmonypy

Integrate multiple high-dimensional datasets with fuzzy k-means

Harmony is an algorithm for integrating multiple high-dimensional datasets. harmonypy is a port of the harmony R package by Ilya Korsunsky. Harmony is a general-purpose R package with an efficient algorithm for integrating multiple data sets. It is especially useful for large single-cell datasets such as single-cell RNA-seq.
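
The "fuzzy k-means" idea in the tagline can be illustrated with a minimal sketch of soft cluster assignment, where each point receives a membership weight for every cluster instead of a single hard label. This is a generic illustration only, not harmonypy's implementation; the function name `soft_assign` and the `sigma` parameter are assumptions made for the example.

```python
import math


def soft_assign(point, centroids, sigma=1.0):
    """Return one membership weight per centroid; the weights sum to 1.

    Hypothetical helper for illustration; not part of harmonypy.
    """
    # Squared Euclidean distance from the point to each centroid.
    sq_dists = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    # Shift by the smallest distance before exponentiating, for numerical stability.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / sigma) for d in sq_dists]
    total = sum(weights)
    # Normalize so the memberships form a probability distribution.
    return [w / total for w in weights]


# A point halfway between two centroids belongs equally to both.
balanced = soft_assign((1.0, 0.0), [(0.0, 0.0), (2.0, 0.0)])
# A point sitting on the first centroid belongs mostly to that cluster.
skewed = soft_assign((0.0, 0.0), [(0.0, 0.0), (3.0, 4.0)])
```

A smaller `sigma` makes the assignment closer to hard k-means, while a larger one spreads membership more evenly across clusters.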

Downloads: 0 This Week

Last Update: 2026-01-09
See Project
8

KubeRay

A toolkit to run Ray applications on Kubernetes

KubeRay is a powerful, open-source Kubernetes operator that simplifies the deployment and management of Ray applications on Kubernetes. It offers several key components. KubeRay core: This is the official, fully-maintained component of KubeRay that provides three custom resource definitions, RayCluster, RayJob, and RayService. These resources are designed to help you run a wide range of workloads with ease.

Downloads: 4 This Week

Last Update: 2026-03-19
See Project
9

Stanford Data Miner

Tools for integration and analysis of heterogeneous immunological data

An extensive description of this system is published in the Journal of Translational Medicine (http://www.translational-medicine.com/). In brief, the system consists of two main web applications, a data integration app and a data exploration app. The data integration app is a fully custom Java "Web 2.0" product called Sherpa. Sherpa uses Seam, a platform integrating Asynchronous JavaScript and XML (AJAX), JavaServer Faces (JSF), the Java Persistence API (JPA), and Enterprise Java Beans...

Downloads: 0 This Week

Last Update: 2026-04-03
See Project
10

Pytente

A Computational Tool for Patent Analysis and Retrieval

Pytente is an advanced solution for automating the collection, storage, and processing of bibliographic patent data. The tool was designed to simplify the collection of large volumes of data from open-access repositories. Pytente ensures structured storage of the information, as well as validation and removal of duplicate records. Among the various features the tool provides, notable ones include the custom extraction of subsets of...

Downloads: 0 This Week

Last Update: 2025-11-03
See Project
11

Mara Pipelines

A lightweight opinionated ETL framework, halfway between plain scripts

This package contains a lightweight data transformation framework with a focus on transparency and complexity reduction. Data integration pipelines as code: pipelines, tasks, and commands are created using declarative Python code. PostgreSQL as a data processing engine. Extensive web UI: the web browser is the main tool for inspecting, running, and debugging pipelines. GNU make semantics.

Downloads: 0 This Week

Last Update: 2023-12-06
See Project
12

PANDORA

Revolutionizing Biomedical Research with Advanced Machine Learning

...Join us and make SIMON even cooler! Exploratory analysis of machine learning results with the help of many different visualization techniques will give you instant insights into models and data.

Downloads: 1 This Week

Last Update: 2023-07-26
See Project
13

scArches

Reference mapping for single-cell genomics

Single-cell architecture surgery (scArches) is a package for reference-based analysis of single-cell data. scArches allows your single-cell query data to be analyzed by integrating it into a reference atlas. By mapping your data into an integrated reference you can transfer cell-type annotation from reference to query, identify disease states by mapping to healthy atlas, and advanced applications such as imputing missing data modalities or spatial locations.

Downloads: 0 This Week

Last Update: 2023-06-13
See Project
14

Open Source Data Quality and Profiling

World's first open source data quality & data preparation project

This project is dedicated to open source data quality and data preparation solutions. Data Quality includes profiling, filtering, governance, similarity checks, data enrichment alteration, real-time alerting, basket analysis, bubble chart Warehouse validation, single customer view etc. defined by Strategy. This tool is developing a high-performance integrated data management platform that will seamlessly perform Data Integration, Data Profiling, Data Quality, Data Preparation, Dummy Data Creation, Meta Data Discovery, Anomaly Discovery, Data Cleansing, Reporting, and Analytics. ...

8 Reviews

Downloads: 1 This Week

Last Update: 2021-01-20
See Project
15

CloverDX

Design, automate, operate and publish data pipelines at scale

...Simple data manipulation jobs can be created visually. More complex business logic can be implemented using Clover's domain-specific language, CTL, in Java, or in languages like Python or JavaScript. Through its DataServices functionality, it lets you quickly turn data pipelines into REST API endpoints. The platform lets you easily scale your data jobs across multiple cores or nodes/machines.

4 Reviews

Downloads: 1 This Week

Last Update: 2023-05-04
See Project
16

Grinn

Graph database and R package for omic data integration

http://kwanjeeraw.github.io/grinn/

Downloads: 0 This Week

Last Update: 2018-07-31
See Project
17

Hetionet

Hetionet: an integrative network of disease

Hetionet is a hetnet — network with multiple node and edge (relationship) types — which encodes biology. The hetnet was designed for Project Rephetio, which aims to systematically identify why drugs work and predict new therapies for drugs. The JSON and Neo4j formats contain node and edge properties, which are absent in the TSV and matrix formats, including licensing information. Therefore the recommended formats are JSON and Neo4j. Our hetio package in Python reads the JSON format, but it...
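
To make the hetnet idea concrete, here is a minimal sketch of a network whose nodes and edges each carry a type. It is a hypothetical illustration, not the `hetio` package's API or Hetionet's actual JSON layout; the class names, the edge kinds, and the placeholder identifiers are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    kind: str        # node type, e.g. "Compound" or "Disease"
    identifier: str  # placeholder identifier, not a real Hetionet ID


@dataclass
class Hetnet:
    # Maps (source kind, edge kind, target kind) -> list of (source, target) pairs.
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_edge(self, source, kind, target):
        """Store a typed edge under its (source kind, edge kind, target kind) key."""
        self.edges[(source.kind, kind, target.kind)].append((source, target))

    def neighbors(self, source, kind):
        """Return targets reachable from `source` through edges of type `kind`."""
        return [
            t
            for (s_kind, e_kind, _), pairs in self.edges.items()
            if s_kind == source.kind and e_kind == kind
            for s, t in pairs
            if s == source
        ]


# Build a tiny network with three node types and two edge types.
compound = Node("Compound", "example-compound")
disease = Node("Disease", "example-disease")
gene = Node("Gene", "example-gene")

net = Hetnet()
net.add_edge(compound, "treats", disease)
net.add_edge(compound, "binds", gene)

treated = net.neighbors(compound, "treats")
```

Keying edges by the full type triple keeps queries over a single relationship type cheap, which is the kind of lookup a multi-type network is built for.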

Downloads: 0 This Week

Last Update: 2023-06-12
See Project
18

optPBN

An optimization toolbox for probabilistic Boolean networks

We introduce optPBN, a Matlab-based toolbox for the optimization of probabilistic Boolean networks (PBN) which operates under the framework of the BN/PBN toolbox from Shmulevich et al. optPBN offers an easy generation of probabilistic Boolean networks from Boolean rule-based modeling and allows for flexible measurement data integration from multiple experiments and a subsequent integrated optimization problem generation which then can be solved with different optimizers. Thereby optPBN...

Downloads: 0 This Week

Last Update: 2015-09-24
See Project
19

Open Information Integration

Open Information Integration Tool Suite (Open II) is used by analysts and programmers to accelerate data integration and harmonization across organizations. OpenII has a neutral schema repository for browsing and comparing all sorts of data models. OpenII is built as a Rich Client Platform Application on top of Eclipse 3.x. Developers need to download Eclipse, install the RCP support, the Fatjar plugin and the Delta Pack in one of the 3.x flavors. Release Notes Release Date: Jan...

Downloads: 14 This Week

Last Update: 2017-03-09
See Project
20

CMIS Input plugin for Pentaho

Allows querying Content Management Systems that use the CMIS.

...Imagine using the extracted information for statistical purposes, for creating reports and, more generally, for analysing your document archives in ways not possible with the tools currently available. All this is possible within the Pentaho Suite, the open source business intelligence platform for extracting and analysing structured and semi-structured data. To that end, the CMIS Input plugin for Pentaho Data Integration (Kettle) was designed and developed; it allows querying Content Management Systems that use the CMIS interoperability standard. Once extracted, the data can be stored, analysed, and presented in customized reports published in various formats for the end user (PDF, Excel, etc.).

Downloads: 0 This Week

Last Update: 2014-11-09
See Project
21

RDF Content Provider for iQser GIN

Plugin to connect RDF sources with the GIN Server

GIN Server is a semantic middleware for easy data integration and automated analysis. The extensible architecture lets you plug in data sources, analytics, and event handling. This RDF Content Provider enables access to Semantic Web content as an RDF file or SPARQL endpoint.

Downloads: 0 This Week

Last Update: 2014-05-21
See Project
22

ONDEX Suite

Framework for text mining, data integration and data analysis. Keywords: ontology and graph alignment, relation mining, warehouse, semantic database integration, bioinformatics, systems biology, microarray, Java.

Downloads: 3 This Week

Last Update: 2019-05-15
See Project
23

openISI : topical data integration

A tool for autonomous and virtual topical data integration using the focused web-harvesting method.

Downloads: 0 This Week

Last Update: 2013-04-09
See Project
24

ISBiology

This disease-centric project contributes data integration and analysis tools from the Institute for Systems Biology (ISB). We offer this project to the research community to further our efforts in disease prediction and prevention.

1 Review

Downloads: 0 This Week

Last Update: 2013-10-31
See Project
25

DataSync Suite

DataSync Suite is an open source platform for integrating tools like Zimbra, SugarCRM, and Drupal. The tool focuses on single sign-on, application data integration, and fast, flexible deployment.

Downloads: 0 This Week

Last Update: 2015-12-21
See Project



© 2026 Slashdot Media. All Rights Reserved.
