Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Business
Data Management
Data Science Tools
Search Results

Search Results for "python data analysis"

x

Sort By:

Relevance

Clear All Filters

OS

Linux 43
Windows 39
Mac 38
More...
ChromeOS 8
BSD 7
Mobile Operating Systems 1

Category

Business 43
Artificial Intelligence 17
Software Development 17
Scientific/Engineering 5
Education 3
Internet 3
Database 1
Formats and Protocols 1
Productivity 1
Social sciences 1
System 1
Terminals 1

License

OSI-Approved Open Source 32
Creative Commons Attribution License 5
GNU Free Documentation License 2
Other License 1

Translations

English 2
French 1
German 1

Programming Language

Python 25
C++ 4
JavaScript 4
Java 2
More...
TypeScript 2
C 1
C# 1
R 1
Scala 1

Status

Production/Stable 3
Alpha 1
Beta 1

Showing 43 open source projects for "python data analysis"

View related business solutions

Data Science Linux Clear Filters & Widen Search

Curtain LogTrace File Activity Monitoring
For any organizations (up to 10,000 PCs)

Curtain LogTrace File Activity Monitoring is an enterprise file activity monitoring solution. It tracks user actions: create, copy, move, delete, rename, print, open, close, save. Includes source/destination paths and disk type. Perfect for monitoring user file activities.

Learn More
AI-powered SAST and AppSec platform that helps companies find and fix vulnerabilities.
Trusted by 750+ companies and performing 200k+ code scans monthly.

ZeroPath (YC S24) is an AI-native application security platform that delivers comprehensive code protection beyond traditional SAST. Founded by security engineers from Tesla and Google, ZeroPath combines large language models with advanced program analysis to find and automatically fix vulnerabilities.

Learn More
1

Cookiecutter Data Science

Project structure for doing and sharing data science work

A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. When we think about data analysis, we often think just about the resulting reports, insights, or visualizations. While these end products are generally the main event, it's easy to focus on making the products look nice and ignore the quality of the code that generates them. Because these end products are created programmatically, code quality is still important! ...

Downloads: 7 This Week

Last Update: 2025-07-24
See Project
2

AI Data Science Team

An AI-powered data science team of agents

AI Data Science Team is a Python library and agent ecosystem designed to accelerate and automate common data science workflows by modeling them as specialized AI “agents” that can be orchestrated to perform tasks like data cleaning, transformation, analysis, visualization, and machine learning. It provides a modular agent framework where each agent focuses on a step in the typical data science pipeline — for example, loading data from CSV/Excel files, cleaning and wrangling messy datasets, engineering predictive features, building models with AutoML, connecting to SQL databases, and producing visual outputs — all driven by natural language or programmatic instructions. ...

Downloads: 1 This Week

Last Update: 2026-01-26
See Project
3

Positron

Positron, a next-generation data science IDE

Positron is a next-generation integrated development environment (IDE) created by Posit PBC (formerly RStudio Inc) specifically tailored for data science workflows in Python, R, and multi-language ecosystems. It aims to unify exploratory data analysis, production code, and data-app authoring in a single environment so that data scientists move from “question → insight → application” without switching tools. Built on the open-source Code-OSS foundation, Positron provides a familiar coding experience along with specialized panes and tooling for variable inspection, data-frame viewing, plotting previews, and interactive consoles designed for analytical work. ...

Downloads: 11 This Week

Last Update: 3 days ago
See Project
4

Quadratic

Data science spreadsheet with Python & SQL

Quadratic enables your team to work together on data analysis to deliver better results, faster. You already know how to use a spreadsheet, but you’ve never had this much power before. Quadratic is a Web-based spreadsheet application that runs in the browser and as a native app (via Electron). Our goal is to build a spreadsheet that enables you to pull your data from its source (SaaS, Database, CSV, API, etc) and then work with that data using the most popular data science tools today (Python, Pandas, SQL, JS, Excel Formulas, etc). ...

Downloads: 8 This Week

Last Update: 2026-02-26
See Project
Complete Data Management for Nonprofits
Designed to fit with multi-level non-profit organization, across any sector

NewOrg is a robust platform built with enhanced features to help non-profit organizations that capture and integrate the information from all of their operational areas to better manage volunteers, clients, programs, outcome reporting, activity sign-ups & scheduling, communications, surveys, fundraising activities and Development campaigns. NewOrg can truly deliver an intuitive product that will help manage your Committees, Donors, Events, and Memberships so that the organization runs efficiently.

Learn More
5

marimo

A reactive notebook for Python

marimo is an open-source reactive notebook for Python, reproducible, git-friendly, executable as a script, and shareable as an app. marimo notebooks are reproducible, extremely interactive, designed for collaboration (git-friendly!), deployable as scripts or apps, and fit for modern Pythonista. Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing the notebook state. marimo's reactive UI elements, like data frame GUIs and plots, make working with data feel refreshingly fast, futuristic, and intuitive. ...

Downloads: 4 This Week

Last Update: 6 days ago
See Project
6

Dask

Parallel computing with task scheduling

Dask is a Python library for parallel and distributed computing, designed to scale analytics workloads from single machines to large clusters. It integrates with familiar tools like NumPy, Pandas, and scikit-learn while enabling execution across cores or nodes with minimal code changes. Dask excels at handling large datasets that don’t fit into memory and is widely used in data science, machine learning, and big data pipelines.

Downloads: 5 This Week

Last Update: 2026-03-18
See Project
7

Metaflow

A framework for real-life data science

Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.

Downloads: 7 This Week

Last Update: 2026-03-21
See Project
8

cuDF

GPU DataFrame Library

Built based on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data. cuDF provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming. For additional examples, browse our complete API documentation, or check out our more detailed notebooks. cuDF can be installed with conda (miniconda, or the full Anaconda distribution) from the rapidsai channel. cuDF is supported only on Linux, and with Python versions 3.7 and later. ...

Downloads: 5 This Week

Last Update: 2026-04-08
See Project
9

NannyML

Detecting silent model failure. NannyML estimates performance

NannyML is an open-source python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface, and interactive visualizations, is completely model-agnostic, and currently supports all tabular classification use cases.

Downloads: 5 This Week

Last Update: 2025-07-12
See Project
Entity Management Software
Filejet’s entity management software organizes & files all of your entity reports: every year, in every jurisdiction – with total visibility.

Automate everything: entity compliance, registered agent services, annual report and BOI filings, org charts, and DBA/fictitious name and business registration renewals, so you can focus on higher-value work.

Learn More
10

Great Expectations

Always know what to expect from your data

Great Expectations helps data teams eliminate pipeline debt, through data testing, documentation, and profiling. Software developers have long known that testing and documentation are essential for managing complex codebases. Great Expectations brings the same confidence, integrity, and acceleration to data science and data engineering teams. Expectations are assertions for data. They are the workhorse abstraction in Great Expectations, covering all kinds of common data issues. Expectations...

Downloads: 5 This Week

Last Update: 1 day ago
See Project
11

PySyft

Data science on data without acquiring a copy

Most software libraries let you compute over the information you own and see inside of machines you control. However, this means that you cannot compute on information without first obtaining (at least partial) ownership of that information. It also means that you cannot compute using machines without first obtaining control over those machines. This is very limiting to human collaboration and systematically drives the centralization of data, because you cannot work with a bunch of data...

Downloads: 5 This Week

Last Update: 2025-02-13
See Project
12

AWS SDK for pandas

Easy integration with Athena, Glue, Redshift, Timestream, Neptune

aws-sdk-pandas (formerly AWS Data Wrangler) bridges pandas with the AWS analytics stack so DataFrames flow seamlessly to and from cloud services. With a few lines of code, you can read from and write to Amazon S3 in Parquet/CSV/JSON/ORC, register tables in the AWS Glue Data Catalog, and query with Amazon Athena directly into pandas. The library abstracts efficient patterns like partitioning, compression, and vectorized I/O so you get performant data lake operations without hand-rolling boilerplate. ...

Downloads: 1 This Week

Last Update: 2026-04-08
See Project
13

RStudio

RStudio is an integrated development environment (IDE) for R

RStudio is a powerful, full-featured integrated development environment (IDE) tailored primarily for the R programming language but increasingly supportive of other languages like Python and Julia. It brings together console, editor, plotting, workspace, history, and file-management panes into a unified interface, helping data scientists, statisticians, and analysts to work more productively. The IDE is cross-platform: there are desktop versions for Windows, macOS and Linux, as well as a server version for remote or multi-user deployment via a web browser. ...

Downloads: 35 This Week

Last Update: 2026-03-25
See Project
14

Awesome Fraud Detection Research Papers

A curated list of data mining papers about fraud detection

A curated list of data mining papers about fraud detection from several conferences.

Downloads: 0 This Week

Last Update: 2026-01-05
See Project
15

DearPyGui

Graphical User Interface Toolkit for Python with minimal dependencies

...DPG offers a solid framework for developing scientific, engineering, gaming, data science and other applications that require fast and interactive interfaces. The Tutorials will provide a great overview and links to each topic in the API Reference for more detailed reading. Complete theme and style control. GPU-based rendering and efficient C/C++ code.

Downloads: 7 This Week

Last Update: 2026-02-04
See Project
16

tsfresh

Automatic extraction of relevant features from time series

...Further tsfresh is compatible with pythons pandas and scikit-learn APIs, two important packages for Data Science endeavours in python. The extracted features can be used to describe or cluster time series based on the extracted characteristics. Further, they can be used to build models that perform classification/regression tasks on the time series. Often the features give new insights into time series and their dynamics.

Downloads: 7 This Week

Last Update: 2025-08-30
See Project
17

ClearML

Streamline your ML workflow

ClearML is an open source platform that automates and simplifies developing and managing machine learning solutions for thousands of data science teams all over the world. It is designed as an end-to-end MLOps suite allowing you to focus on developing your ML code & automation, while ClearML ensures your work is reproducible and scalable. The ClearML Python Package for integrating ClearML into your existing scripts by adding just two lines of code, and optionally extending your experiments and other workflows with ClearML powerful and versatile set of classes and methods. ...

Downloads: 1 This Week

Last Update: 2026-03-24
See Project
18

NVIDIA Merlin

Library providing end-to-end GPU-accelerated recommender systems

NVIDIA Merlin is an open-source library that accelerates recommender systems on NVIDIA GPUs. The library enables data scientists, machine learning engineers, and researchers to build high-performing recommenders at scale. Merlin includes tools to address common feature engineering, training, and inference challenges. Each stage of the Merlin pipeline is optimized to support hundreds of terabytes of data, which is all accessible through easy-to-use APIs. For more information, see NVIDIA...

Downloads: 2 This Week

Last Update: 2024-06-14
See Project
19

SageMaker Training Toolkit

Train machine learning models within Docker containers

Train machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and...

Downloads: 7 This Week

Last Update: 2025-09-22
See Project
20

XGBoost

Scalable and Flexible Gradient Boosting

...XGBoost works by implementing machine learning algorithms under the Gradient Boosting framework. It also offers parallel tree boosting (GBDT, GBRT or GBM) that can quickly and accurately solve many data science problems. XGBoost can be used for Python, Java, Scala, R, C++ and more. It can run on a single machine, Hadoop, Spark, Dask, Flink and most other distributed environments, and is capable of solving problems beyond billions of examples.

Downloads: 10 This Week

Last Update: 2026-02-10
See Project
21

Recommenders

Best practices on recommendation systems

The Recommenders repository provides examples and best practices for building recommendation systems, provided as Jupyter notebooks. The module reco_utils contains functions to simplify common tasks used when developing and evaluating recommender systems. Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several...

Downloads: 0 This Week

Last Update: 2024-12-23
See Project
22

MCPower

MCPower — simple Monte Carlo power analysis for complex models

MCPower-GUI is a desktop application that provides a graphical interface for the MCPower Monte Carlo power analysis library. It guides users through the full workflow across three tabs: Model setup (formula input with live parsing, CSV data upload with auto-detected variable types, effect size sliders, and correlation editing), Analysis configuration (find power for a given sample size or find the minimum sample size for a target power, with multiple testing correction and scenario analysis), and Results (interactive charts, exportable tables, and auto-generated Python replication scripts). ...

1 Review

Downloads: 20 This Week

Last Update: 2026-03-01
See Project
23

Synapse Machine Learning

Simple and distributed Machine Learning

SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, The Cognitive Services, Vowpal Wabbit,...

Downloads: 0 This Week

Last Update: 2026-04-04
See Project
24

Catbird Linux

Linux for content creation, web scraping, coding, and data analysis.

Catbird Linux is a USB pluggable Live Linux operating system built for media creation, web scraping, and software coding. It is the daily driver you want for retrieving data, making videos or podcasts, and making software tools to automate the repetitive tasks. It is ready for work in Python, Lua, and Go languages, with numerous packages for web scraping or downloading data via API calls. Using Catbird Linux, it is possible to accomplish in depth stock market analysis, track weather trends, follow social media sentiment, or do other tasks in data science. ...

Downloads: 16 This Week

Last Update: 2025-08-29
See Project
25

sadsa

SADSA (Software Application for Data Science and Analytics)

SADSA (Software Application for Data Science and Analytics) is a Python-based desktop application designed to simplify statistical analysis, machine learning, and data visualization for students, researchers, and data professionals. Built using Python for the GUI, SADSA provides a menu-driven interface for handling datasets, applying transformations, running advanced statistical tests, machine learning algorithms, and generating insightful plots — all without writing code.

1 Review

Downloads: 1 This Week

Last Update: 2025-04-17
See Project

Previous
You're on page 1
2
Next

Related Searches

rstudio

vba support library excel

artificial intelligence projects in c++

lotto prediction algorithm

artificial intelligence personal assistant python

ide

fraud

graphical user interfaces

time series

nvidia

Related Categories

Business

Artificial Intelligence

Software Development

Scientific/Engineering

Education

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Privacy Choices Advertise