Search Results for "data"

Showing 289 open source projects for "data"

Filters: Machine Learning, Python

1

Data Science Articles from CodeCut

Collection of useful data science topics along with articles

The Data-science repository from CodeCutTech is a curated collection of educational content focused on practical tools and workflows used in modern data science projects. Instead of providing a single software package, the repository aggregates articles, tutorials, and examples covering many topics within the data science ecosystem. The materials address areas such as MLOps, data management, project organization, testing practices, visualization techniques, and productivity tools used by data scientists. ...

Downloads: 0 This Week

Last Update: 2026-03-11
2

Data Science Interviews

Data science interview questions and answers

Data Science Interviews is an open-source repository that collects common data science interview questions along with community-provided answers and explanations. The project serves as a preparation resource for students, job seekers, and professionals who want to review the technical knowledge required for data science roles. The repository organizes questions into different categories including theoretical machine learning concepts, technical programming questions, and probability or statistics problems. ...

Downloads: 0 This Week

Last Update: 2026-03-10
3

cracking-the-data-science-interview

A Collection of Cheatsheets, Books, Questions, and Portfolio

Cracking the Data Science Interview is an open educational repository that collects study materials, resources, and reference links for preparing for data science interviews. The project organizes content across many fundamental areas of data science, including statistics, probability, SQL, machine learning, and deep learning. It includes cheat sheets that summarize important technical concepts commonly discussed during technical interviews.

Downloads: 3 This Week

Last Update: 2026-03-11
4

Label Studio

Label Studio is a multi-type data labeling and annotation tool

...Support for multiple data types including images, audio, text, HTML, time-series, and video.

Downloads: 33 This Week

Last Update: 2026-03-13
5

scikit-learn

Machine learning in Python

scikit-learn is an open source Python module for machine learning built on NumPy, SciPy and matplotlib. It offers simple and efficient tools for predictive data analysis and is reusable in various contexts.

Downloads: 18 This Week

Last Update: 2025-12-10
6

Diffgram

Training data (data labeling, annotation, workflow) for all data types

Diffgram covers the training data workflow from ingesting data to exploring it, annotating it, and managing workflows. It is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is the world’s first truly open source training data platform that focuses on giving its users an unlimited experience. It aims to reduce your data labeling bills and increase your training data quality.

Downloads: 8 This Week

Last Update: 2024-10-14
7

Arize Phoenix

Uncover insights, surface problems, monitor, and fine-tune your LLM

Phoenix provides ML insights at lightning speed with zero-config observability for model drift, performance, and data quality. Phoenix is an open source ML observability library designed for the notebook. The toolset is designed to ingest model inference data for LLMs, CV, NLP, and tabular datasets. It allows data scientists to quickly visualize their model data, monitor performance, track down issues and insights, and easily export data to improve their models. Deep learning models (CV, LLM, and generative) are an amazing technology that will power many future ML use cases. ...

Downloads: 10 This Week

Last Update: 1 day ago
8

NannyML

Detecting silent model failure. NannyML estimates performance

NannyML is an open-source Python library that allows you to estimate post-deployment model performance (without access to targets), detect data drift, and intelligently link data drift alerts back to changes in model performance. Built for data scientists, NannyML has an easy-to-use interface and interactive visualizations, is completely model-agnostic, and currently supports all tabular classification use cases. NannyML closes the loop with performance monitoring and post-deployment data science, empowering data scientists to quickly understand and automatically detect silent model failure. ...

Downloads: 5 This Week

Last Update: 2025-07-12
9

FiftyOne

The open-source tool for building high-quality datasets

...FiftyOne provides the building blocks for optimizing your dataset analysis pipeline. Use it to get hands-on with your data, including visualizing complex labels, evaluating your models, exploring scenarios of interest, identifying failure modes, finding annotation mistakes, and much more! Surveys show that machine learning engineers spend over half of their time wrangling data, but it doesn't have to be that way.

Downloads: 6 This Week

Last Update: 2026-04-06
10

Apache Hamilton

Helps data scientists define testable self-documenting dataflows

...This approach encourages modular, testable, and maintainable data pipelines because each transformation is isolated and easily unit tested. The framework also automatically tracks lineage and metadata about how data is produced, which improves debugging, reproducibility, and transparency in data workflows.

Downloads: 6 This Week

Last Update: 2026-03-12
11

PySyft

Data science on data without acquiring a copy

...Wherever your data wants to live in your ownership, the Syft ecosystem exists to help keep it there while allowing it to be used privately.

Downloads: 5 This Week

Last Update: 2025-02-13
12

Bytewax

Python Stream Processing

...You can use Bytewax for a variety of workloads, from moving data à la Kafka Connect all the way to advanced online machine learning workloads. Bytewax is not limited to streaming applications but excels anywhere that data can be distributed at the input and output.

Downloads: 7 This Week

Last Update: 2024-11-25
13

OpenBB

Investment Research for Everyone, Everywhere

...Create charts directly from raw data in seconds. Customize your dashboards to build your dream terminal, integrate with your private datasets, and bring your own fine-tuned AI copilots.

Downloads: 8 This Week

Last Update: 2026-03-09
14

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything

X-AnyLabeling is an open-source data annotation platform designed to streamline the process of labeling datasets for computer vision and multimodal AI applications. The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks.

Downloads: 45 This Week

Last Update: 2026-03-26
15

MindsDB

Making Enterprise Data Intelligent and Responsive for AI

MindsDB is an AI data solution that enables humans, AI, agents, and applications to query data in natural language and SQL, and get highly accurate answers across disparate data sources and types. MindsDB connects to diverse data sources and applications, and unifies petabyte-scale structured and unstructured data. Powered by an industry-first cognitive engine that can operate anywhere (on-prem, VPC, serverless), it empowers both humans and AI with highly informed decision-making capabilities. ...

Downloads: 6 This Week

Last Update: 2026-03-03
16

AutoGluon

AutoGluon: AutoML for Image, Text, and Tabular Data

AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code. Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge. Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing. ...

Downloads: 4 This Week

Last Update: 2025-12-19
17

marimo

A reactive notebook for Python

...Run one cell and marimo reacts by automatically running affected cells, eliminating the error-prone chore of managing the notebook state. marimo's reactive UI elements, like data frame GUIs and plots, make working with data feel refreshingly fast, futuristic, and intuitive. Version with git, run as Python scripts, import symbols from a notebook into other notebooks or Python files, and lint or format with your favorite tools. You'll always be able to reproduce your collaborators' results. Notebooks are executed in a deterministic order, with no hidden state: delete a cell and marimo deletes its variables while updating affected cells.

Downloads: 4 This Week

Last Update: 6 days ago
18

Evidently

Evaluate and monitor ML models from validation to production

Evidently is an open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor ML models from validation to production. It works with tabular data, text data, and embeddings.

Downloads: 13 This Week

Last Update: 2026-03-10
19

Deepchecks

Test Suites for validating ML models & data

Deepchecks is the leading tool for testing and validating your machine learning models and data, and it enables doing so with minimal effort. Deepchecks accompanies you through various validation and testing needs such as verifying your data’s integrity, inspecting its distributions, validating data splits, evaluating your model, and comparing different models. While you’re in the research phase, and want to validate your data, find potential methodological problems, and/or validate your model and evaluate it. ...

Downloads: 5 This Week

Last Update: 2024-12-15
20

SageMaker Training Toolkit

Train machine learning models within Docker containers

Train machine learning models within a Docker container using Amazon SageMaker. Amazon SageMaker is a fully managed service for data science and machine learning (ML) workflows. You can use Amazon SageMaker to simplify the process of building, training, and deploying ML models. To train a model, you can include your training script and dependencies in a Docker container that runs your training code. A container provides an effectively isolated environment, ensuring a consistent runtime and reliable training process. ...

Downloads: 7 This Week

Last Update: 2025-09-22
21

mosaicml composer

Supercharge Your Model Training

composer is a deep learning training framework built on PyTorch and designed to make large-scale model training more efficient, scalable, and customizable. At the center of the project is a highly optimized Trainer abstraction that simplifies the management of training loops, parallelization, metrics, logging, and data loading. The framework is intended for modern workloads that may span anything from a single GPU to very large distributed training environments, which makes it suitable for both experimentation and production-scale development. It includes built-in support for distributed training strategies such as Fully Sharded Data Parallelism and standard Distributed Data Parallel execution, helping teams scale models without having to assemble as much infrastructure by hand.

Downloads: 6 This Week

Last Update: 2026-03-10
22

tslearn

The machine learning toolkit for time series analysis in Python

...The three dimensions correspond to the number of time series, the number of measurements per time series and the number of dimensions respectively (n_ts, max_sz, d). In order to get the data in the right format.
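The (n_ts, max_sz, d) layout described above can be illustrated without the library itself. The following is a minimal sketch in plain Python, assuming that shorter series are padded with NaN up to the length of the longest one. The helper names `to_padded_dataset` and `shape` are hypothetical, chosen for this illustration, and are not part of tslearn's API.

```python
import math


def to_padded_dataset(series_list):
    """Pad variable-length series to a common length with NaN.

    Hypothetical helper for illustration only (not a tslearn function).
    Each series is a list of observations, and each observation is a
    list of d floats. Returns a nested list of shape (n_ts, max_sz, d).
    """
    max_sz = max(len(ts) for ts in series_list)  # longest series length
    d = len(series_list[0][0])                   # values per observation
    padded = []
    for ts in series_list:
        if any(len(obs) != d for obs in ts):
            raise ValueError("every observation must have d values")
        # Fill the missing tail of shorter series with NaN observations.
        pad = [[math.nan] * d for _ in range(max_sz - len(ts))]
        padded.append([list(obs) for obs in ts] + pad)
    return padded


def shape(dataset):
    """Return (n_ts, max_sz, d) for a padded nested-list dataset."""
    return (len(dataset), len(dataset[0]), len(dataset[0][0]))


# Two univariate series (d = 1) of different lengths.
raw = [
    [[1.0], [2.0], [3.0]],
    [[4.0], [5.0]],
]
dataset = to_padded_dataset(raw)
print(shape(dataset))  # (2, 3, 1)
```

Here n_ts is 2 (two series), max_sz is 3 (the longest series has three measurements), and d is 1 (one value per measurement). The second series gains one NaN observation so that every series has the same length.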

Downloads: 7 This Week

Last Update: 2026-03-13
23

Flyte

Build production-grade data and ML workflows, hassle-free

The infinitely scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks. Don’t let friction between development and production slow down the deployment of new data/ML workflows and cause an increase in production bugs. Flyte enables rapid experimentation with production-grade software.

Downloads: 2 This Week

Last Update: 2026-04-03
24

Pandas Profiling

Create HTML profiling reports from pandas DataFrame objects

pandas-profiling generates profile reports from a pandas DataFrame. The pandas df.describe() function is handy yet a little basic for exploratory data analysis. pandas-profiling extends pandas DataFrame with df.profile_report(), which automatically generates a standardized univariate and multivariate report for data understanding. High correlation warnings, based on different correlation metrics (Spearman, Pearson, Kendall, Cramér’s V, Phik). Most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic). ...

Downloads: 1 This Week

Last Update: 2026-01-13
25

RAGFlow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine

...It offers a streamlined RAG workflow for businesses of any scale, combining LLMs (large language models) to provide truthful question-answering capabilities, backed by well-founded citations from data in various complex formats.

Downloads: 5 This Week

Last Update: 2026-02-10