Showing 3451 open source projects for "data"

View related business solutions
  • Premier Construction Software Icon
    Premier Construction Software

    Premier is a global leader in financial construction ERP software.

    Rated #1 Construction Accounting Software by Forbes Advisor in 2022 & 2023. Our modern SAAS solution is designed to meet the needs of General Contractors, Developers/Owners, Homebuilders & Specialty Contractors.
    Learn More
  • MicroStation by Bentley Systems is the trusted computer-aided design (CAD) software built specifically for infrastructure design. Icon
    MicroStation by Bentley Systems is the trusted computer-aided design (CAD) software built specifically for infrastructure design.

    Microstation enables architects, engineers, and designers to create precise 2D and 3D drawings that bring complex projects to life.

    MicroStation is the only computer-aided design software for infrastructure design, helping architects and engineers like you bring their vision to life, present their designs to their clients, and deliver their projects to the community.
    Learn More
  • 1
    Data-Juicer

    Data-Juicer

    Data processing for and with foundation models

    Data-Juicer is an open-source data processing and augmentation framework designed to enhance the quality and diversity of datasets for machine learning tasks. It includes a modular pipeline for scalable data transformation.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    Orange Data Mining

    Orange Data Mining

    Orange: Interactive data analysis

    Open source machine learning and data visualization. Build data analysis workflows visually, with a large, diverse toolbox. Perform simple data analysis with clever data visualization. Explore statistical distributions, box plots and scatter plots, or dive deeper with decision trees, hierarchical clustering, heatmaps, MDS and linear projections. Even your multidimensional data can become sensible in 2D, especially with clever attribute ranking and selections. ...
    Downloads: 51 This Week
    Last Update:
    See Project
  • 3
    AWS Data Wrangler

    AWS Data Wrangler

    Pandas on AWS, easy integration with Athena, Glue, Redshift, etc.

    An AWS Professional Service open-source python initiative that extends the power of Pandas library to AWS connecting DataFrames and AWS data-related services. Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON, and EXCEL). Built on top of other open-source projects like Pandas, Apache Arrow and Boto3, it offers abstracted functions to execute usual ETL tasks like load/unload data from Data Lakes, Data Warehouses, and Databases. ...
    Downloads: 20 This Week
    Last Update:
    See Project
  • 4
    Synthetic Data Generator

    Synthetic Data Generator

    SDG is a specialized framework

    ...It also includes a data processing module capable of handling different data types, preprocessing columns, managing missing values, and converting formats automatically before model training.
    Downloads: 16 This Week
    Last Update:
    See Project
  • Turn traffic into pipeline and prospects into customers Icon
    Turn traffic into pipeline and prospects into customers

    For account executives and sales engineers looking for a solution to manage their insights and sales data

    Docket is an AI-powered sales enablement platform designed to unify go-to-market (GTM) data through its proprietary Sales Knowledge Lake™ and activate it with intelligent AI agents. The platform helps marketing teams increase pipeline generation by 15% by engaging website visitors in human-like conversations and qualifying leads. For sales teams, Docket improves seller efficiency by 33% by providing instant product knowledge, retrieving collateral, and creating personalized documents. Built for GTM teams, Docket integrates with over 100 tools across the revenue tech stack and offers enterprise-grade security with SOC 2 Type II, GDPR, and ISO 27001 compliance. Customers report improved win rates, shorter sales cycles, and dramatically reduced response times. Docket’s scalable, accurate, and fast AI agents deliver reliable answers with confidence scores, empowering teams to close deals faster.
    Learn More
  • 5
    Data Version Control

    Data Version Control

    Git-based data version control for machine learning workflows

    DVC (Data Version Control) is an open source tool designed to bring version control principles to machine learning and data science workflows. It enables developers and data scientists to track datasets, machine learning models, and experiment results in a way that integrates with existing Git repositories. Instead of storing large datasets directly in Git, DVC keeps lightweight metadata in the repository while storing the actual data in external storage systems. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 6
    Cookiecutter Data Science

    Cookiecutter Data Science

    Project structure for doing and sharing data science work

    A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. When we think about data analysis, we often think just about the resulting reports, insights, or visualizations. While these end products are generally the main event, it's easy to focus on making the products look nice and ignore the quality of the code that generates them. Because these end products are created programmatically, code quality is still important! ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 7
    Pythonic Data Structures and Algorithms

    Pythonic Data Structures and Algorithms

    Minimal examples of data structures and algorithms in Python

    The Pythonic Data Structures and Algorithms repository by keon is a hands-on collection of implementations of classical data structures and algorithms written in Python. It offers working, often well-commented code for many standard algorithmic problems — from sorting/searching to graph algorithms, dynamic programming, data structures, and more — making it a valuable resource for learning and reference.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    The Data Engineering Handbook

    The Data Engineering Handbook

    Links to everything you'd ever want to learn about data engineering

    The Data Engineering Handbook is a comprehensive, community-curated repository that aggregates essential learning resources for anyone interested in becoming a professional data engineer. Rather than being a code project itself, it’s a learning handbook that links to books, articles, tutorials, community groups, boot camps, and real-world project examples that collectively form a roadmap to mastering data engineering skills.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    Agentic Data Scientist

    Agentic Data Scientist

    An end-to-end Data Scientist

    Agentic Data Scientist is an experimental AI-driven research framework that orchestrates data science workflows through autonomous agents that can reason, plan, and execute complex analytics tasks. Unlike traditional scripted pipelines, this project lets AI agents break down high-level research goals into sub-tasks such as data acquisition, cleaning, modeling, evaluation, and reporting, with minimal human direction.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Simplify Purchasing For Your Business Icon
    Simplify Purchasing For Your Business

    Manage what you buy and how you buy it with Order.co, so you have control over your time and money spent.

    Simplify every aspect of buying for your business in Order.co. From sourcing products to scaling purchasing across locations to automating your AP and approvals workstreams, Order.co is the platform of choice for growing businesses.
    Learn More
  • 10
    Data Science Articles from CodeCut

    Data Science Articles from CodeCut

    Collection of useful data science topics along with articles

    The Data-science repository from CodeCutTech is a curated collection of educational content focused on practical tools and workflows used in modern data science projects. Instead of providing a single software package, the repository aggregates articles, tutorials, and examples covering many topics within the data science ecosystem. The materials address areas such as MLOps, data management, project organization, testing practices, visualization techniques, and productivity tools used by data scientists. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Data Science Interviews

    Data Science Interviews

    Data science interview questions and answers

    Data Science Interviews is an open-source repository that collects common data science interview questions along with community-provided answers and explanations. The project serves as a preparation resource for students, job seekers, and professionals who want to review the technical knowledge required for data science roles. The repository organizes questions into different categories including theoretical machine learning concepts, technical programming questions, and probability or statistics problems. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Book2_Beauty-of-Data-Visualization

    Book2_Beauty-of-Data-Visualization

    Machine Learning, Criticism and Correction

    Book2_Beauty-of-Data-Visualization is an open educational project that teaches the principles and techniques of effective data visualization using Python and modern plotting libraries. The repository focuses on both the technical and aesthetic aspects of visual analytics, helping learners understand how to communicate data clearly and persuasively. It includes practical examples that demonstrate how different chart types reveal patterns, trends, and distributions in real datasets. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Dash Data Agent

    Dash Data Agent

    Self-learning data agent that grounds its answers in layers of content

    Dash is a self-learning data agent built by the Agno AI community that generates grounded answers to English queries over structured data by synthesizing SQL and reasoning based on six layers of context, improving automatically with each run. It sidesteps common limitations of simple text-to-SQL agents by incorporating multiple context layers — including schema structure, human annotations, known query patterns, institutional knowledge from docs, machine-discovered error patterns, and live runtime context — to generate SQL queries that are both technically correct and semantically meaningful. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Synthetic Data Kit

    Synthetic Data Kit

    Tool for generating high quality Synthetic datasets

    Synthetic Data Kit is a CLI-centric toolkit for generating high-quality synthetic datasets to fine-tune Llama models, with an emphasis on producing reasoning traces and QA pairs that line up with modern instruction-tuning formats. It ships an opinionated, modular workflow that covers ingesting heterogeneous sources (documents, transcripts), prompting models to create labeled examples, and exporting to fine-tuning schemas with minimal glue code.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Synthetic Data Vault (SDV)

    Synthetic Data Vault (SDV)

    Synthetic Data Generation for tabular, relational and time series data

    The Synthetic Data Vault (SDV) is a Synthetic Data Generation ecosystem of libraries that allows users to easily learn single-table, multi-table and timeseries datasets to later on generate new Synthetic Data that has the same format and statistical properties as the original dataset. Synthetic data can then be used to supplement, augment and in some cases replace real data when training Machine Learning models.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    AI Data Science Team

    AI Data Science Team

    An AI-powered data science team of agents

    AI Data Science Team is a Python library and agent ecosystem designed to accelerate and automate common data science workflows by modeling them as specialized AI “agents” that can be orchestrated to perform tasks like data cleaning, transformation, analysis, visualization, and machine learning. It provides a modular agent framework where each agent focuses on a step in the typical data science pipeline — for example, loading data from CSV/Excel files, cleaning and wrangling messy datasets, engineering predictive features, building models with AutoML, connecting to SQL databases, and producing visual outputs — all driven by natural language or programmatic instructions. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    Yahoo! Finance market data downloader

    Yahoo! Finance market data downloader

    Yahoo! Finance market data downloader

    Ever since Yahoo! finance decommissioned their historical data API, many programs that relied on it to stop working. yfinance aims to solve this problem by offering a reliable, threaded, and Pythonic way to download historical market data from Yahoo! finance. yfinance aimed to offer a temporary fix to the problem by scraping the data from Yahoo! Finance and returning a the data in the same format as pandas_datareader's get_data_yahoo(), thus keeping the code changes in existing software to a minimum. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 18
    cracking-the-data-science-interview

    cracking-the-data-science-interview

    A Collection of Cheatsheets, Books, Questions, and Portfolio

    Cracking the Data Science Interview is an open educational repository that collects study materials, resources, and reference links for preparing for data science interviews. The project organizes content across many fundamental areas of data science, including statistics, probability, SQL, machine learning, and deep learning. It includes cheat sheets that summarize important technical concepts commonly discussed during technical interviews.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    The Grand Complete Data Science Guide

    The Grand Complete Data Science Guide

    Data Science Guide With Videos And Materials

    The Grand Complete Data Science Materials is a repository curated by a data-science educator that aggregates a wide range of learning resources — from basic programming and math foundation to advanced topics in machine learning, deep learning, natural language processing, computer vision, and deployment practices — into a structured, centralized collection aimed at learners seeking a comprehensive path to data science mastery.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    pandas

    pandas

    Fast, flexible and powerful Python data analysis toolkit

    pandas is a Python data analysis library that provides high-performance, user friendly data structures and data analysis tools for the Python programming language. It enables you to carry out entire data analysis workflows in Python without having to switch to a more domain specific language. With pandas, performance, productivity and collaboration in doing data analysis in Python can significantly increase.
    Downloads: 144 This Week
    Last Update:
    See Project
  • 21
    data-diff

    data-diff

    Efficiently diff rows across two different databases

    We're excited to announce the launch of a new open-source product, data-diff that makes comparing datasets across databases fast at any scale. data-diff automates data quality checks for data replication and migration. In modern data platforms, data is constantly moving between systems, and at the modern data volume and complexity, systems go out of sync all the time. Until now, there has not been any tooling to ensure that when the data is correctly copied. ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 22
    Blender GIS

    Blender GIS

    Blender addons to make the bridge between Blender and geographic data

    Import in Blender most commons GIS data format, Shapefile vector, raster image, geotiff DEM, OpenStreetMap XML. There are a lot of possibilities to create a 3D terrain from geographic data with BlenderGIS, check the Flowchart to have an overview. Display dynamics web maps inside Blender 3d view, requests for OpenStreetMap data (buildings, roads, etc.), get true elevation data from the NASA SRTM mission.
    Downloads: 117 This Week
    Last Update:
    See Project
  • 23
    Anna’s Archive

    Anna’s Archive

    Comprehensive search engine for books, papers, comics, magazines

    ...It relies heavily on technologies such as Elasticsearch for search functionality and MariaDB for structured data storage, enabling fast and efficient querying across massive datasets. The system is designed with redundancy and replication in mind, allowing distributed deployments and mirrored environments to handle high traffic and large data volumes. It also includes tooling for importing datasets, managing metadata, and maintaining structured archives using custom formats.
    Downloads: 139 This Week
    Last Update:
    See Project
  • 24
    CKAN

    CKAN

    CKAN is an open-source DMS for powering data hubs

    ...Around the globe, government organizations trust CKAN as their data management system of choice. CKAN is a complete out-of-the-box software solution that makes data accessible and usable – by providing tools to streamline publishing, sharing, finding and using data (including storage of data and provision of robust data APIs). CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 25
    Dagster

    Dagster

    An orchestration platform for the development, production

    ...Dagster as a unified control plane: The ‘single plane of glass’ data teams love to use. Rein in the chaos and maintain control over your data as the complexity scales. Centralize your metadata in one tool with built-in observability, diagnostics, cataloging, and lineage. Spot any issues and identify performance improvement opportunities.
    Downloads: 20 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB