Showing 640 open source projects for "data"

View related business solutions
  • Ango Hub | All-in-one data labeling platform Icon
    Ango Hub | All-in-one data labeling platform

    For AI teams and Computer Vision team in organizations of all size

    AI-Assisted features of the Ango Hub will automate your AI data workflows to improve data labeling efficiency and model RLHF, all while allowing domain experts to focus on providing high-quality data.
    Learn More
  • Enterprise-Class Managed File Transfer. Icon
    Enterprise-Class Managed File Transfer.

    For organizations that need to automate secure file transfers to protect sensitive data.

    Diplomat MFT by Coviant Software is a secure, reliable managed file transfer solution designed to simplify and automate SFTP, FTPS, and HTTPS file transfers. Built for seamless integration, Diplomat MFT works across major cloud storage platforms, including AWS S3, Azure Blob, Google Cloud, Oracle Cloud, SharePoint, Dropbox, Box, and more.
    Learn More
  • 1
    Pythonic Data Structures and Algorithms

    Pythonic Data Structures and Algorithms

    Minimal examples of data structures and algorithms in Python

    The Pythonic Data Structures and Algorithms repository by keon is a hands-on collection of implementations of classical data structures and algorithms written in Python. It offers working, often well-commented code for many standard algorithmic problems — from sorting/searching to graph algorithms, dynamic programming, data structures, and more — making it a valuable resource for learning and reference.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    The Data Engineering Handbook

    The Data Engineering Handbook

    Links to everything you'd ever want to learn about data engineering

    The Data Engineering Handbook is a comprehensive, community-curated repository that aggregates essential learning resources for anyone interested in becoming a professional data engineer. Rather than being a code project itself, it’s a learning handbook that links to books, articles, tutorials, community groups, boot camps, and real-world project examples that collectively form a roadmap to mastering data engineering skills.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    Synthetic Data Kit

    Synthetic Data Kit

    Tool for generating high quality Synthetic datasets

    Synthetic Data Kit is a CLI-centric toolkit for generating high-quality synthetic datasets to fine-tune Llama models, with an emphasis on producing reasoning traces and QA pairs that line up with modern instruction-tuning formats. It ships an opinionated, modular workflow that covers ingesting heterogeneous sources (documents, transcripts), prompting models to create labeled examples, and exporting to fine-tuning schemas with minimal glue code.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Synthetic Data Vault (SDV)

    Synthetic Data Vault (SDV)

    Synthetic Data Generation for tabular, relational and time series data

    The Synthetic Data Vault (SDV) is a Synthetic Data Generation ecosystem of libraries that allows users to easily learn single-table, multi-table and timeseries datasets to later on generate new Synthetic Data that has the same format and statistical properties as the original dataset. Synthetic data can then be used to supplement, augment and in some cases replace real data when training Machine Learning models.
    Downloads: 5 This Week
    Last Update:
    See Project
  • Professional Streaming and Video Hosting - GDPR Compliant - 3Q Icon
    Professional Streaming and Video Hosting - GDPR Compliant - 3Q

    Secure hosting, scalable streaming, and easy integration for internal and external communications

    3Q offers a multifunctional video platform for hosting, managing and distributing video and audio content on all channels. Live and on-demand.
    Learn More
  • 5
    Dagster

    Dagster

    An orchestration platform for the development, production

    ...Dagster as a unified control plane: The ‘single plane of glass’ data teams love to use. Rein in the chaos and maintain control over your data as the complexity scales. Centralize your metadata in one tool with built-in observability, diagnostics, cataloging, and lineage. Spot any issues and identify performance improvement opportunities.
    Downloads: 20 This Week
    Last Update:
    See Project
  • 6
    Gretel Synthetics

    Gretel Synthetics

    Synthetic data generators for structured and unstructured text

    Unlock unlimited possibilities with synthetic data. Share, create, and augment data with cutting-edge generative AI. Generate unlimited data in minutes with synthetic data delivered as-a-service. Synthesize data that are as good or better than your original dataset, and maintain relationships and statistical insights. Customize privacy settings so that data is always safe while remaining useful for downstream workflows.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 7
    Positron

    Positron

    Positron, a next-generation data science IDE

    Positron is a next-generation integrated development environment (IDE) created by Posit PBC (formerly RStudio Inc) specifically tailored for data science workflows in Python, R, and multi-language ecosystems. It aims to unify exploratory data analysis, production code, and data-app authoring in a single environment so that data scientists move from “question → insight → application” without switching tools. Built on the open-source Code-OSS foundation, Positron provides a familiar coding experience along with specialized panes and tooling for variable inspection, data-frame viewing, plotting previews, and interactive consoles designed for analytical work. ...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 8
    CTGAN

    CTGAN

    Conditional GAN for generating synthetic tabular data

    CTGAN is a collection of Deep Learning based synthetic data generators for single table data, which are able to learn from real data and generate synthetic data with high fidelity. If you're just getting started with synthetic data, we recommend installing the SDV library which provides user-friendly APIs for accessing CTGAN. The SDV library provides wrappers for preprocessing your data as well as additional usability features like constraints. ...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 9
    Cython

    Cython

    The most widely used Python to C compiler

    ...Easily tune readable Python code into plain C performance by adding static type declarations, also in Python syntax. Use combined source code level debugging to find bugs in your Python, Cython, and C code. Interact efficiently with large data sets, e.g. using multi-dimensional NumPy arrays. Quickly build your applications within the large, mature, and widely used CPython ecosystem. Integrate natively with existing code and data from legacy, low-level or high-performance libraries and applications. The Cython language is a superset of the Python language that additionally supports calling C functions and declaring C types on variables and class attributes.
    Downloads: 141 This Week
    Last Update:
    See Project
  • Information Security Made Simple and Affordable | Carbide Icon
    Information Security Made Simple and Affordable | Carbide

    For companies requiring a solution to scale their business without incurring security debt

    Get expert guidance and smart tools to launch or level up your security and compliance efforts without the complexity.
    Learn More
  • 10
    Kedro

    Kedro

    A Python framework for creating reproducible, maintainable code

    Kedro is an open sourced Python framework for creating maintainable and modular data science code. Provides the scaffolding to build more complex data and machine-learning pipelines. In addition, there's a focus on spending less time on the tedious "plumbing" required to maintain data science code; this means that you have more time to solve new problems. Standardises team workflows; the modular structure of Kedro facilitates a higher level of collaboration when teams solve problems together. ...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 11
    SDGym

    SDGym

    Benchmarking synthetic data generation methods

    The Synthetic Data Gym (SDGym) is a benchmarking framework for modeling and generating synthetic data. Measure performance and memory usage across different synthetic data modeling techniques – classical statistics, deep learning and more! The SDGym library integrates with the Synthetic Data Vault ecosystem. You can use any of its synthesizers, datasets or metrics for benchmarking.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 12
    Union Pandera

    Union Pandera

    Light-weight, flexible, expressive statistical data testing library

    The open-source framework for precision data testing for data scientists and ML engineers. Pandera provides a simple, flexible, and extensible data-testing framework for validating not only your data but also the functions that produce them. A simple, zero-configuration data testing framework for data scientists and ML engineers seeking correctness. Access a comprehensive suite of built-in tests, or easily create your own validation rules for your specific use cases. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    Copulas

    Copulas

    A library to model multivariate data using copulas

    Copulas is a Python library for modeling multivariate distributions and sampling from them using copula functions. Given a table of numerical data, use Copulas to learn the distribution and generate new synthetic data following the same statistical properties. Choose from a variety of univariate distributions and copulas – including Archimedian Copulas, Gaussian Copulas and Vine Copulas. Compare real and synthetic data visually after building your model. Visualizations are available as 1D histograms, 2D scatterplots and 3D scatterplots. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 14
    spyder

    spyder

    The scientific Python development environment

    Spyder is a free and open source scientific environment written in Python, for Python, and designed by and for scientists, engineers and data analysts. It features a unique combination of the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and beautiful visualization capabilities of a scientific package. Spyder’s multi-language Editor integrates a number of powerful tools right out of the box for an easy to use, efficient editing experience. ...
    Downloads: 207 This Week
    Last Update:
    See Project
  • 15
    Matplotlib

    Matplotlib

    matplotlib: plotting with Python

    Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python. Matplotlib makes easy things easy and hard things possible. Matplotlib ships with several add-on toolkits, including 3D plotting with mplot3d, axes helpers in axes_grid1 and axis helpers in axisartist. A large number of third party packages extend and build on Matplotlib functionality, including several higher-level plotting interfaces (seaborn, HoloViews, ggplot, ...), and a...
    Downloads: 29 This Week
    Last Update:
    See Project
  • 16
    Streamlit

    Streamlit

    The fastest way to build data apps in Python

    A faster way to build and share data apps. Streamlit turns data scripts into shareable web apps in minutes. All in pure Python. No front‑end experience is required. Build an app in a few lines of code with our magically simple API. Then see it automatically update as you iteratively save the source file. Adding a widget is the same as declaring a variable. No need to write a backend, define routes, handle HTTP requests, connect a frontend, write HTML, CSS, JavaScript, etc. ...
    Downloads: 46 This Week
    Last Update:
    See Project
  • 17
    Superduper

    Superduper

    Superduper: Integrate AI models and machine learning workflows

    ...This allows developers to completely avoid implementing MLOps, ETL pipelines, model deployment, data migration, and synchronization. Using Superduper is simply "CAPE": Connect to your data, apply arbitrary AI to that data, package and reuse the application on arbitrary data, and execute AI-database queries and predictions on the resulting AI outputs and data.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 18
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    ...It can be used for data mining, monitoring and automated testing.
    Downloads: 28 This Week
    Last Update:
    See Project
  • 19
    AIOHTTP

    AIOHTTP

    Asynchronous HTTP client/server framework for asyncio and Python

    ...Now it is possible by registering special signal handlers on every request processing stage. The main change is dropping yield from support and using async/await everywhere. Farewell, Python 3.4. You often want to send some sort of data in the URL’s query string. If you were constructing the URL by hand, this data would be given as key/value pairs in the URL after a question mark, e.g. httpbin.org/get?key=val. Requests allows you to provide these arguments as a dict, using the params keyword argument. aiohttp internally performs URL canonicalization before sending request.
    Downloads: 183 This Week
    Last Update:
    See Project
  • 20
    lxml

    lxml

    The lxml XML toolkit for Python

    A Python library for efficient XML and HTML processing, known for speed and compatibility. The lxml XML toolkit is a Pythonic binding for the C libraries libxml2 and libxslt. It is unique in that it combines the speed and XML feature completeness of these libraries with the simplicity of a native Python API, mostly compatible but superior to the well-known ElementTree API. The latest release works with all CPython versions from 3.6 to 3.12. See the introduction for more information about the...
    Downloads: 35 This Week
    Last Update:
    See Project
  • 21
    django-import-export

    django-import-export

    Django application and library for importing and exporting data

    ...Also, the report_skipped option controls whether skipped records appear in the import Result object, and if using the admin whether skipped records will show in the import preview page. Not all data can be easily extracted from an object/model attribute. In order to turn complicated data model into a (generally simpler) processed data structure on export, dehydrate_<fieldname> method should be defined.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 22
    Pydantic-Core

    Pydantic-Core

    Core validation logic for pydantic written in rust

    pydantic-core is the Rust-based core validation logic for Pydantic, a widely used data validation library in Python. It offers significant performance improvements over its predecessor, enabling faster and more efficient data parsing and validation.​
    Downloads: 10 This Week
    Last Update:
    See Project
  • 23
    PyInstaller

    PyInstaller

    Freeze (package) Python programs into stand-alone executables

    ...You’ll never be required to look for tricks in wikis and apply custom modification to your files or your setup scripts. As an example, libraries like PyQt, Django or matplotlib are fully supported, without having to handle plugins or external data files manually.
    Downloads: 155 This Week
    Last Update:
    See Project
  • 24
    Deepchecks

    Deepchecks

    Test Suites for validating ML models & data

    Deepchecks is the leading tool for testing and for validating your machine learning models and data, and it enables doing so with minimal effort. Deepchecks accompany you through various validation and testing needs such as verifying your data’s integrity, inspecting its distributions, validating data splits, evaluating your model and comparing between different models. While you’re in the research phase, and want to validate your data, find potential methodological problems, and/or validate your model and evaluate it. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 25
    pydantic

    pydantic

    Data parsing and validation using Python type hints

    Data validation and settings management using Python type hinting. Fast and extensible, pydantic plays nicely with your linters/IDE/brain. Define how data should be in pure, canonical Python 3.6+; validate it with pydantic. id is of type int; the annotation-only declaration tells pydantic that this field is required. Strings, bytes or floats will be coerced to ints if possible; otherwise an exception will be raised. name is inferred as a string from the provided default; because it has a default, it is not required. signup_ts is a datetime field which is not required (and takes the value None if it's not supplied). pydantic will process either a unix timestamp int (e.g. 1496498400) or a string representing the date & time. friends uses python's typing system, and requires a list of integers. ...
    Downloads: 7 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next
MongoDB Logo MongoDB