Data Quality Tools for Linux

View 20 business solutions
  • The Most Powerful Software Platform for EHSQ and ESG Management Icon
    The Most Powerful Software Platform for EHSQ and ESG Management

    Addresses the needs of small businesses and large global organizations with thousands of users in multiple locations.

    Choose from a complete set of software solutions across EHSQ that address all aspects of top performing Environmental, Health and Safety, and Quality management programs.
    Learn More
  • AestheticsPro Medical Spa Software Icon
    AestheticsPro Medical Spa Software

    Our new software release will dramatically improve your medspa business performance while enhancing the customer experience

    AestheticsPro is the most complete Aesthetics Software on the market today. HIPAA Cloud Compliant with electronic charting, integrated POS, targeted marketing and results driven reporting; AestheticsPro delivers the tools you need to manage your medical spa business. It is our mission To Provide an All-in-One Cutting Edge Software to the Aesthetics Industry.
    Learn More
  • 1
    Feathr

    Feathr

    A scalable, unified data and AI engineering platform for enterprise

    Feathr is a data and AI engineering platform that is widely used in production at LinkedIn for many years and was open sourced in 2022. It is currently a project under LF AI & Data Foundation. Define data and feature transformations based on raw data sources (batch and streaming) using Pythonic APIs. Register transformations by names and get transformed data(features) for various use cases including AI modeling, compliance, go-to-market and more. Share transformations and data(features) across team and company. Feathr is particularly useful in AI modeling where it automatically computes your feature transformations and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    FiftyOne

    FiftyOne

    The open-source tool for building high-quality datasets

    The open-source tool for building high-quality datasets and computer vision models. Nothing hinders the success of machine learning systems more than poor-quality data. And without the right tools, improving a model can be time-consuming and inefficient. FiftyOne supercharges your machine learning workflows by enabling you to visualize datasets and interpret models faster and more effectively. Improving data quality and understanding your model’s failure modes are the most impactful ways to boost the performance of your model. FiftyOne provides the building blocks for optimizing your dataset analysis pipeline. Use it to get hands-on with your data, including visualizing complex labels, evaluating your models, exploring scenarios of interest, identifying failure modes, finding annotation mistakes, and much more! Surveys show that machine learning engineers spend over half of their time wrangling data, but it doesn't have to be that way.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    A simple little engine to do fuzzy name & address searching. Helps improve data quality and avoids duplicate data entry.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Desktop application for managing spatial metadata.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Empower Your Workforce and Digitize Your Shop Floor Icon
    Empower Your Workforce and Digitize Your Shop Floor

    Benefits to Manufacturers

    Easily connect to most tools and equipment on the shop floor, enabling efficient data collection and boosting productivity with vital insights. Turn information into action to generate new ideas and better processes.
    Learn More
  • 5

    MOIRAI

    Simple Scientific Workflow System for CAGE Analysis

    Cap analysis of gene expression (CAGE) is a sequencing based technology to capture the 5’ ends of RNAs in a biological sample. After mapping, a CAGE peak on the genome indicates the position of an active transcriptional start site (TSS) and the number of reads correspond to its expression level. CAGE is prominently used in both the FANTOM and ENCODE project. MOIRAI is a compact yet flexible workflow system designed to carry out the main steps in data processing and analysis of CAGE data. MOIRAI has a graphical interface allowing wet-lab researchers to create, modify and run analysis workflows. Embedded within the workflows are graphical quality control indicators allowing users assess data quality and to quickly spot potential problems. MOIRAI package comes with three main workflows allowing users to map, annotate and perform an expression analysis over multiple samples.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Medrechaincode

    Medrechaincode

    Lifetime medical records in decentralized chain code-network

    Medrechaincode uses a blockchain technology to store patient life time medical records in decentralized node with single true version of patient medical records” Medrechaincode provides a consensus based access of patient medical record to different healthcare professionals like R&D labs, doctors, hospital & pharmacist. This platform will also help the healthcare professional to use integrated data quality tool to remove duplicity of patient records , profiling of medical records, metadata discovery , data cleansing , classification of medical records, bucketization, anomaly discovery to find out the anomalous trend of medical records. m This platform will help patient to reduced their consultation time, monetize their medical records, for labs to get the historical records & come up with new medicine and make a clinical trial.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    MentDB Projects

    MentDB Projects

    Generalized Interoperability and Strong AI

    MentDB is an open-source platform driving research into next-generation AI and universal data exchange. Our architecture is built around the revolutionary Mentalese Query Language (MQL). MentDB Weak (Generalized Interoperability): A unified data layer enabling seamless data exchange and application integration (SOA, ETL, Data Quality). We eliminate data silos through a single, generalized data language. MentDB Strong (Strong AI / AGI): The framework for exploring and building Machine Consciousness, free will, and advanced ethical reasoning systems. Based on new-generation AI algorithms.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    NBi

    NBi

    NBi is a testing framework (add-on to NUnit)

    NBi is a testing framework (add-on to NUnit) for Business Intelligence. It supports most of the relational databases (SQL server, MySQL, postgreSQL ...) and OLAP platforms (Analysis Services, Mondrian ...) but also ETL and reporting components (Microsoft technologies). The main goal of this framework is to let users create tests with a declarative approach based on an Xml syntax. By the means of NBi, you don't need to develop C# code to specify your tests! Either, you don't need Visual Studio to compile your test suite. Just create an Xml file and let the framework interpret it and play your tests. The framework is designed as an add-on of NUnit but with the possibility to port it easily to other testing frameworks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    ODD Platform

    ODD Platform

    First open-source data discovery and observability platform

    Unlock the power of big data with OpenDataDiscovery Platform. Experience seamless end-to-end insights, powered by unprecedented observability and trust - from ingestion to production - while building your ideal tech stack! Democratize data and accelerate insights. Find data that fits your use case and discover hints left by your peers to leverage existing knowledge. Explore tags, ownership details, links to other sources and other information to shorten and simplify data discovery phase. Forget unnerved stakeholders and wasting too much time on digging the root cause of data issues when it fails. With ODD’s automatic company-wide ingestion-to-product lineage you’ll have answers in just seconds and stakeholders won’t need to wait. Sleep well, knowing all your data is in check. Forget manual testing, days of debugging, and weeks of worrying. Know the impact of each code change with automatic testing. Enjoy lineage and alerts powered with data quality information.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Powering the next decade of business messaging | Twilio MessagingX Icon
    Powering the next decade of business messaging | Twilio MessagingX

    For organizations interested programmable APIs built on a scalable business messaging platform

    Build unique experiences across SMS, MMS, Facebook Messenger, and WhatsApp – with our unified messaging APIs.
    Learn More
  • 10
    Open Source Data Quality and Profiling

    Open Source Data Quality and Profiling

    World's first open source data quality & data preparation project

    This project is dedicated to open source data quality and data preparation solutions. Data Quality includes profiling, filtering, governance, similarity check, data enrichment alteration, real time alerting, basket analysis, bubble chart Warehouse validation, single customer view etc. defined by Strategy. This tool is developing high performance integrated data management platform which will seamlessly do Data Integration, Data Profiling, Data Quality, Data Preparation, Dummy Data Creation, Meta Data Discovery, Anomaly Discovery, Data Cleansing, Reporting and Analytic. It also had Hadoop ( Big data ) support to move files to/from Hadoop Grid, Create, Load and Profile Hive Tables. This project is also known as "Aggregate Profiler" Resful API for this project is getting built as (Beta Version) https://sourceforge.net/projects/restful-api-for-osdq/ apache spark based data quality is getting built at https://sourceforge.net/projects/apache-spark-osdq/
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    PrepMS is a simple-to-use graphical application for MS data preprocessing, peak detection, and visual data quality assessment. PrepMS is a compiled stand-alone application. Project homepage: http://stat.tamu.edu/~yuliya/prepMS.html
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Qualitis

    Qualitis

    Qualitis is a one-stop data quality management platform

    Qualitis is a data quality management platform that supports quality verification, notification, and management for various datasource. It is used to solve various data quality problems caused by data processing. Based on Spring Boot, Qualitis submits quality model task to Linkis platform. It provides functions such as data quality model construction, data quality model execution, data quality verification, reports of data quality generation and so on. At the same time, Qualitis provides enterprise-level features of financial-level resource isolation, management and access control. It is also guaranteed working well under high-concurrency, high-performance and high-availability scenarios.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Restful APIs for Data Cleansing

    Restful APIs for Data Cleansing

    This is sister project for osDQ which provide Restful APIs

    (Beta Version) This is sister project for https://sourceforge.net/projects/dataquality/ . It provides Restful APIs for features for data quality and data preparation features. This project will help projects which want embed data quality and data preparation features in their project or UI using restful calls. Data Cleansing APIs Dockerfile: # Pull base image FROM frnde/jetty-9.4.2-jre8-alpine-cet ADD osdq-v0.0.1.war /var/lib/jetty/webapps/osdq.war EXPOSE 8080 Docker Image https://hub.docker.com/r/vreddym/osdq-web/tags
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    SQLBucket

    SQLBucket

    Lightweight library to write, orchestrate and test your SQL ETL

    SQLBucket is a lightweight framework to help write, orchestrate and validate SQL data pipelines. It gives the possibility to set variables and introduces some control flow using the fantastic Jinja2 library. It also implements a very simplistic unit and integration test framework where you can validate the results of your ETL in the form of SQL checks. With SQLBucket, you can apply TDD principles when writing data pipelines. To start working, you need to instantiate your SQLBucket core object with the project_folder parameter. That folder will contain all your SQL ETL. The python file where you create your SQLBucket object is also a good place to instantiate your command line interface.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Toolsverse ETL Framework

    Toolsverse ETL Framework

    Open source Extract Transform Load engine written in Java

    ETL Framework is a standalone Extract Transform Load engine written in Java. It includes executables for all major platforms and can be easily integrated into other applications. Key Features: * embeddable, open source and free * fast and scalable * uses target database features to do transformations and loads * manual and automatic data mapping * data streaming * bulk data loads * data quality features using SQL, JavaScript? and regex * data transformations Requirements * Java 1.6 and up * At least 4 MB of RAM New in 3.2 (01/18/2013) * Improved auto-update functionality * Bug fixes
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    WhyLogs Java Library

    WhyLogs Java Library

    Profile and monitor your ML data pipeline end-to-end

    This is a Java implementation of WhyLogs, with support for Apache Spark integration for large scale datasets. Understanding the properties of data as it moves through applications is essential to keeping your ML/AI pipeline stable and improving your user experience, whether your pipeline is built for production or experimentation. WhyLogs is an open source statistical logging library that allows data science and ML teams to effortlessly profile ML/AI pipelines and applications, producing log files that can be used for monitoring, alerts, analytics, and error analysis. WhyLogs calculates approximate statistics for datasets of any size up to TB-scale, making it easy for users to identify changes in the statistical properties of a model's inputs or outputs. Using approximate statistics allows the package to run on minimal infrastructure and monitor an entire dataset, rather than miss outliers and other anomalies by only using a sample of the data to calculate statistics.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    apache spark data pipeline osDQ

    apache spark data pipeline osDQ

    osDQ dedicated to create apache spark based data pipeline using JSON

    This is an offshoot project of open source data quality (osDQ) project https://sourceforge.net/projects/dataquality/ This sub project will create apache spark based data pipeline where JSON based metadata (file) will be used to run data processing , data pipeline , data quality and data preparation and data modeling features for big data. This uses java API of apache spark. It can run in local mode also. Get json example at https://github.com/arrahtech/osdq-spark How to run Unzip the zip file Windows : java -cp .\lib\*;osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c .\example\samplerun.json Mac UNIX java -cp ./lib/*:./osdq-spark-0.0.1.jar org.arrah.framework.spark.run.TransformRunner -c ./example/samplerun.json For those on windows, you need to have hadoop distribtion unzipped on local drive and HADOOP_HOME set. Also copy winutils.exe from here into HADOOP_HOME\bin
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    dbt-re-data

    dbt-re-data

    re_data - fix data issues before your users & CEO would discover them

    re_data is an open-source data reliability framework for the modern data stack. Currently, re_data focuses on observing the dbt project (together with underlaying data warehouse - Postgres, BigQuery, Snowflake, Redshift). Data transformations in re_data are implemented and exposed as models & macros in this dbt package. Gather all relevant outputs about your data in one place using our cloud. Invite your team and debug it easily from there. Go back in time, and see your past metadata. Set up Slack notifications to always know when a new report is produced or an existing one got updated.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    DNAA is the DNA analysis package, for analyzing next-generation post-alignment whole genome resequencing data. Specifically, DNAA is able to find structural variation, SNP and indel variants, as well as evaluating the mapping and data quality.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    gravitino

    gravitino

    Unified metadata lake for data & AI assets.

    Apache Gravitino is a high-performance, geo-distributed, and federated metadata lake. It manages metadata directly in different sources, types, and regions, providing users with unified metadata access for data and AI assets.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    lakeFS

    lakeFS

    lakeFS - Git-like capabilities for your object storage

    Increase data quality and reduce the painful cost of errors. Data engineering best practices using git-like operations on data. lakeFS is an open-source data version control for data lakes. It enables zero-copy Dev / Test isolated environments, continuous quality validation, atomic rollback on bad data, reproducibility, and more. Data is dynamic, it changes over time. Dealing with that without a data version control system is error-prone and labor-intensive. With lakeFS, your data lake is version controlled and you can easily time-travel between consistent snapshots of the lake. Easier ETL testing - test your ETLs on top of production data, in isolation, without copying anything. Safely experiment and test on full production data. Easily Collaborate on production data with your team. Automate data quality checks within data pipelines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22

    methylQA

    methylation sequence data quality assessment tool

    methylation sequence data quality assessment tool
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    re_data

    re_data

    re_data - fix data issues before your users & CEO would discover them

    re_data is an open-source data reliability framework for the modern data stack. Currently, re_data focuses on observing the dbt project (together with underlying data warehouse - Postgres, BigQuery, Snowflake, Redshift). Gather all relevant outputs about your data in one place using our cloud. Invite your team and debug it easily from there. Go back in time, and see your past metadata. Set up Slack notifications to always know when a new report is produced or an existing one got updated.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    SolexaQA is a software to calculate quality statistics and visual representations of data quality for second-generation sequencing data.
    Downloads: 0 This Week
    Last Update:
    See Project