A curated list of awesome tools for testing and monitoring data quality - typically at the data warehouse/lake or within running data pipelines.
If you want to contribute to this list (please do), send me a pull request or contact me.
Open source tools
- elementary - Data monitoring and observability tailored to dbt.
- mobydq - Tool for data engineering teams to run and automate data quality checks on their data pipelines.
- ydata-quality - Python library for assessing data quality throughout the stages of data pipeline development.
- great-expectations - Tool for data testing, documentation, and profiling (see the sketch after this list).
- deequ - Library by Amazon for defining unit tests for data, with a focus on large datasets. Based on Apache Spark (see the PySpark sketch after this list).
- soda - Enables data testing through extended SQL queries.
- dqm - Data quality monitoring tool implemented with Spark.
- owl-sanitizer - Lightweight Spark-based data validation framework.
- griffin - Data quality solution for distributed data systems at any scale, in both streaming and batch data contexts.
- drunken-data-quality
- DataQuality for BigData
- TopNotch
- Phasor Data Quality Tracker
- DataCleaner
- data-quality
- deepchecks - Tool for validating machine learning models and data, with test suites tailored to ML datasets, models, and their outputs.
- evidently - Analyze and track data and ML model output quality.
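
To give a sense of what declarative data testing looks like, here is a minimal sketch using great-expectations with its classic Pandas-backed API (pre-1.0 releases; newer versions expose a different, context-based API). The column names and thresholds are made up for illustration.

```python
# Minimal sketch of the classic Pandas-backed Great Expectations API (pre-1.0).
import pandas as pd
import great_expectations as ge

df = pd.DataFrame({"user_id": [1, 2, 3, None], "age": [25, 40, 130, 31]})

# Wrap the DataFrame so expectation methods become available on it.
gdf = ge.from_pandas(df)

# Each expectation returns a result object with a boolean `success` flag.
print(gdf.expect_column_values_to_not_be_null("user_id").success)    # False (one NULL)
print(gdf.expect_column_values_to_be_between("age", 0, 120).success)  # False (130 is out of range)

# Aggregate all expectations run so far into a single validation report.
print(gdf.validate().success)
```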
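And a minimal PySpark sketch of a deequ check via its pydeequ Python wrapper. It assumes the matching deequ jar can be resolved when the Spark session starts (recent pydeequ releases also read a SPARK_VERSION environment variable to pick the right artifact); the example columns are hypothetical.

```python
# Minimal sketch of a deequ verification run through the pydeequ wrapper.
from pyspark.sql import SparkSession
import pydeequ
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationResult, VerificationSuite

spark = (SparkSession.builder
         .config("spark.jars.packages", pydeequ.deequ_maven_coord)
         .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
         .getOrCreate())

df = spark.createDataFrame(
    [(1, "a", 10.0), (2, "b", 20.0), (3, None, -5.0)],
    ["id", "name", "amount"])

check = (Check(spark, CheckLevel.Error, "basic integrity checks")
         .isComplete("name")        # no NULLs allowed
         .isUnique("id")            # primary-key style uniqueness
         .isNonNegative("amount"))  # no negative amounts

result = (VerificationSuite(spark)
          .onData(df)
          .addCheck(check)
          .run())

# One row per constraint, with its status and a message if it failed.
VerificationResult.checkResultsAsDataFrame(spark, result).show(truncate=False)
```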
Commercial tools
Offerings range from data testing to pipeline testing, with a focus on real-time monitoring, automated test creation and threshold setting, and additional enterprise features.
- Bigeye
- Soda
- Databand
- Monte Carlo
- Great Expectations
- Sifflet
- Validio
- Lightup
- Lantern
- Metaplane
- Datafold
- Acceldata
- Anomalo
- Marquez
TODOs
- Add tools for unstructured data (Arthur, Robust)