Initial Open Source release of Data Quality Profiler and Rules Engine

@danielsmith-eu released this 26 Jul 16:22
13664fc

This release provides the following:

  • Data profilers for large-volume data profiling in Spark
  • Assertion rule definitions and checking
  • Reference data loading and joining
  • Excel and CSV reference data parsing
  • JSON output enriched with data quality markers/profilers
  • Metrics and summary dataframe output
  • Dimensional tagging of profiler outputs (additional identifiers)
  • JSON flattener
  • JSON and CSV loader, extensible to other formats
  • Custom key pre-processor and custom parquet row reader functionality
  • Comprehensive, extensible modules of built-in assertion rules
  • Built-in set of field-level profile masks
  • Compound assertion rule definition (i.e. a set of sub-rules must all pass)
  • Human-readable Data Quality and Assertion Rule Compliance report output
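
To illustrate the compound assertion rule idea from the list above (a set of sub-rules that must all pass), here is a minimal plain-Python sketch. It is not the library's API: the `Rule` and `CompoundRule` classes, the `evaluate` method, and the example rule names are all hypothetical and exist only to show the "all sub-rules must pass" semantics on a single record.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping, Sequence, Tuple

Record = Mapping[str, Any]


@dataclass(frozen=True)
class Rule:
    """Hypothetical single assertion: a name plus a predicate over one record."""
    name: str
    check: Callable[[Record], bool]


@dataclass(frozen=True)
class CompoundRule:
    """Hypothetical compound assertion: passes only if every sub-rule passes."""
    name: str
    sub_rules: Sequence[Rule]

    def evaluate(self, record: Record) -> Tuple[bool, Dict[str, bool]]:
        # Evaluate every sub-rule so the per-rule outcome can be reported,
        # then combine the outcomes with a logical AND.
        results = {rule.name: rule.check(record) for rule in self.sub_rules}
        return all(results.values()), results


# Example sub-rules (illustrative names, not taken from the release).
id_not_null = Rule("id_not_null", lambda r: r.get("id") is not None)
age_in_range = Rule(
    "age_in_range",
    lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 130,
)

valid_person = CompoundRule("valid_person", [id_not_null, age_in_range])

# One sub-rule fails here, so the compound rule fails as a whole.
passed, detail = valid_person.evaluate({"id": "a1", "age": 200})
```

Returning the per-rule results alongside the overall verdict is a design choice for this sketch: it keeps enough detail to say which sub-rule caused a failure, which is the kind of information a compliance report would need.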