robudnik/data-quality-profiler-and-rules-engine

 
 

Data Quality Profiler and Rules Engine

This project provides the following:

  • Data profilers for profiling large data volumes in Spark
  • Assertion rule definitions and checking
  • Reference data loading and joining
  • Excel and CSV reference data parsing
  • JSON output enriched with data quality markers/profilers
  • Metrics and summary dataframe output
  • Dimensional tagging of profiler outputs (additional identifiers)
  • JSON flattener
  • JSON and CSV loader, extensible to other formats
  • Custom key pre-processor and custom parquet row reader functionality
  • Comprehensive, extensible set of built-in assertion rule modules
  • Built-in set of field-level profile masks
  • Compound assertion rule definition (i.e. a set of sub-rules must all pass)
  • Human-readable Data Quality and Assertion Rule Compliance report output
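
To make the idea of a field-level profile mask concrete, the sketch below shows one common interpretation: each character of a value is replaced by a symbol for its class, so values with the same shape produce the same mask. This is an assumption about what a mask means here, not a copy of the built-in masks. The object name `ProfileMaskSketch` and the functions `highGrain` and `lowGrain` are hypothetical, and the code is plain Scala rather than Spark.

```scala
// Hypothetical sketch: these names are illustrative and are not the project's API.
object ProfileMaskSketch {

  /** High-grain mask: each letter or digit is replaced by its class symbol. */
  def highGrain(value: String): String =
    value.map {
      case c if c.isUpper => 'A'
      case c if c.isLower => 'a'
      case c if c.isDigit => '9'
      case c              => c // punctuation and whitespace are kept as-is
    }

  /** Low-grain mask: consecutive repeats of a class symbol collapse to one. */
  def lowGrain(value: String): String =
    highGrain(value).foldLeft(new StringBuilder) { (acc, c) =>
      val isClassSymbol = c == 'A' || c == 'a' || c == '9'
      if (acc.nonEmpty && acc.last == c && isClassSymbol) acc
      else acc.append(c)
    }.toString
}
```

Under these assumptions, `ProfileMaskSketch.highGrain("SW1A 1AA")` yields `AA9A 9AA`, and `ProfileMaskSketch.lowGrain("SW1A 1AA")` yields `A9A 9A`. Counting how often each mask occurs in a column is one way to surface unexpected formats.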

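The compound assertion rule in the list above (a set of sub-rules that must all pass) can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's rule definition format: the types `Rule` and `CompoundRule`, the rule names, and the use of a `Map[String, String]` as a record are all hypothetical.

```scala
// Hypothetical sketch: these types are illustrative and are not the project's API.
object CompoundRuleSketch {

  /** A single assertion over a record, modelled here as field name -> value. */
  final case class Rule(name: String, check: Map[String, String] => Boolean)

  final case class CompoundRule(name: String, subRules: Seq[Rule]) {

    /** Names of the sub-rules that the record fails, in definition order. */
    def failures(record: Map[String, String]): Seq[String] =
      subRules.filterNot(_.check(record)).map(_.name)

    /** The compound rule passes only when every sub-rule passes. */
    def passes(record: Map[String, String]): Boolean =
      failures(record).isEmpty
  }

  // Example sub-rules; a missing field counts as a failure.
  val idNotBlank: Rule =
    Rule("id_not_blank", record => record.get("id").exists(_.trim.nonEmpty))

  val ageIsNumeric: Rule =
    Rule("age_is_numeric",
      record => record.get("age").exists(a => a.nonEmpty && a.forall(_.isDigit)))

  val validPerson: CompoundRule =
    CompoundRule("valid_person", Seq(idNotBlank, ageIsNumeric))
}
```

Returning the names of the failing sub-rules, rather than only a Boolean, keeps enough detail to report which part of a compound rule a record violated.
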
Repository Layout

Licence

Licensed under the MIT License. See the LICENSE file.

