---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include=FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
library(validatetools)
```
<!-- badges: start -->
[![R-CMD-check](https://github.com/data-cleaning/validatetools/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/data-cleaning/validatetools/actions/workflows/R-CMD-check.yaml)
[![CRAN status](https://www.r-pkg.org/badges/version/validatetools)](https://CRAN.R-project.org/package=validatetools)
[![codecov](https://codecov.io/github/data-cleaning/validatetools/graph/badge.svg?token=3tIe5HAUWm)](https://codecov.io/github/data-cleaning/validatetools)
[![Mentioned in Awesome Official Statistics](https://awesome.re/mentioned-badge.svg)](http://www.awesomeofficialstatistics.org)
<!-- badges: end -->
# validatetools
`validatetools` is a utility package for managing validation rule sets that are defined with `validate`.
In production systems, validation rule sets tend to grow organically and accumulate redundant or (partially)
contradictory rules. `validatetools` helps identify problems in large rule sets and includes simplification
methods for resolving them.
## Installation
`validatetools` is available from CRAN and can be installed with
```r
install.packages("validatetools")
```
The latest beta version of `validatetools` can be installed with
``` r
install.packages("validatetools", repos = "https://data-cleaning.github.io/drat")
```
The adventurous can install an (unstable) development version of `validatetools` from GitHub with:
``` r
# install.packages("devtools")
devtools::install_github("data-cleaning/validatetools")
```
## Example
### Check for feasibility
```{r}
rules <- validator( x > 0)

is_infeasible(rules)

rules <- validator( rule1 = x > 0
                  , rule2 = x < 0
                  )

is_infeasible(rules)

detect_infeasible_rules(rules)
make_feasible(rules)

# find out which rules conflict with this rule
is_contradicted_by(rules, "rule1")
```
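The checks above can also serve as a guard before a rule set is used on data. The following is a minimal sketch, not part of `validatetools`: `assert_feasible` is a hypothetical helper name, and the sketch assumes `detect_infeasible_rules` returns the names of the conflicting rules as a character vector.

```r
library(validate)
library(validatetools)

# Hypothetical helper (not exported by validatetools): stop early when
# no record could ever satisfy the rule set.
assert_feasible <- function(rules) {
  if (is_infeasible(rules)) {
    conflicting <- detect_infeasible_rules(rules)
    stop( "Infeasible rule set; conflicting rule(s): "
        , paste(conflicting, collapse = ", ")
        )
  }
  invisible(rules)
}

rules <- validator( rule1 = x > 0
                  , rule2 = x < 0
                  )

# Drop the conflicting rule(s) first, then the guard passes silently
repaired <- make_feasible(rules)
assert_feasible(repaired)
```

Calling `assert_feasible(rules)` on the unrepaired rule set raises an error instead, which keeps a contradictory rule set from silently rejecting every record.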
## Simplifying
The function `simplify_rules` combines most simplification methods of `validatetools` to simplify a rule set.
For example, it reduces the following rule set to a simpler form:
```{r}
rules <- validator( if (age < 16) income == 0
                  , job %in% c("yes", "no")
                  , if (job == "yes") income > 0
                  )

simplify_rules(rules, age = 13)
# or
simplify_rules(rules, job = "yes")
```
`simplify_rules` combines the following simplification and substitution methods:
### Value substitution
```{r}
rules <- validator( rule1 = height > 5
                  , rule2 = max_height >= height
                  , rule3 = if (gender == "male") weight > 100
                  , rule4 = gender %in% c("male", "female")
                  )

substitute_values(rules, height = 6, gender = "male")
```
### Finding fixed values
```{r}
rules <- validator( x >= 0, x <= 0)
detect_fixed_variables(rules)
simplify_fixed_variables(rules)

rules <- validator( rule1 = x1 + x2 + x3 == 0
                  , rule2 = x1 + x2 >= 0
                  , rule3 = x3 >= 0
                  )
simplify_fixed_variables(rules)
```
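To see how detection and substitution relate, the simplification can be approximated by hand in two steps. This is a sketch under assumptions: it assumes `detect_fixed_variables` returns a named list of fixed values and that `substitute_values` accepts such a list through its `.values` argument. The variable `y` and the name `by_hand` are illustrative only.

```r
library(validate)
library(validatetools)

rules <- validator( x >= 0
                  , x <= 0
                  , y >= x
                  )

# Step 1: find the variables that the rules pin to a single value
fixed <- detect_fixed_variables(rules)

# Step 2: substitute those values back into the rule set
# (assumption: `.values` takes a named list of variable values)
by_hand <- substitute_values(rules, .values = fixed)

# Compare with the one-step version
by_hand
simplify_fixed_variables(rules)
```

Doing the steps separately is mainly useful for inspecting which variables are fixed before the rule set is rewritten.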
### Simplifying conditional statements
```{r}
# non-relaxing clause
rules <- validator( r1 = if (income > 0) age >= 16
                  , r2 = age < 12
                  )
# age >= 16 is always FALSE (r2 requires age < 12), so r1 can be simplified
simplify_conditional(rules)

# non-constraining clause
rules <- validator( if (age < 16) income == 0
                  , if (age >= 16) income >= 0
                  )
simplify_conditional(rules)
```
### Removing redundant rules
```{r}
rules <- validator( rule1 = age > 12
                  , rule2 = age > 18
                  )

# rule1 is superfluous: it is implied by rule2
remove_redundancy(rules)

rules <- validator( rule1 = age > 12
                  , rule2 = age > 12
                  )

# rule1 and rule2 are identical; the first rule wins
remove_redundancy(rules)

# Note that detection flags both rules!
detect_redundancy(rules)
```
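When a rule is reported as redundant, it can help to know which other rules make it so. The sketch below assumes the installed version of `validatetools` exports `is_implied_by` and that it returns the names of the implying rules; the variable name `implied_by` is illustrative.

```r
library(validate)
library(validatetools)

rules <- validator( rule1 = age > 12
                  , rule2 = age > 18
                  )

# Ask which rule(s) make rule1 redundant
implied_by <- is_implied_by(rules, "rule1")
implied_by
```

This mirrors `is_contradicted_by` from the feasibility example: one function explains a conflict, the other explains a redundancy.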