added examples to exclude for general fixers hook
drahc1R committed Jul 14, 2023
1 parent c7e0bc7 commit b6cc55c
Showing 4 changed files with 21 additions and 11 deletions.
10 changes: 10 additions & 0 deletions .pre-commit-config.yaml
@@ -32,3 +32,13 @@ repos:
hooks:
- id: check-manifest
additional_dependencies: ['scikit-learn', 'dataprofiler', 'numpy', 'scipy']
# General fixers: format files for white spaces and trailing new lines, warn on debug statements
# https://github.com/pre-commit/pre-commit-hooks#hooks-available
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.0.1
hooks:
- id: trailing-whitespace
exclude: (^tests/data/|^examples/sample_datasets)
- id: debug-statements
- id: end-of-file-fixer
exclude: (^tests/data/|^examples/sample_datasets)
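pre-commit treats `exclude` as an unanchored Python regular expression matched against each staged file's path. A minimal sketch (with illustrative file paths, not taken from this repo) of which paths the pattern added above would skip:

```python
# Sketch: which paths the hooks' `exclude` pattern would skip.
# pre-commit applies the pattern with re.search against each path.
import re

EXCLUDE = re.compile(r"(^tests/data/|^examples/sample_datasets)")

paths = [
    "tests/data/fixture.csv",          # matches ^tests/data/ -> skipped
    "examples/sample_datasets/a.csv",  # matches -> skipped
    "README.md",                       # no match -> hook runs on it
]
skipped = [p for p in paths if EXCLUDE.search(p)]
print(skipped)
```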
10 changes: 5 additions & 5 deletions README.md
@@ -7,10 +7,10 @@ Ideally, the method would provide a concise specification to generate tabular da

A copula is a model for specifying
the joint probability p(x1, x2, ..., xn) given a correlation structure along
with specifications for the marginal distribution of each feature. The current implementation uses a multivariate normal distribution with a specified covariance matrix. Future work can expand this choice to other multivariate distributions.
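The Gaussian-copula construction described above can be sketched in a few lines (an illustration only, not this library's API; the exponential and beta marginals are arbitrary choices):

```python
# Gaussian-copula sketch: draw correlated normals, map them to uniforms
# via the normal CDF (probability integral transform), then through each
# feature's inverse marginal CDF. Marginals here are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
cov = np.array([[1.0, 0.7],
                [0.7, 1.0]])  # specified correlation structure

z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1000)
u = stats.norm.cdf(z)               # uniforms carrying the dependence
x1 = stats.expon.ppf(u[:, 0])       # arbitrary exponential marginal
x2 = stats.beta.ppf(u[:, 1], 2, 5)  # arbitrary beta marginal
X = np.column_stack([x1, x2])       # correlated, non-normal features
```

The dependence set by `cov` survives the marginal transforms, which is exactly what makes the copula a concise specification for tabular data.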


### Parameters
| name | type | default | description |
| ------------- | ---------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| n_samples     | int        | 100            | The number of samples.                                                                                                             |
@@ -59,7 +59,7 @@ pre-commit run
```

### Referencing this library
If you use this library in your work, please cite our paper:
```
@inproceedings{barr:2020,
author = {Brian Barr and Ke Xu and Claudio Silva and Enrico Bertini and Robert Reilly and C. Bayan Bruss and Jason D. Wittenbach},
@@ -69,11 +69,11 @@ If you use this library in your work, please cite our paper:
booktitle = {2020 ICML Workshop on Human Interpretability in Machine Learning (WHI 2020)},
date = {2020-07-17},
pages = {362-367},
}
```

### Notes
If you have tabular data and want to fit a copula to it, consider this Python library: [copulas](https://sdv-dev.github.io/Copulas/index.html)
A quick [visual tutorial](https://twiecki.io/blog/2018/05/03/copulas/) covers copulas and the probability integral transform.

To run the examples, you should run:
8 changes: 4 additions & 4 deletions Roadmap.md
@@ -7,14 +7,14 @@ Inputs:
- [ ] redundant (correlated and dependent - say by a linear combo of informative features)

- [ ] separation between classes (can we filter +/- k% on either side of p_thresh to create separation?)
- [ ] overlap - since we have ground truth probabilities, we could sample from a binomial distribution with probability p(y|x) to determine labels - this would work in conjunction with sig_k, which controls the steepness of the sigmoid
- [ ] noise level - apply *during* generation of regression values/labels
- [ ] sample coefficients of symbolic expression from std normal distribution
- [ ] outlier generation
- [ ] create fake PII with [pydbgen](https://github.com/tirthajyoti/pydbgen) (stretch, *new*)

Output:
- [ ] mapping from y_reg value to y_class
- [ ] partition and label - e.g. `y_class = y_reg < np.median(y_reg)`
- [ ] Gompertz curve (a parameterized sigmoid - would give control over uncertainty?)
- [ ] noise (e.g. `flip_y`)
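The two labeling strategies listed above can be sketched as follows (assumptions: `sig_k` is the sigmoid-steepness knob the overlap item names; the median-partition rule is quoted from the roadmap; nothing here is the library's implementation):

```python
# Sketch of two y_reg -> y_class mappings from the roadmap.
import numpy as np

rng = np.random.default_rng(0)
y_reg = rng.normal(size=1000)  # stand-in regression targets

# Hard partition: label by position relative to the median.
y_class_hard = (y_reg < np.median(y_reg)).astype(int)

# Soft labels with overlap: sample from a Bernoulli with p(y|x) given
# by a sigmoid; larger sig_k -> steeper sigmoid -> less class overlap.
sig_k = 5.0
p_y = 1.0 / (1.0 + np.exp(-sig_k * (y_reg - np.median(y_reg))))
y_class_soft = rng.binomial(1, p_y)
```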
4 changes: 2 additions & 2 deletions tests/.coveragerc
@@ -2,12 +2,12 @@
omit = tests/* .venv

[report]
exclude_lines =
pragma: no cover
def __repr__
if self.debug:
if settings.DEBUG:
raise AssertionError
raise NotImplementedError
if 0:
if __name__ == .__main__.:
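Each `exclude_lines` entry is a regex matched against source lines; matching lines are omitted from the coverage report. A hypothetical sketch (function names invented for illustration) of code these patterns would exclude:

```python
# Sketch: lines the exclude_lines patterns above would omit from coverage.
def risky_divide(a, b):
    if b == 0:
        raise AssertionError  # excluded: matches "raise AssertionError"
    return a / b

def todo():  # pragma: no cover  <- excluded: matches "pragma: no cover"
    raise NotImplementedError

if __name__ == "__main__":  # excluded: matches the __main__ pattern
    print(risky_divide(6, 3))
```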
