added examples to exclude for general fixers hook
drahc1R committed Jul 14, 2023
1 parent c7e0bc7 commit b6cc55c
Showing 4 changed files with 21 additions and 11 deletions.
10 changes: 10 additions & 0 deletions .pre-commit-config.yaml
@@ -32,3 +32,13 @@ repos:
hooks:
- id: check-manifest
additional_dependencies: ['scikit-learn', 'dataprofiler', 'numpy', 'scipy']
# General fixers: format files for white spaces and trailing new lines, warn on debug statements
# https://github.com/pre-commit/pre-commit-hooks#hooks-available
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.0.1
hooks:
- id: trailing-whitespace
exclude: (^tests/data/|^examples/sample_datasets)
- id: debug-statements
- id: end-of-file-fixer
exclude: (^tests/data/|^examples/sample_datasets)
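pre-commit treats `exclude` as an unanchored Python regular expression matched against each staged file's path. A minimal sketch (with illustrative file paths, not taken from this repo) of which paths the pattern added above would skip:

```python
# Sketch: which paths the hooks' `exclude` pattern would skip.
# pre-commit applies the pattern with re.search against each path.
import re

EXCLUDE = re.compile(r"(^tests/data/|^examples/sample_datasets)")

paths = [
    "tests/data/fixture.csv",          # matches ^tests/data/ -> skipped
    "examples/sample_datasets/a.csv",  # matches -> skipped
    "README.md",                       # no match -> hook runs on it
]
skipped = [p for p in paths if EXCLUDE.search(p)]
print(skipped)
```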
10 changes: 5 additions & 5 deletions README.md
@@ -7,10 +7,10 @@ Ideally, the method would provide a concise specification to generate tabular da

A copula is a model for specifying
the joint probability p(x1, x2, ..., xn) given a correlation structure along
with specifications for the marginal distribution of each feature. The current implementation uses a multivariate normal distribution with a specified covariance matrix. Future work can expand this choice to other multivariate distributions.
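The Gaussian-copula construction described above can be sketched in a few lines (an illustration only, not this library's API; the exponential and beta marginals are arbitrary choices):

```python
# Gaussian-copula sketch: draw correlated normals, map them to uniforms
# via the normal CDF (probability integral transform), then through each
# feature's inverse marginal CDF. Marginals here are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
cov = np.array([[1.0, 0.7],
                [0.7, 1.0]])  # specified correlation structure

z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1000)
u = stats.norm.cdf(z)               # uniforms carrying the dependence
x1 = stats.expon.ppf(u[:, 0])       # arbitrary exponential marginal
x2 = stats.beta.ppf(u[:, 1], 2, 5)  # arbitrary beta marginal
X = np.column_stack([x1, x2])       # correlated, non-normal features
```

The dependence set by `cov` survives the marginal transforms, which is exactly what makes the copula a concise specification for tabular data.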


### Parameters
| name | type | default | description |
| ------------- | ---------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------- |
| n_samples     | int        | 100            | The number of samples.                                                                                                             |
@@ -59,7 +59,7 @@ pre-commit run
```

### Referencing this library
If you use this library in your work, please cite our paper:
```
@inproceedings{barr:2020,
author = {Brian Barr and Ke Xu and Claudio Silva and Enrico Bertini and Robert Reilly and C. Bayan Bruss and Jason D. Wittenbach},
@@ -69,11 +69,11 @@ If you use this library in your work, please cite our paper:
booktitle = {2020 ICML Workshop on Human Interpretability in Machine Learning (WHI 2020)},
date = {2020-07-17},
pages = {362-367},
}
```

### Notes
If you have tabular data and want to fit a copula to it, consider this Python library: [copulas](https://sdv-dev.github.io/Copulas/index.html)
A quick [visual tutorial](https://twiecki.io/blog/2018/05/03/copulas/) covers copulas and the probability integral transform.

To run the examples, you should run:
8 changes: 4 additions & 4 deletions Roadmap.md
@@ -7,14 +7,14 @@ Inputs:
- [ ] redundant (correlated and dependent - say by a linear combo of informative features)

- [ ] separation between classes (can we filter +/- k% on either side of p_thresh to create separation?)
- [ ] overlap - since we have ground truth probabilities, we could sample from a binomial distribution with probability p(y|x) to determine labels - this would work in conjunction with sig_k, which controls the steepness of the sigmoid
- [ ] noise level - apply *during* generation of regression values/labels
- [ ] sample coefficients of symbolic expression from std normal distribution
- [ ] outlier generation
- [ ] create fake PII with [pydbgen](https://github.com/tirthajyoti/pydbgen) (stretch, *new*)

Output:
- [ ] mapping from y_reg value to y_class
- [ ] partition and label - e.g. `y_class = y_reg < np.median(y_reg)`
- [ ] Gompertz curve (a parameterized sigmoid - would give control over uncertainty?)
- [ ] noise (e.g. `flip_y`)
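The two labeling strategies listed above can be sketched as follows (assumptions: `sig_k` is the sigmoid-steepness knob the overlap item names; the median-partition rule is quoted from the roadmap; nothing here is the library's implementation):

```python
# Sketch of two y_reg -> y_class mappings from the roadmap.
import numpy as np

rng = np.random.default_rng(0)
y_reg = rng.normal(size=1000)  # stand-in regression targets

# Hard partition: label by position relative to the median.
y_class_hard = (y_reg < np.median(y_reg)).astype(int)

# Soft labels with overlap: sample from a Bernoulli with p(y|x) given
# by a sigmoid; larger sig_k -> steeper sigmoid -> less class overlap.
sig_k = 5.0
p_y = 1.0 / (1.0 + np.exp(-sig_k * (y_reg - np.median(y_reg))))
y_class_soft = rng.binomial(1, p_y)
```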
4 changes: 2 additions & 2 deletions tests/.coveragerc
@@ -2,12 +2,12 @@
omit = tests/* .venv

[report]
exclude_lines =
pragma: no cover
def __repr__
if self.debug:
if settings.DEBUG:
raise AssertionError
raise NotImplementedError
if 0:
if __name__ == .__main__.:
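Each `exclude_lines` entry is a regex matched against source lines; matching lines are omitted from the coverage report. A hypothetical sketch (function names invented for illustration) of code these patterns would exclude:

```python
# Sketch: lines the exclude_lines patterns above would omit from coverage.
def risky_divide(a, b):
    if b == 0:
        raise AssertionError  # excluded: matches "raise AssertionError"
    return a / b

def todo():  # pragma: no cover  <- excluded: matches "pragma: no cover"
    raise NotImplementedError

if __name__ == "__main__":  # excluded: matches the __main__ pattern
    print(risky_divide(6, 3))
```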
