SynRD Package

A Differentially Private (DP) Synthetic Data benchmarking package, posing the question: "Can a DP Synthesizer produce private (tabular) data that preserves scientific findings?" In other words, do DP Synthesizers satisfy Epistemic Parity?

Citation: Rosenblatt, L., Holovenko, A., Rumezhak, T., Stadnik, A., Herman, B., Stoyanovich, J., & Howe, B. (2022). Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy. arXiv preprint arXiv:2208.12700. (under review)

Installing the benchmark

The benchmark is currently in beta-0.1. Still, you can install the development version by running the following commands:

1. Create your preferred package management environment with python=3.7 (for example, conda create -n "synrd" python=3.7)
2. git clone https://github.com/DataResponsibly/SynRD.git
3. cd SynRD
4. pip install git+https://github.com/ryan112358/private-pgm.git
5. pip install .

Step (4) installs a non-PyPI dependency: private-pgm, an excellent package for DP synthesizers (https://github.com/ryan112358/private-pgm).

Note: This package is under heavy development. If functionality is broken or missing, feel free to open an issue or submit a PR with a fix!

Note on using GEMSynthesizer

If you would like to use the GEMSynthesizer, you must follow an alternative installation process for SynRD:

1. Create your preferred package management environment with python=3.7 (for example, conda create -n "synrd" python=3.7)
2. Git clone the SynRD repo: git clone https://github.com/DataResponsibly/SynRD
3. cd SynRD/synthesizers
4. Git clone the dp-query-release repo: git clone https://github.com/terranceliu/dp-query-release.git
5. Move the src/ folder out of dp-query-release/ and into SynRD/synthesizers/
6. From the top level of the SynRD clone, run pip install .
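
The folder move in the GEMSynthesizer steps can also be scripted. The following is a minimal sketch, assuming the directory layout described above; the helper name move_gem_src is hypothetical and not part of SynRD. It refuses to overwrite an existing src/ folder rather than guessing what to do with it.

```python
import shutil
from pathlib import Path


def move_gem_src(synthesizers_dir):
    """Move dp-query-release/src into synthesizers_dir.

    Hypothetical helper (not part of SynRD) mirroring the manual
    'move src/ out of dp-query-release/' step.
    """
    synthesizers_dir = Path(synthesizers_dir)
    source = synthesizers_dir / "dp-query-release" / "src"
    target = synthesizers_dir / "src"

    if not source.is_dir():
        raise FileNotFoundError(f"expected a cloned repo with a src/ folder at {source}")
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite it")

    # str() keeps this compatible with older Python versions such as 3.7.
    shutil.move(str(source), str(target))
    return target
```

From the directory that contains your SynRD clone, this would be called as move_gem_src("SynRD/synthesizers") after cloning dp-query-release, and before running pip install . from the top level.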

Further dependency notes

If you would like to benchmark with the paper Fruiht2018Naturally, follow one of the rpy2 installation options below to configure your R-Python interface package.

Install Option 1 for R

If you have a mac with an M1 chip, you may have success installing rpy2 via the following:

1. Uninstall existing R versions on your machine.
2. Install R-4.2.2-arm64.pkg from https://cran.r-project.org/bin/macosx/.
3. conda install -n base conda-forge::mamba
4. mamba install -c conda-forge rpy2

Install Option 2 for R

To run analyses for papers that use R, ensure that R is installed and that your R_HOME environment variable points to your R installation's home directory (the path printed by R RHOME).

To install R with Anaconda, you can run conda install r-base r-essentials.

To confirm that rpy2 is working as expected, try the following in Python:

import rpy2.robjects as robjects  # "import rpy2" alone does not load the robjects submodule

robjects.r['pi']  # Returns an R vector containing the number pi
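
Before starting an R session, it can help to check the two prerequisites separately. The sketch below uses only the standard library; the function name check_r_setup is hypothetical and not part of SynRD or rpy2. It reports whether R_HOME is set and whether the rpy2 package can be found, without importing rpy2 (importing rpy2.robjects starts an embedded R process).

```python
import importlib.util
import os


def check_r_setup():
    """Report whether R_HOME is set and whether rpy2 is importable.

    Hypothetical diagnostic helper; it does not start R.
    """
    return {
        # None means the variable is not set in this environment.
        "r_home": os.environ.get("R_HOME"),
        # find_spec returns None when the top-level package is missing.
        "rpy2_installed": importlib.util.find_spec("rpy2") is not None,
    }


status = check_r_setup()
print(status)
```

If rpy2_installed is False, revisit the installation options above. An unset R_HOME is worth fixing first if the rpy2 import fails with an error about locating R.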

Notes on structure of package

Each "paper" in the benchmark is named according to bibtex convention (authorYEARfirstword).
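
As an illustration of the naming convention, here is a small sketch that builds a name from citation fields. The helper paper_key is hypothetical, the titles are truncated placeholders, and "Doe" is an invented example. Keeping only the leading letters of the first title word (so a hyphenated word is cut at the hyphen) is an assumption of this sketch, not a documented rule.

```python
import re


def _cap(word):
    # Uppercase the first character only, leaving the rest untouched.
    return word[:1].upper() + word[1:]


def paper_key(author, year, title):
    """Build an AuthorYEARFirstword-style name (hypothetical helper)."""
    surname = re.sub(r"[^A-Za-z]", "", author)
    # Leading run of letters in the title, e.g. "Multi" from "Multi-site".
    first_word = re.match(r"[A-Za-z]+", title.strip())
    if not surname or first_word is None:
        raise ValueError("author and title must both contain letters")
    return f"{_cap(surname)}{year}{_cap(first_word.group(0))}"


print(paper_key("Saw", 2018, "Cross ..."))
print(paper_key("Fruiht", 2018, "Naturally ..."))
print(paper_key("Doe", 2020, "Multi-site placeholder title"))
```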

Notes on benchmark construction, reasoning, etc.

Taxonomy of findings

How to add a new paper

Brief details on how to add a new paper.

1. Create a new folder named authorYEARfirstword.
2. Create a process.ipynb notebook as your data playground. Use this to investigate data cleaning/processing/results generation.
3. In parallel with step 2, create an authorYEARfirstword.py file, and extend the Publication() metaclass with AuthorYEARFirstword(Publication). Add the relevant details (see meta_classes.py for notes on what this means). Then, begin to move findings over from process.ipynb into replicable lambdas in AuthorYEARFirstword(Publication).
4. Ensure that AuthorYEARFirstword(Publication) has a FINDINGS list class attribute. This should consist of Finding objects: wrap each finding_i(self) lambda in the proper Finding, VisualFinding or FigureFinding metaclass, and add it to the list.
5. See Saw2018Cross for an example of a cleanly implemented Publication class.
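
The overall shape of such a class might look like the sketch below. The Publication and Finding classes here are simplified stand-ins written for this example, not the real definitions in meta_classes.py, whose constructors and attributes may differ. Doe2020Example, its data, and the choice to extend FINDINGS inside __init__ are likewise assumptions. Treat Saw2018Cross as the authoritative reference.

```python
class Finding:
    """Simplified stand-in for SynRD's Finding wrapper (not the real class)."""

    def __init__(self, finding_fn, description=""):
        self.finding_fn = finding_fn
        self.description = description

    def run(self):
        return self.finding_fn()


class Publication:
    """Simplified stand-in for the Publication base class."""

    FINDINGS = []  # class attribute; subclasses extend it per instance

    def __init__(self, rows):
        self.rows = rows


class Doe2020Example(Publication):
    """Hypothetical paper following the AuthorYEARFirstword convention."""

    def __init__(self, rows):
        super().__init__(rows)
        # Wrap each finding lambda and add it to the list. Building a new
        # list avoids mutating the shared class-level FINDINGS.
        self.FINDINGS = self.FINDINGS + [
            Finding(self.finding_1_1, description="group a scores higher than group b"),
        ]

    def _mean_score(self, group):
        scores = [row["score"] for row in self.rows if row["group"] == group]
        return sum(scores) / len(scores)

    def finding_1_1(self):
        """(Text from the paper would go here.)"""
        mean_a = self._mean_score("a")
        mean_b = self._mean_score("b")
        soft_finding = mean_a > mean_b
        return ([mean_a, mean_b], soft_finding, [mean_a - mean_b])


toy_rows = [
    {"group": "a", "score": 4.0},
    {"group": "a", "score": 2.0},
    {"group": "b", "score": 1.0},
    {"group": "b", "score": 3.0},
]
publication = Doe2020Example(toy_rows)
for finding in publication.FINDINGS:
    print(finding.description, finding.run())
```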

Addendum on finding lambdas

Finding lambdas must follow a particular structure. Consider the following example, noting in particular the return values:

def finding_i_j(self): # there can be kwargs
    """
    (Text from paper, usually 2 or 3 sentences)
    """
    # often can use a table finding directly or 
    # as a starting point to quickly recreate 
    # finding
    results = self.table() 

    # (pandas stuff happens here to generate 
    # the findings)

    return ([values], 
            soft_finding, 
            [hard_findings])

The finding lambdas can perform essentially any computation necessary, but must return a tuple of three elements, in this order:

1. A list of values (any values relevant to the soft finding; the list need not be exhaustive)

For example:
```
[interest_stem_ninth, interest_stem_eleventh]
```
2. A soft_finding boolean (a boolean that reflects the primary inequality/contrast presented in the original paper for this finding)

For example:
```
soft_finding = interest_stem_ninth > interest_stem_eleventh
```
3. A list of hard findings, i.e. values (for example, the difference or set of differences underlying the soft_finding inequality)

For example:
```
hard_finding = interest_stem_ninth - interest_stem_eleventh
hard_findings = [hard_finding]
```
