Cohere exposes an application-layer API where a variety of differentially private systems can express their privacy resource needs.
The prototype resource planner expects a --requests argument containing all candidate requests in a JSON file.
Privacy resource requests can be written by hand, but a request adapter that integrates with a DP library streamlines the process.
As part of the prototype, we provide a request adapter that integrates directly with Tumult Analytics, a DP library designed for aggregate queries on tabular data, as well as Opacus, a PyTorch-based DP library for ML training.
To get a local demo up and running, follow these steps.
- Install Poetry:
  curl -sSL https://install.python-poetry.org | python3 -
- Install a Java runtime:
  sudo apt install default-jre
- Clone the repository locally (with submodules):
  git clone --recurse-submodules git@github.com:pps-lab/cohere.git
- Set up PyTorch (may require adding CUDA to the LD_LIBRARY_PATH environment variable):
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64/
- Run the demo for Tumult Analytics and Opacus (from the project root):
  make request-adapter-demo
The demo in request-adapter/request_adapter/main.py illustrates how to use the request adapter for Tumult Analytics and Opacus.
In addition, there is a set of tests to ensure that the privacy costs in the requests match the costs of the DP library.
A Tumult Analytics application starts with a Session, which lists all available tables and initializes privacy accounting parameters.
Developers then utilize the QueryBuilder to express aggregate queries using a SQL-like Domain-Specific Language (DSL).
Normally, these queries would be executed within the Session.
In Cohere, however, the aim is to formulate a resource request rather than execute the query directly.
To do so, we construct a ConverterConfig and then invoke create_tumult_request(..).
This function creates a Cohere request from inputs such as the Session, the QueryExpr built with the QueryBuilder, and the query's privacy budget.
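As background for a budget such as RhoZCDPBudget(0.2), the following is a minimal sketch of the standard zCDP conversions from the DP literature: ρ-zCDP implies (α, ρα)-RDP for every order α > 1, and (ρ + 2√(ρ ln(1/δ)), δ)-DP for any δ in (0, 1). The helper names, the list of α orders, and the value of δ are illustrative assumptions and are not taken from the Cohere request adapter.

```python
import math


def zcdp_to_rdp(rho: float, alphas: list[float]) -> dict[float, float]:
    """Map a rho-zCDP budget to RDP costs: (alpha, rho * alpha) for alpha > 1."""
    if rho < 0:
        raise ValueError("rho must be non-negative")
    if any(alpha <= 1 for alpha in alphas):
        raise ValueError("RDP orders must be greater than 1")
    return {alpha: rho * alpha for alpha in alphas}


def zcdp_to_approx_dp(rho: float, delta: float) -> float:
    """Return an epsilon such that rho-zCDP implies (epsilon, delta)-DP."""
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))


# Illustrative values only: the orders and delta are assumptions.
rho = 0.2
rdp_curve = zcdp_to_rdp(rho, [2.0, 4.0, 8.0, 16.0])
epsilon = zcdp_to_approx_dp(rho, delta=1e-8)

for alpha, cost in rdp_curve.items():
    print(f"alpha={alpha}: rdp cost={cost:.2f}")
print(f"epsilon at delta=1e-8: {epsilon:.3f}")
```

The RDP cost grows linearly in α, which is why a single ρ is enough to describe the whole curve for this kind of query.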
from datetime import timedelta
from tmlt.analytics.keyset import KeySet
from tmlt.analytics.privacy_budget import RhoZCDPBudget
from tmlt.analytics.query_builder import QueryBuilder
from tmlt.analytics.session import Session
# ConverterConfig and create_tumult_request are provided by the request adapter;
# private_data is the Spark DataFrame holding the private table.
# init a standard Tumult Session
session = Session.from_dataframe(
    privacy_budget=RhoZCDPBudget(float('inf')),  # budget is unused
    source_id="data",
    dataframe=private_data,
)
# build a Tumult query (the parentheses let the method chain span several lines)
query_expr = (
    QueryBuilder("data")
    .groupby(KeySet.from_dict({"A": ["0", "1"]}))
    .average("B", 0.0, 5.0)
)
# convert to a Cohere request
config = ConverterConfig(
    active_time_window=timedelta(weeks=12),
    allocation_interval=timedelta(weeks=1),
)
request = create_tumult_request(
    session=session,
    query_expr=query_expr,
    budget=RhoZCDPBudget(0.2),
    converter_config=config,
    population_dnf=None,
    population_prob=1.0,
    utility=5,
)
In an Opacus application, a PrivacyEngine wraps a standard PyTorch model, optimizer, and data loader, and initializes the privacy accounting parameters.
To obtain a Cohere request, we define a ConverterConfig and then call create_opacus_request(..) with the DP optimizer, the number of batches per epoch, and the number of epochs.
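One plausible reading of why the request needs the number of batches per epoch and the number of epochs: DP-SGD accounting is done per optimizer step, and Opacus derives its sampling rate as 1 / len(data_loader). The sketch below shows that arithmetic only. The helper dpsgd_schedule is hypothetical, the batch and epoch counts are made-up illustrative values, and none of this is taken from the adapter's implementation.

```python
def dpsgd_schedule(n_batches_per_epoch: int, epochs: int) -> tuple[int, float]:
    """Return (total optimizer steps, per-step sampling rate) for a DP-SGD run."""
    if n_batches_per_epoch <= 0 or epochs <= 0:
        raise ValueError("n_batches_per_epoch and epochs must be positive")
    total_steps = n_batches_per_epoch * epochs
    # Each step samples, in expectation, one batch out of the whole dataset.
    sample_rate = 1.0 / n_batches_per_epoch
    return total_steps, sample_rate


# Illustrative values only (assumption): 391 batches per epoch, 10 epochs.
total_steps, sample_rate = dpsgd_schedule(n_batches_per_epoch=391, epochs=10)
print(f"steps={total_steps}, sample_rate={sample_rate:.5f}")
```

Together with the noise multiplier stored on the DP optimizer, these two quantities are the usual inputs to an RDP accountant for subsampled Gaussian noise.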
from datetime import timedelta
from opacus import PrivacyEngine
# ConverterConfig and create_opacus_request are provided by the request adapter;
# model, optimizer, train_loader, and EPOCHS come from the surrounding training script.
# use Opacus to initialize DP model training
model, optimizer, train_loader = PrivacyEngine(accountant="rdp").make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=EPOCHS,
    target_epsilon=3.0,
    target_delta=1e-8,
    max_grad_norm=2.5,
)
# convert to a Cohere request
config = ConverterConfig(
    active_time_window=timedelta(weeks=12),
    allocation_interval=timedelta(weeks=1),
)
request = create_opacus_request(
    optimizer=optimizer,
    n_batches_per_epoch=len(train_loader),
    epochs=EPOCHS,
    converter_config=config,
    population_dnf=None,
    population_prob=1.0,
    utility=5,
)
The request adapter includes a test suite that checks the adapter's functionality.
- Download the adapter test data: Download (392 MB)
- Unarchive the file: adapter-testdata.zip
- Move the result folders to data:
  # the directory should look like:
  data/
  ├─ cifar10
  └─ dummy-netflix
- Execute the test suite:
  # takes ~37 mins
  poetry run pytest -vvv
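Because the test suite takes a long time to run, it can help to confirm the data layout first. The following is a minimal sketch of such a check. The helper missing_test_data is hypothetical and not part of the repository, and it assumes the data directory is resolved relative to the directory from which the tests are started.

```python
from pathlib import Path

# Folder names taken from the directory layout described above.
EXPECTED_DIRS = ("cifar10", "dummy-netflix")


def missing_test_data(data_root: Path) -> list[str]:
    """Return the expected test-data folders that are absent under data_root."""
    return [name for name in EXPECTED_DIRS if not (data_root / name).is_dir()]


# Assumption: the script is run from the directory that contains data/.
missing = missing_test_data(Path("data"))
if missing:
    print("missing test data folders:", ", ".join(missing))
else:
    print("test data layout looks complete")
```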