Merge pull request #3 from private-attribution/dp-start
Add some DP text
martinthomson authored Sep 12, 2024
2 parents 71f3dec + aaa0d00 commit 3debe2d
Showing 3 changed files with 356 additions and 4 deletions.
13 changes: 11 additions & 2 deletions Makefile
@@ -1,6 +1,8 @@
.PHONY: all venv clean images
.SUFFIXES: .bs .html

IMAGES := $(wildcard images/*.svg)

all: build/index.html

clean:
@@ -21,5 +23,12 @@ $(bikeshed): $(venv-marker) Makefile
build:
mkdir -p $@

build/index.html: api.bs $(IMAGES) build $(bikeshed)
$(bikeshed) --die-on=warning spec $< $@

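# Regenerate each SVG in place from the ASCII-art source that aasvg embeds in the file.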
images:
@echo "Regenerating images"
for i in $(IMAGES); do \
tmp="$$(mktemp)"; \
npx aasvg --extract --embed <"$$i" >"$$tmp" && mv "$$tmp" "$$i"; \
done
274 changes: 272 additions & 2 deletions api.bs
@@ -281,7 +281,6 @@ Open questions:
option of either? => via conversion site
* Epochs

TODO

## ListAggregationSystems API ## {#list-aggregation-systems-api}

@@ -445,20 +444,278 @@

TODO

## Anti-Replay Requirements ## {#anti-replay}

<!-- TODO link to definition of "conversion report" -->
Conversion reports generated by browsers are bound
to the amount of [=privacy budget=]
that was expended by the site that requested the report.

TODO
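
The requirements are not yet specified,
but the following sketch shows one way an [=aggregation service=]
might bound reuse of a report.
The report identifier, the per-report budget, and the in-memory store
are all hypothetical details, not part of this specification.

```python
# Hypothetical sketch: track how much of each conversion report's
# privacy budget has been consumed, so that replaying a report across
# queries cannot extract more than its bound allows.

class ReplayGuard:
    def __init__(self) -> None:
        # Maps a report's unique identifier to the epsilon spent on it so far.
        self._spent: dict[bytes, float] = {}

    def try_consume(self, report_id: bytes, query_epsilon: float,
                    report_budget: float) -> bool:
        """Permit a query only if the report's total budget is not exceeded."""
        spent = self._spent.get(report_id, 0.0)
        if spent + query_epsilon > report_budget:
            return False  # reject: this report is exhausted
        self._spent[report_id] = spent + query_epsilon
        return True
```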


# Differential Privacy # {#dp}

This design uses the concept of [=differential privacy=]
as the basis of its privacy design. [[PPA-DP]]

<dfn>Differential privacy</dfn> is a mathematical definition of privacy
that can guarantee the amount of private information
that is revealed by a system. [[DP]]
Differential privacy is not the only means
by which privacy is protected in this system,
but it is the most rigorously defined and analyzed.
As such, it provides the strongest privacy guarantees.

Differential privacy uses randomized noise
to hide private data contributions
to an aggregated dataset.
The effect of noise is to hide
individual contributions to the dataset,
but to retain the usefulness of any aggregated analysis.
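
As an illustration (not part of the specification),
noise drawn from a Laplace distribution can mask
whether any single contribution is present in a count:

```python
import numpy as np

def noisy_count(true_count: int, epsilon: float) -> float:
    # Laplace noise with scale 1/epsilon gives epsilon-DP for a count
    # that any single contribution changes by at most 1.
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Datasets that differ by one contribution produce statistically
# similar outputs, yet the aggregate remains accurate to within a few units.
print(noisy_count(10_000, epsilon=1.0))
print(noisy_count(10_001, epsilon=1.0))
```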

To apply differential privacy,
it is necessary to define what information is protected.
In this system, the protected information is
the [=impressions=] of a single user profile,
on a single user agent,
over a single week,
for a single website that registers [=conversions=].
[[#dp-unit]] describes the implications of this design
in more detail.

This attribution design uses a form of differential privacy
called <dfn>individual differential privacy</dfn>.
In this model, each user agent is separately responsible
for limiting the information
that it contributes.

The [=individual differential privacy=] design of this API
has three primary components:

1. User agents limit the number of times
that they use [=impressions=] in [=conversion reports=].
[[#dp-budget]] explores this in greater depth.

2. [=Aggregation services=] ensure that any given [=conversion report=] is
only used in accordance with the [=privacy budget=].
[[#anti-replay]] describes requirements on aggregation services
in more detail.

3. Noise is added by [=aggregation services=].
[[#dp-mechanism]] details the mechanisms that might be used.

Together, these measures place limits
on the information that is released for each [=privacy unit=].


## Privacy Unit ## {#dp-unit}

An implementation of differential privacy
requires a clear definition of what is protected.
This is known as the <dfn>privacy unit</dfn>,
which represents the entity that receives privacy protection.

This system adopts a [=privacy unit=]
that is the combination of three values:

1. A user agent profile.
That is, an instance of a user agent,
as used by a single person.

2. The [=site=] that requests information about impressions.

<p class=note>The sites that register impressions
are not considered.
Those sites do not receive information from this system directly.

3. The current week.

A change to any of these values produces a new privacy unit,
which results in a separate [=privacy budget=].
Each site that a person visits receives a bounded amount of information
for each week.
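
The following sketch shows how a user agent might key its budget state.
The names and the week numbering are illustrative, not normative:

```python
from dataclasses import dataclass

SECONDS_PER_WEEK = 7 * 24 * 60 * 60

@dataclass(frozen=True)
class PrivacyUnit:
    # The user agent profile is implicit: this state never leaves the instance.
    site: str  # the site that requests information about impressions
    week: int  # whole weeks since the Unix epoch

def privacy_unit(site: str, timestamp: float) -> PrivacyUnit:
    return PrivacyUnit(site=site, week=int(timestamp // SECONDS_PER_WEEK))
```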

Ideally, the [=privacy unit=] is a single person.
However, it is not possible to develop a useful system
that guarantees perfect correspondence with a person,
for a number of reasons:

* People use multiple browsers and multiple devices,
often without coordination.

* A unit that covered all websites
could be exhausted by one site,
denying other sites any information.

* Advertising is an ongoing activity.
Without renewing the [=privacy budget=] periodically,
sites could exhaust their budget permanently.


### Browser Instances ### {#dp-instance}

Each browser instance manages a separate [=privacy budget=].

Coordination between browser instances might be possible,
but it is not expected.
That coordination might allow privacy to be improved
by reducing the total amount of information that is released.
It might also improve the utility of attribution
by allowing impressions on one browser instance
to be converted on another.

Coordination across different implementations
is presently out of scope for this work.
Implementations can perform some coordination
between instances that are known to be for the same person,
but this is not mandatory.


### Per-Site Limits ### {#dp-site}

Information is released to websites on the basis of [=site=].
This aligns with the boundary used in other privacy-relevant functions.

A finer privacy unit, such as an [=origin=],
would make it trivial to obtain additional information.
Information about the same person could be gathered
from multiple origins.
That information could then be combined
by exploiting the free flow of information within the site,
using cookies [[COOKIES]] or similar.

[[#dp-safety]] discusses attacks that exploit this limit
and some additional [=safety limits=] that might be implemented
by user agents
to protect against those attacks.


### Refresh Interval ### {#dp-refresh}

The differential privacy budget available to a site
is refreshed at an interval of one week.

This budget applies to the [=impressions=]
that are registered with the user agent
and later queried,
not to [=conversions=].

From the perspective of the analysis [[PPA-DP]],
each week of impressions forms a separate database.
A finite number of queries can be made of each database,
as determined by the [=privacy budget=]
associated with that database.

The goal is to set an interval that is as long as feasible.
A longer interval allows for a better privacy/utility balance
because sites can be allocated a larger overall budget
at any point in time,
while keeping the overall rate of privacy loss low.
However, a longer interval also makes it easier to
exhaust a privacy budget completely,
yielding no information until the next refresh.

The choice of a week is largely arbitrary.
One week is expected to be enough to allow sites
to decide how to spend [=privacy budgets=]
without careful planning that needs to account for
changes that might occur days or weeks in the future.

[[#dp-budget]] describes the process for budgeting in more detail.


## Privacy Budgets ## {#dp-budget}

Browsers maintain a <dfn>privacy budget</dfn>,
which is a means of limiting the amount of privacy loss.

This specification uses an individual form
of (&epsilon;, &delta;)-differential privacy as its basis.
In this model, privacy loss is measured using the value &epsilon;.
The &delta; value is handled by the [=aggregation service=]
when adding noise to aggregates.
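
<p class=note>For reference [[DP]],
a mechanism M is (&epsilon;, &delta;)-differentially private
if, for every pair of databases D and D&prime;
that differ in the contributions of a single [=privacy unit=],
and every set S of possible outputs,
Pr[M(D) &isin; S] &le; e<sup>&epsilon;</sup> &sdot; Pr[M(D&prime;) &isin; S] + &delta;.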

Each user agent instance is responsible for
managing privacy budgets.

Each [=conversion report=] that is requested specifies an &epsilon; value
that represents the amount of privacy budget
that the report consumes.

When searching for impressions for the conversion report,
the user agent deducts the specified &epsilon; value from
the budget for the week in which those impressions fall.
If the privacy budget for that week is not sufficient,
the impressions from that week are not used.

<div class=example id=ex-budget>
In the following figure,
impressions are recorded from a number of different sites,
shown with circles.

<figure>
<pre class=include-raw>
path:images/budget.svg
</pre>
<figcaption>An example of a store of impressions over time</figcaption>
</figure>

A [=conversion report=] might be requested at the time marked with "now".
That conversion report selects impressions marked with black circles,
corresponding to impressions from Sites B, C, and E.

As a result, the querying site's privacy budget is reduced
for weeks 1, 3, 4, and 5.
No impressions were recorded for week 2,
so no budget is deducted from that week.
</div>
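
A minimal sketch of this per-week deduction follows.
The data structures and the default budget are hypothetical,
and everything except the budget check is elided:

```python
from collections import defaultdict

WEEKLY_BUDGET = 1.0  # illustrative; the actual value is a user agent choice

# budgets[(site, week)] is the remaining epsilon for that privacy unit.
budgets: defaultdict = defaultdict(lambda: WEEKLY_BUDGET)

def select_impressions(site: str, candidates_by_week: dict,
                       epsilon: float) -> list:
    """Return the impressions usable for a conversion report,
    charging each week's budget as it is consumed."""
    selected = []
    for week, impressions in candidates_by_week.items():
        if budgets[(site, week)] >= epsilon:
            budgets[(site, week)] -= epsilon
            selected.extend(impressions)
        # Weeks with insufficient budget contribute nothing; the site
        # cannot tell whether budget or matching caused the absence.
    return selected
```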


TODO


### Safety Limits ### {#dp-safety}

The basic [=privacy unit=] is vulnerable to attack
by an adversary that is able to correlate activity for the same person
across multiple [=sites=].

Groups of sites can sometimes coordinate their activity,
such as when they have shared ownership or strong agreements.
A group of sites that can be sure that a particular visitor is the same person--
using any means, including something like FedCM [[FEDCM]]--
can combine information gained from this API.

This can be used to increase the rate
at which a site gains information from attribution,
proportional to the number of sites
across which coordination occurs.
The default privacy unit places no limit on the information released
in this way.

To counteract this effect, user agents can implement <dfn>safety limits</dfn>,
which are additional privacy budgets that do not consider site.
Safety limits might be significantly higher than per-site budgets,
so that they are not reached for most normal browsing activity.
The goal would be to ensure that they only take effect
during intensive activity or an active attack.

Like the per-site privacy budget,
it is critical that sites be unable to determine
whether their request for a [=conversion report=] has caused
a safety limit to be exceeded.
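
Continuing the earlier budget sketch,
a safety limit might be tracked alongside the per-site budgets.
All names and values here are illustrative:

```python
SAFETY_LIMIT = 20.0  # illustrative; much larger than any per-site budget

safety_spent: dict = {}  # week -> epsilon spent across all sites

def charge(site: str, week: int, epsilon: float) -> bool:
    spent = safety_spent.get(week, 0.0)
    # Both the per-site budget and the cross-site safety limit must have
    # room; a failure of either is indistinguishable to the site.
    if budgets[(site, week)] < epsilon or spent + epsilon > SAFETY_LIMIT:
        return False
    budgets[(site, week)] -= epsilon
    safety_spent[week] = spent + epsilon
    return True
```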


## Differential Privacy Mechanisms ## {#dp-mechanism}

The specific mechanisms that are used
depend on the type of [=aggregation service=].
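
As one illustration (not a normative choice of mechanism),
an aggregation service might add Laplace noise to each histogram bucket,
calibrated to the query's &epsilon;
and to a bound on what a single [=privacy unit=] can contribute:

```python
import numpy as np

def release_histogram(buckets: np.ndarray, epsilon: float,
                      max_contribution: float) -> np.ndarray:
    # If each privacy unit changes at most one bucket by at most
    # max_contribution, Laplace noise with this scale gives epsilon-DP.
    scale = max_contribution / epsilon
    return buckets + np.random.laplace(0.0, scale, size=buckets.shape)
```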



# Security # {#security}

TODO
@@ -482,8 +739,21 @@ The broad shape of this level of the API is based on an idea from Luke Winstrom.
The privacy architecture is courtesy of the authors of [[PPA-DP]].


<pre class=link-defaults>
spec:html; type:dfn; text:site
</pre>
<pre class=biblio>
{
"dp": {
"authors": [
"Cynthia Dwork",
"Aaron Roth"
],
"date": "2014",
"href": "https://doi.org/10.1561/0400000042",
"title": "The Algorithmic Foundations of Differential Privacy",
"publisher": "now, Foundations and Trends in Theoretical Computer Science, Vol. 9, Nos. 3–4"
},
"ppa-dp": {
"authors": [
"Pierre Tholoniat",
