Self-service task provisioning and BYOHelper #1486
Once we have achieved consensus with a critical core of external partners, we will file issues across this project and https://github.com/divviup/divviup-api to track implementation of specific portions of this proposal.
tgeoghegan added the byohelper label (needed for BYOHelper and self-service task provisioning (#1486)) on Jun 16, 2023.
All the work for this has landed. Further tweaks to this design and their implementation are tracked elsewhere.
Goals
We want Divvi Up (DU) subscribers to be able to use the DU website or API to provision tasks in which DU acts as the DAP leader. This should require a single API operation, such that DU can guarantee that when the subscriber gets a successful API response and is provided task parameters, both aggregators are ready to process reports in the task.
We want subscribers to be able to use a Janus helper that they are running themselves, which is not previously known to Divvi Up (this last part is what we call BYOHelper; Bring Your Own Helper).
As much as possible, this solution should absolve subscribers of generating secrets. Further, secrets associated with tasks should be exposed to the minimum possible number of protocol participants.
We want all of this to be doable by the subscriber without manual intervention by the Divvi Up team (this is the "self-service" part).
This memo describes a plan for implementing this set of features in our control (`divviup-api`) and data (`janus`) planes. It builds upon but is not necessarily consistent with ideas and APIs described in the control plane design and inter-aggregator task provisioning documents.
Non-goals
Software components
Janus is Divvi Up's implementation of the Distributed Aggregation Protocol and constitutes the data plane for Divvi Up's deployment. Janus is open source and container images are published to public container registries. Divvi Up encourages partners to use Janus in their deployments, even if they're not using Divvi Up.
The Janus aggregator API is a set of endpoints implemented in Janus that allow managing DAP tasks. It is the interface between Divvi Up's control plane and data plane.
`divviup-api` implements the control plane for Divvi Up's deployment. It is responsible for providing a UI that allows subscribers to configure DAP tasks, and for using the Janus aggregator API to provision tasks into the data plane. While `divviup-api` is open source, Divvi Up does not recommend that external partners use it, as it makes several decisions tailored to our deployment that may not be suitable for other organizations (e.g., the use of Auth0 for user authentication), and does not provide configuration mechanisms to disable or replace the dependencies brought in by those decisions.
What do we have now?
`divviup-api` has a route `POST /accounts/:account_id/tasks` for creating tasks. The subscriber must specify a URL for the other aggregator's DAP API, and that aggregator is assumed to be configured with the task parameters out of band of any interaction with `divviup-api`. Divvi Up can be configured as either the leader or helper. The subscriber may specify all, some or none of the task's ID, VDAF verify key, aggregator auth token(s) and collector auth token(s). After some validation, `divviup-api` uses an aggregator API endpoint it's configured with to provision the task into the data plane.
The `janus` aggregator API has a route `POST /tasks` (note there is no account path parameter; the data plane has no notion of DU accounts or users). If any of the task ID, VDAF verify key, aggregator auth token or collector auth token were not specified in the request, they are generated by `janus` and then echoed back to the subscriber. The aggregator API is authenticated but private, exposed on a different TCP port than the DAP API, and is only ever used over a private network.
Solution proposal
To achieve our goals, we need two major changes to our API surface. First, `divviup-api` needs to be able to configure a task in a helper chosen by the subscriber, including generation of the necessary secrets. That means the helper needs to expose an API allowing task provisioning. The Janus aggregator API fits the bill quite nicely, and has the additional virtue of already being part of the Janus helper image that our subscribers are using. But to use that API for task provisioning, the subscriber would have to expose their helper's aggregator API to the internet so that our `divviup-api` can use it. We also need to establish trust between Divvi Up and the subscriber's helper. Concretely, that means `divviup-api` needs a bearer token so it can authenticate requests to the subscriber's helper.
That process, which I'm referring to as "pairing a helper" (analogously to pairing a computer with a Bluetooth device), is the second major API surface change. `divviup-api` has APIs and entities for users and accounts and tasks which belong to accounts. We need to allow subscribers to manage an aggregator resource to which task creation requests can refer. This aggregator entity will also be used to manage what we call "global" aggregators, which would be available to all Divvi Up subscribers without needing to run their own servers or do any pairing.
Beyond the task provisioning flow described in this document, the only additional provisioning flow we expect to support is "BYOLeader", where a subscriber uses the Divvi Up website to configure DU as a helper and their own aggregator as a leader. The current design does not support that flow, but this should be possible to do with minimal extensions.
On the other hand, we do propose to simplify the `divviup-api` and Janus aggregator API interfaces so that they only support the flow described in this document. Once this design is implemented, it will no longer be possible for subscribers to assume responsibility for provisioning tasks into either aggregator. The motivation is to reduce the code and API surface that Divvi Up has to maintain, since we're not aware of any subscribers who want to use Divvi Up in this manner. Readers should also keep in mind that since Divvi Up will eventually (hopefully) make global aggregators available to subscribers, this does not mean that all subscribers must run their own DAP aggregator to use Divvi Up.
Pairing a helper with Divvi Up
This sequence diagram illustrates how a subscriber would go about pairing a new helper with Divvi Up, making it available for use in many subsequently provisioned tasks. Subscribers will need to do this once per deployed helper (multiple replicas of a helper configured with the same parameters count as a single helper for this protocol).
diagram source
What changes do we need to make for this?
divviup-api
A new API resource and persistence entity: aggregators. An aggregator entity includes (but is not limited to):
Open question: Should DAP URL be in here, or should we allow the response to `POST /tasks` on the aggregator API to specify the DAP URL?
Open question: Are aggregators mutable? If so, then this needs an `updated_at` field and we need a `PATCH` API.
divviup/divviup-api#196
All aggregators, including Divvi Up itself, will be represented as aggregator entities. That means we will have to work out a means of bootstrapping Divvi Up's aggregator descriptor into `divviup-api` deployments.
divviup/divviup-api#198
`divviup-api` will expose the following API endpoints to manage aggregators:
POST /accounts/:account_id/aggregators
Creates a new aggregator in an account. The request must be authenticated with a valid token for `:account_id`. The body is a JSON object containing:
If the request is valid, `divviup-api` adds a row to its aggregators table for the new aggregator, marking it as belonging to `:account_id`. On success, the response is 201 Created. The response body is a JSON object containing the aggregator ID. Subsequently, requests to `POST /accounts/:account_id/tasks` may reference the aggregator ID.
POST /aggregators
Creates a new global aggregator, usable by any account. Only `divviup-api` administrator users may use this API. The request and response are identical to those of `POST /accounts/:account_id/aggregators`. n.b.: this interaction is not illustrated in any sequence diagram.
GET /accounts/:account_id/aggregators
Lists all the aggregators that this account may use in tasks, including the global aggregators. The request must be authenticated with a valid token for `:account_id`. The response is 200 OK and the body is a list of JSON objects containing:
Open question: Would a client ever want to get a list of just the account-specific aggregators, or just the global ones? If so, then we need a separate `GET /aggregators`, but I think the combined list will always be short enough that it's OK to return them all and make the client filter on the `global` boolean.
GET /aggregators/:aggregator_id
Queries for an existing aggregator's information. The response is 200 OK and the body is a single JSON object in the same format as those returned from `GET /accounts/:account_id/aggregators`.
DELETE /aggregators/:aggregator_id
Deactivates an aggregator. There is no request body. The response is 204 No Content. To delete an aggregator associated with an account, the client must present a valid token for that account. Only `divviup-api` administrators may delete a global aggregator.
Deactivating an aggregator means that no new tasks may be created referencing the aggregator. However, any existing tasks using that aggregator may continue to run until they expire.
divviup/divviup-api#197
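The deactivation semantics can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the `Aggregator` and `Registry` names, the `active` flag, and the use of `ValueError` are hypothetical and do not reflect `divviup-api`'s actual schema or error handling.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregator:
    # Hypothetical fields; the real entity carries more columns.
    id: str
    active: bool = True


@dataclass
class Registry:
    aggregators: dict = field(default_factory=dict)
    tasks: list = field(default_factory=list)  # (task_name, aggregator_id) pairs

    def add_aggregator(self, aggregator_id):
        self.aggregators[aggregator_id] = Aggregator(aggregator_id)

    def deactivate(self, aggregator_id):
        # Models DELETE /aggregators/:aggregator_id as a soft delete: the row
        # survives, so existing tasks keep a valid reference.
        self.aggregators[aggregator_id].active = False

    def create_task(self, name, aggregator_id):
        aggregator = self.aggregators.get(aggregator_id)
        if aggregator is None or not aggregator.active:
            raise ValueError("aggregator is unknown or deactivated")
        self.tasks.append((name, aggregator_id))


registry = Registry()
registry.add_aggregator("helper-1")
registry.create_task("task-a", "helper-1")
registry.deactivate("helper-1")
```

After `deactivate`, the existing `task-a` entry is untouched, while a further `create_task` call against `helper-1` is rejected.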
janus
Janus should not need code changes to support this. Partners running Janus will be instructed to configure their helper with `AGGREGATOR_API_AUTH_TOKENS` set to at least two tokens: one that they use themselves, and another that they will provide to `divviup-api` when pairing their aggregator. Then, they will have to expose their helper's aggregator API to the internet so that our `divviup-api` may access it.
Self-service task provisioning
This sequence diagram illustrates how a subscriber would go about provisioning a task into Divvi Up where DU acts as the DAP leader and some aggregator previously paired with DU acts as the DAP helper.
diagram source
What changes do we need to make for this?
divviup-api
The `POST /accounts/:account_id/tasks` endpoint needs to allow specifying an aggregator ID instead of `partner_url`, allowing `divviup-api` to look up the information of a previously paired aggregator. The handler also needs to reach out to the aggregator's aggregator API to provision the task in it. We will have to revisit the `NewTask` message definition to remove the fields for (task) `id`, `vdaf_verify_key`, `aggregator_auth_token` and `collector_auth_token`.
divviup/divviup-api#199
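As a rough sketch of the handler flow described above, the following models task creation with in-memory stand-ins for the two aggregator APIs. Everything here is an assumption made for illustration: the `FakeAggregatorApi` class, the `create_task` function, the request and response field names, the choice to generate secrets in the control plane, and the order in which the aggregators are provisioned are not taken from `divviup-api` or Janus.

```python
import secrets


class FakeAggregatorApi:
    """Stand-in for an aggregator API client; records provisioned tasks in memory."""

    def __init__(self):
        self.tasks = {}

    def post_task(self, task):
        self.tasks[task["task_id"]] = task
        return task


def create_task(aggregators, leader_api, request):
    # `aggregator_id` replaces `partner_url`: look up a previously paired helper.
    helper_api = aggregators.get(request["aggregator_id"])
    if helper_api is None:
        raise KeyError("unknown aggregator")
    # Assumption: the task ID and VDAF verify key are generated here, so the
    # subscriber never has to produce or handle them.
    task = {
        "task_id": secrets.token_urlsafe(32),
        "vdaf_verify_key": secrets.token_urlsafe(16),
        "min_batch_size": request["min_batch_size"],
    }
    # Provision both aggregators before reporting success to the subscriber.
    helper_api.post_task(dict(task))
    leader_api.post_task(dict(task))
    # Assumption: only non-secret parameters are returned to the subscriber.
    return {"task_id": task["task_id"], "min_batch_size": task["min_batch_size"]}


leader = FakeAggregatorApi()
helper = FakeAggregatorApi()
response = create_task(
    {"agg-1": helper},
    leader,
    {"aggregator_id": "agg-1", "min_batch_size": 100},
)
```

The point of the sketch is that a successful response implies both aggregators already hold the task, which is the guarantee stated in the goals.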
For resilience against transient errors, we should make sure `divviup-api`'s interfaces are idempotent. The challenge is establishing a useful idempotency key: in this proposal, the subscriber is no longer responsible for choosing any unique task parameter like task ID or VDAF verify key. In the near term, we can accept that when task creation fails, it may leave orphaned tasks in one or the other aggregator's database, which is not a significant concern because it only costs a single database row and it should be easy enough for something like Janus' garbage collector to identify and reap the orphaned tasks. In the longer term, we can implement support for an idempotency key in an HTTP header (see `draft-ietf-httpapi-idempotency-key-header` for one solution strategy) that would make this endpoint properly idempotent.
divviup/divviup-api#200
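To make the longer-term idea concrete, here is a minimal in-memory sketch of a handler that honors an `Idempotency-Key` request header (the header name used by the IETF draft). The `TaskEndpoint` class, its response shape, and the generated task IDs are hypothetical; a real implementation would also need persistence and key expiry, which are omitted here.

```python
import uuid


class TaskEndpoint:
    """In-memory model of a task-creation handler that honors an idempotency key."""

    def __init__(self):
        self.responses = {}  # idempotency key -> response already sent
        self.created = 0

    def post(self, headers, body):
        key = headers.get("Idempotency-Key")
        if key is not None and key in self.responses:
            # A retry of a request we already handled: replay the stored
            # response instead of creating a second task.
            return self.responses[key]
        self.created += 1
        response = {"status": 201, "task_id": f"task-{self.created}", "name": body["name"]}
        if key is not None:
            self.responses[key] = response
        return response


endpoint = TaskEndpoint()
# The client picks one key per logical request and reuses it on every retry.
headers = {"Idempotency-Key": str(uuid.uuid4())}
first = endpoint.post(headers, {"name": "example"})
retry = endpoint.post(headers, {"name": "example"})
```

Because the client chooses the key, this approach works even though the subscriber no longer picks a task ID or verify key.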
janus
Revisit the `PostTaskReq` message definition to (non-exhaustive list of necessary changes):
Revisit aggregator API's `PostTaskReq` for self-service task provisioning #1506
This proposal requires making some or all of the aggregator API publicly accessible. "Security considerations", below, discusses some of the implications of this for operators. As implementers, we will need to:
`divviup-api`, with a custom content format that incorporates a version. Aggregator API should be versioned #1475
Farther out, we can make some enhancements to the aggregator API to make it more resilient against transient failures. In particular, this means making sure all the API endpoints are idempotent. The aggregator API's `POST /tasks` endpoint lends itself quite naturally to idempotence using the VDAF verify key as the idempotence key.
#1507
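One way to picture using the VDAF verify key as the idempotence key is the sketch below. It assumes the caller supplies the verify key in the request; the `TaskStore` class, the `params` field, and the conflict behavior are hypothetical and are not Janus's actual `PostTaskReq` handling.

```python
class TaskStore:
    """In-memory model of POST /tasks deduplicated on the VDAF verify key."""

    def __init__(self):
        self.by_verify_key = {}

    def post_task(self, req):
        existing = self.by_verify_key.get(req["vdaf_verify_key"])
        if existing is not None:
            if existing["params"] != req["params"]:
                # Same key with different parameters is a conflict, not a retry.
                raise ValueError("verify key already used with different parameters")
            return existing
        task = {"task_id": f"task-{len(self.by_verify_key) + 1}", **req}
        self.by_verify_key[req["vdaf_verify_key"]] = task
        return task


store = TaskStore()
req = {"vdaf_verify_key": "k1", "params": {"min_batch_size": 100}}
created = store.post_task(req)
replayed = store.post_task(dict(req))
```

Unlike a client-chosen header, this key is already part of the task definition, so no extra request metadata is needed.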
Security considerations
A taxonomy of tokens
This proposal discusses the provisioning and generation of a variety of authentication credentials. It's worth taking some time to disambiguate the different tokens in play across `divviup-api`, Janus' DAP API and Janus' aggregator API.
divviup-api
Currently, subscribers can only authenticate to `divviup-api` via Auth0, and then must present a Divvi Up session cookie with subsequent requests. This proposal doesn't require any further authentication schemes for `divviup-api`, but we might consider enabling DU users to mint bearer tokens so that subscribers can programmatically manage tasks.
Janus aggregator API
Currently, the only client of the Janus aggregator API is `divviup-api`. It authenticates to the aggregator API using a bearer token, which is wired into either side using environment variables backed by Kubernetes secrets. This credential is not bound to any DAP task, because it authorizes its bearer to use the entire aggregator API to manage tasks.
Janus DAP API credentials
Each DAP task provisioned into Janus has one or more aggregator auth tokens and one or more collector auth tokens. Aggregator auth tokens are used by a Janus leader when making requests to the helper, or used by a Janus helper to authenticate incoming requests from the leader. Collector auth tokens are used by a Janus leader to authenticate incoming requests from the collector. Both aggregator and collector auth tokens are scoped to a task.
Putting Janus aggregator API on the internet
This proposal requires partner organizations running Janus helpers to expose API surface beyond the data plane DAP API to the internet. This is mitigated by serving that API over TLS and requiring clients to authenticate with a bearer token.
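The bearer token check mentioned above could look roughly like the following. This is a sketch, not Janus's implementation: the `parse_tokens` and `is_authorized` helpers are hypothetical, and the comma-separated token format is an assumption about how several tokens might be passed in one configuration value.

```python
import hmac


def parse_tokens(raw):
    # Assumption: tokens arrive as one comma-separated string, as one might
    # pass them through an environment variable.
    return [t.strip() for t in raw.split(",") if t.strip()]


def is_authorized(authorization_header, valid_tokens):
    scheme, _, presented = authorization_header.partition(" ")
    if scheme.lower() != "bearer" or not presented:
        return False
    # Check every configured token without short-circuiting, using a
    # constant-time comparison so timing does not reveal partial matches.
    ok = False
    for token in valid_tokens:
        ok |= hmac.compare_digest(presented.encode(), token.encode())
    return ok


tokens = parse_tokens("operator-token, divviup-token")
```

With two tokens configured, the operator and Divvi Up each authenticate with their own credential, and either can be rotated without affecting the other.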
The goal of this proposal is to make task provisioning easier for Divvi Up partners, both in the sense of reducing how much code they have to write and the operational work they take on. If the risk is unacceptable to some partner, they are free to take on provisioning tasks in both aggregators (see "Alternatives considered" for some discussion).
Scope of aggregator API auth tokens
Tokens provided to Janus' `AGGREGATOR_API_AUTH_TOKENS` configuration value are all treated equivalently, meaning they all have complete access to the entire aggregator API. This means `divviup-api` will have access to the partner's entire aggregator API.
This presents a risk if the partner's aggregator is itself being used with multiple partners (that is, if the helper is multi-tenant): Divvi Up would be able to enumerate and delete tasks that don't belong to it, because Janus and its aggregator API have no notion of users or accounts or who owns a task.
This should not be a security risk for the Mozilla or Horizontal deployments, since they are exclusively partnering with Divvi Up for the time being. It can also be mitigated by the partner by blocking anything but the `POST /tasks` route.
In the future, we can mitigate this further by introducing a richer representation of tokens and ACLs into Janus, enabling Janus operators to mint tokens that e.g. exclusively permit adding tasks.
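The interim mitigation of blocking everything except `POST /tasks` amounts to a route allowlist, which a partner might enforce in a reverse proxy or similar component in front of the aggregator API. The sketch below shows only the decision logic; the `ALLOWED` set and `allow` function are hypothetical, and no particular proxy is implied.

```python
# Routes that externally originated requests may reach; everything else is denied.
ALLOWED = {("POST", "/tasks")}


def allow(method, path):
    # Normalize a trailing slash so "/tasks/" is treated like "/tasks".
    normalized = path.rstrip("/") or "/"
    return (method.upper(), normalized) in ALLOWED
```

With this filter, a caller holding the paired token can still add tasks, but listing or deleting tasks from outside is refused regardless of the token presented.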
Subscriber verification of task parameters
In this proposal, the subscriber delegates to `divviup-api` the ability to create tasks. This introduces the risk that `divviup-api` could tamper with the subscriber's task parameters to attack privacy, for example by setting the task's minimum batch size to 1.
The subscriber mitigates this risk by querying their helper for the newly created task's parameters after `divviup-api` reports it has been successfully provisioned. They do this with the `GET /tasks/:task_id` endpoint, which is already exposed by the aggregator API. If the returned parameters match what the subscriber sent to `divviup-api`, then the subscriber can proceed with distributing task configuration to its clients and start uploading reports. If they don't match, the subscriber can abort and call their lawyers to find out why Divvi Up is cheating. This interaction is illustrated in the "self-service task provisioning" sequence diagram, above.
Alternatives considered
Make subscriber responsible for provisioning task into both aggregators
One attempt to describe this provisioning flow is in the inter-aggregator task provisioning document.
I think this is worse than making `divviup-api` manage provisioning tasks in both aggregators for these reasons:
`divviup-api` and the `janus` aggregator API, then we control all the code involved and I believe the solution can be deployed more quickly and smoothly.
Token API in Janus aggregator API
The aggregator API could mint bearer tokens that can later be used to authenticate requests to the aggregator API (like `POST /tasks`).
`POST /tokens` creates a new token. There is no request body. On creation, the new token will be stored in the Janus Postgres database, in a new table akin to `task_aggregator_auth_tokens`. The response is 201 Created and the body is a JSON object containing:
The new token can later be used to authenticate requests to the aggregator API (like `POST /tasks`) but not to make new tokens. This has the following implications for aggregator API request authentication:
We decided against this because we can instead require helper operators to supply multiple aggregator API auth tokens to their helpers via `AGGREGATOR_API_AUTH_TOKENS`, which will require no (or less) new code.