Self-service task provisioning and BYOHelper #1486

Closed
tgeoghegan opened this issue Jun 12, 2023 · 2 comments
Labels
byohelper needed for BYOHelper and self-service task provisioning (#1486)

Comments

@tgeoghegan
Contributor

tgeoghegan commented Jun 12, 2023

Goals

We want Divvi Up (DU) subscribers to be able to use the DU website or API to provision tasks in which DU acts as the DAP leader. This should require a single API operation, such that DU can guarantee that when the subscriber gets a successful API response and is provided task parameters, both aggregators are ready to process reports in the task.

We want subscribers to be able to use a Janus helper that they are running themselves, which is not previously known to Divvi Up (this last part is what we call BYOHelper; Bring Your Own Helper).

As much as possible, this solution should absolve subscribers of generating secrets. Further, secrets associated with tasks should be exposed to the minimum possible number of protocol participants.

We want all of this to be doable by the subscriber without manual intervention by the Divvi Up team (this is the "self-service" part).

This memo describes a plan for implementing this set of features in our control (divviup-api) and data (janus) planes. It builds upon but is not necessarily consistent with ideas and APIs described in the control plane design and inter-aggregator task provisioning documents.

Non-goals

  • This proposal does not address the case where Divvi Up runs a DAP helper.
  • This proposal assumes that helpers brought by subscribers are running Janus, which means we can assume they have the Janus aggregator API. In the future, we could write a specification for the aggregator API that any helper implementation can conform to.

Software components

Janus is Divvi Up's implementation of the Distributed Aggregation Protocol and constitutes the data plane for Divvi Up's deployment. Janus is open source and container images are published to public container registries. Divvi Up encourages partners to use Janus in their deployments, even if they're not using Divvi Up.

The Janus aggregator API is a set of endpoints implemented in Janus that allow managing DAP tasks. It is the interface between Divvi Up's control plane and data plane.

divviup-api implements the control plane for Divvi Up's deployment. It is responsible for providing a UI that allows subscribers to configure DAP tasks, and is responsible for using the Janus aggregator API to provision tasks into the data plane. While divviup-api is open source, Divvi Up does not recommend that external partners use it, as it makes several decisions tailored to our deployment that may not be suitable for other organizations (e.g., the use of Auth0 for user authentication), and does not provide configuration mechanisms to disable or replace the dependencies brought in by those decisions.

What do we have now?

divviup-api has a route POST /accounts/:account_id/tasks for creating tasks. The subscriber must specify a URL for the other aggregator's DAP API, and that aggregator is assumed to be configured with the task parameters out of band of any interaction with divviup-api. Divvi Up can be configured as either the leader or helper. The subscriber may specify all, some or none of the task's ID, VDAF verify key, aggregator auth token(s) and collector auth token(s). After some validation, divviup-api uses an aggregator API endpoint it's configured with to provision the task into the data plane.

The janus aggregator API has a route POST /tasks (note there is no account path parameter; the data plane has no notion of DU accounts or users). If any of the task ID, VDAF verify key, aggregator auth token or collector auth token were not specified in the request, they are generated by janus and then echoed back to the subscriber. The aggregator API is authenticated but private, exposed on a different TCP port than the DAP API, and is only ever used over a private network.

Solution proposal

To achieve our goals, we need two major changes to our API surface. First, divviup-api needs to be able to configure a task in a helper chosen by the subscriber, including generation of the necessary secrets. That means the helper needs to expose an API allowing task provisioning. The Janus aggregator API fits the bill quite nicely, and has the additional virtue of already being part of the Janus helper image that our subscribers are using. But to use that API for task provisioning, the subscriber would have to expose their helper's aggregator API to the internet so that our divviup-api can use it. We also need to establish trust between Divvi Up and the subscriber's helper. Concretely, that means divviup-api needs a bearer token so it can authenticate requests to the subscriber's helper.

That process, which I'm referring to as "pairing a helper" (analogously to pairing a computer with a Bluetooth device), is the second major API surface change. divviup-api has APIs and entities for users and accounts and tasks which belong to accounts. We need to allow subscribers to manage an aggregator resource to which task creation requests can refer. This aggregator entity will also be used to manage what we call "global" aggregators, which would be available to all Divvi Up subscribers without needing to run their own servers or do any pairing.

Beyond the task provisioning flow described in this document, the only additional provisioning flow we expect to support is "BYOLeader", where a subscriber uses the Divvi Up website to configure DU as a helper and their own aggregator as a leader. The current design does not support that flow, but this should be possible to do with minimal extensions.

On the other hand, we do propose to simplify the divviup-api and Janus aggregator API interfaces so that they only support the flow described in this document. Once this design is implemented, it will no longer be possible for subscribers to assume responsibility for provisioning tasks into either aggregator. The motivation is to reduce the code and API surface that Divvi Up has to maintain, since we're not aware of any subscribers who want to use Divvi Up in this manner. Readers should also keep in mind that since Divvi Up will eventually (hopefully) make global aggregators available to subscribers, this does not mean that all subscribers must run their own DAP aggregator to use Divvi Up.

Pairing a helper with Divvi Up

This sequence diagram illustrates how a subscriber would go about pairing a new helper with Divvi Up, making it available for use in many subsequently provisioned tasks. Subscribers will need to do this once per deployed helper (multiple replicas of a helper configured with the same parameters count as a single helper for this protocol).

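[Image: sequence diagram of a subscriber pairing a helper with Divvi Up; the original issue links to the diagram source.]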

What changes do we need to make for this?

divviup-api

A new API resource and persistence entity: aggregators. An aggregator entity includes (but is not limited to):

struct Aggregator {
  /// Unique identifier for the aggregator.
  id: Uuid,
  /// URL at which this aggregator's aggregator API may be accessed.
  api_url: Url,
  /// Bearer token for authenticating requests to this aggregator's aggregator API.
  bearer_token: Vec<u8>,
  /// URL at which this aggregator's DAP API may be accessed.
  dap_url: Url,
  /// Display name for this aggregator. Used in Divvi Up website.
  name: String,
  /// The Divvi Up subscriber account this aggregator may belong to. If `Some`, then only tasks in
  /// this account may use this aggregator. If `None`, then the aggregator is global and may be used
  /// by all tasks.
  ///
  /// An aggregator may belong to exactly one account. If two different accounts wish to use the
  /// same aggregator, they will separately pair that aggregator with their accounts.
  account_id: Option<Uuid>,
  /// Date at which this aggregator was created.
  created_at: OffsetDateTime,
}

Open question: Should DAP URL be in here, or should we allow the response to POST /tasks on the aggregator API to specify the DAP URL?
Open question: Are aggregators mutable? If so, then this needs an updated_at field and we need a PATCH API.

divviup/divviup-api#196

All aggregators, including Divvi Up itself, will be represented as aggregator entities. That means we will have to work out a means of bootstrapping Divvi Up's aggregator descriptor into divviup-api deployments.

divviup/divviup-api#198

The divviup-api will expose the following API endpoints to manage aggregators:

POST /accounts/:account_id/aggregators

Creates a new aggregator in an account. The request must be authenticated with a valid token for :account_id. The body is a JSON object containing:

  • The display name for the aggregator;
  • The aggregator API URL;
  • A bearer token for authenticating requests to the aggregator API;
  • The DAP API URL.

If the request is valid, divviup-api adds a row to its aggregators table for the new aggregator, marking it as belonging to :account_id. On success, the response is 201 Created. The response body is a JSON object containing the aggregator ID. Subsequently, requests to POST /accounts/:account_id/tasks may reference the aggregator ID.
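
The pairing request handling described above can be sketched as follows. This is a minimal in-memory illustration, not divviup-api's actual code: the type and field names, the use of `u64` in place of UUIDs, and the rule that both URLs must use HTTPS are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical JSON body of POST /accounts/:account_id/aggregators.
#[derive(Debug, Clone)]
pub struct NewAggregator {
    pub name: String,
    pub api_url: String,
    pub bearer_token: String,
    pub dap_url: String,
}

/// One row of the aggregators table (kept in memory for this sketch).
#[derive(Debug, Clone)]
pub struct AggregatorRow {
    /// `Some(account)` for a paired aggregator, `None` for a global one.
    pub account_id: Option<u64>,
    pub fields: NewAggregator,
}

#[derive(Debug, PartialEq)]
pub enum PairError {
    EmptyName,
    EmptyBearerToken,
    /// Carries the name of the offending field.
    InvalidUrl(&'static str),
}

/// Assumed validation rule: a URL must start with "https://" and have a remainder.
fn is_https_url(url: &str) -> bool {
    url.strip_prefix("https://").map_or(false, |rest| !rest.is_empty())
}

#[derive(Default)]
pub struct AggregatorTable {
    next_id: u64,
    rows: HashMap<u64, AggregatorRow>,
}

impl AggregatorTable {
    /// Validates the request, stores a row owned by `account_id`, and returns
    /// the new aggregator ID (the value a 201 Created response would carry).
    pub fn pair(&mut self, account_id: u64, req: NewAggregator) -> Result<u64, PairError> {
        if req.name.trim().is_empty() {
            return Err(PairError::EmptyName);
        }
        if req.bearer_token.is_empty() {
            return Err(PairError::EmptyBearerToken);
        }
        if !is_https_url(&req.api_url) {
            return Err(PairError::InvalidUrl("api_url"));
        }
        if !is_https_url(&req.dap_url) {
            return Err(PairError::InvalidUrl("dap_url"));
        }
        let id = self.next_id;
        self.next_id += 1;
        self.rows.insert(id, AggregatorRow { account_id: Some(account_id), fields: req });
        Ok(id)
    }

    /// Returns the stored row for an aggregator ID, if any.
    pub fn get(&self, id: u64) -> Option<&AggregatorRow> {
        self.rows.get(&id)
    }
}
```

A later task creation request would pass the returned ID, and the handler would use `get` to find the helper's aggregator API URL and bearer token.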

POST /aggregators

Creates a new global aggregator, usable by any account. Only divviup-api administrator users may use this API. The request and responses are identical to POST /accounts/:account_id/aggregators. n.b.: this interaction is not illustrated in any sequence diagram.

GET /accounts/:account_id/aggregators

Lists all the aggregators that this account may use in tasks, including the global aggregators. The request must be authenticated with a valid token for :account_id. The response is 200 OK and the body is a list of JSON objects containing:

  • Aggregator ID;
  • Aggregator API URL;
  • Bearer token for authenticating to the aggregator API;
  • DAP API URL;
  • Boolean indicating whether the aggregator is global;
  • Aggregator display name.

Open question: Would a client ever want to get a list of just the account-specific aggregators, or just the global ones? If so, then we need a separate GET /aggregators, but I think the combined list will always be short enough that it's OK to return them all and make the client filter on the global boolean.
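
To make the combined listing concrete, here is a small sketch of how the handler could select the account's own aggregators plus the global ones and derive the global boolean from `account_id`. The type names and the `u64` identifiers are stand-ins chosen for the example, not the real divviup-api types.

```rust
/// Simplified aggregator record; `account_id` is `None` for global aggregators.
#[derive(Debug, Clone, PartialEq)]
pub struct Aggregator {
    pub id: u64,
    pub name: String,
    pub account_id: Option<u64>,
}

/// Hypothetical element of the GET /accounts/:account_id/aggregators response.
#[derive(Debug, Clone, PartialEq)]
pub struct AggregatorListing {
    pub id: u64,
    pub name: String,
    pub is_global: bool,
}

/// Returns every aggregator the account may use: its own and all global ones.
pub fn list_for_account(all: &[Aggregator], account_id: u64) -> Vec<AggregatorListing> {
    all.iter()
        // Keep global aggregators (no owner) and those owned by this account.
        .filter(|a| a.account_id.map_or(true, |owner| owner == account_id))
        .map(|a| AggregatorListing {
            id: a.id,
            name: a.name.clone(),
            is_global: a.account_id.is_none(),
        })
        .collect()
}

/// Client-side filtering on the boolean, as suggested in the open question.
pub fn only_account_specific(listing: &[AggregatorListing]) -> Vec<u64> {
    listing.iter().filter(|l| !l.is_global).map(|l| l.id).collect()
}
```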

GET /aggregators/:aggregator_id

Queries for an existing aggregator's information. The response is 200 OK and the body is a single JSON object in the same format as the list elements returned from GET /accounts/:account_id/aggregators.

DELETE /aggregators/:aggregator_id

Deactivates an aggregator. There is no request body. The response is 204 No Content. To delete an aggregator associated with an account, the client must present a valid token for that account. Only divviup-api administrators may delete a global aggregator.

Deactivating an aggregator means that no new tasks may be created referencing the aggregator. However any existing tasks using that aggregator may continue to run until they expire.
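
The deactivation semantics can be illustrated with a short sketch: the aggregator row is kept but flagged, new tasks referencing it are rejected, and existing tasks are left alone. The `active` flag, the error names and the in-memory storage are assumptions made for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum TaskError {
    UnknownAggregator,
    AggregatorDeactivated,
}

pub struct Aggregator {
    pub active: bool,
}

#[derive(Default)]
pub struct ControlPlane {
    aggregators: HashMap<u64, Aggregator>,
    /// Pairs of (task ID, helper aggregator ID).
    tasks: Vec<(u64, u64)>,
}

impl ControlPlane {
    pub fn add_aggregator(&mut self, id: u64) {
        self.aggregators.insert(id, Aggregator { active: true });
    }

    /// DELETE /aggregators/:aggregator_id: flags the aggregator instead of
    /// removing it, so tasks that already use it are not affected.
    /// Returns false if the aggregator ID is unknown.
    pub fn deactivate(&mut self, id: u64) -> bool {
        match self.aggregators.get_mut(&id) {
            Some(aggregator) => {
                aggregator.active = false;
                true
            }
            None => false,
        }
    }

    /// Task creation only succeeds for known, active aggregators.
    pub fn create_task(&mut self, task_id: u64, aggregator_id: u64) -> Result<(), TaskError> {
        let aggregator = self
            .aggregators
            .get(&aggregator_id)
            .ok_or(TaskError::UnknownAggregator)?;
        if !aggregator.active {
            return Err(TaskError::AggregatorDeactivated);
        }
        self.tasks.push((task_id, aggregator_id));
        Ok(())
    }

    /// Counts the tasks that still reference the given aggregator.
    pub fn tasks_using(&self, aggregator_id: u64) -> usize {
        self.tasks.iter().filter(|(_, a)| *a == aggregator_id).count()
    }
}
```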

divviup/divviup-api#197

janus

Janus should not need code changes to support this. Partners running Janus will be instructed to configure their helper with AGGREGATOR_API_AUTH_TOKENS set to at least two tokens: one that they use themselves, and another that they will provide to divviup-api when pairing their aggregator. Then, they will have to expose their helper's aggregator API to the internet so that our divviup-api may access it.
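
As an illustration of why multiple tokens are enough for pairing, the sketch below checks a presented bearer token against a list of configured tokens, all of which are treated as equivalent. It is not Janus code: the comma-separated format of the configuration value and the function names are assumptions, and the comparison is only a best-effort constant-time check (a real deployment would use a vetted library).

```rust
/// Splits a raw configuration value into tokens.
/// The comma separator is an assumption made for this sketch.
pub fn parse_tokens(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .map(String::from)
        .collect()
}

/// Compares two byte strings without stopping at the first differing byte.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Accepts a request if its Authorization header carries any configured token.
pub fn is_authorized(configured: &[String], authorization_header: &str) -> bool {
    let Some(presented) = authorization_header.strip_prefix("Bearer ") else {
        return false;
    };
    // Check every configured token rather than returning on the first match.
    configured.iter().fold(false, |ok, token| {
        ok | constant_time_eq(token.as_bytes(), presented.as_bytes())
    })
}
```

With two tokens configured, the operator keeps one for their own use and hands the other to divviup-api during pairing; either one authorizes the same set of operations.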

Self-service task provisioning

This sequence diagram illustrates how a subscriber would go about provisioning a task into Divvi Up where DU acts as the DAP leader and some aggregator previously paired with DU acts as the DAP helper.

[Image: sequence diagram of self-service task provisioning; the original issue links to the diagram source.]

What changes do we need to make for this?

divviup-api

The POST /accounts/:account_id/tasks route needs to allow specifying an aggregator ID instead of partner_url, allowing divviup-api to look up the information of a previously paired aggregator. The handler also needs to reach out to the aggregator's aggregator API to provision the task in it. We will have to revisit the NewTask message definition to remove the fields for (task) id, vdaf_verify_key, aggregator_auth_token and collector_auth_token.

divviup/divviup-api#199

For resilience against transient errors, we should make sure divviup-api's interfaces are idempotent. The challenge is establishing a useful idempotency key: in this proposal, the subscriber is no longer responsible for choosing any unique task parameter like task ID or VDAF verify key. In the near term, we can accept that when task creation fails, it may leave orphaned tasks in one or the other aggregator's database, which is not a significant concern because it only costs a single database row and it should be easy enough for something like Janus' garbage collector to identify and reap the orphaned tasks. In the longer term, we can implement support for an idempotency key in an HTTP header (see draft-ietf-httpapi-idempotency-key-header for one solution strategy) that would make this endpoint properly idempotent.
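
The longer-term idempotency key approach can be sketched as a replay cache: the first request with a given key creates the task and records the response, and a retry with the same key gets the recorded response back without creating anything. The names here are hypothetical and the storage is in memory. A real implementation would also need to persist keys and handle requests that are still in flight, which this sketch ignores.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct TaskResponse {
    pub task_id: u64,
}

#[derive(Default)]
pub struct TaskCreator {
    next_task_id: u64,
    /// Responses already returned, indexed by the client's idempotency key.
    completed: HashMap<String, TaskResponse>,
    /// Number of tasks actually provisioned, for illustration.
    pub tasks_created: usize,
}

impl TaskCreator {
    /// Creates a task at most once per idempotency key.
    pub fn create_task(&mut self, idempotency_key: &str) -> TaskResponse {
        // A retry with a known key replays the earlier response.
        if let Some(previous) = self.completed.get(idempotency_key) {
            return previous.clone();
        }
        let response = TaskResponse { task_id: self.next_task_id };
        self.next_task_id += 1;
        self.tasks_created += 1;
        self.completed.insert(idempotency_key.to_string(), response.clone());
        response
    }
}
```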

divviup/divviup-api#200

janus

Revisit the PostTaskReq message definition to (non-exhaustive list of necessary changes):

This proposal requires making some or all of the aggregator API publicly accessible. "Security considerations", below, discusses some of the implications of this for operators. As implementers, we will need to:

Farther out, we can make some enhancements to the aggregator API to make it more resilient against transient failures. In particular, this means making sure all the API endpoints are idempotent. The aggregator API's POST /tasks endpoint lends itself quite naturally to idempotence using the VDAF verify key as the idempotence key.

#1507

Security considerations

A taxonomy of tokens

This proposal discusses the provisioning and generation of a variety of authentication credentials. It's worth taking some time to disambiguate the different tokens in play across divviup-api, Janus' DAP API and Janus' aggregator API.

divviup-api

Currently, subscribers can only authenticate to divviup-api via Auth0, and then must present a Divvi Up session cookie with subsequent requests. This proposal doesn't require any further authentication schemes for divviup-api, but we might consider enabling DU users to mint bearer tokens so that subscribers can programmatically manage tasks.

Janus aggregator API

Currently, the only client of the Janus aggregator API is divviup-api. It authenticates to the aggregator API using a bearer token, which is wired into either side using environment variables backed by Kubernetes secrets. This credential is not bound to any DAP task, because it authorizes its bearer to use the entire aggregator API to manage tasks.

Janus DAP API credentials

Each DAP task provisioned into Janus has one or more aggregator auth tokens and one or more collector auth tokens. Aggregator auth tokens are used by a Janus leader when making requests to the helper, or used by a Janus helper to authenticate incoming requests from the leader. Collector auth tokens are used by a Janus leader to authenticate incoming requests from the collector. Both aggregator and collector auth tokens are scoped to a task.

Putting Janus aggregator API on the internet

This proposal requires partner organizations running Janus helpers to expose API surface beyond the data plane DAP API to the internet. This is mitigated by serving that API over TLS and requiring clients to authenticate with a bearer token.

The goal of this proposal is to make task provisioning easier for Divvi Up partners, both in the sense of reducing how much code they have to write and the operational work they take on. If the risk is unacceptable to some partner, they are free to take on provisioning tasks in both aggregators (see "Alternatives considered" for some discussion).

Scope of aggregator API auth tokens

Tokens provided to Janus' AGGREGATOR_API_AUTH_TOKENS configuration value are all treated equivalently, meaning they all have complete access to the entire aggregator API. This means divviup-api will have access to the partner's entire aggregator API.

This presents a risk if the partner's aggregator is itself being used with multiple partners (that is, if the helper is multi-tenant): Divvi Up would be able to enumerate and delete tasks that don't belong to it, because Janus and its aggregator API have no notion of users or accounts or who owns a task.

This should not be a security risk for the Mozilla or Horizontal deployments, since they are exclusively partnering with Divvi Up for the time being. It can also be mitigated by the partner by blocking anything but the POST /tasks route.
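
The route-blocking mitigation amounts to an allowlist applied to requests that reach the aggregator API from the internet. A minimal sketch of such a check follows. The function name is hypothetical, and in practice a partner would more likely express this rule in their reverse proxy or load balancer configuration than in code.

```rust
/// Returns true only for POST /tasks, the one route divviup-api needs in order
/// to provision tasks. Intended for the internet-facing listener only; the
/// operator's own requests are assumed to arrive by another path.
pub fn is_allowed_from_internet(method: &str, path: &str) -> bool {
    // Ignore any query string before matching the path.
    let path = path.split('?').next().unwrap_or(path);
    method == "POST" && path == "/tasks"
}
```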

In the future, we can mitigate this further by introducing a richer representation of tokens and ACLs into Janus, enabling Janus operators to mint tokens that e.g. exclusively permit adding tasks.
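
One possible shape for that richer representation is a token that carries an explicit permission set. The sketch below is purely illustrative: nothing like it exists in Janus per this memo, every name in it is made up, and the plain string comparison would need to be replaced by a constant-time one in real code.

```rust
/// Hypothetical operations an aggregator API token could be allowed to perform.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Permission {
    CreateTask,
    ReadTask,
    DeleteTask,
}

/// Hypothetical token with an access control list attached.
pub struct ScopedToken {
    pub token: String,
    pub permissions: Vec<Permission>,
}

/// Returns true if the presented token is known and grants `needed`.
pub fn permits(tokens: &[ScopedToken], presented: &str, needed: Permission) -> bool {
    tokens
        .iter()
        .any(|t| t.token == presented && t.permissions.contains(&needed))
}
```

Under this scheme the operator could give divviup-api a token limited to `CreateTask` while keeping a fully privileged token for themselves.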

Subscriber verification of task parameters

In this proposal, the subscriber delegates to divviup-api the ability to create tasks. This introduces the risk that divviup-api could tamper with the subscriber's task parameters to attack privacy, for example by setting the task's minimum batch size to 1.

The subscriber mitigates this risk by querying their helper for the newly created task's parameters after divviup-api reports it has been successfully provisioned. They do this with the GET /tasks/:task_id endpoint, which is already exposed by the aggregator API. If the returned parameters match what the subscriber sent to divviup-api, then the subscriber can proceed with distributing task configuration to its clients and start uploading reports. If they don't match, the subscriber can abort and call their lawyers to find out why Divvi Up is cheating. This interaction is illustrated in the "self-service task provisioning" sequence diagram, above.
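
The subscriber-side check can be sketched as a field-by-field comparison between what was sent to divviup-api and what the helper reports from GET /tasks/:task_id. The parameter set shown here is an assumption for the example (only the minimum batch size is named in this memo); a real check would cover every privacy-relevant task parameter.

```rust
/// Simplified, hypothetical subset of a task's parameters.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskParams {
    pub vdaf: String,
    pub min_batch_size: u64,
    pub time_precision_secs: u64,
}

/// Returns the names of all fields where the helper's view of the task differs
/// from what the subscriber requested. An empty result means the task matches.
pub fn mismatched_fields(requested: &TaskParams, reported: &TaskParams) -> Vec<&'static str> {
    let mut mismatches = Vec::new();
    if requested.vdaf != reported.vdaf {
        mismatches.push("vdaf");
    }
    if requested.min_batch_size != reported.min_batch_size {
        mismatches.push("min_batch_size");
    }
    if requested.time_precision_secs != reported.time_precision_secs {
        mismatches.push("time_precision_secs");
    }
    mismatches
}
```

The subscriber would only distribute the task configuration to clients when the returned list is empty.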

Alternatives considered

Make subscriber responsible for provisioning task into both aggregators

One attempt to describe this provisioning flow is in the inter-aggregator task provisioning document.

I think this is worse than making divviup-api manage provisioning tasks in both aggregators for these reasons:

  • It makes it harder to handle the VDAF verify key. Either you have to make the subscriber choose the VDAF verify key so it can share it with both aggregators, or you have to implement a scheme by which the two aggregators can negotiate the VDAF verify key on their own. The former is bad because we should avoid exposing the VDAF verify key unnecessarily. The latter is bad because if you go to the trouble of wiring up that leader/helper interaction, I think you may as well have the leader drive task provisioning entirely, as I have described in the rest of this document.
  • If we make this the subscriber's problem, then Mozilla and Horizontal each have to design, implement and operate solutions. If we pursue the design in this document in which task provisioning is handled by divviup-api and the janus aggregator API, then we control all the code involved and I believe the solution can be deployed more quickly and smoothly.

Token API in Janus aggregator API

The aggregator API could mint bearer tokens that can later be used to authenticate requests to the aggregator API (like POST /tasks).

POST /tokens creates a new token. There is no request body. On creation, the new token will be stored in the Janus Postgres database, in a new table akin to task_aggregator_auth_tokens. The response is 201 Created and the body is a JSON object containing:

  • The bearer token;
  • A unique token ID (a UUID).

The new token can later be used to authenticate requests to the aggregator API (like POST /tasks) but not to make new tokens. This has the following implications for aggregator API request authentication:

  • Janus has to look in two places for valid aggregator API auth tokens: the configuration and the database.
  • Janus must distinguish between tokens that allow minting other tokens and tokens that only allow creating tasks.

We decided against this because we can instead require helper operators to supply multiple aggregator API auth tokens to their helpers via AGGREGATOR_API_AUTH_TOKENS, which will require no (or less) new code.

@tgeoghegan
Contributor Author

Once we have achieved consensus with a critical core of external partners, we will file issues across this project and https://github.com/divviup/divviup-api to track implementation of specific portions of this proposal.

@tgeoghegan tgeoghegan added the byohelper needed for BYOHelper and self-service task provisioning (#1486) label Jun 16, 2023
tgeoghegan added a commit that referenced this issue Jun 26, 2023
Updates the `in_cluster` test harness code to use the divviup-api API
resource for managing aggregators and the automated task provisioning
flow (#1486).

Resolves #1528
tgeoghegan added a commit that referenced this issue Jun 27, 2023
Updates the `in_cluster` test harness code to use the divviup-api API
resource for managing aggregators and the automated task provisioning
flow (#1486).

Resolves #1528
tgeoghegan added a commit that referenced this issue Jun 28, 2023
Updates the `in_cluster` test harness code to use the divviup-api API
resource for managing aggregators and the automated task provisioning
flow (#1486).

Resolves #1528
tgeoghegan added a commit that referenced this issue Jun 28, 2023
Updates the `in_cluster` test harness code to use the divviup-api API
resource for managing aggregators and the automated task provisioning
flow (#1486).

Resolves #1528
@tgeoghegan
Contributor Author

All the work for this has landed. Further tweaks to this design and their implementation are tracked elsewhere.
