Consider relaxing type of matchValue #58

Open
andyleiserson opened this issue Dec 17, 2024 · 12 comments
Comments

@andyleiserson
Collaborator

We are currently specifying that this be an integer, but if we want it to be extensible to more complex filtering in the future, possibly we should specify it as any.

@andyleiserson changed the title from "Consider relaxing type of filterData." to "Consider relaxing type of filterData" on Dec 17, 2024
@apasel422
Collaborator

I assume that we can start with integers for now and loosen that requirement in the future backwards-compatibly?

@apasel422
Collaborator

Making filtering more powerful than simple integer equality is useful, though it is worth pointing out what we learned from Attribution Reporting’s declarative filtering API: Users are always going to want something more general than the declarative API allows.

ARA added operations over time (first single-value equality, then “any-of” matching in lists of values, then disjunctions over multiple such lists, then negation of these matches). The result is a complex specification and implementation that is still not completely general, and in which users find it hard to express what should in essence be arbitrary boolean predicates.

It makes sense for PPA to support declarative filtering for simple operations like integer equality between values specified on the impression and conversion sides. But once aggregates are involved (lists of values, key-value associations), or operations other than equality, it will quickly become complicated to specify how conversions express things like filterData[someField].contains(someValue) && filterData[otherField] > 5. That is effectively a domain-specific language representing the execution of a pure function that accepts metadata about the impression and returns either true or false to indicate whether the conversion should match it.
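
To make the contrast concrete, here is a minimal sketch of the two styles. The impression shape, the field names (`matchValue`, `tags`, `score`), and both functions are hypothetical and only illustrate the argument; they are not part of PPA.

```javascript
// Hypothetical impression metadata; none of these field names come from PPA.
const impression = { matchValue: 7, tags: ["shoes", "sale"], score: 9 };

// Declarative style: the browser only ever compares two integers.
function matchesDeclaratively(imp, conversionMatchValue) {
  return imp.matchValue === conversionMatchValue;
}

// General style: an arbitrary pure predicate over the metadata. A declarative
// API would need its own expression language to describe a test like this.
function matchesPredicate(imp) {
  return imp.tags.includes("shoes") && imp.score > 5;
}
```

The first function is easy to specify and to run with bounded cost; the second is what users tend to ask for eventually.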

Before deciding whether impressions should be able to set other data types as their filterData field, I think we should consider how exactly the conversion side will operate on those values, especially given the need to sandbox these expressions for security and performance (limits on time and memory consumption) and to avoid retaining PPA-level impression data between operations (i.e., no side channels involving network or storage). In other words, it doesn’t seem like we could simply allow measureConversion to accept a normal JS callback in its options, as there would be no way for PPA to execute it in a manner consistent with those restrictions (at the very least, callbacks could easily exfiltrate filter data for all matching impressions).

We will also need to consider how these predicates are provided in the HTTP API, if measureConversion is ever exposed through it, making an approach based on JS callbacks alone insufficient, even if they could be executed safely.

It might be possible to do something like this with worklets (perhaps similar to Shared Storage).

@martinthomson
Member

I have a strong desire to avoid the use of worklets in this API: hiding timing side channels in isolated processing is something that fenced frames and Protected Audience fail at; I don't want this API to be the reason we add another such system.

I do think that we need to have a discussion about what the future of the API is in terms of querying capabilities. I know that @michaelkleber expressed a desire to turn the decision-making process into a simple inner product. Or even a matrix product followed by a vector product. This gives you a lot more power than people often appreciate, even if it is not necessarily user-friendly. The advantage would be fixed expectations about running time.
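
As a rough illustration of the inner-product idea, the sketch below assumes that each impression is encoded as a fixed-length numeric vector and that the conversion side supplies a weight vector of the same length. The one-hot encoding, the four categories, and the function names are all hypothetical; nothing in this thread or in PPA defines them.

```javascript
// Assumption: impressions are encoded as fixed-length vectors, here one-hot
// over four hypothetical categories.
function oneHot(index, length) {
  return Array.from({ length }, (_, i) => (i === index ? 1 : 0));
}

// Inner product: always exactly a.length multiply-adds, whatever the values,
// which is what gives fixed expectations about running time.
function innerProduct(a, b) {
  if (a.length !== b.length) throw new RangeError("length mismatch");
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// The conversion side selects categories 1 and 3 with a 0/1 weight vector.
const weights = [0, 1, 0, 1];
const selected = innerProduct(oneHot(3, 4), weights); // 1: category 3 is selected
const skipped = innerProduct(oneHot(2, 4), weights);  // 0: category 2 is not
```

With 0/1 weights over one-hot vectors this reduces to an inclusion list; non-binary weights or a preceding matrix product would allow more, at the same fixed cost per impression.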

(Having that discussion here might not be ideal. I would like to rename filterData to something better (#24) and then talk about the addition of other fields that can be used to more precisely select impressions for consideration. I had thought that this could be done incrementally, but it makes sense to plot out a bigger plan. Should we open an issue for discussion at our upcoming face-to-face? We might make some progress ahead of that time, but this probably isn't where people will expect to find that discussion.)

@apasel422
Collaborator

Maybe for now it's sufficient to allow the conversion side to specify a list of integers, any one of which may match (a disjunction).
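
A minimal sketch of that matching rule, assuming the impression stores a single integer and the conversion passes an array of acceptable integers (the function and parameter names are hypothetical):

```javascript
// An impression matches if its value equals any entry in the list,
// i.e. a disjunction of simple equality tests.
function matchesAny(impressionValue, matchValues) {
  return matchValues.includes(impressionValue);
}
```

An empty list matches nothing under this sketch; whether an empty or absent list should instead match everything is a separate design choice.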

@apasel422
Collaborator

apasel422 commented Apr 4, 2025

A related question here:

Per the Web IDL standard:

> The unsigned long type corresponds to 32-bit unsigned integers.

Do we think that's sufficient, even in the integer-only use case? If not, there's further complexity due to JavaScript's Number.MAX_SAFE_INTEGER and structured header integers being limited to the range [-999,999,999,999,999, 999,999,999,999,999].
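
For reference, the three upper bounds can be compared directly. This is only a sketch; the `LIMITS` object and `rangesThatFit` helper are hypothetical names, and BigInt is used so the comparison itself is exact.

```javascript
// Upper bounds of the three integer ranges mentioned above, as BigInts.
const LIMITS = {
  webIdlUnsignedLong: 2n ** 32n - 1n,             // 4294967295
  jsSafeInteger: BigInt(Number.MAX_SAFE_INTEGER), // 2^53 - 1 = 9007199254740991
  structuredFieldInteger: 999999999999999n,       // 15 decimal digits
};

// Returns the names of the ranges that can hold a non-negative value.
function rangesThatFit(value) {
  return Object.entries(LIMITS)
    .filter(([, max]) => value >= 0n && value <= max)
    .map(([name]) => name);
}
```

Of the two wider ranges, the structured header limit is the smaller one, so a value such as 10^15 fits in a JavaScript Number without loss but cannot be written as a structured header integer.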

@csharrison
Collaborator

32 bits is not a lot. I would like to support the use-case where these integers can be hashes of strings if needed, so my preference would be to support either:

  1. Each API surface just specifies its own limits, and we internally support 64 bit integers
  2. We clamp JS API to the (more limited) structured header max / min

@apasel422
Collaborator

apasel422 commented Apr 4, 2025

> 32 bits is not a lot. I would like to support the use-case where these integers can be hashes of strings if needed, so my preference would be to support either:
>
>   1. Each API surface just specifies its own limits, and we internally support 64 bit integers
>   2. We clamp JS API to the (more limited) structured header max / min

Another option is bigint, which implementations can clamp to their own limit. On the structured-header side, we would support either integers or string-encoded values above the integer limits.
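
A sketch of how that could look, under loud assumptions: the implementation limit is taken to be unsigned 64-bit, out-of-range values are carried as quoted decimal strings, and all three function names are hypothetical. None of this is specified anywhere.

```javascript
const SF_INTEGER_MAX = 999999999999999n; // structured header integer limit
const U64_MAX = 2n ** 64n - 1n;          // assumed implementation limit

// Clamp an arbitrary bigint into the assumed unsigned 64-bit range.
function clampToU64(value) {
  if (value < 0n) return 0n;
  return value > U64_MAX ? U64_MAX : value;
}

// Serialize as a bare integer when it fits, else as a quoted decimal string
// (an assumed convention for values above the integer limit).
function toHeaderItem(value) {
  const v = clampToU64(value);
  return v <= SF_INTEGER_MAX ? v.toString() : `"${v.toString()}"`;
}

// Parse either form back into a bigint.
function fromHeaderItem(item) {
  return BigInt(item.replace(/^"|"$/g, ""));
}
```

The cost of this design is that the header side has to parse two forms for the same logical type.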

@csharrison
Collaborator

I would prefer avoiding string parsing on the structured header side unless we know it is needed. I think we could add bigint support later on in a backwards compatible way if needed?

@martinthomson
Member

If we only permit subsetting in the form of an inclusion list, why would having more bits in the integer be useful?

@csharrison
Collaborator

Upthread I mentioned a use-case:

> I would like to support the use-case where these integers can be hashes of strings if needed

This is useful if you want to filter on some fields which are not already densely encoded.
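
To illustrate why width matters for this use case, here is a sketch that hashes a string to a 64-bit integer and then truncates it to 32 bits. FNV-1a is used purely as a convenient example; the thread does not name a hash function, and the helper names are hypothetical.

```javascript
// 64-bit FNV-1a over the UTF-8 bytes of a string, using BigInt arithmetic.
const FNV_OFFSET = 0xcbf29ce484222325n;
const FNV_PRIME = 0x100000001b3n;
const MASK_64 = (1n << 64n) - 1n;

function fnv1a64(text) {
  let hash = FNV_OFFSET;
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= BigInt(byte);
    hash = (hash * FNV_PRIME) & MASK_64; // keep the result within 64 bits
  }
  return hash;
}

// Truncate a 64-bit hash to the 32-bit `unsigned long` range.
function truncateTo32(hash) {
  return hash & 0xffffffffn;
}
```

With only 32 bits, two different strings become likely to collide once there are on the order of tens of thousands of distinct values (the birthday bound for a 32-bit space), which is the concern behind "32 bits is not a lot".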

@apasel422
Collaborator

Now that the initial HTTP Save-Impression API has been specified, we should re-investigate whether we need 64 bits for match values and, if so, figure out how to give the HTTP API parity with the IDL.

@martinthomson changed the title from "Consider relaxing type of filterData" to "Consider relaxing type of matchValue" on May 21, 2025
@martinthomson
Member

HTTP and RFC 9651 cannot express the full range of values in an unsigned 64-bit value. We could:

  1. Constrain other interfaces to the same range of values.
  2. Simply observe that HTTP cannot address part of the potential range of values.

I lean toward (2). Sites need to coordinate their use of these values and so can coordinate to avoid parts of the space their system can't address.

4 participants