Skip to content

"Persisted queries" / "stored operations" #38

@benjie

Description

@benjie
Member

The terms "persisted queries" and "stored operations" are typically used interchangably. The traditional name is "persisted queries", but the term "stored operations" acknowledges that this approach can be used with mutation and subscription operations in addition to queries.

There's lots of different implementations of this feature, often with different aims. Sometimes it's used to block all but a small allow-list of operations. Sometimes it's used to reduce network bandwidth for GraphQL requests. Sometimes it's used to improve HTTP cacheability (GET requests with short URLs).

Relevant links:

Some implementations require queries to be persisted to the server in advance of receiving queries (e.g. at build time); others negotiate the storage of the query with the server for bandwidth optimisation (e.g. automatic persisted queries).

Typically persisted queries replace the query in {query, variables, operationName} with another field (typically id) that indicates an identifier for the operation - perhaps a number or hash.

Typically stored operations remove the query in {query, variables, operationName} and instead indicate the operation via a hash or similar identifier in the extensions property.

One of the optimisations that stored operations enable is that the parse and validate steps of GraphQL execution can be performed once per stored operation and cached; future requests for the same stored operation can jump straight to the execute step.

Activity

added this to the Post 1.0 release milestone on Jul 1, 2022
benjie

benjie commented on Jul 27, 2022

@benjie
MemberAuthor

@n1ru4l points out a more correct term is actually "stored documents"

n1ru4l

n1ru4l commented on Jul 27, 2022

@n1ru4l
Contributor

We are currently building GraphQL Yoga v3 plugins for Automatic persisted queries and also traditional built-time extracted persisted documents.

I see Automatic persisted queries as an upstream client-to-origin payload size optimization and stored/persisted operations/documents as a security feature for preventing the processing of arbitrary GraphQL documents.

It seems like all urql and apollo-client have built-in support for Automatic persisted queries (which itself does also not have a real specification) and the "extensions-protocol" became the de-facto standard for both APQ and persisted document implementations (see mercurius).

However, we would rather prefer to use something for persisted operations that actually follows a specification. IMHO the GraphQL over HTTP specification would be perfect for this.

In enisdenjo/graphql-ws#188 (comment) there has been a hint that this could simply be solved by introducing an id field that if set replaces the query field.

Shane32

Shane32 commented on Oct 3, 2023

@Shane32
Contributor

I may not be able to make it to the October 2023 meeting, but I'd like to supply my comments for the suggested Persisted Operations RFC. I'll provide comments from two viewpoints: as a GraphQL.NET maintainer, and as a user.

Note: see @benjie 's comments here

As a GraphQL.NET maintainer...

As a GraphQL.NET maintainer, the suggestion of storing the documentId in a property separate from query or extensions is problematic. Let me explain:

.NET is strongly typed, and GraphQL.NET includes a type that matches the form of the GraphQL request, having four properties: Query, OperationName, Variables and Extensions. GraphQL.NET Server (and GraphQL.NET Client) takes this known type and serializes/deserializes it as needed when executing requests, formatting it properly such as converting the property names to camelCase. Then the request is passed to a configurable request chain for execution.

The existing APQ code is configured by the user within this request chain, which intercepts the request, either (a) responding directly for invalid doc IDs, (b) pulling the query from the cache if the doc ID is known, or (c) passing through requests that do not contain APQ metadata. Note that this APQ middleware is entirely separate from GraphQL.NET Server, as it can be applied to any transport - most notably, WebSocket protocols.

GraphQL.NET also supports caching the parsing and validation steps of queries. This is done independently of APQ and works for both APQ queries and non-APQ queries. This is just another step in the configured execution chain.

This all works well as Apollo's APQ specification stores the APQ metadata within the extensions property. But with the documentId stored as a sibling to the query, variables, operationName and extensions properties, the server would now need to be APQ-aware in order to handle requests. In short, we would need to add DocumentId as a property to the class (visible to all code within the execution chain) just so that the Persisted Queries module could function.

Second, GraphQL.NET caches the original query when APQ is in use so that it can be used for error messages pursuant to the GraphQL specification for the location property. If the client "strips ignored tokens" prior to hashing the query, the locations of any error responses may not correctly match up to the client's original query string.

Third, @benjie wrote:

the attendees should have been using persisted operations (NOT APQ)

I'm assuming this refers to the two Apollo feature sets, both of which I believe are implemented the same at the protocol level. As such, I believe this comment lies outside the discussion here. Please correct me if I'm wrong.

As a GraphQL user...

Please keep in mind that the following are personal opinions, largely based on my specific use case:

The specification indicates that the hash should be comprised of the query "stripping ignored tokens". This is a major problem for my e-commerce site as it requires that the client contain code to parse and recreate a GraphQL Document. GraphQL's beauty is in its simplicity. All browsers, even IE, support native JSON parse and stringify operations. And a GraphQL request/response comprises of a query in the form of a static text string combined with some variables, POSTed to a server, and the response interpreted as JSON. While it is true that many people likely use Apollo's GraphQL client library (which references graphql-tag, which probably has the requisite functionality), including these additional libraries would place a much heavier bandwidth and CPU burden on the client than transmitting the queries literally without APQ. And for an e-commerce site, when Google ranks you based on how fast your webpage reaches idle, it's critical to eliminate unnecessary dependencies.

So the persisted queries specification as written in the current RFC would not be feasible for me, whereas the Apollo APQ specification would only require a small amount of additional client code. Note that all current browsers can natively perform SHA hashes, and can do so within another cpu thread as it is an asynchronous API. So the hash algorithm would have minimal effect on page download / display time.

I suppose ideally the hash would be determined at compile-time and embedded directly in to the source code of the client application. I have not investigated the difficulty required to implement this.

I also have intranet applications, where these constraints do not exist to the same extent. While using persisted queries should be a net benefit there, they also are expected to be used by users that have corporate high-bandwidth connections, and as such the additional bandwidth required per GraphQL request makes little difference.

Conclusion

While I endorse having a standardized protocol for persisted queries, I would recommend following Apollo's design choices regarding (a) persisted-query metadata stored within extensions, and (b) hash based on the literal query. My only particular compliant is that they are using the error message as an error code. I would wish that the error message would be undefined and left up to the implementation, while some other property be used to identify the APQ error. For example, the error's extension property could also contain persisted-query metadata so that clients could identify it. But if the client was not persisted-query aware, it would display a human-readable error message. However, this is a trifle; anyone sending APQ requests is likely to be APQ-aware of the responses.

JoviDeCroock

JoviDeCroock commented on Oct 3, 2023

@JoviDeCroock
Member

I appreciate the insights, thank you!

the suggestion of storing the documentId in a property separate from query or extensions is problematic

I would like to avoid extensions as that feels less like this is becoming a first-class citizen of the spec, I agree with you that for strongly typed languages this is an additional property and might become a bit harder.

The specification indicates that the hash should be comprised of the query "stripping ignored tokens". This is a major problem for my e-commerce site as it requires that the client contain code to parse and recreate a GraphQL Document

I am not married to the idea of stripping ignored tokens/... my main thinking here was that we could both verify that a SHA is valid and that it reduces potentially duplicate operations. Nothing here sais that the tokens should be stripped at runtime, I have seen approaches with GraphQL Code Generator where this is part of the generated document so it's just present at runtime without prior calculations. The verification of a valid SHA was me assuming that this would come in handy for APQ as well to prevent folks from persisting erroneous documents.

I would recommend following Apollo's design choices regarding (a) persisted-query metadata stored within extensions

My main reasoning for document_id comes from following the Relay specification here which uses doc_id as a way to convey this alongside query/...

I guess maybe, after evaluating some of these points, we need to omit APQ for that reason, encouraging folks in a good direction.

benjie

benjie commented on Oct 9, 2023

@benjie
MemberAuthor

the suggestion of storing the documentId in a property separate from query or extensions is problematic
I would like to avoid extensions as that feels less like this is becoming a first-class citizen of the spec, I agree with you that for strongly typed languages this is an additional property and might become a bit harder.

100%; we must not use the extensions property for this because that is reserved for end-user usage. If we start specifying things inside of extensions then we'd need to define a new level (extensions.extensions) that is future-safe for users to use for their own purposes - it would undo the point of having extensions in the first place (which was to make it so that GraphQL can make changes at the top level without breaking any end-user customizations of request/response).

We're actually dealing with two concretely different types of request:

type GraphQLRequest {
  query: string;
  variables?: Record<string, any>;
  operationName?: string;
  extensions?: Record<string, any>;
}
type GraphQLPersisitedDocumentRequest {
  documentId: string;
  variables?: Record<string, any>;
  operationName?: string;
  extensions?: Record<string, any>;
}

(Note query is required; documentId is required. Note: Apollo's APQ is not covered in this; but a similar proposal could be by adding query?: string to GraphQLPersisitedDocumentRequest.)

When your server receives a GraphQLPersisitedDocumentRequest, one option is to take that request, fetch the relevant document, and turn it into a GraphQLRequest to feed through to the main GraphQL service. This is, in my opinion, what we should actually specify. Servers may implement optimizations over this, but that's not something that we should specify in my opinion (and it's entirely possible to cache parsing and validation with the GraphQL document (or a hash thereof) as the key without needing persisted operations - GraphQL.NET and PostGraphile have both done this for years - as have likely loads of other servers).

So "persisted documents" is effectively a middleware in front of a GraphQL service - the incoming request is decoded as GraphQLPersisitedDocumentRequest, converted to GraphQLRequest and then fed onto the GraphQL service.

In short, we would need to add DocumentId as a property to the class (visible to all code within the execution chain) just so that the Persisted Queries module could function.

I really don't think this would be necessary; if, after reading the above, you still think it is necessary, please could you expand?

If the client "strips ignored tokens" prior to hashing the query, the locations of any error responses may not correctly match up to the client's original query string.

We should not strip the tokens; we should hash the raw document. Clients may choose to generate a minimized document and use that as the input to persisted documents; that's entirely up to them, but is not something we should specify.

the attendees should have been using persisted operations (NOT APQ)

I'm assuming this refers to the two Apollo feature sets, both of which I believe are implemented the same at the protocol level. As such, I believe this comment lies outside the discussion here. Please correct me if I'm wrong.

APQ is an Apollo feature that negotiates the storage of a query as a hash for bandwidth optimization.

Persisted documents is a general GraphQL principle that has existed much longer (independent of Apollo; in fact used inside Facebook before GraphQL was even open-sourced as far as I am aware) and has significantly greater benefits - not least in terms of security and cacheability of requests. Relay uses the documentId key for persisted documents.

I'm proposing that we specify persisted documents as a document allowlist; and then we express an equivalent of APQ as an optional extension of that, but noting that it loses a lot of the benefits. In my opinion the vast majority of GraphQL users (everyone whose GraphQL API is for their own apps and websites only; not intended to be used by third parties) should be using persisted documents, and only a subset of the remainder (e.g. GraphQL APIs designed to be used by third parties, such as the GitHub GraphQL API) should be using APQ or something similar.

The specification indicates...

Please note that RFC documents get merged without much in the way of review; this isn't even officially a Stage 0 proposal yet; it's definitely not "the specification" for persisted documents.

Relay specifies an MD5 hash: https://relay.dev/docs/guides/persisted-queries/

I think we would recommend the usage of a prefix; e.g. {"documentId": "sha256:ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4d5d612dd5ec54a6b38"} and RECOMMEND that the sha256: prefix is implemented in the interests of compatibility, but note that servers MAY support whatever document identification strategy they like so long as the same goal is achieved.

as it requires that the client contain code to parse and recreate a GraphQL Document

"Stripping ignored tokens" should 100% not be a requirement to use persisted operations; any transform that the client wishes to make to the document before hashing it/storing it should be up to them.

I should note that in most cases, for "persisted operations" (and, again, not for APQ, which lacks many of the benefits of persisted operations) the hashing of the document should take place at build time, so there is no requirement to include any additional data in your bundle other than the hashes of the operations you're using. For some clients; you don't even need to include the operations themselves any more, just the hashes, so persisted operations could make your bundle smaller!

I guess maybe, after evaluating some of these points, we need to omit APQ for that reason, encouraging folks in a good direction.

💯 If we specify APQ, it should be done with significant indication of the downsides of using it. It's a handly bandwidth optimization for public GraphQL APIs, but almost all other APIs should be using persisted documents as an operation allowlist.

benjie

benjie commented on Oct 9, 2023

@benjie
MemberAuthor

I've written up my first draft of spec edits to accommodate persisted documents: #264

(Note I've also added a definition of a "persisted operation" in that to be a "persisted document" containing exactly one operation - I think persisted operations are the most typical usage today.)

Shane32

Shane32 commented on Oct 9, 2023

@Shane32
Contributor

So "persisted documents" is effectively a middleware in front of a GraphQL service - the incoming request is decoded as GraphQLPersisitedDocumentRequest, converted to GraphQLRequest and then fed onto the GraphQL service.

In short, we would need to add DocumentId as a property to the class (visible to all code within the execution chain) just so that the Persisted Queries module could function.

I really don't think this would be necessary; if, after reading the above, you still think it is necessary, please could you expand?

I'll expand, but it's more of an implementation detail of GraphQL.NET and may not apply to other languages. We can change the design easily enough to accommodate the specification.

So far, this is how GraphQL.NET wires up requests:

Loading
graph TD;
subgraph a [Server]
A[Server deserializes request]
end
A-->|ExecutionOptions instance| C;
subgraph b [Optional middleware]
C[Configure custom validation rules]-->D
D[Handle APQ support]-->E
E[Handle document caching]-->F
F[Configure authentication]-->F2
F2["Start tracing/logging"]
end
F2-->|ExecutionOptions instance| G
subgraph c [Execution engine]
G["Parse (if applicable)"]-->H
H[Validate]-->Execute
end
Execute-->|ExecutionResult instance| Tracing
subgraph d [Optional middleware]
Tracing["Add tracing data / finish logging"]
end
Tracing-->|ExecutionResult instance| Response
subgraph e [Server]
Response[Server serializes response]
end

Middleware is configured as simple functions that accept ExecutionOptions and return ExecutionResult, such as in this simple example:

.ConfigureExecution(async (options, next) =>
{
    // record time execution started
    var start = DateTime.UtcNow;
    // enable metrics
    options.EnableMetrics = true;
    // execute next step in the chain
    var ret = await next(options).ConfigureAwait(false);
    // add apollo tracing data to execution result
    ret.EnrichWithApolloTracing(start);
    // return execution result
    return ret;
});

As is the case now, and as noted by @benjie above, the design should be that "'persisted documents' is effectively a middleware in front of a GraphQL service" - and so it would be required that DocumentId is a newly added property to the ExecutionOptions instance in the above sample, even though it cannot be used directly by the execution engine. Not a big deal if it's part of the official spec. The server will be aware of the DocumentId field during deserialization and the execution engine will reject the request if it has not been replaced by a query string (or pre-parsed document instance) by middleware.

Notice that within GraphQL.NET, the server deserializes the request as it is first received, prior to being passed to any middleware. This allows for various forms of requests, such as application/graphql or GET requests. GraphQLRequest is only used during deserialization of JSON requests, prior to it being passed to the middleware chain. As such, having a separate representation of GraphQLPersisitedDocumentRequest vs GraphQLRequest, while logical, is not useful within GraphQL.NET.

12 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @benjie@Shane32@n1ru4l@ghmcadams@JoviDeCroock

        Issue actions

          "Persisted queries" / "stored operations" · Issue #38 · graphql/graphql-over-http