-
Notifications
You must be signed in to change notification settings - Fork 66
Description
The terms "persisted queries" and "stored operations" are typically used interchangably. The traditional name is "persisted queries", but the term "stored operations" acknowledges that this approach can be used with mutation and subscription operations in addition to queries.
There's lots of different implementations of this feature, often with different aims. Sometimes it's used to block all but a small allow-list of operations. Sometimes it's used to reduce network bandwidth for GraphQL requests. Sometimes it's used to improve HTTP cacheability (GET requests with short URLs).
Relevant links:
- https://relay.dev/docs/guides/persisted-queries/
- https://www.apollographql.com/docs/apollo-server/performance/apq/
- https://github.com/graphile/persisted-operations
- https://chillicream.com/docs/hotchocolate/v13/performance/persisted-queries/
- https://chillicream.com/docs/hotchocolate/v13/performance/automatic-persisted-queries
- https://github.com/restorando/graphql-query-whitelist
Some implementations require queries to be persisted to the server in advance of receiving queries (e.g. at build time); others negotiate the storage of the query with the server for bandwidth optimisation (e.g. automatic persisted queries).
Typically persisted queries replace the query
in {query, variables, operationName}
with another field (typically id
) that indicates an identifier for the operation - perhaps a number or hash.
Typically stored operations remove the query
in {query, variables, operationName}
and instead indicate the operation via a hash or similar identifier in the extensions
property.
One of the optimisations that stored operations enable is that the parse
and validate
steps of GraphQL execution can be performed once per stored operation and cached; future requests for the same stored operation can jump straight to the execute
step.
Activity
benjie commentedon Jul 27, 2022
@n1ru4l points out a more correct term is actually "stored documents"
n1ru4l commentedon Jul 27, 2022
We are currently building GraphQL Yoga v3 plugins for Automatic persisted queries and also traditional built-time extracted persisted documents.
I see Automatic persisted queries as an upstream client-to-origin payload size optimization and stored/persisted operations/documents as a security feature for preventing the processing of arbitrary GraphQL documents.
It seems like all urql and apollo-client have built-in support for Automatic persisted queries (which itself does also not have a real specification) and the "extensions-protocol" became the de-facto standard for both APQ and persisted document implementations (see mercurius).
However, we would rather prefer to use something for persisted operations that actually follows a specification. IMHO the GraphQL over HTTP specification would be perfect for this.
In enisdenjo/graphql-ws#188 (comment) there has been a hint that this could simply be solved by introducing an
id
field that if set replaces thequery
field.Shane32 commentedon Oct 3, 2023
I may not be able to make it to the October 2023 meeting, but I'd like to supply my comments for the suggested Persisted Operations RFC. I'll provide comments from two viewpoints: as a GraphQL.NET maintainer, and as a user.
Note: see @benjie 's comments here
As a GraphQL.NET maintainer...
As a GraphQL.NET maintainer, the suggestion of storing the
documentId
in a property separate fromquery
orextensions
is problematic. Let me explain:.NET is strongly typed, and GraphQL.NET includes a type that matches the form of the GraphQL request, having four properties:
Query
,OperationName
,Variables
andExtensions
. GraphQL.NET Server (and GraphQL.NET Client) takes this known type and serializes/deserializes it as needed when executing requests, formatting it properly such as converting the property names to camelCase. Then the request is passed to a configurable request chain for execution.The existing APQ code is configured by the user within this request chain, which intercepts the request, either (a) responding directly for invalid doc IDs, (b) pulling the query from the cache if the doc ID is known, or (c) passing through requests that do not contain APQ metadata. Note that this APQ middleware is entirely separate from GraphQL.NET Server, as it can be applied to any transport - most notably, WebSocket protocols.
GraphQL.NET also supports caching the parsing and validation steps of queries. This is done independently of APQ and works for both APQ queries and non-APQ queries. This is just another step in the configured execution chain.
This all works well as Apollo's APQ specification stores the APQ metadata within the
extensions
property. But with thedocumentId
stored as a sibling to thequery
,variables
,operationName
andextensions
properties, the server would now need to be APQ-aware in order to handle requests. In short, we would need to addDocumentId
as a property to the class (visible to all code within the execution chain) just so that the Persisted Queries module could function.Second, GraphQL.NET caches the original query when APQ is in use so that it can be used for error messages pursuant to the GraphQL specification for the
location
property. If the client "strips ignored tokens" prior to hashing the query, thelocations
of any error responses may not correctly match up to the client's original query string.Third, @benjie wrote:
I'm assuming this refers to the two Apollo feature sets, both of which I believe are implemented the same at the protocol level. As such, I believe this comment lies outside the discussion here. Please correct me if I'm wrong.
As a GraphQL user...
Please keep in mind that the following are personal opinions, largely based on my specific use case:
The specification indicates that the hash should be comprised of the query "stripping ignored tokens". This is a major problem for my e-commerce site as it requires that the client contain code to parse and recreate a GraphQL Document. GraphQL's beauty is in its simplicity. All browsers, even IE, support native JSON parse and stringify operations. And a GraphQL request/response comprises of a query in the form of a static text string combined with some variables, POSTed to a server, and the response interpreted as JSON. While it is true that many people likely use Apollo's GraphQL client library (which references
graphql-tag
, which probably has the requisite functionality), including these additional libraries would place a much heavier bandwidth and CPU burden on the client than transmitting the queries literally without APQ. And for an e-commerce site, when Google ranks you based on how fast your webpage reaches idle, it's critical to eliminate unnecessary dependencies.So the persisted queries specification as written in the current RFC would not be feasible for me, whereas the Apollo APQ specification would only require a small amount of additional client code. Note that all current browsers can natively perform SHA hashes, and can do so within another cpu thread as it is an asynchronous API. So the hash algorithm would have minimal effect on page download / display time.
I suppose ideally the hash would be determined at compile-time and embedded directly in to the source code of the client application. I have not investigated the difficulty required to implement this.
I also have intranet applications, where these constraints do not exist to the same extent. While using persisted queries should be a net benefit there, they also are expected to be used by users that have corporate high-bandwidth connections, and as such the additional bandwidth required per GraphQL request makes little difference.
Conclusion
While I endorse having a standardized protocol for persisted queries, I would recommend following Apollo's design choices regarding (a) persisted-query metadata stored within
extensions
, and (b) hash based on the literal query. My only particular compliant is that they are using the error message as an error code. I would wish that the error message would be undefined and left up to the implementation, while some other property be used to identify the APQ error. For example, the error's extension property could also contain persisted-query metadata so that clients could identify it. But if the client was not persisted-query aware, it would display a human-readable error message. However, this is a trifle; anyone sending APQ requests is likely to be APQ-aware of the responses.JoviDeCroock commentedon Oct 3, 2023
I appreciate the insights, thank you!
I would like to avoid
extensions
as that feels less like this is becoming a first-class citizen of the spec, I agree with you that for strongly typed languages this is an additional property and might become a bit harder.I am not married to the idea of stripping ignored tokens/... my main thinking here was that we could both verify that a SHA is valid and that it reduces potentially duplicate operations. Nothing here sais that the tokens should be stripped at runtime, I have seen approaches with
GraphQL Code Generator
where this is part of the generated document so it's just present at runtime without prior calculations. The verification of a valid SHA was me assuming that this would come in handy for APQ as well to prevent folks from persisting erroneous documents.My main reasoning for
document_id
comes from following the Relay specification here which usesdoc_id
as a way to convey this alongsidequery
/...I guess maybe, after evaluating some of these points, we need to omit APQ for that reason, encouraging folks in a good direction.
benjie commentedon Oct 9, 2023
100%; we must not use the
extensions
property for this because that is reserved for end-user usage. If we start specifying things inside ofextensions
then we'd need to define a new level (extensions.extensions
) that is future-safe for users to use for their own purposes - it would undo the point of havingextensions
in the first place (which was to make it so that GraphQL can make changes at the top level without breaking any end-user customizations of request/response).We're actually dealing with two concretely different types of request:
(Note
query
is required;documentId
is required. Note: Apollo's APQ is not covered in this; but a similar proposal could be by addingquery?: string
toGraphQLPersisitedDocumentRequest
.)When your server receives a
GraphQLPersisitedDocumentRequest
, one option is to take that request, fetch the relevant document, and turn it into aGraphQLRequest
to feed through to the main GraphQL service. This is, in my opinion, what we should actually specify. Servers may implement optimizations over this, but that's not something that we should specify in my opinion (and it's entirely possible to cache parsing and validation with the GraphQL document (or a hash thereof) as the key without needing persisted operations - GraphQL.NET and PostGraphile have both done this for years - as have likely loads of other servers).So "persisted documents" is effectively a middleware in front of a GraphQL service - the incoming request is decoded as
GraphQLPersisitedDocumentRequest
, converted toGraphQLRequest
and then fed onto the GraphQL service.I really don't think this would be necessary; if, after reading the above, you still think it is necessary, please could you expand?
We should not strip the tokens; we should hash the raw document. Clients may choose to generate a minimized document and use that as the input to persisted documents; that's entirely up to them, but is not something we should specify.
APQ is an Apollo feature that negotiates the storage of a query as a hash for bandwidth optimization.
Persisted documents is a general GraphQL principle that has existed much longer (independent of Apollo; in fact used inside Facebook before GraphQL was even open-sourced as far as I am aware) and has significantly greater benefits - not least in terms of security and cacheability of requests. Relay uses the
documentId
key for persisted documents.I'm proposing that we specify persisted documents as a document allowlist; and then we express an equivalent of APQ as an optional extension of that, but noting that it loses a lot of the benefits. In my opinion the vast majority of GraphQL users (everyone whose GraphQL API is for their own apps and websites only; not intended to be used by third parties) should be using persisted documents, and only a subset of the remainder (e.g. GraphQL APIs designed to be used by third parties, such as the GitHub GraphQL API) should be using APQ or something similar.
Please note that RFC documents get merged without much in the way of review; this isn't even officially a Stage 0 proposal yet; it's definitely not "the specification" for persisted documents.
Relay specifies an MD5 hash: https://relay.dev/docs/guides/persisted-queries/
I think we would recommend the usage of a prefix; e.g.
{"documentId": "sha256:ecf4edb46db40b5132295c0291d62fb65d6759a9eedfa4d5d612dd5ec54a6b38"}
andRECOMMEND
that thesha256:
prefix is implemented in the interests of compatibility, but note that serversMAY
support whatever document identification strategy they like so long as the same goal is achieved."Stripping ignored tokens" should 100% not be a requirement to use persisted operations; any transform that the client wishes to make to the document before hashing it/storing it should be up to them.
I should note that in most cases, for "persisted operations" (and, again, not for APQ, which lacks many of the benefits of persisted operations) the hashing of the document should take place at build time, so there is no requirement to include any additional data in your bundle other than the hashes of the operations you're using. For some clients; you don't even need to include the operations themselves any more, just the hashes, so persisted operations could make your bundle smaller!
💯 If we specify APQ, it should be done with significant indication of the downsides of using it. It's a handly bandwidth optimization for public GraphQL APIs, but almost all other APIs should be using persisted documents as an operation allowlist.
benjie commentedon Oct 9, 2023
I've written up my first draft of spec edits to accommodate persisted documents: #264
(Note I've also added a definition of a "persisted operation" in that to be a "persisted document" containing exactly one operation - I think persisted operations are the most typical usage today.)
Shane32 commentedon Oct 9, 2023
I'll expand, but it's more of an implementation detail of GraphQL.NET and may not apply to other languages. We can change the design easily enough to accommodate the specification.
So far, this is how GraphQL.NET wires up requests:
Middleware is configured as simple functions that accept
ExecutionOptions
and returnExecutionResult
, such as in this simple example:As is the case now, and as noted by @benjie above, the design should be that "'persisted documents' is effectively a middleware in front of a GraphQL service" - and so it would be required that
DocumentId
is a newly added property to theExecutionOptions
instance in the above sample, even though it cannot be used directly by the execution engine. Not a big deal if it's part of the official spec. The server will be aware of theDocumentId
field during deserialization and the execution engine will reject the request if it has not been replaced by a query string (or pre-parsed document instance) by middleware.Notice that within GraphQL.NET, the server deserializes the request as it is first received, prior to being passed to any middleware. This allows for various forms of requests, such as
application/graphql
or GET requests.GraphQLRequest
is only used during deserialization of JSON requests, prior to it being passed to the middleware chain. As such, having a separate representation ofGraphQLPersisitedDocumentRequest
vsGraphQLRequest
, while logical, is not useful within GraphQL.NET.12 remaining items