Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Signature normalization, ordering fragments #2282

Closed
wants to merge 10 commits into from

Conversation

trevor-scheer
Copy link
Member

@trevor-scheer trevor-scheer commented Feb 7, 2019

The purpose of this PR is to add another level of normalization to documents when we parse and transform them.

What is missing?
Currently, query signature normalization breaks, depending on the order of operations and fragments with respect to each other within a document.

Before beginning work on this, I'd like to confirm that this is problematic in the way that I think it is. Two new test cases demonstrate an identical signature with varying test results.
Related issue submitted by @justinanastos: apollographql/apollo-tooling#986
See first commit for example test cases

Implementation is relatively straightforward. We need to sort all ExecutableDefinitionNodes within the provided AST in a deterministic manner. This can be accomplished within the visitor (thanks @abernix!).

TODO:

  • Update CHANGELOG.md with your change (include reference to issue & this PR)
  • Make sure all of the significant new logic is covered by tests
    • Self note: refine new tests
  • Rebase your changes on master so that they can be merged easily
  • Make sure all tests and linter rules pass

Sort document definitions array first by operation,
followed by an alphabetized list of fragments.

This change adds one more layer of "determinism" to how
we normalize document signatures. This is critical for
properly registering operations, as well as comparing
incoming operations against the manifest.
Copy link
Contributor

@justinanastos justinanastos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! Great job with the tests. I left a few NBD comments and some suggestions.

packages/apollo-graphql/src/__tests__/signature.test.ts Outdated Show resolved Hide resolved
packages/apollo-graphql/src/transforms.ts Outdated Show resolved Hide resolved
// sortAST sorts most multi-child nodes alphabetically. Using this as part of
// your signature calculation function may make it easier to tell the difference
// between queries that are similar to each other, and if for some reason your
// GraphQL client generates query strings with elements in nondeterministic
// order, it can make sure the queries are treated as identical.
export function sortAST(ast: DocumentNode): DocumentNode {
return visit(ast, {
const [operation] = ast.definitions.filter(isOperationDefinitionNode);
const fragments = ast.definitions.filter(isFragmentDefinitionNode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear that this is overly specific. Given the definition of DefinitionNode and ExecutableDefinitionNode:

export type DefinitionNode =
  | ExecutableDefinitionNode
  | TypeSystemDefinitionNode
  | TypeSystemExtensionNode;

export type ExecutableDefinitionNode =
  | OperationDefinitionNode
  | FragmentDefinitionNode;

I don't know what TypeSystemDefinitionNode or TypeSystemExtensionNode are, but they are legal values in a DefinitionNode. This also would be susceptible to be other new things added in the future. With that lack of knowledge, I think it would make sense to favor using "fragments and everything else" versus "fragments and operations". Thoughts?

const nonFragments = ast.definitions.filter((node) => ! isFragmentDefinitionNode(node))
const fragments = ast.definitions.filter(isFragmentDefinitionNode);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm operating under the assumption that the document will only have ExecutableDefinitionNodes else it wouldn't be a valid operation signature. I wouldn't expect the other two TypeSystemXNodes to turn up in an operation (they're used in SDL) but that may be an incorrect assumption.

Regardless of what I find, I'll add some documentation here to explain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is totally true for our use case. I don't think you can be positive that it covers all use cases for both us and consumers of apollo-graphql.

The function is called sortAST and accepts a DocumentNode; I think you need to account for all legal values that might exist in a DocumentNode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a piece off the above comment: this function is called sortAST; so it might be used to sort any kid of AST. Restricting this to DocumentNodes that only include ExecutableDefinitionNodes doesn't feel right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think giving this a more specific name, like lexicographicSortOperations (by analogy with lexicographicSortSchema might be more appropriate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering out non-ExecutableDefinitionNodes (as you currently do) might be ok, that is also what execute does. Alternatively, maybe we could put them all at the end?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support a different name! But I'm pretty okay with not exposing sortAST as a public API, instead only exposing abstractions that implement it (like defaultEngineReportingSignature in the adjacent index.ts)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 2ad998b

Following @martijnwalraven 's suggestion of parity with the graphql function, I've opted to rename (and effectively repurpose) this: lexicographicSortOperations.

This is worth discussing - it's a breaking change but this package is new and may not be in use. Regardless, this requires a major version bump.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abernix I'm just now seeing your comments as I had a stale page up while responding to comments. I'm open to not exporting this function publicly.

packages/apollo-graphql/src/transforms.ts Outdated Show resolved Hide resolved
packages/apollo-graphql/src/transforms.ts Outdated Show resolved Hide resolved
// sortAST sorts most multi-child nodes alphabetically. Using this as part of
// your signature calculation function may make it easier to tell the difference
// between queries that are similar to each other, and if for some reason your
// GraphQL client generates query strings with elements in nondeterministic
// order, it can make sure the queries are treated as identical.
export function sortAST(ast: DocumentNode): DocumentNode {
return visit(ast, {
const [operation] = ast.definitions.filter(isOperationDefinitionNode);
const fragments = ast.definitions.filter(isFragmentDefinitionNode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think giving this a more specific name, like lexicographicSortOperations (by analogy with lexicographicSortSchema might be more appropriate.

// sortAST sorts most multi-child nodes alphabetically. Using this as part of
// your signature calculation function may make it easier to tell the difference
// between queries that are similar to each other, and if for some reason your
// GraphQL client generates query strings with elements in nondeterministic
// order, it can make sure the queries are treated as identical.
export function sortAST(ast: DocumentNode): DocumentNode {
return visit(ast, {
const [operation] = ast.definitions.filter(isOperationDefinitionNode);
const fragments = ast.definitions.filter(isFragmentDefinitionNode);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering out non-ExecutableDefinitionNodes (as you currently do) might be ok, that is also what execute does. Alternatively, maybe we could put them all at the end?

Copy link
Member

@abernix abernix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for opening this, @trevor-scheer. I think this looks great, but I've left a few comments within which might strengthen it a bit more for other possible use cases and also fully leverage the AST visitor from graphql-js rather than doing traditional transformations.

We need to carefully pair this with a release of apollo-server-plugin-operation-registry and apollo CLI, but I think we can knock both of those out relatively quickly once this lands. We can land this on Monday and cut a release of apollo-graphql pretty easily, but we need to think about the Apollo Engine portion of this change a bit since we do log these hashes — they are present in existing manifests — and I'd like to try to not invalidate those that already exist. I suspect some documents with fragments are currently executing okay under the right conditions!

packages/apollo-graphql/src/transforms.ts Outdated Show resolved Hide resolved
// sortAST sorts most multi-child nodes alphabetically. Using this as part of
// your signature calculation function may make it easier to tell the difference
// between queries that are similar to each other, and if for some reason your
// GraphQL client generates query strings with elements in nondeterministic
// order, it can make sure the queries are treated as identical.
export function sortAST(ast: DocumentNode): DocumentNode {
return visit(ast, {
const [operation] = ast.definitions.filter(isOperationDefinitionNode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think we should avoid doing direct filtering passes on the AST to account for Nodes (like FragmentDefinitionNodes) which may not exist.

Also, while the typing of this sortAST function currently accepts a DocumentNode (right now), it should theoretically be able to handle sub-ASTs as well (GraphQL's visit can actually handle more than just DocumentNode), if we ever desired to do so.

I think remaining aligned with the original behavior of this sortAST to be a direct implementation of a visitor may be still possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, totally agree! Are you recommending changes for the input type (DocumentNode) or just pointing out that we may be able to make this more flexible in the future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visitor comment resolved in d341e08.
Leaving this thread open for question above.

// sortAST sorts most multi-child nodes alphabetically. Using this as part of
// your signature calculation function may make it easier to tell the difference
// between queries that are similar to each other, and if for some reason your
// GraphQL client generates query strings with elements in nondeterministic
// order, it can make sure the queries are treated as identical.
export function sortAST(ast: DocumentNode): DocumentNode {
return visit(ast, {
const [operation] = ast.definitions.filter(isOperationDefinitionNode);
const fragments = ast.definitions.filter(isFragmentDefinitionNode);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I support a different name! But I'm pretty okay with not exposing sortAST as a public API, instead only exposing abstractions that implement it (like defaultEngineReportingSignature in the adjacent index.ts)

packages/apollo-graphql/src/transforms.ts Outdated Show resolved Hide resolved
packages/apollo-graphql/src/transforms.ts Outdated Show resolved Hide resolved

const astWithSortedDefinitions = {
...ast,
definitions: [operation, ...sortBy(fragments, 'name.value')],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that in order to achieve a more reliable normalized signature, we should first sort the definitions on a DocumentNode by their kind property, in a similar way to how we sort SelectionSet selections: first by kind, then name.value. This would place FragmentNodes above DocumentNodes consistently.

I believe the only Node without a name at that level should be an OperationNode in which it's optional, except for when there are multiple OperationNodes.

I tried to quickly piece together what I thought the sorting would be like in the most complex situation, and I think it'd be:

DirectiveDefinition
EnumTypeDefinitionNode
EnumTypeExtensionNode
ExecutableDefinitionNode
FragmentDefinitionNode
InputObjectTypeDefinitionNode
InputObjectTypeExtensionNode
InterfaceTypeDefinitionNode
InterfaceTypeExtensionNode
ObjectTypeDefinitionNode
ObjectTypeExtensionNode
OperationDefinitionNode
ScalarTypeDefinitionNode
ScalarTypeExtensionNode
SchemaDefinitionNode
SchemaExtensionNode
UnionTypeDefinitionNode
UnionTypeExtensionNode

It's worth noting that since we're only concerned with executable operations, based on current server implementations in which the presence of most of those Node types would cause a validation error, I think the only sorting we'll end up with that matters is a fraction of those Nodes.

definitions: [operation, ...sortBy(fragments, 'name.value')],
};

return visit(astWithSortedDefinitions, {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return visit(astWithSortedDefinitions, {
return visit(ast, {
Document(
node: DocumentNode
): DocumentNode {
return {
...node,
// Use sortBy because 'definitions' is not optional.
definitions: sortBy(node.definitions, 'kind', 'name.value'),
};
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent. See: d341e08

I'd like your take on the iteratees that I've added.

  • operation will give a nice grouping of query mutation subscription. I don't see this adding any value to the normalization itself, but I imagine it will read much more nicely. Do we expect human eyes on these at any point? I don't know the answer to that.
  • The stringification for anonymous operations - after reading your explanation above about single anonymous operations only, considering this edge case may be overkill.

output:
'mutation A{user{name...Bar...Baz}}' +
'fragment Bar on User{asd}fragment Baz on User{jkl}',
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are invaluable. Thank you so much for adding these!

I suspect we would see additional value (now or in the future) in a test which had multiple named operations in it. The spec states that if there is a single operation in a document, it need not be named, however if there are 2 or more, each must be named. I think it would be smart of us to try to incorporate tests which test that sorting behavior. Would that be something you'd entertain adding?

Currently, if a document contains multiple operations, the name of the operation for which single execution is desired should be passed in. The graphql/graphql-spec#375 RFC proposes another plan for the future, if you're interested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very helpful articulation of some behavior I was seeing during further implementation today.

I'll gladly add some tests specific to what you've mentioned (if I haven't already - will go back and take a look). I went a little crazy adding tests since I found TDD to be the fastest and best way to implement these changes :woo:!

packages/apollo-graphql/src/__tests__/signature.test.ts Outdated Show resolved Hide resolved
different equivalent (same functions, but composed differently).

This is strictly for consistency - if the composition changes, this should
be reflected in the tests. The output of defaultEngineSignature is what
we're most concerned with.
Update sorting logic to sort by:
1) node type (operation / fragment)
2) operation type (query / mutation / subscription)
3) operation name
4) a stringified version of the node (covers edge case of anonymous operations)
@trevor-scheer
Copy link
Member Author

@abernix going to leave a special comment to remind us both about discussing backcompat. Happy weekend! 🎉

@trevor-scheer
Copy link
Member Author

Closing this, see PR here for related work:
apollographql/apollo-tooling#1027

@trevor-scheer trevor-scheer deleted the trevor/signature-normalization branch March 5, 2019 17:38
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants