Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How lookahead and queries work #522

Closed
ab-pm opened this issue Sep 16, 2019 · 7 comments
Closed

How lookahead and queries work #522

ab-pm opened this issue Sep 16, 2019 · 7 comments

Comments

@ab-pm
Copy link
Contributor

ab-pm commented Sep 16, 2019

…in greater detail

Originally posted by @benjie in graphile/crystal#387 (comment):

When it comes to actually implementing the lookahead, it comes down to getDataFromParsedResolveInforFragment's second argument.

I don't want to derail the discussion about implementing interfaces over there, so I'll start a new thread.

I've already seen the graphile-engine lookahead documentation (describing parseResolveInfo and simplifyParsedResolveInfoFragmentWithType), but there's some open questions:

  • What exactly is the format of a data generator? From what I've seen they're either objects with a map property or objects with a pgQuery property (apparently containing a callback to modify a QueryBuilder)
  • That getDataFromParsedResolveInforFragment actually is the function that runs the data generators was not obvious from the name, and not well-documented. I'd also like to see such descriptions in the code comments - or a link to the graphile.org documentation page to avoid duplication.
    I see the docs use the term "meta-data", maybe it would be appropriate to rename the methods to addFieldMetadata, addFieldArgMetadata, and getMetadataFromParsedResolveInfo? getData… at first sounded as if it was already fetching the data from the db, then a second guess was that it constructed a grapqhl resolver ("makeDataGetter"), before it hit me when looking at the implementation
  • The next issue is where and how the metadata is used further. Most of the plugins whose code I studied do use the pgQueryFromResolveData function, which is implemented here and added to the build object here. It seems to create the actual SQL (fragment), used either in subqueries of other data generators (e.g. here or there), or in the root queries (e.g. here or there where the sql query is subsequently passed to sql.compile and then to pgPrepareAndRun which actually executes it.
    In a makeExtendSchemaPlugin however one is advised to use selectGraphQLResultFromTable method which does all of the above in one step, but executes a normal pgClient.query not a prepared statement.
    I think I got this now, would be nice if this was explained somewhere. (Where?)
  • WTH is selectGraphQLResultFromTable put on a .graphile property on the resolveInfo object that is passed to the user-supplied resolver? Wouldn't this fit much, much better on the context (where all the other [post]graphile helper methods live)? Especially since it uses the pgClient from that very context, being a closure over it.
  • The QueryBuilder is documented on the makeExtendSchemaPlugin page, I kinda had expected a separate page in the graphile-build docs.
    The parentQueryBuilder appears like a hack, where is this really used? It seems to be set whenever a nested query is constructed, but via direct assignment - I kinda would have expected a setParent() method (that e.g. checks for overwrites of an existing parent).
    Why do the select and orderBy methods take callbacks?
    And what do all those internal methods (that one shouldn't call) do? I'd expect some comments in the code on how this "locking" scheme works and what it is used for.
  • Are any of these steps cached? Building the SQL query on every request seems like it could have considerable performance impact, although it should be the same for each GraphQL query.
    Could these be stored on the parsed queries that PostGraphile caches? Probably no, because the ArgData generators are a little different every time, because the pg-sql2 fragments contain the variable values themselves, and because the heavy use of closures in the builder allows lots of context stuff to leak inside the queries (instead of them being purely built from the query). But maybe this would be an idea for a V5 architecture :-)
@ab-pm ab-pm changed the title How lookahead works (WIP) How lookahead works Sep 16, 2019
@benjie
Copy link
Member

benjie commented Sep 16, 2019

Should this be in the docs repo? https://github.com/graphile/graphile.github.io

@benjie
Copy link
Member

benjie commented Sep 16, 2019

Added to the megathread: graphile/graphile.github.io#153 (comment)

@ab-pm ab-pm changed the title (WIP) How lookahead works How lookahead and queries work Sep 16, 2019
@ab-pm
Copy link
Contributor Author

ab-pm commented Sep 16, 2019

@benjie Some things might have implications on the docs repo, yeah, but I wanted to discuss a few points first which also might have effects on the engine code.

@benjie
Copy link
Member

benjie commented Sep 16, 2019

  • What exactly is the format of a data generator? From what I've seen they're either objects with a map property or objects with a pgQuery property (apparently containing a callback to modify a QueryBuilder)

It's a generic map {[key: string]: any} that any plugins can hook into and register properties on that are useful.

We use pgQuery extensively, but there's quite a few other bits of data we register internally. Here's some of them being used in queryFromResolveData:

const {
pgQuery,
pgAggregateQuery,
pgCursorPrefix: reallyRawCursorPrefix,
pgDontUseAsterisk,
calculateHasNextPage,
calculateHasPreviousPage,
usesCursor: explicitlyUsesCursor,
} = resolveData;

Remember that graphile-build is completely database independent, so though it created the data generator interface it knows nothing about PostgreSQL. graphile-build-pg uses these custom entries, but other plugins are free to use their own data. There is no central registry, however we do encourage the use of namespaces in the name, e.g. graphile-build-pg uses the pg namespace.

  • That getDataFromParsedResolveInfoFragment actually is the function that runs the data generators was not obvious from the name, and not well-documented. I'd also like to see such descriptions in the code comments - or a link to the graphile.org documentation page to avoid duplication.

PRs welcome 👍

I see the docs use the term "meta-data", maybe it would be appropriate to rename the methods to addFieldMetadata, addFieldArgMetadata, and getMetadataFromParsedResolveInfo? getData… at first sounded as if it was already fetching the data from the db, then a second guess was that it constructed a grapqhl resolver ("makeDataGetter"), before it hit me when looking at the implementation

Though maybe we should have called this metadata from the start, doing so now would be a breaking change.

  • The next issue is where and how the metadata is used further. Most of the plugins whose code I studied do use the pgQueryFromResolveData function, which is implemented here and added to the build object here. It seems to create the actual SQL (fragment), used either in subqueries of other data generators (e.g. here or there), or in the root queries (e.g. here or there where the sql query is subsequently passed to sql.compile and then to pgPrepareAndRun which actually executes it.

Yes, this has evolved over time.

In a makeExtendSchemaPlugin however one is advised to use selectGraphQLResultFromTable method which does all of the above in one step, but executes a normal pgClient.query not a prepared statement.

Currently. pgPrepareAndRun is very recent, more recent than makeExtendSchemaPlugin.

I think I got this now, would be nice if this was explained somewhere. (Where?)

Great. Not sure; where would you have looked for this? Most users don't need to know this level of detail, so it shouldn't be too prominent as it could be overwhelming. This is why the Graphile Engine and PostGraphile docs are separate.

  • WTH is selectGraphQLResultFromTable put on a .graphile property on the resolveInfo object that is passed to the user-supplied resolver? Wouldn't this fit much, much better on the context (where all the other [post]graphile helper methods live)? Especially since it uses the pgClient from that very context, being a closure over it.

No; context is user controlled, we should not be manipulating it. This is why we have an external utility to help generate a compatible context, but that utility lives outside of the schema generation and knows nothing of the schema internals.

The info (or resolveInfo) parameter, on the other hand, is ripe for customisation. This is how other projects in the GraphQL.js ecosystem handle it also. It's a standard approach.

selectGraphQLResultFromTable is a helper function so the end user needs to know less of the internals of the project to be productive. This is true of much of the graphile-utils package.

AFAIK the graphile-build docs don't have any PostgreSQL-specific functionality documented. QueryBuilder is entirely contained in graphile-build-pg - the PostgreSQL-concerned package. This is currently documented under PostGraphile because it doesn't currently have it's own documentation.

The parentQueryBuilder appears like a hack, where is this really used? It seems to be set whenever a nested query is constructed, but via direct assignment - I kinda would have expected a setParent() method (that e.g. checks for overwrites of an existing parent).

Yes, it's a backwards-compatible addition that was necessary to enable certain advanced features that were not included in 4.0.0. The same is true for many other fields that were added after 4.0.0 was released. It is used when a subquery needs access to the parent query, for examples see https://www.graphile.org/postgraphile/make-extend-schema-plugin/

Why do the select and orderBy methods take callbacks?

Everything on QueryBuilder takes callback. Because they don't become permanent until the QueryBuilder is ready to be finalised, because different plugins add different things at different times and they're all inter-dependent. See the lock method that's called when you request certain final features of the query builder.

And what do all those internal methods (that one shouldn't call) do? I'd expect some comments in the code on how this "locking" scheme works and what it is used for.

They help build the query. The QueryBuilder is mutable, but once you've asked for it to build you wouldn't want it to change again as that would be misleading. As such, it has a dependency system and a locking system that prevent bad things happening. Sometimes fields depend on things that haven't been defined yet, which is why we have beforeLock which can quickly add any other select fields or orders, etc as needed.

  • Are any of these steps cached? Building the SQL query on every request seems like it could have considerable performance impact, although it should be the same for each GraphQL query.

A lot of optimisation has gone into building the SQL query. It is now extremely efficient.

Unfortunately we cannot cache it currently, as some clauses might be dependent on the context (which is library-user controlled) or on the value of variables (which is end-user controlled). Instead we've focussed on making sure it's very fast - a lot of optimisation has gone into it.

Could these be stored on the parsed queries that PostGraphile caches? Probably no, because the ArgData generators are a little different every time, because the pg-sql2 fragments contain the variable values themselves, and because the heavy use of closures in the builder allows lots of context stuff to leak inside the queries (instead of them being purely built from the query). But maybe this would be an idea for a V5 architecture :-)

It's certainly something I've given a lot of thought to.

@ab-pm
Copy link
Contributor Author

ab-pm commented Sep 16, 2019

Remember that graphile-build is completely database independent, so the format of a data generator is a generic map {[key: string]: any} that any plugins can hook into and register properties on that are useful.

I see, thanks for the link to queryFromResolveData. Is that the only metadata usage? I guess I was a bit confused by this usage (from a closer look the map property seems to pluck value values from the dummy data in the test case) and also the generic documentation which uses map or limit.

Great. Not sure where to explain this; where would you have looked for this? Most users don't need to know this level of detail, so it shouldn't be too prominent as it could be overwhelming. This is why the Graphile Engine and PostGraphile docs are separate.

Yeah, I'm not sure either. I guess the Graphile Engine docs are what only powerusers would read, but then they shouldn't be Postgraphile-specific. Maybe still add a section "For example, this is how this feature is used in Postgraphile", with links to the code?

Alternatively, the page https://www.graphile.org/postgraphile/extending-raw/ seems to be the best to explain everything that one needs to write raw plugins.

We could also add a "Internals" page to the "Operations" section, where it has a walkthrough of the important parts of the code and how everything is linked together.

Similar for the hooks description.

context is user controlled, we should not be manipulating it.
The info (or resolveInfo) parameter, on the other hand, is ripe for customisation. This is how other projects in the GraphQL.js ecosystem handle it also - it's a standard approach.

OK, I didn't know that. I thought the resolveInfo came from the query, the context came from the framework.

We do have an external utility to help generate a compatible context, but that utility lives outside of the schema generation and knows nothing of the schema internals.

That's another thing I guess that needs better (unified?) documentation, which options from where go into what callbacks. I'll try to come up with a list.

The parentQueryBuilder i's a backwards-compatible addition that was necessary to enable certain advanced features that were not included in 4.0.0. It is used when a subquery needs access to the parent query, for examples see https://www.graphile.org/postgraphile/make-extend-schema-plugin/

OK, I don't know about pre-4.0 PostGraphile, I guess it's not so important anyway. But it's good to know that this is explicitly designed to be used only in custom plugins, I was confused a bit why the core pg-plugins didn't use this feature.

Everything on QueryBuilder takes callback.

…but the methods also take plain values. The docs (limit and offset) are a bit inconsistent there. PR tomorrow :-)

Sometimes fields depend on things that haven't been defined yet, which is why we have beforeLock which can quickly add any other select fields or orders, etc as needed.

OK, that's advanced stuff. From a cursory glance, it seems like this is currently used only for beforeLock("orderBy", …), adding a default order for connections.

@benjie
Copy link
Member

benjie commented Sep 16, 2019

Remember that graphile-build is completely database independent, so the format of a data generator is a generic map {[key: string]: any} that any plugins can hook into and register properties on that are useful.

I see, thanks for the link to queryFromResolveData. Is that the only metadata usage? I guess I was a bit confused by this usage (from a closer look the map property seems to pluck value values from the dummy data in the test case) and also the generic documentation which uses map or limit.

You can use it however you want.

Great. Not sure where to explain this; where would you have looked for this? Most users don't need to know this level of detail, so it shouldn't be too prominent as it could be overwhelming. This is why the Graphile Engine and PostGraphile docs are separate.

Yeah, I'm not sure either. I guess the Graphile Engine docs are what only powerusers would read, but then they shouldn't be Postgraphile-specific. Maybe still add a section "For example, this is how this feature is used in Postgraphile", with links to the code?

Sounds good 👍 Examples are always good. If you link to code, make sure it's a permalink (press y on your keyboard when viewing code on GitHub to get the permalink).

Alternatively, the page graphile.org/postgraphile/extending-raw seems to be the best to explain everything that one needs to write raw plugins.

Also a good location. Or you could make subpages or even an entire subsection for it.

We could also add a "Internals" page to the "Operations" section, where it has a walkthrough of the important parts of the code and how everything is linked together.

Sure; it's rare that adding more docs/links will be the wrong thing to do.

Similar for the hooks description.

Yes, this definitely needs updating.

context is user controlled, we should not be manipulating it.
The info (or resolveInfo) parameter, on the other hand, is ripe for customisation. This is how other projects in the GraphQL.js ecosystem handle it also - it's a standard approach.

OK, I didn't know that. I thought the resolveInfo came from the query, the context came from the framework.

GraphQLResolveInfo is basically metadata that graphql(.js) itself adds.

We do have an external utility to help generate a compatible context, but that utility lives outside of the schema generation and knows nothing of the schema internals.

That's another thing I guess that needs better (unified?) documentation, which options from where go into what callbacks. I'll try to come up with a list.

That's withPostGraphileContext which is currently documented here:

https://www.graphile.org/postgraphile/usage-schema/

Everything on QueryBuilder takes callback.

…but the methods also take plain values. The docs (limit and offset) are a bit inconsistent there. PR tomorrow :-)

👍

This is why we have a callIfNecessary function. You only use the function version if there's a conflict/dependency going on.

Sometimes fields depend on things that haven't been defined yet, which is why we have beforeLock which can quickly add any other select fields or orders, etc as needed.

OK, that's advanced stuff. From a cursory glance, it seems like this is currently used only for beforeLock("orderBy", …), adding a default order for connections.

That's one place; but I'm certain there are many others. I can't remember what they are off-hand. They might be in external plugins.

@benjie
Copy link
Member

benjie commented Nov 8, 2019

[semi-automated message] Thanks for your question; hopefully we're well on the way to helping you solve your issue. This doesn't currently seem to be a bug in the library so I'm going to close the issue, but please feel free to keep requesting help below and if it does turn out to be a bug we can definitely re-open it 👍

You can also ask for help in the #help-and-support channel in our Discord chat.

@benjie benjie closed this as completed Nov 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants