How lookahead and queries work #522

ab-pm · 2019-09-16T09:10:32Z

…in greater detail

Originally posted by @benjie in graphile/crystal#387 (comment):

When it comes to actually implementing the lookahead, it comes down to getDataFromParsedResolveInforFragment's second argument.

I don't want to derail the discussion about implementing interfaces over there, so I'll start a new thread.

I've already seen the graphile-engine lookahead documentation (describing parseResolveInfo and simplifyParsedResolveInfoFragmentWithType), but there's some open questions:

What exactly is the format of a data generator? From what I've seen they're either objects with a map property or objects with a pgQuery property (apparently containing a callback to modify a QueryBuilder)
That getDataFromParsedResolveInforFragment actually is the function that runs the data generators was not obvious from the name, and not well-documented. I'd also like to see such descriptions in the code comments - or a link to the graphile.org documentation page to avoid duplication.
I see the docs use the term "meta-data", maybe it would be appropriate to rename the methods to addFieldMetadata, addFieldArgMetadata, and getMetadataFromParsedResolveInfo? getData… at first sounded as if it was already fetching the data from the db, then a second guess was that it constructed a grapqhl resolver ("makeDataGetter"), before it hit me when looking at the implementation
The next issue is where and how the metadata is used further. Most of the plugins whose code I studied do use the pgQueryFromResolveData function, which is implemented here and added to the build object here. It seems to create the actual SQL (fragment), used either in subqueries of other data generators (e.g. here or there), or in the root queries (e.g. here or there where the sql query is subsequently passed to sql.compile and then to pgPrepareAndRun which actually executes it.
In a makeExtendSchemaPlugin however one is advised to use selectGraphQLResultFromTable method which does all of the above in one step, but executes a normal pgClient.query not a prepared statement.
I think I got this now, would be nice if this was explained somewhere. (Where?)
WTH is selectGraphQLResultFromTable put on a .graphile property on the resolveInfo object that is passed to the user-supplied resolver? Wouldn't this fit much, much better on the context (where all the other [post]graphile helper methods live)? Especially since it uses the pgClient from that very context, being a closure over it.
The QueryBuilder is documented on the makeExtendSchemaPlugin page, I kinda had expected a separate page in the graphile-build docs.
The parentQueryBuilder appears like a hack, where is this really used? It seems to be set whenever a nested query is constructed, but via direct assignment - I kinda would have expected a setParent() method (that e.g. checks for overwrites of an existing parent).
Why do the select and orderBy methods take callbacks?
And what do all those internal methods (that one shouldn't call) do? I'd expect some comments in the code on how this "locking" scheme works and what it is used for.
Are any of these steps cached? Building the SQL query on every request seems like it could have considerable performance impact, although it should be the same for each GraphQL query.
Could these be stored on the parsed queries that PostGraphile caches? Probably no, because the ArgData generators are a little different every time, because the pg-sql2 fragments contain the variable values themselves, and because the heavy use of closures in the builder allows lots of context stuff to leak inside the queries (instead of them being purely built from the query). But maybe this would be an idea for a V5 architecture :-)

The text was updated successfully, but these errors were encountered:

benjie · 2019-09-16T12:03:45Z

Should this be in the docs repo? https://github.com/graphile/graphile.github.io

benjie · 2019-09-16T12:05:56Z

Added to the megathread: graphile/graphile.github.io#153 (comment)

ab-pm · 2019-09-16T12:43:51Z

@benjie Some things might have implications on the docs repo, yeah, but I wanted to discuss a few points first which also might have effects on the engine code.

benjie · 2019-09-16T14:17:16Z

What exactly is the format of a data generator? From what I've seen they're either objects with a map property or objects with a pgQuery property (apparently containing a callback to modify a QueryBuilder)

It's a generic map {[key: string]: any} that any plugins can hook into and register properties on that are useful.

We use pgQuery extensively, but there's quite a few other bits of data we register internally. Here's some of them being used in queryFromResolveData:

graphile-engine/packages/graphile-build-pg/src/queryFromResolveDataFactory.js

Lines 36 to 44 in 2335d6e

    
           const { 
        
             pgQuery, 
        
             pgAggregateQuery, 
        
             pgCursorPrefix: reallyRawCursorPrefix, 
        
             pgDontUseAsterisk, 
        
             calculateHasNextPage, 
        
             calculateHasPreviousPage, 
        
             usesCursor: explicitlyUsesCursor, 
        
           } = resolveData;

Remember that graphile-build is completely database independent, so though it created the data generator interface it knows nothing about PostgreSQL. graphile-build-pg uses these custom entries, but other plugins are free to use their own data. There is no central registry, however we do encourage the use of namespaces in the name, e.g. graphile-build-pg uses the pg namespace.

That getDataFromParsedResolveInfoFragment actually is the function that runs the data generators was not obvious from the name, and not well-documented. I'd also like to see such descriptions in the code comments - or a link to the graphile.org documentation page to avoid duplication.

PRs welcome 👍

I see the docs use the term "meta-data", maybe it would be appropriate to rename the methods to addFieldMetadata, addFieldArgMetadata, and getMetadataFromParsedResolveInfo? getData… at first sounded as if it was already fetching the data from the db, then a second guess was that it constructed a grapqhl resolver ("makeDataGetter"), before it hit me when looking at the implementation

Though maybe we should have called this metadata from the start, doing so now would be a breaking change.

The next issue is where and how the metadata is used further. Most of the plugins whose code I studied do use the pgQueryFromResolveData function, which is implemented here and added to the build object here. It seems to create the actual SQL (fragment), used either in subqueries of other data generators (e.g. here or there), or in the root queries (e.g. here or there where the sql query is subsequently passed to sql.compile and then to pgPrepareAndRun which actually executes it.

Yes, this has evolved over time.

In a makeExtendSchemaPlugin however one is advised to use selectGraphQLResultFromTable method which does all of the above in one step, but executes a normal pgClient.query not a prepared statement.

Currently. pgPrepareAndRun is very recent, more recent than makeExtendSchemaPlugin.

I think I got this now, would be nice if this was explained somewhere. (Where?)

Great. Not sure; where would you have looked for this? Most users don't need to know this level of detail, so it shouldn't be too prominent as it could be overwhelming. This is why the Graphile Engine and PostGraphile docs are separate.

WTH is selectGraphQLResultFromTable put on a .graphile property on the resolveInfo object that is passed to the user-supplied resolver? Wouldn't this fit much, much better on the context (where all the other [post]graphile helper methods live)? Especially since it uses the pgClient from that very context, being a closure over it.

No; context is user controlled, we should not be manipulating it. This is why we have an external utility to help generate a compatible context, but that utility lives outside of the schema generation and knows nothing of the schema internals.

The info (or resolveInfo) parameter, on the other hand, is ripe for customisation. This is how other projects in the GraphQL.js ecosystem handle it also. It's a standard approach.

selectGraphQLResultFromTable is a helper function so the end user needs to know less of the internals of the project to be productive. This is true of much of the graphile-utils package.

The QueryBuilder is documented on the makeExtendSchemaPlugin page, I kinda had expected a separate page in the graphile-build docs.

AFAIK the graphile-build docs don't have any PostgreSQL-specific functionality documented. QueryBuilder is entirely contained in graphile-build-pg - the PostgreSQL-concerned package. This is currently documented under PostGraphile because it doesn't currently have it's own documentation.

The parentQueryBuilder appears like a hack, where is this really used? It seems to be set whenever a nested query is constructed, but via direct assignment - I kinda would have expected a setParent() method (that e.g. checks for overwrites of an existing parent).

Yes, it's a backwards-compatible addition that was necessary to enable certain advanced features that were not included in 4.0.0. The same is true for many other fields that were added after 4.0.0 was released. It is used when a subquery needs access to the parent query, for examples see https://www.graphile.org/postgraphile/make-extend-schema-plugin/

Why do the select and orderBy methods take callbacks?

Everything on QueryBuilder takes callback. Because they don't become permanent until the QueryBuilder is ready to be finalised, because different plugins add different things at different times and they're all inter-dependent. See the lock method that's called when you request certain final features of the query builder.

And what do all those internal methods (that one shouldn't call) do? I'd expect some comments in the code on how this "locking" scheme works and what it is used for.

They help build the query. The QueryBuilder is mutable, but once you've asked for it to build you wouldn't want it to change again as that would be misleading. As such, it has a dependency system and a locking system that prevent bad things happening. Sometimes fields depend on things that haven't been defined yet, which is why we have beforeLock which can quickly add any other select fields or orders, etc as needed.

Are any of these steps cached? Building the SQL query on every request seems like it could have considerable performance impact, although it should be the same for each GraphQL query.

A lot of optimisation has gone into building the SQL query. It is now extremely efficient.

Unfortunately we cannot cache it currently, as some clauses might be dependent on the context (which is library-user controlled) or on the value of variables (which is end-user controlled). Instead we've focussed on making sure it's very fast - a lot of optimisation has gone into it.

Could these be stored on the parsed queries that PostGraphile caches? Probably no, because the ArgData generators are a little different every time, because the pg-sql2 fragments contain the variable values themselves, and because the heavy use of closures in the builder allows lots of context stuff to leak inside the queries (instead of them being purely built from the query). But maybe this would be an idea for a V5 architecture :-)

It's certainly something I've given a lot of thought to.

ab-pm · 2019-09-16T17:06:24Z

Remember that graphile-build is completely database independent, so the format of a data generator is a generic map {[key: string]: any} that any plugins can hook into and register properties on that are useful.

I see, thanks for the link to queryFromResolveData. Is that the only metadata usage? I guess I was a bit confused by this usage (from a closer look the map property seems to pluck value values from the dummy data in the test case) and also the generic documentation which uses map or limit.

Great. Not sure where to explain this; where would you have looked for this? Most users don't need to know this level of detail, so it shouldn't be too prominent as it could be overwhelming. This is why the Graphile Engine and PostGraphile docs are separate.

Yeah, I'm not sure either. I guess the Graphile Engine docs are what only powerusers would read, but then they shouldn't be Postgraphile-specific. Maybe still add a section "For example, this is how this feature is used in Postgraphile", with links to the code?

Alternatively, the page https://www.graphile.org/postgraphile/extending-raw/ seems to be the best to explain everything that one needs to write raw plugins.

We could also add a "Internals" page to the "Operations" section, where it has a walkthrough of the important parts of the code and how everything is linked together.

Similar for the hooks description.

context is user controlled, we should not be manipulating it.
The info (or resolveInfo) parameter, on the other hand, is ripe for customisation. This is how other projects in the GraphQL.js ecosystem handle it also - it's a standard approach.

OK, I didn't know that. I thought the resolveInfo came from the query, the context came from the framework.

We do have an external utility to help generate a compatible context, but that utility lives outside of the schema generation and knows nothing of the schema internals.

That's another thing I guess that needs better (unified?) documentation, which options from where go into what callbacks. I'll try to come up with a list.

The parentQueryBuilder i's a backwards-compatible addition that was necessary to enable certain advanced features that were not included in 4.0.0. It is used when a subquery needs access to the parent query, for examples see https://www.graphile.org/postgraphile/make-extend-schema-plugin/

OK, I don't know about pre-4.0 PostGraphile, I guess it's not so important anyway. But it's good to know that this is explicitly designed to be used only in custom plugins, I was confused a bit why the core pg-plugins didn't use this feature.

Everything on QueryBuilder takes callback.

…but the methods also take plain values. The docs (limit and offset) are a bit inconsistent there. PR tomorrow :-)

Sometimes fields depend on things that haven't been defined yet, which is why we have beforeLock which can quickly add any other select fields or orders, etc as needed.

OK, that's advanced stuff. From a cursory glance, it seems like this is currently used only for beforeLock("orderBy", …), adding a default order for connections.

benjie · 2019-09-16T19:07:48Z

Remember that graphile-build is completely database independent, so the format of a data generator is a generic map {[key: string]: any} that any plugins can hook into and register properties on that are useful.

I see, thanks for the link to queryFromResolveData. Is that the only metadata usage? I guess I was a bit confused by this usage (from a closer look the map property seems to pluck value values from the dummy data in the test case) and also the generic documentation which uses map or limit.

You can use it however you want.

Great. Not sure where to explain this; where would you have looked for this? Most users don't need to know this level of detail, so it shouldn't be too prominent as it could be overwhelming. This is why the Graphile Engine and PostGraphile docs are separate.

Yeah, I'm not sure either. I guess the Graphile Engine docs are what only powerusers would read, but then they shouldn't be Postgraphile-specific. Maybe still add a section "For example, this is how this feature is used in Postgraphile", with links to the code?

Sounds good 👍 Examples are always good. If you link to code, make sure it's a permalink (press y on your keyboard when viewing code on GitHub to get the permalink).

Alternatively, the page graphile.org/postgraphile/extending-raw seems to be the best to explain everything that one needs to write raw plugins.

Also a good location. Or you could make subpages or even an entire subsection for it.

We could also add a "Internals" page to the "Operations" section, where it has a walkthrough of the important parts of the code and how everything is linked together.

Sure; it's rare that adding more docs/links will be the wrong thing to do.

Similar for the hooks description.

Yes, this definitely needs updating.

context is user controlled, we should not be manipulating it.
The info (or resolveInfo) parameter, on the other hand, is ripe for customisation. This is how other projects in the GraphQL.js ecosystem handle it also - it's a standard approach.

OK, I didn't know that. I thought the resolveInfo came from the query, the context came from the framework.

GraphQLResolveInfo is basically metadata that graphql(.js) itself adds.

We do have an external utility to help generate a compatible context, but that utility lives outside of the schema generation and knows nothing of the schema internals.

That's another thing I guess that needs better (unified?) documentation, which options from where go into what callbacks. I'll try to come up with a list.

That's withPostGraphileContext which is currently documented here:

https://www.graphile.org/postgraphile/usage-schema/

Everything on QueryBuilder takes callback.

…but the methods also take plain values. The docs (limit and offset) are a bit inconsistent there. PR tomorrow :-)

👍

This is why we have a callIfNecessary function. You only use the function version if there's a conflict/dependency going on.

Sometimes fields depend on things that haven't been defined yet, which is why we have beforeLock which can quickly add any other select fields or orders, etc as needed.

OK, that's advanced stuff. From a cursory glance, it seems like this is currently used only for beforeLock("orderBy", …), adding a default order for connections.

That's one place; but I'm certain there are many others. I can't remember what they are off-hand. They might be in external plugins.

benjie · 2019-11-08T10:49:04Z

[semi-automated message] Thanks for your question; hopefully we're well on the way to helping you solve your issue. This doesn't currently seem to be a bug in the library so I'm going to close the issue, but please feel free to keep requesting help below and if it does turn out to be a bug we can definitely re-open it 👍

You can also ask for help in the #help-and-support channel in our Discord chat.

ab-pm changed the title ~~How lookahead works~~ (WIP) How lookahead works Sep 16, 2019

ab-pm changed the title ~~(WIP) How lookahead works~~ How lookahead and queries work Sep 16, 2019

benjie added the question label Sep 16, 2019

This was referenced Sep 17, 2019

Table inheritance / interfaces graphile/crystal#387

Closed

graphile-build-pg? graphile/graphile.github.io#162

Closed

benjie added the add-to-docs label Sep 23, 2019

benjie closed this as completed Nov 8, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How lookahead and queries work #522

How lookahead and queries work #522

ab-pm commented Sep 16, 2019 •

edited

Loading

benjie commented Sep 16, 2019

benjie commented Sep 16, 2019

ab-pm commented Sep 16, 2019

benjie commented Sep 16, 2019

ab-pm commented Sep 16, 2019

benjie commented Sep 16, 2019

benjie commented Nov 8, 2019

How lookahead and queries work #522

How lookahead and queries work #522

Comments

ab-pm commented Sep 16, 2019 • edited Loading

benjie commented Sep 16, 2019

benjie commented Sep 16, 2019

ab-pm commented Sep 16, 2019

benjie commented Sep 16, 2019

ab-pm commented Sep 16, 2019

benjie commented Sep 16, 2019

benjie commented Nov 8, 2019

ab-pm commented Sep 16, 2019 •

edited

Loading