You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I believe sub-query filtering and sorting is an incredibly useful feature, and I have been able to figure out a partial solution. Posting here so hopefully people can build off of it. NOTE: This solution takes advantage of the RelationshipLoader class for resolving sub-queries, so it will only work if the batching option on the Node's Meta class is set to True. The sample code below builds off of the changes discussed in #419, but should be easily adaptable if people haven't implemented those changes.
Issue
Our UI follows a backend for frontend pattern, and on one page we want to display information about an entity, along with recent events related to that entity. Ideally, we could use a query akin to this:
The problem is that with the current code, the filter and sort arguments to events are essentially ignored. As a result, we have no choice but to query all of events, and filter and sort them manually. This results in extraneous frontend logic, and massively increases load times and response sizes for the page, as we are loading the entire event history for the entity. We can see the following (summarized) statements emitted by sqlalchemy when we run this query:
INFO: sqlalchemy.engine : SELECT entity.* FROM entity WHERE entity.name = $1 ORDER BY entity.id ASC;
INFO: sqlalchemy.engine : SELECT events.* FROM events WHERE events.entity_id = $1 ORDER BY events.id ASC;
Potential Workaround
One possible workaround is to set an order_by on the relationship definition itself, and then pass the first argument to events to limit the response to only the first N results. The statements emitted in this case are:
INFO: sqlalchemy.engine : SELECT entity.* FROM entity WHERE entity.name = $1 ORDER BY entity.id ASC;
INFO: sqlalchemy.engine : SELECT events.* FROM events WHERE events.entity_id = $1 ORDER BY events.timestamp DESC;
However, this approach has the following problems:
The first argument is not a direct analogue of SQL's OFFSET - it will still load the full event history into memory in order to generate a cursor, and then truncate the results (NOTE: depending on the database engine, this approach may use a database cursor rather than sending all data to the graphql server. I'm not familiar enough with cursors to verify this). So this is still a fundamentally slow query.
Returning first N results is semantically different from returning all results since a given time. Some days might have 0 events, while others will have 10+. If our goal is to show all events since the day before, then this approach is untenable.
Having to set the sort as part of the relationship definition rather than specifying as part of the query itself is inflexible and relies on developer awareness, rather than explicit ordering.
Solution
To fix the issue, we will implement our own RelationshipLoader class, modifying it from the original to accept sort/filter args during construction, then patch in a modified get_batch_resolver definition that uses our new implementation.
Usually, the _load_for_path internal function called by batch_load_fn will ignore any filtering or sorting set on the query_context arg. It does, however, respect filtering requested via the loadopt parameter, in order to implement with_loader_criteria() semantics described here. The easiest way to hijack this flow is to construct our own loader (a defaultload) and steal it's loader options (stored in context[0]). _load_for_pathdoes respect the order_by prop of the relationship that was passed in to initially (and will only resolve that reference at call-time), so we can simply update relationship_prop.order_by with the requested order right before calling. Note that, while the RelationshipLoader is constructed on a per-arg basis, the relationship_prop is still shared by all instances, so we have to set this value right before calling in order to avoid conflicts. In the case where no sort arg is passed, we store a cache of default sort orders the first time each relationship_prop is passed in, which is set prior to calling.
NOTE: the patch_relationship_loader() function must be called before constructing your graphene.Schema object.
Specific changes are noted in the comments.
# patch_batching.pyfromasyncioimportget_event_loopfromtypingimportAny, Callable, Coroutineimportsqlalchemyfromgraphene_sqlalchemy.batchingimportDataLoaderfromsqlalchemy.ormimportSession, defaultload, strategiesfromsqlalchemy.utilimportimmutabledictfromsqlalchemy.util._concurrency_py3kimportgreenlet_spawndefpatch_relationship_loader() ->None:
""" Call this function once prior to any other setup code (specifically, prior to creating your `graphene.Schema` object) """importgraphene_sqlalchemy.fields, graphene_sqlalchemy.convertergraphene_sqlalchemy.fields.get_batch_resolver=get_batch_resolvergraphene_sqlalchemy.converter.get_batch_resolver=get_batch_resolverclassRelationshipLoader(DataLoader):
cache=FalseDEFAULT_PROP_ORDER_CACHE: dict[Any, Any] = {}
"Cache to ensure that default sort order is used when no `sort` arg is passed"def__init__(self, relationship_prop: Any, selectin_loader: Any,
args: dict[Any, Any]) ->None:
super().__init__()
ifrelationship_propnotinRelationshipLoader.DEFAULT_PROP_ORDER_CACHE:
RelationshipLoader.DEFAULT_PROP_ORDER_CACHE[
relationship_prop] =relationship_prop.order_byself.relationship_prop=relationship_propself.selectin_loader=selectin_loaderself.args=argssort=args.get('sort')
ifsortisNone:
# Check cache if no sort if specifiedself.order_by=RelationshipLoader.DEFAULT_PROP_ORDER_CACHE[
relationship_prop]
else:
# Sort handling copied from `SQLAlchemyConnectionField.get_query`ifnotisinstance(sort, list):
sort= [sort]
sort_args= []
foriteminsort:
importenumfromgraphene_sqlalchemy.utilsimportEnumValueifisinstance(item, enum.Enum):
sort_args.append(item.value.value)
elifisinstance(item, EnumValue):
sort_args.append(item.value)
else:
sort_args.append(item)
self.order_by=sort_argsfilter=args.get('filter')
iffilterisNone:
self.loadopt=Noneelse:
filters=type(filter).execute_filters(None, filter)[1]
# Need `InstrumentedAttribute` (type of `relationship_prop`) to utilize `and_` filtering. Better way to access this?rel_attr=getattr(self.relationship_prop.parent.entity,
self.relationship_prop.key)
# Use SA's public relationship-loading-strategy functions to generate a `_AttributeStrategyLoad` object for usself.loadopt=defaultload(rel_attr.and_(
*filters)).context[0]
asyncdefbatch_load_fn(self, parents: Any) ->list[Any]:
""" This function is the corrected version of the batch load function to use for relationships. The only differences are that, rather than calling `selectin_loader._load_for_path` directly, it calls it asynchronously using `greenlet_spawn`; it also sets `self.relationship_prop.order_by` to enable sorting (referenced via `self.selectin_loader`) and passes `self.loadopt` to enable filtering. Other extraneous logic has been removed for clarity. See the original code at https://github.com/graphql-python/graphene-sqlalchemy/blob/eb9c663cc0e314987397626573e3d2f940bea138/graphene_sqlalchemy/batching.py#L39 """child_mapper=self.relationship_prop.mapperparent_mapper=self.relationship_prop.parentsession=Session.object_session(parents[0])
forparentinparents:
assertsessionisSession.object_session(parent)
assertsessionandparentnotinsession.dirtystates= [(sqlalchemy.inspect(parent), True) forparentinparents]
query_context=Noneifsession:
self.relationship_prop.order_by=self.order_byparent_mapper_query=session.query(parent_mapper.entity)
query_context=parent_mapper_query._compile_context()
awaitgreenlet_spawn(
self.selectin_loader._load_for_path,
query_context,
parent_mapper._path_registry,
states,
None,
child_mapper,
self.loadopt,
None, # recursion depth can be noneimmutabledict(), # default value for selectinload->lazyload
)
result= [
getattr(parent, self.relationship_prop.key) forparentinparents
]
returnresult# Cache this across `batch_load_fn` calls# This is so SQL string generation is cached under-the-hood via `bakery`# Caching the relationship loader for each relationship prop and args combo (can't use dict because `args` is not hashable)RELATIONSHIP_LOADERS_CACHE: list[RelationshipLoader] = []
defget_batch_resolver(
relationship_prop: Any
) ->Callable[[Any, Any], Coroutine[Any, Any, None]]:
"""Get the resolve function for the given relationship. Changed from default implementation to pass `args` in and consider `args` while caching."""def_get_loader(relationship_prop: Any,
args: dict[Any, Any]) ->RelationshipLoader:
"""Retrieve the cached loader of the given relationship."""loop=get_event_loop()
RELATIONSHIP_LOADERS_CACHE[:] = [
lforlinRELATIONSHIP_LOADERS_CACHEifl.loop==loop
]
loader=next(
(lforlinRELATIONSHIP_LOADERS_CACHEifl.relationship_prop==relationship_propandl.args==args),
None)
ifloaderisNone:
selectin_loader=strategies.SelectInLoader(
relationship_prop, (("lazy", "selectin"), ))
loader=RelationshipLoader(
relationship_prop=relationship_prop,
selectin_loader=selectin_loader,
args=args,
)
RELATIONSHIP_LOADERS_CACHE.append(loader)
returnloaderasyncdefresolve(root: Any, info: Any, **args: Any) ->None:
returnawait_get_loader(relationship_prop, args).load(root)
returnresolve
The text was updated successfully, but these errors were encountered:
I believe sub-query filtering and sorting is an incredibly useful feature, and I have been able to figure out a partial solution. Posting here so hopefully people can build off of it. NOTE: This solution takes advantage of the
RelationshipLoader
class for resolving sub-queries, so it will only work if thebatching
option on the Node'sMeta
class is set toTrue
. The sample code below builds off of the changes discussed in #419, but should be easily adaptable if people haven't implemented those changes.Issue
Our UI follows a backend for frontend pattern, and on one page we want to display information about an entity, along with recent events related to that entity. Ideally, we could use a query akin to this:
The problem is that with the current code, the
filter
andsort
arguments toevents
are essentially ignored. As a result, we have no choice but to query all of events, and filter and sort them manually. This results in extraneous frontend logic, and massively increases load times and response sizes for the page, as we are loading the entire event history for the entity. We can see the following (summarized) statements emitted by sqlalchemy when we run this query:Potential Workaround
One possible workaround is to set an
order_by
on the relationship definition itself, and then pass thefirst
argument toevents
to limit the response to only the first N results. The statements emitted in this case are:However, this approach has the following problems:
first
argument is not a direct analogue of SQL'sOFFSET
- it will still load the full event history into memory in order to generate a cursor, and then truncate the results (NOTE: depending on the database engine, this approach may use a database cursor rather than sending all data to the graphql server. I'm not familiar enough with cursors to verify this). So this is still a fundamentally slow query.Solution
To fix the issue, we will implement our own
RelationshipLoader
class, modifying it from the original to accept sort/filter args during construction, then patch in a modifiedget_batch_resolver
definition that uses our new implementation.Usually, the
_load_for_path
internal function called bybatch_load_fn
will ignore any filtering or sorting set on thequery_context
arg. It does, however, respect filtering requested via theloadopt
parameter, in order to implementwith_loader_criteria()
semantics described here. The easiest way to hijack this flow is to construct our own loader (adefaultload
) and steal it's loader options (stored incontext[0]
)._load_for_path
does respect theorder_by
prop of the relationship that was passed in to initially (and will only resolve that reference at call-time), so we can simply updaterelationship_prop.order_by
with the requested order right before calling. Note that, while theRelationshipLoader
is constructed on a per-arg basis, therelationship_prop
is still shared by all instances, so we have to set this value right before calling in order to avoid conflicts. In the case where nosort
arg is passed, we store a cache of default sort orders the first time eachrelationship_prop
is passed in, which is set prior to calling.NOTE: the
patch_relationship_loader()
function must be called before constructing yourgraphene.Schema
object.Specific changes are noted in the comments.
The text was updated successfully, but these errors were encountered: