Enabling sub-query filtering and sorting #422

rcb4by · 2025-04-21T14:32:35Z

I believe sub-query filtering and sorting is an incredibly useful feature, and I have been able to figure out a partial solution. Posting here so hopefully people can build off of it. NOTE: This solution takes advantage of the RelationshipLoader class for resolving sub-queries, so it will only work if the batching option on the Node's Meta class is set to True. The sample code below builds off of the changes discussed in #419, but should be easily adaptable if people haven't implemented those changes.

Issue

Our UI follows a backend for frontend pattern, and on one page we want to display information about an entity, along with recent events related to that entity. Ideally, we could use a query akin to this:

query EntityWithEvents($entity_name: String!) {
  entity(filter: {name: {eq: $entity_name}}) {
    edges {
      node {
        value1
        value2
        events(filter: {timestamp: {gt: "<past day's events>"}}, sort: TIMESTAMP_DESC) {
          edges {
            node {
              eventValue1
              eventValue2
            }
          }
        }
      }
    }
  }
}

The problem is that with the current code, the filter and sort arguments to events are essentially ignored. As a result, we have no choice but to query all of events, and filter and sort them manually. This results in extraneous frontend logic, and massively increases load times and response sizes for the page, as we are loading the entire event history for the entity. We can see the following (summarized) statements emitted by sqlalchemy when we run this query:

INFO: sqlalchemy.engine : SELECT entity.* FROM entity WHERE entity.name = $1 ORDER BY entity.id ASC;
INFO: sqlalchemy.engine : SELECT events.* FROM events WHERE events.entity_id = $1 ORDER BY events.id ASC;

Potential Workaround

One possible workaround is to set an order_by on the relationship definition itself, and then pass the first argument to events to limit the response to only the first N results. The statements emitted in this case are:

INFO: sqlalchemy.engine : SELECT entity.* FROM entity WHERE entity.name = $1 ORDER BY entity.id ASC;
INFO: sqlalchemy.engine : SELECT events.* FROM events WHERE events.entity_id = $1 ORDER BY events.timestamp DESC;

However, this approach has the following problems:

The first argument is not a direct analogue of SQL's OFFSET - it will still load the full event history into memory in order to generate a cursor, and then truncate the results (NOTE: depending on the database engine, this approach may use a database cursor rather than sending all data to the graphql server. I'm not familiar enough with cursors to verify this). So this is still a fundamentally slow query.
Returning first N results is semantically different from returning all results since a given time. Some days might have 0 events, while others will have 10+. If our goal is to show all events since the day before, then this approach is untenable.
Having to set the sort as part of the relationship definition rather than specifying as part of the query itself is inflexible and relies on developer awareness, rather than explicit ordering.

Solution

To fix the issue, we will implement our own RelationshipLoader class, modifying it from the original to accept sort/filter args during construction, then patch in a modified get_batch_resolver definition that uses our new implementation.

Usually, the _load_for_path internal function called by batch_load_fn will ignore any filtering or sorting set on the query_context arg. It does, however, respect filtering requested via the loadopt parameter, in order to implement with_loader_criteria() semantics described here. The easiest way to hijack this flow is to construct our own loader (a defaultload) and steal it's loader options (stored in context[0]). _load_for_path does respect the order_by prop of the relationship that was passed in to initially (and will only resolve that reference at call-time), so we can simply update relationship_prop.order_by with the requested order right before calling. Note that, while the RelationshipLoader is constructed on a per-arg basis, the relationship_prop is still shared by all instances, so we have to set this value right before calling in order to avoid conflicts. In the case where no sort arg is passed, we store a cache of default sort orders the first time each relationship_prop is passed in, which is set prior to calling.

NOTE: the patch_relationship_loader() function must be called before constructing your graphene.Schema object.

Specific changes are noted in the comments.

# patch_batching.py
from asyncio import get_event_loop
from typing import Any, Callable, Coroutine

import sqlalchemy
from graphene_sqlalchemy.batching import DataLoader
from sqlalchemy.orm import Session, defaultload, strategies
from sqlalchemy.util import immutabledict
from sqlalchemy.util._concurrency_py3k import greenlet_spawn


def patch_relationship_loader() -> None:
    """
    Call this function once prior to any other setup code
    (specifically, prior to creating your `graphene.Schema` object)
    """
    import graphene_sqlalchemy.fields, graphene_sqlalchemy.converter
    graphene_sqlalchemy.fields.get_batch_resolver = get_batch_resolver
    graphene_sqlalchemy.converter.get_batch_resolver = get_batch_resolver


class RelationshipLoader(DataLoader):
    cache = False
    DEFAULT_PROP_ORDER_CACHE: dict[Any, Any] = {}
    "Cache to ensure that default sort order is used when no `sort` arg is passed"

    def __init__(self, relationship_prop: Any, selectin_loader: Any,
                 args: dict[Any, Any]) -> None:
        super().__init__()
        if relationship_prop not in RelationshipLoader.DEFAULT_PROP_ORDER_CACHE:
            RelationshipLoader.DEFAULT_PROP_ORDER_CACHE[
                relationship_prop] = relationship_prop.order_by

        self.relationship_prop = relationship_prop
        self.selectin_loader = selectin_loader
        self.args = args

        sort = args.get('sort')
        if sort is None:
            # Check cache if no sort if specified
            self.order_by = RelationshipLoader.DEFAULT_PROP_ORDER_CACHE[
                relationship_prop]
        else:
            # Sort handling copied from `SQLAlchemyConnectionField.get_query`
            if not isinstance(sort, list):
                sort = [sort]
            sort_args = []
            for item in sort:
                import enum
                from graphene_sqlalchemy.utils import EnumValue
                if isinstance(item, enum.Enum):
                    sort_args.append(item.value.value)
                elif isinstance(item, EnumValue):
                    sort_args.append(item.value)
                else:
                    sort_args.append(item)
            self.order_by = sort_args

        filter = args.get('filter')
        if filter is None:
            self.loadopt = None
        else:
            filters = type(filter).execute_filters(None, filter)[1]
            # Need `InstrumentedAttribute` (type of `relationship_prop`) to utilize `and_` filtering. Better way to access this?
            rel_attr = getattr(self.relationship_prop.parent.entity,
                               self.relationship_prop.key)
            # Use SA's public relationship-loading-strategy functions to generate a `_AttributeStrategyLoad` object for us
            self.loadopt = defaultload(rel_attr.and_(
                *filters)).context[0]

    async def batch_load_fn(self, parents: Any) -> list[Any]:
        """
        This function is the corrected version of the batch load function
        to use for relationships. The only differences are that,
        rather than calling `selectin_loader._load_for_path` directly, it
        calls it asynchronously using `greenlet_spawn`; it also sets
        `self.relationship_prop.order_by` to enable sorting (referenced via
        `self.selectin_loader`) and passes `self.loadopt` to enable filtering.

        Other extraneous logic has been removed for clarity. See the original
        code at
            https://github.com/graphql-python/graphene-sqlalchemy/blob/eb9c663cc0e314987397626573e3d2f940bea138/graphene_sqlalchemy/batching.py#L39
        """
        child_mapper = self.relationship_prop.mapper
        parent_mapper = self.relationship_prop.parent
        session = Session.object_session(parents[0])

        for parent in parents:
            assert session is Session.object_session(parent)
            assert session and parent not in session.dirty

        states = [(sqlalchemy.inspect(parent), True) for parent in parents]

        query_context = None
        if session:
            self.relationship_prop.order_by = self.order_by
            parent_mapper_query = session.query(parent_mapper.entity)
            query_context = parent_mapper_query._compile_context()
            await greenlet_spawn(
                self.selectin_loader._load_for_path,
                query_context,
                parent_mapper._path_registry,
                states,
                None,
                child_mapper,
                self.loadopt,
                None,  # recursion depth can be none
                immutabledict(),  # default value for selectinload->lazyload
            )
        result = [
            getattr(parent, self.relationship_prop.key) for parent in parents
        ]
        return result


# Cache this across `batch_load_fn` calls
# This is so SQL string generation is cached under-the-hood via `bakery`
# Caching the relationship loader for each relationship prop and args combo (can't use dict because `args` is not hashable)
RELATIONSHIP_LOADERS_CACHE: list[RelationshipLoader] = []


def get_batch_resolver(
        relationship_prop: Any
) -> Callable[[Any, Any], Coroutine[Any, Any, None]]:
    """Get the resolve function for the given relationship.
       Changed from default implementation to pass `args` in and consider `args` while caching."""

    def _get_loader(relationship_prop: Any,
                    args: dict[Any, Any]) -> RelationshipLoader:
        """Retrieve the cached loader of the given relationship."""
        loop = get_event_loop()
        RELATIONSHIP_LOADERS_CACHE[:] = [
            l for l in RELATIONSHIP_LOADERS_CACHE if l.loop == loop
        ]
        loader = next(
            (l for l in RELATIONSHIP_LOADERS_CACHE
             if l.relationship_prop == relationship_prop and l.args == args),
            None)
        if loader is None:
            selectin_loader = strategies.SelectInLoader(
                relationship_prop, (("lazy", "selectin"), ))
            loader = RelationshipLoader(
                relationship_prop=relationship_prop,
                selectin_loader=selectin_loader,
                args=args,
            )
            RELATIONSHIP_LOADERS_CACHE.append(loader)
        return loader

    async def resolve(root: Any, info: Any, **args: Any) -> None:
        return await _get_loader(relationship_prop, args).load(root)

    return resolve

The text was updated successfully, but these errors were encountered:

AdamLuptak · 2025-05-12T05:17:30Z

Hello, You said you partiali solved this problem, What is missing ?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enabling sub-query filtering and sorting #422

Enabling sub-query filtering and sorting #422

rcb4by commented Apr 21, 2025 •

edited

Loading

AdamLuptak commented May 12, 2025

Enabling sub-query filtering and sorting #422

Enabling sub-query filtering and sorting #422

Comments

rcb4by commented Apr 21, 2025 • edited Loading

Issue

Potential Workaround

Solution

AdamLuptak commented May 12, 2025

rcb4by commented Apr 21, 2025 •

edited

Loading