DataLoader does not de-dupe keys when not configured to cache. #280

Open
@tony-brookes

Description

Expected Behavior

If I pass in multiple identical keys, the batch function should be called with only the unique set of keys, even when caching is disabled (they are still identical requests).

Current Behavior

The data loader does not de-duplicate the keys; they are passed along in full to the batch function.

Possible Solution

The data loader should still de-duplicate the keys. My understanding of "cache" is that it controls whether the results are kept in memory after the batch function is called, not whether requests are batched and de-duplicated. So I would expect the data loader to still map the unique set of results back to each request that asked for those keys.

Steps to Reproduce

1. Create a DataLoader with {cache: false} as the options.
2. Call load(1) twice.
3. Let the batch function get called.
4. Note that the value 1 will be present twice in the keys.
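
A minimal sketch of the reproduction (the batch function and logging are illustrative, but this is the plain dataloader API):

```js
const DataLoader = require('dataloader');

// Batch function just logs the keys it receives and echoes them back.
const batchFn = async (keys) => {
  console.log('batch function called with keys:', keys);
  return keys.map((key) => key);
};

const loader = new DataLoader(batchFn, { cache: false });

// Two identical requests in the same tick land in the same batch...
Promise.all([loader.load(1), loader.load(1)]).then(() => {
  // ...but with { cache: false } the batch function receives [1, 1]
  // rather than the de-duplicated [1].
});
```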

Context

We have designed our batch functions to expect each key only once, since that is what the data loader supplies in "normal" cases (normal = cache: true). However, we have recently written a thin wrapper around a data loader which delegates the caching part to Redis, so we can get caching across processes / machines. Note that Redis is the backing store for the cache; it is NOT the place the real batch function goes to get data.

As such, we wrap a DataLoader in a simple class that takes the options and the actual batch function, stores both, and creates a DataLoader internally with whatever options were supplied but with cache: false. That internal data loader is given a batch function which is just a closure: it looks the keys up in Redis and only calls the underlying real batch function with the set of keys that are not in the cache.
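
Roughly, the shape of the wrapper is something like the sketch below. This is simplified and illustrative, not our actual code: the CachingDataLoader name, the ioredis-style mget/set calls, and the JSON serialization are all assumptions.

```js
const DataLoader = require('dataloader');

// Simplified sketch: caching is delegated to Redis, and the inner
// DataLoader is always created with cache: false.
class CachingDataLoader {
  constructor(realBatchFn, redis, options = {}) {
    this.loader = new DataLoader(async (keys) => {
      // Look every incoming key up in Redis first.
      const cached = await redis.mget(keys.map(String));
      const missingKeys = keys.filter((_, i) => cached[i] === null);

      // Only ask the real batch function for the cache misses.
      // (Note: this implicitly assumed `keys` were already unique.)
      const fetched = missingKeys.length ? await realBatchFn(missingKeys) : [];

      // Write the misses back to Redis and index them by key.
      const fetchedByKey = new Map();
      await Promise.all(missingKeys.map((key, i) => {
        fetchedByKey.set(key, fetched[i]);
        return redis.set(String(key), JSON.stringify(fetched[i]));
      }));

      // Answer in the original key order, mixing cache hits and misses.
      return keys.map((key, i) =>
        cached[i] !== null ? JSON.parse(cached[i]) : fetchedByKey.get(key)
      );
    }, { ...options, cache: false });
  }

  load(key) {
    return this.loader.load(key);
  }
}
```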

We can turn this on and off, since there was no need to force our local developers to spin up a Redis instance, so our code can configurably instantiate either a native DataLoader or a CachingDataLoader. We noticed some very odd behaviour only when using the caching data loader, and after investigation we discovered that this was because our caching data loader's batch function was receiving duplicate keys. We were merrily passing these on to our underlying real batch function on a cache miss, as we assumed they were unique. Our batch functions were all written before we introduced this wrapper and hence had never been tested with duplicate keys. Granted, perhaps we should have tested them, but at the time it seemed that could never happen.

We have written some code in our caching data loader's batch function which de-dupes the list, but semantically it felt wrong that the data loader would not de-dupe the list when it was not caching the results in memory. The logic of taking in multiple requests across a set of (sometimes repeating) keys and farming the responses from the batch function back out to those requests should be orthogonal to whether those responses are held in memory afterwards. Hence the "bug." Perhaps this is a feature request, I'm not entirely sure, but it felt disconnected enough from expected behaviour that I've opted to file it as a bug for now. :)
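
For reference, the workaround inside our caching batch function amounts to something like this (again a simplified sketch, not the exact code):

```js
// De-dupe the keys before calling the real batch function, then expand
// the results back out so every incoming key (duplicates included)
// gets its value, in the original order.
async function dedupingBatch(keys, realBatchFn) {
  const uniqueKeys = [...new Set(keys)];
  const results = await realBatchFn(uniqueKeys);

  const resultByKey = new Map(uniqueKeys.map((key, i) => [key, results[i]]));
  return keys.map((key) => resultByKey.get(key));
}
```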
