Skip to content

[REQUEST] dataloader accept information of 'selected fields' #236

Open
@thfai2000

Description

@thfai2000

What problem am I trying to solve?

Dataloader currently can solve the problem of collection of data leads to multiple database access.
But batchLoadFn may query unnecessary data from database in order to suit for all kind of resolvers.

Here is my example:
There is a Graph Type called Car and it represents my database table cars.

sql database

create table cars(
 id INT,
 brand string
)

create table car_trade_records(
 id INT,
 carId INT,
 owner varchar(255),
 logDatetime timestamp
 )

create table car_accident_records(
 id INT,
 carId INT,
 location varchar(255),
 logDatetime timestamp
)

car.graphql

type Car {
  id: Int,
  brand: String,
  lastOwner: String,
  lastAccidentLocation: String
}

type Query {
   cars(ids: [Int]): [Car]
}

resolvers.js

let resolvers = {
  Car :{
    latestOwner: async (car, args, context, info) => {
      let carRecord = await carLoader.load(car.id)
      return carRecord.lastOwner
    },
    lastAccidentLocation: async (car, args, context, info) => {
      let carRecord = await carLoader.load(car.id)
      return carRecord.lastAccidentLocation
    }
  },
  Query: {
    cars: async (car, args, context, info) => {
      return await carLoader.loadMany(args.ids)
    }
  }
}

the dataloader

// native sql approach
const carLoader = new DataLoader(keys => {
  return await mysql.query(`SELECT *, 
    (SELECT owner FROM car_trade_records trade 
      WHERE trade.cardId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastOwner ,
    (SELECT location FROM car_accident_records acc 
      WHERE acc.carId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastAccidentLocation
    FROM car WHERE id IN ?`, keys)
})

// ORM approach. It is sails.js example. ORM usually can get the related table's data easily.
const carLoader = new DataLoader(keys => {
  return (await Car.find({where:{id: keys}})
      .populate('carOwners', {limit: 1})
      .populate('carAccidents', {limit: 1}) 
    ).map( c => {
      c.lastOwner = c.carOwners.owner
      c.lastAccidentLocation = c.carAccidents.location
      return c 
    })
})

If the graphql query doesn't involve both lastOwner or lastAccidentLocation fields at same time, there will be unnecessary database query cost because the SQL query will make unnecessary table joins/selection.
If the car_trade_records and car_accident_records are huge table or with some complex data manipulation, the unnecessary cost will be significant.

Query examples:

# Example 1: No unnecessary cost

cars(ids: [1,2,3,4]) {
   id,
   brand,
   lastOwner,
   lastAccidentLocation
}

# Example 2: selecting unnecessary data from the table 'car_accident_record'
cars(ids: [1,2,3,4]) {
   id,
   brand,
   lastOwner
}

# Example 3: selecting unnecessary data from the table 'car_trade_record'
cars(ids: [1,2,3,4]) {
   id,
   brand,
   lastAccidentLocation
}

# Example 4: selecting unnecessary data from both two tables
cars(ids: [1,2,3,4]) {
   id,
   brand
}

Someone might think that we can 'join other tables' on demand (in the resolvers) like below. But there will be additional database queries. If 4 car records are selected, 8 (4 * 2=8) more sql queries will be made.

// using sails.js ORM to get data from database  
let resolvers = {
  Car: {
    latestOwner: async (car, args, context, info) => {
      let carRecord = await carLoader.load(car.id)
      return (await CarTradeRecord.find({
        where: {
          carId: car.id
        },
        sort: [{'logDatetime': 'desc'}],
        limit: 1
       }
      )).owner
    },
    lastAccidentLocation: async (car, args, context, info) => {
      let carRecord = await carLoader.load(car.id)
      return (await CarAccidentRecord.find({
        where: {
          carId: car.id
        },
        sort: [{'logDatetime': 'desc'}],
        limit: 1
      }
     )).location
    }
  },
  Query: {
    cars: async (car, args, context, info) => {
      return await carLoader.loadMany(args.ids)
    }
  }
}

const carLoader = new DataLoader(keys => {	
  return await Car.find({ where: {id: keys} })
})

The new feature I'd like to have

If dataloader can accept more infomation about what attributes will be selected, it can reduce unnecessary cost in querying data.

Proposed changes of DataLoader functions:

load(key, fields)

  • key: A key value to load.
  • fields: (optional) An array of field to load. Default value = []

loadMany(keys, fields)

  • keys: An array of key values to load.
  • fields: (optional) An array of field to load. Default value = []

new DataLoader(batchLoadFn [, options])

  • batchLoadFn: function(keys, fields)
    • keys: an array of keys
    • fields: an array of field that are being queried.

For load() and loadMany(), they will return a cached value only if both the key and fields matches those key and fields in batchLoadFn().

Example 1

// assume that cached value is never expired. 

const loader = new DataLoader((keys, fields)=> {
  console.log('keys:', keys)
  console.log('fields:', fields)
  return batchLoadImpl(keys, fields)
})

(async ()=> {
  console.log('No cache')
  let a1 = loader.load(1, ['field1', 'field2'])
  let a2 = loader.load(2, ['field2', 'field3'])
  await a1 && await a2

  console.log('All hit cache')
  let b1 = loader.load(1, ['field2'])
  let b2 = loader.load(1, ['field3'])
  let b3 = loader.load(2, ['field1'])
  let b4 = loader.load(2, ['field2'])
  let b5 = loader.load(2)
  // value can be taken from cached if the requested fields are the part of the cached fields 
  await b1 && await b2 && await b3 && await b4 && await b5

  console.log('Some hit cache')
  let c1 = loader.load(1, ['field1', 'field4'])
  let c2 = loader.load(2, ['field1', 'field2', 'field3'])
  //value with key 2 can be taken from cached
  await c1 && await c2

  console.log('Different cache even with same key')
  let d1 = loader.load(2, ['field1'])
  let d2 = loader.load(2, ['field1', 'field4'])
  console.log('d1 is from cache?', (await d1) === (await c2))  // get from cache?
  console.log('d2 is from cache?', (await d2) === (await c2))  // get from cache?
})()

Expected console outputs:

No cache
keys: [1, 2]
fields: ['field1', 'field2', 'field3']
All hit cache
Some hit cache
keys: [1]        
fields: ['field1', 'field4']
Different cache even with same key
keys: [2]
fields: ['field1, 'field4']
d1 is from cache? true
d2 is from cache? false

Example 2: demostrate the real case

// native sql approach
const carLoader = new DataLoader( (keys, fields) => {
  let subqueries = []

  if(fields.includes('latestOwner')){
    subqueries.push(`(SELECT owner FROM car_trade_records trade 
      WHERE trade.cardId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastOwner`)
  }

  if(fields.includes('lastAccidentLocation')){
    subqueries.push(`(SELECT location FROM car_accident_records acc 
      WHERE acc.carId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastAccidentLocation`)
  }
	
  return await mysql.query(`SELECT *, ${subqueries.join(',')} FROM car WHERE id IN ?`, keys)
	
})

let resolvers = {
  Car :{
    latestOwner: async (car, args, context, info) => {
      // info.fieldName: 'latestOwner'
      let carRecord = await carLoader.load(car.id, [info.fieldName])	
      return carRecord.lastOwner
    },
    lastAccidentLocation: async (car, args, context, info) => {
      // info.fieldName: 'lastAccidentLocation'
      let carRecord = await carLoader.load(car.id, [info.fieldName])
      return carRecord.lastAccidentLocation
    }
  },
  Query: {
    cars: async (car, args, context, info) => {
      //determine the necessary fields of the Car Entity in graphql query
      let involvedFields = info.fieldNodes.map(node => node.name.value)
      return await carLoader.loadMany(args.ids, involvedFields)
    }
  }
}

If it is implemented, no matter what fields are selected to Type Car, only one single sql query will be made.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions