Description
What problem am I trying to solve?
Dataloader currently can solve the problem of collection of data leads to multiple database access.
But batchLoadFn
may query unnecessary data from database in order to suit for all kind of resolvers.
Here is my example:
There is a Graph Type called Car
and it represents my database table cars
.
sql database
create table cars(
id INT,
brand string
)
create table car_trade_records(
id INT,
carId INT,
owner varchar(255),
logDatetime timestamp
)
create table car_accident_records(
id INT,
carId INT,
location varchar(255),
logDatetime timestamp
)
car.graphql
type Car {
id: Int,
brand: String,
lastOwner: String,
lastAccidentLocation: String
}
type Query {
cars(ids: [Int]): [Car]
}
resolvers.js
let resolvers = {
Car :{
latestOwner: async (car, args, context, info) => {
let carRecord = await carLoader.load(car.id)
return carRecord.lastOwner
},
lastAccidentLocation: async (car, args, context, info) => {
let carRecord = await carLoader.load(car.id)
return carRecord.lastAccidentLocation
}
},
Query: {
cars: async (car, args, context, info) => {
return await carLoader.loadMany(args.ids)
}
}
}
the dataloader
// native sql approach
const carLoader = new DataLoader(keys => {
return await mysql.query(`SELECT *,
(SELECT owner FROM car_trade_records trade
WHERE trade.cardId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastOwner ,
(SELECT location FROM car_accident_records acc
WHERE acc.carId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastAccidentLocation
FROM car WHERE id IN ?`, keys)
})
// ORM approach. It is sails.js example. ORM usually can get the related table's data easily.
const carLoader = new DataLoader(keys => {
return (await Car.find({where:{id: keys}})
.populate('carOwners', {limit: 1})
.populate('carAccidents', {limit: 1})
).map( c => {
c.lastOwner = c.carOwners.owner
c.lastAccidentLocation = c.carAccidents.location
return c
})
})
If the graphql query doesn't involve both lastOwner
or lastAccidentLocation
fields at same time, there will be unnecessary database query cost because the SQL query will make unnecessary table joins/selection.
If the car_trade_records
and car_accident_records
are huge table or with some complex data manipulation, the unnecessary cost will be significant.
Query examples:
# Example 1: No unnecessary cost
cars(ids: [1,2,3,4]) {
id,
brand,
lastOwner,
lastAccidentLocation
}
# Example 2: selecting unnecessary data from the table 'car_accident_record'
cars(ids: [1,2,3,4]) {
id,
brand,
lastOwner
}
# Example 3: selecting unnecessary data from the table 'car_trade_record'
cars(ids: [1,2,3,4]) {
id,
brand,
lastAccidentLocation
}
# Example 4: selecting unnecessary data from both two tables
cars(ids: [1,2,3,4]) {
id,
brand
}
Someone might think that we can 'join other tables' on demand (in the resolvers) like below. But there will be additional database queries. If 4 car records are selected, 8 (4 * 2=8) more sql queries will be made.
// using sails.js ORM to get data from database
let resolvers = {
Car: {
latestOwner: async (car, args, context, info) => {
let carRecord = await carLoader.load(car.id)
return (await CarTradeRecord.find({
where: {
carId: car.id
},
sort: [{'logDatetime': 'desc'}],
limit: 1
}
)).owner
},
lastAccidentLocation: async (car, args, context, info) => {
let carRecord = await carLoader.load(car.id)
return (await CarAccidentRecord.find({
where: {
carId: car.id
},
sort: [{'logDatetime': 'desc'}],
limit: 1
}
)).location
}
},
Query: {
cars: async (car, args, context, info) => {
return await carLoader.loadMany(args.ids)
}
}
}
const carLoader = new DataLoader(keys => {
return await Car.find({ where: {id: keys} })
})
The new feature I'd like to have
If dataloader can accept more infomation about what attributes will be selected, it can reduce unnecessary cost in querying data.
Proposed changes of DataLoader functions:
load(key, fields)
key
: A key value to load.fields
: (optional) An array of field to load. Default value = []
loadMany(keys, fields)
keys
: An array of key values to load.fields
: (optional) An array of field to load. Default value = []
new DataLoader(batchLoadFn [, options])
batchLoadFn
: function(keys, fields)keys
: an array of keysfields
: an array of field that are being queried.
For load()
and loadMany()
, they will return a cached value only if both the key
and fields
matches those key and fields in batchLoadFn()
.
Example 1
// assume that cached value is never expired.
const loader = new DataLoader((keys, fields)=> {
console.log('keys:', keys)
console.log('fields:', fields)
return batchLoadImpl(keys, fields)
})
(async ()=> {
console.log('No cache')
let a1 = loader.load(1, ['field1', 'field2'])
let a2 = loader.load(2, ['field2', 'field3'])
await a1 && await a2
console.log('All hit cache')
let b1 = loader.load(1, ['field2'])
let b2 = loader.load(1, ['field3'])
let b3 = loader.load(2, ['field1'])
let b4 = loader.load(2, ['field2'])
let b5 = loader.load(2)
// value can be taken from cached if the requested fields are the part of the cached fields
await b1 && await b2 && await b3 && await b4 && await b5
console.log('Some hit cache')
let c1 = loader.load(1, ['field1', 'field4'])
let c2 = loader.load(2, ['field1', 'field2', 'field3'])
//value with key 2 can be taken from cached
await c1 && await c2
console.log('Different cache even with same key')
let d1 = loader.load(2, ['field1'])
let d2 = loader.load(2, ['field1', 'field4'])
console.log('d1 is from cache?', (await d1) === (await c2)) // get from cache?
console.log('d2 is from cache?', (await d2) === (await c2)) // get from cache?
})()
Expected console outputs:
No cache
keys: [1, 2]
fields: ['field1', 'field2', 'field3']
All hit cache
Some hit cache
keys: [1]
fields: ['field1', 'field4']
Different cache even with same key
keys: [2]
fields: ['field1, 'field4']
d1 is from cache? true
d2 is from cache? false
Example 2: demostrate the real case
// native sql approach
const carLoader = new DataLoader( (keys, fields) => {
let subqueries = []
if(fields.includes('latestOwner')){
subqueries.push(`(SELECT owner FROM car_trade_records trade
WHERE trade.cardId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastOwner`)
}
if(fields.includes('lastAccidentLocation')){
subqueries.push(`(SELECT location FROM car_accident_records acc
WHERE acc.carId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastAccidentLocation`)
}
return await mysql.query(`SELECT *, ${subqueries.join(',')} FROM car WHERE id IN ?`, keys)
})
let resolvers = {
Car :{
latestOwner: async (car, args, context, info) => {
// info.fieldName: 'latestOwner'
let carRecord = await carLoader.load(car.id, [info.fieldName])
return carRecord.lastOwner
},
lastAccidentLocation: async (car, args, context, info) => {
// info.fieldName: 'lastAccidentLocation'
let carRecord = await carLoader.load(car.id, [info.fieldName])
return carRecord.lastAccidentLocation
}
},
Query: {
cars: async (car, args, context, info) => {
//determine the necessary fields of the Car Entity in graphql query
let involvedFields = info.fieldNodes.map(node => node.name.value)
return await carLoader.loadMany(args.ids, involvedFields)
}
}
}
If it is implemented, no matter what fields are selected to Type Car
, only one single sql query will be made.