[REQUEST] dataloader accept information of 'selected fields'

## What problem am I trying to solve?
Dataloader currently can solve the problem of collection of data leads to multiple database access.
But `batchLoadFn` may query unnecessary data from database in order to suit for all kind of resolvers.


 Here is my example:
There is a Graph Type called `Car` and it represents my database table `cars`.

### sql database
```sql
create table cars(
 id INT,
 brand string
)

create table car_trade_records(
 id INT,
 carId INT,
 owner varchar(255),
 logDatetime timestamp
 )

create table car_accident_records(
 id INT,
 carId INT,
 location varchar(255),
 logDatetime timestamp
)
```

### car.graphql
```graphql
type Car {
  id: Int,
  brand: String,
  lastOwner: String,
  lastAccidentLocation: String
}

type Query {
   cars(ids: [Int]): [Car]
}
```

### resolvers.js
```javascript
let resolvers = {
  Car :{
    latestOwner: async (car, args, context, info) => {
      let carRecord = await carLoader.load(car.id)
      return carRecord.lastOwner
    },
    lastAccidentLocation: async (car, args, context, info) => {
      let carRecord = await carLoader.load(car.id)
      return carRecord.lastAccidentLocation
    }
  },
  Query: {
    cars: async (car, args, context, info) => {
      return await carLoader.loadMany(args.ids)
    }
  }
}
```

### the dataloader
```javascript
// native sql approach
const carLoader = new DataLoader(keys => {
  return await mysql.query(`SELECT *, 
    (SELECT owner FROM car_trade_records trade 
      WHERE trade.cardId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastOwner ,
    (SELECT location FROM car_accident_records acc 
      WHERE acc.carId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastAccidentLocation
    FROM car WHERE id IN ?`, keys)
})

// ORM approach. It is sails.js example. ORM usually can get the related table's data easily.
const carLoader = new DataLoader(keys => {
  return (await Car.find({where:{id: keys}})
      .populate('carOwners', {limit: 1})
      .populate('carAccidents', {limit: 1}) 
    ).map( c => {
      c.lastOwner = c.carOwners.owner
      c.lastAccidentLocation = c.carAccidents.location
      return c 
    })
})
```

If the graphql query doesn't involve both `lastOwner` or `lastAccidentLocation` fields at same time, there will be unnecessary database query cost because the SQL query will make unnecessary table joins/selection.
If the `car_trade_records` and `car_accident_records` are huge table or with some complex data manipulation, the unnecessary cost will be significant.

Query examples:
```
# Example 1: No unnecessary cost

cars(ids: [1,2,3,4]) {
   id,
   brand,
   lastOwner,
   lastAccidentLocation
}

# Example 2: selecting unnecessary data from the table 'car_accident_record'
cars(ids: [1,2,3,4]) {
   id,
   brand,
   lastOwner
}

# Example 3: selecting unnecessary data from the table 'car_trade_record'
cars(ids: [1,2,3,4]) {
   id,
   brand,
   lastAccidentLocation
}

# Example 4: selecting unnecessary data from both two tables
cars(ids: [1,2,3,4]) {
   id,
   brand
}
```

Someone might think that we can 'join other tables' on demand (in the resolvers) like below. But there will be additional database queries. If 4 car records are selected, 8 (4 * 2=8) more sql queries will be made.

```javascript
// using sails.js ORM to get data from database  
let resolvers = {
  Car: {
    latestOwner: async (car, args, context, info) => {
      let carRecord = await carLoader.load(car.id)
      return (await CarTradeRecord.find({
        where: {
          carId: car.id
        },
        sort: [{'logDatetime': 'desc'}],
        limit: 1
       }
      )).owner
    },
    lastAccidentLocation: async (car, args, context, info) => {
      let carRecord = await carLoader.load(car.id)
      return (await CarAccidentRecord.find({
        where: {
          carId: car.id
        },
        sort: [{'logDatetime': 'desc'}],
        limit: 1
      }
     )).location
    }
  },
  Query: {
    cars: async (car, args, context, info) => {
      return await carLoader.loadMany(args.ids)
    }
  }
}

const carLoader = new DataLoader(keys => {	
  return await Car.find({ where: {id: keys} })
})
```

## The new feature I'd like to have
If dataloader can accept more infomation about what attributes will be selected, it can reduce unnecessary cost in querying data.

Proposed changes of DataLoader functions:

`load(key, fields)`
- `key`: A key value to load.
- `fields`: (optional) An array of field to load. Default value = []

`loadMany(keys, fields)`
- `keys`: An array of key values to load.
- `fields`: (optional) An array of field to load. Default value = []

`new DataLoader(batchLoadFn [, options])`
- `batchLoadFn`: function(keys, fields)
  - `keys`: an array of keys
  - `fields`: an array of field that are being queried. 

For `load()` and `loadMany()`, they will return a cached value only if both the `key` and `fields` matches those key and fields in `batchLoadFn()`.


Example 1
```javascript
// assume that cached value is never expired. 

const loader = new DataLoader((keys, fields)=> {
  console.log('keys:', keys)
  console.log('fields:', fields)
  return batchLoadImpl(keys, fields)
})

(async ()=> {
  console.log('No cache')
  let a1 = loader.load(1, ['field1', 'field2'])
  let a2 = loader.load(2, ['field2', 'field3'])
  await a1 && await a2

  console.log('All hit cache')
  let b1 = loader.load(1, ['field2'])
  let b2 = loader.load(1, ['field3'])
  let b3 = loader.load(2, ['field1'])
  let b4 = loader.load(2, ['field2'])
  let b5 = loader.load(2)
  // value can be taken from cached if the requested fields are the part of the cached fields 
  await b1 && await b2 && await b3 && await b4 && await b5

  console.log('Some hit cache')
  let c1 = loader.load(1, ['field1', 'field4'])
  let c2 = loader.load(2, ['field1', 'field2', 'field3'])
  //value with key 2 can be taken from cached
  await c1 && await c2

  console.log('Different cache even with same key')
  let d1 = loader.load(2, ['field1'])
  let d2 = loader.load(2, ['field1', 'field4'])
  console.log('d1 is from cache?', (await d1) === (await c2))  // get from cache?
  console.log('d2 is from cache?', (await d2) === (await c2))  // get from cache?
})()
```
Expected console outputs:
```
No cache
keys: [1, 2]
fields: ['field1', 'field2', 'field3']
All hit cache
Some hit cache
keys: [1]        
fields: ['field1', 'field4']
Different cache even with same key
keys: [2]
fields: ['field1, 'field4']
d1 is from cache? true
d2 is from cache? false
```

Example 2: demostrate the real case
```javascript
// native sql approach
const carLoader = new DataLoader( (keys, fields) => {
  let subqueries = []

  if(fields.includes('latestOwner')){
    subqueries.push(`(SELECT owner FROM car_trade_records trade 
      WHERE trade.cardId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastOwner`)
  }

  if(fields.includes('lastAccidentLocation')){
    subqueries.push(`(SELECT location FROM car_accident_records acc 
      WHERE acc.carId = car.id ORDER BY logDatetime DESC LIMIT 1) AS lastAccidentLocation`)
  }
	
  return await mysql.query(`SELECT *, ${subqueries.join(',')} FROM car WHERE id IN ?`, keys)
	
})

let resolvers = {
  Car :{
    latestOwner: async (car, args, context, info) => {
      // info.fieldName: 'latestOwner'
      let carRecord = await carLoader.load(car.id, [info.fieldName])	
      return carRecord.lastOwner
    },
    lastAccidentLocation: async (car, args, context, info) => {
      // info.fieldName: 'lastAccidentLocation'
      let carRecord = await carLoader.load(car.id, [info.fieldName])
      return carRecord.lastAccidentLocation
    }
  },
  Query: {
    cars: async (car, args, context, info) => {
      //determine the necessary fields of the Car Entity in graphql query
      let involvedFields = info.fieldNodes.map(node => node.name.value)
      return await carLoader.loadMany(args.ids, involvedFields)
    }
  }
}
```
If it is implemented, no matter what fields are selected to Type `Car`, only one single sql query will be made.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[REQUEST] dataloader accept information of 'selected fields' #236

What problem am I trying to solve?

sql database

car.graphql

resolvers.js

the dataloader

The new feature I'd like to have

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

[REQUEST] dataloader accept information of 'selected fields' #236

Description

What problem am I trying to solve?

sql database

car.graphql

resolvers.js

the dataloader

The new feature I'd like to have

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions