diff --git a/cspell.yml b/cspell.yml index fb16fb4497..733048d696 100644 --- a/cspell.yml +++ b/cspell.yml @@ -25,6 +25,7 @@ overrides: - swcrc - noreferrer - xlink + - deduplication ignoreRegExpList: - u\{[0-9a-f]{1,8}\} diff --git a/website/pages/docs/_meta.ts b/website/pages/docs/_meta.ts index 36da875caa..b766ad02f7 100644 --- a/website/pages/docs/_meta.ts +++ b/website/pages/docs/_meta.ts @@ -19,6 +19,7 @@ const meta = { 'constructing-types': '', 'oneof-input-objects': '', 'defer-stream': '', + 'n1-dataloader': '', 'resolver-anatomy': '', 'graphql-errors': '', '-- 3': { diff --git a/website/pages/docs/n1-dataloader.mdx b/website/pages/docs/n1-dataloader.mdx new file mode 100644 index 0000000000..7a5680a0f5 --- /dev/null +++ b/website/pages/docs/n1-dataloader.mdx @@ -0,0 +1,148 @@ +--- +title: Solving the N+1 Problem with `DataLoader` +--- + +When building a server with GraphQL.js, it's common to encounter +performance issues related to the N+1 problem: a pattern that +results in many unnecessary database or service calls, +especially in nested query structures. + +This guide explains what the N+1 problem is, why it's relevant in +GraphQL field resolution, and how to address it using +[`DataLoader`](https://github.com/graphql/dataloader). + +## What is the N+1 problem? + +The N+1 problem happens when your API fetches a list of items using one +query, and then issues an additional query for each item in the list. +In GraphQL, this usually occurs in nested field resolvers. + +For example, in the following query: + +```graphql +{ + posts { + id + title + author { + name + } + } +} +``` + +If the `posts` field returns 10 items, and each `author` field fetches +the author by ID with a separate database call, the server performs +11 total queries: one to fetch the posts, and one for each post's author +(10 total authors). As the number of parent items increases, the number +of database calls grows, which can degrade performance. + +Even if several posts share the same author, the server will still issue +duplicate queries unless you implement deduplication or batching manually. + +## Why this happens in GraphQL.js + +In GraphQL.js, each field resolver runs independently. There's no built-in +coordination between resolvers, and no automatic batching. This makes field +resolvers composable and predictable, but it also creates the N+1 problem. +Nested resolutions, such as fetching an author for each post in the previous +example, will each call their own data-fetching logic, even if those calls +could be grouped. + +## Solving the problem with `DataLoader` + +[`DataLoader`](https://github.com/graphql/dataloader) is a utility library designed +to solve this problem. It batches multiple `.load(key)` calls into a single `batchLoadFn(keys)` +call and caches results during the life of a request. This means you can reduce redundant data +fetches and group related lookups into efficient operations. + +To use `DataLoader` in a `graphql-js` server: + +1. Create `DataLoader` instances for each request. +2. Attach the instance to the `contextValue` passed to GraphQL execution. You can attach the +loader when calling [`graphql()`](https://graphql.org/graphql-js/graphql/#graphql) directly, or +when setting up a GraphQL HTTP server such as [express-graphql](https://github.com/graphql/express-graphql). +3. Use `.load(id)` in resolvers to fetch data through the loader. + +### Example: Batching author lookups + +Suppose each `Post` has an `authorId`, and you have a `getUsersByIds(ids)` +function that can fetch multiple users in a single call: + +```js +import { + graphql, + GraphQLObjectType, + GraphQLSchema, + GraphQLString, + GraphQLList, + GraphQLID +} from 'graphql'; +import DataLoader from 'dataloader'; +import { getPosts, getUsersByIds } from './db.js'; + +const UserType = new GraphQLObjectType({ + name: 'User', + fields: () => ({ + id: { type: GraphQLID }, + name: { type: GraphQLString }, + }), +}); + +const PostType = new GraphQLObjectType({ + name: 'Post', + fields: () => ({ + id: { type: GraphQLID }, + title: { type: GraphQLString }, + author: { + type: UserType, + resolve(post, args, context) { + return context.userLoader.load(post.authorId); + }, + }, + }), +}); + +const QueryType = new GraphQLObjectType({ + name: 'Query', + fields: () => ({ + posts: { + type: GraphQLList(PostType), + resolve: () => getPosts(), + }, + }), +}); + +const schema = new GraphQLSchema({ query: QueryType }); + +function createContext() { + return { + userLoader: new DataLoader(async (userIds) => { + const users = await getUsersByIds(userIds); + return userIds.map(id => users.find(user => user.id === id)); + }), + }; +} +``` + +With this setup, all `.load(authorId)` calls are automatically collected and batched +into a single call to `getUsersByIds`. `DataLoader` also caches results for the duration +of the request, so repeated `.load(id)` calls for the same ID don't trigger +additional fetches. + +## Best practices + +- Create a new `DataLoader` instance per request. This ensures that caching is scoped +correctly and avoids leaking data between users. +- Always return results in the same order as the input keys. This is required by the +`DataLoader` contract. If a key is not found, return `null` or throw depending on +your policy. +- Keep batch functions focused. Each loader should handle a specific data access pattern. +- Use `.loadMany()` sparingly. While it's useful when you already have a list of IDs, it's +typically not needed in field resolvers, since `.load()` already batches individual calls +made within the same execution cycle. + +## Additional resources + +- [`DataLoader` GitHub repository](https://github.com/graphql/dataloader): Includes full API docs and usage examples +- [GraphQL field resolvers](https://graphql.org/graphql-js/resolvers/): Background on how field resolution works.