Tag listing is slow and will not scale (N+1 problem) #3

Open
@shevron

When listing tags, we call the GitHub API to fetch the list of git refs, then iterate over them to get the git tag information (message, creation date, the revision it points to) for each one. Each tag requires an additional GitHub API call. Essentially, this is a classic N+1 problem.

Listing will slow down quickly once a dataset has more than a handful of tags.

Directions to solve:

  1. Maybe there is an API endpoint I am missing that could be used instead of the ones I have used. I didn't find one, but perhaps there is a way.
  2. Lazy-load some of the information: fetch it not on the initial listing but on first data access. This can speed up some use cases but will not help with others.
  3. ???
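Direction 2 could be sketched roughly as follows. This is only an illustration, not the library's actual code: the `LazyTag` class and the `fetch_details` callable are hypothetical names, and the `message` and `tagger.date` keys assume the shape of a GitHub git tag object.

```python
from typing import Any, Callable, Dict, Optional


class LazyTag:
    """Tag whose details are fetched on first access (hypothetical class)."""

    def __init__(self, name: str, sha: str,
                 fetch_details: Callable[[str], Dict[str, Any]]) -> None:
        self.name = name
        self.sha = sha
        # Hypothetical callable: takes a tag SHA, returns the parsed tag object.
        self._fetch_details = fetch_details
        self._details: Optional[Dict[str, Any]] = None

    def _load(self) -> Dict[str, Any]:
        # Make the extra API call at most once, and only when needed.
        if self._details is None:
            self._details = self._fetch_details(self.sha)
        return self._details

    @property
    def message(self) -> str:
        return self._load()["message"]

    @property
    def created(self) -> str:
        # Assumes the tag object carries the date under tagger.date.
        return self._load()["tagger"]["date"]
```

With this shape, listing N tags costs one call, and the per-tag call happens only for tags whose message or date is actually read. A caller that reads every tag's details still pays N+1 calls, which is the "will not help with others" case.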

Note that this doesn't even include an additional API call that might be needed for some use cases, to fetch the data package itself beyond the revision the tag points to.

GitHub API

There are 3 ways to get tags ...

Git Data API "References"

  • GET /repos/:owner/:repo/git/matching-refs/tags
  • For each result, look up the tag object: GET /repos/:owner/:repo/git/tags/:tag_sha
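A minimal sketch of this two-step flow, to make the call count visible. The `fetch` callable is a hypothetical stand-in for an authenticated HTTP GET that returns parsed JSON, and the sketch assumes every ref points at an annotated tag object with `tag`, `message`, `tagger.date` and `object.sha` fields.

```python
from typing import Any, Callable, Dict, List

# Hypothetical helper type: takes an API path, returns the parsed JSON body.
Fetch = Callable[[str], Any]


def list_tags_n_plus_one(fetch: Fetch, owner: str, repo: str) -> List[Dict[str, Any]]:
    """List tags through the Git Data API: 1 call for refs, then 1 per tag."""
    # Call 1: all refs under refs/tags.
    refs = fetch(f"/repos/{owner}/{repo}/git/matching-refs/tags")
    tags = []
    for ref in refs:
        # Calls 2..N+1: one tag object lookup per ref.
        # Assumption: each ref is an annotated tag, not a lightweight one.
        tag = fetch(f"/repos/{owner}/{repo}/git/tags/{ref['object']['sha']}")
        tags.append({
            "name": tag["tag"],
            "message": tag["message"],
            "created": tag["tagger"]["date"],
            "revision": tag["object"]["sha"],
        })
    return tags
```

For a repository with N tags this issues N+1 requests, one after another, which is the cost described in the issue.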

Repos API "List tags"

  • GET /repos/:owner/:repo/tags
  • https://developer.github.com/v3/repos/#list-tags

[
  {
    "name": "v0.1",
    "commit": {
      "sha": "c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc",
      "url": "https://api.github.com/repos/octocat/Hello-World/commits/c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc"
    },
    "zipball_url": "https://github.com/octocat/Hello-World/zipball/v0.1",
    "tarball_url": "https://github.com/octocat/Hello-World/tarball/v0.1"
  }
]
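As a rough sketch of what this response gives us in a single call, the snippet below parses the sample above into a tag-to-revision mapping. The function name `revisions_by_tag` is hypothetical. Note that the sample carries the tag name and commit SHA but no message or creation date, so it would cover the revision lookup only.

```python
import json
from typing import Dict

# The sample response shown above, copied verbatim.
SAMPLE = """
[
  {
    "name": "v0.1",
    "commit": {
      "sha": "c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc",
      "url": "https://api.github.com/repos/octocat/Hello-World/commits/c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc"
    },
    "zipball_url": "https://github.com/octocat/Hello-World/zipball/v0.1",
    "tarball_url": "https://github.com/octocat/Hello-World/tarball/v0.1"
  }
]
"""


def revisions_by_tag(payload: str) -> Dict[str, str]:
    """Map each tag name to the commit SHA it points to (hypothetical helper)."""
    return {item["name"]: item["commit"]["sha"] for item in json.loads(payload)}


print(revisions_by_tag(SAMPLE))
```

If only the name and revision are needed up front, this could be combined with lazy loading of the message and date.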
