Tag listing is slow and will not scale (N+1 problem) #3

Open
shevron opened this issue Jun 1, 2020 · 2 comments


shevron commented Jun 1, 2020

When listing tags, we call the GitHub API to fetch the list of git refs, and then iterate over them to get the git tag information (message, creation date, revision it points to) for each one. Each tag requires an additional GitHub API call. Essentially, this is a classic N+1 problem.

This will become very slow as soon as a dataset has more than a handful of tags.

Directions to solve:

  1. Maybe there is an API endpoint I am missing that could be used for this instead of what I have used. I didn't find any but perhaps there is a way.
  2. Lazy-load some of the information not on initial fetch but on subsequent data access. This can speed up some use cases but will not help with others.
  3. ???
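A minimal sketch of direction 2, assuming the per-tag details can be fetched by a callable taking the tag object SHA. `LazyTag`, `fetch_details`, and `fake_fetch` are hypothetical names used for illustration only; they are not part of the existing code.

```python
from functools import cached_property
from typing import Callable, Dict, List


class LazyTag:
    """Tag whose details are fetched on first access, not at listing time."""

    def __init__(self, name: str, sha: str, fetch_details: Callable[[str], Dict]):
        self.name = name  # known from the initial ref listing
        self.sha = sha    # SHA of the object the ref points to
        self._fetch_details = fetch_details

    @cached_property
    def details(self) -> Dict:
        # Runs once, on first access; the result is cached on the instance.
        return self._fetch_details(self.sha)

    @property
    def message(self) -> str:
        return self.details["message"]


calls: List[str] = []


def fake_fetch(sha: str) -> Dict:
    # Hypothetical stand-in for GET /repos/:owner/:repo/git/tags/:tag_sha
    calls.append(sha)
    return {"message": "Release " + sha[:4]}


tags = [LazyTag("v0.1", "aaaa1111", fake_fetch), LazyTag("v0.2", "bbbb2222", fake_fetch)]
names = [t.name for t in tags]    # listing names triggers no detail calls
first_message = tags[0].message   # triggers exactly one detail call
```

Listing only names costs nothing extra here, but any use case that reads the message of every tag still pays one call per tag, which is the limitation noted in item 2.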

Note that this doesn't even include an additional API call that might be needed for some use cases, to fetch the data package itself beyond the revision the tag points to.

GitHub API

There are 3 ways to get tags ...

Git Data API "References"

  • GET /repos/:owner/:repo/git/matching-refs/tags
  • For each result you look up the tag object ... GET /repos/:owner/:repo/git/tags/:tag_sha
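The two steps above can be sketched as follows, to make the call count explicit. This is an illustration, not the current implementation: `list_tags_rest` and the injected `fetch` callable are hypothetical, and the response field names (`ref`, `object.type`, `object.sha`, `message`) are assumed from the GitHub REST documentation for refs and tag objects.

```python
from typing import Any, Callable, Dict, List


def list_tags_rest(fetch: Callable[[str], Any], owner: str, repo: str) -> List[Dict]:
    """List tags with one call for the refs plus one call per tag object."""
    refs = fetch(f"/repos/{owner}/{repo}/git/matching-refs/tags")  # 1 call
    tags = []
    for ref in refs:
        name = ref["ref"][len("refs/tags/"):]
        obj = ref["object"]
        if obj["type"] == "tag":
            # One further call per tag object: the "N" in N+1
            tag = fetch(f"/repos/{owner}/{repo}/git/tags/{obj['sha']}")
            tags.append({"name": name, "message": tag["message"],
                         "revision": tag["object"]["sha"]})
        else:
            # Ref points straight at another object; no tag object to look up
            tags.append({"name": name, "message": None, "revision": obj["sha"]})
    return tags


requested: List[str] = []


def fake_fetch(path: str) -> Any:
    # Hypothetical stand-in for an HTTP GET returning parsed JSON
    requested.append(path)
    if path.endswith("/git/matching-refs/tags"):
        return [
            {"ref": "refs/tags/v0.1", "object": {"type": "tag", "sha": "t1"}},
            {"ref": "refs/tags/v0.2", "object": {"type": "tag", "sha": "t2"}},
        ]
    sha = path.rsplit("/", 1)[-1]
    return {"message": "release " + sha, "object": {"sha": "c-" + sha}}


result = list_tags_rest(fake_fetch, "octocat", "Hello-World")
```

With two tags the fake records three requests; with N tags it would be N+1.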

Repositories API "List tags"

  • GET /repos/:owner/:repo/tags (https://developer.github.com/v3/repos/#list-tags)
  • Example response:

[
  {
    "name": "v0.1",
    "commit": {
      "sha": "c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc",
      "url": "https://api.github.com/repos/octocat/Hello-World/commits/c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc"
    },
    "zipball_url": "https://github.com/octocat/Hello-World/zipball/v0.1",
    "tarball_url": "https://github.com/octocat/Hello-World/tarball/v0.1"
  }
]
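
As a quick check of what this response offers, the sketch below parses the sample above and maps each tag name to the commit SHA it resolves to. It only reads fields visible in the sample; the variable names are illustrative.

```python
import json

# Sample "List tags" response, copied from the example above
sample_response = """
[
  {
    "name": "v0.1",
    "commit": {
      "sha": "c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc",
      "url": "https://api.github.com/repos/octocat/Hello-World/commits/c5b97d5ae6c19d5c5df71a34c7fbeeda2479ccbc"
    },
    "zipball_url": "https://github.com/octocat/Hello-World/zipball/v0.1",
    "tarball_url": "https://github.com/octocat/Hello-World/tarball/v0.1"
  }
]
"""

parsed = json.loads(sample_response)

# Tag name -> commit SHA, available from a single request
revisions = {tag["name"]: tag["commit"]["sha"] for tag in parsed}

# Fields present on each entry; note there is no tag message or date here
available_fields = sorted(parsed[0].keys())
```

Judging by the sample, one call yields names and revisions, but the tag message and creation date would still need to come from elsewhere.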

shevron commented Jun 10, 2020

This GraphQL query seems to work, more or less, and provides results much faster than a bunch of REST calls:

query($repoName:String!, $repoOwner:String!) {
  repository(name: $repoName, owner: $repoOwner) {
    refs(refPrefix: "refs/tags/", last: 100) {
      nodes {
        name
        target {
          __typename
          ... on Tag {
            oid
            name
            tag_message: message
            tagger {
              email
              name
            }
            target {
              oid
            }
          }
          ... on Commit {
            commit_message: message
          }
        }
      }
    }
  }
}

But on some repositories I noticed that the objects pointed to by refs/tags/ are not Tag objects but Commit objects. That is very odd; perhaps it is related to how tags work in Git, maybe the difference between annotated and lightweight tags? I need to test whether this also happens with tags created via the API. If this inconsistency cannot be explained, I can't be sure I am using the API correctly. Will continue to investigate.
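
For what it is worth, in Git a lightweight tag is a ref pointing directly at a commit, while an annotated tag is a separate tag object, which would be consistent with seeing both types here. Either way, code consuming the result can branch on `__typename`. A minimal sketch, assuming the response mirrors the query above (including its `tag_message` and `commit_message` aliases); `normalize_tag_nodes` and the sample data are hypothetical:

```python
from typing import Dict, List, Optional


def normalize_tag_nodes(nodes: List[Dict]) -> List[Dict[str, Optional[str]]]:
    """Flatten `refs.nodes` from the query above into uniform records."""
    records = []
    for node in nodes:
        target = node["target"]
        kind = target["__typename"]
        if kind == "Tag":
            message = target["tag_message"]
            revision = target["target"]["oid"]
        elif kind == "Commit":
            # This is the commit's message, not a tag message. The query does
            # not select `oid` on Commit, so the revision is unknown here.
            message = target["commit_message"]
            revision = None
        else:
            message = None
            revision = None
        records.append({"name": node["name"], "target_type": kind,
                        "message": message, "revision": revision})
    return records


sample_nodes = [
    {"name": "v0.1", "target": {"__typename": "Tag", "oid": "t1", "name": "v0.1",
                                "tag_message": "First release",
                                "tagger": {"email": "a@example.com", "name": "A"},
                                "target": {"oid": "c1"}}},
    {"name": "v0.2", "target": {"__typename": "Commit",
                                "commit_message": "Fix typo"}},
]
records = normalize_tag_nodes(sample_nodes)
```

Adding `oid` to the `... on Commit` fragment would let the Commit branch report a revision as well.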


shevron commented Jun 14, 2020

I started implementing this in feature/tag-listing-using-graphql. I am pausing for now as I have higher priority tasks.

I reached a point where I need to somehow plug the GitHub API authentication token into the GraphQL API, and I am not sure how to do that. Will continue investigating once time allows.
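
One possible approach, independent of whichever client library the branch uses: GitHub's GraphQL API is a single POST endpoint (`https://api.github.com/graphql`) that accepts the token in an `Authorization: bearer ...` header. The sketch below only builds the request with the standard library and does not send it; `build_graphql_request` is a hypothetical helper, the token is a placeholder, and the query is a shortened form of the one in the earlier comment.

```python
import json
import urllib.request
from typing import Dict

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Shortened version of the tag listing query, for illustration
QUERY = """
query($repoName:String!, $repoOwner:String!) {
  repository(name: $repoName, owner: $repoOwner) {
    refs(refPrefix: "refs/tags/", last: 100) {
      nodes { name }
    }
  }
}
"""


def build_graphql_request(token: str, query: str, variables: Dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated GraphQL POST request."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_graphql_request(
    "<token>",  # placeholder; a real personal access token would go here
    QUERY,
    {"repoName": "Hello-World", "repoOwner": "octocat"},
)
# Sending it would be: urllib.request.urlopen(request)  (not executed here)
```

If the existing REST client already holds the token, passing that same value to a helper like this would avoid a second configuration path.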
