User not found #7

Closed
yangavin opened this issue Mar 28, 2024 · 9 comments
Comments

@yangavin

yangavin commented Mar 28, 2024

A regular request to get a user's contributions stopped working and now returns this response:

{
"error": "User \"grubersjoe\" not found."
}

@grubersjoe
Owner

Hey. Thanks for reporting this! GitHub seems to have changed how the contribution graph is rendered: client-side JavaScript now appears to be required. You can test this by disabling JavaScript in your browser (the chart never shows up and you only see a spinner forever). Consequently, the contribution data can no longer be scraped by simply parsing the HTML source.

Unfortunately, this makes things a lot harder, and I have to think about it. There is probably no way around something like Puppeteer, which sounds quite heavy.

@grubersjoe
Owner

grubersjoe commented Mar 29, 2024

I think I've found a way around Puppeteer so that DOM parsing can still be used. There's an endpoint that prerenders only part of the calendar, but that should be all we need:

https://github.com/users/grubersjoe/contributions
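A minimal sketch of what DOM parsing against such a prerendered fragment could look like. It assumes each day cell carries `data-date` and `data-level` attributes; those attribute names, the sample markup, and the helper names are assumptions made for illustration, not taken from the project or confirmed in this thread. Fetching the page is left out so the example runs offline.

```python
from html.parser import HTMLParser


class ContributionCellParser(HTMLParser):
    """Collects every element that carries a data-date attribute (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.cells = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "data-date" in attributes:
            self.cells.append({
                "date": attributes["data-date"],
                # data-level is assumed to be an intensity bucket; default to 0
                "level": int(attributes.get("data-level") or 0),
            })


def parse_contribution_cells(html):
    """Return the day cells found in the HTML, sorted by ISO date string."""
    parser = ContributionCellParser()
    parser.feed(html)
    parser.close()
    return sorted(parser.cells, key=lambda cell: cell["date"])


# Hypothetical fragment standing in for the endpoint's response body.
SAMPLE_HTML = """
<table>
  <tr>
    <td data-date="2024-03-29" data-level="2"></td>
    <td data-date="2024-03-28" data-level="0"></td>
  </tr>
</table>
"""

cells = parse_contribution_cells(SAMPLE_HTML)
```

Sorting by date matters here because a calendar table is typically laid out by weekday rows, so cells do not necessarily appear in chronological order in the markup.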

@adriangalilea

adriangalilea commented Mar 29, 2024

Hey @grubersjoe, why not just use GraphQL?

source

EDIT: managed to implement a custom version of your component github-activity-calendar (dunno how Storybook works so I just copy-pasted all of src 😆) and then hooked it up to the GraphQL endpoint, then flattened the data. Works flawlessly.
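For readers wondering what "flattened the data" might involve, here is a rough sketch. The query uses the `contributionsCollection` and `contributionCalendar` fields of GitHub's GraphQL schema; the function name, the output shape, and the sample response are hypothetical and are not taken from the implementation mentioned above. Sending the query (which needs an authenticated POST to the GraphQL endpoint) is omitted.

```python
# Query for the calendar of a single user, grouped by week.
CONTRIBUTIONS_QUERY = """
query($login: String!) {
  user(login: $login) {
    contributionsCollection {
      contributionCalendar {
        totalContributions
        weeks {
          contributionDays {
            date
            contributionCount
          }
        }
      }
    }
  }
}
"""


def flatten_calendar(response):
    """Turn the nested weeks -> days structure into one flat list of days."""
    calendar = response["data"]["user"]["contributionsCollection"]["contributionCalendar"]
    return [
        {"date": day["date"], "count": day["contributionCount"]}
        for week in calendar["weeks"]
        for day in week["contributionDays"]
    ]


# Made-up response in the shape the query above would return.
SAMPLE_RESPONSE = {
    "data": {"user": {"contributionsCollection": {"contributionCalendar": {
        "totalContributions": 4,
        "weeks": [
            {"contributionDays": [
                {"date": "2024-03-27", "contributionCount": 1},
                {"date": "2024-03-28", "contributionCount": 0},
            ]},
            {"contributionDays": [
                {"date": "2024-03-29", "contributionCount": 3},
            ]},
        ],
    }}}}
}

days = flatten_calendar(SAMPLE_RESPONSE)
```

A flat list of `{date, count}` entries is a convenient shape for a calendar component, since it no longer needs to know about the week grouping.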

@grubersjoe
Owner

Oh wow, there's an API now? I'm pretty sure there wasn't when I started the project.

In that case, of course, this is the way to go. I'll look into it later. Thanks for the hint!

@adriangalilea

> Oh wow, there's an API now? I'm pretty sure there wasn't when I started the project.

I could not find the documentation anywhere, just that post.

You can take a look at my crude implementation here

Glad that I helped ;)

@yangavin
Author

Hey @grubersjoe,

It's really unfortunate that GitHub changed their rendering strategy. I was actually using GitHub's GraphQL API, and I had two major issues with it that ultimately led me to use your API:

  1. Rate limits
    GitHub's API has a rate limit that I found to be too low for fetching large amounts of data, such as contributions:
    https://docs.github.com/en/graphql/overview/rate-limits-and-node-limits-for-the-graphql-api
    I could be querying it wrong, but I ran into the limit even for personal use, so I can imagine this being an issue for a public-facing API.

  2. Only showing previous year
    Currently there does not seem to be a way to fetch all total contributions. You can only fetch total contributions for the past year (similar to what's shown on your GitHub profile on initial load). There isn't even an option to fetch total contributions within a date range. More about this here:
    Repositories contributed to is only the last year anuraghazra/github-readme-stats#2269

I resorted to using your API because your web-scraping strategy bypasses these two issues. These are just some things to consider as you're working on a fix, if you're using GraphQL. While something like Puppeteer may be heavy, it could bring in a tremendous amount of value that GraphQL does not currently support.
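To make the rate limit concern from point 1 concrete: GitHub's GraphQL API exposes a `rateLimit` object (with fields such as `cost`, `remaining`, and `resetAt`) that can be requested alongside any query. The sketch below shows one way a client could decide how long to pause before the next request. The function name and the sample values are hypothetical, and this is only one possible policy, not something the project is known to do.

```python
from datetime import datetime, timezone


def seconds_to_wait(rate_limit, now):
    """Return how long to pause before a query of the same cost can be sent.

    rate_limit is the dict returned for a `rateLimit { cost remaining resetAt }`
    selection; now must be a timezone-aware datetime.
    """
    if rate_limit["remaining"] >= rate_limit["cost"]:
        return 0.0  # enough budget left, no need to wait
    # resetAt is an ISO 8601 timestamp ending in "Z"; older Python versions
    # cannot parse the "Z" suffix directly, so normalise it first.
    reset_at = datetime.fromisoformat(rate_limit["resetAt"].replace("Z", "+00:00"))
    return max(0.0, (reset_at - now).total_seconds())


# Made-up values for illustration.
now = datetime(2024, 3, 29, 11, 59, 0, tzinfo=timezone.utc)
exhausted = {"cost": 1, "remaining": 0, "resetAt": "2024-03-29T12:00:00Z"}
healthy = {"cost": 1, "remaining": 4999, "resetAt": "2024-03-29T12:00:00Z"}

wait_when_exhausted = seconds_to_wait(exhausted, now)
wait_when_healthy = seconds_to_wait(healthy, now)
```

Waiting only helps a single user. For a public-facing API that shares one token across all callers, the budget would still be spent quickly, which is the concern raised above.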

Hope this helps!

@d3or
Contributor

d3or commented Mar 29, 2024

Hi. PR here that should fix this: #8

@grubersjoe
Owner

grubersjoe commented Mar 29, 2024

Deployed latest version, everything should work again. Thanks everyone ❤️!

@grubersjoe
Owner

@yangavin I think you made two very good points 👍. The rate limit would probably be reached pretty fast. Scraping is fragile but it solves these two issues.

4 participants