Hourly 401 exceptions, causing PR queue to drain #240

Open
yale opened this issue Apr 23, 2019 · 1 comment

Comments

@yale
Contributor

yale commented Apr 23, 2019

Hi there 👋

I'm running a private fork of this repo for internal use at my company. Things are running excellently - except, every hour, I notice a pattern:

[Screenshot: PR queue size (top) and GitHub API rate limit header (bottom)]

The top graph is the size of the PR queue. Notice that every hour, the size of the PR queue drops to 0, meaning the bot forgets about all the PRs it has been tracking up to that point. The bottom graph is the rate limit header coming back from the GitHub API. The spikes seem to indicate when the token is refreshed.

[Screenshot: exceptions reported in Sentry]

This graph shows the exceptions in Sentry, most of which are "Bad Credentials" 401 errors coming from GitHub's API. The number of occurrences each hour matches the size of the PR queue. (Someone else is dealing with this: #97)

When the PR processing fails, the PR drops off the queue and needs to be manually re-enqueued somehow. This leads to a pretty crummy experience where PRs are not auto-merged for a long while.
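
To make the failure mode concrete, here is a rough sketch of a queue that puts a PR back after a failed attempt instead of dropping it. This is not the bot's actual queue: `RetryingQueue`, `drainOnce`, `maxAttempts` and `retryDemo` are names I made up for illustration, and the 401 is simulated with a plain `Error`.

```typescript
// Hypothetical queue for illustration only; not the bot's real implementation.
interface QueuedPullRequest {
  id: number;
  attempts: number;
}

type Processor = (prId: number) => Promise<void>;

class RetryingQueue {
  private items: QueuedPullRequest[] = [];
  private processor: Processor;
  private maxAttempts: number;

  constructor(processor: Processor, maxAttempts: number = 3) {
    this.processor = processor;
    this.maxAttempts = maxAttempts;
  }

  enqueue(id: number): void {
    this.items.push({ id, attempts: 0 });
  }

  get size(): number {
    return this.items.length;
  }

  // Processes each queued PR once. A PR whose processing throws is put back
  // on the queue until it has failed maxAttempts times; only then is it dropped.
  async drainOnce(): Promise<number[]> {
    const batch = this.items;
    this.items = [];
    const dropped: number[] = [];
    for (const item of batch) {
      try {
        await this.processor(item.id);
      } catch {
        const attempts = item.attempts + 1;
        if (attempts < this.maxAttempts) {
          this.items.push({ id: item.id, attempts });
        } else {
          dropped.push(item.id);
        }
      }
    }
    return dropped;
  }
}

// Simulates one pass with a stale token (every PR fails with a 401)
// followed by one pass after the token has been refreshed.
async function retryDemo(): Promise<{ afterFirstPass: number; afterSecondPass: number }> {
  let tokenIsStale = true;
  const queue = new RetryingQueue(async () => {
    if (tokenIsStale) {
      throw new Error("401 Bad credentials"); // simulated API failure
    }
  });
  queue.enqueue(101);
  queue.enqueue(102);

  await queue.drainOnce();
  const afterFirstPass = queue.size;

  tokenIsStale = false;
  await queue.drainOnce();
  const afterSecondPass = queue.size;

  return { afterFirstPass, afterSecondPass };
}
```

With something like this, a pass that fails because of bad credentials would leave the PRs queued for the next pass rather than emptying the queue. It only hides the symptom though; the failing requests still need a valid token.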

I found this issue in the Probot project, which describes the issue I'm facing: probot/probot#637

Could it be that the enqueued PRs are stuck with their stale context, even after the installation token has been refreshed?

@bobvanderlinden
Owner

Thanks a lot for these statistics. I have not been able to find the cause of the 401s or the hourly pattern using sentry.io, but your graphs clearly show the pattern.

You are probably right about auto-merge using a stale context. It seems Probot only refreshes the token when it receives an event, not when it makes API requests.
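
For illustration, the general idea of checking the token at request time could look roughly like the sketch below. This is not the code from the PR and it does not use Probot's real API: `RefreshingTokenProvider`, `TokenFetcher`, `safetyMarginMs` and `demo` are hypothetical names, and the one-hour lifetime in the demo reflects the fact that GitHub installation tokens expire after an hour.

```typescript
// Hypothetical shapes for illustration; these are not Probot's real types.
interface InstallationToken {
  token: string;
  expiresAt: number; // epoch milliseconds
}

type TokenFetcher = () => Promise<InstallationToken>;

class RefreshingTokenProvider {
  private cached: InstallationToken | undefined = undefined;
  private fetchToken: TokenFetcher;
  private now: () => number;
  private safetyMarginMs: number;

  constructor(
    fetchToken: TokenFetcher,
    now: () => number = () => Date.now(),
    safetyMarginMs: number = 60_000
  ) {
    this.fetchToken = fetchToken;
    this.now = now;
    // Refresh slightly early so a request never starts with a token
    // that is about to expire.
    this.safetyMarginMs = safetyMarginMs;
  }

  // Meant to be called before every API request, not only when an event arrives.
  async getToken(): Promise<string> {
    const current = this.cached;
    if (current === undefined || this.now() >= current.expiresAt - this.safetyMarginMs) {
      const fresh = await this.fetchToken();
      this.cached = fresh;
      return fresh.token;
    }
    return current.token;
  }
}

// Walks a fake clock forward to show when a new token is requested.
async function demo(): Promise<string[]> {
  let clock = 0;
  let minted = 0;
  const oneHourMs = 3_600_000;
  const provider = new RefreshingTokenProvider(
    async () => {
      minted += 1;
      return { token: `token-${minted}`, expiresAt: clock + oneHourMs };
    },
    () => clock
  );

  const seen: string[] = [];
  seen.push(await provider.getToken()); // nothing cached yet, so a token is fetched
  clock += 30 * 60_000; // 30 minutes in: cached token is still valid
  seen.push(await provider.getToken());
  clock += 30 * 60_000; // 60 minutes in: cached token has expired
  seen.push(await provider.getToken());
  return seen;
}
```

A queued PR that asks the provider for a token each time it is processed would never hold on to credentials from the moment it was enqueued, which is the stale-context situation described above.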

I've made a PR here that should resolve this issue: #246
Since I currently lack statistics, could you give it a go?

Apart from this issue, do you know of any free online service where I can manage such statistics/graphs? I've looked at Datadog, but they do not supply any free plans for custom metrics.
