Hourly 401 exceptions, causing PR queue to drain #240

yale · 2019-04-23T18:59:54Z

Hi there 👋

I'm running a private fork of this repo for internal use at my company. Things are running excellently, except that every hour I notice a pattern:

The top graph is the size of the PR queue. Notice that every hour, the size of the PR queue drops to 0, meaning the bot forgets about all the PRs it has been tracking up to that point. The bottom graph is of the rate limit header coming back from the GitHub API. The spikes seem to indicate when the token is refreshed.

This graph shows the exceptions in Sentry, most of which are "Bad Credentials" 401 responses from GitHub's API. The number of occurrences each hour matches the size of the PR queue. (Someone else is dealing with this: #97)

When the PR processing fails, the PR drops off the queue and needs to be manually re-enqueued somehow. This leads to a pretty crummy experience where PRs are not getting auto merged for a long while.

I found this issue in the Probot project, which describes the issue I'm facing: probot/probot#637

Could it be that the enqueued PRs are stuck with their stale context, even after the installation token has been refreshed?

bobvanderlinden · 2019-04-28T16:27:57Z

Thanks a lot for these statistics. I have not been able to find the cause of the 401s or the hourly pattern using sentry.io, but your graphs clearly show the pattern.

You are probably right about auto-merge using a stale context. It seems probot only handles token refreshes upon receiving events and not when doing API requests.
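
To illustrate the suspected failure mode, here is a minimal sketch. It is not the code from this repo or from Probot: `TokenSource`, `StaleJob`, `FreshJob`, and `apiStatus` are hypothetical names, and the API is simulated. The only outside fact it relies on is that GitHub App installation tokens expire after one hour. A job that captured its token when it was enqueued gets a 401 once the hour has passed, while a job that looks up the token at processing time does not.

```typescript
// Hypothetical sketch: none of these names come from Probot or this repo.
interface Token {
  value: string;
  expiresAt: number; // milliseconds since an arbitrary epoch
}

// GitHub App installation tokens are valid for one hour.
const ONE_HOUR_MS = 60 * 60 * 1000;

// Stand-in for whatever issues installation tokens.
class TokenSource {
  private current: Token | null = null;
  private issued = 0;

  // Returns the cached token, or issues a new one if it has expired.
  get(now: number): Token {
    if (this.current === null || now >= this.current.expiresAt) {
      this.issued += 1;
      this.current = {
        value: `token-${this.issued}`,
        expiresAt: now + ONE_HOUR_MS,
      };
    }
    return this.current;
  }
}

// Simulated API call: an expired token yields 401 "Bad credentials".
function apiStatus(token: Token, now: number): number {
  return now < token.expiresAt ? 200 : 401;
}

// Stale variant: the job keeps the token it had when it was enqueued.
interface StaleJob {
  pr: number;
  token: Token;
}

// Fresh variant: the job stores only the PR number.
interface FreshJob {
  pr: number;
}

function processStale(job: StaleJob, now: number) {
  return { pr: job.pr, status: apiStatus(job.token, now) };
}

function processFresh(job: FreshJob, tokens: TokenSource, now: number) {
  // The token is looked up when the request is made, not when enqueued.
  return { pr: job.pr, status: apiStatus(tokens.get(now), now) };
}

const tokens = new TokenSource();
const enqueuedAt = 0;
const staleJob: StaleJob = { pr: 1, token: tokens.get(enqueuedAt) };
const freshJob: FreshJob = { pr: 1 };

// Process both jobs just after the original token has expired.
const later = enqueuedAt + ONE_HOUR_MS + 1;
const staleResult = processStale(staleJob, later);
const freshResult = processFresh(freshJob, tokens, later);

console.log(staleResult.status, freshResult.status); // prints "401 200"
```

The design point is that queued work should hold only what identifies the PR and obtain credentials at request time, so a token refresh between enqueueing and processing cannot invalidate the job.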

I've made a PR here that should resolve this issue: #246
Since I currently lack statistics, could you give it a go?

Apart from this issue, do you know of any free online service where I can manage such statistics/graphs? I've looked at Datadog, but they do not supply any free plans for custom metrics.

bobvanderlinden mentioned this issue Apr 28, 2019

Error while processing PR #97

Closed

yale mentioned this issue May 2, 2019

Prometheus metrics reporting #258

Open
