Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2 year old runner has randomly broken, Jobs not being picked up #3526

Closed
D3VL-Jack opened this issue Oct 24, 2024 · 2 comments
Closed

2 year old runner has randomly broken, Jobs not being picked up #3526

D3VL-Jack opened this issue Oct 24, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@D3VL-Jack
Copy link

Describe the bug
A suspected recent update has stopped runners from picking up jobs on production but a carbon copy staging environment still works. These servers have been running without any configuration changes for the previous 2 years, no associated software updates have aligned with this issue starting.

All jobs submitted are stuck in Waiting for a runner to pick up this job... until they finally time out.
The runner is seen successfully polling for Jobs, and the GitHub dashboard shows it as being online and idle, however, GitHub is not releasing jobs to the worker.

[2024-10-24 14:06:15Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
.... REPEATS ....
[2024-10-24 14:25:31Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-10-24 14:26:21Z INFO MessageListener] No message retrieved from session '********-****-****-****-************' within last 30 minutes.
[2024-10-24 14:26:21Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...
[2024-10-24 14:26:21Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/**REDACTED**/actions-runner/.credentials_rsaparams
[2024-10-24 14:26:22Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
.... REPEATS ....
[2024-10-24 14:41:27Z INFO MessageListener] BrokerMigration message received. Polling Broker for messages...

image

The runner updated itself on the 4th, however, issues have only arisen since the 19th at 02:00 UTC

[2024-10-04 18:49:51-0590] --------whoami--------
**REDACTED**
[2024-10-04 18:49:51-0636] --------whoami--------
[2024-10-04 18:49:51-0651] Waiting for Runner.Listener (2597187) to complete
[2024-10-04 18:49:51-0670] Process 2597187 still running
[2024-10-04 18:49:53-0051] Process 2597187 finished running
[2024-10-04 18:49:53-0071] Sleep 1 more second to make sure process exited
[2024-10-04 18:49:54-0053] Delete existing junction bin folder
[2024-10-04 18:49:54-0091] Delete existing junction externals folder
[2024-10-04 18:49:54-0127] Create junction bin folder
[2024-10-04 18:49:54-0161] Create junction externals folder
[2024-10-04 18:49:54-0302] Update succeed
[2024-10-04 18:49:54-0339] update.finished file creation succeed
[2024-10-04 18:49:54-0359] Rename /home/**REDACTED**/actions-runner/_diag/SelfUpdate-20241004-184951.log to be /home/**REDACTED**/actions-runner/_diag/SelfUpdate-20241004-184951.log.succeed
renamed '/home/**REDACTED**/actions-runner/_diag/SelfUpdate-20241004-184951.log' -> '/home/**REDACTED**/actions-runner/_diag/SelfUpdate-20241004-184951.log.succeed'

We've attempted restarting the service, and fully rebooting the system, neither result in jobs being picked up.

To Reproduce
It is unknown how to reproduce this broken state.

Expected behavior
Runners should be picking up jobs from GitHub.

Runner Version and Platform

2.320.0
Ubuntu 20.04.6 LTS

This may be a similar issue to #3515, though, Thiers appears to eventually pick up jobs.

@D3VL-Jack D3VL-Jack added the bug Something isn't working label Oct 24, 2024
@D3VL-Jack
Copy link
Author

Very strangely, exactly a week later (as of 30 minutes ago) Jobs have started being picked up again! 🤷‍♂️

@D3VL-Jack
Copy link
Author

Adding to this to keep a log, may be useful if anyone else faces these issues;
Today at 2PM exactly, this runner started having issues again, checking the logs it appears GitHub backend was rejected it from connecting citing that it was already connected.

[2024-11-21 14:05:48Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/[[REDACTED]]/actions-runner/.credentials_rsaparams
[2024-11-21 14:05:48Z INFO RSAFileKeyManager] Loading RSA key parameters from file /home/[[REDACTED]]/actions-runner/.credentials_rsaparams
[2024-11-21 14:05:48Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-11-21 14:05:48Z ERR  GitHubActionsService] POST request to https://pipelinesghubeus7.actions.githubusercontent.com/[[REDACTED]]/_apis/distributedtask/pools/1/sessions failed. HTTP Status: Conflict
[2024-11-21 14:05:48Z ERR  MessageListener] Catch exception during create session.
[2024-11-21 14:05:48Z ERR  MessageListener] GitHub.DistributedTask.WebApi.TaskAgentSessionConflictException: The actions runner Production Server already has an active session.
   at GitHub.Services.WebApi.VssHttpClientBase.HandleResponseAsync(HttpResponseMessage response, CancellationToken cancellationToken)
   at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
   at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
   at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken)
   at GitHub.Runner.Listener.MessageListener.CreateSessionAsync(CancellationToken token)
[2024-11-21 14:05:48Z INFO MessageListener] The session for this runner already exists.
[2024-11-21 14:05:48Z ERR  Terminal] WRITE ERROR: A session for this runner already exists.
[2024-11-21 14:05:48Z INFO MessageListener] The session conflict exception haven't reached retry limit.
[2024-11-21 14:05:48Z ERR  Terminal] WRITE ERROR: 2024-11-21 14:05:48Z: Runner connect error: The actions runner Production Server already has an active session.. Retrying until reconnected.
[2024-11-21 14:05:48Z INFO MessageListener] Sleeping for 30 seconds before retrying.
[2024-11-21 14:06:18Z INFO MessageListener] Attempt to create session.
[2024-11-21 14:06:18Z INFO MessageListener] Connecting to the Runner Server...
[2024-11-21 14:06:18Z INFO RunnerServer] EstablishVssConnection
[2024-11-21 14:06:18Z INFO RunnerServer] Establish connection with 100 seconds timeout.
[2024-11-21 14:06:18Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-11-21 14:06:18Z INFO RunnerServer] EstablishVssConnection
[2024-11-21 14:06:18Z INFO RunnerServer] Establish connection with 60 seconds timeout.
[2024-11-21 14:06:18Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-11-21 14:06:18Z INFO RunnerServer] EstablishVssConnection
[2024-11-21 14:06:18Z INFO RunnerServer] Establish connection with 60 seconds timeout.
[2024-11-21 14:06:18Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-11-21 14:06:19Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-11-21 14:06:19Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-11-21 14:06:19Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-11-21 14:06:19Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-11-21 14:06:19Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-11-21 14:06:19Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-11-21 14:06:20Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-11-21 14:06:20Z INFO GitHubActionsService] Starting operation Location.GetConnectionData
[2024-11-21 14:06:20Z INFO GitHubActionsService] Finished operation Location.GetConnectionData
[2024-11-21 14:06:20Z INFO MessageListener] VssConnection created
[2024-11-21 14:06:20Z INFO Terminal] WRITE LINE: 
[2024-11-21 14:06:20Z INFO Terminal] WRITE LINE: 
[2024-11-21 14:06:20Z WARN GitHubActionsService] Attempt 1 of POST request to https://pipelinesghubeus7.actions.githubusercontent.com/[[REDACTED]]/_apis/distributedtask/pools/1/sessions failed (HTTP Status: ServiceUnavailable). The operation will be retried in 10.962 seconds.
[2024-11-21 14:06:31Z WARN GitHubActionsService] Attempt 2 of POST request to https://pipelinesghubeus7.actions.githubusercontent.com/[[REDACTED]]/_apis/distributedtask/pools/1/sessions failed (HTTP Status: ServiceUnavailable). The operation will be retried in 12.544 seconds.
[2024-11-21 14:06:44Z WARN GitHubActionsService] Attempt 3 of POST request to https://pipelinesghubeus7.actions.githubusercontent.com/[[REDACTED]]/_apis/distributedtask/pools/1/sessions failed (HTTP Status: ServiceUnavailable). The operation will be retried in 17.231 seconds.
[2024-11-21 14:07:01Z ERR  GitHubActionsService] Attempt 4 of POST request to https://pipelinesghubeus7.actions.githubusercontent.com/[[REDACTED]]/_apis/distributedtask/pools/1/sessions failed (HTTP Status: ServiceUnavailable). The maximum number of attempts has been reached.

After this, no commands would allow the server to reconnect and it would only throw the following error

[2024-11-22 00:03:23Z ERR  GitHubActionsService] POST request to https://vstoken.actions.githubusercontent.com/_apis/oauth2/token/[[REDACTED]] failed. HTTP Status: BadRequest, AFD Ref: Ref A: [[REDACTED]] Ref B: [[REDACTED]] Ref C: 2024-11-22T00:03:23Z
[2024-11-22 00:03:23Z INFO GitHubActionsService] AAD Correlation ID for this token request: Unknown
[2024-11-22 00:03:23Z ERR  MessageListener] Catch exception during create session.
[2024-11-22 00:03:23Z ERR  MessageListener] GitHub.Services.OAuth.VssOAuthTokenRequestException: Registration was not found or is not medium trust. ClientType: 
   at GitHub.Services.OAuth.VssOAuthTokenProvider.OnGetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
   at GitHub.Services.Common.IssuedTokenProvider.GetTokenOperation.GetTokenAsync(VssTraceActivity traceActivity)
   at GitHub.Services.Common.IssuedTokenProvider.GetTokenAsync(IssuedToken failedToken, CancellationToken cancellationToken)
   at GitHub.Services.Common.VssHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at GitHub.Services.Common.VssHttpRetryMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
   at GitHub.Services.WebApi.VssHttpClientBase.SendAsync(HttpRequestMessage message, HttpCompletionOption completionOption, Object userState, CancellationToken cancellationToken)
   at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpRequestMessage message, Object userState, CancellationToken cancellationToken)
   at GitHub.Services.WebApi.VssHttpClientBase.SendAsync[T](HttpMethod method, IEnumerable`1 additionalHeaders, Guid locationId, Object routeValues, ApiResourceVersion version, HttpContent content, IEnumerable`1 queryParameters, Object userState, CancellationToken cancellationToken)
   at GitHub.Runner.Listener.MessageListener.CreateSessionAsync(CancellationToken token)

specifically Registration was not found or is not medium trust.

Removing .runner and adding this server again as a new runner has fixed it for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant