Releases failing randomly with agent error "Unhandled exception happened in worker" #3855
Comments
Hi @getvivekv, thanks for reporting! We are working on more prioritized issues at the moment, but will get back to this one soon.
Any resolution or idea why this is happening? Same problem here: self-hosted agent on self-hosted Kubernetes, and it only happens intermittently. This particular job completed successfully on attempt 3. Sometimes it goes through the first time; sometimes it fails all 3 attempts and the pipeline falls over.
##[error]Newtonsoft.Json.JsonReaderException: Invalid character after parsing property name. Expected ':' but got: m. Path 'jobContainer', line 1, position 30728.
I'm also running self-hosted agents on AKS and getting the exact same thing reported here: sometimes it succeeds the first time, and other times you need to run it 2-3 times to succeed because you keep getting the dreaded error above.
Hi @getvivekv @BenH-Puregym @markatky46 could you please share more information on this approach:
If you have any private information, please send it to [email protected]. Thanks!
@KonstantinTyukalov, it always pulls the latest; it just uses this script, and at the time of writing it's using 2.210.1. I'm unsure what I can provide on your second point, because the agent is just built on Linux using the script I mentioned above with some added extras, and then deployed to AKS as a KEDA deployment. In DevOps it's completely random; there's no specific task that triggers it. It could be building a site, running a script, or even sending an HTTP request.
@BenH-Puregym how often are you able to reproduce this? Is there something special about the network configuration that you have?
Hi all,
@BenH-Puregym, @getvivekv We will need some help from you here on your specific setup, as we can't reproduce the issue despite trying for several days. Looking at the code, we are using a standard and very widely used library to serialize/deserialize the JSON, and I don't expect the issue to be there. More likely there is something in your environment that triggers this, and we would need more info (e.g. are you using ScaledObject or ScaledJobs?). Based on the log snippets, it almost looks like the HTTP payload has been truncated.
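The truncation hypothesis is easy to demonstrate: cutting a JSON payload off mid-stream produces exactly this class of parse error, where the parser fails partway through and reports the byte position where the document became invalid. A minimal Python sketch (the agent itself uses Newtonsoft.Json in .NET, so the exception type differs, but the failure mode is the same; the payload shape here is hypothetical):

```python
import json

# A well-formed payload loosely resembling the agent's job message.
# The key name 'jobContainer' matches the path in the logged exception;
# the rest of the structure is a made-up stand-in.
payload = json.dumps({"jobContainer": {"image": "ubuntu:20.04"},
                      "steps": ["build", "test"]})

# Simulate an HTTP response body that was cut off mid-stream.
truncated = payload[: len(payload) // 2]

try:
    json.loads(truncated)
except json.JSONDecodeError as e:
    # The parser reports the position where parsing broke down,
    # analogous to the JsonReaderException at
    # "Path 'jobContainer', line 1, position 30728" in the logs above.
    print(f"Parse failed at position {e.pos}: {e.msg}")
```

This is consistent with the symptom being environmental (a proxy, sidecar, or connection dropping the response body early) rather than a bug in the JSON library itself.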
@mmrazik My team reports that they are no longer seeing this error since the Azure outage forced us to reboot all the nodes on the Kubernetes cluster. The team also moved several of the critical pipelines to a VM-based agent, as this was a blocker for us for several weeks. Currently it is not reproducible. The agents were running on AKS with the Istio service mesh and mTLS enabled. The pipeline had a kubectl task, a Helm chart task, and a cmd task.
Ok. I am closing this for now. |
This is still happening, I'm afraid. It didn't happen for a couple of weeks, and now we've had multiple instances in the last week. I've been able to capture some more information this time. When a job finishes, DevOps should delete the agent so that we have short-lived agents. By the looks of the logs on this occurrence, when the job finished, instead of the agent being deleted immediately as it is on other agents (I can see this by looking at the logs), the deletion failed over and over again before eventually succeeding. In the meantime, another job on a totally different pipeline picked the same agent, and of course once the agent is deleted we get the error.
This issue still happens with a self-hosted Windows agent. I just logged my bug here: #4813
Our pipelines are failing randomly with this error message produced by the ADO agent:
Agent package linux-x64.
Running on Linux (X64).
RuntimeInformation: Linux 5.4.0-1078-azure 81~18.04.1-Ubuntu SMP Mon Apr 25 23:16:13 UTC 2022.
Running as a container on Kubernetes (AKS)
Azure DevOps Type and Version: dev.azure.com
File: /azp/_diag/Worker_20220602-113138-utc.log
It is not clear from the error what is wrong. The same agent is able to complete some builds, which tells me that there is no issue with the PAT token or the injected environment variables. This is running as a container in AKS.