Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JFR main thread dies because build log file does not exist yet #62

Open
anbrsap opened this issue Jan 18, 2021 · 1 comment
Open

JFR main thread dies because build log file does not exist yet #62

anbrsap opened this issue Jan 18, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@anbrsap
Copy link
Contributor

anbrsap commented Jan 18, 2021

Problem

Sometimes Steward pipeline runs do not terminate correctly but get killed after the timeout period, although according to the pipeline log the pipeline finished (either successfully or with error).

At the beginning of the JFR container log the following messages could be found:

Cloning pipeline repository https://github.com/SAP-samples/stewardci-example-pipelines
Cloning into '.'...
Checking out pipeline from revision master
Your branch is up to date with 'origin/master'.
Already on 'master'
Delete pipeline git clone credentials
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.thoughtworks.xstream.core.util.Fields (file:/app/repo/org/jvnet/hudson/xstream/1.4.7-jenkins-1/xstream-1.4.7-jenkins-1.jar) to field java.util.TreeMap.comparator
WARNING: Please consider reporting this to the maintainers of com.thoughtworks.xstream.core.util.Fields
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2021-01-15 07:50:03.002+0000 [id=1]	SEVERE	h.i.i.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler#uncaughtException: A thread (main/1) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.io.FileNotFoundException: /jenkins_home/jobs/job/builds/1/log (No such file or directory)
	at java.base/java.io.RandomAccessFile.open0(Native Method)
	at java.base/java.io.RandomAccessFile.open(RandomAccessFile.java:345)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:259)
	at java.base/java.io.RandomAccessFile.<init>(RandomAccessFile.java:214)
	at org.kohsuke.stapler.framework.io.LargeText$FileSession.<init>(LargeText.java:399)
	at org.kohsuke.stapler.framework.io.LargeText$2.open(LargeText.java:121)
	at org.kohsuke.stapler.framework.io.LargeText.writeLogTo(LargeText.java:212)
	at hudson.console.AnnotatedLargeText.writeLogTo(AnnotatedLargeText.java:165)
	at hudson.model.Run.writeWholeLogTo(Run.java:1556)
	at io.jenkins.jenkinsfile.runner.Runner.writeLogTo(Runner.java:134)
	at io.jenkins.jenkinsfile.runner.Runner.run(Runner.java:94)
Caused: java.lang.reflect.InvocationTargetException
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at io.jenkins.jenkinsfile.runner.JenkinsfileRunnerLauncher.doLaunch(JenkinsfileRunnerLauncher.java:32)
	at io.jenkins.jenkinsfile.runner.JenkinsLauncher.launch(JenkinsLauncher.java:83)
	at io.jenkins.jenkinsfile.runner.App.run(App.java:13)
	at io.jenkins.jenkinsfile.runner.bootstrap.Bootstrap.run(Bootstrap.java:312)
	at io.jenkins.jenkinsfile.runner.bootstrap.Bootstrap.main(Bootstrap.java:135)

It seems that it happens more often if the Kubernetes node has all CPUs fully utilized while the JFR starts up, e.g. when many pipeline runs are triggered simultaneously.

Version Information

First Analysis

The stack trace shows the call stack of the main thread. The InvocationTargetException wraps the FileNotFoundException coming from the reflective call of io.jenkins.jenkinsfile.runner.Runner.run().

io.jenkins.jenkinsfile.runner.Runner.run() schedules a new build, waits until it has been started, and then starts copying the build log file contents to stdout.

Presumably, with our Elasticsearch Log Plug-In the creation of the local build log file happens some time after the build has been started. So there is a race condition between the Elasticsearch Log Plug-In initialization (creating the local log file) and log copy by io.jenkins.jenkinsfile.runner.Runner.run(). If the latter is faster, it fails to open the local build log file.

As the build execution happens in other threads, the pipeline can run to completion even after the main thread died. I currently don't know why the JVM doesn't terminate.

@anbrsap anbrsap transferred this issue from SAP/stewardci-core Jan 18, 2021
anbrsap added a commit that referenced this issue Jan 18, 2021
Create the local build log file _before_ starting JFR, so that JFR's log copy does not fail.

Should be reverted once a permanent fix for #62
has been established.
@holgpar
Copy link
Contributor

holgpar commented Jan 18, 2021

I currently don't know why the JVM doesn't terminate.

We have seen this before. Apparently, it is the job of the main thread to shutdown everything once the build is finished.

anbrsap added a commit that referenced this issue Jan 27, 2021
anbrsap added a commit that referenced this issue Jan 27, 2021
@anbrsap anbrsap added the bug Something isn't working label Jan 27, 2021
anbrsap added a commit that referenced this issue Jan 28, 2021
Workaround for #62 (attempt 2)
alxsap pushed a commit that referenced this issue Nov 16, 2021
romanisb pushed a commit to romanisb/stewardci-jenkinsfilerunner-image that referenced this issue Oct 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants