runMATLABCommand sometimes stuck. #322

Open
Rvh91 opened this issue May 14, 2024 · 14 comments
@Rvh91

Rvh91 commented May 14, 2024

I've noticed that a stage sometimes gets 'stuck': a task I run through buildtool completes, but Jenkins never continues, so perhaps MATLAB isn't returning an exit code? I don't really know how to debug this. Not sure if it is relevant, but multiple builds can run simultaneously on the same Windows agent.

stage('Run repository test suite (unit & integration tests)') {
	steps {
		runMATLABCommand(command: 'buildtool testReport')
	}
}

On the builds where it does continue, I can see the following:

Parallel pool using the 'Processes' profile is shutting down.

whereas on those where it seems stuck, the log ends at:

** Finished testReport
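While the root cause is being investigated, one way to stop a stuck stage from holding an executor indefinitely is to wrap the step in Jenkins' built-in `timeout` step. This is a sketch that reuses the stage above; the 60-minute limit is an assumption, not a recommended value. It does not explain why MATLAB fails to exit, but it turns a silent hang into an aborted build.

```groovy
stage('Run repository test suite (unit & integration tests)') {
	steps {
		// Interrupt the wrapped step if it runs longer than the limit.
		// The 60-minute value is an assumption; size it to the suite's normal runtime.
		timeout(time: 60, unit: 'MINUTES') {
			runMATLABCommand(command: 'buildtool testReport')
		}
	}
}
```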
@nbhoski
Member

nbhoski commented May 15, 2024

Hi @Rvh91, can you please check whether the same issue is reproducible outside of Jenkins? That is, run the same buildtool task from the command line using matlab -batch, for example:

matlab -batch "buildtool testReport"

You could run the above command on the same host where the job gets stuck and see whether it reproduces.

@Rvh91
Author

Rvh91 commented May 15, 2024

Hi @nbhoski, this is difficult to troubleshoot because it does not happen every time. I have run the command from the command line a few times now, and it hasn't happened. That said, it may be related to there being only one MATLAB instance running now, whereas when Jenkins is active we frequently have multiple builds running simultaneously on the same machine. I'm not entirely sure how the parallel pool works: is it shared between instances?

@nbhoski
Member

nbhoski commented May 15, 2024

One thing I would suggest is checking whether your resources are exhausted. Try increasing the number of worker threads on Jenkins and see whether the problem persists.

@Rvh91
Author

Rvh91 commented May 15, 2024

@nbhoski, do you mean the number of executors on our agent?

@nbhoski
Member

nbhoski commented May 15, 2024

Yes

@Rvh91
Author

Rvh91 commented May 15, 2024

I currently have 4 executors enabled on this agent, so I can run 4 builds simultaneously, although that hardly ever happens. But I suspect that the parallel pool created when using a parfor loop might somehow be shared between builds, so it can't shut down while another instance is still using it. Is that possible?
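One way to test the shared-pool hypothesis is to prevent overlapping MATLAB runs and see whether the hang disappears. This is a sketch, assuming the Lockable Resources plugin is installed (it provides the `lock` step); the resource name `matlab-parallel-pool` is hypothetical. Lockable resources are global to the Jenkins controller, so this serializes the stage across all agents, which is stricter than needed but acceptable for a diagnostic run.

```groovy
stage('Run repository test suite (unit & integration tests)') {
	steps {
		// Assumes the Lockable Resources plugin. Only one build at a time can hold
		// this lock, so MATLAB sessions that open parallel pools no longer overlap.
		// 'matlab-parallel-pool' is a hypothetical resource name.
		lock('matlab-parallel-pool') {
			runMATLABCommand(command: 'buildtool testReport')
		}
	}
}
```

If builds still hang while serialized, concurrent pools are probably not the cause.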

@nbhoski
Member

nbhoski commented May 16, 2024

Could you share a similar example that uses PCT features such as parfor, so that it is easier for me to reproduce?

@TylerWeir

Hello, I'm facing a similar issue: our Jenkins runMATLABBuild call hung on a specific commit running through our CI pipeline. This call to runMATLABBuild runs a buildtool task that builds all models in our Simulink project matching a specific label. Note that this issue affected only a single commit running through the CI pipeline.

As suggested above, I tried to reproduce the issue outside of Jenkins using matlab -batch buildtool [task_name] on the problem commit. Interestingly, at the end of the build output, but before MATLAB exits, a prompt window opens asking whether to save a data dictionary before closing (there is undoubtedly an error in the build, but I'd expect the CI pipeline to be resilient nonetheless). I wonder whether this prompt window is also the source of the hang in CI. Perhaps something similar is causing @Rvh91's problem? Note that MATLAB exits only after the prompt is closed.

Here is the prompt that opens:
[Screenshot of the prompt]

I also tried running the same command with the -noFigureWindows option. While the prompt didn't open, MATLAB only printed warnings and never seemed to exit. It exited only once I killed it with Ctrl+C.

Here are those warnings:

Error using buildtool
Build failed.

Warning: dialog is no longer supported when MATLAB is started with the -nodisplay or -noFigureWindows option or there is no display.
> In warnfiguredialog (line 15)
  In dialog (line 41)
  In questdlg (line 160)
Warning: uiwait is no longer supported when MATLAB is started with the -nodisplay or -noFigureWindows option or there is no display.
> In warnfiguredialog (line 15)
  In uiwait (line 40)
  In questdlg (line 413)

And here's the tail of the log from the runMATLABBuild call that hung in CI:

** Failed [task name]

** Closed project [our project name]

Error using buildtool
Build failed.

Error in build_CtbW3xqx (line 2)
buildtool [task name]

I hope these findings are useful in debugging this issue. Happy to help if possible.
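Besides fixing the model, a pipeline could guard against this kind of exit-time prompt when a build fails. This is a minimal sketch, assuming the prompt comes from models or data dictionaries left with unsaved changes by the failed task, and assuming runMATLABCommand runs the command string as MATLAB statements. It wraps the call in a MATLAB try/catch that closes all models without saving (`bdclose('all')`), discards unsaved dictionary changes (`Simulink.data.dictionary.closeAll('-discard')`), and rethrows so the step still fails. The stage name and the task name `buildModels` are hypothetical placeholders.

```groovy
stage('Build models') {
	steps {
		// 'buildModels' is a hypothetical task name; substitute the real task.
		// On failure: close all models without saving, discard unsaved data
		// dictionary changes, then rethrow so Jenkins still marks the step failed.
		runMATLABCommand(command: "try, buildtool buildModels, catch err, bdclose('all'); Simulink.data.dictionary.closeAll('-discard'); rethrow(err), end")
	}
}
```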

@nbhoski
Member

nbhoski commented May 22, 2024

@TylerWeir, I also think the prompt window is the cause of the hang. Are you making any changes to the file mentioned in the pop-up window? Could you handle that case?

@nbhoski
Member

nbhoski commented May 28, 2024

@TylerWeir, could you confirm whether the issue was the pop-up window and whether you were able to resolve it?

@TylerWeir

Yes, fixing the Simulink model to avoid the pop-up window allowed our build to complete successfully and run through CI.

@nbhoski
Member

nbhoski commented Jun 10, 2024

[Screenshot: executors]

Hi @Rvh91,

I was trying to reproduce the issue locally. To support you better, could you please provide the following:

  • How many cores do you configure for MATLAB's parallel runs?
  • How many executors do you have on the host where MATLAB is running?
  • Could you please provide an example pipeline (if not the original)?
  • An example build file or test that uses parfor.

This would help me reproduce the issue in an environment close to yours.

Regards
Nikhil

@nbhoski
Member

nbhoski commented Aug 26, 2024

@Rvh91, could you please confirm whether this issue is still relevant? Please let me know if it is still reproducible.

@Rvh91
Author

Rvh91 commented Aug 27, 2024

It is still an issue. We have moved to calling MATLAB from the command line via:
bat "matlab -batch -wait ${fullCommand} "

so without the plugin; however, the problem persists. So it might not be caused directly by the plugin itself. Also, I can't reproduce it consistently: it appears on some builds but not on others. I have yet to figure out the root cause.

@nbhoski nbhoski assigned mw-kapilg and unassigned nbhoski Nov 25, 2024
@davidbuzinski davidbuzinski self-assigned this Dec 3, 2024
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants