nco randomly fails to append ts plots #365
Comments
@adagj @YanchunHe have you encountered this on nird?
@justin-richling
This seems to be an issue with running on too many processors and overloading the number of times a file is being opened/used? It's also more likely when we're running with more variables. Regardless, running on a single processor works fine (although slowly). Is there another way to run ncks on time series files with multiprocessing?
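One possible middle ground between full multiprocessing and a single processor is a small, bounded worker pool with a retry around each append. The sketch below is only an illustration, not the ADF implementation: the function names, the retry counts, and the exact `ncks -A -v area,landfrac` command shape are assumptions, and nothing in this thread confirms that retrying avoids the underlying failure. Threads are used because the real work happens in the `ncks` subprocess.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def build_append_cmd(src, dst):
    # Assumed command shape: -A appends into dst, -v selects the variables.
    return ["ncks", "-A", "-v", "area,landfrac", str(src), str(dst)]


def append_with_retry(cmd, runner=subprocess.run, retries=3, delay=1.0):
    """Run one command, retrying while it exits with a non-zero status."""
    for attempt in range(1, retries + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt  # number of attempts that were needed
        if attempt < retries:
            time.sleep(delay)  # brief pause before trying again
    raise RuntimeError(f"failed after {retries} attempts: {' '.join(cmd)}")


def append_all(commands, max_workers=2, **kwargs):
    """Run all append commands with at most max_workers running at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: append_with_retry(c, **kwargs), commands))
```

Setting `max_workers=1` reproduces the sequential workaround, so the pool size can be raised gradually to see where failures start to appear.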
Avoid crashes related to #365, but runs slowly
FWIW, Richard Valent at the NCAR help desk commented: "Thanks, Will. I'm glad you have a workaround running sequentially. The NCO User Guide https://nco.sourceforge.net/nco.pdf describes some parallel strategy. It looks like it's on the user to program it correctly. I'll need to study the Guide further to be sure, esp the section named "Parallel" starting on p. 415 of the Guide. I'll let you know when I have had a look. PS I checked our past tickets and find no references to users trying to run CDO in parallel."
This might be better solved by using (
What happened?
It seems like nco randomly fails to append land timeseries plots with area & landfrac data, which causes subsequent parts of the adf workflow to fail. Error code from ts generation is below.
The source code that's causing the failure is in adf_diag
I've tried changing the ncarenv being used, as well as the nco and hdf5 versions that are loaded (e.g. nco/5.2.4 or 5.3.1). I also tried pointing to different cases and adding the -C flag to the ncks command.
The timeseries and climo generation typically proceeds, but then the LDF can fail when trying to calculate tables of global means / sums, although this is also inconsistent. In cases when the LDF fails I've been able to identify the case and variables where ncks fails to append ts files appropriately. Then I delete both of these ts and climo files, and repeat the LDF. Eventually this manual workaround is successful, but it's kind of a frustrating, time-consuming process.
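The delete-and-rerun step of that workaround could be scripted roughly as follows. This is a hedged sketch only: `remove_stale_outputs` is a hypothetical helper, and the glob pattern assumes the variable name appears as a dot-delimited token in both the ts and climo file names, which may not match the actual ADF naming. It defaults to a dry run so the matched files can be inspected before anything is deleted.

```python
from pathlib import Path


def remove_stale_outputs(ts_dir, climo_dir, failed_vars, dry_run=True):
    """List (and optionally delete) ts and climo files for the given variables."""
    matched = []
    for var in failed_vars:
        for directory in (Path(ts_dir), Path(climo_dir)):
            # Hypothetical naming: "<anything>.<var>.<anything>nc"
            matched.extend(sorted(directory.glob(f"*.{var}.*nc")))
    if not dry_run:
        for path in matched:
            path.unlink()  # remove the file so the next run regenerates it
    return matched
```

Calling it once with `dry_run=True`, checking the returned paths, and then calling it again with `dry_run=False` keeps the manual check in the loop while removing the file-by-file deletion.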
ADF Hash you are using
clm-diag branch
What machine were you running the ADF on?
CISL machine