-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FileReadError exception leading to 8001 exit code instead of 8021 #42179
Comments
assign core |
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks |
A new Issue was created by @makortel Matti Kortelainen. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
Another case was reported in https://cms-talk.web.cern.ch/t/corrupted-files/32728, in CMSSW_13_0_15. Snippet from the log
|
Could we have a |
I didn't find any slicing @wddgit Could you look into this? I'd hope the problem would be reproducible by just throwing the |
having the correct exit code will facilitate automated reporting, checking and fixing of corrupted files a topic which has now reached near the top of the queue for DataTransfer Ops https://indico.cern.ch/event/1356295/ |
We surely have plenty of examples of
which exit with 8021. I did not check in those cases if there's a lzma report as well, it seemed enough to trigger a file checksum verification to me. |
Could you give a pointer to the log of this case, or paste the full CMSSW exception message here? |
I'll start looking at this after lunch. |
here's one such job: exception seems very similar
|
Thanks. This job used CMSSW_10_6_26. The context line 0 is added in cmssw/IOPool/Input/src/RootDelayedReader.cc Lines 80 to 84 in d884dab
The exception handling in this class was changed in #36743, that was merged in 12_3_X. @belforte I'm becoming more interested in various examples in either way (correct vs. incorrect exit code). |
This looks suspicious. I have not verified this or looked carefully yet but this looks suspicious like it might slice. I just noticed it. Possibly unrelated... https://cmssdt.cern.ch/dxr/CMSSW/source/FWCore/Framework/src/ProductResolvers.cc#302 I'll continue working on this. |
Only one match for this:
And this points to ProductResolvers.cc in extendException. There are four places where that is used. One is line 302 and there are 3 other similar spots in the same file. I have not succeeded in reproducing this or that proving my fix gets rid of the problem, but the fix is obvious. I could just submit a PR with the fix and then if the problem does not go away we revisit this. Change:
to:
|
Good catch! The problem is that e.g. in try {
...
} catch(cms::Exception& e) {
throw e;
} the This is how I interpret https://en.cppreference.com/w/cpp/language/throw
vs
|
For the moment, I'll continue to try to figure out how to reproduce the problem. Simply throwing at the location you suggested yields the correct error code so there is more to it than that... |
Yes to your comment. |
The |
I reproduced the error and verified the fix works. At first I was throwing on every getEntry, but the first one is for EventAuxiliary which is not a delayed read. If you wait for some product read by a module then the problem reproduces and the fix makes it go away. I submit a PR soon. |
Thanks @wddgit ! I think the fix needs to be backported to
but let's get the 14_0_X PR in first. |
@belforte I think I won't need more examples anymore. |
Just submitted #43579 which fixes the problem in 14_0_X and adds a unit test. |
@wddgit The master PR was merged. Could you look into backporting it to the release cycles listed in #42179 (comment) ? |
I'll start on that this afternoon. |
All the backport pull requests are submitted now. |
+core All backport PRs have been merged |
This issue is fully signed and ready to be closed. |
A log file analyzed in #40437 (comment) shows a case where FileReadError exception lead to 8001 exit code instead of 8021. The job used CMSSW_12_4_14_patch1. Based on the exception message, it seems to me the exception in question is thrown here
cmssw/IOPool/Input/src/RootTree.cc
Lines 383 to 394 in b517d9f
The text was updated successfully, but these errors were encountered: