-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle UnknownFormatException for batch processing multiple files. #4233
Conversation
When sequentially processing many files the unhandled exception terminates the program and no further files will be processed that may still be readable. This commit catches the exception, provides a warning message and continues processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @tstoeter for the follow-up. A few inline comments to try to better understand the expected behavior.
As we are still a few weeks away from the next major release of Bio-Formats, there is still time to refine the expectation of the batch functionality and document it appropriately in https://bio-formats.readthedocs.io/en/latest/users/index.html#command-line-tools. However, note that post-release, this will be treated like the rest of the API and any backwards-incompatible change will require an increment of the major version number.
new ImageInfo().testRead(newArgs); | ||
} | ||
catch (UnknownFormatException e) { | ||
LOGGER.warn("Unknown file format: " + newArgs[idx]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ImageInfo.testRead
can throw three types of exception:
FormatException
(of whichUnknowFormatException
is a subclass of)IOException
ServiceException
What is the rationale for handling UnknownFormatException
differently from the other exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @sbesson for your code review and for pointing out the other possible exceptions. I intended to do the least changes to the current behaviour. I can add exception handling for the other cases, too, i.e. to handle general FormatException and IOException. Could you please tell me, in which case a ServiceException would occur?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can read more about ServiceException
in https://bio-formats.readthedocs.io/en/latest/developers/service.html. Typically this would happen when a dependency is missing.
} | ||
catch (UnknownFormatException e) { | ||
LOGGER.warn("Unknown file format: " + newArgs[idx]); | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should the return code be if any of the processed files fails? With the current implementation it will be zero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point, thank you. I would suggest a non-zero return code, then. Is there any preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely see the value of returning a non-zero return code if any input file fails. This would give an indication to the called that the output of the utility should be reviewed for exceptions.
…files. If an exception occurs batch processing should continue, but now an exit code is returned when the program terminates, indicating there was an error. The logger level for messages was elevated to error was well.
@@ -1144,19 +1144,31 @@ public static void main(String[] args) throws Exception { | |||
newArgs[idx] = scanner.nextLine(); | |||
System.out.println("====% " + newArgs[idx]); | |||
try { | |||
new ImageInfo().testRead(newArgs); | |||
new ImageInfo().testRead(newArgs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stylistic, can these be updated to use an indentation of two spaces - https://bio-formats.readthedocs.io/en/latest/developers/code-formatting.html#java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested the last commit with a mixture of valid and invalid input files
sbesson@Sebastiens-MacBook-Pro-3 bioformats % cat files | ./tools/showinf -nopix - || echo "Completed with failures"
====% /tmp/test.ini
Initializing reader
Caught IOException. /tmp/test.ini (No such file or directory)
====% /tmp/foo
File has length 0 and may be corrupt
Caught FormatException. Unknown file format: /tmp/foo
====% /tmp/test.fake
Initializing reader
FakeReader initializing /tmp/test.fake
Initialization took 0.039s
Reading core metadata
filename = /tmp/test.fake
Series count = 1
Series #0 :
Image count = 1
RGB = false (1)
Interleaved = false
Indexed = false (true color)
Width = 512
Height = 512
SizeZ = 1
SizeT = 1
SizeC = 1
Tile size = 512 x 512
Thumbnail size = 128 x 128
Endianness = intel (little)
Dimension order = XYZCT (certain)
Pixel type = uint8
Valid bits per pixel = 8
Metadata complete = true
Thumbnail series = false
-----
Plane #0 <=> Z 0, C 0, T 0
Reading global metadata
Reading metadata
Completed with failures
Happy to get the proposed changes included in Bio-Formats 8.0.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely an improvement over how checked exceptions were handled previously, but unchecked exceptions (e.g. NullPointerException
, NumberFormatException
) are still not being caught and will stop processing. This now means two different sets of behavior depending upon whether a checked or unchecked exception was thrown.
As an example, take test.msr (see #4139), which is currently expected to throw NumberFormatException
during initialization, plus artificial files:
$ touch abc.fake
$ touch xyz.fake
$ ls -gG
total 268952
-rwxrwxrwx 1 0 Sep 16 11:58 abc.fake
-rwxrwxrwx 1 275404088 Sep 16 11:51 test.msr
-rwxrwxrwx 1 0 Sep 16 11:58 xyz.fake
Then attempt to process all 3 files:
$ ls | showinf -nopix - || echo "Completed with failures"
====% abc.fake
Checking file format [Simulated data]
Initializing reader
FakeReader initializing abc.fake
Initialization took 0.128s
Reading core metadata
filename = abc.fake
Used files = [/home/melissa/data/gh-4233-test/abc.fake]
Series count = 1
Series #0 :
Image count = 1
RGB = false (1)
Interleaved = false
Indexed = false (true color)
Width = 512
Height = 512
SizeZ = 1
SizeT = 1
SizeC = 1
Tile size = 512 x 512
Thumbnail size = 128 x 128
Endianness = intel (little)
Dimension order = XYZCT (certain)
Pixel type = uint8
Valid bits per pixel = 8
Metadata complete = true
Thumbnail series = false
-----
Plane #0 <=> Z 0, C 0, T 0
Reading global metadata
Reading metadata
====% test.msr
Checking file format [Lavision Imspector]
Initializing reader
ImspectorReader initializing test.msr
Exception in thread "main" java.lang.NumberFormatException: For input string: "2.8E-43"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.base/java.lang.Integer.parseInt(Integer.java:652)
at java.base/java.lang.Integer.parseInt(Integer.java:770)
at loci.formats.in.ImspectorReader.initFile(ImspectorReader.java:429)
at loci.formats.FormatReader.setId(FormatReader.java:1480)
at loci.formats.ImageReader.setId(ImageReader.java:864)
at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:1048)
at loci.formats.tools.ImageInfo.main(ImageInfo.java:1147)
Completed with failures
So xyz.fake
is never processed in this case.
Separately, since the catch
blocks are pretty much identical, I'd prefer to see them unified. The error logging could be something like LOGGER.error("Caught " + e.getClass().getSimpleName(), e)
. Passing the thrown exception to LOGGER.error(...)
means that the stack trace is preserved if needed for debugging. I'd also be OK with logging just the message at ERROR
, and the stack trace at DEBUG
, something like:
LOGGER.error("Caught {}. {}", e.getClass().getSimpleName(), e.getMessage());
LOGGER.debug("Failed to process " + newArgs[idx], e);
Thank you very much @sbesson and @melissalinkert for your thorough tests and reviews. I'll work on it! |
Generalize to catch all exceptions and condense exception handling code and error messages. Co-authored-by: Melissa Linkert <[email protected]> Co-authored-by: Sébastien Besson <[email protected]>
@melissalinkert I have implemented the requested changes and made sure your test case (with the same test.msr file) now passes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @tstoeter. This looks good to me now, and the latest changes have been included in a passing build: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-repo/192/
When sequentially processing many files the unhandled exception terminates the program and no further files will be processed that may still be readable. This commit catches the exception, provides a warning message and continues processing.