Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle UnknownFormatException for batch processing multiple files. #4233

Merged
merged 4 commits into from
Sep 19, 2024

Conversation

tstoeter
Copy link
Contributor

@tstoeter tstoeter commented Sep 9, 2024

When sequentially processing many files the unhandled exception terminates the program and no further files will be processed that may still be readable. This commit catches the exception, provides a warning message and continues processing.

When sequentially processing many files the unhandled exception terminates the program
and no further files will be processed that may still be readable. This commit catches
the exception, provides a warning message and continues processing.
@sbesson sbesson self-requested a review September 9, 2024 13:25
Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tstoeter for the follow-up. A few inline comments to try to better understand the expected behavior.

As we are still a few weeks away from the next major release of Bio-Formats, there is still time to refine the expectation of the batch functionality and document it appropriately in https://bio-formats.readthedocs.io/en/latest/users/index.html#command-line-tools. However, note that post-release, this will be treated like the rest of the API and any backwards-incompatible change will require an increment of the major version number.

new ImageInfo().testRead(newArgs);
}
catch (UnknownFormatException e) {
LOGGER.warn("Unknown file format: " + newArgs[idx]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImageInfo.testRead can throw three types of exception:

  • FormatException (of which UnknowFormatException is a subclass of)
  • IOException
  • ServiceException

What is the rationale for handling UnknownFormatException differently from the other exceptions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @sbesson for your code review and for pointing out the other possible exceptions. I intended to do the least changes to the current behaviour. I can add exception handling for the other cases, too, i.e. to handle general FormatException and IOException. Could you please tell me, in which case a ServiceException would occur?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can read more about ServiceException in https://bio-formats.readthedocs.io/en/latest/developers/service.html. Typically this would happen when a dependency is missing.

}
catch (UnknownFormatException e) {
LOGGER.warn("Unknown file format: " + newArgs[idx]);
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should the return code be if any of the processed files fails? With the current implementation it will be zero

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point, thank you. I would suggest a non-zero return code, then. Is there any preference?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely see the value of returning a non-zero return code if any input file fails. This would give an indication to the called that the output of the utility should be reviewed for exceptions.

…files.

If an exception occurs batch processing should continue, but now an exit code is returned
when the program terminates, indicating there was an error. The logger level for messages
was elevated to error was well.
@@ -1144,19 +1144,31 @@ public static void main(String[] args) throws Exception {
newArgs[idx] = scanner.nextLine();
System.out.println("====% " + newArgs[idx]);
try {
new ImageInfo().testRead(newArgs);
new ImageInfo().testRead(newArgs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stylistic, can these be updated to use an indentation of two spaces - https://bio-formats.readthedocs.io/en/latest/developers/code-formatting.html#java

@sbesson sbesson self-requested a review September 11, 2024 16:08
Copy link
Member

@sbesson sbesson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested the last commit with a mixture of valid and invalid input files

sbesson@Sebastiens-MacBook-Pro-3 bioformats % cat files | ./tools/showinf -nopix - || echo "Completed with failures"
====% /tmp/test.ini
Initializing reader
Caught IOException. /tmp/test.ini (No such file or directory)
====% /tmp/foo
File has length 0 and may be corrupt
Caught FormatException. Unknown file format: /tmp/foo
====% /tmp/test.fake
Initializing reader
FakeReader initializing /tmp/test.fake
Initialization took 0.039s

Reading core metadata
filename = /tmp/test.fake
Series count = 1
Series #0 :
	Image count = 1
	RGB = false (1) 
	Interleaved = false
	Indexed = false (true color)
	Width = 512
	Height = 512
	SizeZ = 1
	SizeT = 1
	SizeC = 1
	Tile size = 512 x 512
	Thumbnail size = 128 x 128
	Endianness = intel (little)
	Dimension order = XYZCT (certain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = false
	-----
	Plane #0 <=> Z 0, C 0, T 0


Reading global metadata

Reading metadata
Completed with failures

Happy to get the proposed changes included in Bio-Formats 8.0.0

@sbesson sbesson added this to the 8.0.0 milestone Sep 13, 2024
Copy link
Member

@melissalinkert melissalinkert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely an improvement over how checked exceptions were handled previously, but unchecked exceptions (e.g. NullPointerException, NumberFormatException) are still not being caught and will stop processing. This now means two different sets of behavior depending upon whether a checked or unchecked exception was thrown.

As an example, take test.msr (see #4139), which is currently expected to throw NumberFormatException during initialization, plus artificial files:

$ touch abc.fake
$ touch xyz.fake
$ ls -gG
total 268952
-rwxrwxrwx 1         0 Sep 16 11:58 abc.fake
-rwxrwxrwx 1 275404088 Sep 16 11:51 test.msr
-rwxrwxrwx 1         0 Sep 16 11:58 xyz.fake

Then attempt to process all 3 files:

$ ls | showinf -nopix - || echo "Completed with failures"
====% abc.fake
Checking file format [Simulated data]
Initializing reader
FakeReader initializing abc.fake
Initialization took 0.128s

Reading core metadata
filename = abc.fake
Used files = [/home/melissa/data/gh-4233-test/abc.fake]
Series count = 1
Series #0 :
	Image count = 1
	RGB = false (1) 
	Interleaved = false
	Indexed = false (true color)
	Width = 512
	Height = 512
	SizeZ = 1
	SizeT = 1
	SizeC = 1
	Tile size = 512 x 512
	Thumbnail size = 128 x 128
	Endianness = intel (little)
	Dimension order = XYZCT (certain)
	Pixel type = uint8
	Valid bits per pixel = 8
	Metadata complete = true
	Thumbnail series = false
	-----
	Plane #0 <=> Z 0, C 0, T 0


Reading global metadata

Reading metadata
====% test.msr
Checking file format [Lavision Imspector]
Initializing reader
ImspectorReader initializing test.msr
Exception in thread "main" java.lang.NumberFormatException: For input string: "2.8E-43"
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.base/java.lang.Integer.parseInt(Integer.java:652)
	at java.base/java.lang.Integer.parseInt(Integer.java:770)
	at loci.formats.in.ImspectorReader.initFile(ImspectorReader.java:429)
	at loci.formats.FormatReader.setId(FormatReader.java:1480)
	at loci.formats.ImageReader.setId(ImageReader.java:864)
	at loci.formats.ReaderWrapper.setId(ReaderWrapper.java:692)
	at loci.formats.tools.ImageInfo.testRead(ImageInfo.java:1048)
	at loci.formats.tools.ImageInfo.main(ImageInfo.java:1147)
Completed with failures

So xyz.fake is never processed in this case.

Separately, since the catch blocks are pretty much identical, I'd prefer to see them unified. The error logging could be something like LOGGER.error("Caught " + e.getClass().getSimpleName(), e). Passing the thrown exception to LOGGER.error(...) means that the stack trace is preserved if needed for debugging. I'd also be OK with logging just the message at ERROR, and the stack trace at DEBUG, something like:

LOGGER.error("Caught {}. {}", e.getClass().getSimpleName(), e.getMessage());
LOGGER.debug("Failed to process " + newArgs[idx], e);

@tstoeter
Copy link
Contributor Author

Thank you very much @sbesson and @melissalinkert for your thorough tests and reviews. I'll work on it!

Generalize to catch all exceptions and condense exception handling code and error messages.

Co-authored-by: Melissa Linkert <[email protected]>
Co-authored-by: Sébastien Besson <[email protected]>
@tstoeter
Copy link
Contributor Author

@melissalinkert I have implemented the requested changes and made sure your test case (with the same test.msr file) now passes.

Copy link
Member

@melissalinkert melissalinkert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @tstoeter. This looks good to me now, and the latest changes have been included in a passing build: https://merge-ci.openmicroscopy.org/jenkins/job/BIOFORMATS-test-repo/192/

@sbesson sbesson merged commit 800950a into ome:develop Sep 19, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants