Optimize Processor Strategy, Caching, and Tuning #2

Open
kaladay opened this issue Sep 12, 2024 · 1 comment

kaladay commented Sep 12, 2024

Performance in Cantaloupe is influenced by many factors, including the processor strategy and caching setup. In Cantaloupe, processors read images from sources, decode them, transform them according to request arguments, and encode and write derivative images back to the client. Processors can be selected in different ways, including on a request-by-request basis. Processors rely on different underlying processing engines, which may directly affect quality and performance.

While we're looking at performance, I think it might be a good time to review processor strategy and caching setup. I don't think it's worth the time to compare processors and formats, but it would be good to know whether our setup could be causing issues.

Because we have a tendency to use JPGs in more recent collections:

  • Are we using progressive JPGs? (I think we should be for better performance.)
  • What is our JPG quality for derivatives? (I think it should be around 80/100)
  • Are we using TurboJpegProcessor or something else for reading and writing JPEGs?

We also serve TIFs, JPEG2000s, and PDFs for some collections. Looking at similar settings for these is also probably a good idea. Specifically, are we using LZW on TIFFs or no compression at all?

Looking at the stream and retrieval strategy is also worthwhile. The optimal setup here depends on what we're doing, but DownloadStrategy for either option would be less than ideal. At my previous institution, we primarily relied on CacheStrategy and FilesystemCache, but I think the more modern approach is StreamStrategy with CacheStrategy as a fallback. More info is here.

Beyond processor strategy, we should look at deployment & tuning and caching.

Acceptance Criteria

  • Review the Cantaloupe documentation and make decisions regarding processor strategy, caching, and tuning.
@kaladay kaladay self-assigned this Sep 16, 2024

kaladay commented Sep 17, 2024

To answer the questions asked.

Are we using progressive JPGs?
Yes, the current setting is:

# Progressive JPEGs are usually more compact.
processor.jpg.progressive = true

What is our JPG quality for derivatives? (I think it should be around 80/100)
The current setting is:

# JPEG output quality (1-100).
processor.jpg.quality = 80
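
A check like this could be scripted so it is easy to repeat after config changes. The following is only a minimal sketch: the helper names `parse_properties` and `check_jpeg_settings` are made up for illustration, the parser handles only simple `key = value` lines, and the sample text is copied from the two settings quoted above rather than read from the real file.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Settings copied from the snippets quoted in this comment.
SAMPLE = """
# Progressive JPEGs are usually more compact.
processor.jpg.progressive = true
# JPEG output quality (1-100).
processor.jpg.quality = 80
"""


def check_jpeg_settings(props, expected_quality=80):
    """Return a list of mismatches; an empty list means the settings match."""
    problems = []
    if props.get("processor.jpg.progressive", "").lower() != "true":
        problems.append("processor.jpg.progressive is not true")
    quality = props.get("processor.jpg.quality", "")
    if not quality.isdigit() or int(quality) != expected_quality:
        problems.append(
            f"processor.jpg.quality is {quality!r}, expected {expected_quality}"
        )
    return problems


if __name__ == "__main__":
    print(check_jpeg_settings(parse_properties(SAMPLE)) or "JPEG settings match")
```

The expected quality of 80 comes from the suggestion in the original question; it is a parameter so a different target can be passed in.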

Are we using TurboJpegProcessor or something else for reading and writing JPEGs?
(Edit) We may or may not be using TurboJpegProcessor, because AutomaticSelectionStrategy chooses the processor for us.
Manual selection is not being used, so the ManualSelectionStrategy entries below are not in effect.

processor.selection_strategy = AutomaticSelectionStrategy
...
processor.ManualSelectionStrategy.avi = FfmpegProcessor
processor.ManualSelectionStrategy.bmp =
processor.ManualSelectionStrategy.flv = FfmpegProcessor
processor.ManualSelectionStrategy.gif =
processor.ManualSelectionStrategy.jp2 = KakaduNativeProcessor
processor.ManualSelectionStrategy.jpg =
processor.ManualSelectionStrategy.mov = FfmpegProcessor
processor.ManualSelectionStrategy.mp4 = FfmpegProcessor
processor.ManualSelectionStrategy.mpg = FfmpegProcessor
processor.ManualSelectionStrategy.pdf = PdfBoxProcessor
processor.ManualSelectionStrategy.png =
processor.ManualSelectionStrategy.tif =
processor.ManualSelectionStrategy.webm = FfmpegProcessor
processor.ManualSelectionStrategy.xpm =

# Fall back to this processor for any formats not assigned above.
processor.ManualSelectionStrategy.fallback = Java2dProcessor

The jp2 entry (which I believe is JPEG2000) is set to KakaduNativeProcessor.
The jpg entry is empty, which suggests that it would fall back to Java2dProcessor if manual selection were enabled.
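
To make that reading of the config concrete, here is a rough sketch of the lookup as I understand it from the comments in the properties file. It is not Cantaloupe's actual implementation: `resolve_manual_processor` is a hypothetical helper, and the rule that an empty entry uses the fallback is an assumption based on the "Fall back to this processor" comment.

```python
def resolve_manual_processor(props, fmt):
    """Return the processor the ManualSelectionStrategy keys name for `fmt`.

    Returns None when ManualSelectionStrategy is not the active strategy,
    since in that case these keys are not consulted (assumption).
    """
    if props.get("processor.selection_strategy") != "ManualSelectionStrategy":
        return None
    assigned = props.get(f"processor.ManualSelectionStrategy.{fmt}", "")
    # An empty or missing entry is assumed to use the fallback processor.
    return assigned or props.get("processor.ManualSelectionStrategy.fallback") or None


# Keys copied from the configuration quoted above (video formats omitted).
CURRENT = {
    "processor.selection_strategy": "AutomaticSelectionStrategy",
    "processor.ManualSelectionStrategy.jp2": "KakaduNativeProcessor",
    "processor.ManualSelectionStrategy.jpg": "",
    "processor.ManualSelectionStrategy.pdf": "PdfBoxProcessor",
    "processor.ManualSelectionStrategy.tif": "",
    "processor.ManualSelectionStrategy.fallback": "Java2dProcessor",
}

if __name__ == "__main__":
    manual = {**CURRENT, "processor.selection_strategy": "ManualSelectionStrategy"}
    for fmt in ("jp2", "jpg", "tif"):
        print(fmt, resolve_manual_processor(CURRENT, fmt),
              resolve_manual_processor(manual, fmt))
```

With the current settings every lookup returns None, because the active strategy is AutomaticSelectionStrategy. Which processor that strategy actually picks for JPEGs is not something this sketch can answer.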

The documentation shows that several compile-time or system-design-time setup steps are necessary to actually use things like JPEG2000.
The documentation also recommends against JPEG2000, suggesting it only as a last resort.

OpenJPEG is considered a better alternative.

I am seeing the following in the logs:

415 Unsupported Media Type
Unsupported output format: JPEG2000

edu.illinois.library.cantaloupe.processor.OutputFormatException: Unsupported output format: JPEG2000
	at edu.illinois.library.cantaloupe.processor.Processor.validate(Processor.java:204)
	at edu.illinois.library.cantaloupe.resource.ImageRequestHandler.handle(ImageRequestHandler.java:399)
	at edu.illinois.library.cantaloupe.resource.iiif.v2.ImageResource.doGET(ImageResource.java:128)
	at edu.illinois.library.cantaloupe.resource.HandlerServlet.handle(HandlerServlet.java:97)
	at edu.illinois.library.cantaloupe.resource.HandlerServlet.doGet(HandlerServlet.java:35)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:791)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:556)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:773)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:905)
	at java.base/java.lang.Thread.run(Thread.java:840)
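
The trace passes through `iiif.v2.ImageResource`, so the failing request presumably asked for a JPEG2000 derivative through the format extension at the end of a IIIF Image API 2 URL (`.../{region}/{size}/{rotation}/{quality}.{format}`). A small sketch for spotting such requests in an access log follows. The helper name and the example URLs are hypothetical; nothing here is taken from our actual logs.

```python
import posixpath
from urllib.parse import urlparse


def requested_format(url):
    """Return the output format extension of a IIIF image request URL.

    Returns None for information requests (info.json) and for URLs whose
    last path segment has no extension.
    """
    last = posixpath.basename(urlparse(url).path)
    if last == "info.json":
        return None
    _quality, dot, fmt = last.rpartition(".")
    return fmt.lower() if dot else None


# Hypothetical example URLs, for illustration only.
EXAMPLES = [
    "https://example.org/iiif/2/sample-id/full/full/0/default.jpg",
    "https://example.org/iiif/2/sample-id/full/full/0/default.jp2",
    "https://example.org/iiif/2/sample-id/info.json",
]

if __name__ == "__main__":
    for url in EXAMPLES:
        if requested_format(url) == "jp2":
            print("JPEG2000 output requested:", url)
```

Knowing which clients request `.jp2` output would help decide whether the 415 responses matter or are just stray requests.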

Specifically, are we using LZW on TIFFs or no compression at all?
Yes, we are using LZW:

# TIFF output compression type. Available values are `Deflate`, `JPEG`,
# `LZW`, and `RLE`. Leave blank for no compression.
processor.tif.compression = LZW

...but I think more modern strategies are StreamStrategy with CacheStrategy as a fallback.
We appear to be doing the following on PROD:

- env:
  - name: PROCESSOR_FALLBACK_RETRIEVAL_STRATEGY
    value: StreamStrategy
  - name: PROCESSOR_STREAM_RETRIEVAL_STRATEGY
    value: CacheStrategy
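
To compare these environment variables with the properties file, it helps to map property keys to variable names. The sketch below assumes the deployment derives each variable name by uppercasing the key and replacing non-alphanumeric characters with underscores; that convention should be confirmed against the documentation for the version we run. `env_name_for` is a hypothetical helper.

```python
import re


def env_name_for(key):
    """Map a properties key to its assumed environment variable name."""
    return re.sub(r"[^A-Za-z0-9]", "_", key).upper()


# Values copied from the PROD env block quoted above.
PROD_ENV = {
    "PROCESSOR_FALLBACK_RETRIEVAL_STRATEGY": "StreamStrategy",
    "PROCESSOR_STREAM_RETRIEVAL_STRATEGY": "CacheStrategy",
}

if __name__ == "__main__":
    for key in ("processor.stream_retrieval_strategy",
                "processor.fallback_retrieval_strategy"):
        print(key, "=", PROD_ENV.get(env_name_for(key)))
```

Read this way, PROD sets the stream retrieval strategy to CacheStrategy and the fallback to StreamStrategy. That is the reverse of the "StreamStrategy with CacheStrategy as a fallback" arrangement mentioned in the original question, so it may be worth confirming that this is intentional.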
