Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve render test compare performance #2223

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kwokcb
Copy link
Contributor

@kwokcb kwokcb commented Feb 11, 2025

Proposed Changes

Use OpenCV to to compare performance. It's about 3x or more faster based on local tests
of OSL vs GLSL RTS. ( ~2.3 vs 7.5 seconds ).

OpenCV is C++ based so is generally faster. Also addition utilities could be added as desired
using this library.

@lgritz
Copy link
Contributor

lgritz commented Feb 11, 2025

Does fellow ASWF project OpenImageIO perform poorly compared to those two?

@kwokcb
Copy link
Contributor Author

kwokcb commented Feb 11, 2025

That's a good question. I did a small experiment and OIIO is also about 3x slower for performing the compare at least. About 7 seconds so better than PIL + utils (slowest). This with the caveat I'm no OIIO or OpenCV expert :).

This can all be updated with different options depending on image size or format (e.g. OIIO can handle EXRs -- which is not the current output format as the RTS avoids dependencies on OpenEXR by default).

import OpenImageIO as oiio
import numpy as np
import os

def computeDiff(image1Path, image2Path, imageDiffPath):
    try:
        # Remove the diff image if it already exists
        if os.path.exists(imageDiffPath):
            os.remove(imageDiffPath)

        # Check if input images exist
        if not os.path.exists(image1Path):
            print("Image diff input missing: " + image1Path)
            return

        if not os.path.exists(image2Path):
            print("Image diff input missing: " + image2Path)
            return

        # Load images using OpenImageIO (no reshaping, directly as NumPy arrays)
        image1 = oiio.ImageInput.open(image1Path)
        image2 = oiio.ImageInput.open(image2Path)

        if not image1 or not image2:
            print("Failed to load images.")
            return

        # Read the image data in a single call (as float)
        pixels1 = image1.read_image(format=oiio.FLOAT)
        pixels2 = image2.read_image(format=oiio.FLOAT)

        image1.close()
        image2.close()

        # Directly access the image specifications
        spec1 = image1.spec()
        spec2 = image2.spec()

        # Ensure images have the same dimensions
        if spec1.width != spec2.width or spec1.height != spec2.height:
            print("Image dimensions do not match.")
            return

        # Slice the arrays to ignore the alpha channel (keep only RGB)
        pixels1_rgb = pixels1[:, :, :3]  # Take the first 3 channels (RGB)
        pixels2_rgb = pixels2[:, :, :3]  # Take the first 3 channels (RGB)

        # Compute the absolute difference (using vectorized NumPy operations)
        diff = np.abs(pixels1_rgb - pixels2_rgb)

        # Save the difference image
        out_spec = oiio.ImageSpec(spec1.width, spec1.height, 3, oiio.FLOAT)
        out_image = oiio.ImageOutput.create(imageDiffPath)
        if out_image:
            out_image.open(imageDiffPath, out_spec)
            out_image.write_image(diff)
            out_image.close()
        else:
            print("Failed to write difference image.")

        # Compute normalized RMS error 
        normalized_rms = np.sqrt(np.mean(diff ** 2)) / (3.0 * 1.0)

        return normalized_rms

    except Exception as e:
        print(f"Error during image comparison: {e}")
        return

@lgritz
Copy link
Contributor

lgritz commented Feb 11, 2025

I'll try a couple experiments on my end to see where the time is going. What's the resolution, format, and number of channels of the images you're comparing?

@jstone-lucasfilm
Copy link
Member

@kwokcb @lgritz From my perspective, as long as OpenImageIO generates the same results as PIL, it's purely a bonus if it's also faster, since we'd be replacing an external dependency with an ASWF dependency!

Maybe other teams would feel differently, but after spending over an hour to render out reference-quality GLSL and OSL images, the additional 7.5 seconds to generate difference images isn't at the top of my mind. :D

@lgritz
Copy link
Contributor

lgritz commented Feb 12, 2025

Are there a couple of these images I can try on my end (without having to build all of MaterialX and run a full testsuite) that I can use as a proxy to investigate performance? I find it hard to believe I can't speed it up further if I have a concrete example.

@kwokcb
Copy link
Contributor Author

kwokcb commented Feb 12, 2025

@jstone-lucasfilm, @lgritz

  • I can replace the OpenCV code with OIIO and it can be reviewed here.
  • As for tests, I can place a sample here of a few pairs of files and the number of pairs tested.

Side notes:

  • I'm guessing the number of comparison pairs is a important factor esp if you run the full RTS vs the default subset.
  • 4 channel RGB for GLSL vs 3 channel RGB for OSL could affect performance (minimally it's inconsistent).

@kwokcb
Copy link
Contributor Author

kwokcb commented Feb 12, 2025

Here are 2 pairs. The subset has 121 pairs to compare.
standard_surface_chess_set.zip

@kwokcb
Copy link
Contributor Author

kwokcb commented Feb 12, 2025

Replace OpenCV with OIIO.

Leaving OpenCV implementation for comparison / posterity

def computeDiff(image1Path, image2Path, imageDiffPath):
    try:
        # Remove the diff image if it already exists
        if os.path.exists(imageDiffPath):
            os.remove(imageDiffPath)

        # Check if input images exist
        if not os.path.exists(image1Path):
            print("Image diff input missing: " + image1Path)
            return

        if not os.path.exists(image2Path):
            print("Image diff input missing: " + image2Path)
            return

        # Load images using OpenCV
        image1 = cv2.imread(image1Path)
        image2 = cv2.imread(image2Path)

        # Ensure images were loaded successfully
        if image1 is None or image2 is None:
            print("Failed to load images.")
            return

        # Convert images to RGB (OpenCV loads images in BGR by default)
        image1 = cv2.cvtColor(image1, cv2.COLOR_BGR2RGB)
        image2 = cv2.cvtColor(image2, cv2.COLOR_BGR2RGB)

        # Compute the absolute difference between the two images
        diff = cv2.absdiff(image1, image2)

        # Save the difference image
        cv2.imwrite(imageDiffPath, cv2.cvtColor(diff, cv2.COLOR_RGB2BGR))

        # Compute normalized RMS error
        normalized_rms = np.sqrt(np.mean(diff**2))  / (3.0 * 255.0)

        return normalized_rms

    except Exception as e:
        # Clean up and print error message
        if os.path.exists(imageDiffPath):
            os.remove(imageDiffPath)
        print("Failed to create image diff between: " + image1Path + ", " + image2Path)
        print(f"Error: {e}")

@fpliu
Copy link

fpliu commented Feb 12, 2025

@kwokcb have you seen Flip? https://github.com/NVlabs/flip We use this for our image diffs using YardVFX RenderBenchmarks and Rez.

@lgritz
Copy link
Contributor

lgritz commented Feb 12, 2025

So you're reading in two PNG images? What are you writing? Also PNG?

@lgritz
Copy link
Contributor

lgritz commented Feb 13, 2025

Please don't merge this yet, I will have some important review comments...

@lgritz
Copy link
Contributor

lgritz commented Feb 13, 2025

OK, one part of why OIIO is around 3.5x slower is that you're inadvertently asking it to do a lot of extra copies and data conversions. The input files are 8 bit channels, they're staying that way in your OpenCV code (don't know about the original PIL, but I suspect so), but for OIIO, you were saying:

    pixels1 = image1.read_image(format=oiio.FLOAT)
    pixels2 = image2.read_image(format=oiio.FLOAT)

you could instead just let it stay in the native format:

    pixels1 = image1.read_image(format=oiio.UNKNOWN)
    pixels2 = image2.read_image(format=oiio.UNKNOWN)

and that saves some time. (Note, because now the values you pull into NumPy will be uint8, you need to change the normalization factor back to 3*255.0 like you did with OpenCV, not 3*1.0, if you catch my drift.)

If you comment out the image write entirely, just to see how the reads compare, after the above changes, OIIO reading from PNG is within 1.2x the speed of OpenCV. So most of the remaining difference is specifically the writing of PNG, which honestly is not something we've spent much time optimizing in OIIO land, as it is distinctly not a "VFX format" (don't get me started).

The first thing I spotted is that when you were writing, you said

    out_spec = oiio.ImageSpec(spec1.width, spec1.height, 3, oiio.FLOAT)

which is requesting that you write a float file, but PNG doesn't have float, so it will pick the next best thing, which is uint16, but again that is not apples-to-apples with the OpenCV or PIL code which is writing 8 bit PNG files. So instead I would just advise outputting the diff image in the same data type that you read from the original input:

    out_spec = oiio.ImageSpec(spec1.width, spec1.height, 3, spec1.format)

At this point, with the writing enabled again, OIIO takes 2.5x the time of OpenCV (versus ~3.5x when we started), and we know that most of that difference is the PNG write. Like I said, PNG write isn't the most optimized path we had, so just for fun, I changed the name of the output file to end in ".tif" instead of ".png". Can we write TIFF files (same 8 bit data) faster than PNG?

Boy howdy, can we!

Outputting .tif instead makes the OIIO version run in 0.94x the time that it takes OpenCV.

Is that just an inherent difference between TIFF and PNG, and now I'm hobbling OpenCV by having it write PNG while OIIO writes TIFF? Nope, change the OpenCV version of the script to output a TIFF file and OIIO's overall edge improves even more to 0.92x the time of OpenCV for all the same work.

YMMV on the exact timing. I'm using OIIO main (which doesn't currently differ in any important way from the supported 3.0 release) and running on a 2020 era 8 core Intel-based MacBookPro. I was reading your "Queen" sample files, and just doing your comparison subroutine 100 times in a loop.

@lgritz
Copy link
Contributor

lgritz commented Feb 13, 2025

But wait, there's more.

This may be more complicated than it's worth, but for these particular files, the GLSL ones anyway are 4-channel PNG files. But alpha = 1.0 everywhere. I happen to know that since PNG files are by specification unassociated alpha, and OIIO is automatically converting them to associated alpha on the way in, for the particular case of alpha=1.0 everywhere, that's a no-op, but OIIO's PNG reader has no way to know this ahead of time. I betcha that OpenCV doesn't know anything about associated vs unassociated alpha and isn't doing this extra conversion that OIIO is doing (needlessly, in this case).

So... there's a special trick to tell OIIO's PNG reader not to convert to associated alpha:

    configspec = oiio.ImageSpec()
    configspec.attribute("oiio:UnassociatedAlpha", 1)

    # Load images using OpenImageIO (no reshaping, directly as NumPy arrays)
    image1 = oiio.ImageInput.open(image1Path, config=configspec)
    image2 = oiio.ImageInput.open(image2Path, config=configspec)

And with this change, now OIIO does the whole read-diff-write cycle (to a TIFF output) in 0.86x the time it takes OpenCV to do the same.

So this full set of changes moves us from an original 3.5x slower to 15% faster. (Or, if we really insist on outputting PNG instead of TIFF, OIIO is still 2.5x slower until we actually crack open OIIO's PNG writing code and see what is wasteful there.)

@lgritz
Copy link
Contributor

lgritz commented Feb 13, 2025

One more! Back to PNG, but just checking if maybe we're not using the best compression choice. (I have no idea what OpenCV is using.) Instead of using the default compression, we use "pngfast", which is just a different speed setting on the underlying zlib calls,

    out_spec = oiio.ImageSpec(spec1.width, spec1.height, 3, spec1.format)
    out_spec.attribute("compression", "pngfast")

that closes the gap to 1.35x even when outputting PNG files. This is just further fooling around, I don't know if switching to this faster compression has any important change in the file sizes or anything else that will matter, and the degree to which it matters might depend on the content of the files. But hey, another set of tradeoffs to consider.

@jstone-lucasfilm
Copy link
Member

Thanks for the deep analysis, @lgritz!

For this use case, I'm guessing PNG was selected because the source and difference images are being organized into an HTML page for sharing/publication, and PNG is a lossless image format that browsers natively understand. We're not married to that choice, though, and any other lossless image format that browsers can handle would work in its place.

I'll pause on integrating this change until @kwokcb has had a chance to take a look at the optimizations that you've suggested, and my hope is that OpenImageIO can be the path forward for this workflow.

@lgritz
Copy link
Contributor

lgritz commented Feb 13, 2025

Yes, for web display and wide browser support, PNG makes total sense. It's a little weird as an image comparison format, simply because it's LDR, so the clipping of any values > 1 could mask differences between the rendered images if you have hot highlights or something like that.

Anyway, to summarize what I've found, you can get from about 3.5x longer than OpenCV to only 1.35x longer with the combination of:

  • Read as 8 bit, without needless conversion to float.
  • Write as 8 bit, without needlessly saving it as uint16 (via a request to save as float, which PNG can't even do).
  • Hint input to skip OIIO's usual automatic conversion to associated alpha -- but only if you think that's safe (circumstantial evidence: you weren't converting to associated with OpenCV or PIL, and these images are A=1.0 everywhere anyway, but I don't know if that's true of all your images).
  • Hint output to use the "faster" PNG compression -- but only if, after checking, you verify that it's not having a hugely negative effect on file size in practice.

If you're moving in this direction and 1.35x isn't close enough, I can also look at OIIO's PNG writer and see if there's some fat to squeeze out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants