Add Highway SIMD acceleration to ImageBufAlgo [add, sub, mul, div, mad, resample] #4994

ssh4net · 2026-01-07T09:27:33Z

Optional SIMD optimizations for selected ImageBufAlgo operations using the Google Highway library:
• add/sub
• mul/div
• mad
• resample

Adds CMake and build system support, new implementation helpers, and developer documentation.

Checklist:

I have read the guidelines on contributions and code review procedures.
I have updated the documentation if my PR adds features or changes
behavior.
I am sure that this PR's changes are tested somewhere in the
testsuite.
I have run and passed the testsuite in CI before submitting the
PR, by pushing the changes to my fork and seeing that the automated CI
passed there. (Exceptions: If most tests pass and you can't figure out why
the remaining ones fail, it's ok to submit the PR and ask for help. Or if
any failures seem entirely unrelated to your change; sometimes things break
on the GitHub runners.)
My code follows the prevailing code style of this project and I
fixed any problems reported by the clang-format CI test.
If I added or modified a public C++ API call, I have also amended the
corresponding Python bindings. If altering ImageBufAlgo functions, I also
exposed the new functionality as oiiotool options.

Optional SIMD optimizations for selected ImageBufAlgo operations using the Google Highway library: • add/sub • mul/div • mad • resample Adds CMake and build system support, new implementation helpers, and developer documentation. Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>

lgritz · 2026-01-07T18:13:14Z

I suspect you used an LLM for some of this? Which is fine, but I think you should document in the PR description (commit comment) which tool you used and for which parts.

CMakeLists.txt

src/cmake/externalpackages.cmake

lgritz · 2026-01-07T18:26:38Z

src/include/OpenImageIO/platform.h

 #endif

 #ifdef _MSC_VER
+#    include <malloc.h>  // for alloca


This doesn't make sense to me. We already use alloca extensively without trouble, why did this need to be added now?

lgritz · 2026-01-07T18:28:00Z

src/include/OpenImageIO/platform.h

+#elif defined(_MSC_VER)
+#    define OIIO_ALLOCA(type, size) (assert(size < (1<<20)), (size) != 0 ? ((type*)_alloca((size) * sizeof(type))) : nullptr)


Why? I think this definition is identical to the #else clause below, so the MS case would already be handled properly, right?

lgritz · 2026-01-07T19:25:15Z

src/libOpenImageIO/imagebufalgo_addsub.cpp

+#if defined(_WIN32)
+#    include <malloc.h>  // for alloca
+#endif
+


Why? It always worked before.

src/libOpenImageIO/imagebufalgo_addsub.cpp

lgritz · 2026-01-07T19:35:10Z

src/libOpenImageIO/imagebufalgo_addsub.cpp

+template<class Rtype, class Atype, class Btype>
+static bool
+add_impl_hwy(ImageBuf& R, const ImageBuf& A, const ImageBuf& B, ROI roi,
+             int nthreads)
+{


I haven't done a line-by-line comparison, but it seems to me that the only difference between add_impl_hwy, sub_impl_hwy, and mul_impl_hwy is likely going to be

[](auto d, auto a, auto b) { return hn::Add(a, b); }

versus that one lambda changing for Sub and Mul.

I would love for even the initial commit to reduce this whole thing to a shared hwy_binary_perpixel_op() template that takes the lambda housing the op kernel as a templated parameter.
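
As a rough illustration of the refactor suggested above, here is a minimal sketch of a shared driver that takes the op kernel as a templated parameter. It is not the PR's code: the names `binary_perpixel_op`, `add_impl`, `sub_impl`, and `mul_impl` are hypothetical, the buffers are plain arrays rather than ImageBuf, and a scalar loop stands in for the Highway vector loop so the example compiles without the library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shared driver (an assumption, not the PR's implementation):
// the loop is written once and the per-element kernel arrives as a
// templated callable. A scalar loop stands in for the Highway vector loop.
template<class Rtype, class Atype, class Btype, class Op>
bool
binary_perpixel_op(Rtype* r, const Atype* a, const Btype* b, std::size_t n,
                   Op op)
{
    assert(n == 0 || (r && a && b));
    for (std::size_t i = 0; i < n; ++i)
        r[i] = static_cast<Rtype>(op(a[i], b[i]));
    return true;
}

// Each operation shrinks to a thin wrapper that only supplies its kernel.
template<class Rtype, class Atype, class Btype>
bool
add_impl(Rtype* r, const Atype* a, const Btype* b, std::size_t n)
{
    return binary_perpixel_op(r, a, b, n,
                              [](auto x, auto y) { return x + y; });
}

template<class Rtype, class Atype, class Btype>
bool
sub_impl(Rtype* r, const Atype* a, const Btype* b, std::size_t n)
{
    return binary_perpixel_op(r, a, b, n,
                              [](auto x, auto y) { return x - y; });
}

template<class Rtype, class Atype, class Btype>
bool
mul_impl(Rtype* r, const Atype* a, const Btype* b, std::size_t n)
{
    return binary_perpixel_op(r, a, b, n,
                              [](auto x, auto y) { return x * y; });
}
```

With this shape, the type dispatch and loop structure are instantiated in one place, and adding another binary op only means adding another wrapper with a different lambda.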

lgritz · 2026-01-07T19:37:52Z

src/libOpenImageIO/imagebufalgo_addsub.cpp

+                // Process pixel by pixel (scalar fallback for strided channels)
+                for (int x = roi.xbegin; x < roi.xend; ++x) {
+                    Rtype* r_ptr = ChannelPtr<Rtype>(Rv, x, y, roi.chbegin);
+                    const Atype* a_ptr = ChannelPtr<Atype>(Av, x, y,
+                                                           roi.chbegin);
+                    const Btype* b_ptr = ChannelPtr<Btype>(Bv, x, y,
+                                                           roi.chbegin);


I think we should benchmark the strided case and see how it compares to the contiguous case and the full scalar fallback that we've always had. If there is no big speed gain, I would be in favor of eliminating this whole clause and letting non-contiguous strides use the old scalar path, so that there is much less template expansion for hwy in the cases where there is not a large gain to be had. Note that this means that the "to hwy or not to hwy" test would need to test contiguity in addition to just localpixels().
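
To make the combined test concrete, here is a minimal sketch of a dispatch check that requires both local pixels and contiguous channels. Everything in it is an assumption for illustration: `BufferLayout`, `channels_contiguous`, and `use_simd_path` are hypothetical names, and the layout struct stands in for whatever ImageBuf actually reports about strides. It also ignores the case where the ROI covers only a subset of channels.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of a pixel buffer's memory layout (sizes in bytes).
// This is a stand-in for illustration, not an OpenImageIO type.
struct BufferLayout {
    bool local_pixels;          // pixels live in directly addressable memory
    std::size_t channel_bytes;  // size of one channel value
    int nchannels;              // channels per pixel
    std::ptrdiff_t xstride;     // bytes between horizontally adjacent pixels
};

// True when adjacent pixels are tightly packed, so a scanline can be
// treated as one flat run of channel values.
inline bool
channels_contiguous(const BufferLayout& b)
{
    const std::size_t packed = b.channel_bytes
                               * static_cast<std::size_t>(b.nchannels);
    return b.xstride == static_cast<std::ptrdiff_t>(packed);
}

// Take the SIMD path only if every buffer is both local and contiguous;
// anything else falls through to the existing scalar implementation.
inline bool
use_simd_path(const BufferLayout& R, const BufferLayout& A,
              const BufferLayout& B)
{
    return R.local_pixels && A.local_pixels && B.local_pixels
           && channels_contiguous(R) && channels_contiguous(A)
           && channels_contiguous(B);
}
```

Gating on this check would let the strided branch be dropped entirely if the benchmark shows no meaningful gain over the scalar fallback.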

ssh4net and others added 5 commits January 13, 2026 10:47

Simplify CMake hwy option

4d3b1f3

Signed-off-by: Vlad (Kuzmin) Erium <libalias@gmail.com>

Revert "Simplify CMake hwy option"

6571c14

This reverts commit 4d3b1f3.

Update CMakeLists.txt

606edc4

Co-authored-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad Erium <shaamaan@gmail.com>

Update src/cmake/externalpackages.cmake

d57aa9d

Co-authored-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad Erium <shaamaan@gmail.com>

Update src/libOpenImageIO/imagebufalgo_addsub.cpp

dc53561

Co-authored-by: Larry Gritz <lg@larrygritz.com> Signed-off-by: Vlad Erium <shaamaan@gmail.com>
