Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify testrender to work with triangle meshes #1865

Merged

Conversation

fpsunflower
Copy link
Contributor

Description

testrender was originally envisioned as a tiny example renderer that should only handle spheres/planes. Over time several groups have expressed the wish for it to handle arbitrary geometry instead. This PR replaces the sphere and quad primitives with triangle meshes.

I am making use of the (embedded) rapidobj library to load models in the .obj format. The original .xml scene format remains, so that you can combine several models together as well as declare and assign the osl shader networks you want to these meshes. For backwards compatibility, spheres and planes are still supported via tessellation (you specify how many subdivisions you want). In the case of .obj scenes, the shader assignment can be done either by the mesh name or the material name.

I am submitting this to get the review kickstarted. The history of commits includes some fairly large test scenes that should be squashed away to go into the main repo. Files over 50Mb are recommended to use git lfs, but we probably want to figure that out as a different task (possibly handle it via a separate repo).

The handling of derivatives is not totally correct yet, but the behavior is compatible with the previous (incorrect) handling of derivatives we had before.

Tests

Existing tests are passing (I will push updated reference images shortly).

Checklist:

  • I have read the contribution guidelines.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
  • My code follows the prevailing code style of this project. If I haven't
    already run clang-format v17 before submitting, I definitely will look at
    the CI test that runs clang-format and fix anything that it highlights as
    being nonconforming.

…inimal set of functions to have lit renders

Signed-off-by: Chris Kulla <[email protected]>
…of all triangles that share the same shader

Signed-off-by: Chris Kulla <[email protected]>
…de version which is not implemented in testrender

Signed-off-by: Chris Kulla <[email protected]>
…bals to visualize several fields of the shader globals, add basic dPdu and dPdv math

Signed-off-by: Chris Kulla <[email protected]>
…ntation for sphere primitives in previous version

Signed-off-by: Chris Kulla <[email protected]>
…ay bounces growing to infinity

Signed-off-by: Chris Kulla <[email protected]>
…p assist in manual scene setup

Signed-off-by: Chris Kulla <[email protected]>
Signed-off-by: Chris Kulla <[email protected]>
@steenax86
Copy link
Contributor

Line 62 in src/testrender/CMakeLists.txt has fp-model=precise already, and the precise model has fhonor-nans on by default. I don't see fp-model=precise for the non-Intel clang compiler, which maybe why the assert is not firing for clang. There is a nan calculation happening somewhere higher up--binID is very odd, and depends on the nan-valued center via Box3.

Could you add fp-model=precise for a clang compilation in the CMakeList.txt and check for nans.

(Probably not needed, but for purposes of debug only, you could add fno-honor-nans and fno-honor-infinities This should disallow optimizations that assume results/operands are not nans. )

@lgritz
Copy link
Collaborator

lgritz commented Sep 20, 2024

Crossing my fingers that this will fix the icx issue.

Chris, can you also look at the GPU test failure? (It's a build failure, not a run failure.)

@lgritz
Copy link
Collaborator

lgritz commented Sep 20, 2024

Oh, N/M, the GPU one is where you said that Tim's fix immediately after will address it, right?

@tgrant-nv
Copy link
Contributor

Yes, I'll open a PR for the OptiX fixes after this PR is merged.

@lgritz
Copy link
Collaborator

lgritz commented Sep 20, 2024

looks like it's still hitting assertions

@lgritz
Copy link
Collaborator

lgritz commented Sep 20, 2024

I am very tempted to merge this as-is, knowing that there will be icx test failures (or maybe you'd prefer to disable the render tests just for icx, for now?) and assume that we'll get to the bottom of it and soon have what will probably be a very small follow-on PR to directly address the issue. I think failing tests for one compiler is a lesser evil that entering Dev Days with this unmerged, since I think some people want to do work on testrender and it seems pointless for them to start with the obsolete code. What do you think, @fpsunflower ?

@fpsunflower
Copy link
Contributor Author

I agree. I think its worth merging now and we can investigate further as a follow up.

Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lgritz
Copy link
Collaborator

lgritz commented Sep 20, 2024

OK, merging!

As a reminder, there are two outstanding issues here:

  • The icx test, alone, is failing by hitting assertions in the BVH code and we don't understand why yet. Expect some debugging and a follow-up PR to address the issue. Bug in the new bvh code? Need some kind of clamp to deal with compiler-induced floating point slop? Legit icx compiler bug? Will be exciting to find out!
  • The GPU test is failing to build. We understand that Tim Grant's immediate follow-on PR that adds OptiX functionality should take care of that.

@lgritz lgritz merged commit 78e5392 into AcademySoftwareFoundation:main Sep 20, 2024
20 of 22 checks passed
@lgritz
Copy link
Collaborator

lgritz commented Sep 20, 2024

@timgrant You're up next!

@tgrant-nv tgrant-nv mentioned this pull request Sep 21, 2024
4 tasks
for (int y = 0; y < H; y++) {
float t = float(y + 0.5f) / float(H);
float z = cosf(t * float(M_PI));
float q = sqrtf(1 - z * z);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float q = sqrtf(1 - z * z);
So if z > 1.0 by even 1ulp, then sqrtf could result in a NAN.
The result of cosf could allow a certain # of bits accuracy which means it could have results slightly greater than 1.0f.
Please try clamping the input to 0.f
float q = sqrtf(std::max(0.f, 1 - z * z));

and see if you NAN issues go away.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes in C++ I really miss OSL's "debug_nan" feature that will insert extra code around all math to 100% guarantee identifying the line in which a NaN first creeps in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just did source code review to find that, C library does not guarantee perfect accuracy, so some clamping is needed if feeding values into a NaN producing operation.

Would someone be willing to try that and re-enable icpx CI?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I will try right now!

This is a lesson we know well, that's why OIIO::safe_sqrt() exists!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

running now...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is a smart change regardless, so I will submit the PR.

But I'm afraid it doesn't solve the problem. A number of tests still either time out (800 seconds!) or hit assertions. Only for icx.

@steenax86
Copy link
Contributor

steenax86 commented Oct 3, 2024

Diagnosed the issue to be undefined behavior in the [] operator in float center = bbox.center()[bestAxis] in bvh.cpp's build_bvh function. IMath library's[] operator on a Vec3<float> dereferences only the first component, here float x, and every access beyond that (to y, or z) is out of bounds. Refining the fix and should be able to submit a PR shortly.

@fpsunflower
Copy link
Contributor Author

Thanks for tracking that down! It's too bad that icx miscompiles this particular idiom. Also, I believe the same operator is used in the ray/triangle test, so perhaps there are more bugs lurking once this first one is fixed?

@fpsunflower
Copy link
Contributor Author

I'm also wondering what the ideal way of expressing this idiom would be for icx? Is there a solution we could recommend for the Imath folks that would make all compilers happy?

@lgritz
Copy link
Collaborator

lgritz commented Oct 3, 2024

Wow, great detective work!

That idiom is unfortunately used in a number of places in OpenEXR. I feel like I've run into that problem before, back when we were realizing that it had a hard time vectorizing those classes. I thought I remembered fixing it on the openexr side, but maybe not?

What's the best way to fix that will not be UB? We should fix it in OpenEXR for sure.

@lgritz
Copy link
Collaborator

lgritz commented Oct 3, 2024

I mean Imath, not OpenEXR.

@lgritz
Copy link
Collaborator

lgritz commented Oct 3, 2024

Unfortunately, we also need a workaround on the OSL side, since we can't count on an Imath version being newer than whenever we can get it fixed there.

@steenax86
Copy link
Contributor

It was an interesting experience! XD. I examined values of all locals in build_bvh for icc 2022.0.0 and icx 2023.1.0. Turns out the value of center (for binID calculation) never changed and caused incorrect values to be propagated. Bringing bbox.center() on the stack and then indexing into bestAxis, solved the issue but then identified the core problem.

In it's current form, Imath's [] operator is breaking icx's strict aliasing rules.
The fix is a workaround in OSL. Replaced [] operator with a local getter to return vec3.x, vec3.y, and vec3.z, via depending on bestAxis value.

I will do a pass on all testrender files (including ray/triangle tests) to replace all occurrences of Imath::Box3f [] with the new getter.

@fpsunflower
Copy link
Contributor Author

Replacing that memory access with a switch statement (I'm guessing?) seems like it will make things slower. I'll be curious to see the performance impact of the change.

Is there any flag we could pass to icx for it to match the aliasing rules of the other compilers?

@lgritz
Copy link
Collaborator

lgritz commented Oct 3, 2024

Could you use a trick like

union shim {
    Box3D b;
    float f[6];
};

Box3f thebox;
float component = ((const shim*)&thebox)->f[i];

Or is that also the same kind of UB?

@AlexMWells
Copy link
Contributor

@fpsunflower , Imath breaks C/C++ language aliasing rules, so its not an icx specific issue.
https://github.com/AcademySoftwareFoundation/Imath/blob/main/src/Imath/ImathVec.h#L1593

template <class T>
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 inline T&
Vec3<T>::operator[] (int i) IMATH_NOEXCEPT
{
    return (&x)[i]; // NOSONAR - suppress SonarCloud bug report.
}

When an address of an object is taken, it is illegal/undefined behavior to use that pointer to access memory outside the sizeof the object. In this case the object is float x, yes it happens to be a data member, but the object type that lifetime is marked for is 4 bytes of a float. In practice what this means is a temporary Vec3 with x, y, z is considered as only having x be used and y and z might be removed. Later when &x[i] happens with i > 0, the data being accessed won't be there, undefined behavior.

A simple correctness fix for this would be to take the address of the Vec3 object itself to mark all 12 bytes as potentially used, then cast to float to index.

template <class T>
IMATH_HOSTDEVICE IMATH_CONSTEXPR14 inline T&
Vec3<T>::operator[] (int i) IMATH_NOEXCEPT
{
    return reinterpret_cast<float *>(this)[i]
}

There are many good reasons to not use dynamic indices, but that is a longer conversation

@lgritz
Copy link
Collaborator

lgritz commented Oct 3, 2024

We are literally talking about this and looking at this ticket at this moment in the OpenEXR/Imath TSC meeting.

@fpsunflower
Copy link
Contributor Author

The reinterpret_cast version looks good to me if it workss, though I believe its technically also UB, so is using a union. Its a shame there isn't a good way to do this in C++.

As far as this particular issue goes, we could switch from using Imath::Vec3 to using a plain old float[3] maybe? We would loose the overloaded operators but it might not be that bad for the handful of use cases that need indexing.

@cary-ilm
Copy link
Member

Belatedly catching up on this. I reposted the issue at AcademySoftwareFoundation/Imath#446 for wider discussion, comments there are welcome!

@fpsunflower fpsunflower deleted the testrender-triangles branch October 28, 2024 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants