Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bounds_t with pre-aligned mins and maxs #1482

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

illwieckz
Copy link
Member

@illwieckz illwieckz commented Dec 30, 2024

This is not urgent at all. This is complementary to:

It also improves over:

The BoxOnPlaneSide() function is known to be a hot spot, being called by multiple recursive functions. Right now we spend a lot of time in _mm_loadu_ps, we have to call sseLoadVec3Unsafe() explicitly because we can't guess if the data is aligned or not. This comes with multiple downside:

  • _mm_loadu_ps is said to be slower than _mm_load_ps, and that fits my experience¹.
  • The compiler doesn't optimize _mm_loadu_ps and will always call it if we ask for it explicitely, even if the data is already aligned, and by experience, even if no copy is needed.

So the idea is to introduce some nice bounds_t struct that uses aligned mins and maxs, same for cplane_t with an aligned normal. When doing that, we can write an explicit _mm_load_ps that is faster than _mm_loadu_ps, and most of the cases, because the compiler knows the data is const and already aligned, the compiler just removes the _mm_load_ps and process the data without any copy.

See also:


¹ Some times ago I tried to write optimized SSE code for some other functions, but the code was slower because of the explicit _mm_loadu_ps call. I even noticed copying an unaligned vec3 to an aligned vec3 before calling _mm_load_ps could make the code faster when the compiler notices the data is already aligned and skips the copy and calls _mm_load_ps (or even doesn't need to call it at all).

@illwieckz
Copy link
Member Author

For now there is a bug somewhere I have not found yet. The bug can be seen by loading the plat23 map and looking to the left (some surfaces will disappear).

In my first iteration of the branch I made a stupid mistakes by passing the bounds_t by value instead of by reference, meaning functions modifying it would not modify it. I assume this is now fixed but I guess I left somewhere another mistake as stupid as that.

@illwieckz illwieckz marked this pull request as draft December 30, 2024 19:25
@illwieckz
Copy link
Member Author

Note: the percentage of time spent is not reliable in those screenshots because I use a viewpos on some acid tubes spamming acid gas and the amount of particles is very variable.

Before (C), a lot of time is spent in sseLoadVec3Unsafe():

20241230-201855-000 orbit-unvanquished-aligned-sse-load

After (C), the first loads are just completely skipped:

20241230-202108-000 orbit-unvanquished-aligned-sse-load

Before (Asm), we see three movups instructions:

20241230-201857-000 orbit-unvanquished-aligned-sse-load

After (Asm), se see only two movaps instructions:

20241230-202121-000 orbit-unvanquished-aligned-sse-load

@illwieckz
Copy link
Member Author

So, as I said, this is not urgent. But I would appreciate people re-reading what I did in hope someone finds the stupid mistake I may have done. The code change should be straightforward because it's basically rewriting with a different data layout, there is no algorithm change to be expected. Unfortunately this data struct is used in many places so the patch is massive.

There are many functions in CM not using the new structure yet that can be migrated, but that can be done later as the patch is already heavy as it is. Migrating other functions would be mostly code clean-up and only about code purity, it is not required to make it work. Though maybe it can also help the compiler to optimize in other places. For the same reasons, I only modified the game to be compatible with the new structs and shared functions, using some wrapping around the new struct, to keep the patch light on game side (porting the game to the new struct for the sake of purity can be done later as well).

@illwieckz
Copy link
Member Author

I also added some constness to some function input, which may help the compiler to optimize a bit more some other places of the code. I doubt this extra-constness is the cause of the bug I'm facing because the compiler doesn't complain I'm writing to some const structs, so hopefully I only added some const keywords to structs that are only read and not written.

@illwieckz illwieckz force-pushed the illwieckz/bounds/sync branch from da96dc5 to b9fbbab Compare December 30, 2024 19:48
out->bounds.mins[ 1 ] = LittleLong( in->mins[ 1 ] );
out->bounds.mins[ 2 ] = LittleLong( in->mins[ 2 ] );
out->bounds.maxs[ 1 ] = LittleLong( in->maxs[ 1 ] );
out->bounds.maxs[ 2 ] = LittleLong( in->maxs[ 2 ] );
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I forgot:

out->bounds.maxs[ 0 ] = LittleLong( in->maxs[ 0 ] );

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was the bug I was looking for.

@illwieckz illwieckz force-pushed the illwieckz/bounds/sync branch from b9fbbab to 3176224 Compare December 30, 2024 22:28
@illwieckz
Copy link
Member Author

So I reread everything and found the bug I was looking for. I missed the conversion of one line (I just did not re-added the new one, that's why there was no compile type error).

It works.

@illwieckz illwieckz force-pushed the illwieckz/bounds/sync branch 2 times, most recently from 4bea319 to e99d1aa Compare December 30, 2024 22:44
@illwieckz
Copy link
Member Author

I found another bug affecting BSP movers like doors. One can test it in the Vega map. The doors disappear according to the viewing angle / point of view.

@illwieckz illwieckz force-pushed the illwieckz/bounds/sync branch 4 times, most recently from 92be21c to d7d9d1e Compare December 31, 2024 02:18
@DolceTriade
Copy link
Contributor

Any fps difference?

@illwieckz illwieckz force-pushed the illwieckz/bounds/sync branch from d7d9d1e to 3239c6e Compare December 31, 2024 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants