Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Parallelize critical engine functions #10217

Open
wants to merge 137 commits into
base: master
Choose a base branch
from

Conversation

tibetiroka
Copy link
Member

@tibetiroka tibetiroka commented Jun 16, 2024

Refactor

This PR refactors various engine functions for better performance.

Summary

In C++17, execution policies were introduced for many standard functions like std::sort, std::for_each etc. This provides native multithreaded (or even vectorized) execution support on every platform with minimal front-end complexity.

On systems that don't fully support these features, fallback algorithms were implemented using TaskQueue. This is significantly slower than the standard version, but still provides good speedup over the single-threaded version.

Using these policies, AI::Step, Engine::MoveShip, Engine::DoCollisions and some other functions were optimized, resulting in an overall 3-4x speedup using the release build (tested on my pc, without native optimizations) for larger fleets.

Details

Collision handling had to be refactored to not make use of the global result lists. I'm not entirely sure why they existed in the first place, to be honest, since it didn't support querying the last collision list anyway.

AI::firingCommands has been replaced with a local variable. (Btw, I really hope that the current lockless AI::Step is actually reliable on all platforms, I'm not entirely sure about that. I think we are good.)

Engine::visuals and Engine::newVisuals were migrated to use std::list instead of std::vector to avoid unnecessary copies when parallelized. This changes the signature of a lot of other functions dealing with these variables.

Some other Engine variables needed temporary locking capabilities for concurrent modification; a helper class was created to handle these.

The tbb library (intel threading building blocks) was pulled in, because GCC uses it to implement the new execution policies.

Detailed details

There are several challenges that come up in multi-threaded systems that we have to now deal with.

Parallel functions

Unfortunately, the support for parallel stl functions is not universal (looking at you, Apple). Also, currently, GCC's required tbb cannot be used in vcpkg reliably (though that might be fixed at some point). This means that some systems can't compile the fancy C++17 functions we are using.

For these cases, I added another backend in Parallel.h. If the environment supports parallel stl, it pulls in those functions, otherwise it uses the ES TaskQueue to parallelize tasks. While our implementation is slower, it still provides better performance than sequential operations. (This backend can also be enabled via the ES_PARALLEL_USE_TASK_QUEUE CMake option.)

Parallel.h is split into two parts: defining the parallel namespace, and defining the parallel stl functions. The namespace is handled separately because Apple can't be arsed to implement the standards correctly.

seq, par, and par_unseq

These are the three execution policies supported by C++17. (We also get unseq in C++20.)

seq is sequential execution, with the same semantics as the normal stl functions. It's really useful for debugging.

par is for parallel execution. It's the most common option used in our case, and should work with everything out of the box that doesn't do concurrent edits, or reads from mutables.

par_unseq is for parallel execution that can be vectorized. This is only usable for really small functions in most cases. Do not use if there are locks in the code. Also, please note that Dictionary uses locks.

Lists, vectors and partial guarding

For containers that are iterated over, can be edited concurrently, but are rarely edited, I added PartiallyGuarded variants. These variants guarantee safe access to emplace_back, but make no such guarantees to any other function. While this is a rather niche use case, it matches exactly how we use certain Engine members for near-zero overhead.

If concurrent edits are more common, lock contention would become a serious issue in the above implementation. In these cases, I created a different container for each thread, and only synchronized the additions at the end of the operation - this way, no thread would have to wait for another to finish writing. This is extremely efficient with list splicing, but gets expensive with vectors due to the extra copy introduced at the end, so some fields had to be changed to lists.

There are also collections that are safe for concurrent edits as long as we don't edit the same item twice, such as std::map. In these cases, it is often worth locking on the key we are editing on instead of the map, as this can greatly reduce lock contention. Long story short, Body got a mutex for every object. (Since mutexes cannot be copied, I had to add a helper class with custom copy functions.)

We also need to edit ships, minables and other bodies concurrently in some cases (when they are taking damage, for example). The same per-body locks are also essential here.

Mutable members and collisions

Many classes use mutable members to cache values. One example is AI::firingCommands, another one is the results in collision sets. While these are nice for single-threaded performance, they can't scale to concurrent workloads, so they have to be removed.

This generally means one of two things:

  • passing these values to the functions using them, or
  • instantiating them inside the functions.

The first one requires more front-end changes, but avoids copies if said value is returned by the function. This was used in CollisionSet and AsteroidField. (It actually reduced the number of copies in the code, since it can now append to an existing vector of collisions.)

ResourceProvider

Another new class introduced is ResourceProvider. This is a wrapper around various per-thread resources, providing efficient resource acquirement and automatic synchronization with the non per-thread master resources. (A helper introduced for this is ResourceGuard, which guards both a lock and the associated resources. It also provides stl-style accessors to the per-thread resources.)

Testing Done

I tested that the game doesn't crash during normal gameplay, and that the mechanics seem to work.

Some sub-tick behaviour might change or become non-deterministic, but no major changes should happen.

Performance Impact

A whole lot faster.

@tibetiroka tibetiroka added the refactor A request to refactor the engine code label Jun 16, 2024
@TheGiraffe3
Copy link
Contributor

I have been noticing a significant impact on speed when in battles with the Pug, because of all their Seekers. If this helps, great!

source/PartiallyGuarded.h Outdated Show resolved Hide resolved
source/PartiallyGuarded.h Outdated Show resolved Hide resolved
@TheGiraffe3
Copy link
Contributor

That might help with the code style?

@tibetiroka
Copy link
Member Author

Yeah, it should. I'm fighting with the containers' provided libraries for now.

@tibetiroka
Copy link
Member Author

Waiting on #10218. I could add library support here, but it would be messy, as tbb would have to be pinned to a lower version, or a newer gcc would have to be imported from a third-party repo on Ubuntu 20. The compiled executable should still work on Ubuntu 20.

(tbb got a refactor a while ago, but ubuntu is shipping the gcc built before the refactor, so it doesn't work too well.)

@tibetiroka tibetiroka marked this pull request as draft June 16, 2024 17:04
source/Engine.cpp Show resolved Hide resolved
source/AsteroidField.cpp Outdated Show resolved Hide resolved
source/Body.h Outdated Show resolved Hide resolved
# Conflicts:
#	.github/workflows/cd.yaml
#	.github/workflows/cd_release.yaml
source/Engine.cpp Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor A request to refactor the engine code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants