# refactor: Parallelize critical engine functions #10217
base: master
## Conversation
I have been noticing a significant impact on speed when in battles with the Pug, because of all their Seekers. If this helps, great!

That might help with the code style?

Yeah, it should. I'm fighting with the containers' provided libraries for now.

Co-authored-by: Loymdayddaud <[email protected]>

Waiting on #10218. I could add library support here, but it would be messy, as tbb would have to be pinned to a lower version, or a newer GCC would have to be imported from a third-party repo on Ubuntu 20. The compiled executable should still work on Ubuntu 20. (tbb got a refactor a while ago, but Ubuntu is shipping the GCC built before the refactor, so it doesn't work too well.)

Conflicts:
- .github/workflows/cd.yaml
- .github/workflows/cd_release.yaml
## Refactor
This PR refactors various engine functions for better performance.
### Summary
In C++17, execution policies were introduced for many standard functions like `std::sort`, `std::for_each`, etc. This provides native multithreaded (or even vectorized) execution support on every platform with minimal front-end complexity. On systems that don't fully support these features, fallback algorithms were implemented using `TaskQueue`. This is significantly slower than the standard version, but still provides a good speedup over the single-threaded version.
Using these policies, `AI::Step`, `Engine::MoveShip`, `Engine::DoCollisions`, and some other functions were optimized, resulting in an overall 3-4x speedup in the release build (tested on my PC, without native optimizations) for larger fleets.

### Details
Collision handling had to be refactored to not make use of the global result lists. I'm not entirely sure why they existed in the first place, to be honest, since the code didn't support querying the last collision list anyway.
`AI::firingCommands` has been replaced with a local variable. (Btw, I really hope that the current lockless `AI::Step` is actually reliable on all platforms; I'm not entirely sure about that. I think we are good.)

`Engine::visuals` and `Engine::newVisuals` were migrated to use `std::list` instead of `std::vector` to avoid unnecessary copies when parallelized. This changes the signature of a lot of other functions dealing with these variables.

Some other `Engine` variables needed temporary locking capabilities for concurrent modification; a helper class was created to handle these.

The `tbb` library (Intel Threading Building Blocks) was pulled in, because GCC uses it to implement the new execution policies.

### Detailed details
There are several challenges that come up in multi-threaded systems that we now have to deal with.

#### Parallel functions
Unfortunately, the support for parallel STL functions is not universal (looking at you, Apple). Also, currently, GCC's required `tbb` cannot be used reliably in vcpkg (though that might be fixed at some point). This means that some systems can't compile the fancy C++17 functions we are using.

For these cases, I added another backend in `Parallel.h`. If the environment supports the parallel STL, it pulls in those functions; otherwise it uses the ES `TaskQueue` to parallelize tasks. While our implementation is slower, it still provides better performance than sequential operations. (This backend can also be enabled via the `ES_PARALLEL_USE_TASK_QUEUE` CMake option.)

`Parallel.h` is split into two parts: defining the `parallel` namespace, and defining the parallel STL functions. The namespace is handled separately because Apple can't be arsed to implement the standards correctly.
#### `seq`, `par`, and `par_unseq`

These are the three execution policies supported by C++17. (We also get `unseq` in C++20.)

- `seq` is sequential execution, with the same semantics as the normal STL functions. It's really useful for debugging.
- `par` is for parallel execution. It's the most common option used in our case, and should work out of the box with everything that doesn't do concurrent edits or read from mutables.
- `par_unseq` is for parallel execution that can also be vectorized. This is only usable for really small functions in most cases. Do not use it if there are locks in the code. Also, please note that `Dictionary` uses locks.

#### Lists, vectors, and partial guarding
For containers that are iterated over and can be edited concurrently, but are rarely edited, I added `PartiallyGuarded` variants. These variants guarantee safe access to `emplace_back`, but make no such guarantees for any other function. While this is a rather niche use case, it matches exactly how we use certain `Engine` members, for near-zero overhead.

If concurrent edits are more common, lock contention would become a serious issue in the above implementation. In these cases, I created a different container for each thread, and only synchronized the additions at the end of the operation; this way, no thread has to wait for another to finish writing. This is extremely efficient with list splicing, but gets expensive with vectors due to the extra copy introduced at the end, so some fields had to be changed to lists.
There are also collections that are safe for concurrent edits as long as we don't edit the same item twice, such as `std::map`. In these cases, it is often worth locking on the key we are editing instead of on the map, as this can greatly reduce lock contention. Long story short, `Body` got a mutex for every object. (Since mutexes cannot be copied, I had to add a helper class with custom copy functions.)

We also need to edit ships, minables and other bodies concurrently in some cases (when they are taking damage, for example). The same per-body locks are also essential here.
#### Mutable members and collisions
Many classes use mutable members to cache values. One example is `AI::firingCommands`; another one is the results in collision sets. While these are nice for single-threaded performance, they can't scale to concurrent workloads, so they have to be removed. This generally means one of two things: either the caller passes in the container that receives the results, or the function builds and returns a fresh value each call.

The first one requires more front-end changes, but avoids copies if said value is returned by the function. This was used in `CollisionSet` and `AsteroidField`. (It actually reduced the number of copies in the code, since it can now append to an existing vector of collisions.)
#### ResourceProvider

Another new class introduced is `ResourceProvider`. This is a wrapper around various per-thread resources, providing efficient resource acquirement and automatic synchronization with the non-per-thread master resources. (A helper introduced for this is `ResourceGuard`, which guards both a lock and the associated resources. It also provides STL-style accessors to the per-thread resources.)

### Testing Done
I tested that the game doesn't crash during normal gameplay, and that the mechanics seem to work.
Some sub-tick behaviour might change or become non-deterministic, but no major changes should happen.
### Performance Impact

A whole lot faster: roughly a 3-4x overall speedup in release builds for larger fleets, as noted in the summary.