forked from acts-project/traccc
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Improve robustness and performance of CCL
This commit partially addresses acts-project#567. In the past, the CCL kernel was unable to handle extremely large partitions. Although such partitions are very unlikely, our ODD samples contain a few that are large enough to crash the code. This commit equips the CCL code with some scratch memory, which a partition can reserve using a mutex. This gives it enough space to do its work in global memory. Although this is, of course, slower, it should happen very infrequently, and the configuration parameters can be tuned to control how often it happens. This commit also contains a few optimizations that reduce the running time on a μ = 200 event from about 1100 microseconds to about 700 microseconds on an RTX A5000.
1 parent a9f2e8c, commit 87c8735
Showing 28 changed files with 527 additions and 245 deletions.
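The scratch-memory fallback described in the commit message can be sketched on the host in plain C++. This is a minimal sketch under stated assumptions, not the traccc implementation: `scratch_space`, `process_partition` and `storage` are hypothetical names, a `std::atomic<bool>` spin lock stands in for the device-side mutex, a local `std::vector` stands in for shared memory, and writing each cell's index stands in for the actual labelling work.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <vector>

// Hypothetical stand-in for the global-memory scratch buffer. Only one
// oversized partition may use it at a time, enforced by a spin lock.
class scratch_space {
public:
    explicit scratch_space(std::size_t n_cells) : buffer_(n_cells) {}

    // Acquire the lock, spinning until no other partition holds it.
    void lock() {
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }

    // Release the lock; must only be called by the current holder.
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));
        locked_.store(false, std::memory_order_release);
    }

    std::vector<unsigned int>& buffer() { return buffer_; }

private:
    std::atomic<bool> locked_{false};
    std::vector<unsigned int> buffer_;
};

enum class storage { fast, scratch };

// Hypothetical per-partition routine. Writing each cell's own index is a
// placeholder for the real connected-component labelling.
storage process_partition(std::size_t n_cells, std::size_t max_partition_size,
                          scratch_space& scratch) {
    if (n_cells <= max_partition_size) {
        // Common case: a local buffer stands in for fixed-size shared memory.
        std::vector<unsigned int> labels(n_cells);
        for (std::size_t i = 0; i < n_cells; ++i) {
            labels[i] = static_cast<unsigned int>(i);
        }
        return storage::fast;
    }
    // Rare case: serialize oversized partitions on the shared scratch buffer.
    // The guard releases the lock even if the size check below throws.
    std::lock_guard<scratch_space> guard(scratch);
    std::vector<unsigned int>& buf = scratch.buffer();
    if (n_cells > buf.size()) {
        throw std::runtime_error("partition exceeds the scratch space");
    }
    for (std::size_t i = 0; i < n_cells; ++i) {
        buf[i] = static_cast<unsigned int>(i);
    }
    return storage::scratch;
}
```

Because only oversized partitions ever touch the lock, the common path pays nothing for it; the cost of serialization is confined to the rare cases the commit message describes.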
@@ -0,0 +1,78 @@

```cpp
/**
 * traccc library, part of the ACTS project (R&D line)
 *
 * (c) 2024 CERN for the benefit of the ACTS project
 *
 * Mozilla Public License Version 2.0
 */

#pragma once

#include <cstddef>

#include "traccc/definitions/qualifiers.hpp"

namespace traccc {
/**
 * @brief Configuration type for massively parallel clustering algorithms.
 */
struct clustering_config {
    /**
     * @brief The desired number of threads per partition.
     *
     * This directly corresponds to the block size in most algorithms, so do
     * not set it too low (which reduces occupancy by leaving thread slots
     * unused) or too high (which may not be supported by the device).
     */
    unsigned int threads_per_partition;

    /**
     * @brief The maximum number of cells per thread.
     *
     * This sets the maximum thread coarsening factor for the CCA algorithm.
     * Increasing this value increases shared memory usage and may decrease
     * occupancy. If this is too low, the scratch space will be needed more
     * often, which may slow the algorithm down.
     */
    unsigned int max_cells_per_thread;

    /**
     * @brief The desired number of cells per thread.
     *
     * This sets the desired thread coarsening factor for the CCA algorithm.
     * Decreasing this may decrease occupancy. Increasing this increases the
     * probability that the scratch space will need to be used.
     */
    unsigned int target_cells_per_thread;

    /**
     * @brief The upscaling factor for the scratch space.
     *
     * The scratch space will be large enough to support partitions this many
     * times larger than the maximum partition size determined by
     * `threads_per_partition` and `max_cells_per_thread`.
     */
    unsigned int backup_size_multiplier;

    /**
     * @brief The maximum number of cells per partition.
     */
    TRACCC_HOST_DEVICE constexpr std::size_t max_partition_size() const {
        return threads_per_partition * max_cells_per_thread;
    }

    /**
     * @brief The target number of cells per partition.
     */
    TRACCC_HOST_DEVICE constexpr std::size_t target_partition_size() const {
        return threads_per_partition * target_cells_per_thread;
    }

    /**
     * @brief The total size of the scratch space, in number of cells.
     */
    TRACCC_HOST_DEVICE constexpr std::size_t backup_size() const {
        return max_partition_size() * backup_size_multiplier;
    }
};
}  // namespace traccc
```
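To make the relationship between the four fields and the three derived sizes concrete, the sketch below evaluates them at compile time. It is a host-only copy of the struct with the `TRACCC_HOST_DEVICE` qualifiers dropped so that it compiles without traccc headers, and the numbers in `example_config` are hypothetical illustration values, not traccc's defaults.

```cpp
#include <cstddef>

// Host-only copy of clustering_config for illustration; the
// TRACCC_HOST_DEVICE qualifiers are dropped so no traccc headers are needed.
struct clustering_config {
    unsigned int threads_per_partition;
    unsigned int max_cells_per_thread;
    unsigned int target_cells_per_thread;
    unsigned int backup_size_multiplier;

    // Largest partition handled without the scratch space.
    constexpr std::size_t max_partition_size() const {
        return threads_per_partition * max_cells_per_thread;
    }
    // Partition size the partitioning step aims for.
    constexpr std::size_t target_partition_size() const {
        return threads_per_partition * target_cells_per_thread;
    }
    // Total scratch space, in cells.
    constexpr std::size_t backup_size() const {
        return max_partition_size() * backup_size_multiplier;
    }
};

// Hypothetical example values, not taken from traccc.
constexpr clustering_config example_config{256, 16, 8, 256};
```

With these values, partitions aim for 2048 cells, partitions larger than 4096 cells would need the scratch space, and the scratch space holds 1,048,576 cells, enough for a single partition 256 times the maximum size. As in the original, each product is evaluated in `unsigned int` before being widened to `std::size_t`, so very large field values could overflow.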
device/common/include/traccc/clusterization/device/ccl_kernel_definitions.hpp
18 changes: 18 additions & 0 deletions
@@ -0,0 +1,18 @@

```cpp
/**
 * traccc library, part of the ACTS project (R&D line)
 *
 * (c) 2024 CERN for the benefit of the ACTS project
 *
 * Mozilla Public License Version 2.0
 */

#pragma once

#include <cstddef>

namespace traccc::device::details {
/// Indices used in clusterization only range from 0 to the maximum
/// partition size, so an unsigned short is sufficient.
using index_t = unsigned short;

/// The limit on the stack size in terms of cells per thread.
static constexpr std::size_t CELLS_PER_THREAD_STACK_LIMIT = 32;
}  // namespace traccc::device::details
```
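The two definitions above constrain which configurations are usable. A hypothetical compile-time check could look like the sketch below. `config_is_compatible` and the `ccl_sketch` namespace are illustrative names, and the sketch assumes (this diff does not confirm it) that `CELLS_PER_THREAD_STACK_LIMIT` bounds `max_cells_per_thread` and that every index up to and including the maximum partition size must fit in `index_t`.

```cpp
#include <cstddef>
#include <limits>

namespace ccl_sketch {
// Mirrors the two definitions from ccl_kernel_definitions.hpp.
using index_t = unsigned short;
constexpr std::size_t CELLS_PER_THREAD_STACK_LIMIT = 32;

// Hypothetical check: the coarsening factor must respect the stack limit,
// and the maximum partition size must be representable as index_t.
constexpr bool config_is_compatible(std::size_t threads_per_partition,
                                    std::size_t max_cells_per_thread) {
    return max_cells_per_thread <= CELLS_PER_THREAD_STACK_LIMIT &&
           threads_per_partition * max_cells_per_thread <=
               static_cast<std::size_t>(std::numeric_limits<index_t>::max());
}
}  // namespace ccl_sketch
```

Doing the check at compile time (for example in a `static_assert` next to a hard-coded configuration) would surface an index overflow before any kernel runs, rather than as silently wrapped 16-bit indices on the device.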