Increase robustness of clustering partitioning #567

stephenswat · 2024-05-04T10:46:02Z

Writing this down here as a suggestion for any students or anyone else wanting to get started on traccc.

The clustering algorithm relies on being able to partition the hits into segments that are separated by at least one full row (or column) of zero-activation cells on a 2D pixel-like detector. This guarantees that no cluster crosses a partition boundary. The algorithm uses shared memory, which is of limited size; the maximum partition size $n_\text{max}$ determines the amount of shared memory used and, as a result, the performance of the algorithm: as $n_\text{max}$ increases, performance decreases. However, bounding the partition size means the algorithm only succeeds with some probability, because there may be no empty row within $n_\text{max}$ hits. For a hit density $d$ and a module width or height $n$, the success probability for a given partition is approximately $p = 1 - (1 - (1 - d)^n)^{\lfloor\frac{n_\text{max}}{dn}\rfloor+1}$. Although the failure probability $1 - p$ is tiny, it is not zero.
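The approximation above can be evaluated directly. The following is a minimal C++ sketch; the function name `partition_success_probability` is hypothetical, and the reading of the exponent given in the comments is an interpretation of the formula rather than something stated by traccc.

```cpp
#include <cassert>
#include <cmath>

// Approximate success probability of one partition:
//   p = 1 - (1 - (1 - d)^n)^(floor(n_max / (d n)) + 1)
// d:     hit density (fraction of activated cells)
// n:     module width or height, in cells
// n_max: maximum partition size, in hits
double partition_success_probability(double d, int n, int n_max) {
    assert(d > 0.0 && d < 1.0 && n > 0 && n_max > 0);
    // Probability that one row of n cells contains no activated cell.
    const double p_empty_row = std::pow(1.0 - d, n);
    // Exponent of the formula: roughly the number of rows that can be scanned
    // before exceeding n_max hits, since a row holds about d * n hits.
    const double rows = std::floor(n_max / (d * n)) + 1.0;
    // The partition fails only if none of those rows is empty.
    return 1.0 - std::pow(1.0 - p_empty_row, rows);
}
```

For instance, with $d = 0.5$, $n = 2$ and $n_\text{max} = 1$ the exponent is $\lfloor 1/1 \rfloor + 1 = 2$, so $p = 1 - 0.75^2 = 0.4375$. Raising $n_\text{max}$ pushes $p$ towards one, at the cost of more shared memory.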

There are two projects here. First, the success probability can be increased by making the partitioning algorithm smarter. Second, there needs to be a mechanism to rescue the clustering in the unlikely event that a partition cannot be created.

Increasing the success probability can be done using the knowledge that a full empty row is actually stricter than necessary; in reality, we only need to ensure that no cluster is shared between two adjacent rows. We can verify this by materializing pairs of adjacent rows and checking whether any of their activated cells touch under an 8-adjacency rule. This will cost some performance, but probably not much. Additional kudos if you can come up with a robust estimate of the success probability under this new rule.
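A minimal C++ sketch of the relaxed rule, assuming a hypothetical representation in which each row is a sorted list of the column indices of its activated cells (traccc's real data layout differs). `rows_touch` checks whether any two cells in vertically adjacent rows are 8-adjacent, which for neighbouring rows means their columns differ by at most one; `split_points` lists the rows at which a new partition could start. Both names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical row representation: sorted column indices of activated cells.
using Row = std::vector<int>;

// Returns true if some activated cell in row a is 8-adjacent to some
// activated cell in row b, where a and b are vertically adjacent rows.
bool rows_touch(const Row& a, const Row& b) {
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (std::abs(a[i] - b[j]) <= 1) {
            return true;  // vertical or diagonal neighbours: shared cluster
        }
        // The smaller column cannot touch any remaining cell of the other row.
        if (a[i] < b[j]) {
            ++i;
        } else {
            ++j;
        }
    }
    return false;
}

// Relaxed partitioning rule: a new partition may start at row r + 1 whenever
// no cluster can span rows r and r + 1. An empty row trivially satisfies this.
std::vector<std::size_t> split_points(const std::vector<Row>& rows) {
    std::vector<std::size_t> out;
    for (std::size_t r = 0; r + 1 < rows.size(); ++r) {
        if (!rows_touch(rows[r], rows[r + 1])) {
            out.push_back(r + 1);
        }
    }
    return out;
}
```

For example, rows `{1}` and `{3}` are both non-empty, so the full-empty-row rule cannot split between them, but their cells are two columns apart and the relaxed rule allows a boundary there.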

Second, we need some logic to allocate memory in order to finish the clustering if we encounter an oversized partition. This can be done fairly easily by allocating scratch space on the device: kernels can allocate global memory using `malloc`. Although this is not recommended for performance reasons, the overhead should be acceptable for this extremely rare edge case. The memory should be used to salvage the partitioning and then be deallocated.
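A minimal host-side C++ sketch of this fallback, assuming a hypothetical capacity `kSharedCapacity` that stands in for the shared-memory budget $n_\text{max}$. A stack array plays the role of shared memory and `std::make_unique` plays the role of in-kernel `malloc`; `process_partition` and its per-hit work (copying and summing hit values) are placeholders, not the real CCL.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the shared-memory budget n_max (in hits).
constexpr std::size_t kSharedCapacity = 8;

// Processes one partition and reports whether the slow fallback was used.
// The "work" (copying hit values and summing them) is only a placeholder.
bool process_partition(const std::vector<int>& hits, long& checksum) {
    int shared_buffer[kSharedCapacity];  // plays the role of shared memory
    std::unique_ptr<int[]> scratch;      // plays the role of in-kernel malloc
    int* buffer = shared_buffer;

    const bool oversized = hits.size() > kSharedCapacity;
    if (oversized) {
        // Extremely rare path: borrow global memory for this partition only.
        scratch = std::make_unique<int[]>(hits.size());
        buffer = scratch.get();
    }
    // Device-side malloc returns NULL when its heap is exhausted, so a real
    // kernel needs an error path here; in this sketch allocation throws instead.
    assert(buffer != nullptr);

    checksum = 0;
    for (std::size_t i = 0; i < hits.size(); ++i) {
        buffer[i] = hits[i];
        checksum += buffer[i];
    }
    return oversized;  // scratch (if any) is released automatically on return
}
```

The fast path never allocates, so the cost of the fallback is paid only by the rare partitions that exceed the budget.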


This commit partially addresses acts-project#567. In the past, the CCL kernel was unable to deal with extremely large partitions. Although such partitions are very unlikely, our ODD samples contain a few that are so large they crash the code. This commit equips the CCL code with some scratch memory, which it can reserve using a mutex; this gives it enough space to do its work in global memory. Although this is, of course, slower, it should happen very infrequently, and parameters can be tuned to control that frequency. This commit also contains a few optimizations that reduce the running time on a μ = 200 event from about 1100 microseconds to about 700 microseconds on an RTX A5000.
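Rather than calling `malloc` for each oversized partition, the commit reserves a scratch area with a mutex. A minimal C++ sketch of that idea, assuming a hypothetical `ScratchPool` allocated once up front and a hypothetical fast-path capacity `kFastCapacity`: partitions that fit the fast buffer never touch the lock, while oversized ones take turns using the pool. On a GPU such a lock would more likely be a spin lock built on an atomic compare-and-swap than a `std::mutex`; this is an illustration, not the traccc implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical fast-path capacity (the shared-memory budget).
constexpr std::size_t kFastCapacity = 8;

// Hypothetical scratch area, allocated once and shared by all partitions.
struct ScratchPool {
    std::mutex lock;
    std::vector<int> memory;
    explicit ScratchPool(std::size_t size) : memory(size) {
        // A pool smaller than the fast buffer would never be useful.
        assert(size >= kFastCapacity);
    }
};

// Returns 0 on the fast path, 1 if the scratch pool was used, and -1 if the
// partition is too large even for the pool. The work is a placeholder sum.
int run_partition(const std::vector<int>& hits, ScratchPool& pool, long& sum) {
    sum = 0;
    if (hits.size() <= kFastCapacity) {
        int fast[kFastCapacity];
        for (std::size_t i = 0; i < hits.size(); ++i) {
            fast[i] = hits[i];
            sum += fast[i];
        }
        return 0;
    }
    // Slow path: only one oversized partition may use the pool at a time.
    std::lock_guard<std::mutex> guard(pool.lock);
    if (hits.size() > pool.memory.size()) {
        return -1;
    }
    for (std::size_t i = 0; i < hits.size(); ++i) {
        pool.memory[i] = hits[i];
        sum += pool.memory[i];
    }
    return 1;
}
```

Serializing the slow path keeps the scratch footprint fixed, which fits the expectation that oversized partitions are very rare; how large the pool and the fast buffer should be are the tunable parameters.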

stephenswat added the "good first issue" (Good for newcomers) and "improvement" (Improve an existing feature) labels on May 4, 2024.

stephenswat mentioned this issue on May 28, 2024 in Improve robustness and performance of CCL #595 (merged).
