Add debug output to clustering algorithms #640

Open
wants to merge 1 commit into
base: main

Conversation

stephenswat
Member

@stephenswat stephenswat commented Jul 10, 2024

In #595, I equipped the CCA code with edge-case handling that allows it to process oversized partitions. Although this ensures the algorithm works, it also risks slowing down execution. To better understand how much performance we might be losing, this commit adds the ability for the SYCL and CUDA algorithms to print warnings whenever they encounter this edge case.
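As a rough illustration of the idea (not the code in this commit), the sketch below counts how often a partition is too large for the fast path and prints one warning at the end. It is plain host-side C++. The names `kMaxFastPartitionSize`, `ccl_debug_counters`, `process_partition`, and `run_and_report` are hypothetical, and the capacity value is an assumption chosen only for the example.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical fast-path capacity; NOT a value taken from the actual code.
constexpr std::size_t kMaxFastPartitionSize = 4;

// Hypothetical debug bookkeeping: number of partitions that hit the edge case.
struct ccl_debug_counters {
    std::atomic<unsigned int> oversized_partitions{0};
};

// Stand-in for the clustering step. Only the edge-case bookkeeping is shown;
// the clustering work itself is omitted.
inline void process_partition(std::size_t partition_size,
                              ccl_debug_counters& dbg) {
    if (partition_size > kMaxFastPartitionSize) {
        // The partition does not fit the fast path: record the event.
        dbg.oversized_partitions.fetch_add(1, std::memory_order_relaxed);
        // ... a slower fallback would run here ...
    }
}

// Processes all partitions, prints a single warning if the edge case was
// ever encountered, and returns the number of oversized partitions.
inline unsigned int run_and_report(const std::vector<std::size_t>& sizes) {
    ccl_debug_counters dbg;
    for (std::size_t s : sizes) {
        process_partition(s, dbg);
    }
    const unsigned int n = dbg.oversized_partitions.load();
    if (n > 0) {
        std::fprintf(stderr,
                     "WARNING: %u of %zu partitions were oversized and took "
                     "the slow path\n",
                     n, sizes.size());
    }
    return n;
}
```

In a device implementation the counter would presumably live in device memory and be copied back to the host before reporting; that part is left out here.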

@stephenswat stephenswat added the cuda (Changes related to CUDA), improvement (Improve an existing feature), and sycl (Changes related to SYCL) labels Jul 10, 2024
@stephenswat stephenswat changed the title from "Feat/ccl debug reporting" to "Add debug output to clustering algorithms" Jul 10, 2024
@krasznaa
Member

krasznaa commented Aug 2, 2024

🤔 I think this opens a much bigger question that I've been putting off all this time...

We will need some "monitoring functionality" for the reconstruction algorithms. I only realised this a few months ago when an Allen developer talked about how they optimised the monitoring functionality in their algorithms.

You see, for debugging it's fine to transfer intermediate data products back to the host and debug the data there. But in production, even though we're not actively debugging the code at that point, we'll still need to be able to produce "monitoring histograms" about what the reconstruction is doing. (Number of spacepoints, their positions, this sort of stuff.) We need this to be able to detect errors in our data quickly and reliably.

This sort of information, about how often the reconstruction needed to take a slow route, falls right into this type of monitoring information in my mind. So rather than adding something specifically for this, I think we should start designing a general monitoring functionality for our code. 🤔
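To make the suggestion a bit more concrete, here is a minimal host-side sketch of what such a monitoring interface could look like. It offers named counters (for example, how often a slow route was taken) and fixed-range histograms (for example, the number of spacepoints per event). Everything in it is hypothetical: the `histogram` and `monitor` classes and the names `"n_spacepoints"` and `"ccl_slow_path"` are assumptions made for illustration. They do not come from this project or from Allen.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical fixed-range 1D histogram. Bin 0 is underflow, bins 1..n_bins
// cover [lo, hi), and bin n_bins + 1 is overflow.
// Assumes n_bins > 0 and hi > lo.
class histogram {
public:
    histogram(std::size_t n_bins, double lo, double hi)
        : m_lo(lo), m_hi(hi), m_n(n_bins), m_bins(n_bins + 2, 0) {}

    void fill(double x) {
        std::size_t idx = 0;
        if (x >= m_hi) {
            idx = m_n + 1;
        } else if (x >= m_lo) {
            const auto raw = static_cast<std::size_t>(
                (x - m_lo) / (m_hi - m_lo) * static_cast<double>(m_n));
            // Guard against rounding pushing a value into the overflow bin.
            idx = 1 + std::min(raw, m_n - 1);
        }
        ++m_bins[idx];
    }

    unsigned long bin(std::size_t i) const { return m_bins.at(i); }

private:
    double m_lo, m_hi;
    std::size_t m_n;
    std::vector<unsigned long> m_bins;
};

// Hypothetical monitoring front end: named counters and named histograms.
class monitor {
public:
    void book(const std::string& name, std::size_t n_bins, double lo,
              double hi) {
        m_hists.emplace(name, histogram(n_bins, lo, hi));
    }
    // Throws std::out_of_range if the histogram was never booked.
    void fill(const std::string& name, double x) { m_hists.at(name).fill(x); }
    void count(const std::string& name, unsigned long n = 1) {
        m_counters[name] += n;
    }
    unsigned long counter(const std::string& name) const {
        const auto it = m_counters.find(name);
        return it == m_counters.end() ? 0 : it->second;
    }
    const histogram& hist(const std::string& name) const {
        return m_hists.at(name);
    }

private:
    std::map<std::string, unsigned long> m_counters;
    std::map<std::string, histogram> m_hists;
};
```

An algorithm could then call `count("ccl_slow_path")` each time it takes the slow route and `fill("n_spacepoints", n)` once per event. How the device side would accumulate these values cheaply and hand them to the host is exactly the open design question, and this sketch does not address it.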

Labels
cuda (Changes related to CUDA), improvement (Improve an existing feature), sycl (Changes related to SYCL)