Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split the CUDA CKF into different TUs #742

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

stephenswat
Copy link
Member

This commit splits the monstrously large CUDA track finding (and some CCL) translation unit up into smaller ones, one for each of the kernels. This should speed up compilation times and decrease memory usage.

Also groups the payloads for each of the functions into convenient structs, so we don't need to pass 20+ arguments for some of the kernel calls.

Does not change the functionality of the code.

@stephenswat stephenswat added refactor Change the structure of the code cuda Changes related to CUDA labels Oct 16, 2024
@stephenswat stephenswat force-pushed the refactor/ckf_tus branch 2 times, most recently from bbeb4f7 to 30aebf1 Compare October 16, 2024 13:30
@stephenswat
Copy link
Member Author

stephenswat commented Oct 16, 2024

My plan to finally get rid of those .ipp files has failed. 🫡

Anyway, updated.

@stephenswat stephenswat force-pushed the refactor/ckf_tus branch 4 times, most recently from 24a9c9e to 4ec0290 Compare October 23, 2024 08:48
Copy link
Member

@krasznaa krasznaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure there's no hastiness on this one...

Copy link
Member

@krasznaa krasznaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, let's go with this general setup at the moment. As we discussed in person, I have some further ideas for how to tweak the code later on even further, but these changes do not make those future changes any more difficult. (The general direction of my idea is very similar.)

Comment on lines 20 to 39
typename detector_t::view_type det_data;
const int n_params;
bound_track_parameters_collection_types::view params_view;
vecmem::data::vector_view<const unsigned int> params_liveness_view;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some Doxygen documentation should be put on the variables, and the struct as well. Since now that documentation is not on the main function anymore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@stephenswat
Copy link
Member Author

Great! I've rebased and incorporated the change requests

This commit splits the monstrously large CUDA track finding translation
unit up into smaller ones, one for each of the kernels. This should
speed up compilation times and decrease memory usage.

Also groups the payloads for each of the functions into convenient
structs, so we don't need to pass 20+ arguments for some of the kernel
calls.

Does not change the functionality of the code.
Copy link

sonarcloud bot commented Oct 23, 2024

Copy link
Member

@krasznaa krasznaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a little more to go...

Comment on lines +20 to +27
measurement_collection_types::const_view measurements_view,
bound_track_parameters_collection_types::const_view seeds_view,
vecmem::data::jagged_vector_view<const candidate_link> links_view,
vecmem::data::vector_view<const typename candidate_link::link_index_type>
tips_view,
track_candidate_container_types::view track_candidates_view,
vecmem::data::vector_view<unsigned int> valid_indices_view,
unsigned int* n_valid_tracks);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to use traccc::device::build_tracks_payload here.

Comment on lines +105 to +107
unsigned int* shared_num_candidates;
std::pair<unsigned int, unsigned int>* shared_candidates;
unsigned int& shared_candidates_size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as I complain anyway, these variables should also get some description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cuda Changes related to CUDA refactor Change the structure of the code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants