Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Cluster/pack multi_get_json_object paths by common prefixes #11263

Closed
revans2 opened this issue Jul 26, 2024 · 2 comments · Fixed by #11289
Closed

[FEA] Cluster/pack multi_get_json_object paths by common prefixes #11263

revans2 opened this issue Jul 26, 2024 · 2 comments · Fixed by #11289
Assignees
Labels
performance A performance related task/issue

Comments

@revans2
Copy link
Collaborator

revans2 commented Jul 26, 2024

Is your feature request related to a problem? Please describe.
In the latest code for the multi-get_json_object code we now process multiple paths for a single row in a single warp. The advantage for this is that we should have less cache issues as the data is shared and we should also have less thread divergence as it is processing the same data (at least for validation). It would be even better for thread divergence if we could also cluster the paths by common prefixes and then do some packing knowing that a warp has 32 threads. In my testing with just a hacked up sort I saw on one query a 10% performance improvement.

@revans2 revans2 added feature request New feature or request ? - Needs Triage Need team to review and classify labels Jul 26, 2024
@revans2 revans2 self-assigned this Jul 26, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Jul 30, 2024
@revans2
Copy link
Collaborator Author

revans2 commented Aug 2, 2024

I saw a 10% improvement on T4 GPUs too.

@revans2
Copy link
Collaborator Author

revans2 commented Aug 8, 2024

I did a bunch of other performance tests grouping things in different ways and I think we could speed up some queries by 25% to 35% with more tuning. But that is going to require us to reduce the memory requirements enough that we can get a lot more paths running in parallel NVIDIA/spark-rapids-jni#2247 and then we can do a better job with clustering.

@revans2 revans2 closed this as completed Aug 8, 2024
@sameerz sameerz added performance A performance related task/issue and removed feature request New feature or request labels Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants