[FEA] Cluster/pack multi_get_json_object paths by common prefixes #11263

revans2 · 2024-07-26T16:53:55Z

Is your feature request related to a problem? Please describe.
In the latest multi-get_json_object code, we now process multiple paths for a single row in a single warp. The advantage is that we should have fewer cache issues because the data is shared, and less thread divergence because the threads are processing the same data (at least for validation). Thread divergence would improve further if we also clustered the paths by common prefixes and then packed them, knowing that a warp has 32 threads. In my testing with just a hacked-up sort, I saw a 10% performance improvement on one query.
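
As a rough illustration of the idea (not the actual spark-rapids-jni implementation), clustering and packing could look something like the sketch below. It assumes simple dotted paths such as `$.a.b`. The names `parse_path` and `cluster_and_pack` are hypothetical helpers, and sorting by path segments is just one simple way to put paths with common prefixes next to each other before cutting them into warp-sized batches.

```python
from typing import List, Tuple

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def parse_path(path: str) -> Tuple[str, ...]:
    """Split a simple dotted JSON path like '$.a.b' into ('a', 'b').

    Hypothetical helper: only handles '$' followed by dotted names,
    not array subscripts or wildcards.
    """
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    return tuple(part for part in path[1:].split(".") if part)


def cluster_and_pack(paths: List[str], width: int = WARP_SIZE) -> List[List[int]]:
    """Return batches of original path indices, at most `width` per batch.

    Sorting by parsed segments places paths that share a prefix next to
    each other, so each batch tends to walk the same part of the JSON.
    """
    order = sorted(range(len(paths)), key=lambda i: parse_path(paths[i]))
    return [order[i:i + width] for i in range(0, len(order), width)]


if __name__ == "__main__":
    paths = ["$.b.x", "$.a.y", "$.b.y", "$.a.x"]
    # Use a tiny width so the grouping is visible with only four paths.
    for batch in cluster_and_pack(paths, width=2):
        print([paths[i] for i in batch])
```

Returning indices rather than the paths themselves keeps it easy to map results back to the original column order after the batches are processed.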

revans2 · 2024-08-02T14:28:05Z

I saw a 10% improvement on T4 GPUs too.

revans2 · 2024-08-08T16:12:01Z

I ran a number of other performance tests, grouping things in different ways, and I think we could speed up some queries by 25% to 35% with more tuning. But that will require reducing the memory requirements enough to get a lot more paths running in parallel (NVIDIA/spark-rapids-jni#2247); then we can do a better job with clustering.

revans2 added the "feature request" and "? - Needs Triage" labels Jul 26, 2024

revans2 self-assigned this Jul 26, 2024

mattahrens removed the "? - Needs Triage" label Jul 30, 2024

This was referenced Aug 2, 2024

Have multi-get_json_object batch and cluster paths NVIDIA/spark-rapids-jni#2299

Merged

Use the new chunked API from multi-get_json_object #11289

Merged

revans2 closed this as completed Aug 8, 2024

sameerz added the "performance" label and removed the "feature request" label Oct 11, 2024
