Is your feature request related to a problem? Please describe.
In the latest multi-get_json_object code we now process multiple paths for a single row in a single warp. The advantage of this is that we should have fewer cache issues, because the data is shared, and less thread divergence, because the threads are processing the same data (at least for validation). It would be even better for thread divergence if we could also cluster the paths by common prefixes and then do some packing, knowing that a warp has 32 threads. In my testing with just a hacked-up sort I saw a 10% performance improvement on one query.
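As a rough illustration of the clustering idea, here is a minimal host-side sketch in Python. It is not the code in spark-rapids-jni and not the hacked-up sort mentioned above. The helper names `split_path` and `cluster_paths` are hypothetical, and the sketch assumes simple dotted paths such as `$.a.b`. It sorts the paths by their components so that paths sharing a prefix end up adjacent, then packs consecutive paths into groups of at most 32, the number of threads in a warp.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def split_path(path):
    """Split a simple dotted JSON path like '$.a.b' into ('a', 'b').

    Hypothetical helper for illustration only: it ignores array
    subscripts, wildcards, and quoted field names.
    """
    return tuple(part for part in path.lstrip("$").split(".") if part)


def cluster_paths(paths, warp_size=WARP_SIZE):
    """Return groups of original path indices, one group per warp.

    Sorting by path components places paths with a common prefix next
    to each other, so each group tends to walk the same part of the
    JSON document. Indices (not paths) are returned so that results can
    be mapped back to their original output columns.
    """
    order = sorted(range(len(paths)), key=lambda i: split_path(paths[i]))
    return [order[i:i + warp_size] for i in range(0, len(order), warp_size)]
```

For example, with a group size of 2 for readability, `cluster_paths(["$.b.x", "$.a.y", "$.b.y", "$.a.x"], warp_size=2)` puts the two `$.a.*` paths in one group and the two `$.b.*` paths in the other. A plain sort like this does not try to keep a prefix cluster from straddling a group boundary; that is the kind of packing decision that would need further tuning.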
I did a bunch of other performance tests, grouping things in different ways, and I think we could speed up some queries by 25% to 35% with more tuning. But that is going to require us to reduce the memory requirements enough that we can get a lot more paths running in parallel (NVIDIA/spark-rapids-jni#2247); then we can do a better job with clustering.