Is your feature request related to a problem? Please describe.
We put a lot of work into get_json_object and sped up a specific customer query by over 3x relative to the original GPU version we tested, and by over 4x relative to the CPU version.
After that, together with the CUDF team, we optimized from_json and JSON Scan significantly. I think it is time for us to revisit multi-get_json_object. If I rewrite this customer query to use from_json where possible, we are able to speed up the current CPU implementation by an additional 1.45x, making the total GPU speedup closer to 5x than 3x.
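To make the rewrite idea concrete, here is a minimal CPU-only sketch in plain Python. It is an analogy, not the Spark, spark-rapids, or cuDF API: the function names `get_json_object_style`, `from_json_style`, and `walk` are hypothetical, paths are simplified to lists of object keys, and the standard-library `json` module stands in for the real parsers. It only shows why extracting several fields after a single parse can do less work than parsing once per path.

```python
import json


def walk(node, path):
    """Follow a list of object keys; return None if any step is missing."""
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def get_json_object_style(doc, paths):
    """Hypothetical stand-in for independent per-path extraction:
    the document is parsed again for every requested path."""
    results = []
    for path in paths:
        try:
            node = json.loads(doc)  # one full parse per path
        except json.JSONDecodeError:
            results.append(None)
            continue
        results.append(walk(node, path))
    return results


def from_json_style(doc, paths):
    """Hypothetical stand-in for a from_json-style rewrite:
    the document is parsed once and every path reads from that result."""
    try:
        node = json.loads(doc)  # single parse shared by all paths
    except json.JSONDecodeError:
        return [None] * len(paths)
    return [walk(node, path) for path in paths]


doc = '{"user": {"id": 7, "name": "ann"}, "event": "click"}'
paths = [["user", "id"], ["user", "name"], ["event"], ["missing"]]

# Both strategies return the same values; only the number of parses differs.
assert get_json_object_style(doc, paths) == from_json_style(doc, paths)
```

In this sketch the per-path version does one parse per requested path while the single-parse version does one parse total, which is the same trade-off the from_json rewrite is exploiting. Real get_json_object and from_json differ in more than parse count (for example in how they treat types and malformed input), so this does not predict the actual speedup.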
Describe the solution you'd like
This is mostly an experiment. We could try writing custom code that uses the tokens from the cudf JSON tokenizer to process multiple JSON paths in parallel, similar to what we do today with multi-get_json_object. We could also just rewrite the query so that the parts we feel confident handling with from_json are done that way. Or we could decide that we are at a good point and stay there. Either way, we need to make an informed decision, ideally based on more than one benchmark/query.