Improve GroupIdGenerators performance for high cardinality group by keys #14685
Comments
cc @bziobrowski, who is also working on optimizing the multi-stage group-by engine.
What is the difference between this and the existing initial result holder capacity config?
It makes more sense for the new config to reserve the hash map capacity as well. Ideally the map capacity and the result holder capacity should stay in sync.
Alternatively, instead of introducing a new config, we could extend the existing one and assign it a suitably large default value. @Jackie-Jiang, do let me know if this approach suits you more.
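For concreteness, here is a minimal Java sketch of the capacity reservation being debated, assuming fastutil's Object2IntOpenHashMap as the backing map (which the issue description below identifies); the property name and default value are hypothetical, not actual Pinot configs:

```java
import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;

// Minimal sketch: reserving capacity up front avoids the repeated
// rehash/resize cycles that dominate when millions of distinct group
// keys arrive one at a time. The property name below is hypothetical.
public final class PreSizedGroupIdMap {
  public static Object2IntOpenHashMap<Object> create() {
    int initialCapacity = Integer.getInteger("groupBy.initialMapCapacity", 10_000);
    Object2IntOpenHashMap<Object> groupIdMap = new Object2IntOpenHashMap<>(initialCapacity);
    groupIdMap.defaultReturnValue(-1);  // sentinel for "group not seen yet"
    return groupIdMap;
  }
}
```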
I wouldn't add too many configs for the same purpose because there is no way to make users understand them all. I agree we should cap the initial capacity, though.
I agree with the sentiment, but in this case I feel a new config should be introduced, for the reasons I expand on below.
I agree, and we can do it in a separate PR since it is outside the scope of this current change.
This is precisely the reason we need a separate config to enable reserving the capacity of the GroupIdGenerator's hash map. Another alternative that I suggested above is to introduce a separate config solely responsible for setting the initial size of that hash map. Until we have some type of column stats for the group-by columns, which could help us determine at runtime the estimated cardinality of the group-by keys, we may have to rely on this config manually.
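A hedged sketch of that estimation idea, assuming per-column cardinality stats were available; every name here is hypothetical:

```java
// Derive an initial capacity from group-by column cardinalities instead
// of hand-tuning it. Hypothetical sketch, not Pinot's actual code.
public final class CapacityEstimator {
  public static int estimateInitialCapacity(long[] groupByColumnCardinalities, int maxInitialCapacity) {
    long estimate = 1;
    for (long cardinality : groupByColumnCardinalities) {
      estimate *= cardinality;
      // Clamp once the product reaches the cap (or overflows to <= 0).
      if (estimate >= maxInitialCapacity || estimate <= 0) {
        return maxInitialCapacity;
      }
    }
    return (int) estimate;
  }
}
```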
I know where the confusion is coming from. The initial result holder capacity should just match the initial group map capacity, because the map is simply generating the index into the result holder. Do you think we can just reuse the same config for both of them?
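To illustrate why one config could size both structures, here is a small self-contained sketch (illustrative classes, not Pinot's actual ones) where the map hands out dense ids that double as result-holder slots:

```java
import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;
import java.util.ArrayList;
import java.util.List;

public class GroupIdSketch {
  private final Object2IntOpenHashMap<Object> _groupIdMap;
  private final List<long[]> _resultHolder;

  public GroupIdSketch(int initialCapacity) {
    // One capacity sizes both: each new key's id is its holder index.
    _groupIdMap = new Object2IntOpenHashMap<>(initialCapacity);
    _groupIdMap.defaultReturnValue(-1);
    _resultHolder = new ArrayList<>(initialCapacity);
  }

  public int getOrCreateGroupId(Object key) {
    int groupId = _groupIdMap.getInt(key);
    if (groupId == -1) {
      groupId = _groupIdMap.size();    // next dense id == next holder slot
      _groupIdMap.put(key, groupId);
      _resultHolder.add(new long[1]);  // result slot for the new group
    }
    return groupId;
  }
}
```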
@Jackie-Jiang: adding some color here. I agree completely with not having too many configs, but in this case the workload profile is very different for the V1 engine's max initial capacity and the V2 engine AggregateOperator's max initial capacity. The V1 engine runs its group-by per segment and deals with a much lower amount of data, so setting a low initial result holder capacity for it makes sense. The V2 engine AggregateOperator, however, aggregates over all the groups returned by the leaf operator, which may be in the hundreds of millions per server, in which case hash-map sizing becomes crucial. Hence our proposal is to have a separate config for the V2 engine, as sketched below.
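A sketch of that split, with placeholder constants (the real config names and defaults are not part of this thread):

```java
// Hypothetical sizing policy: V1's per-segment group-by keeps a small
// cap, while the V2 AggregateOperator, which may see hundreds of
// millions of groups per server, honors a separate, much larger cap.
public final class InitialCapacityPolicy {
  static final int V1_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000;      // placeholder
  static final int V2_MAX_INITIAL_RESULT_HOLDER_CAPACITY = 10_000_000;  // placeholder

  public static int initialCapacity(boolean multiStageAggregate, int estimatedGroups) {
    int cap = multiStageAggregate
        ? V2_MAX_INITIAL_RESULT_HOLDER_CAPACITY
        : V1_MAX_INITIAL_RESULT_HOLDER_CAPACITY;
    return Math.min(estimatedGroups, cap);
  }
}
```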
Original issue description

While running some high-volume multi-stage engine queries on Pinot where the join key was high cardinality, we recently observed a disproportionate latency increase as data grew on both sides of the join, for a query shape that joins on the high-cardinality key and then aggregates with a group-by.
Profiling conducted on a server showed that the major cause of the latency increase is inefficient groupId generation in org/apache/pinot/query/runtime/operator/MultistageGroupByExecutor.generateGroupByKeys, which happens for a few reasons, chief among them the reliance on Object2IntOpenHashMap, which performs poorly for high-cardinality use cases.

We are considering a few different strategies, like better hash-map selection (avoiding open addressing for high cardinality), generating groupIds in batches, etc., and we will be leveraging benchmarks to select the strategy with the best RoI.
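As a rough illustration of the batching strategy, here is a hedged sketch (names are illustrative, not the actual MultistageGroupByExecutor code) that materializes all keys for a block before resolving ids in one tight probe loop:

```java
import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchedGroupIdGenerator {
  private final Object2IntOpenHashMap<List<Object>> _groupIdMap = new Object2IntOpenHashMap<>();

  public BatchedGroupIdGenerator() {
    _groupIdMap.defaultReturnValue(-1);  // sentinel for unseen groups
  }

  public int[] generateGroupIds(Object[][] blockKeyValues) {
    int numRows = blockKeyValues.length;
    List<List<Object>> keys = new ArrayList<>(numRows);
    for (Object[] rowKey : blockKeyValues) {
      keys.add(Arrays.asList(rowKey));   // build all keys for the block up front
    }
    int[] groupIds = new int[numRows];
    for (int i = 0; i < numRows; i++) {  // then one tight probe loop
      int id = _groupIdMap.getInt(keys.get(i));
      if (id == -1) {
        id = _groupIdMap.size();
        _groupIdMap.put(keys.get(i), id);
      }
      groupIds[i] = id;
    }
    return groupIds;
  }
}
```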
This optimization can help boost performance for both the Pinot v1 and v2 engines simultaneously, since both engines rely on this logic. cc: @Jackie-Jiang