Reuse order by columns for ARRAY_AGG #55496

k-lukas · 2025-01-27T13:51:53Z

Enhancement

Given a query like this:

select
    array_agg(col1 order by colx, coly, colz),
    array_agg(col2 order by colx, coly, colz),
    ...
    array_agg(coln order by colx, coly, colz)
from table
group by id

Notice that array_agg is called N times with the same ordering columns: colx, coly, colz.

Currently, during pre-aggregation, the ordering columns are copied into each array_agg result, exchanged to other nodes, and sorted; only then are the ordering columns dropped. This leads to high memory consumption, because the ordering columns are duplicated for every invocation of array_agg (N times in the example).

StarRocks (SR) should keep a single copy of the ordering columns when several array_agg calls share the same ORDER BY. I believe this is already done for window aggregations.
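To make the cost concrete, here is a minimal Python sketch comparing the two strategies for the rows of a single group. It is a toy model under stated assumptions, not StarRocks code: the function names, the row representation, and the "cells held" metric are all made up for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]  # hypothetical row representation: column name -> value


def agg_with_duplicated_keys(
    rows: Sequence[Row], value_cols: Sequence[str], order_cols: Sequence[str]
) -> Tuple[Dict[str, List[Any]], int]:
    """Toy model of the behavior described above: every array_agg state
    carries its own copy of the ordering keys and is sorted separately."""
    results: Dict[str, List[Any]] = {}
    cells = 0  # number of values held across all aggregate states
    for col in value_cols:
        state = [(tuple(r[k] for k in order_cols), r[col]) for r in rows]
        cells += len(state) * (len(order_cols) + 1)
        state.sort(key=lambda pair: pair[0])  # stable sort on the keys only
        results[col] = [value for _, value in state]
    return results, cells


def agg_with_shared_keys(
    rows: Sequence[Row], value_cols: Sequence[str], order_cols: Sequence[str]
) -> Tuple[Dict[str, List[Any]], int]:
    """Toy model of the proposal: keep one copy of the ordering keys,
    compute the sort permutation once, and apply it to each value column."""
    keys = [tuple(r[k] for k in order_cols) for r in rows]
    cells = len(keys) * len(order_cols)
    perm = sorted(range(len(rows)), key=lambda i: keys[i])  # stable
    results: Dict[str, List[Any]] = {}
    for col in value_cols:
        values = [r[col] for r in rows]
        cells += len(values)
        results[col] = [values[i] for i in perm]
    return results, cells
```

In this model, R rows with N value columns and K ordering columns hold N * R * (K + 1) values in the duplicated case and R * (K + N) in the shared case, and the sort runs once instead of N times.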


LiShuMing · 2025-02-05T02:33:00Z

Yep, it's a good idea.

Maybe we can add a rule to the optimizer that rewrites the plan to eliminate the repeated ORDER BY.
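One way such a rule could look, sketched in Python as a purely hypothetical model: it does not use StarRocks's optimizer API, and the `ArrayAgg`, `MergedArrayAgg`, and `merge_shared_order_by` names are invented here. The idea is to group array_agg calls by their ORDER BY list, emit one merged aggregate per distinct list, and record where each original call can be read back from.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ArrayAgg:
    """Hypothetical plan node: array_agg(arg ORDER BY order_by...)."""
    arg: str
    order_by: Tuple[str, ...]


@dataclass(frozen=True)
class MergedArrayAgg:
    """Hypothetical plan node: one aggregate that collects several
    argument columns under a single shared ORDER BY list."""
    args: Tuple[str, ...]
    order_by: Tuple[str, ...]


def merge_shared_order_by(
    calls: List[ArrayAgg],
) -> Tuple[List[MergedArrayAgg], Dict[ArrayAgg, Tuple[int, int]]]:
    """Group calls by ORDER BY list and merge each group into one aggregate.

    Returns the merged aggregates plus a projection map from each original
    call to (index of merged aggregate, position of its argument)."""
    groups: Dict[Tuple[str, ...], List[str]] = {}
    for call in calls:
        args = groups.setdefault(call.order_by, [])
        if call.arg not in args:  # identical calls share one slot
            args.append(call.arg)

    merged = [MergedArrayAgg(tuple(args), order_by)
              for order_by, args in groups.items()]

    projection: Dict[ArrayAgg, Tuple[int, int]] = {}
    for m_idx, node in enumerate(merged):
        for f_idx, arg in enumerate(node.args):
            projection[ArrayAgg(arg, node.order_by)] = (m_idx, f_idx)
    return merged, projection
```

For the query in this issue, all N calls would collapse into a single merged aggregate, so the ordering columns would be carried and sorted once. Whether the BE can execute such a merged aggregate efficiently is a separate question that this sketch does not address.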

k-lukas · 2025-02-05T08:23:10Z

Interesting, how would this work in the optimizer? I mainly looked at the aggregation implementation in the BE, and there it looked quite hard to share aggregation state between multiple array_agg functions.

k-lukas added the type/enhancement Make an enhancement to StarRocks label Jan 27, 2025
