Top-N join query still scans 6,249,580 rows on fact table even after HASH distribution and colocate optimization #62336
Unanswered
junit
asked this question in
A - General / Q&A
Replies: 2 comments
-
|
hello @junit we will analyze this issue. you can join our discord(https://discord.gg/JemTbMKj), we are always online. |
Beta Was this translation helpful? Give feedback.
0 replies
-
|
hello @junit we have found that:
so, please feel free, your join is COLOCATE join and the 6249334 is not the scan row number. if you want to figure out the scan number of each table, you can share with us the profile, we will analyze it for you. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Hi Doris team,
We reproduced this issue on Apache Doris 2.1.11.
We are seeing poor performance on a simple
JOIN + ORDER BY + LIMITquery in Doris.We have already optimized the table design by:
dynamic_idcolocate_withgroupHowever, the query plan still shows that Doris scans 6,249,580 rows from table
e_dynamic, although the final result only needsLIMIT 15.This is very different from MySQL, where the same SQL only reads a small number of rows from the driving table and then does PK lookup on the joined table.
We would like to know:
e_dynamicside (cardinality=6249580) in this case.ORDER BY ... LIMITbetter in this join pattern.Table definitions
e_dynamic_newr_hot_picture_tag_newQuery
Doris EXPLAIN
Key observation
Even after:
DISTRIBUTED BY HASH(DYNAMIC_ID)BUCKETScolocate_with = "dynamic_group"dynamic_idthe plan still shows:
HAS_COLO_PLAN_NODE: falseINNER JOIN(BROADCAST)e_dynamicscan cardinality = 6249580So the main issue is:
e_dynamicORDER BY t2.CREATE_TIMESTAMP DESC LIMIT 15does not effectively reduce the scan range before joinMySQL execution plan for the same SQL
In MySQL:
t1t1t2is accessed by PK lookup (eq_ref)Expected behavior
We expect Doris to avoid scanning millions of rows from
e_dynamicfor this kind of query pattern, especially after colocate/hash distribution optimization.At minimum, we hope to understand:
e_dynamicsideJOIN + ORDER BY on right table + LIMITAdditional questions
UNIQUE KEYtable with current design prevent colocate join from being used here?CREATE_TIMESTAMPineffective forORDER BY ... DESC LIMIT?ORDER BYcolumn from join table cannot be used to prune scan early?Environment
Please help us confirm what additional info is needed. We can provide:
EXPLAIN SHAPE PLANThanks!
Beta Was this translation helpful? Give feedback.
All reactions