What problem does this PR solve?
In ETL scenarios, a large table (e.g. 262K rows) created via CTAS or INSERT INTO SELECT is immediately joined with a known small table (e.g. 10 rows). Because auto-analyze has not yet completed, the FE optimizer cannot obtain the new table's row count and falls back to 1, so the large table is incorrectly chosen as the broadcast (replicated) side. This leads to excessive memory usage and query cancellation.
The root cause chain: after the CTAS/INSERT INTO SELECT transaction becomes VISIBLE, the new table has no TableStatsMeta. StatsCalculator.getOlapTableRowCount() receives -1, which Math.max(1, -1) clamps to 1. If the small table has been analyzed and has a known row count (e.g. 10), the broadcast cost model sees 1 < 10 and broadcasts the large table.
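The clamp and its effect on broadcast-side selection can be reproduced in a few lines. This is a minimal sketch, not the Doris FE code: the class RowCountFallbackSketch, its methods, and the rule "broadcast the side with the smaller estimate" are simplifying assumptions used only to illustrate the chain described above.

```java
// Hypothetical illustration; none of these names exist in Doris.
final class RowCountFallbackSketch {
    // Value reported when a table has no statistics yet.
    static final long UNKNOWN_ROW_COUNT = -1L;

    // Same clamp as in the root cause chain: an unknown count (-1) becomes 1.
    static long effectiveRowCount(long reportedRowCount) {
        return Math.max(1L, reportedRowCount);
    }

    // Simplified stand-in for the cost model (an assumption):
    // broadcast whichever side has the smaller estimated row count.
    static String broadcastSide(long leftRows, long rightRows) {
        return effectiveRowCount(leftRows) <= effectiveRowCount(rightRows) ? "left" : "right";
    }

    public static void main(String[] args) {
        // Left: freshly loaded 262,144-row table. Right: analyzed 10-row table.
        // Without stats the left side is estimated at 1 row and gets broadcast.
        System.out.println(broadcastSide(UNKNOWN_ROW_COUNT, 10L)); // prints "left"
        // With the real row count available, the small table is broadcast instead.
        System.out.println(broadcastSide(262_144L, 10L)); // prints "right"
    }
}
```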
Solution
After the CTAS/INSERT INTO SELECT transaction becomes VISIBLE, bootstrap a minimal TableStatsMeta that contains only the table-level and base-index row counts, without any column statistics. This lets the optimizer use the row count for correct broadcast-side selection.
Core changes:
TableStatsMeta.newBootstrapStats(): creates a TableStatsMeta with only rowCount, updatedRows, and the base index entry in indexesRowCount. It does not set userInjected and does not interfere with subsequent auto-analyze scheduling.
AnalysisManager.bootstrapTableStatsIfAbsent(): uses double-checked locking and writes only when no TableStatsMeta exists and loadedRows > 0.
OlapInsertExecutor: invokes the bootstrap after the transaction reaches VISIBLE status.
ShowTableStatsCommand: adds a null guard for jobType, as bootstrap stats have no associated analyze job.
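The "write only if absent" behavior with double-checked locking can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: BootstrapStatsSketch, MinimalTableStats, and their fields are hypothetical stand-ins for AnalysisManager and TableStatsMeta, and the sketch omits updatedRows because the description does not say what value it receives.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the stats manager; not the Doris classes.
final class BootstrapStatsSketch {
    // Reduced stand-in for TableStatsMeta: row counts only, no column statistics.
    static final class MinimalTableStats {
        final long rowCount;
        final Map<Long, Long> indexesRowCount; // base index id -> row count

        MinimalTableStats(long rowCount, long baseIndexId) {
            this.rowCount = rowCount;
            this.indexesRowCount = Collections.singletonMap(baseIndexId, rowCount);
        }
    }

    private final Map<Long, MinimalTableStats> statsByTableId = new ConcurrentHashMap<>();
    private final Object lock = new Object();

    // Returns true only if this call created the bootstrap entry.
    boolean bootstrapIfAbsent(long tableId, long baseIndexId, long loadedRows) {
        // First check, without the lock: skip empty loads and tables that already have stats.
        if (loadedRows <= 0 || statsByTableId.containsKey(tableId)) {
            return false;
        }
        synchronized (lock) {
            // Second check, under the lock: another thread may have written stats meanwhile.
            if (statsByTableId.containsKey(tableId)) {
                return false;
            }
            statsByTableId.put(tableId, new MinimalTableStats(loadedRows, baseIndexId));
            return true;
        }
    }

    // Returns -1 when no stats exist, matching the "unknown" value in the description.
    long rowCountOrUnknown(long tableId) {
        MinimalTableStats stats = statsByTableId.get(tableId);
        return stats == null ? -1L : stats.rowCount;
    }

    public static void main(String[] args) {
        BootstrapStatsSketch manager = new BootstrapStatsSketch();
        System.out.println(manager.bootstrapIfAbsent(1L, 10L, 262_144L)); // prints "true"
        System.out.println(manager.bootstrapIfAbsent(1L, 10L, 5L));       // prints "false"
        System.out.println(manager.rowCountOrUnknown(1L));                // prints "262144"
    }
}
```

Because the second call finds an existing entry, it leaves the stored row count untouched. That matches the stated intent that the bootstrap never overwrites stats that are already present.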
New Session Variable
enable_insert_select_table_stats_bootstrap (default false, EXPERIMENTAL)
Usage:
SET enable_insert_select_table_stats_bootstrap = true;
CREATE TABLE target_table AS SELECT ... FROM large_source;
-- or
INSERT INTO target_table SELECT ... FROM large_source;
-- After the statement returns, SHOW TABLE STATS shows the row count,
-- and the optimizer can use it for correct broadcast-side selection.
Check List
Test:
FE Unit Test:
TableStatsMetaTest.testNewBootstrapStatsSeedsBaseIndexRowCount — verifies bootstrap metadata field correctness
OlapInsertExecutorTest.testExecuteSingleInsertVisibleBootstrapsTableStatsWhenAbsent — verifies bootstrap takes effect when enabled
OlapInsertExecutorTest.testExecuteSingleInsertVisibleDoesNotBootstrapTableStatsWhenDisabled — verifies no bootstrap when disabled (default)
ShowTableStatsCommandTest.testConstructTableResultSetForBootstrapStats — verifies SHOW TABLE STATS renders bootstrap metadata without NPE
Regression Test: insert_select_table_stats_bootstrap.groovy — two-phase assertions: when disabled, stats=1 and large table is broadcast; when enabled, stats=262,144 and small table is broadcast. Ran 10 consecutive times on a remote Doris instance, all passed.
Manual Test: verified on a deployed remote Doris instance with the latest code.
Behavior changed: No (disabled by default, no impact on existing behavior)
Does this need documentation: Yes (new session variable)