HIVE-28902: Fix unknown column PARTITION_NAME in aggrStatsUseDB #5768

Merged
merged 1 commit into from
Apr 16, 2025

wecharyu
Contributor

What changes were proposed in this pull request?

Fix the SQL in the direct SQL implementation of aggrStatsUseDB.

Why are the changes needed?

#4744 removed the PARTITION_NAME column from the PART_COL_STATS table, but the aggrStatsUseDB query still filters on it, so it now throws an exception:

java.sql.SQLSyntaxErrorException: Unknown column 'PARTITION_NAME' in 'where clause'
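
To illustrate the kind of change involved, here is a minimal sketch of building the filter portion of the query. The PR text does not show the corrected SQL, so this assumes the partition filter moves to the `PART_NAME` column of the `PARTITIONS` table, which the failing query already joins. The class and method names (`AggrStatsFilterSketch`, `placeholders`, `buildFilter`) are hypothetical and are not taken from `MetaStoreDirectSql`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not the code from MetaStoreDirectSql.
class AggrStatsFilterSketch {

    // Builds a comma-separated list of n JDBC placeholders, e.g. "?,?,?".
    static String placeholders(int n) {
        return String.join(",", Collections.nCopies(n, "?"));
    }

    // Builds the column/partition/engine filter. ASSUMPTION: partition names
    // are matched against "PARTITIONS"."PART_NAME" instead of the removed
    // "PARTITION_NAME" column of "PART_COL_STATS".
    static String buildFilter(List<String> colNames, List<String> partNames) {
        return "\"COLUMN_NAME\" in (" + placeholders(colNames.size()) + ")"
            + " and \"PARTITIONS\".\"PART_NAME\" in (" + placeholders(partNames.size()) + ")"
            + " and \"ENGINE\" = ?";
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("test_part_col");
        List<String> parts = Arrays.asList(
            "test_part_col=a0", "test_part_col=a1", "test_part_col=a2");
        System.out.println(buildFilter(cols, parts));
    }
}
```

Qualifying the column with its table name also avoids ambiguity once several tables are joined in the same statement.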

Does this PR introduce any user-facing change?

No.

Is the change a dependency upgrade?

No.

How was this patch tested?

Added a unit test:

mvn test -Dtest.groups= -Dtest=org.apache.hadoop.hive.metastore.TestObjectStore#testAggrStatsUseDB -pl :hive-standalone-metastore-server

Contributor

@okumin okumin left a comment

+1

@@ -943,6 +944,31 @@ public void testPartitionStatisticsOps() throws Exception {
Assert.assertEquals(0, stat.size());
}

@Test
Contributor

I verified that this test case fails, as expected, without the patch.

MetaException(message:See previous errors; Error executing SQL query "select "COLUMN_NAME", "COLUMN_TYPE", min("LONG_LOW_VALUE"), max("LONG_HIGH_VALUE"), min("DOUBLE_LOW_VALUE"), max("DOUBLE_HIGH_VALUE"), min(cast("BIG_DECIMAL_LOW_VALUE" as decimal)), max(cast("BIG_DECIMAL_HIGH_VALUE" as decimal)), sum("NUM_NULLS"), max("NUM_DISTINCTS"), max("AVG_COL_LEN"), max("MAX_COL_LEN"), sum("NUM_TRUES"), sum("NUM_FALSES"), avg(("LONG_HIGH_VALUE"-"LONG_LOW_VALUE")/cast("NUM_DISTINCTS" as decimal)),avg(("DOUBLE_HIGH_VALUE"-"DOUBLE_LOW_VALUE")/"NUM_DISTINCTS"),avg((cast("BIG_DECIMAL_HIGH_VALUE" as decimal)-cast("BIG_DECIMAL_LOW_VALUE" as decimal))/"NUM_DISTINCTS"),sum("NUM_DISTINCTS") from "PART_COL_STATS" inner join "PARTITIONS" on "PART_COL_STATS"."PART_ID" = "PARTITIONS"."PART_ID" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" where "DBS"."CTLG_NAME" = ? and "DBS"."NAME" = ? and "TBLS"."TBL_NAME" = ?  and "COLUMN_NAME" in (?) and "PARTITION_NAME" in (?,?,?) and "ENGINE" = ?  group by "COLUMN_NAME", "COLUMN_TYPE"".Failed to execute [select "COLUMN_NAME", "COLUMN_TYPE", min("LONG_LOW_VALUE"), max("LONG_HIGH_VALUE"), min("DOUBLE_LOW_VALUE"), max("DOUBLE_HIGH_VALUE"), min(cast("BIG_DECIMAL_LOW_VALUE" as decimal)), max(cast("BIG_DECIMAL_HIGH_VALUE" as decimal)), sum("NUM_NULLS"), max("NUM_DISTINCTS"), max("AVG_COL_LEN"), max("MAX_COL_LEN"), sum("NUM_TRUES"), sum("NUM_FALSES"), avg(("LONG_HIGH_VALUE"-"LONG_LOW_VALUE")/cast("NUM_DISTINCTS" as decimal)),avg(("DOUBLE_HIGH_VALUE"-"DOUBLE_LOW_VALUE")/"NUM_DISTINCTS"),avg((cast("BIG_DECIMAL_HIGH_VALUE" as decimal)-cast("BIG_DECIMAL_LOW_VALUE" as decimal))/"NUM_DISTINCTS"),sum("NUM_DISTINCTS") from "PART_COL_STATS" inner join "PARTITIONS" on "PART_COL_STATS"."PART_ID" = "PARTITIONS"."PART_ID" inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID" inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID" where "DBS"."CTLG_NAME" = ? and "DBS"."NAME" = ? 
and "TBLS"."TBL_NAME" = ?  and "COLUMN_NAME" in (?) and "PARTITION_NAME" in (?,?,?) and "ENGINE" = ?  group by "COLUMN_NAME", "COLUMN_TYPE"] with parameters [hive, testobjectstoredb1, testobjectstoretable1, test_part_col, test_part_col=a0, test_part_col=a1, test_part_col=a2, hive]
)
	at org.apache.hadoop.hive.metastore.MetastoreDirectSqlUtils.executeWithArray(MetastoreDirectSqlUtils.java:81)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2417)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2412)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.aggrStatsUseDB(MetaStoreDirectSql.java:2002)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.columnStatisticsObjForPartitionsBatch(MetaStoreDirectSql.java:1939)
	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.access$1800(MetaStoreDirectSql.java:134)

Contributor

@zhangbutao zhangbutao left a comment

+1 LGTM

@okumin okumin merged commit 625f965 into apache:master Apr 16, 2025
4 checks passed