Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADBDEV-6314: Fix unexpected output for empty grouping set #1117

Open
wants to merge 3 commits into
base: adb-6.x-dev
Choose a base branch
from

Conversation

mos65o2
Copy link

@mos65o2 mos65o2 commented Nov 12, 2024

Prohibiting grouping sets with ungrouped attributes in TargetList

Problem:
Queries with grouping expressions and ungrouped attributes in TL could
produce unexpected output if the group did not have a primary key.

Example with 3 columns where "a" is a primary key:
SELECT a,b,c FROM t GROUP BY GROUPING SETS((a),(b));

Grouping "(a)" had correct output of columns "a", "b", "c". Attribute "b" in
group "(b)" is not a primary key, so it is not unique, so it can be grouped.
In this case, the output of column "c" was undefined. This also applied to
"GROUPING SETS ((a),())", ROLLUP and CUBE (detailed behavior is
described in tests with ungrouped check).
Grouping expressions were introduced in PostgreSQL 9.5 and such queries
were prohibited. And this patch aligns the behavior with that.

Solution:
Improve parse-stage validation for ungrouped columns when using grouping
expressions.

Changes:
Similar to PostgreSQL, the parsing stage for queries with grouping
expressions reads common attributes for all groups. If any, they are placed in
"groupClauseCommonVars". In case of ungrouped attributes in TL, this list is
used to check for PK in groups.
Compared to the original, the "groupClauses" list uses TargetEntry rather than
an expression. Added function to get common refsortgroupref in grouping
clauses. At the beginning, the function assumes that all expressions are found
in grouping clauses. By running through the groupings, missing expressions
are excluded from the common refs list. The patch does not change the
behavior of regular GROUP BY.

@BenderArenadata
Copy link

Allure report https://allure.adsw.io/launch/86054

@mos65o2 mos65o2 marked this pull request as ready for review November 18, 2024 13:45
Copy link

@red1452 red1452 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wrote, that changes are similar to Postgres 9.5, but I did not find the same sources there. Can you give the commit, which is example of your patch? Or can you backport such commit?

@whitehawk

This comment was marked as resolved.

@whitehawk

This comment was marked as resolved.

@mos65o2
Copy link
Author

mos65o2 commented Nov 19, 2024

You wrote, that changes are similar to Postgres 9.5, but I did not find the same sources there. Can you give the commit, which is example of your patch?

f3d3118 introduces support in PG 9.5. Also in GPDB7.

Or can you backport such commit?

I discussed this issue with @Stolb27. I only need to correct the parser behavior without porting. My changes only concern the ungrouped columns check. Whereas the patch in PostgreSQL makes changes to the planner and much more.

@red1452
Copy link

red1452 commented Nov 19, 2024

When I try the functional_deps solely (with optimizer enabled), it fails on my side.
It appears that it is due to GUC optimizer_enable_table_alias. If I set it to off in the test, the test passes. For ex with sample patch below:

diff --git a/src/test/regress/sql/functional_deps.sql b/src/test/regress/sql/functional_deps.sql
index 878a595259..5026b275f9 100644
--- a/src/test/regress/sql/functional_deps.sql
+++ b/src/test/regress/sql/functional_deps.sql
@@ -2,6 +2,9 @@
 
 -- Enable logs that allow to show GPORCA failed to produce a plan
 SET optimizer_trace_fallback = on;
+-- start_ignore
+SET optimizer_enable_table_alias = off;
+-- end_ignore
 
 CREATE TEMP TABLE articles (
     id int CONSTRAINT articles_pkey PRIMARY KEY,
@@ -429,3 +432,6 @@ DROP TABLE test_table3;
 
 RESET optimizer_trace_fallback;
 RESET optimizer;
+-- start_ignore
+RESET optimizer_enable_table_alias;
+-- end_ignore

I guess that the CI pipeline passes Ok, because some other test disables this GUC, and doesn't reset it back.

It is a separate issue (as it was presented before your patch). But I believe we need to keep tests as low coupled with each other, as possible. So it is better to fix this problem some day soon. Can you please re-check on your side and create a ticket for this problem?

Tests for GPDB6 is run with "optimizer_enable_table_alias=off". You do not need to set GUC in the test.

export OPTIMIZER_ENABLE_TABLE_ALIAS=off

@mos65o2
Copy link
Author

mos65o2 commented Nov 19, 2024

In the description:

"groping extensions"

groping -> grouping

And I guess not "extensions", but "expressions". Need to update it throughout the text.

Also, it is better to use past tense when describing previous wrong behaviour, and present tense for new/current behaviour.

I fixed the typo and the use of tense. I hope the meaning of "grouping expressions" and expressions (Vars) does not overlap. I will also correct the comment in the code.
Perhaps it would be worth mentioning "grouping expressions" in the title, rather than "grouping sets", since this affects ROLLUP and CUBE.

@mos65o2 mos65o2 requested a review from red1452 November 19, 2024 12:34
*/
List*
get_com_grouping_ressortgroupref(Query *qry, bool *hasGrouping, List *grp_tles){

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excessive line

@whitehawk

This comment was marked as resolved.

* check_ungrouped_columns().
*
Copy link

@whitehawk whitehawk Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we replace the entire loop below with a walker similar to CQueryMutators::GetGroupUniqueTleReferencesWalker ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is very similar to what we need. But in this cycle we need to get TLE, not refsortgroupref. Because in further processing either TLE or expr is used.

@whitehawk
Copy link

Followins query:

create table funcdep1(a int primary key, b int, c int, d int);
explain (costs off) select a, b, c from funcdep1 group by GROUPING SETS(a), GROUPING SETS(a, ());

On Postgres 17 works ok:

postgres=# select version();
create table funcdep1(a int primary key, b int, c int, d int);
explain (costs off) select a, b, c from funcdep1 group by GROUPING SETS(a), GROUPING SETS(a, ());
                                                 version                                                  
----------------------------------------------------------------------------------------------------------
 PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 row)

CREATE TABLE
         QUERY PLAN         
----------------------------
 HashAggregate
   Hash Key: a
   Hash Key: a
   ->  Seq Scan on funcdep1
(4 rows)

while on your branch it reports an error:

postgres=# select version();
create table funcdep1(a int primary key, b int, c int, d int);
explain (costs off) select a, b, c from funcdep1 group by GROUPING SETS(a), GROUPING SETS(a, ());
                                                                                                                  version                                                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.4.26 (Greenplum Database 6.27.1_arenadata57+dev.51.gae07e294ff build dev) on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit compiled on Nov 19 2024 11:50:41 (with assert checking)
(1 row)

CREATE TABLE
ERROR:  column "funcdep1.b" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: explain (costs off) select a, b, c from funcdep1 group by GR...

Can you please check?

@mos65o2
Copy link
Author

mos65o2 commented Nov 20, 2024

Followins query:
select a, b, c from funcdep1 group by GROUPING SETS(a), GROUPING SETS(a, ());
On Postgres 17 works ok

Also runs in PG 9.5. Maybe I should make a list of common attributes for a specific grouping expression.

@BenderArenadata
Copy link

Allure report https://allure.adsw.io/launch/86371

Also testcases are added
@mos65o2
Copy link
Author

mos65o2 commented Nov 20, 2024

It is necessary to select common attributes among top-level groupings.
This case degenerates into a simple GROUP BY, despite the absence of common attributes among the top-level GROUPING SETS, so these queries work:
create table t (a int primary key, b int, c int, d int);

select a, b, c from t group by GROUPING SETS(a), GROUPING SETS (a,()); ->
select a, b, c from t group by a, GROUPING SETS (a,());

select a, b, c from t group by GROUPING SETS(a), GROUPING SETS (b,()); ->
select a, b, c from t group by a, GROUPING SETS (b,());

It is necessary to take into account nesting. This query also degenerates into a regular GROUP BY and works:
select a, b, c from t group by GROUPING SETS (GROUPING SETS ((a), GROUPING SETS ((grouping sets(a),a)), GROUPING SETS (b,());

This query does not work:
select a, b, c from t group by GROUPING SETS (GROUPING SETS(a), GROUPING SETS (GROUPING SETS(b),a)), GROUPING SETS (b,());

With a composite key it works differently (does not work):
create table t2 (a int, b text, c int, PRIMARY KEY(a,c));
select a, b, c from t2 group by GROUPING SETS (a,c), GROUPING SETS (a,())
All queries are tested in PG 9.5.

@mos65o2
Copy link
Author

mos65o2 commented Nov 20, 2024

Followins query:

create table funcdep1(a int primary key, b int, c int, d int);
explain (costs off) select a, b, c from funcdep1 group by GROUPING SETS(a), GROUPING SETS(a, ());

while on your branch it reports an error:

I have changed the behavior. Please check. Pay attention to the tests.

@BenderArenadata
Copy link

Allure report https://allure.adsw.io/launch/86406

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants