Skip to content

Incorrect optimization of X OR X #6182

@dllggyx

Description

@dllggyx

Hi, I encountered a non-equivalence issue with the code below.

def rewrite(schema_ddl, raw_sql, target_dialect='mysql'):
    schema = sqlglot.MappingSchema(dialect=target_dialect)
    create_expressions = sqlglot.parse(schema_ddl, read=target_dialect)

    # Iterate over each CREATE expression
    for create_expr in create_expressions:
        if isinstance(create_expr, exp.Create) and create_expr.kind == 'TABLE':
            table_expression = create_expr.this

            # Extract column definitions
            columns = {}
            if isinstance(table_expression, exp.Schema):
                for column_def in table_expression.expressions:
                    if isinstance(column_def, exp.ColumnDef):
                        col_name = column_def.this.name
                        col_type = column_def.kind.sql(dialect=target_dialect)
                        columns[col_name] = col_type
                table_name = table_expression.this
            else:
                table_name = table_expression

            schema.add_table(table_name, columns)

    expression = sqlglot.parse_one(raw_sql, read=target_dialect)
    optimized_expression = optimize(expression, schema=schema, dialect=target_dialect)
    optimized_sql = optimized_expression.sql(dialect=target_dialect, pretty=True)

    print(optimized_sql)

I ran the code with the following inputs:

schema_ddl:

DROP TABLE IF EXISTS `t0`;
CREATE TABLE `t0` (
    `c0` float NOT NULL,
    PRIMARY KEY (`c0`),
    UNIQUE KEY `c0` (`c0`),
    KEY `i0` (`c0` DESC) USING BTREE
) ;
INSERT INTO `t0` VALUES (-1160540000),(0),(0.344569),(0.408406),(0.648023),(429587000);

DROP TABLE IF EXISTS `t1`;
CREATE TABLE `t1` (
    `c0` float NOT NULL ,
    PRIMARY KEY (`c0`),
    UNIQUE KEY `c0` (`c0`),
    UNIQUE KEY `i2` (`c0`),
    KEY `i1` (`c0`)
) ;
INSERT INTO `t1` VALUES (-416404000),(0),(0.307241),(0.379278),(0.408406),(93),(429587000);

raw_sql:

SELECT DISTINCT CAST((- (BIT_COUNT((t1.c0) OR (t1.c0)))) AS SIGNED) AS ref0 FROM t1, t0;

Then I got the optimized sql:

SELECT DISTINCT
  CAST(-BIT_COUNT(`t1`.`c0`) AS SIGNED) AS `ref0`
FROM `t1` AS `t1`
CROSS JOIN `t0` AS `t0`

The execution result of raw sql in MySQL 8.0.4:

+------+
| ref0 |
+------+
|   -1 |
|    0 |
+------+
2 rows in set (0.002 sec)

The result of optimized sql:

+------+
| ref0 |
+------+
|  -49 |
|    0 |
|   -5 |
|  -14 |
+------+
4 rows in set (0.001 sec)

The optimizer may mistakenly treat a logical operation (X OR X) as a useless operation that can be simplified to X, just like #6104.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions