Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query by using materialized view query rewrite for iceberg table return partial result #54536

Open
Fulton-x opened this issue Dec 31, 2024 · 0 comments
Labels
type/bug Something isn't working

Comments

@Fulton-x
Copy link

Steps to reproduce the behavior (Required)

  1. Create iceberg table in AWS Athena,

    CREATE TABLE iceberg_test2 (
    prcdate date,
    price double,
    localcode string)
    PARTITIONED BY (month(prcdate))
    LOCATION 's3://bucket/directory/'
    TBLPROPERTIES (
    'table_type'='iceberg',
    'format'='PARQUET',
    'write_compression'='GZIP'
    );

  2. Create external catalog in StarRocks and materialized view

    CREATE EXTERNAL CATALOG iceberg_catalog_glue_test
    PROPERTIES
    (
    "type" = "iceberg",
    "iceberg.catalog.type" = "glue",
    "aws.glue.use_instance_profile" = "true",
    "aws.glue.region" = "us-east-2",
    "aws.s3.use_instance_profile" = "true",
    "aws.s3.region" = "us-east-2"
    );

    CREATE MATERIALIZED VIEW test1view (prcdate, price, localcode)
    PARTITION BY (date_trunc('month', prcdate))
    DISTRIBUTED BY HASH(localcode)
    REFRESH MANUAL
    PROPERTIES (
    "replicated_storage" = "true",
    "force_external_table_query_rewrite" = "CHECKED",
    "query_rewrite_consistency" = "CHECKED",
    "mv_rewrite_staleness_second" = "120",
    "replication_num" = "1",
    "enable_query_rewrite" = "true"
    )
    AS
    SELECT prcdate, price, localcode FROM iceberg_catalog_glue_test.dev_db.iceberg_test2;

  3. Query select DISTINCT prcdate from iceberg_catalog_glue_test.dev_db.iceberg_test2;

  4. Insert some data to Athena table

    insert into iceberg_test2 values(date('2024-10-03'),10,'test1');
    insert into iceberg_test2 values(date('2024-10-05'),10,'test1');
    insert into iceberg_test2 values(date('2024-11-03'),10,'test1');

  5. Refresh materilized view REFRESH MATERIALIZED VIEW test1view;

  6. Query result is expected select DISTINCT prcdate from iceberg_catalog_glue_test.dev_db.iceberg_test2;

  7. Further Insert some data to Athena table

    insert into iceberg_test2 values(date('2024-11-08'),10,'test1');

After a period of time, possible external table metadata had been refreshed
8. Query result is not expected select DISTINCT prcdate from iceberg_catalog_glue_test.dev_db.iceberg_test2;

Expected behavior (Required)

prcdate
2024-10-03
2024-10-05
2024-11-05
2024-11-08

Real behavior (Required)

prcdate
2024-10-05
2024-10-03

Query result of materialized view is expected SELECT distinct prcdate FROM test1view;

prcdate
2024-10-03
2024-10-05
2024-11-05

Here's the query explain,still materialized view was selected to use and looks one partition is not be queried explain select DISTINCT prcdate from iceberg_catalog_glue_test.dev_db.iceberg_test2

PLAN FRAGMENT 0
OUTPUT EXPRS:1: prcdate
PARTITION: UNPARTITIONED

RESULT SINK

5:EXCHANGE

PLAN FRAGMENT 1
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: 1: prcdate

STREAM DATA SINK
EXCHANGE ID: 05
UNPARTITIONED

4:AGGREGATE (merge finalize)
| group by: 1: prcdate
|
3:EXCHANGE

PLAN FRAGMENT 2
OUTPUT EXPRS:
colocate exec groups: ExecGroup{groupId=1, nodeIds=[0, 1, 2]}
PARTITION: RANDOM

STREAM DATA SINK
EXCHANGE ID: 03
HASH_PARTITIONED: 1: prcdate

2:AGGREGATE (update serialize)
| STREAMING
| group by: 1: prcdate
|
1:Project
| <slot 1> : 4: prcdate
|
0:OlapScanNode
TABLE: test1view
PREAGGREGATION: ON
partitions=1/2
rollup: test1view
tabletRatio=6/6
tabletList=40459,40461,40463,40465,40467,40469
cardinality=2
avgRowSize=8.0
MaterializedView: true

StarRocks version (Required)

  • 3.3.7
@Fulton-x Fulton-x added the type/bug Something isn't working label Dec 31, 2024
@Fulton-x Fulton-x changed the title Query rewrite for iceberg table return partial result Materialized view query rewrite for iceberg table return partial result Dec 31, 2024
@Fulton-x Fulton-x changed the title Materialized view query rewrite for iceberg table return partial result Query by using materialized view query rewrite for iceberg table return partial result Dec 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant