HIVE-28765: Iceberg: Incorrect partition statistics on time travel + partition evolution #5748

Open
wants to merge 3 commits into
base: master
from

okumin
Contributor

@okumin okumin commented Apr 6, 2025

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/HIVE-28765

We use table stats instead of partition stats when a read goes through a tag or branch on a table whose partitioning has evolved.

Why are the changes needed?

We would like to improve the query plan for a time-travel read with tagging and branching.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Updated an existing query file.

@okumin okumin changed the title from "[WIP] HIVE-28765: Iceberg: Incorrect partition statistics on time travel + partition evolution" to "HIVE-28765: Iceberg: Incorrect partition statistics on time travel + partition evolution" on Apr 12, 2025
@okumin okumin marked this pull request as ready for review April 12, 2025 07:34
return false;
}
Table table = IcebergTableUtil.getTable(conf, hmsTable.getTTable());
Snapshot snapshot = IcebergTableUtil.getTableSnapshot(table, hmsTable);
boolean isTimeTravel = snapshot != null && table.currentSnapshot() != null &&
Member

@deniskuzZ deniskuzZ Apr 16, 2025

@okumin, could we use equals here?

  public boolean isPartitioned(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
    if (!hmsTable.getTTable().isSetId()) {
      return false;
    }
    Table table = IcebergTableUtil.getTable(conf, hmsTable.getTTable());
    Snapshot snapshot = IcebergTableUtil.getTableSnapshot(table, hmsTable);
    
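    // A requested snapshot that differs from the table's current one means a time-travel, tag, or branch read.
    boolean isTimeTravelOrBranch = snapshot != null && !snapshot.equals(table.currentSnapshot());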
    if (isTimeTravelOrBranch && hasUndergonePartitionEvolution(table)) {
      return false;
    }
    return table.spec().isPartitioned();
  }
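
A minimal, self-contained sketch of the intent behind this check, using hypothetical stand-in types rather than the real Iceberg or Hive classes (`Snap`, `IsPartitionedSketch`, and the boolean parameters are assumptions made for illustration). The table is treated as unpartitioned only when the read targets a snapshot other than the table's current one and the table has undergone partition evolution.

```java
import java.util.Objects;

/** Hypothetical stand-ins for illustration; this is not the Iceberg or Hive API. */
final class IsPartitionedSketch {

  /** Minimal snapshot model whose equality covers both the id and the commit timestamp. */
  static final class Snap {
    final long id;
    final long timestampMillis;

    Snap(long id, long timestampMillis) {
      this.id = id;
      this.timestampMillis = timestampMillis;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Snap)) {
        return false;
      }
      Snap other = (Snap) o;
      return id == other.id && timestampMillis == other.timestampMillis;
    }

    @Override
    public int hashCode() {
      return Objects.hash(id, timestampMillis);
    }
  }

  /**
   * Mirrors the decision discussed in the review: fall back to "not partitioned"
   * only for a read of a non-current snapshot on a table with evolved partitioning.
   */
  static boolean isPartitioned(Snap requested, Snap current, boolean evolved, boolean specPartitioned) {
    boolean readsOtherSnapshot = requested != null && !requested.equals(current);
    if (readsOtherSnapshot && evolved) {
      return false;
    }
    return specPartitioned;
  }

  public static void main(String[] args) {
    Snap current = new Snap(2L, 2000L);
    Snap tagged = new Snap(1L, 1000L);
    System.out.println(isPartitioned(current, current, true, true));  // prints "true"
    System.out.println(isPartitioned(tagged, current, true, true));   // prints "false"
    System.out.println(isPartitioned(tagged, current, false, true));  // prints "true"
  }
}
```

In this sketch a null `requested` snapshot is treated like a read of the current table state, which matches the null guard in the snippet above.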

Contributor Author

I agree that it is simpler and more tolerant of unexpected snapshots, such as ones that have the same snapshot ID but different timestamps.
https://github.com/apache/iceberg/blob/c338323b2e9cd8a862fdf328ceacc1b59ea6275a/core/src/main/java/org/apache/iceberg/BaseSnapshot.java#L320-L332

I updated it. I kept the variable name isTimeTravel, on the assumption that reading a different snapshot through a tag or branch also counts as a time-travel query.
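
To illustrate the difference between the two comparisons, here is a minimal sketch with a hypothetical stand-in class (`Snap` is not Iceberg's `BaseSnapshot`, and the two fields it compares are an assumption chosen for illustration). Two snapshots share an ID but carry different timestamps, so an ID-only comparison treats them as the same snapshot while `equals` reports a difference.

```java
import java.util.Objects;

/** Hypothetical stand-in for illustration; this is not the Iceberg API. */
final class SnapshotEqualitySketch {

  /** Minimal snapshot model whose equality covers both the id and the commit timestamp. */
  static final class Snap {
    final long id;
    final long timestampMillis;

    Snap(long id, long timestampMillis) {
      this.id = id;
      this.timestampMillis = timestampMillis;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Snap)) {
        return false;
      }
      Snap other = (Snap) o;
      return id == other.id && timestampMillis == other.timestampMillis;
    }

    @Override
    public int hashCode() {
      return Objects.hash(id, timestampMillis);
    }
  }

  /** ID-only comparison: blind to any field other than the snapshot id. */
  static boolean differsById(Snap a, Snap b) {
    return a.id != b.id;
  }

  /** equals-based comparison: null-safe, and sensitive to every field equals() covers. */
  static boolean differsByEquals(Snap a, Snap b) {
    return !Objects.equals(a, b);
  }

  public static void main(String[] args) {
    Snap current = new Snap(7L, 1000L);
    Snap sameIdOtherTime = new Snap(7L, 2000L);
    System.out.println(differsById(sameIdOtherTime, current));      // prints "false"
    System.out.println(differsByEquals(sameIdOtherTime, current));  // prints "true"
  }
}
```

`Objects.equals` also removes the need for a separate null check on the current snapshot, which is part of why the `equals` form reads more simply.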

Member

@deniskuzZ deniskuzZ Apr 16, 2025

A forked branch could be ahead of main, so I wouldn't call it time-travel.

Member

@deniskuzZ deniskuzZ left a comment

+1, pending tests


3 participants