HIVE-28765: Iceberg: Incorrect partition statistics on time travel + partition evolution #5748
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
return false;
}
Table table = IcebergTableUtil.getTable(conf, hmsTable.getTTable());
Snapshot snapshot = IcebergTableUtil.getTableSnapshot(table, hmsTable);
boolean isTimeTravel = snapshot != null && table.currentSnapshot() != null &&
@okumin, could we use equals here?
public boolean isPartitioned(org.apache.hadoop.hive.ql.metadata.Table hmsTable) {
  if (!hmsTable.getTTable().isSetId()) {
    return false;
  }
  Table table = IcebergTableUtil.getTable(conf, hmsTable.getTTable());
  Snapshot snapshot = IcebergTableUtil.getTableSnapshot(table, hmsTable);
  // True only when the snapshot being read is not the table's current snapshot.
  boolean isTimeTravelOrBranch = snapshot != null && !snapshot.equals(table.currentSnapshot());
  if (isTimeTravelOrBranch && hasUndergonePartitionEvolution(table)) {
    return false;
  }
  return table.spec().isPartitioned();
}
I agree that it is simpler and more tolerant of unexpected snapshots, such as ones that have the same snapshot id but different timestamps.
https://github.com/apache/iceberg/blob/c338323b2e9cd8a862fdf328ceacc1b59ea6275a/core/src/main/java/org/apache/iceberg/BaseSnapshot.java#L320-L332
I updated it. I kept the variable name, isTimeTravel, on the assumption that reading a different snapshot through a tag or branch also counts as a time-travel query.
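For readers following the thread, here is a minimal sketch of the difference between the two checks. It uses a hypothetical SnapshotInfo class instead of the real org.apache.iceberg.Snapshot, and it assumes that the earlier revision compared snapshot ids (the diff line above is truncated, so that is a guess) and that equals also looks at the timestamp. The class, field, and method names are all made up for illustration.

```java
import java.util.Objects;

final class SnapshotEqualityDemo {

  // Hypothetical stand-in for a snapshot: only an id and a commit timestamp.
  static final class SnapshotInfo {
    final long snapshotId;
    final long timestampMillis;

    SnapshotInfo(long snapshotId, long timestampMillis) {
      this.snapshotId = snapshotId;
      this.timestampMillis = timestampMillis;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof SnapshotInfo)) {
        return false;
      }
      SnapshotInfo other = (SnapshotInfo) o;
      // Assumption: equality covers more than the id (here, the timestamp too).
      return snapshotId == other.snapshotId && timestampMillis == other.timestampMillis;
    }

    @Override
    public int hashCode() {
      return Objects.hash(snapshotId, timestampMillis);
    }
  }

  // Id-based check: needs an explicit null guard on the current snapshot.
  static boolean differsById(SnapshotInfo read, SnapshotInfo current) {
    return read != null && current != null && read.snapshotId != current.snapshotId;
  }

  // equals-based check: a null current snapshot simply compares as "different".
  static boolean differsByEquals(SnapshotInfo read, SnapshotInfo current) {
    return read != null && !read.equals(current);
  }

  public static void main(String[] args) {
    SnapshotInfo current = new SnapshotInfo(42L, 1_000L);
    SnapshotInfo sameIdOtherTime = new SnapshotInfo(42L, 2_000L);

    System.out.println(differsById(sameIdOtherTime, current));     // false
    System.out.println(differsByEquals(sameIdOtherTime, current)); // true
    System.out.println(differsByEquals(sameIdOtherTime, null));    // true
  }
}
```

In this sketch the equals-based check is the more conservative one: anything that is not provably the current snapshot is treated as a non-current read.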
A forked branch could be ahead of main, so I wouldn't call it time travel.
+1, pending tests
What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/HIVE-28765
We use table stats instead of partition stats when a table that has undergone partition evolution is read at a snapshot other than the current one, such as through time travel, a tag, or a branch.
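The decision can be pictured roughly as follows. This is only a sketch under stated assumptions, not the code in this patch: StatsSource, choose, and its parameters are hypothetical names, snapshots are compared by id for brevity, and "partition evolution" is approximated as the table having more than one partition spec.

```java
final class StatsSourceSketch {

  // Hypothetical label for which statistics the planner would rely on.
  enum StatsSource { TABLE, PARTITION }

  /**
   * @param readSnapshotId         id of the snapshot being read, or null if none was resolved
   * @param currentSnapshotId      id of the table's current snapshot, or null for an empty table
   * @param specCount              number of partition specs (assumption: more than one means evolution)
   * @param currentSpecPartitioned whether the current spec has partition fields
   */
  static StatsSource choose(Long readSnapshotId, Long currentSnapshotId,
      int specCount, boolean currentSpecPartitioned) {
    // A read is "non-current" for time travel, tags, and branches alike.
    boolean nonCurrentRead = readSnapshotId != null && !readSnapshotId.equals(currentSnapshotId);
    boolean evolved = specCount > 1;
    if (nonCurrentRead && evolved) {
      // Partition stats may not line up with the snapshot being read, so fall back.
      return StatsSource.TABLE;
    }
    return currentSpecPartitioned ? StatsSource.PARTITION : StatsSource.TABLE;
  }

  public static void main(String[] args) {
    System.out.println(choose(1L, 2L, 2, true)); // TABLE
    System.out.println(choose(2L, 2L, 2, true)); // PARTITION
    System.out.println(choose(1L, 2L, 1, true)); // PARTITION
  }
}
```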
Why are the changes needed?
We would like to improve the query plan for a time-travel read with tagging and branching.
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
Updated an existing query file.