HIVE-29413: Avoid code duplication by updating getPartCols method for iceberg tables#6337
HIVE-29413: Avoid code duplication by updating getPartCols method for iceberg tables#6337ramitg254 wants to merge 2 commits intoapache:masterfrom
Conversation
| org.apache.hadoop.hive.ql.metadata.Table tbl = hive.getTable(database, table); | ||
| FetchWork work; | ||
| if (!tbl.getPartCols().isEmpty()) { | ||
| if (!tbl.getSupportedPartCols().isEmpty()) { |
There was a problem hiding this comment.
please keep the old method signature
| allColumns.addAll(table.getCols()); | ||
| allColumns.addAll(table.getPartCols()); | ||
| return allColumns; | ||
| return new ArrayList<>(table.getAllCols()); |
There was a problem hiding this comment.
It was already cloning it so I just replaced
List<FieldSchema> allColumns = new ArrayList<>();
allColumns.addAll(table.getCols());
allColumns.addAll(table.getPartCols());
with
new ArrayList<>(table.getAllCols());
|
|
||
| table.setFields(oldTable.getCols()); | ||
| table.setPartCols(oldTable.getPartCols()); | ||
| table.setPartCols(oldTable.getNativePartCols()); |
There was a problem hiding this comment.
this is super confusing!!!
table.setPartCols <- oldTable.getNativePartCols
There was a problem hiding this comment.
kept the old signature
| partitionColumns = table.hasNonNativePartitionSupport() ? | ||
| table.getStorageHandler().getPartitionKeys(table) : | ||
| table.getPartCols(); | ||
| partitionColumns = table.getSupportedPartCols(); |
There was a problem hiding this comment.
why not make it like partitionColumns = table.getPartCols() ?
| table.getCols() : partition.getCols()); | ||
| if (!desc.isFormatted()) { | ||
| cols.addAll(table.getPartCols()); | ||
| cols.addAll(table.getNativePartCols()); |
| } | ||
|
|
||
| public List<FieldSchema> getPartCols() { | ||
| public List<FieldSchema> getSupportedPartCols() { |
There was a problem hiding this comment.
we should have generic API - getPartCols
There was a problem hiding this comment.
updating getPartCols has direct effect on implementations related to alter statements and stats autogather implementation in which some of them are limited to native ones only ( like getPartCols() should return empty list for iceberg tables even if they have keys which is required for those implementations) which will introduce more ifs at those places and even if after all that if we get a green run we can't be sure that everything would be unaffected as we can't be sure that all possible scenarios affected from it are covered by tests.
so IMHO, I think we should go for this separate api for generic use cases and older one for native use cases.
I tried to explain my thinking and one such scenario more elaboratively in this comment: #6337 (comment)
There was a problem hiding this comment.
@deniskuzZ can you please validate this explanation for separate api seems reasonable or are we still leaning towards having a common api with updated implementation which can lead to further if else blocks introduction on those kind of areas mentioned above
| f_list.addAll(getCols()); | ||
| f_list.addAll(getPartCols()); | ||
| return f_list; | ||
| ArrayList<FieldSchema> allCols = new ArrayList<>(getCols()); |
| Map<String, String> colTypes = new HashMap<>(); | ||
| List<FieldSchema> partitionKeys = table.hasNonNativePartitionSupport() ? | ||
| table.getStorageHandler().getPartitionKeys(table) : table.getPartitionKeys(); | ||
| List<FieldSchema> partitionKeys = table.getSupportedPartCols(); |
There was a problem hiding this comment.
prev used api is table.getPartitionKeys()
There was a problem hiding this comment.
yes but it should be better to use the current one as table.getPartitionKeys() throws null when no keys are there which is different from table.getStorageHandler().getPartitionKeys(table) used here which gives empty arraylist for no keys.
so with current usage it will return empty list in both scenarios and rule out null pointer exception case
| } | ||
| } else { | ||
| partColSchema.addAll(tbl.getPartCols()); | ||
| partColSchema.addAll(tbl.getNativePartCols()); |
There was a problem hiding this comment.
keep the method signature unchanged
8043508 to
c6c6cdd
Compare
|
|
closing this pull request for now as we should have single method and other places where it is getting used should be updated in such a way that it should be inclusive of updated logic |
|
hi @ramitg254, since you put this activity on-hold, would you mind if I take it over? |
|
Hi, I am still working on it on this pr: #6413, it is in wip right now as I am resolving test failures corresponding to that. |



What changes were proposed in this pull request?
updated getPartCols method and introduced a boolean parameter in it to handle partition keys needed for non native tables and to get rid of this kind of code redundancy in some places:
Why are the changes needed?
to avoid code redundancy and have a common method to retrieve keys for both native and non native tables.
Does this PR introduce any user-facing change?
No
How was this patch tested?
locally