-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-27956: Query based compactor implementation separation #4952
Conversation
Quality Gate passedThe SonarCloud Quality Gate passed, but some issues were introduced. 17 New issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @kuczoram,
Very nice change. The code is nice and clean. I'm not familiar with this code part, but is it possible to write unit test for these methods? I see most of these are private and protected. But especially those that just play with strings maybe it is worth to invest (IF possible)
CompactionQueryBuilderForInsertOnly(CompactionType compactionType, Operation operation, String resultTableName) { | ||
super(compactionType, operation, true, resultTableName); | ||
if (CompactionType.REBALANCE.equals(compactionType)) { | ||
throw new IllegalArgumentException("Rebalance compaction is supported only on full ACID tables!"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use the following exception message: Rebalance compaction can only be supported on full ACID tables.
But you can ignore this request. It is just style change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @aturoczy for the review!!
Sure, I can add some unit tests, it would totally make sense.
I can change this message, no problem. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good PR, only some minor things. I would introduce a factory for the query builders, however not strictly tied to it.
CompactionQueryBuilderForMajor(CompactionType compactionType, Operation operation, String resultTableName) { | ||
super(compactionType, operation, false, resultTableName); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CompactionType.Major
could be passed directly into super call (hardcoded) instead of taking it as a parameter. Passing other values can cause this class to output incorrect SQL. The same applies for the other implementations.
@@ -72,10 +72,9 @@ public boolean run(CompactorContext context) throws IOException { | |||
* See {@link org.apache.hadoop.hive.conf.HiveConf.ConfVars#SPLIT_GROUPING_MODE} for the config description. | |||
*/ | |||
private List<String> getCreateQueries(String fullName, Table t, String tmpTableLocation) { | |||
return Lists.newArrayList(new CompactionQueryBuilder( | |||
return Lists.newArrayList(new CompactionQueryBuilderForMajor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may create a factory class for returning the specific CompactionQueryBuilder
implementations. Using this approach have some benefit:
- the decisive logic can be hidden in the factory method
- the implementation classes can be hidden from the compactors (the factory returns the base class), especially if you move the builders into a sub-package. better decoupling between compactors and query builders.
- the
getCreateQueries(), getCompactionQueries(), getDropQueries()
method can be moved toQueryCompactor
*/ | ||
class CompactionQueryBuilderForMajor extends CompactionQueryBuilder { | ||
|
||
private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForMajor.class.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used
*/ | ||
class CompactionQueryBuilderForRebalance extends CompactionQueryBuilder { | ||
|
||
private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForRebalance.class.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not used
if (buckCols.size() > 0) { | ||
query.append("CLUSTERED BY (").append(StringUtils.join(buckCols, ",")).append(") "); | ||
List<Order> sortCols = sourceTab.getSd().getSortCols(); | ||
if (sortCols.size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe !buckCols.isEmpty()
*/ | ||
class CompactionQueryBuilderForInsertOnly extends CompactionQueryBuilder { | ||
|
||
private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForInsertOnly.class.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not used
/** | ||
* Builds query strings that help with query-based MAJOR compaction of CRUD. | ||
*/ | ||
class CompactionQueryBuilderForMajor extends CompactionQueryBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getDdlForCreate(), defineColumns(), addTblProperties() are exactly the same as in the Rebalance implementation. You may check if the duplications can be removed without over engineering it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor stuff. else looks good. Please check for duplications & the missing Override annotations for the overridden methods
return this; | ||
} | ||
|
||
protected void buildSelectClauseForInsert(StringBuilder query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing Override
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ayushtkn ,
I am so sorry, I completely missed to response here. Huge thanks for the review! I went through all of your comments and fixed the issues. I just somehow forgot to answer here. As this PR became very old, I opened a new one: #5192. That one contains the original changes and the adjustments based on the review comments and also a unit test for the query generation. If you have some time, I would be very happy if you could review it.
if (CompactionType.MAJOR.equals(compactionType) && sourcePartition != null || CompactionType.MINOR.equals( | ||
compactionType)) { | ||
if (sourceTab == null) { | ||
return; // avoid NPEs, don't throw an exception but skip this part of the query | ||
} | ||
cols = sourceTab.getSd().getCols(); | ||
} else { | ||
cols = null; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we adjust this same logic inside the switch case? Otherwise this if check logic looks like redundant
} | ||
} | ||
|
||
protected void getSourceForInsert(StringBuilder query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing Override
} | ||
} | ||
|
||
protected void buildWhereClauseForInsert(StringBuilder query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing override
protected void buildWhereClauseForInsert(StringBuilder query) { | ||
if (CompactionType.MAJOR.equals(compactionType) && sourcePartition != null && sourceTab != null) { | ||
List<String> vals = sourcePartition.getValues(); | ||
List<FieldSchema> keys = sourceTab.getPartitionKeys(); | ||
if (keys.size() != vals.size()) { | ||
throw new IllegalStateException("source partition values (" + Arrays.toString( | ||
vals.toArray()) + ") do not match source table values (" + Arrays.toString( | ||
keys.toArray()) + "). Failing compaction."); | ||
} | ||
|
||
query.append(" where "); | ||
for (int i = 0; i < keys.size(); ++i) { | ||
FieldSchema keySchema = keys.get(i); | ||
query.append(i == 0 ? "`" : " and `").append(keySchema.getName()).append("`="); | ||
if (!keySchema.getType().equalsIgnoreCase(ColumnType.BOOLEAN_TYPE_NAME)) { | ||
query.append("'").append(vals.get(i)).append("'"); | ||
} else { | ||
query.append(vals.get(i)); | ||
} | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is dupe in both classes here & CompactionQueryBuilderForMajor, refactor and put in one place only
if (buckCols.size() > 0) { | ||
query.append("CLUSTERED BY (").append(StringUtils.join(buckCols, ",")).append(") "); | ||
List<Order> sortCols = sourceTab.getSd().getSortCols(); | ||
if (sortCols.size() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to if (!sortCols.isEmpty()) {
/** | ||
* Builds query strings that help with query-based MAJOR compaction of CRUD. | ||
*/ | ||
class CompactionQueryBuilderForMajor extends CompactionQueryBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
super(compactionType, operation, false, resultTableName); | ||
} | ||
|
||
protected void buildSelectClauseForInsert(StringBuilder query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing Override annotation here & other, please add them for all other overriden methods as well
if (validWriteIdList != null) { | ||
if (validWriteIdList != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it is there twice?
* Define columns of the create query. | ||
*/ | ||
private void defineColumns(StringBuilder query) { | ||
if (sourceTab == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are doing this null & return here and even other places can't we wrap our logic within if(sourceTab!=null)
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Since this PR is very far from the master now, it is easier to open a new PR. |
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
Is the change a dependency upgrade?
How was this patch tested?