Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-27956: Query based compactor implementation separation #4952

Closed
wants to merge 1 commit into from

Conversation

kuczoram
Copy link
Contributor

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

Is the change a dependency upgrade?

How was this patch tested?

Copy link

sonarcloud bot commented Dec 15, 2023

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

17 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Copy link

@aturoczy aturoczy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kuczoram,

Very nice change. The code is nice and clean. I'm not familiar with this code part, but is it possible to write unit test for these methods? I see most of these are private and protected. But especially those that just play with strings maybe it is worth to invest (IF possible)

CompactionQueryBuilderForInsertOnly(CompactionType compactionType, Operation operation, String resultTableName) {
super(compactionType, operation, true, resultTableName);
if (CompactionType.REBALANCE.equals(compactionType)) {
throw new IllegalArgumentException("Rebalance compaction is supported only on full ACID tables!");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use the following exception message: Rebalance compaction can only be supported on full ACID tables.
But you can ignore this request. It is just style change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @aturoczy for the review!!
Sure, I can add some unit tests, it would totally make sense.
I can change this message, no problem. :)

Copy link
Contributor

@veghlaci05 veghlaci05 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good PR, only some minor things. I would introduce a factory for the query builders, however not strictly tied to it.

Comment on lines +49 to +51
CompactionQueryBuilderForMajor(CompactionType compactionType, Operation operation, String resultTableName) {
super(compactionType, operation, false, resultTableName);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CompactionType.Major could be passed directly into super call (hardcoded) instead of taking it as a parameter. Passing other values can cause this class to output incorrect SQL. The same applies for the other implementations.

@@ -72,10 +72,9 @@ public boolean run(CompactorContext context) throws IOException {
* See {@link org.apache.hadoop.hive.conf.HiveConf.ConfVars#SPLIT_GROUPING_MODE} for the config description.
*/
private List<String> getCreateQueries(String fullName, Table t, String tmpTableLocation) {
return Lists.newArrayList(new CompactionQueryBuilder(
return Lists.newArrayList(new CompactionQueryBuilderForMajor(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may create a factory class for returning the specific CompactionQueryBuilder implementations. Using this approach have some benefit:

  • the decisive logic can be hidden in the factory method
  • the implementation classes can be hidden from the compactors (the factory returns the base class), especially if you move the builders into a sub-package. better decoupling between compactors and query builders.
  • the getCreateQueries(), getCompactionQueries(), getDropQueries() method can be moved to QueryCompactor

*/
class CompactionQueryBuilderForMajor extends CompactionQueryBuilder {

private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForMajor.class.getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used

*/
class CompactionQueryBuilderForRebalance extends CompactionQueryBuilder {

private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForRebalance.class.getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not used

Comment on lines +190 to +193
if (buckCols.size() > 0) {
query.append("CLUSTERED BY (").append(StringUtils.join(buckCols, ",")).append(") ");
List<Order> sortCols = sourceTab.getSd().getSortCols();
if (sortCols.size() > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe !buckCols.isEmpty()

*/
class CompactionQueryBuilderForInsertOnly extends CompactionQueryBuilder {

private static final Logger LOG = LoggerFactory.getLogger(CompactionQueryBuilderForInsertOnly.class.getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used

/**
* Builds query strings that help with query-based MAJOR compaction of CRUD.
*/
class CompactionQueryBuilderForMajor extends CompactionQueryBuilder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getDdlForCreate(), defineColumns(), addTblProperties() are exactly the same as in the Rebalance implementation. You may check if the duplications can be removed without over engineering it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor stuff. else looks good. Please check for duplications & the missing Override annotations for the overridden methods

return this;
}

protected void buildSelectClauseForInsert(StringBuilder query) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Override

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ayushtkn ,
I am so sorry, I completely missed to response here. Huge thanks for the review! I went through all of your comments and fixed the issues. I just somehow forgot to answer here. As this PR became very old, I opened a new one: #5192. That one contains the original changes and the adjustments based on the review comments and also a unit test for the query generation. If you have some time, I would be very happy if you could review it.

Comment on lines +83 to +91
if (CompactionType.MAJOR.equals(compactionType) && sourcePartition != null || CompactionType.MINOR.equals(
compactionType)) {
if (sourceTab == null) {
return; // avoid NPEs, don't throw an exception but skip this part of the query
}
cols = sourceTab.getSd().getCols();
} else {
cols = null;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we adjust this same logic inside the switch case? Otherwise this if check logic looks like redundant

}
}

protected void getSourceForInsert(StringBuilder query) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing Override

}
}

protected void buildWhereClauseForInsert(StringBuilder query) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing override

Comment on lines +119 to +140
protected void buildWhereClauseForInsert(StringBuilder query) {
if (CompactionType.MAJOR.equals(compactionType) && sourcePartition != null && sourceTab != null) {
List<String> vals = sourcePartition.getValues();
List<FieldSchema> keys = sourceTab.getPartitionKeys();
if (keys.size() != vals.size()) {
throw new IllegalStateException("source partition values (" + Arrays.toString(
vals.toArray()) + ") do not match source table values (" + Arrays.toString(
keys.toArray()) + "). Failing compaction.");
}

query.append(" where ");
for (int i = 0; i < keys.size(); ++i) {
FieldSchema keySchema = keys.get(i);
query.append(i == 0 ? "`" : " and `").append(keySchema.getName()).append("`=");
if (!keySchema.getType().equalsIgnoreCase(ColumnType.BOOLEAN_TYPE_NAME)) {
query.append("'").append(vals.get(i)).append("'");
} else {
query.append(vals.get(i));
}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is dupe in both classes here & CompactionQueryBuilderForMajor, refactor and put in one place only

if (buckCols.size() > 0) {
query.append("CLUSTERED BY (").append(StringUtils.join(buckCols, ",")).append(") ");
List<Order> sortCols = sourceTab.getSd().getSortCols();
if (sortCols.size() > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to if (!sortCols.isEmpty()) {

/**
* Builds query strings that help with query-based MAJOR compaction of CRUD.
*/
class CompactionQueryBuilderForMajor extends CompactionQueryBuilder {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

super(compactionType, operation, false, resultTableName);
}

protected void buildSelectClauseForInsert(StringBuilder query) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing Override annotation here & other, please add them for all other overriden methods as well

Comment on lines +69 to +70
if (validWriteIdList != null) {
if (validWriteIdList != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why it is there twice?

* Define columns of the create query.
*/
private void defineColumns(StringBuilder query) {
if (sourceTab == null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are doing this null & return here and even other places can't we wrap our logic within if(sourceTab!=null)

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Feel free to reach out on the [email protected] list if the patch is in need of reviews.

@kuczoram
Copy link
Contributor Author

kuczoram commented Apr 11, 2024

Since this PR is very far from the master now, it is easier to open a new PR.
#5192

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants