
Conversation

soumyakanti3578
Contributor

@soumyakanti3578 soumyakanti3578 commented Sep 8, 2025

What changes were proposed in this pull request?

Added a deserializer to convert JSON plans to logical plans (RelNodes)

Why are the changes needed?

While we can serialize a plan to JSON with explain cbo formatted, we didn't have a deserializer to convert it back to a RelNode.

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -pl ql -Dtest=org.apache.hadoop.hive.ql.optimizer.calcite.TestRelPlanParser


@soumyakanti3578 soumyakanti3578 changed the title [WIP] - DO NOT REVIEW - Deserializer hive 28197 HIVE-28197: Add deserializer to convert JSON plans to RelNodes Sep 24, 2025
@soumyakanti3578 soumyakanti3578 marked this pull request as ready for review September 24, 2025 17:46
Member

@zabetak zabetak left a comment


I didn't fully go through the changes, but I am sending a first batch of comments in order not to lose them. Let me finalize the review before you start making code changes. For comments that simply require an answer, feel free to share your thoughts.

String enable = pk.isEnable_cstr() ? "ENABLE" : "DISABLE";
String validate = pk.isValidate_cstr() ? "VALIDATE" : "NOVALIDATE";
String rely = pk.isRely_cstr() ? "RELY" : "NORELY";
enableValidateRely.put(pk.getNn_name(), ImmutableList.of(enable, validate, rely));
Member


Why is this change necessary?

Contributor Author


I don't remember exactly; I will have to look into it, but somewhere we were running into an error due to the immutability of the list during serialization or deserialization.

I will try to find out exactly what the issue was.
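
For context on that failure mode, mutating an immutable list throws at runtime. Below is a minimal self-contained sketch using `List.of` (which is immutable like Guava's `ImmutableList`), not the actual Hive code path:

```java
import java.util.ArrayList;
import java.util.List;

public class ImmutableListPitfall {
    public static void main(String[] args) {
        // List.of returns an immutable list, analogous to Guava's ImmutableList.
        List<String> flags = List.of("ENABLE", "VALIDATE", "RELY");
        boolean threw = false;
        try {
            // A deserializer that mutates the target list in place fails here.
            flags.add("NORELY");
        } catch (UnsupportedOperationException e) {
            threw = true;
        }
        System.out.println(threw); // true

        // Copying into a mutable list avoids the problem.
        List<String> mutable = new ArrayList<>(flags);
        mutable.add("NORELY");
        System.out.println(mutable.size()); // 4
    }
}
```

If the serializer or deserializer tries to add to the list in place, switching the field to a mutable `ArrayList` is the usual fix.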

Comment on lines +56 to +57
this(input.getCluster(), input.getTraitSet(), input.getInput(),
input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs"));
Member


Can the following work?

Suggested change
this(input.getCluster(), input.getTraitSet(), input.getInput(),
input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs"));
super(input);

If yes then can I do the same on the other RelNodes?

Contributor Author


Yes, we can call super directly in some of them. We cannot call super for anything that extends Join, as it doesn't have that constructor. There are others where we need to do some post-processing after calling super. I will make these changes in the next commit.

Comment on lines +118 to +129
public HiveMultiJoin(RelInput input) {
  this(
      input.getCluster(),
      input.getInputs(),
      input.getExpression("condition"),
      input.getRowType("rowType"),
      (List<Pair<Integer, Integer>>) input.get("getJoinInputsForHiveMultiJoin"),
      (List<JoinRelType>) input.get("getJoinTypesForHiveMultiJoin"),
      input.getExpressionList("filters")
  );
}

Member


Why do we need to modify this class? Normally we shouldn't need to serialize/deserialize MultiJoin expressions because they never appear in the final plan.

Contributor Author


I will look into it in detail as it was relevant in the earlier PR: 6b94a20

Comment on lines +30 to +37
static Stream<RelNode> stream(RelNode node) {
  return Stream.concat(
      Stream.of(node),
      node.getInputs()
          .stream()
          .flatMap(HiveRelNode::stream)
  );
}
Member


If we keep this we should add appropriate Javadoc. In addition, putting static methods in interfaces is not a good pattern; it would be better to move this to a utility class.

Other than that, the most common way to traverse a RelNode tree is via visitors and shuttles, so I am not sure whether this kind of Stream-based traversal will be well adopted.

Contributor Author


I will try to avoid doing this and rely on a shuttle.
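
For illustration, here are the two traversal styles side by side over a toy tree type. `Node` is a stand-in, not Calcite's RelNode, and `visit` only sketches the shape of a visitor; it is not the real RelShuttle API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class TraversalStyles {
    // Toy stand-in for RelNode: a node with child inputs.
    record Node(String name, List<Node> inputs) {}

    // Stream-based pre-order traversal, as in the snippet above.
    static Stream<Node> stream(Node node) {
        return Stream.concat(
            Stream.of(node),
            node.inputs().stream().flatMap(TraversalStyles::stream));
    }

    // Visitor-style traversal: the recursive shape shuttles follow.
    static void visit(Node node, List<String> out) {
        out.add(node.name());
        for (Node input : node.inputs()) {
            visit(input, out);
        }
    }

    public static void main(String[] args) {
        Node scan1 = new Node("scan1", List.of());
        Node scan2 = new Node("scan2", List.of());
        Node join = new Node("join", List.of(scan1, scan2));

        List<String> viaStream = stream(join).map(Node::name).toList();
        List<String> viaVisitor = new ArrayList<>();
        visit(join, viaVisitor);
        // Both produce the same pre-order: [join, scan1, scan2]
        System.out.println(viaStream.equals(viaVisitor));
    }
}
```

Both visit the same nodes in the same order; the visitor form is just the more established idiom in the Calcite codebase.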

* @param t
* @return
*/
private long getMaxNulls(RexCall call, HiveTableScan t) {
Member


Why are we changing the selectivity estimator?

Contributor Author


This can be reverted.

RelPlanParser parser = new RelPlanParser(cluster, conf);
RelNode deserializedPlan = parser.parse(jsonPlan);
// Apply partition pruning to compute partition list in HiveTableScan
deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner);
Member


Why do we need the partition list? Can't we deserialize the plan without it?

Contributor Author


We can deserialize the plan, but the partitionList of RelOptHiveTable will be empty. Also, the plans won't match up, as we print the plKey for HiveTableScan in CBO plans.

// Apply partition pruning to compute partition list in HiveTableScan
deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner);
if (LOG.isDebugEnabled()) {
LOG.debug("Deserialized plan: \n{}", RelOptUtil.toString(deserializedPlan));
Member


Consider removing logging from this API. Same reasons as the one mentioned before.

return null;
}

return HiveRelEnumTypes.toEnum(enumName);
Member


The use of HiveRelEnumTypes seems like a bit of overkill. Can't we simply create the instance directly and drop the entire RelEnumTypes copy?

Suggested change
return HiveRelEnumTypes.toEnum(enumName);
return HiveTableScanTrait.valueOf(enumName);

Contributor Author

@soumyakanti3578 soumyakanti3578 Oct 2, 2025


Actually, HiveRelEnumTypes was getting used in another place earlier, but I was able to remove it: https://github.com/apache/hive/pull/5131/files#diff-29a8fea85c2750c60547a7b4d3088d3f704a48f77f6a6b9eb933f1b5b527e033R334

I didn't realize that this is now the only place where it is used. I will update this in the next commit.
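
The suggested valueOf approach round-trips an enum by its name with no central registry. A minimal sketch; the enum constants here are hypothetical stand-ins, not the real HiveTableScanTrait values:

```java
public class EnumRoundTrip {
    // Hypothetical stand-in for HiveTableScanTrait; constants are illustrative.
    enum ScanTrait { FETCH_DELETED_ROWS, FETCH_INSERT_ONLY }

    public static void main(String[] args) {
        // Serialization writes the enum constant's name...
        String serialized = ScanTrait.FETCH_DELETED_ROWS.name();
        // ...and valueOf recovers the constant directly.
        ScanTrait trait = ScanTrait.valueOf(serialized);
        System.out.println(trait == ScanTrait.FETCH_DELETED_ROWS); // true
    }
}
```

Note that valueOf throws IllegalArgumentException for an unknown name, so a null/missing check would still be needed at the call site.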

Comment on lines +368 to +370
if (enumName == null) {
return null;
}
Member


Are there cases where we don't serialize the trait? Can we ever have null here?

Contributor Author


Yes, we don't always serialize tableScanTrait; we only serialize it when pw.getDetailLevel() == SqlExplainLevel.ALL_ATTRIBUTES.

}

JSONObject outJSONObject = new JSONObject(new LinkedHashMap<>());
outJSONObject.put("CBOPlan", serializeWithPlanWriter(plan, new HiveRelJsonImpl()));
Member


I don't think we need the extra wrapping attribute for "CBOPlan".

Contributor Author


explain cbo formatted output comes with the wrapper, so I guess it's a good idea to keep it?
