
Conversation

soumyakanti3578
Contributor

@soumyakanti3578 soumyakanti3578 commented Sep 8, 2025

What changes were proposed in this pull request?

Added a deserializer to convert JSON plans to logical plans (RelNodes)

Why are the changes needed?

While we can serialize a plan to JSON with explain cbo formatted, we didn't have a deserializer to convert it back to a RelNode.

Does this PR introduce any user-facing change?

No

How was this patch tested?

mvn test -pl ql -Dtest=org.apache.hadoop.hive.ql.optimizer.calcite.TestRelPlanParser


@soumyakanti3578 soumyakanti3578 changed the title [WIP] - DO NOT REVIEW - Deserializer hive 28197 HIVE-28197: Add deserializer to convert JSON plans to RelNodes Sep 24, 2025
@soumyakanti3578 soumyakanti3578 marked this pull request as ready for review September 24, 2025 17:46
Member

@zabetak zabetak left a comment


I didn't fully go through the changes, but I am sending a first batch of comments in order not to lose them. Let me finalize the review before you start making code changes. For comments that simply require an answer, feel free to share your thoughts.

String enable = pk.isEnable_cstr() ? "ENABLE" : "DISABLE";
String validate = pk.isValidate_cstr() ? "VALIDATE" : "NOVALIDATE";
String rely = pk.isRely_cstr() ? "RELY" : "NORELY";
enableValidateRely.put(pk.getNn_name(), ImmutableList.of(enable, validate, rely));
Member


Why is this change necessary?

Contributor Author


I don't remember exactly; I will have to look into it, but somewhere we were running into an error due to the immutability of the list during serialization or deserialization.

I will try to find out exactly what the issue was.
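
For context on that failure mode, mutating an immutable list throws at runtime. Below is a minimal self-contained sketch using `List.of` (which is immutable like Guava's `ImmutableList`), not the actual Hive code path:

```java
import java.util.ArrayList;
import java.util.List;

public class ImmutableListPitfall {
    public static void main(String[] args) {
        // List.of returns an immutable list, analogous to Guava's ImmutableList.
        List<String> flags = List.of("ENABLE", "VALIDATE", "RELY");
        boolean threw = false;
        try {
            // A deserializer that mutates the target list in place fails here.
            flags.add("NORELY");
        } catch (UnsupportedOperationException e) {
            threw = true;
        }
        System.out.println(threw); // true

        // Copying into a mutable list avoids the problem.
        List<String> mutable = new ArrayList<>(flags);
        mutable.add("NORELY");
        System.out.println(mutable.size()); // 4
    }
}
```

If the serializer or deserializer tries to add to the list in place, switching the field to a mutable `ArrayList` is the usual fix.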

Comment on lines +56 to +57
this(input.getCluster(), input.getTraitSet(), input.getInput(),
input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs"));
Member


Can the following work?

Suggested change
this(input.getCluster(), input.getTraitSet(), input.getInput(),
input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs"));
super(input);

If yes then can I do the same on the other RelNodes?

Contributor Author


Yes, we can call super directly in some of them. We cannot call super for anything that extends Join, as it doesn't have that constructor. There are others where we need to do some post-processing after calling super. I will make these changes in the next commit.

Comment on lines +118 to +129
public HiveMultiJoin(RelInput input) {
  this(
      input.getCluster(),
      input.getInputs(),
      input.getExpression("condition"),
      input.getRowType("rowType"),
      (List<Pair<Integer, Integer>>) input.get("getJoinInputsForHiveMultiJoin"),
      (List<JoinRelType>) input.get("getJoinTypesForHiveMultiJoin"),
      input.getExpressionList("filters")
  );
}

Member


Why do we need to modify this class? Normally we shouldn't need to serialize/deserialize MultiJoin expressions because they never appear in the final plan.

Contributor Author


I will look into it in detail as it was relevant in the earlier PR: 6b94a20

Comment on lines +30 to +37
static Stream<RelNode> stream(RelNode node) {
  return Stream.concat(
      Stream.of(node),
      node.getInputs()
          .stream()
          .flatMap(HiveRelNode::stream)
  );
}
Member


If we keep this we should add appropriate Javadoc. In addition, putting static methods in interfaces is not a good pattern; it would be better to move this to a utility class.

Other than that, the most common way to traverse a RelNode tree is via visitors and shuttles, so I am not sure whether this kind of Stream-based traversal will be well adopted.

Contributor Author


I will try to avoid doing this and rely on a shuttle.
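
For illustration, here are the two traversal styles side by side over a toy tree type. `Node` is a stand-in, not Calcite's RelNode, and `visit` only sketches the shape of a visitor; it is not the real RelShuttle API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class TraversalStyles {
    // Toy stand-in for RelNode: a node with child inputs.
    record Node(String name, List<Node> inputs) {}

    // Stream-based pre-order traversal, as in the snippet above.
    static Stream<Node> stream(Node node) {
        return Stream.concat(
            Stream.of(node),
            node.inputs().stream().flatMap(TraversalStyles::stream));
    }

    // Visitor-style traversal: the recursive shape shuttles follow.
    static void visit(Node node, List<String> out) {
        out.add(node.name());
        for (Node input : node.inputs()) {
            visit(input, out);
        }
    }

    public static void main(String[] args) {
        Node scan1 = new Node("scan1", List.of());
        Node scan2 = new Node("scan2", List.of());
        Node join = new Node("join", List.of(scan1, scan2));

        List<String> viaStream = stream(join).map(Node::name).toList();
        List<String> viaVisitor = new ArrayList<>();
        visit(join, viaVisitor);
        // Both produce the same pre-order: [join, scan1, scan2]
        System.out.println(viaStream.equals(viaVisitor));
    }
}
```

Both visit the same nodes in the same order; the visitor form is just the more established idiom in the Calcite codebase.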

* @param t
* @return
*/
private long getMaxNulls(RexCall call, HiveTableScan t) {
Member


Why are we changing the selectivity estimator?

Contributor Author


This can be reverted.

RelPlanParser parser = new RelPlanParser(cluster, conf);
RelNode deserializedPlan = parser.parse(jsonPlan);
// Apply partition pruning to compute partition list in HiveTableScan
deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner);
Member


Why do we need the partition list? Can't we deserialize the plan without it?

Contributor Author


We can deserialize the plan, but the partitionList of RelOptHiveTable will be empty. Also, the plans won't match up, as we print the plKey for HiveTableScan in CBO plans.

// Apply partition pruning to compute partition list in HiveTableScan
deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner);
if (LOG.isDebugEnabled()) {
LOG.debug("Deserialized plan: \n{}", RelOptUtil.toString(deserializedPlan));
Member


Consider removing logging from this API. Same reasons as the one mentioned before.

return null;
}

return HiveRelEnumTypes.toEnum(enumName);
Member


The use of HiveRelEnumTypes seems like a bit of overkill. Can't we simply create the instance directly and drop the entire RelEnumTypes copy?

Suggested change
return HiveRelEnumTypes.toEnum(enumName);
return HiveTableScanTrait.valueOf(enumName);

Contributor Author

@soumyakanti3578 soumyakanti3578 Oct 2, 2025


Actually, HiveRelEnumTypes was getting used in another place earlier, but I was able to remove it: https://github.com/apache/hive/pull/5131/files#diff-29a8fea85c2750c60547a7b4d3088d3f704a48f77f6a6b9eb933f1b5b527e033R334

I didn't realize that this is now the only place where it is used. I will update this in the next commit.
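
The suggested valueOf approach round-trips an enum by its name with no central registry. A minimal sketch; the enum constants here are hypothetical stand-ins, not the real HiveTableScanTrait values:

```java
public class EnumRoundTrip {
    // Hypothetical stand-in for HiveTableScanTrait; constants are illustrative.
    enum ScanTrait { FETCH_DELETED_ROWS, FETCH_INSERT_ONLY }

    public static void main(String[] args) {
        // Serialization writes the enum constant's name...
        String serialized = ScanTrait.FETCH_DELETED_ROWS.name();
        // ...and valueOf recovers the constant directly.
        ScanTrait trait = ScanTrait.valueOf(serialized);
        System.out.println(trait == ScanTrait.FETCH_DELETED_ROWS); // true
    }
}
```

Note that valueOf throws IllegalArgumentException for an unknown name, so a null/missing check would still be needed at the call site.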

Comment on lines +368 to +370
if (enumName == null) {
return null;
}
Member


Are there cases where we don't serialize the trait? Can we ever have null here?

Contributor Author


Yes, we don't always serialize tableScanTrait; we only serialize it when pw.getDetailLevel() == SqlExplainLevel.ALL_ATTRIBUTES.

}

JSONObject outJSONObject = new JSONObject(new LinkedHashMap<>());
outJSONObject.put("CBOPlan", serializeWithPlanWriter(plan, new HiveRelJsonImpl()));
Member


I don't think we need the extra wrapping attribute for "CBOPlan".

Contributor Author


explain cbo formatted output comes with the wrapper, so I guess it's a good idea to keep it?
