HIVE-28197: Add deserializer to convert JSON plans to RelNodes #6067
Conversation
Didn't fully go through the changes yet, but I'm sending a first batch of comments so they don't get lost. Let me finalize the review before starting to make code changes. For comments that simply require an answer, feel free to share your thoughts.
String enable = pk.isEnable_cstr() ? "ENABLE" : "DISABLE";
String validate = pk.isValidate_cstr() ? "VALIDATE" : "NOVALIDATE";
String rely = pk.isRely_cstr() ? "RELY" : "NORELY";
enableValidateRely.put(pk.getNn_name(), ImmutableList.of(enable, validate, rely));
Why is this change necessary?
I don't remember exactly - I will have to look into it, but somewhere we were running into an error due to immutability of the list during serialization or deserialization.
I will try to find out exactly what the issue was.
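For context, a minimal, self-contained sketch of the kind of failure an immutable list can cause when a (de)serializer tries to mutate it in place. The constraint values mirror the snippet above, but the failing code path itself is an assumption until the exact issue is tracked down:

```java
import com.google.common.collect.ImmutableList;

import java.util.ArrayList;
import java.util.List;

public class ImmutableListPitfall {
  public static void main(String[] args) {
    List<String> constraints = ImmutableList.of("ENABLE", "VALIDATE", "RELY");

    // A (de)serializer that populates the target collection in place fails here:
    try {
      constraints.add("NORELY");
    } catch (UnsupportedOperationException e) {
      System.out.println("ImmutableList rejects mutation: " + e);
    }

    // Copying into a plain ArrayList sidesteps the problem:
    List<String> mutable = new ArrayList<>(constraints);
    mutable.add("NORELY");
    System.out.println(mutable); // [ENABLE, VALIDATE, RELY, NORELY]
  }
}
```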
this(input.getCluster(), input.getTraitSet(), input.getInput(),
    input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs"));
Can the following work?
Suggested change:
- this(input.getCluster(), input.getTraitSet(), input.getInput(),
-     input.getBitSet("group"), input.getBitSetList("groups"), input.getAggregateCalls("aggs"));
+ super(input);
If yes then can I do the same on the other RelNodes?
Yes, we can call super directly in some of them. We cannot call super for anything that extends Join, as it doesn't have the constructor. There are others where we need to do some post-processing after calling super. I will do these changes in the next commit.
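To illustrate the two situations described above, here is a small, self-contained sketch using hypothetical stand-in classes rather than the actual Calcite/Hive RelNodes: one parent exposes an input-based constructor that the child can delegate to via super, while the other does not, so the child has to unpack the fields itself and do its post-processing after calling super:

```java
import java.util.List;

/** Hypothetical stand-in for Calcite's RelInput. */
class PlanInput {
  List<Integer> getBitSet(String tag) {
    return List.of(0, 1);
  }
}

/** Case 1: the parent offers an input-based constructor, so the child just delegates. */
class ParentWithInputCtor {
  ParentWithInputCtor(PlanInput input) {
    // the parent reads all of its own fields from the input
  }
}

class ChildDelegating extends ParentWithInputCtor {
  ChildDelegating(PlanInput input) {
    super(input);
  }
}

/** Case 2: the parent (like anything extending Join) has no such constructor. */
class ParentWithoutInputCtor {
  ParentWithoutInputCtor(List<Integer> group) {
    // the parent only takes explicit fields
  }
}

class ChildUnpacking extends ParentWithoutInputCtor {
  private final int groupCount;

  ChildUnpacking(PlanInput input) {
    super(input.getBitSet("group"));
    // post-processing after the parent constructor has run
    this.groupCount = input.getBitSet("group").size();
  }
}
```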
public HiveMultiJoin(RelInput input) {
  this(
      input.getCluster(),
      input.getInputs(),
      input.getExpression("condition"),
      input.getRowType("rowType"),
      (List<Pair<Integer, Integer>>) input.get("getJoinInputsForHiveMultiJoin"),
      (List<JoinRelType>) input.get("getJoinTypesForHiveMultiJoin"),
      input.getExpressionList("filters")
  );
}
Why do we need to modify this class? Normally we shouldn't need to serialize/deserialize MultiJoin expressions because they never appear in the final plan.
I will look into it in detail as it was relevant in the earlier PR: 6b94a20
static Stream<RelNode> stream(RelNode node) {
  return Stream.concat(
      Stream.of(node),
      node.getInputs()
          .stream()
          .flatMap(HiveRelNode::stream)
  );
}
If we keep this we should add appropriate Javadoc. In addition, putting static methods in interfaces is not a good pattern; it is better to move it to a utility class.
Other than that, the most common way to traverse a RelNode tree is via visitors and shuttles, so I'm not sure this kind of Stream-based traversal will be widely adopted.
I will try to avoid doing this and rely on a shuttle.
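For comparison, a visitor-style equivalent of the Stream traversal above could look like the following sketch, assuming Calcite's RelVisitor is used to collect every node of the tree; the class and method names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.RelVisitor;

public final class RelNodeCollector {

  private RelNodeCollector() {
  }

  /** Collects the plan rooted at {@code root} in depth-first, pre-order fashion. */
  public static List<RelNode> collect(RelNode root) {
    List<RelNode> nodes = new ArrayList<>();
    new RelVisitor() {
      @Override
      public void visit(RelNode node, int ordinal, RelNode parent) {
        nodes.add(node);
        super.visit(node, ordinal, parent); // recurse into the node's inputs
      }
    }.go(root);
    return nodes;
  }
}
```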
 * @param t
 * @return
 */
private long getMaxNulls(RexCall call, HiveTableScan t) {
Why are we changing the selectivity estimator?
This can be reverted.
RelPlanParser parser = new RelPlanParser(cluster, conf);
RelNode deserializedPlan = parser.parse(jsonPlan);
// Apply partition pruning to compute partition list in HiveTableScan
deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner);
Why do we need the partition list? Can't we deserialize the plan without it?
We can deserialize the plan, but the partitionList of RelOptHiveTable will be empty. Also, the plans won't match up, as we do print plKey in CBO plans for HiveTableScan.
// Apply partition pruning to compute partition list in HiveTableScan
deserializedPlan = applyPartitionPruning(conf, deserializedPlan, cluster, planner);
if (LOG.isDebugEnabled()) {
  LOG.debug("Deserialized plan: \n{}", RelOptUtil.toString(deserializedPlan));
Consider removing the logging from this API, for the same reasons as mentioned before.
  return null;
}

return HiveRelEnumTypes.toEnum(enumName);
The use of HiveRelEnumTypes seems a bit of an overkill. Can't we simply create the instance directly and drop the entire RelEnumTypes copy?
Suggested change:
- return HiveRelEnumTypes.toEnum(enumName);
+ return HiveTableScanTrait.valueOf(enumName);
Actually, HiveRelEnumTypes was getting used in another place earlier, but I was able to remove it: https://github.com/apache/hive/pull/5131/files#diff-29a8fea85c2750c60547a7b4d3088d3f704a48f77f6a6b9eb933f1b5b527e033R334
I didn't realize that this is now the only place where it is used. I will update this in the next commit.
if (enumName == null) {
  return null;
}
Are there cases where we don't serialize the trait? Can we ever have null here?
Yes, we don't always serialize tableScanTrait. We only serialize it when pw.getDetailLevel() == SqlExplainLevel.ALL_ATTRIBUTES.
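As a sketch of the conditional emission being described, assuming the trait is written through Calcite's RelWriter#itemIf (the term name "tableScanTrait" and the helper wrapper are illustrative):

```java
import org.apache.calcite.rel.RelWriter;
import org.apache.calcite.sql.SqlExplainLevel;

final class TableScanTraitWriter {

  private TableScanTraitWriter() {
  }

  /** Emits the trait only when the EXPLAIN output asks for all attributes. */
  static RelWriter writeTrait(RelWriter pw, Object tableScanTrait) {
    // At any other detail level the JSON simply has no "tableScanTrait" entry,
    // which is why the deserializer must tolerate a null enum name.
    return pw.itemIf("tableScanTrait", tableScanTrait,
        pw.getDetailLevel() == SqlExplainLevel.ALL_ATTRIBUTES);
  }
}
```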
}

JSONObject outJSONObject = new JSONObject(new LinkedHashMap<>());
outJSONObject.put("CBOPlan", serializeWithPlanWriter(plan, new HiveRelJsonImpl()));
I don't think we need the extra wrapping attribute for "CBOPlan".
explain cbo formatted comes with the wrapper, so I guess it's a good idea to keep it?
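For illustration, assuming the serializer follows Calcite's RelJson format with a top-level "rels" array, the wrapped output would have roughly this shape (the per-node fields are abridged and illustrative; only the "CBOPlan" key is taken from the snippet above):

```json
{
  "CBOPlan": {
    "rels": [
      { "id": "0", "relOp": "HiveTableScan", "...": "..." },
      { "id": "1", "relOp": "HiveProject", "...": "..." }
    ]
  }
}
```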
What changes were proposed in this pull request?
Added a deserializer to convert JSON plans to logical plans (RelNodes)
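A sketch of the intended round trip, pieced together from the snippets visible in this PR: the RelPlanParser and HiveRelJsonImpl call shapes come from the diff above, while the package locations, the HiveConf parameter type, and the serializeWithPlanWriter helper are assumptions.

```java
// Sketch only: imports for RelPlanParser, HiveRelJsonImpl and the
// serializeWithPlanWriter helper are omitted because their exact packages are
// not visible in this PR excerpt.
import org.apache.calcite.plan.RelOptCluster;
import org.apache.calcite.rel.RelNode;
import org.apache.hadoop.hive.conf.HiveConf;

public class JsonPlanRoundTrip {

  /** Serializes a CBO plan to JSON and parses it back into a RelNode tree. */
  public static RelNode roundTrip(RelNode plan, RelOptCluster cluster, HiveConf conf) {
    // Writer side: the same JSON emitted by "explain cbo formatted"
    // (the return type of serializeWithPlanWriter is not shown in the excerpt).
    var jsonPlan = serializeWithPlanWriter(plan, new HiveRelJsonImpl());

    // Reader side: the deserializer added by this PR.
    RelPlanParser parser = new RelPlanParser(cluster, conf);
    return parser.parse(jsonPlan);
  }
}
```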
Why are the changes needed?
While we can serialize a plan to JSON with explain cbo formatted, we didn't have a deserializer to convert back to a RelNode.
Does this PR introduce any user-facing change?
No
How was this patch tested?