-
Notifications
You must be signed in to change notification settings - Fork 461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GLUTEN-8379][VL] Support query trace #8380
Conversation
Run Gluten Clickhouse CI on x86 |
Run Gluten Clickhouse CI on x86 |
We are also looking into this. I tried it locally but encountered the error below. Would you fix it in your PR?
|
No, I just add the config. The exception is because we don't support to serialize ValueStream Node to json, we could support it, it is not hard. |
QueryTrace only supports some operators, we need to extend it. |
I think we might need to convert the exception below into a warning log to tolerate the issue you mentioned, where only some operators are supported. |
And we also need to register spark fucntions in |
If you don't set the trace config, we won't run into this exception. The exception indicates you could not enable query trace in that case. Looks like we still have further more work to do to enable query trace in Gluten |
@jinchengchenghh what's the difference/advantage from our microbenchmark tool? |
Our benchmark accepts a total velox plan, but query trace can specify the plan node id, it can save the input of any plan node @FelixYBW |
Run Gluten Clickhouse CI on x86 |
Can we consolidate the two functionalities? They sound overlap more or less? |
If it can save the input of any plan node and reproduce the node only, it will be useful to debug. |
@@ -558,6 +558,18 @@ std::unordered_map<std::string, std::string> WholeStageResultIterator::getQueryC | |||
configs[velox::core::QueryConfig::kSparkLegacyDateFormatter] = "false"; | |||
} | |||
|
|||
const auto setIfExists = [&](const std::string& glutenKey, const std::string& veloxKey) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this func is common, should we declare it as static member function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good suggestion, I think the config needs to do the refactor, now if the config not set by java side, we will set a default value of velox config to velox query config, we should change to not set the config.
If velox config default value is changed, we can change automatic with them if we don't have different default value with velox.
69a5f31
to
f8b94a4
Compare
Run Gluten ClickHouse CI on ARM |
Run Gluten ClickHouse CI on ARM |
2 similar comments
Run Gluten ClickHouse CI on ARM |
Run Gluten ClickHouse CI on ARM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible we finally get the generic benchmark and this feature merged in Gluten? Since both are invasive to the core execution code path. E.g., Generic benchmark requires for a individual parameter passed through JNI bridge and this feature requires for changing the ValueStreamNode
to ValueNode
. Do both features have own users as of now?
std::string getQueryId(const std::unordered_map<std::string, std::string>& confMap) { | ||
auto it = confMap.find(kQueryTraceQueryId); | ||
if (it != confMap.end()) { | ||
return it->second; | ||
} | ||
return ""; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little bit confused with the empty query ID. Can we always give a meaningful ID to Velox?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we uses the applicationId_stageId as the queryId?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we uses the applicationId_stageId as the queryId?
Do we have applicationId passed through JNI? Or could just align it with the task name somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If applicationId
is important for the feature and not yet passed through JNI, could also add it and use it for query context and task names / IDs in another PR. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The queryId concept is not really query id, it is velox query id, not Spark queryId, now one query for one task, so I think we can use the same name with task name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we don't pass the applicationId
VELOX_CHECK_LT(streamIdx, inputIters_.size(), "Could not find stream index {} in input iterator list.", streamIdx); | ||
const auto iterator = inputIters_[streamIdx]; | ||
while (iterator->hasNext()) { | ||
auto cb = VeloxColumnarBatch::from(defaultLeafVeloxMemoryPool().get(), iterator->next()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the memory managed by Spark? Given that defaultLeafVeloxMemoryPool
is global.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only used in benchmark, so we don't need to track it.
docs/developers/QueryTrace.md
Outdated
--- | ||
layout: page | ||
title: How To Use Gluten | ||
nav_order: 1 | ||
parent: Developer Overview | ||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The header needs to update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would you update title
(and perhaps other fields as well) in the header? Thanks.
@jinchengchenghh It is awesome to support query tracing in gluten. By the way, I am planning to support |
Run Gluten ClickHouse CI on ARM |
The GenericBenchmark is frequently used by us CC @JkSelf , it is really usefully for debug in cpp side, we can use vscode to do step-by-step debug. And query trace is widely used in Velox to do correctness verification and performance profiling, so I think we need to enable it in Gluten. and with this optimization #8380 (comment), we could drop the ValueStreamNode to ValusNode replace. @zhztheplayer |
As discussed with @xiaoxmeng and @jinchengchenghh offline, we could add simple |
I tried the serialize ValueStreamNode as empty serialization, but failed by deserialization. The plan node deserialization is registered in velox, but ValueStreamNode exists in gluten, so we cannot do that.
|
Can you help review again? Thanks! @zhztheplayer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits. Thank you!
docs/developers/QueryTrace.md
Outdated
--- | ||
layout: page | ||
title: How To Use Gluten | ||
nav_order: 1 | ||
parent: Developer Overview | ||
--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Would you update title
(and perhaps other fields as well) in the header? Thanks.
.createWithDefault(false) | ||
|
||
val QUERY_TRACE_DIR = buildConf("spark.gluten.sql.columnar.backend.velox.queryTraceDir") | ||
.doc("Base dir of a query to store tracing data.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if one leaves this empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the config is not set correctly, Velox will throw exception, so as other config.
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
W0120 15:09:59.768762 3040717 GenericBenchmark.cc:253] Setting CPU for thread 0 to 0
terminate called after throwing an instance of 'facebook::velox::VeloxUserError'
what(): Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Query trace enabled but the trace dir is not set
Retriable: False
Expression: !queryConfig.queryTraceDir().empty()
Function: maybeMakeTraceConfig
File: /mnt/DP_disk1/code/incubator-gluten/ep/build-velox/build/velox_ep/velox/exec/Task.cpp
Line: 3027
Stack trace:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we set it with default /tmp/query_trace
?
BTW, if we want expose the config to user, we need remove the internal()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config is only used in internal, user will not set it by this config name, because we need to enable query trace in benchmark, so user set it by FLAGS_XX in benchmark
std::string getQueryId(const std::unordered_map<std::string, std::string>& confMap) { | ||
auto it = confMap.find(kQueryTraceQueryId); | ||
if (it != confMap.end()) { | ||
return it->second; | ||
} | ||
return ""; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we uses the applicationId_stageId as the queryId?
Do we have applicationId passed through JNI? Or could just align it with the task name somehow.
Run Gluten ClickHouse CI on ARM |
4ac6780
to
f82e5a6
Compare
Run Gluten ClickHouse CI on ARM |
Thank you, Chengcheng for your effort! |
Run the MicroBenchmark to generate stage level plan and then enable query trace in benchmark to profile node level query. Benchmark with query trace enabled replaces ValueStreamNode which is hard to serialize to ValuesNode. This issue may be fixed by plan serialization optimization that only serializes the plan node to profile in velox query trace. facebookincubator/velox#12084
Run the MicroBenchmark to generate stage level plan and then enable query trace in benchmark to profile node level query. Benchmark with query trace enabled replaces ValueStreamNode which is hard to serialize to ValuesNode. This issue may be fixed by plan serialization optimization that only serializes the plan node to profile in velox query trace. facebookincubator/velox#12084