-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NNPA] Check return values of zdnn calls. (follow-up of PR#2267) #2338
base: main
Are you sure you want to change the base?
Conversation
… krnl-to-llvm conversion. Signed-off-by: Yasushi Negishi <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
Signed-off-by: Haruki Imai <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about having two options (feel free to change the option names):
nnpa-check-zdnn-return-value
to emit code that checks zdnn return valuesnnpa-stop-if-zdnn-failed
to terminate if the zdnn function returns error, this is valid ifnnpa-check-zddn-return-value
is on.
// Exit | ||
if (funcCallErrorExit) { | ||
MLIRContext *context = rewriter.getContext(); | ||
Type int32Ty = IntegerType::get(context, 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use i64 since we have an issue with i32 on big endian machine. More info: #778
// compare the return value with ref | ||
std::string errorMsg = | ||
"onnx-mlir: Error in zDNN call(" + apiIdStr(apiId) + "): returned "; | ||
equalOrExit(module, rewriter, loc, ref, ret, errorMsg, funcCallErrorExit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since NNPA calls already check the return value, I recommend to make this check optional when we need by providing an option to onnx-mlir
.
Type int32Ty = IntegerType::get(context, 32); | ||
Value one = create.math.constant(int32Ty, 1); | ||
FlatSymbolRefAttr exitRef = krnl::getOrInsertExit(rewriter, module); | ||
create.llvm.call({}, exitRef, {one}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In onnx-mlir, we return null
instead of exit
. Could you change it to return null
?
llvm::cl::desc("Execution failed when external function call failed." | ||
" Currently only zDNN calls in NNPA are supported."), | ||
llvm::cl::init(false), llvm::cl::cat(OnnxMlirOptions)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this inside the NNPA folder, since it is for NNPA only. Perhaps, prefixing NNPA options with nnpa
is easier for user.
%0 = memref.alloc() : memref<10x10xf32> | ||
%1 = memref.alloc() : memref<1x1x32x64xf32> | ||
"zlow.stick"(%0, %1) : (memref<10x10xf32>, memref<1x1x32x64xf32>) -> () | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following tests are sharing the same pattern. It looks redundant to test for all operations.
Instead, you can have a test with two or three zlow ops. e.g.
y = zlow.stick(x)
z = zlow.relu(y)
out = zlow.unstick(z)
return out
// CHECK: llvm.call @printf([[VAR_73_1_]], [[VAR_69_1_]]) : (!llvm.ptr, i32) -> () | ||
// CHECK: [[VAR_74_1_:%.+]] = llvm.mlir.constant(1 : i32) : i32 | ||
// CHECK: llvm.call @exit([[VAR_74_1_]]) : (i32) -> () | ||
// CHECK: llvm.br ^bb2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you call exit
, llvm.br ^bb2
looks like a dead code?
@AlexandreEichenberger They don't have any plan to update error handling in zDNN now. The zDNN returns error code, but currently onnx-mlir ignores it. So, this PR adds mechanism to handle it. |
@imaihal zDNN checks the error code and display an error message for every zdnn function: https://github.com/IBM/zDNN/blob/main/zdnn/aiu_ops.c#L151
It would be very easy to add the function_code into the ZDNN_STATUS's message. Do you know why they reject updating the message with function code? |
@tungld Sorry, correctly speaking, it is not rejected. @negiyas created an issue about IBM/zDNN#17, but not answered yet. |
If you think about it, if you have 100 calls to zdnn matmul, you will have 100 insertion of code that does this check. If it is in the library, there is already some code checking for errors, and it would be only once per operation (e.g. only 1 copy in the matmul code regardless of how many instances of calls there is to the zdnn matmul). |
Signed-off-by: Haruki Imai <[email protected]>
It seems I accidentary updated before. Signed-off-by: Haruki Imai <[email protected]>
This PR is follow-up of PR #2267.
Current main branch does not check the return values of zDNN functions and continues the execution without displaying any messages in onnx-mlir if the function returns errors . (We can see errors of zDNN, but they are not enough to identify the operations that generate errors.) This PR implements code to check it and display API name and the value when it returns errors. This message helps to identify the operation that generates errors and its causes. Also, this PR provides an option
--func-call-error-exit
to stop execution when the functions return an error.Example error message in T5
Signed-off-by: Haruki Imai [email protected]
Co-authored-by: Yasushi Negishi [email protected]