[QNN EP] Fuse Gelu pattern into a QNN Gelu Node #26417
Conversation
- ONNX models exported with older opset versions contain the Gelu operator decomposed into multiple operators (Div, Erf, Add, Mul).
- QNN does not support the Erf operator, but it does support a Gelu operator.
- Because Erf is unsupported, graphs containing the decomposed Gelu pattern get partitioned between the QNN and CPU EPs, degrading inference time.
- Identifying the Gelu pattern and fusing it into a single QNN Gelu node improves inference time.
@chilo-ms, @devang-ml Could you please help trigger the CI job on this PR? Thanks.
// Unpack the initializer data
std::vector<uint8_t> unpacked_tensor;
if (!qnn_model_wrapper.UnpackInitializerData(*tensor_info.initializer_tensor, unpacked_tensor).IsOK()) {
in general, a failed Status can indicate a more serious error. it might be better to propagate such an error so we will see unexpected errors if they happen. e.g., with something like ORT_THROW_IF_ERROR().
}
// Check second input of Div is sqrt(2) ≈ 1.4142
// Use a larger tolerance to handle approximations used in some models
what's the "larger tolerance" referred to here? it seems to be the default tolerance value.
// Check the other input node (e.g. not the Erf) is 1.0f
bool is_erf_first_input = (add_inputs[0].node_arg.Name() == erf_outputs[0].node_arg.Name());
const auto& add_const_input = add_inputs[is_erf_first_input ? 1 : 0];
if (!IsInitializerWithExpectedValue(qnn_model_wrapper, add_const_input, 1.0f, 1e-02f)) {
why is there a larger tolerance here?
/// Pattern 2: root -> Div -> Erf -> Add -> Mul -> Mul
/// Both patterns are translated into a QNN Gelu operator.
/// The contained NodeUnits can be of type SingleNode or QDQGroup (with Q-DQ nodes).
/// The second inputs to Div, Add, and Mul operations can be either constant or non-constant tensors.
is this still true? I thought that the other inputs are required to be specific constant values.
/// <summary>
/// Represents a fusion of the Gelu pattern expanded into ONNX operators.
/// This fusion handles two patterns:
/// Pattern 1: root -> Div -> Erf -> Add -> Mul (with Mul from root)
I don't quite understand this description, in particular, "Mul (with Mul from root)". could it be made clearer, possibly with a more detailed diagram?
const std::unordered_map<const Node*, const NodeUnit*>& node_to_node_unit,
const std::unordered_map<const NodeUnit*, const IQnnNodeGroup*>& node_unit_to_qnn_node_group,
const logging::Logger& logger) {
  // For now, all fusions involve standalone node units (i.e., no wrapping DQ/Q nodes) except MatMul w/ LPBQ encodings and Reshape
should this comment be updated?
    break;
  }

if (p_parent_node != nullptr) {
is this supposed to be in the outer for loop?
const logging::Logger& logger) {
  ORT_UNUSED_PARAMETER(logger);
general nit: if it is unused (and we don't have to deal with ifdefs where it is only sometimes unused), just comment out the name. otherwise, it is possible for someone to add a usage later and forget to remove this ORT_UNUSED_PARAMETER().
Suggested change:
- const logging::Logger& logger) {
- ORT_UNUSED_PARAMETER(logger);
+ const logging::Logger& /*logger*/) {
  return nullptr;
}

const NodeUnit* mul_node_unit = GetChildOfOutput(graph_viewer, *add_node_unit, add_outputs[0],
is it possible for there to be more than one "child of output"? I assume we would only want to proceed if there is a single child.