Conversation

@quic-tirupath
Contributor

Description

  • ONNX models exported with an older opset version contain the Gelu operator decomposed into multiple operators (Div, Erf, Add, Mul).
  • QNN doesn't support the Erf operator but does support the Gelu operator.
  • Since QNN doesn't support Erf, graphs containing the Gelu pattern get partitioned between the QNN and CPU EPs, degrading inference time.

Motivation and Context

  • Identifying and fusing the Gelu pattern into a QNN Gelu node improves inference time (the exact form being matched is sketched below).
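
For reference, the erf-based Gelu that this decomposed pattern computes is the standard exact definition (stated here for context, not quoted from the PR):

Gelu(x) = 0.5 · x · (1 + erf(x / √2))

so an older exporter emits the chain Div(x, √2) → Erf → Add(+1) → Mul(×x) → Mul(×0.5), or a variant that folds the 0.5 scale into one of the Mul inputs.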

@quic-tirupath
Contributor Author

@chilo-ms
As I mentioned in #26332, I would like to use this PR for merging this fusion.
Could you please help trigger the CI job?

@quic-tirupath
Contributor Author

@chilo-ms, @devang-ml
Could you please refer to my comment in #26332 (comment).

Could you please help trigger the CI job on this PR.

Thanks,


// Unpack the initializer data
std::vector<uint8_t> unpacked_tensor;
if (!qnn_model_wrapper.UnpackInitializerData(*tensor_info.initializer_tensor, unpacked_tensor).IsOK()) {
Contributor

in general, a failed Status can indicate a more serious error. it might be better to propagate such an error so we will see unexpected errors if they happen. e.g., with something like ORT_THROW_IF_ERROR().

}
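
A minimal sketch of the reviewer's suggestion, reusing the call from the snippet above with the existing ORT_THROW_IF_ERROR macro (the surrounding control flow is an assumption, not this PR's code):

// Propagate an unexpected unpack failure instead of silently skipping the fusion.
std::vector<uint8_t> unpacked_tensor;
ORT_THROW_IF_ERROR(qnn_model_wrapper.UnpackInitializerData(*tensor_info.initializer_tensor, unpacked_tensor));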

// Check second input of Div is sqrt(2) ≈ 1.4142
// Use a larger tolerance to handle approximations used in some models
Contributor

what's the "larger tolerance" referred to here? it seems to be the default tolerance value.

// Check the other input node (i.e., not the Erf output) is 1.0f
bool is_erf_first_input = (add_inputs[0].node_arg.Name() == erf_outputs[0].node_arg.Name());
const auto& add_const_input = add_inputs[is_erf_first_input ? 1 : 0];
if (!IsInitializerWithExpectedValue(qnn_model_wrapper, add_const_input, 1.0f, 1e-02f)) {
Contributor

why is there a larger tolerance here?
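
To make the tolerance discussion concrete, a minimal self-contained sketch of what IsInitializerWithExpectedValue presumably reduces to (the helper below is hypothetical, not from this diff):

#include <cmath>

// Hypothetical absolute-difference check mirroring the expected-value tests above.
bool WithinTolerance(float actual, float expected, float tolerance) {
  return std::fabs(actual - expected) <= tolerance;
}

// With a tolerance of 1e-02f, both 1.41421356f and a coarser exported constant
// such as 1.4142f or 1.41f pass as sqrt(2); likewise 0.999f passes as 1.0f.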

/// Pattern 2: root -> Div -> Erf -> Add -> Mul -> Mul
/// Both patterns are translated into a QNN Gelu operator.
/// The contained NodeUnits can be of type SingleNode or QDQGroup (with Q-DQ nodes).
/// The second inputs to Div, Add, and Mul operations can be either constant or non-constant tensors.
Contributor

is this still true? I thought that the other inputs are required to be specific constant values.

/// <summary>
/// Represents a fusion of the Gelu pattern expanded into ONNX operators.
/// This fusion handles two patterns:
/// Pattern 1: root -> Div -> Erf -> Add -> Mul (with Mul from root)
Contributor

I don't quite understand this description, in particular, "Mul (with Mul from root)". could it be made clearer, possibly with a more detailed diagram?
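
One possible clearer diagram, offered as an interpretation of the descriptions above rather than the author's own wording: in both patterns the root tensor feeds the Div chain and also feeds the first Mul directly; Pattern 2 simply appends a second Mul.

Pattern 1:
  root ──► Div ──► Erf ──► Add ──► Mul
    └────────────────────────────────┘  (root is the Mul's other input)

Pattern 2:
  root ──► Div ──► Erf ──► Add ──► Mul ──► Mul
    └────────────────────────────────┘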

const std::unordered_map<const Node*, const NodeUnit*>& node_to_node_unit,
const std::unordered_map<const NodeUnit*, const IQnnNodeGroup*>& node_unit_to_qnn_node_group,
const logging::Logger& logger) {
// For now, all fusions involve standalone node units (i.e., no wrapping DQ/Q nodes) except MatMul w/ LPBQ encodings and Reshape
Contributor

should this comment be updated?

break;
}

if (p_parent_node != nullptr) {
Contributor

is this supposed to be in the outer for loop?

Comment on lines +171 to +172
const logging::Logger& logger) {
ORT_UNUSED_PARAMETER(logger);
Contributor

general nit: if it is unused (and we don't have to deal with ifdefs where it is only sometimes unused), just comment out the name. otherwise, it is possible for someone to add a usage later and forget to remove this ORT_UNUSED_PARAMETER().

Suggested change
- const logging::Logger& logger) {
-   ORT_UNUSED_PARAMETER(logger);
+ const logging::Logger& /*logger*/) {

return nullptr;
}

const NodeUnit* mul_node_unit = GetChildOfOutput(graph_viewer, *add_node_unit, add_outputs[0],
Contributor

is it possible for there to be more than one "child of output"? I assume we would only want to proceed if there is a single child.
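
A hypothetical guard along the lines the reviewer suggests, using the existing Node::GetOutputEdgesCount() API (names such as add_node_unit come from this diff; the check itself is a sketch, not the PR's code):

// Only proceed when the Add output has exactly one consumer; with multiple
// consumers the intermediate tensor is still needed outside the fused Gelu.
const Node& add_node = add_node_unit->GetNode();
if (add_node.GetOutputEdgesCount() != 1) {
  return nullptr;
}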
