Conversation

@quic-tirupath
Contributor

Description

  • ONNX models exported with an older opset version contain the Gelu operator decomposed into multiple operators (Div, Erf, Add, Mul).
  • QNN doesn't support the Erf operator but does support the Gelu operator.
  • Since QNN doesn't support Erf, graphs containing the Gelu pattern get partitioned between the QNN and CPU EPs, degrading inference time.

Motivation and Context

  • Identifying and fusing the Gelu pattern into a QNN Gelu node improves inference time (the exact form being matched is sketched below).
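
For reference, the erf-based Gelu that this decomposed pattern computes is the standard exact definition (stated here for context, not quoted from the PR):

Gelu(x) = 0.5 · x · (1 + erf(x / √2))

so an older exporter emits the chain Div(x, √2) → Erf → Add(+1) → Mul(×x) → Mul(×0.5), or a variant that folds the 0.5 scale into one of the Mul inputs.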

@quic-tirupath
Contributor Author

@chilo-ms
As I mentioned in #26332, I would like to use this PR for merging this fusion.
Could you please help trigger the CI job?

@quic-tirupath
Contributor Author

@chilo-ms, @devang-ml
Could you please refer to my comment in #26332 (comment).

Could you please help trigger the CI job on this PR.

Thanks,


// Unpack the initializer data
std::vector<uint8_t> unpacked_tensor;
if (!qnn_model_wrapper.UnpackInitializerData(*tensor_info.initializer_tensor, unpacked_tensor).IsOK()) {
Contributor

in general, a failed Status can indicate a more serious error. it might be better to propagate such an error so we will see unexpected errors if they happen. e.g., with something like ORT_THROW_IF_ERROR().

}
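
A minimal sketch of the reviewer's suggestion, reusing the call from the snippet above with the existing ORT_THROW_IF_ERROR macro (the surrounding control flow is an assumption, not this PR's code):

// Propagate an unexpected unpack failure instead of silently skipping the fusion.
std::vector<uint8_t> unpacked_tensor;
ORT_THROW_IF_ERROR(qnn_model_wrapper.UnpackInitializerData(*tensor_info.initializer_tensor, unpacked_tensor));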

// Check second input of Div is sqrt(2) ≈ 1.4142
// Use a larger tolerance to handle approximations used in some models
Contributor

what's the "larger tolerance" referred to here? it seems to be the default tolerance value.

// Check the other input node (i.e., not the Erf output) is 1.0f
bool is_erf_first_input = (add_inputs[0].node_arg.Name() == erf_outputs[0].node_arg.Name());
const auto& add_const_input = add_inputs[is_erf_first_input ? 1 : 0];
if (!IsInitializerWithExpectedValue(qnn_model_wrapper, add_const_input, 1.0f, 1e-02f)) {
Contributor

why is there a larger tolerance here?
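
To make the tolerance discussion concrete, a minimal self-contained sketch of what IsInitializerWithExpectedValue presumably reduces to (the helper below is hypothetical, not from this diff):

#include <cmath>

// Hypothetical absolute-difference check mirroring the expected-value tests above.
bool WithinTolerance(float actual, float expected, float tolerance) {
  return std::fabs(actual - expected) <= tolerance;
}

// With a tolerance of 1e-02f, both 1.41421356f and a coarser exported constant
// such as 1.4142f or 1.41f pass as sqrt(2); likewise 0.999f passes as 1.0f.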

/// Pattern 2: root -> Div -> Erf -> Add -> Mul -> Mul
/// Both patterns are translated into a QNN Gelu operator.
/// The contained NodeUnits can be of type SingleNode or QDQGroup (with Q-DQ nodes).
/// The second inputs to Div, Add, and Mul operations can be either constant or non-constant tensors.
Contributor

is this still true? I thought that the other inputs are required to be specific constant values.

/// <summary>
/// Represents a fusion of the Gelu pattern expanded into ONNX operators.
/// This fusion handles two patterns:
/// Pattern 1: root -> Div -> Erf -> Add -> Mul (with Mul from root)
Contributor

I don't quite understand this description, in particular, "Mul (with Mul from root)". could it be made clearer, possibly with a more detailed diagram?
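
One possible clearer diagram, offered as an interpretation of the descriptions above rather than the author's own wording: in both patterns the root tensor feeds the Div chain and also feeds the first Mul directly; Pattern 2 simply appends a second Mul.

Pattern 1:
  root ──► Div ──► Erf ──► Add ──► Mul
    └────────────────────────────────┘  (root is the Mul's other input)

Pattern 2:
  root ──► Div ──► Erf ──► Add ──► Mul ──► Mul
    └────────────────────────────────┘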

const std::unordered_map<const Node*, const NodeUnit*>& node_to_node_unit,
const std::unordered_map<const NodeUnit*, const IQnnNodeGroup*>& node_unit_to_qnn_node_group,
const logging::Logger& logger) {
// For now, all fusions involve standalone node units (i.e., no wrapping DQ/Q nodes) except MatMul w/ LPBQ encodings and Reshape
Contributor

should this comment be updated?

break;
}

if (p_parent_node != nullptr) {
Contributor

is this supposed to be in the outer for loop?

Comment on lines +171 to +172
const logging::Logger& logger) {
ORT_UNUSED_PARAMETER(logger);
Contributor

general nit: if it is unused (and we don't have to deal with ifdefs where it is only sometimes unused), just comment out the name. otherwise, it is possible for someone to add a usage later and forget to remove this ORT_UNUSED_PARAMETER().

Suggested change
- const logging::Logger& logger) {
-   ORT_UNUSED_PARAMETER(logger);
+ const logging::Logger& /*logger*/) {

return nullptr;
}

const NodeUnit* mul_node_unit = GetChildOfOutput(graph_viewer, *add_node_unit, add_outputs[0],
Contributor

is it possible for there to be more than one "child of output"? I assume we would only want to proceed if there is a single child.
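
A hypothetical guard along the lines the reviewer suggests, using the existing Node::GetOutputEdgesCount() API (names such as add_node_unit come from this diff; the check itself is a sketch, not the PR's code):

// Only proceed when the Add output has exactly one consumer; with multiple
// consumers the intermediate tensor is still needed outside the fused Gelu.
const Node& add_node = add_node_unit->GetNode();
if (add_node.GetOutputEdgesCount() != 1) {
  return nullptr;
}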
