
Conversation

@qianwan commented Sep 22, 2025

PR Category

Type of Change

Description

Issue

Progress

  • Change is properly reviewed (1 reviewer required, 2 recommended).
  • Change responds to an issue.
  • Change is fully covered by unit tests.

Performance

Contributor

Summary of Changes

Hello @qianwan, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates a new silu_and_mul operator into the flag_gems library, providing a C++ wrapper for the operation. This operator leverages a custom Triton kernel for optimized performance on GPUs. The changes include the necessary C++ implementation, header declarations, build system updates, and a comprehensive suite of unit tests to validate its functionality across different data types and broadcasting scenarios. The Python interface has also been updated to seamlessly use this new C++ backend.
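For orientation, here is a minimal sketch of the operator's semantics in plain PyTorch, mirroring the fp32 reference visible in the PR's unit tests; it is illustrative only and is neither the Triton kernel nor the C++ wrapper added by this PR (the function name is made up):

  import torch

  def silu_and_mul_reference(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
      # Reference semantics: SiLU(x) * y, computed in float32 and cast back
      # to the promoted dtype of the inputs.
      result_dtype = torch.result_type(x, y)
      x_fp32 = x.to(torch.float32)
      y_fp32 = y.to(torch.float32)
      return (torch.nn.functional.silu(x_fp32) * y_fp32).to(result_dtype)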

Highlights

  • New Operator Implementation: Introduced a new silu_and_mul operator with a C++ wrapper for efficient execution.
  • Triton Kernel Integration: The silu_and_mul operator is backed by a new Triton kernel, silu_and_mul_kernel, for GPU acceleration.
  • Comprehensive Unit Tests: Added extensive unit tests for silu_and_mul covering various data types (Float16, BFloat16, Float32), the _out variant, and broadcasting capabilities, ensuring correctness and robustness.
  • Python Frontend Update: The Python frontend for gems_silu_and_mul in src/flag_gems/modules/activation.py has been updated to conditionally utilize the new C++ extension when available, improving performance.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request successfully adds a C++ wrapper for the silu_and_mul operator, including the Triton kernel, C++ implementation, tests, and Python bindings. The code is well-structured and the changes are logical. My review includes a couple of suggestions for improvement. One is an efficiency improvement in the C++ _out variant to avoid an unnecessary memory copy. The other is a small refactoring in the test code to improve clarity by using torch::silu directly. Overall, this is a solid contribution.

auto result_dtype = torch::result_type(x, y);
auto x_fp32 = x.to(torch::kFloat32);
auto y_fp32 = y.to(torch::kFloat32);
auto silu = torch::sigmoid(x_fp32) * x_fp32;
Contributor

Severity: medium

For clarity and maintainability, it's better to use torch::silu directly in your reference computation instead of manually implementing it with torch::sigmoid(x_fp32) * x_fp32. This makes the code more readable and directly expresses the intent.

Suggested change:
- auto silu = torch::sigmoid(x_fp32) * x_fp32;
+ auto silu = torch::silu(x_fp32);
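For reference, SiLU(x) is defined as x * sigmoid(x), so the two formulations are numerically equivalent; a quick illustrative check (not part of the PR):

  import torch

  # SiLU(x) == x * sigmoid(x); torch::silu / torch.nn.functional.silu simply
  # expresses the same computation more directly.
  x = torch.randn(8, dtype=torch.float32)
  assert torch.allclose(torch.nn.functional.silu(x), torch.sigmoid(x) * x)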

Collaborator

As Gemini suggests, use torch::silu in the reference implementation.

Comment on lines +72 to +81
at::Tensor out_contig;
if (out.is_contiguous()) {
  out_contig = out;
} else {
  out_contig = out.contiguous();
}
launch_silu_and_mul(x, y, out_contig);
if (!out.is_contiguous()) {
  out.copy_(out_contig);
}
Contributor

Severity: medium

The current implementation for handling non-contiguous output tensors in silu_and_mul_out involves a potentially expensive data copy. When out is not contiguous, out.contiguous() is called, which creates a new tensor and copies the contents of out. Since out is an output tensor, its initial contents are not needed, making this copy unnecessary.

You can improve efficiency by creating an empty tensor with the same properties as out instead. This avoids the unnecessary copy.

  auto out_contig = out.is_contiguous() ? out : at::empty_like(out);
  launch_silu_and_mul(x, y, out_contig);
  if (!out.is_contiguous()) {
    out.copy_(out_contig);
  }

Collaborator

The same idea as Gemini's suggestion.

if (out.is_contiguous()) {
  out_contig = out;
} else {
  out_contig = out.contiguous();
Collaborator

There is no need to convert the non-contiguous out into a contiguous out_contig via out.contiguous(); just creating a new contiguous tensor is fine, since it is only written to, not read from.

and hasattr(torch.ops, "flag_gems")
and hasattr(torch.ops.flag_gems, "silu_and_mul")
and hasattr(torch.ops.flag_gems, "silu_and_mul_out")
)
@iclementine (Collaborator) commented Sep 23, 2025

Skip this and just use the global flag. The conditions for using the C++ implementation are:

  • has_c_extension: the library is built with the C extension; and
  • use_c_extension: controlled by an environment variable.

A sketch of what this could look like follows below.
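A minimal sketch of that dispatch, assuming has_c_extension and use_c_extension are module-level flags exposed by flag_gems; the import path, the flag spelling, and the Python fallback here are assumptions drawn from this comment, not the repository's actual API:

  import torch

  import flag_gems  # assumed to expose has_c_extension / use_c_extension flags

  def _silu_and_mul_fallback(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
      # Illustrative pure-PyTorch fallback; in the library this role is
      # played by the Triton-backed Python path.
      return torch.nn.functional.silu(x) * y

  def gems_silu_and_mul(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
      # Dispatch to the C++ wrapper only when the library was built with the
      # C extension and the environment-variable switch opts in; no per-op
      # hasattr checks are needed.
      if flag_gems.has_c_extension and flag_gems.use_c_extension:
          return torch.ops.flag_gems.silu_and_mul(x, y)
      return _silu_and_mul_fallback(x, y)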
