Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigation of inactive tests for GEMM kernels #7949

Open
CNOCycle opened this issue Mar 5, 2025 · 2 comments
Open

Investigation of inactive tests for GEMM kernels #7949

CNOCycle opened this issue Mar 5, 2025 · 2 comments

Comments

@CNOCycle
Copy link

CNOCycle commented Mar 5, 2025

Background

In the file generate-gemm-test.py, the parameter adj_k_block is set as follows:

"ADJKBLOCK": 2 * k_block if is_pipelined else k_block,
and the tests are grouped into two cases:

  1. General cases, where k_block can be any arbitrary positive integer, and
  2. The case where k_block > 1.

if (k_block > 1) {
gemm_tests.push_back(GemmTestParams(
"k_div_" + kbs,
tester.clone()
.m(mr).n(nr)
$if KERNELTYPE in ['qb4w', 'qc4w']:
.b_zero_point(8)
$if KERNELTYPE in ['qb4w']:
.bl(32)
, test_func, isa_check)
.loop_k(adj_k_block + k_block, k_block * 5, k_block));

For the general cases, some tests configure the start and end of the loop over k as

.loop_k(adj_k_block + 1, adj_k_block * 2 - 1, k_block));

When k_block is set to 1 and is_pipelined is False, the loop becomes invalid because adj_k_block + 1 = 2 > adj_k_block * 2 - 1 = 1.

Furthermore, the condition bl <= tester.k() / 2 = int(1/2)= 0 in the loop, defined in the file gemm-microkernel-tester.cc, is also invalid in this case.

for (size_t bl = params.loop_bl_.from; bl <= tester.k() / 2;

A review of the configurations in the existing *.yaml files shows that the case where k_block is set to 1 is a common configuration.

# Scalar
- name: xnn_f32_gemm_minmax_ukernel_1x4__scalar
init: xnn_init_f32_minmax_scalar_params
pack: xnn_pack_f32_gemm_goi_w
k-block: 1
- name: xnn_f32_gemm_minmax_ukernel_2x4__scalar
init: xnn_init_f32_minmax_scalar_params
pack: xnn_pack_f32_gemm_goi_w
k-block: 1
- name: xnn_f32_gemm_minmax_ukernel_4x2__scalar
init: xnn_init_f32_minmax_scalar_params
pack: xnn_pack_f32_gemm_goi_w
k-block: 1
- name: xnn_f32_gemm_minmax_ukernel_4x4__scalar
init: xnn_init_f32_minmax_scalar_params
pack: xnn_pack_f32_gemm_goi_w

For the scalar version f32-gemm-minmax tests, 72 out of 80 tests are inactive. In the case of the scalar version f32-igemm-minmax tests, none of the tests are enabled.

This issue indicates that the unit test patterns have not been thoroughly reviewed.

Proposed Solution

Part 1: Adjusting Loop Conditions

To ensure that at least one iteration is executed, the terminal condition of the loops needs to be adjusted. Below are the proposed patches:

--- XNNPACK/tools/generate-gemm-test.py
+++ XNNPACK/tools/generate-gemm-test.py
+ // (6 tests in this file should be refined)
-.loop_k(adj_k_block + 1, adj_k_block * 2 - 1, k_block)
+.loop_k(adj_k_block + 1,
+    (adj_k_block + 1 > adj_k_block * 2 - 1) ? adj_k_block + 1 : adj_k_block * 2 - 1,
+    k_block)

--- XNNPACK/test/gemm-microkernel-tester.cc
+++ XNNPACK/test/gemm-microkernel-tester.cc
-for (size_t bl = params.loop_bl_.from; bl <= tester.k() / 2;
     bl = params.loop_bl_.next(bl)) {
+for (size_t bl = params.loop_bl_.from; bl <= ((tester.k() / 2 > 1) ? tester.k() / 2 : 1);
     bl = params.loop_bl_.next(bl)) {

There are two possible implementations:

(a) The ending value must be greater than or equal to 1.

(b) The ending value must be greater than or equal to the starting value.

Personally, I prefer the second solution, as it offers a more general approach for arbitrary configurations, and not just for the case where k_block = 1. Both implementations can be easily integrated into the templates defined in generate-gemm-test.py. However, the TEST_P macro in gemm-microkernel-tester.cc introduces a more complex condition: round_up_po2(k, params.loop_bl_.step) % bl != 0, which filters out some invalid cases. This suggests that some configurations may result in no unit tests being executed. To address this issue, the second part is proposed.

It’s important to note that this change could affect all GEMM and iGEMM kernels, including f(16|32), q(s|u)8, x86-64 (scalar, SSE, AVX), ARM (Neon), and WASM.

Part 2: Guard Against Invalid Configurations

A guard should be added in gemm-microkernel-tester.cc to detect and handle invalid configurations specified in the .yaml file. Ideally, tests should only be skipped if explicitly marked, and allowing no tests to run in such cases is an undesirable design choice. The following patch is proposed:

TEST_P(GemmTest, Test) {
  // Ensure that we can execute this test.
  if (params.isa_check) {
    params.isa_check();
    if (IsSkipped()) {
      return;
    }
  }

+  bool skipall = true;

  for (size_t k = params.loop_k_.from; k <= params.loop_k_.to;
+    // skip ...
    for (size_t m = params.loop_m_.from; m <= params.loop_m_.to;
         m = params.loop_m_.next(m)) {
+      // skip ...
      for (size_t n = params.loop_n_.from; n <= params.loop_n_.to;
           n = params.loop_n_.next(n)) {
+        // skip ...
        for (size_t zi = params.loop_zi_.from; zi <= params.loop_zi_.to;
             zi = params.loop_zi_.next(zi)) {
+          // skip ...
          for (size_t bzp = params.loop_bzp_.from; bzp <= params.loop_bzp_.to;
               bzp = params.loop_bzp_.next(bzp)) {
+            // skip ...
            for (size_t bl = params.loop_bl_.from; bl <= ((tester.k() / 2 > 1) ? tester.k() / 2 : 1);
               bl = params.loop_bl_.next(bl)) {
               if (params.loop_bl_.is_set) {
                // Require block size to divide (padded) column size.
                if (round_up_po2(k, params.loop_bl_.step) % bl != 0) {
+                  // some configurations would be skipped.
                  continue;
                }
                tester.bl(bl);
              }

              // Call the test function.
              params.test_func(tester);
+              skipall = false;
            }
          }
        }
      }
    }
  }

+  ASSERT_EQ(skipall, false);
}

(Note: + // skip ... indicates sections of code omitted for brevity.)

This part has a minimal impact and should not significantly affect the overall performance or test results.

Part 3: Updating k_block Configuration

Lastly, k_block should be set to a value greater than 1 to address case (2), where k_block > 1. However, this will result in an increased time to complete all CI tests.

On the other hand, the merged PR #6310 reduces the value of k_block to achieve a more reasonable time for completing all CI tests.

Conclusion

The three proposed solutions are not mutually exclusive. However, any modifications made could affect gemm and igemm implementation across various types (f16-, f32-) and architecure (scalar, neon), and may increase the time required to complete all CI tests. Therefore, it is important to carefully consider the potential side effects before proceeding.

@dsharlet
Copy link
Collaborator

dsharlet commented Mar 5, 2025

Thanks for the investigation. For reference, #7328 is an existing change that attempted to fix what I believe is the same issue.

@CNOCycle
Copy link
Author

CNOCycle commented Mar 6, 2025

Thank you for your kind reminder regarding the PR that has been proposed to address this issue. However, I would like to point out that the changes in the PR do not fully resolve the problem.

As noted in the background, the loop over k remains invalid due to the contradictory terminal condition (adj_k_block + 1 = 2 > adj_k_block * 2 - 1 = 1), and the nested loop in the file gemm-microkernel-tester.cc will terminate immediately. It is essential to include additional patches in the file generate-gemm-test.py to fully address the issue. Additionally, all relevant test/{arch}-(gemm|igemm)-*.cc should be re-generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants