Investigation of inactive tests for GEMM kernels #7949

CNOCycle · 2025-03-05T12:23:56Z

Background

In the file generate-gemm-test.py, the parameter adj_k_block is set as follows:

XNNPACK/tools/generate-gemm-test.py

Line 773 in 0c7e6c7

"ADJKBLOCK": 2 * k_block if is_pipelined else k_block,

and the tests are grouped into two cases:

General cases, where k_block can be any arbitrary positive integer, and
The case where k_block > 1.

XNNPACK/tools/generate-gemm-test.py

Lines 287 to 297 in 0c7e6c7

    
                 if (k_block > 1) { 
        
                   gemm_tests.push_back(GemmTestParams( 
        
                       "k_div_" + kbs, 
        
                       tester.clone() 
        
                           .m(mr).n(nr) 
        
                           $if KERNELTYPE in ['qb4w', 'qc4w']: 
        
                             .b_zero_point(8) 
        
                           $if KERNELTYPE in ['qb4w']: 
        
                             .bl(32) 
        
                       , test_func, isa_check) 
        
                       .loop_k(adj_k_block + k_block, k_block * 5, k_block));

For the general cases, some tests configure the start and end of the loop over k as

XNNPACK/tools/generate-gemm-test.py

Line 261 in 0c7e6c7

.loop_k(adj_k_block + 1, adj_k_block * 2 - 1, k_block));

When k_block is set to 1 and is_pipelined is False, the loop becomes invalid because adj_k_block + 1 = 2 > adj_k_block * 2 - 1 = 1.

Furthermore, the condition bl <= tester.k() / 2 = int(1/2)= 0 in the loop, defined in the file gemm-microkernel-tester.cc, is also invalid in this case.

XNNPACK/test/gemm-microkernel-tester.cc

Line 70 in 0c7e6c7

for (size_t bl = params.loop_bl_.from; bl <= tester.k() / 2;

A review of the configurations in the existing *.yaml files shows that the case where k_block is set to 1 is a common configuration.

XNNPACK/test/f32-gemm-minmax.yaml

Lines 1396 to 1411 in 0c7e6c7

    
           # Scalar 
        
           - name: xnn_f32_gemm_minmax_ukernel_1x4__scalar 
        
             init: xnn_init_f32_minmax_scalar_params 
        
             pack: xnn_pack_f32_gemm_goi_w 
        
             k-block: 1 
        
           - name: xnn_f32_gemm_minmax_ukernel_2x4__scalar 
        
             init: xnn_init_f32_minmax_scalar_params 
        
             pack: xnn_pack_f32_gemm_goi_w 
        
             k-block: 1 
        
           - name: xnn_f32_gemm_minmax_ukernel_4x2__scalar 
        
             init: xnn_init_f32_minmax_scalar_params 
        
             pack: xnn_pack_f32_gemm_goi_w 
        
             k-block: 1 
        
           - name: xnn_f32_gemm_minmax_ukernel_4x4__scalar 
        
             init: xnn_init_f32_minmax_scalar_params 
        
             pack: xnn_pack_f32_gemm_goi_w

For the scalar version f32-gemm-minmax tests, 72 out of 80 tests are inactive. In the case of the scalar version f32-igemm-minmax tests, none of the tests are enabled.

This issue indicates that the unit test patterns have not been thoroughly reviewed.

Proposed Solution

Part 1: Adjusting Loop Conditions

To ensure that at least one iteration is executed, the terminal condition of the loops needs to be adjusted. Below are the proposed patches:

--- XNNPACK/tools/generate-gemm-test.py
+++ XNNPACK/tools/generate-gemm-test.py
+ // (6 tests in this file should be refined)
-.loop_k(adj_k_block + 1, adj_k_block * 2 - 1, k_block)
+.loop_k(adj_k_block + 1,
+    (adj_k_block + 1 > adj_k_block * 2 - 1) ? adj_k_block + 1 : adj_k_block * 2 - 1,
+    k_block)

--- XNNPACK/test/gemm-microkernel-tester.cc
+++ XNNPACK/test/gemm-microkernel-tester.cc
-for (size_t bl = params.loop_bl_.from; bl <= tester.k() / 2;
     bl = params.loop_bl_.next(bl)) {
+for (size_t bl = params.loop_bl_.from; bl <= ((tester.k() / 2 > 1) ? tester.k() / 2 : 1);
     bl = params.loop_bl_.next(bl)) {

There are two possible implementations:

(a) The ending value must be greater than or equal to 1.

(b) The ending value must be greater than or equal to the starting value.

Personally, I prefer the second solution, as it offers a more general approach for arbitrary configurations, and not just for the case where k_block = 1. Both implementations can be easily integrated into the templates defined in generate-gemm-test.py. However, the TEST_P macro in gemm-microkernel-tester.cc introduces a more complex condition: round_up_po2(k, params.loop_bl_.step) % bl != 0, which filters out some invalid cases. This suggests that some configurations may result in no unit tests being executed. To address this issue, the second part is proposed.

It’s important to note that this change could affect all GEMM and iGEMM kernels, including f(16|32), q(s|u)8, x86-64 (scalar, SSE, AVX), ARM (Neon), and WASM.

Part 2: Guard Against Invalid Configurations

A guard should be added in gemm-microkernel-tester.cc to detect and handle invalid configurations specified in the .yaml file. Ideally, tests should only be skipped if explicitly marked, and allowing no tests to run in such cases is an undesirable design choice. The following patch is proposed:

TEST_P(GemmTest, Test) {
  // Ensure that we can execute this test.
  if (params.isa_check) {
    params.isa_check();
    if (IsSkipped()) {
      return;
    }
  }

+  bool skipall = true;

  for (size_t k = params.loop_k_.from; k <= params.loop_k_.to;
+    // skip ...
    for (size_t m = params.loop_m_.from; m <= params.loop_m_.to;
         m = params.loop_m_.next(m)) {
+      // skip ...
      for (size_t n = params.loop_n_.from; n <= params.loop_n_.to;
           n = params.loop_n_.next(n)) {
+        // skip ...
        for (size_t zi = params.loop_zi_.from; zi <= params.loop_zi_.to;
             zi = params.loop_zi_.next(zi)) {
+          // skip ...
          for (size_t bzp = params.loop_bzp_.from; bzp <= params.loop_bzp_.to;
               bzp = params.loop_bzp_.next(bzp)) {
+            // skip ...
            for (size_t bl = params.loop_bl_.from; bl <= ((tester.k() / 2 > 1) ? tester.k() / 2 : 1);
               bl = params.loop_bl_.next(bl)) {
               if (params.loop_bl_.is_set) {
                // Require block size to divide (padded) column size.
                if (round_up_po2(k, params.loop_bl_.step) % bl != 0) {
+                  // some configurations would be skipped.
                  continue;
                }
                tester.bl(bl);
              }

              // Call the test function.
              params.test_func(tester);
+              skipall = false;
            }
          }
        }
      }
    }
  }

+  ASSERT_EQ(skipall, false);
}

(Note: + // skip ... indicates sections of code omitted for brevity.)

This part has a minimal impact and should not significantly affect the overall performance or test results.

Part 3: Updating `k_block` Configuration

Lastly, k_block should be set to a value greater than 1 to address case (2), where k_block > 1. However, this will result in an increased time to complete all CI tests.

On the other hand, the merged PR #6310 reduces the value of k_block to achieve a more reasonable time for completing all CI tests.

Conclusion

The three proposed solutions are not mutually exclusive. However, any modifications made could affect gemm and igemm implementation across various types (f16-, f32-) and architecure (scalar, neon), and may increase the time required to complete all CI tests. Therefore, it is important to carefully consider the potential side effects before proceeding.

The text was updated successfully, but these errors were encountered:

dsharlet · 2025-03-05T18:49:17Z

Thanks for the investigation. For reference, #7328 is an existing change that attempted to fix what I believe is the same issue.

CNOCycle · 2025-03-06T00:37:11Z

Thank you for your kind reminder regarding the PR that has been proposed to address this issue. However, I would like to point out that the changes in the PR do not fully resolve the problem.

As noted in the background, the loop over k remains invalid due to the contradictory terminal condition (adj_k_block + 1 = 2 > adj_k_block * 2 - 1 = 1), and the nested loop in the file gemm-microkernel-tester.cc will terminate immediately. It is essential to include additional patches in the file generate-gemm-test.py to fully address the issue. Additionally, all relevant test/{arch}-(gemm|igemm)-*.cc should be re-generated.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigation of inactive tests for GEMM kernels #7949

Investigation of inactive tests for GEMM kernels #7949

CNOCycle commented Mar 5, 2025

dsharlet commented Mar 5, 2025

CNOCycle commented Mar 6, 2025

Investigation of inactive tests for GEMM kernels #7949

Investigation of inactive tests for GEMM kernels #7949

Comments

CNOCycle commented Mar 5, 2025

Background

Proposed Solution

Part 1: Adjusting Loop Conditions

Part 2: Guard Against Invalid Configurations

Part 3: Updating k_block Configuration

Conclusion

dsharlet commented Mar 5, 2025

CNOCycle commented Mar 6, 2025

Part 3: Updating `k_block` Configuration