You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When k_block is set to 1 and is_pipelined is False, the loop becomes invalid because adj_k_block + 1 = 2 > adj_k_block * 2 - 1 = 1.
Furthermore, the condition bl <= tester.k() / 2 = int(1/2)= 0 in the loop, defined in the file gemm-microkernel-tester.cc, is also invalid in this case.
For the scalar version f32-gemm-minmax tests, 72 out of 80 tests are inactive. In the case of the scalar version f32-igemm-minmax tests, none of the tests are enabled.
This issue indicates that the unit test patterns have not been thoroughly reviewed.
Proposed Solution
Part 1: Adjusting Loop Conditions
To ensure that at least one iteration is executed, the terminal condition of the loops needs to be adjusted. Below are the proposed patches:
(a) The ending value must be greater than or equal to 1.
(b) The ending value must be greater than or equal to the starting value.
Personally, I prefer the second solution, as it offers a more general approach for arbitrary configurations, and not just for the case where k_block = 1. Both implementations can be easily integrated into the templates defined in generate-gemm-test.py. However, the TEST_P macro in gemm-microkernel-tester.cc introduces a more complex condition: round_up_po2(k, params.loop_bl_.step) % bl != 0, which filters out some invalid cases. This suggests that some configurations may result in no unit tests being executed. To address this issue, the second part is proposed.
It’s important to note that this change could affect all GEMM and iGEMM kernels, including f(16|32), q(s|u)8, x86-64 (scalar, SSE, AVX), ARM (Neon), and WASM.
Part 2: Guard Against Invalid Configurations
A guard should be added in gemm-microkernel-tester.cc to detect and handle invalid configurations specified in the .yaml file. Ideally, tests should only be skipped if explicitly marked, and allowing no tests to run in such cases is an undesirable design choice. The following patch is proposed:
TEST_P(GemmTest, Test) {
// Ensure that we can execute this test.
if (params.isa_check) {
params.isa_check();
if (IsSkipped()) {
return;
}
}
+ bool skipall = true;
for (size_t k = params.loop_k_.from; k <= params.loop_k_.to;
+ // skip ...
for (size_t m = params.loop_m_.from; m <= params.loop_m_.to;
m = params.loop_m_.next(m)) {
+ // skip ...
for (size_t n = params.loop_n_.from; n <= params.loop_n_.to;
n = params.loop_n_.next(n)) {
+ // skip ...
for (size_t zi = params.loop_zi_.from; zi <= params.loop_zi_.to;
zi = params.loop_zi_.next(zi)) {
+ // skip ...
for (size_t bzp = params.loop_bzp_.from; bzp <= params.loop_bzp_.to;
bzp = params.loop_bzp_.next(bzp)) {
+ // skip ...
for (size_t bl = params.loop_bl_.from; bl <= ((tester.k() / 2 > 1) ? tester.k() / 2 : 1);
bl = params.loop_bl_.next(bl)) {
if (params.loop_bl_.is_set) {
// Require block size to divide (padded) column size.
if (round_up_po2(k, params.loop_bl_.step) % bl != 0) {
+ // some configurations would be skipped.
continue;
}
tester.bl(bl);
}
// Call the test function.
params.test_func(tester);
+ skipall = false;
}
}
}
}
}
}
+ ASSERT_EQ(skipall, false);
}
(Note: + // skip ... indicates sections of code omitted for brevity.)
This part has a minimal impact and should not significantly affect the overall performance or test results.
Part 3: Updating k_block Configuration
Lastly, k_block should be set to a value greater than 1 to address case (2), where k_block > 1. However, this will result in an increased time to complete all CI tests.
On the other hand, the merged PR #6310 reduces the value of k_block to achieve a more reasonable time for completing all CI tests.
Conclusion
The three proposed solutions are not mutually exclusive. However, any modifications made could affect gemm and igemm implementation across various types (f16-, f32-) and architecure (scalar, neon), and may increase the time required to complete all CI tests. Therefore, it is important to carefully consider the potential side effects before proceeding.
The text was updated successfully, but these errors were encountered:
Thank you for your kind reminder regarding the PR that has been proposed to address this issue. However, I would like to point out that the changes in the PR do not fully resolve the problem.
As noted in the background, the loop over k remains invalid due to the contradictory terminal condition (adj_k_block + 1 = 2 > adj_k_block * 2 - 1 = 1), and the nested loop in the file gemm-microkernel-tester.cc will terminate immediately. It is essential to include additional patches in the file generate-gemm-test.py to fully address the issue. Additionally, all relevant test/{arch}-(gemm|igemm)-*.cc should be re-generated.
Background
In the file
generate-gemm-test.py
, the parameteradj_k_block
is set as follows:XNNPACK/tools/generate-gemm-test.py
Line 773 in 0c7e6c7
k_block
can be any arbitrary positive integer, andk_block > 1
.XNNPACK/tools/generate-gemm-test.py
Lines 287 to 297 in 0c7e6c7
For the general cases, some tests configure the start and end of the loop over
k
asXNNPACK/tools/generate-gemm-test.py
Line 261 in 0c7e6c7
When
k_block
is set to1
andis_pipelined
isFalse
, the loop becomes invalid becauseadj_k_block + 1 = 2 > adj_k_block * 2 - 1 = 1
.Furthermore, the condition
bl <= tester.k() / 2 = int(1/2)= 0
in the loop, defined in the filegemm-microkernel-tester.cc
, is also invalid in this case.XNNPACK/test/gemm-microkernel-tester.cc
Line 70 in 0c7e6c7
A review of the configurations in the existing
*.yaml
files shows that the case wherek_block
is set to1
is a common configuration.XNNPACK/test/f32-gemm-minmax.yaml
Lines 1396 to 1411 in 0c7e6c7
For the scalar version f32-gemm-minmax tests, 72 out of 80 tests are inactive. In the case of the scalar version f32-igemm-minmax tests, none of the tests are enabled.
This issue indicates that the unit test patterns have not been thoroughly reviewed.
Proposed Solution
Part 1: Adjusting Loop Conditions
To ensure that at least one iteration is executed, the terminal condition of the loops needs to be adjusted. Below are the proposed patches:
There are two possible implementations:
(a) The ending value must be greater than or equal to 1.
(b) The ending value must be greater than or equal to the starting value.
Personally, I prefer the second solution, as it offers a more general approach for arbitrary configurations, and not just for the case where
k_block = 1
. Both implementations can be easily integrated into the templates defined ingenerate-gemm-test.py
. However, theTEST_P
macro ingemm-microkernel-tester.cc
introduces a more complex condition:round_up_po2(k, params.loop_bl_.step) % bl != 0
, which filters out some invalid cases. This suggests that some configurations may result in no unit tests being executed. To address this issue, the second part is proposed.It’s important to note that this change could affect all GEMM and iGEMM kernels, including f(16|32), q(s|u)8, x86-64 (scalar, SSE, AVX), ARM (Neon), and WASM.
Part 2: Guard Against Invalid Configurations
A guard should be added in
gemm-microkernel-tester.cc
to detect and handle invalid configurations specified in the.yaml
file. Ideally, tests should only be skipped if explicitly marked, and allowing no tests to run in such cases is an undesirable design choice. The following patch is proposed:(Note:
+ // skip ...
indicates sections of code omitted for brevity.)This part has a minimal impact and should not significantly affect the overall performance or test results.
Part 3: Updating
k_block
ConfigurationLastly,
k_block
should be set to a value greater than1
to address case (2), wherek_block > 1
. However, this will result in an increased time to complete all CI tests.On the other hand, the merged PR #6310 reduces the value of
k_block
to achieve a more reasonable time for completing all CI tests.Conclusion
The three proposed solutions are not mutually exclusive. However, any modifications made could affect gemm and igemm implementation across various types (f16-, f32-) and architecure (scalar, neon), and may increase the time required to complete all CI tests. Therefore, it is important to carefully consider the potential side effects before proceeding.
The text was updated successfully, but these errors were encountered: