Possible heap corruption in qs8-dwconv-bench with primary_tile=25 #7657

Open
ken-unger opened this issue Jan 8, 2025 · 1 comment

@ken-unger
Contributor

While implementing #7638, I tried to run qs8-dwconv-bench with xnn_qs8_dwconv_minmax_fp32_ukernel_25p8vc, and the benchmark hits a malloc error after several tests. Running with 9p8vc is fine.

I noticed that the current qs8-dwconv-bench only uses primary_tile = 9 for its scalar benchmarks. Adding a benchmark test to include the scalar primary_tile=25 kernel results in the same apparent heap corruption.

Add to bench/qs8-dwconv.cc:

static void qs8_dwconv_25p4c__scalar_lrintf(benchmark::State& state, const char* net) {
  DWConvBenchmark(state,
    xnn_qs8_dwconv_minmax_fp32_ukernel_25p4c__scalar_lrintf,
    xnn_init_qs8_conv_minmax_fp32_scalar_params,
    4 /* channel tile */, 25 /* primary tile */);
}

BENCHMARK_DWCONV(qs8_dwconv_25p4c__scalar_lrintf);

Test result:

./qs8-dwconv-bench --benchmark_filter=qs8_dwconv_25p4c__scalar_lrintf
2025-01-07T18:34:14-08:00
Running ./qs8-dwconv-bench
Run on (8 X 1600 MHz CPU s)
CPU Caches:
  L1 Instruction 32 KiB (x8)
  L1 Data 32 KiB (x8)
  L2 Unified 512 KiB (x2)
Load Average: 2.04, 1.65, 0.80
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                                          Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
qs8_dwconv_25p4c__scalar_lrintf/mobilenet_v1/H:112/W:112/KH:3/KW:3/PH:2/PW:2/S:1/D:1/G:32/real_time         45615502 ns     45204284 ns           15 OPS=158.397M/s bytes=17.6088M/s cpufreq=1.6G
malloc(): invalid size (unsorted)
Aborted

I'm not sure whether the test case is invalid here or whether there is a bug within DWConvBenchmark.

@fbarchard
Collaborator

I think this is a case of invalid parameters for 5x5.
In practice 5x5 is used by mobilenet v3, while 3x3 is used in mobilenet v2.

So you could try mobilenet v3. Taking a quick look at the current models benchmark:
models/benchmark --benchmark_filter=V3
FP32MobileNetV3Large/real_time 5602 us 5601 us 125 cpufreq=3.3723G
FP32MobileNetV3Small/real_time 1722 us 1722 us 405 cpufreq=3.30428G
FP16MobileNetV3Large/real_time 14207 us 14200 us 49 cpufreq=3.4632G
FP16MobileNetV3Small/real_time 4880 us 4879 us 146 cpufreq=3.5469G
The QS8 model is missing from that list.
The old end2end benchmark had it, if you dig up old versions.
TFLite benchmark_model can run a .tflite file if you can get a mobilenet v3 model.
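
One way to read "invalid parameters for 5x5" against the log above: the failing run uses KH:3/KW:3, which is 9 taps, while the 25p kernel has a primary tile of 25. The sketch below just spells out that arithmetic. `KernelMatchesPrimaryTile` is a hypothetical helper, not an XNNPACK function, and whether this mismatch is the actual cause of the malloc error is an assumption that this thread does not confirm.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of XNNPACK): returns true when a depthwise
// convolution with a kernel_height x kernel_width kernel has exactly as many
// taps as the microkernel's primary tile.
constexpr bool KernelMatchesPrimaryTile(std::size_t kernel_height,
                                        std::size_t kernel_width,
                                        std::size_t primary_tile) {
  return kernel_height * kernel_width == primary_tile;
}

// The case from the log: a KH:3/KW:3 layer run through a 25-tap kernel.
static_assert(!KernelMatchesPrimaryTile(3, 3, 25), "3x3 has 9 taps, not 25");
// The existing scalar benchmarks: 3x3 layers with primary_tile = 9.
static_assert(KernelMatchesPrimaryTile(3, 3, 9), "3x3 matches a 9p kernel");
// A 5x5 layer, as used by mobilenet v3, matches a 25p kernel.
static_assert(KernelMatchesPrimaryTile(5, 5, 25), "5x5 matches a 25p kernel");
```

If the mismatch does turn out to matter, a check like this could guard the benchmark body, for example by calling Google Benchmark's `state.SkipWithError(...)` and returning early for layers whose kernel size does not equal the primary tile.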
