Description
Hi,
I'm not sure where to start, so I'm posting here in the hope that someone with more knowledge can help. I'm trying to run these notebooks on my system with a 7900XTX, and they run very slowly. The code that uses resnet26d seems OK, but the code that uses convnext_small_in22k is very slow. I also tried convnext_small so that it would use the model from torchvision, but that seems to run just as slowly.
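One way to narrow this down would be to time the two models on their own, outside the notebooks, so that data loading and augmentation are taken out of the picture. Below is a minimal sketch, not something I have run on this setup. It assumes that timm and PyTorch are installed, that the model names from the notebooks are accepted by the installed timm version, and that random 224x224 inputs are representative. The helper names `time_per_batch` and `compare_models` are made up for illustration.

```python
import time


def time_per_batch(step, sync=lambda: None, warmup=2, iters=5):
    """Average wall-clock seconds per call of step(), measured after warmup calls."""
    for _ in range(warmup):
        step()
    sync()  # wait for queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        step()
    sync()  # wait again so asynchronously launched kernels are included
    return (time.perf_counter() - start) / iters


def compare_models(names=("resnet26d", "convnext_small_in22k"), batch_size=64):
    """Time one forward and backward pass per model on random data (assumed setup)."""
    # Imported here so the timing helper above works without PyTorch installed.
    import torch
    import timm

    # ROCm builds of PyTorch expose the GPU through the torch.cuda namespace.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    sync = torch.cuda.synchronize if device == "cuda" else (lambda: None)
    x = torch.randn(batch_size, 3, 224, 224, device=device)

    results = {}
    for name in names:
        model = timm.create_model(name, pretrained=False).to(device).train()

        def step():
            model.zero_grad(set_to_none=True)
            model(x).sum().backward()

        results[name] = time_per_batch(step, sync)
    return results
```

Calling `compare_models()` returns a dict of seconds per mini-batch keyed by model name. If convnext_small_in22k is slow here as well, the problem is in the model or its kernels rather than in the notebook pipeline.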
At first I thought the ROCm build of PyTorch had not been optimized for this model yet. However, the PyTorch micro-benchmark (https://github.com/ROCmSoftwarePlatform/pytorch-micro-benchmarking) actually shows the 7900XTX running faster with the torchvision model:
(pt) root@rocm:~/pytorch-micro-benchmarking# python3 micro_benchmarking_pytorch.py --network convnext_small
INFO: running forward and backward for warmup.
INFO: running the benchmark..
OK: finished running benchmark..
--------------------SUMMARY--------------------------
Microbenchmark for network : convnext_small
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 2.088879442214966
Throughput [img/sec] : 30.638436429886497
Running the same test on my 3080ti gives:
(pt) bsp2020@Ryzen5950X:~/pytorch-micro-benchmarking$ python3 micro_benchmarking_pytorch.py --network convnext_small
INFO: running forward and backward for warmup.
INFO: running the benchmark..
OK: finished running benchmark..
--------------------SUMMARY--------------------------
Microbenchmark for network : convnext_small
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 18.948059797286987
Throughput [img/sec] : 3.3776545295241056
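For reference, the throughput in both summaries is consistent with mini-batch size divided by time per mini-batch, so the two runs can be compared directly. A small sketch of that arithmetic, using only the numbers printed above (the function name `throughput` is just for illustration):

```python
def throughput(batch_size, seconds_per_batch):
    """Images per second for one mini-batch of batch_size images."""
    return batch_size / seconds_per_batch


# Values copied from the two benchmark summaries.
xtx = throughput(64, 2.088879442214966)      # 7900XTX run
ti = throughput(64, 18.948059797286987)      # 3080ti run

print(f"7900XTX: {xtx:.2f} img/sec")
print(f"3080ti:  {ti:.2f} img/sec")
print(f"ratio:   {xtx / ti:.2f}x")
```

By these numbers the 7900XTX is roughly 9 times faster than the 3080ti in the micro-benchmark, which is the opposite of what I see in the notebooks.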
Could anyone please help me figure out what is going on? Any help would be appreciated.