You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, I'm trying to reproduce nimble's experimental results. However, I found that the effect of multi-stream has little effect on the inference latency, but the paper says that it can be up to 1.8×, maybe I have something wrong, I hope you can give me some advice.
I successfully installed nimble in docker:
GPU: 2080s with 8G global memory
Ubuntu 18.04.6 LTS
# inception_v3 [1, 3, 299, 299]
mean (ms) stdev (ms)
pytorch 8.212887 0.211187
mean (ms) stdev (ms)
nimble 2.24783 0.003427
mean (ms) stdev (ms)
nimble-multi 2.31407 0.009554
# inception_v3 [8, 3, 299, 299]
mean (ms) stdev (ms)
pytorch 25.678553 0.287919
mean (ms) stdev (ms)
nimble 17.354554 0.065831
mean (ms) stdev (ms)
nimble-multi 16.428471 0.104019
# densenet201 [1, 3, 224, 224]
mean (ms) stdev (ms)
pytorch 29.020667 0.231637
mean (ms) stdev (ms)
nimble 5.537937 0.004089
mean (ms) stdev (ms)
nimble-multi 5.572467 0.004977
# densenet201 [8, 3, 224, 224]
mean (ms) stdev (ms)
pytorch 31.046828 0.164185
mean (ms) stdev (ms)
nimble 24.178936 0.032238
mean (ms) stdev (ms)
nimble-multi 24.125336 0.060498
# mnasnet0_5 [1, 3, 224, 224]
mean (ms) stdev (ms)
pytorch 4.477023 0.025759
mean (ms) stdev (ms)
nimble 0.565598 0.002112
mean (ms) stdev (ms)
nimble-multi 5.572467 0.004977
# mnasnet0_75 [1, 3, 224, 224]
mean (ms) stdev (ms)
pytorch 4.557251 0.037832
mean (ms) stdev (ms)
nimble 0.68727 0.002274
mean (ms) stdev (ms)
nimble-multi 0.679038 0.002025
# mnasnet1_3 [1, 3, 224, 224]
mean (ms) stdev (ms)
pytorch 4.780402 0.02905
mean (ms) stdev (ms)
nimble 0.950962 0.00627
mean (ms) stdev (ms)
nimble-multi 0.893742 0.06838
# mnasnet1_3 [8, 3, 224, 224]
mean (ms) stdev (ms)
pytorch 6.076544 0.567386
mean (ms) stdev (ms)
nimble 4.953977 0.023374
mean (ms) stdev (ms)
nimble-multi 4.976105 0.025923
The text was updated successfully, but these errors were encountered:
Hi, I'm trying to reproduce nimble's experimental results. However, I found that the effect of multi-stream has little effect on the inference latency, but the paper says that it can be up to 1.8×, maybe I have something wrong, I hope you can give me some advice.
I successfully installed nimble in docker:
GPU: 2080s with 8G global memory
Ubuntu 18.04.6 LTS
The text was updated successfully, but these errors were encountered: