Problem with GEMM benchmark results for NVIDIA Volta #104

wdj · 2018-07-03T19:38:03Z

Your NVIDIA GEMM benchmark appears to have a problem. gemm_bench.cu uses uint16_t, an integer type, instead of __half to represent half-precision floating-point numbers. As a result, rand() in tensor.h generates random floating-point values between 0 and 1 for the matrices A and B, but those values are converted to integers when stored, so most entries end up as zeros rather than random floating-point numbers. This makes the benchmark timings unrepresentative on Volta GPUs that have power/frequency throttling enabled, because computing on zeros draws much less power than computing on random numbers. I've confirmed this with nvidia-smi while running your benchmark. For your GEMM benchmark, I measured reported performance up to ~15% higher when computing on zeros, an unrepresentative use case, than when computing on realistic nonzero inputs.
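A minimal host-side sketch of the conversion described above, assuming the values are drawn uniformly from [0, 1) and stored straight into a uint16_t buffer. This is not DeepBench's actual tensor.h; the helper name count_zeros_after_uint16_cast is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper (not DeepBench code): draw n uniform floats in [0, 1)
// and store them in uint16_t, the way an integer-typed "half" buffer would.
// Returns how many stored entries are zero.
std::size_t count_zeros_after_uint16_cast(std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    // 24 random bits scaled by 2^-24 give exactly representable floats in [0, 1).
    std::uniform_int_distribution<std::uint32_t> bits(0, (1u << 24) - 1);
    std::vector<std::uint16_t> buf(n);
    for (auto &x : buf) {
        float r = static_cast<float>(bits(gen)) * (1.0f / 16777216.0f);
        assert(r < 1.0f);                   // invariant: r is always in [0, 1)
        x = static_cast<std::uint16_t>(r);  // float -> integer truncates toward zero
    }
    std::size_t zeros = 0;
    for (auto x : buf) zeros += (x == 0);
    return zeros;
}
```

With this range every entry comes out zero; a generator whose range includes exactly 1.0 would occasionally leave a 1, which is why "most" rather than "all" entries are zeros in practice.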

The fix seems to be replacing uint16_t with __half in the code.
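By contrast, a __half holds the IEEE 754 binary16 bit pattern of the value, not its integer part. The sketch below encodes a float into those bits in plain C++ so it runs without a GPU. float_to_half_bits is a hypothetical name; it truncates surplus mantissa bits and does not handle NaN, whereas CUDA's __float2half (from cuda_fp16.h) rounds to nearest even.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical sketch: encode a finite float as IEEE 754 binary16 bits
// (the storage format of CUDA's __half), rounding toward zero for brevity.
std::uint16_t float_to_half_bits(float f) {
    assert(std::isfinite(f));  // NaN/Inf handling omitted in this sketch
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    const std::uint32_t sign = (u >> 16) & 0x8000u;
    const int exp = static_cast<int>((u >> 23) & 0xFFu) - 127 + 15;  // rebias 8-bit -> 5-bit exponent
    std::uint32_t mant = u & 0x7FFFFFu;
    if (exp >= 31) return static_cast<std::uint16_t>(sign | 0x7C00u);  // too large: infinity
    if (exp <= 0) {                                                   // half subnormal or zero
        if (exp < -10) return static_cast<std::uint16_t>(sign);       // underflows to signed zero
        mant |= 0x800000u;                                            // make implicit leading 1 explicit
        return static_cast<std::uint16_t>(sign | (mant >> (14 - exp)));
    }
    return static_cast<std::uint16_t>(sign | (static_cast<std::uint32_t>(exp) << 10) | (mant >> 13));
}
```

For example, 0.5f encodes as 0x3800, whereas static_cast<std::uint16_t>(0.5f) is 0.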

Thank you for your assistance.

WilliamTambellini · 2018-12-21T19:45:39Z

Thank you @wdj.
Fixed in my PR:
#110

WilliamTambellini mentioned this issue Dec 21, 2018

Update nvidia gemm_bench.cu for mixed precision f16 to f32 #110

Open
