bjmsong/hands-on-gemm

Testbed

  • RTX 4090
  • CUDA 12.1
  • CUTLASS 3.4.1
  • cuBLAS 12.01
  • Warm-up: 100 runs
  • Execution: 100 timed runs
  • Data types: fp32 and fp16
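
The warm-up and execution counts above describe a common measurement pattern: run the kernel untimed first, then average over the timed runs. The repository's own harness is not shown here, so the following is only a minimal CPU-side sketch of that pattern. The names `naive_gemm`, `benchmark_ms`, and `gflops` are hypothetical, and the naive triple loop merely stands in for whichever kernel is being measured. On a GPU the same structure would typically use CUDA events and a device synchronization rather than `std::chrono`.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in kernel: naive row-major GEMM, C = A * B,
// with A of shape M x K, B of shape K x N, and C of shape M x N.
void naive_gemm(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C,
                std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
}

// Calls fn `warmup` times without timing, then `iters` times with timing,
// and returns the mean wall-clock time per timed call in milliseconds.
// The defaults mirror the 100/100 counts listed in the testbed (an assumption
// about how those counts are applied).
template <typename Fn>
double benchmark_ms(Fn&& fn, int warmup = 100, int iters = 100) {
    for (int i = 0; i < warmup; ++i) fn();
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) fn();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iters;
}

// Converts a mean time in milliseconds to GFLOP/s, counting 2*M*N*K
// floating-point operations per GEMM (one multiply and one add per term).
double gflops(std::size_t M, std::size_t N, std::size_t K, double ms) {
    return 2.0 * M * N * K / (ms * 1e-3) / 1e9;
}
```

A caller would wrap the kernel in a lambda, for example `benchmark_ms([&] { naive_gemm(A, B, C, M, N, K); })`, and pass the returned time to `gflops`. Warm-up runs are excluded from the average so that one-time costs such as cache population do not skew the result.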

Performance

About

Matrix Multiplication Optimization on GPU/CPU
