Aayush-Ankit/ml-inference-benchmarks

GPU and CPU measurements for ML inference workloads: power, latency, and throughput

Running the workloads

Steps to set up dependencies

Re-run the torch install even if torch is already present, to make sure your version is the latest one

  1. git clone git@github.com:Aayush-Ankit/isca_workloads.git
  2. luarocks install torch
    Note: luarocks install may fail on systems sitting behind a proxy. In that case, configure git to use https:// instead of git:// by adding the following to your global gitconfig (e.g. vim ~/.gitconfig):
    [url "https://"]
        insteadOf = git://
    For more information, see https://github.com/luarocks/luarocks/wiki/LuaRocks-through-a-proxy
  3. luarocks install nn
  4. luarocks install dpnn
  5. luarocks install torchx
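
As an optional sanity check (not part of the original steps), you can verify the base packages load from the torch command line; the print string here is arbitrary:

  th -e "require 'nn'; require 'dpnn'; require 'torchx'; print('base deps OK')"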

To use with CUDA

  1. luarocks install cutorch
  2. luarocks install cunn
  3. luarocks install cunnx
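
Similarly, you can confirm the CUDA packages load and that a GPU is visible (this assumes a CUDA-capable GPU and a working driver):

  th -e "require 'cutorch'; require 'cunn'; print('GPUs visible: ' .. cutorch.getDeviceCount())"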

Install the RNN dependency (enables using sequencers)

  1. cd rnn
  2. luarocks make rocks/rnn-scm-1.rockspec
    Note: the above command may fail if stale CMake files are left over in the rnn directory. If that happens, delete the rnn directory, re-clone it, and run the install again:
    2.a rm -rf rnn
    2.b git clone git@github.com:Element-Research/rnn.git
    2.c Repeat steps 1 and 2
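
Once the rock builds, a quick load check confirms the sequencer modules are available (nn.Sequencer is provided by the rnn package):

  th -e "require 'rnn'; print(nn.Sequencer ~= nil)"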

Yay! Setup's Done!!!

Running a benchmark on CPU/GPU

Some info about the benchmarks

  1. wlm_bigLSTM - bigLSTM network for word-level language modelling (Google 1B dataset)
  2. wlm_anotherLSTM - another deep LSTM network for word-level language modelling (Google 1B dataset)
  3. nmt_l5 - Google Machine Translation for English-French, 5-layer variant (WMT15 dataset)
  4. nmt_l3 - Google Machine Translation for English-French, 3-layer variant (WMT15 dataset)

th <benchmark>.lua -gpu <0/1> -threads <non-zero> -batch <non-zero>

cmdline options:

  1. gpu > use 0 for a CPU run, 1 for a GPU run (default is CPU)
  2. threads > useful for CPU runs; can be increased to evaluate CPU performance (default is 1)
  3. batch > can be varied to see how the CPU/GPU numbers (inference time, power) change. For GPU, the batch size can be increased until torch throws a THCudaCheck out-of-memory error. (See the example runs below.)
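
For example (assuming the script filenames match the benchmark names above; check the repository listing to be sure), a GPU run of the bigLSTM benchmark at batch size 64, and an 8-thread CPU run at batch size 16:

  th wlm_bigLSTM.lua -gpu 1 -threads 1 -batch 64
  th wlm_bigLSTM.lua -gpu 0 -threads 8 -batch 16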

Metrics of interest that are printed

  1. Number of parameters in the network
  2. Inference time on CPU/GPU. NOTE: for CPU inference time, run the benchmark twice (and use the 2nd value) to make sure the data-movement cost from HDD isn't included.
  3. The <>pow.txt file shows GPU power consumption (see the power-sampling sketch below).
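
If you want to sample GPU power draw yourself alongside a run, nvidia-smi can log it to a file. This is a minimal sketch, assuming an NVIDIA GPU with nvidia-smi on the PATH and the same assumed script filename as above; the benchmarks' own <>pow.txt output may be produced differently:

  # log power draw once per second while the benchmark runs
  nvidia-smi --query-gpu=power.draw --format=csv -l 1 > gpu_pow.txt &
  SMI_PID=$!
  th wlm_bigLSTM.lua -gpu 1 -batch 64
  kill $SMI_PID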
