This repository has been archived by the owner on Mar 22, 2019. It is now read-only.

hiprngGenerateUniform fails on AMD platform for certain conditions #4

Open
AJcodes opened this issue May 31, 2017 · 1 comment


AJcodes commented May 31, 2017

I recently ran into an issue when running hiprngGenerateUniform on an AMD platform: hipMalloc fails with hipErrorMemoryAllocation. The same issue does not seem to occur on an NVIDIA platform. The generator I'm using is MRG32k3a and, unfortunately, reducing the problem size is not an option. Below is a short HIP trace of the error on the AMD platform.

  <<hip-api tid:1.130 hipMalloc (0x7ffd82b04f28, 5435817984)
  hip-api tid:1.130 hipMalloc ret=1002 (hipErrorMemoryAllocation)>>
  <<hip-api tid:1.131 hipMemcpy (0, 0x7fe64b2ad010, 5435817984, hipMemcpyHostToDevice)


ex-rzr commented May 31, 2017

The current implementation of the hiprngGenerate* functions creates num streams to generate num numbers, i.e. one stream per number. When num is very large, creating the streams takes too much time and memory (on both host and device).

https://github.com/ROCmSoftwarePlatform/hcRNG/blob/master/lib/src/hcc_detail/hiprng.cpp#L152
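To put rough numbers on this, here is a small sketch of the arithmetic. It assumes that one MRG32k3a stream object occupies 144 bytes; that size is an assumption and is not stated anywhere in this thread. The names kAssumedStreamBytes, streamBufferBytes and impliedStreamCount are hypothetical helpers, not part of hcRNG.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION (not confirmed in this thread): one MRG32k3a stream object
// occupies 144 bytes. Adjust if sizeof() of the real stream type differs.
const std::uint64_t kAssumedStreamBytes = 144;

// Bytes of stream state needed when `streamCount` streams are created.
std::uint64_t streamBufferBytes(std::uint64_t streamCount) {
  return streamCount * kAssumedStreamBytes;
}

// Number of streams implied by a buffer of `bytes`, under the assumed size.
std::uint64_t impliedStreamCount(std::uint64_t bytes) {
  // The buffer must hold a whole number of stream objects.
  assert(bytes % kAssumedStreamBytes == 0);
  return bytes / kAssumedStreamBytes;
}
```

Under that assumption, the 5435817984-byte hipMalloc in the trace above would correspond to 37748736 streams, i.e. num = 37748736, while 1024 streams would need only 147456 bytes.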

Here is our quick hack:

  #define GenerateUniform(gt)\
  const size_t maxStreamCount = 1024; /* upper bound on streams per run */ \
  size_t pos = 0; /* count of numbers generated so far */ \
  hcrngStatus hcStatus##gt = HCRNG_SUCCESS; /* stays defined when num == 0 */ \
  while (pos < num) {\
    size_t streamCount, numberCount;\
    if ((num - pos) > maxStreamCount) {\
      /* full run: the largest multiple of maxStreamCount that fits */ \
      streamCount = maxStreamCount;\
      numberCount = (num - pos) / streamCount * streamCount;\
    } else {\
      /* tail run: one stream per remaining number */ \
      streamCount = (num - pos);\
      numberCount = streamCount;\
    }\
    /* debug output */ \
    printf("num %zu numberCount %zu streamCount %zu pos %zu\n", num, numberCount, streamCount, pos);\
    hcrng##gt##Stream *streams##gt = hcrng##gt##CreateStreams((hcrng##gt##StreamCreator*)generator, streamCount, NULL, NULL); \
    hcrng##gt##Stream *streams_buffer##gt;\
    /* NOTE: the return values of hipMalloc and hipMemcpy are still unchecked */ \
    hipMalloc((void **)&streams_buffer##gt, streamCount * sizeof(hcrng##gt##Stream));\
    hipMemcpy(streams_buffer##gt, streams##gt, streamCount * sizeof(hcrng##gt##Stream), hipMemcpyHostToDevice);\
    free(streams##gt);\
    hcStatus##gt = hcrng##gt##DeviceRandomU01Array_single( /* TODO */ \
         *accl_view, streamCount, streams_buffer##gt, numberCount, outputPtr + pos);\
    hipFree(streams_buffer##gt);\
    if (hcStatus##gt != HCRNG_SUCCESS) break; /* do not let a later run hide an error */ \
    pos += numberCount;\
  }\
  return hipHCRNGStatusToHIPStatus(hcStatus##gt);

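As you can see, at most maxStreamCount = 1024 streams are used per run. Any num numbers are therefore generated in 1 run (when num <= maxStreamCount or num % maxStreamCount == 0) or 2 runs: the first run covers the largest multiple of maxStreamCount, and the second covers the remainder with one stream per number.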
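The run count can be checked on the host without a GPU. The sketch below reproduces only the loop arithmetic of the macro; Run and planRuns are hypothetical names introduced for illustration, and no hcRNG or HIP calls are made.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One generation run in the chunked scheme.
struct Run {
  std::size_t pos;          // offset into the output array
  std::size_t streamCount;  // streams created for this run
  std::size_t numberCount;  // numbers generated by this run
};

// Hypothetical helper: lists the runs the macro's while loop would perform.
std::vector<Run> planRuns(std::size_t num, std::size_t maxStreamCount) {
  assert(maxStreamCount > 0);
  std::vector<Run> runs;
  std::size_t pos = 0;
  while (pos < num) {
    const std::size_t remaining = num - pos;
    std::size_t streamCount, numberCount;
    if (remaining > maxStreamCount) {
      // Full run: largest multiple of maxStreamCount that fits.
      streamCount = maxStreamCount;
      numberCount = remaining / streamCount * streamCount;
    } else {
      // Tail run: one stream per remaining number.
      streamCount = remaining;
      numberCount = streamCount;
    }
    runs.push_back({pos, streamCount, numberCount});
    pos += numberCount;
  }
  return runs;
}
```

For example, planRuns(2500, 1024) yields two runs: 2048 numbers on 1024 streams starting at offset 0, then 452 numbers on 452 streams starting at offset 2048. A num that is a multiple of 1024 yields a single run.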

Questions:

  1. Is this the right direction for fixing the issue?
  2. Are there recommended values of maxStreamCount for different sizes of num (small, large, huge)?
