This repository has been archived by the owner on Mar 22, 2019. It is now read-only.
I recently ran into an issue running `hiprngGenerateUniform` on an AMD platform, where `hipMalloc` fails. The same issue does not seem to occur on an NVIDIA platform. The generator I'm using is MRG32k3a and, unfortunately, reducing the problem size is not an option. A short HIP trace of the error on the AMD platform is included below.
The current implementation of the `hiprngGenerate*` functions creates `num` streams to generate `num` numbers. When `num` is very large, creating the streams takes too much time and too much memory, on both host and device.
As you can see, only `maxStreamCount = 1024` streams are used here. Any `num` numbers are therefore generated in 1 run (when `num % maxStreamCount == 0`) or 2 runs.
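To make the "1 or 2 runs" arithmetic concrete, here is a minimal host-side sketch of how `num` could be split across a capped number of streams. This is my reading of the idea, not the actual patch: `Run`, `planRuns` and `totalNumbers` are hypothetical names, and the assumption is that the first run uses all `maxStreamCount` streams with `num / maxStreamCount` numbers each, while a second run covers the remaining `num % maxStreamCount` numbers with one number per stream.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One generation pass: `streams` generator states, each producing
// `perStream` numbers. (Hypothetical type, for illustration only.)
struct Run {
    std::size_t streams;
    std::size_t perStream;
};

// Hypothetical helper, not part of the library: split `num` outputs across
// at most `maxStreamCount` streams. Assumed scheme: a full run with all
// streams, then a short run for the remainder.
std::vector<Run> planRuns(std::size_t num, std::size_t maxStreamCount = 1024) {
    assert(maxStreamCount > 0);
    std::vector<Run> runs;
    const std::size_t perStream = num / maxStreamCount;
    const std::size_t remainder = num % maxStreamCount;
    if (perStream > 0) {
        runs.push_back({maxStreamCount, perStream});  // full run
    }
    if (remainder > 0) {
        runs.push_back({remainder, 1});  // leftover numbers, one per stream
    }
    return runs;
}

// Total count of numbers produced by a plan; should always equal `num`.
std::size_t totalNumbers(const std::vector<Run>& runs) {
    std::size_t total = 0;
    for (const Run& r : runs) {
        total += r.streams * r.perStream;
    }
    return total;
}
```

With this scheme the number of stream states to allocate is bounded by `maxStreamCount` regardless of `num`, so the stream buffer no longer grows with the problem size.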
Questions:
Is this the right direction for fixing the issue?
Are there recommended values of `maxStreamCount` for different sizes of `num` (small, large, huge)?
HIP trace from the report above:
<<hip-api tid:1.130 hipMalloc (0x7ffd82b04f28, 5435817984)
hip-api tid:1.130 hipMalloc ret=1002 (hipErrorMemoryAllocation)>>
<<hip-api tid:1.131 hipMemcpy (0, 0x7fe64b2ad010, 5435817984, hipMemcpyHostToDevice)