[FEA] Decrease Pool Size on the fly #724

Open
VibhuJawa opened this issue Mar 10, 2021 · 11 comments
Labels
0 - Backlog (In queue waiting for assignment), feature request (New feature or request), improvement (Improvement / enhancement to an existing function)

Comments

@VibhuJawa
Member

VibhuJawa commented Mar 10, 2021

Is your feature request related to a problem? Please describe.
A lot of the time we use libraries that allocate from the RMM pool (RAPIDS, CuPy, Numba) together with libraries that have their own pool (like PyTorch). This leads to intense competition between the libraries for device memory.

The workflows often look like the following:

  1. Step 1: Do cuDF/CuPy-based pre-processing. This expands the memory pool. Once pre-processing is complete, the memory actually in use is often much smaller than the current RMM pool size, which matches the peak memory use.

  2. Step 2: Do PyTorch-based inference/training. PyTorch uses its own pool for this, which competes with the RMM pool; at this point the RMM pool can be very large, causing memory problems.

The above pattern, where the final memory in use is far less than the peak, is very common for NLP workflows because we often go from a string representation to a numerical representation, which greatly reduces the memory needed.
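
For illustration, a rough sketch of this pattern (the data and model below are made up, not taken from a real workload; the RMM/cuDF/PyTorch calls themselves are standard):

```python
import cudf
import rmm
import torch

# Let RMM manage a growable pool; it expands on demand during Step 1.
rmm.reinitialize(pool_allocator=True)

# Step 1: string-heavy cuDF pre-processing grows the pool to its peak size.
df = cudf.DataFrame({"text": ["the quick brown fox jumps over the lazy dog"] * 1_000_000})
lengths = df["text"].str.len()  # small numeric result
del df                          # freed memory goes back to the pool,
                                # but the pool itself stays at peak size

# Step 2: PyTorch allocates from its own caching allocator, which must fit in
# whatever device memory the (still peak-sized) RMM pool has not claimed.
model = torch.nn.Linear(16, 2).cuda()
x = torch.rand(1024, 16, device="cuda")
y = model(x)
```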

Describe the solution you'd like
I wish I could decrease the RMM pool currently in use on the fly using the Python API.
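
For concreteness, a sketch of what that could look like; `release_unused_memory()` below is purely hypothetical and is not part of RMM's API, it just illustrates the requested feature:

```python
import rmm

# Start with a growable pool (size here is illustrative).
rmm.reinitialize(pool_allocator=True, initial_pool_size=2 * 2**30)

# ... cuDF/CuPy pre-processing grows the pool to its peak ...

# Hypothetical call: return unused blocks held by the pool to the CUDA driver,
# shrinking the pool to roughly the memory actually in use.
mr = rmm.mr.get_current_device_resource()
mr.release_unused_memory()  # NOT a real RMM method; this is the feature request
```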

Additional context
This could currently help sidestep some issues such as:

Issue 159: rapidsai/gpu-bdb#159 (We sometimes fail because we don't have enough memory for cuBLAS initialization due to RMM pool expansion.)

Add the Hugging Face/PyTorch implementation to our benchmarking script.

CC: @BartleyR / @brhodes10 . This might help with #501 for cyber workflows.
CC: @EvenOldridge For inputs on the NvTabular side.

@VibhuJawa VibhuJawa added ? - Needs Triage Need team to review and classify feature request New feature or request labels Mar 10, 2021
@randerzander

cc @harrism @jrhemstad re: an offline discussion from this week

@jrhemstad
Contributor

I wish I could decrease the RMM pool currently in use on the fly using the Python API.

One very expensive way you can accomplish this is to spill all of your GPU data to host, destroy the pool, allocate a new pool only large enough to hold your data, and move your data back from host to device.

Otherwise this is a very difficult problem that may or may not be possible with the virtual memory APIs.
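
A rough sketch of that spill-and-rebuild workaround, assuming CuPy arrays backed by the RMM pool; which arrays are live and what pool size to rebuild with are up to the caller, and the allocator hookup (`rmm.rmm_cupy_allocator`) should be adjusted to the RMM version in use:

```python
import cupy as cp
import rmm

def shrink_pool_by_rebuilding(gpu_arrays, new_pool_size):
    """Spill to host, rebuild a smaller pool, and copy the data back."""
    # 1. Spill all GPU data to host memory and drop the device copies.
    host_copies = [cp.asnumpy(a) for a in gpu_arrays]
    gpu_arrays.clear()

    # 2. Destroy the old pool by replacing it with a smaller one.
    rmm.reinitialize(pool_allocator=True, initial_pool_size=new_pool_size)
    cp.cuda.set_allocator(rmm.rmm_cupy_allocator)

    # 3. Move the data back from host to device.
    return [cp.asarray(h) for h in host_copies]
```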

@jakirkham
Member

Does this just go back to PyTorch using their own memory manager? If so, I recall we raised an issue ( pytorch/pytorch#43144 ) to discuss getting them to accept external memory managers (like RMM), and there was some discussion with them. Though I don't know what the status was there. Mark & Jake, do either of you know?

@jrhemstad
Contributor

Does this just go back to PyTorch using their own memory manager? If so, I recall we raised an issue ( pytorch/pytorch#43144 ) to discuss getting them to accept external memory managers (like RMM), and there was some discussion with them. Though I don't know what the status was there. Mark & Jake, do either of you know?

It's still in the works. I think the idea is that it's unlikely we'll ever be able to get every single library we want to interoperate with to expose hooks for external allocators. So there will likely always be a need to defrag the pool.

@jrhemstad jrhemstad added improvement Improvement / enhancement to an existing function and removed ? - Needs Triage Need team to review and classify labels Mar 10, 2021
@jakirkham
Member

Ok thanks for the info.

That's fair. Though do we need to worry about every library? PyTorch comes up frequently and there are many other libraries building off of it. Maybe solving this in PyTorch is enough for many use cases?

Randy & Vibhu, are there other places you see this issue, or is it mainly with PyTorch?

@harrism
Member

harrism commented Mar 10, 2021

Even if everyone uses RMM, fragmentation is a big problem so having an explicit (possibly expensive) button to push to defragment may be valuable.

@VibhuJawa
Member Author

VibhuJawa commented Mar 10, 2021

Randy & Vibhu, are there other places you see this issue, or is it mainly with PyTorch?

When using spaCy, it needs memory under the hood for a cuBLAS context (rapidsai/gpu-bdb#159 (comment)).

With previous queries we have already grown our pool, so sometimes there is no memory left for cuBLAS context creation, which leads to intermittent failures. If we had this feature, we could decrease the pool in use on the fly and sidestep this issue.
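
Not the requested feature, but one partial mitigation would be capping the pool so some headroom always remains for the cuBLAS context; a sketch using `rmm.reinitialize`'s pool-size parameters (the sizes below are illustrative, not from a real configuration):

```python
import rmm

# Cap the RMM pool below total device memory so later cuBLAS context creation
# still has headroom; numbers assume a 32 GiB GPU and are purely illustrative.
rmm.reinitialize(
    pool_allocator=True,
    initial_pool_size=8 * 2**30,    # start at 8 GiB
    maximum_pool_size=28 * 2**30,   # never claim the last ~4 GiB
)
```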

@jakirkham
Member

jakirkham commented Mar 10, 2021

Can spaCy reuse CuPy's cuBLAS context?

@VibhuJawa
Member Author

VibhuJawa commented Mar 10, 2021

Can spaCy reuse CuPy's cuBLAS context?

Don't know enough about how they interact to answer this. Maybe @beckernick can answer this once he is back.

Randy & Vibhu, are there other places you see this issue, or is it mainly with PyTorch?

Wondering if we face similar problems with TensorRT too; maybe @benfred from the NVTabular team can shed some light on whether they face this competing-pool problem between the TensorRT engine and RMM?

@jakirkham
Member

jakirkham commented Mar 10, 2021

FWIW, I made a suggestion in the other thread on how to initialize the cuBLAS handle.

Edit: Looks like spaCy just uses CuPy AFAICT. I'm not seeing any direct C calls to cuBLAS. So maybe initializing cuBLAS with CuPy is sufficient.
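
A sketch of what that pre-initialization could look like; `cupy.cuda.device.get_cublas_handle()` is CuPy's handle accessor, though whether spaCy/Thinc actually reuses it depends on their internals:

```python
import cupy as cp

# Create the cuBLAS context up front, before the RMM pool has grown to claim
# most of device memory. Either line below forces handle creation.
cp.cuda.device.get_cublas_handle()
_ = cp.ones((2, 2)) @ cp.ones((2, 2))  # a tiny matmul also initializes cuBLAS
```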

@github-actions

github-actions bot commented Apr 9, 2021

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@jrhemstad jrhemstad added 0 - Backlog In queue waiting for assignment and removed inactive-30d labels Apr 9, 2021