Faiss GPU: improve error information for GPU OOM #3060
Summary:

This diff updates logging in case of GPU out of memory errors, whether from `cudaMalloc` directly or from the RAFT allocator. In case of a memory error, allocator state (including an indication of CUDA-reported free memory on the device) is returned as part of the exception message, like this:

```
C++ exception with description "Error in virtual void *faiss::gpu::StandardGpuResourcesImpl::allocMemory(const faiss::gpu::AllocRequest &) at fbcode/faiss/gpu/StandardGpuResources.cpp:570: StandardGpuResources: Faiss device allocator fail type IVFLists dev 1 space Device stream 0x7fa07623b440 size 1024 bytes
Allocator state:
GPU device 1 allocator state:
==========
Device free memory: 82400968704 bytes
Allocator temp memory remaining: 1610612720
Outstanding Faiss allocations:
Alloc type TemporaryMemoryBuffer: 1 allocations, 1610612736 bytes
Alloc type FlatData: 2 allocations, 59648 bytes
```

In the case where Faiss is built using RAFT, previously no error information was provided if the RAFT memory manager had an OOM error, but now it will produce a string similar to the above. The Faiss memory manager (StandardGpuResources) continues to log all allocations made and passed to the RAFT memory manager, so we can also receive an indication of what is allocated and for what purpose.

In addition, this fixes the issue where Faiss GPU would not compile (in fbcode at least) if the `USE_NVIDIA_RAFT` define was not available. Now the library compiles both with and without RAFT.

Also updated the `#if defined USE_NVIDIA_RAFT` to `#ifdef USE_NVIDIA_RAFT` to better conform to the rest of the GPU code.

This diff also disables the temporary memory allocation of 1.5 GB made up front if RAFT is being used, which is really what is intended for using the RAFT memory manager. Otherwise this diff does not change the runtime behavior of Faiss GPU; it is being made to better debug GPU OOM issues with Faiss usage.

Reviewed By: mdouze

Differential Revision: D49260364
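To illustrate the kind of bookkeeping the summary describes, here is a minimal sketch of an allocator wrapper that tracks outstanding allocations per type and appends its state to the exception message on failure. This is not the Faiss implementation: `AllocTracker`, its methods, and the message layout are hypothetical, and a fixed byte budget stands in for real device memory so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical tracker (not a Faiss class): a byte budget stands in for
// device memory, and allocations are only counted, never actually made.
class AllocTracker {
public:
    AllocTracker(int device, std::size_t capacity)
        : device_(device), capacity_(capacity) {}

    // Record an allocation, or throw with a dump of the allocator state
    // if the request does not fit in the remaining budget.
    void allocate(const std::string& type, std::size_t bytes) {
        if (bytes > capacity_ - used_) {
            std::ostringstream msg;
            msg << "allocation failed: type " << type << " dev " << device_
                << " size " << bytes << " bytes\n" << stateString();
            throw std::runtime_error(msg.str());
        }
        used_ += bytes;
        Entry& entry = byType_[type];
        entry.count += 1;
        entry.bytes += bytes;
    }

    // Return a previously recorded allocation to the budget.
    void release(const std::string& type, std::size_t bytes) {
        auto it = byType_.find(type);
        assert(it != byType_.end() && it->second.bytes >= bytes);
        it->second.count -= 1;
        it->second.bytes -= bytes;
        used_ -= bytes;
        if (it->second.count == 0) {
            byType_.erase(it);
        }
    }

    // Human-readable summary: free bytes plus outstanding totals per type.
    std::string stateString() const {
        std::ostringstream out;
        out << "GPU device " << device_ << " allocator state:\n"
            << "Free memory: " << (capacity_ - used_) << " bytes\n"
            << "Outstanding allocations:\n";
        for (const auto& kv : byType_) {
            out << "Alloc type " << kv.first << ": " << kv.second.count
                << " allocations, " << kv.second.bytes << " bytes\n";
        }
        return out.str();
    }

private:
    struct Entry {
        std::size_t count = 0;
        std::size_t bytes = 0;
    };
    int device_;
    std::size_t capacity_;
    std::size_t used_ = 0;
    std::map<std::string, Entry> byType_;
};
```

Because the tracker sits in front of whatever allocator actually serves the request, the same report can be produced whether the failure comes from a direct allocation or from a delegated memory manager, which is the property the summary relies on.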
This pull request was exported from Phabricator. Differential Revision: D49260364
@cjnolet FYI: with the Faiss/RAFT integration, when the RAFT memory manager was in use we were still reserving 1.5 GB of memory from the RAFT allocator and handing it out through the Faiss memory stack code, because the temporary memory management inside StandardGpuResources was still active. The RAFT manager was thus mainly used for permanent memory allocations (e.g., IVF list data and the like). I presume that the purpose of the RAFT manager is also to allow efficient allocation of large chunks of temporary memory (e.g., grab 100 MB to use for temporary calculations, then immediately return it). As part of this diff, when RAFT is enabled we now pass all allocation requests (temporary or "permanent") to the RAFT memory manager instead.
@cjnolet What are your thoughts on this change to the memory allocator? Currently the repo uses (likely by accident) both the Faiss and RAFT memory allocators. The Faiss allocator is used for "temporary" memory allocations (handed out from a stack of 1.5 GB pre-allocated up front), while all non-temporary allocations (e.g., IVF list data) and overflow temporary allocations (where we ask for more temporary memory than is currently available) fall back to RAFT. With this diff, all allocations would instead go to RAFT. My concern is whether this would be a performance regression, for example if the RAFT allocator doesn't function well at handing out temporary 100+ MB chunks of memory which are quickly returned to the allocator (stream ordered, after dependent kernels are allocated). Does the RAFT memory allocator handle very large (like 100-500 MB) but temporary allocations well (we ask for an allocation and return it within the same C++ function after calling 1-N kernels that use it), or is it more of a "small-ish block allocator" strategy (i.e., it only works well on allocation churn for smaller allocations)?
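For readers unfamiliar with the arrangement described in this comment, the following is a rough sketch of a pre-reserved temporary memory stack with a fallback path for requests that do not fit. It is an illustration only, not the Faiss code: `TempStack` and `Block` are hypothetical names, host memory and `std::malloc` stand in for device memory and the general-purpose allocator, and alignment and stream ordering are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical sketch: a buffer reserved once up front serves temporary
// requests in LIFO order; requests that do not fit fall back to a general
// allocator (std::malloc here, standing in for a device allocator).
class TempStack {
public:
    struct Block {
        void* ptr;
        std::size_t size;
        bool fromStack;  // true if carved out of the reserved buffer
    };

    explicit TempStack(std::size_t reserveBytes)
        : buffer_(reserveBytes), top_(0) {}

    Block alloc(std::size_t bytes) {
        if (bytes <= buffer_.size() - top_) {
            // Fast path: bump the stack top, no allocator call needed.
            Block b{buffer_.data() + top_, bytes, true};
            top_ += bytes;
            return b;
        }
        // Overflow path: the reserved stack is too small for this request.
        return Block{std::malloc(bytes), bytes, false};
    }

    // Stack blocks must be released in reverse order of allocation.
    void release(const Block& b) {
        if (b.fromStack) {
            assert(b.size <= top_);
            top_ -= b.size;
        } else {
            std::free(b.ptr);
        }
    }

    std::size_t remaining() const { return buffer_.size() - top_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t top_;
};
```

The trade-off under discussion is visible here: the fast path costs only a pointer bump, whereas sending every temporary request to a general allocator makes performance depend on how well that allocator handles large, short-lived blocks.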
I want to first draw a brief distinction between RMM (RAPIDS memory manager) and RAFT (primitives for vector search, information retrieval, and ML). I think you are right about both FAISS and RMM being used at the same time. The RMM lead is out this week, but I'm going to schedule an information exchange session when he's back. In the meantime, I'll say that RMM itself is kind of the "only GPU memory manager you need" in the way that its primary goals are to
Number 2 is often done in the Python layer but could be done in the C++ layer as well. RMM contains different strategies for allocating memory. The default strategy for device memory is a RAFT has found that the default memory resource is usually good enough, but we've recently started to support having a second allocation strategy for temporary workspace allocations that we call the Does this sound like it will support what FAISS needs? I should mention that RMM is a very lightweight and header-only memory management library and so we could enable RMM in the Does this answer your questions? Do you see any potential problems w/ the way RMM works or bad assumptions w.r.t the way
@cjnolet @wickedfoo what's the status of this PR?