Support PyTorch set_per_process_memory_fraction
#39
kcm added the enhancement (New feature or request) and good first issue (Good for newcomers) labels on Dec 29, 2022
One use case is AWS vGPU support, so that multiple consumers of the vGPU device(s) don't assume they have exclusive rights to the full resource.
gauthamchandra added a commit to uhadmin/weaviate-helm that referenced this issue on Dec 29, 2022:

Due to the lack of support for GPU virtualization by Weaviate (see weaviate/t2v-transformers-models#39 for details), we need to ensure that each pod that uses the GPU gets the full GPU without sharing it. The way to ensure this is to schedule each GPU-enabled pod on its own dedicated node using Kubernetes's anti-affinity feature. This change does exactly that, making it easy to run clusters with GPU support.
gauthamchandra added a commit to uhadmin/weaviate-helm that referenced this issue on Apr 3, 2023:

Due to the lack of support for GPU virtualization by Weaviate (see weaviate/t2v-transformers-models#39 for details), we need to ensure that each pod that uses the GPU gets the full GPU without sharing it. The way to ensure this is to schedule each GPU-enabled pod on its own dedicated node using Kubernetes's anti-affinity feature. This change does exactly that, making it easy to run clusters with GPU support.
Summary
PyTorch allows limiting the GPU memory a process may allocate via `torch.cuda.set_per_process_memory_fraction`. This is useful, for example, when a GPU resource is shared.
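For reference, a minimal illustration of the underlying PyTorch call (the 0.5 fraction and device index 0 below are arbitrary example values, not part of this proposal):

```python
import torch

# Cap this process at ~50% of GPU 0's total memory; allocations beyond
# the cap raise an out-of-memory error instead of consuming the whole device.
if torch.cuda.is_available():
    torch.cuda.set_per_process_memory_fraction(0.5, device=0)
```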
Proposal
This setting takes a fraction in [0, 1] and an optional device. Use an environment variable alongside `ENABLE_CUDA` of the form `CUDA_MEMORY_FRACTION`, where the value is 0.0-1.0 and is passed to `fraction`. Additionally, if set, check and prefer `CUDA_MEMORY_FRACTION_...` variable(s), where the value has the same format and the `...` is passed to `device` for each variable found. A rough sketch of this is shown below.
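A minimal sketch of how the proposed variables could be read at startup; the helper name `apply_cuda_memory_fraction` and the assumption that the `...` suffix is an integer device index are illustrative, not a settled design:

```python
import os

import torch


def apply_cuda_memory_fraction() -> None:
    """Apply CUDA_MEMORY_FRACTION / CUDA_MEMORY_FRACTION_<device> if set."""
    if not torch.cuda.is_available():
        return

    # Per-device variables, e.g. CUDA_MEMORY_FRACTION_0=0.5, are preferred.
    # The suffix is assumed here to be an integer device index.
    per_device = {
        key[len("CUDA_MEMORY_FRACTION_"):]: value
        for key, value in os.environ.items()
        if key.startswith("CUDA_MEMORY_FRACTION_")
    }
    if per_device:
        for device, fraction in per_device.items():
            torch.cuda.set_per_process_memory_fraction(float(fraction), device=int(device))
        return

    # Otherwise fall back to a single fraction applied to the default device.
    fraction = os.environ.get("CUDA_MEMORY_FRACTION")
    if fraction is not None:
        torch.cuda.set_per_process_memory_fraction(float(fraction))
```

With this, `CUDA_MEMORY_FRACTION_0=0.5` would cap the process at half of GPU 0's memory, while `CUDA_MEMORY_FRACTION=0.5` alone would apply to the default device.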
Questions

`CUDA_MEMORY_FRACTION` / `CUDA_MEMORY_FRACTION_...`?