
Profiler tool underestimates the memory actually used by the GPU (NVIDIA) #378

Open
fitoule opened this issue Jan 19, 2022 · 2 comments

fitoule commented Jan 19, 2022

Hello, I successfully ran the profiler tool on my classification model to profile the maximum memory usage, because I want to run different CNNs on the same GPU. But I'm really baffled by the profiler's results. Let me explain.

I have an NVIDIA RTX 3090 with 24 GB of memory, so for my small CNN I set a 512 MB memory limit in my code, before any other GPU use, with this call:
tf.config.set_logical_device_configuration(gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=512)])
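For context, a minimal self-contained sketch of this kind of setup (the model below is a placeholder, not the actual CNN from this issue):

```python
import tensorflow as tf

# Cap TensorFlow's GPU memory pool at 512 MB on the first GPU.
# This must run before the GPU is initialized by any other TF call.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=512)],
    )

# Placeholder CNN, just something small to run on the limited device.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
```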

It seems to work, judging by the TensorFlow logs:
2022-01-19 16:24:13.615890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with **512 MB memory:** -> device: 0, name: GeForce RTX 3090, pci bus id: 0000:2d:00.0, compute capability: 8.6

nvidia-smi shows that 419 MiB of GPU memory are used:
[screenshot: nvidia-smi output after device creation]

Then I run inference on the classification model with batch size = 1, and TensorBoard shows that the model uses about 100 MiB:
[screenshot: TensorBoard memory profile]

So theoretically I could have set an even smaller memory limit (under 512), but the real memory usage reported by nvidia-smi is 1869 MiB!
[screenshot: nvidia-smi memory usage after inference]

Finally, if I want a tool that tells me the real memory consumption of a model, how should I use the TensorBoard profiler? Is the TensorBoard result actually useless?
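For reference, a minimal sketch of one way to capture such a memory profile with the TensorBoard profiler (the input shape, log directory, and `model` are placeholders carried over from the sketch above, not the actual setup from this issue):

```python
import numpy as np
import tensorflow as tf

# Single-image batch matching the placeholder model's input shape.
x = np.random.rand(1, 224, 224, 3).astype('float32')

# Record a trace around one inference; the memory profile can then be
# inspected in TensorBoard under the "Profile" tab.
tf.profiler.experimental.start('logs/profile_demo')
model(x)  # `model` as built in the earlier sketch
tf.profiler.experimental.stop()
```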

fitoule commented Jan 20, 2022

OK, I've created a notebook that you can download and execute (but not on Colab, because you need exclusive access to the GPU):
https://github.com/fitoule/tensorflow_gpu_memory-/blob/main/DemoMemoryIssue.ipynb

fitoule commented Jan 21, 2022

I made further investigations. The memory limit actually works, but the documentation is not clear enough. In my test I set memory_limit=200:
A) When I import tensorflow => NVIDIA memory allocated is 423 MiB
B) When I apply the memory limit => NVIDIA memory allocated is 423 + 200 = 623 MiB
C) When the first inference is called, TensorFlow adds another 938 MiB (on top of 423 + 200), total = 1561 MiB

So I understand that A + C is a constant overhead needed by TensorFlow, and memory_limit only affects the B part. I tested this on many different models; the constant part depends on the driver or the GPU hardware.

So now it's clear to me. But the documentation could mention this, because needing about 1.5 GB of GPU memory for a small model that uses about 100 MiB is confusing.
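A rough sketch of how these three stages can be measured by polling nvidia-smi from Python (the model is a placeholder and the exact MiB values will differ per driver and GPU; only the relative jumps matter):

```python
import subprocess

def gpu_mem_used_mib():
    """Total memory used on GPU 0, in MiB, as reported by nvidia-smi."""
    out = subprocess.check_output([
        'nvidia-smi', '--id=0',
        '--query-gpu=memory.used',
        '--format=csv,noheader,nounits',
    ])
    return int(out.decode().strip())

print('baseline:', gpu_mem_used_mib(), 'MiB')

import tensorflow as tf                      # stage A: runtime / CUDA libraries
print('after import (A):', gpu_mem_used_mib(), 'MiB')

gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(
    gpus[0],
    [tf.config.LogicalDeviceConfiguration(memory_limit=200)],
)
with tf.device('GPU:0'):
    tf.constant(0.0)                         # force device init: stage B (the 200 MiB pool)
print('after device init (A + B):', gpu_mem_used_mib(), 'MiB')

# Stage C: the first inference loads cuDNN/cuBLAS kernels and workspaces.
model = tf.keras.applications.MobileNetV2(weights=None)  # placeholder model
model(tf.zeros([1, 224, 224, 3]))
print('after first inference (A + B + C):', gpu_mem_used_mib(), 'MiB')
```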
