
Profiler tool underestimates the memory actually used by the GPU (NVIDIA) #378

Open
fitoule opened this issue Jan 19, 2022 · 2 comments

fitoule commented Jan 19, 2022

Hello, I successfully ran the profiler tool on my classification model to profile the maximum memory usage, because I want to run different CNNs on the same GPU. But I'm really baffled by the profiler's results. Let me explain.

I have an NVIDIA RTX 3090 with 24 GB of memory, so for my small CNN I set a 512 MB memory limit in my code, before any other GPU use, with this call:
tf.config.set_logical_device_configuration(gpus[0], [tf.config.LogicalDeviceConfiguration(memory_limit=512)])
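For context, a minimal self-contained sketch of this kind of setup (the model below is a placeholder, not the actual CNN from this issue):

```python
import tensorflow as tf

# Cap TensorFlow's GPU memory pool at 512 MB on the first GPU.
# This must run before the GPU is initialized by any other TF call.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=512)],
    )

# Placeholder CNN, just something small to run on the limited device.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
```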

It seems to work, judging by the TensorFlow logs:
2022-01-19 16:24:13.615890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with **512 MB memory:** -> device: 0, name: GeForce RTX 3090, pci bus id: 0000:2d:00.0, compute capability: 8.6

nvidia-smi shows that 419 MiB of GPU memory are used:
[screenshot: nvidia-smi output after device creation]

Then I run inference on the classification model with batch size = 1, and TensorBoard shows that the model uses about 100 MiB:
[screenshot: TensorBoard memory profile]

So theoretically I could have set an even smaller memory limit (under 512), but the real memory usage reported by nvidia-smi is 1869 MiB!
[screenshot: nvidia-smi memory usage after inference]

Finally, if I want a tool that tells me the real memory consumption of a model, how should I use the TensorBoard profiler? Is the TensorBoard result actually useless?
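For reference, a minimal sketch of one way to capture such a memory profile with the TensorBoard profiler (the input shape, log directory, and `model` are placeholders carried over from the sketch above, not the actual setup from this issue):

```python
import numpy as np
import tensorflow as tf

# Single-image batch matching the placeholder model's input shape.
x = np.random.rand(1, 224, 224, 3).astype('float32')

# Record a trace around one inference; the memory profile can then be
# inspected in TensorBoard under the "Profile" tab.
tf.profiler.experimental.start('logs/profile_demo')
model(x)  # `model` as built in the earlier sketch
tf.profiler.experimental.stop()
```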

fitoule commented Jan 20, 2022

OK, I've created a notebook that you can download and execute (but not on Colab, because you need exclusive access to the GPU):
https://github.com/fitoule/tensorflow_gpu_memory-/blob/main/DemoMemoryIssue.ipynb

fitoule commented Jan 21, 2022

I made further investigations. The memory limit actually works, but the documentation is not clear enough. In my test I set memory_limit=200:
A) When I import tensorflow => NVIDIA memory allocated is 423 MiB
B) When I apply the memory limit => NVIDIA memory allocated is 423 + 200 = 623 MiB
C) When the first inference is called, TensorFlow adds another 938 MiB (on top of 423 + 200), total = 1561 MiB

So I understand that A + C is a constant overhead needed by TensorFlow, and memory_limit only affects the B part. I tested this on many different models; the constant part depends on the driver or the GPU hardware.

So now it's clear to me. But the documentation could mention this, because needing about 1.5 GB of GPU memory for a small model that uses about 100 MiB is confusing.
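A rough sketch of how these three stages can be measured by polling nvidia-smi from Python (the model is a placeholder and the exact MiB values will differ per driver and GPU; only the relative jumps matter):

```python
import subprocess

def gpu_mem_used_mib():
    """Total memory used on GPU 0, in MiB, as reported by nvidia-smi."""
    out = subprocess.check_output([
        'nvidia-smi', '--id=0',
        '--query-gpu=memory.used',
        '--format=csv,noheader,nounits',
    ])
    return int(out.decode().strip())

print('baseline:', gpu_mem_used_mib(), 'MiB')

import tensorflow as tf                      # stage A: runtime / CUDA libraries
print('after import (A):', gpu_mem_used_mib(), 'MiB')

gpus = tf.config.list_physical_devices('GPU')
tf.config.set_logical_device_configuration(
    gpus[0],
    [tf.config.LogicalDeviceConfiguration(memory_limit=200)],
)
with tf.device('GPU:0'):
    tf.constant(0.0)                         # force device init: stage B (the 200 MiB pool)
print('after device init (A + B):', gpu_mem_used_mib(), 'MiB')

# Stage C: the first inference loads cuDNN/cuBLAS kernels and workspaces.
model = tf.keras.applications.MobileNetV2(weights=None)  # placeholder model
model(tf.zeros([1, 224, 224, 3]))
print('after first inference (A + B + C):', gpu_mem_used_mib(), 'MiB')
```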
