GPU memory display issue #7

Open
qifengz opened this issue Aug 27, 2021 · 8 comments
Comments

@qifengz

qifengz commented Aug 27, 2021

Running nvidia-smi inside the container returns the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:0A.0 Off |                    0 |
| N/A   36C    P0    42W / 300W |    112MiB / 16160MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |

Memory-Usage: 112MiB / 16160MiB

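  1. No program is running yet, so why does it show 112MiB already in use?
  2. By default one physical card is equivalent to 3 vGPU cards, so shouldn't the total memory be 16160MiB/3?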
@archlitchi
Member

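The 112M is the context memory required to manage the vGPU. The total shows 16160 because device-memory-scaling is set to 3 in your YAML, so we used virtual device memory to expand your memory 3x as well. If you do not want to use virtual device memory, set device-memory-scaling to 1, and the total memory will then be 16160/3.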
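
For readers following the arithmetic, here is a minimal sketch of how the reported total seems to be derived. The formula (physical memory × device-memory-scaling ÷ number of vGPUs per card) is inferred from the reply above, not taken from the project's source, and the function name `visible_vgpu_memory_mib` is hypothetical.

```python
def visible_vgpu_memory_mib(physical_mib: int, split_count: int, memory_scaling: float) -> int:
    """Memory each vGPU would report, under the ASSUMED formula
    physical * scaling / split (inferred from the thread, not from the source code)."""
    return int(physical_mib * memory_scaling / split_count)


PHYSICAL_MIB = 16160  # physical memory of the V100 shown in the nvidia-smi output
SPLIT_COUNT = 3       # one physical card exposed as 3 vGPUs, as stated in the question

# device-memory-scaling = 3: virtual device memory triples the pool,
# so each vGPU still appears to have the full 16160 MiB.
with_scaling = visible_vgpu_memory_mib(PHYSICAL_MIB, SPLIT_COUNT, 3)

# device-memory-scaling = 1: no virtual device memory,
# so each vGPU gets one third of the physical memory.
without_scaling = visible_vgpu_memory_mib(PHYSICAL_MIB, SPLIT_COUNT, 1)

print(with_scaling, without_scaling)
```

Under these assumptions, a scaling of 1 gives roughly 5386 MiB per vGPU, which matches the 16160MiB/3 the question expected.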

@qifengz
Author

qifengz commented Aug 30, 2021

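@archlitchi Is this "context memory required to manage the vGPU" reserved memory? Is there a way to hide the 112M in the display? It is a bit misleading for users.
Also, is there a way to suppress these warnings? They hurt the user experience somewhat.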
image

@archlitchi
Member

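To suppress the warnings, just set the environment variable LIBCUDA_LOG_LEVEL=0. The 112M display will not be changed for now, though: managing the vGPU really does need that much memory, so showing 0 would not be appropriate.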
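
One possible way to apply this from a launcher script is sketched below. The variable name `LIBCUDA_LOG_LEVEL` comes from the reply above; the helper `run_with_quiet_libcuda` is hypothetical, and it is an assumption that setting the variable in the process environment (rather than in the pod spec) is enough for the vGPU library to pick it up.

```python
import os
import subprocess
import sys


def run_with_quiet_libcuda(cmd):
    """Run cmd with LIBCUDA_LOG_LEVEL=0 added to a copy of the current environment.

    Hypothetical helper: it only sets the variable named in the thread;
    how the vGPU library reads it is not verified here.
    """
    env = dict(os.environ, LIBCUDA_LOG_LEVEL="0")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


# Stand-in for a real GPU workload: a child process that echoes the variable it sees.
result = run_with_quiet_libcuda(
    [sys.executable, "-c", "import os; print(os.environ['LIBCUDA_LOG_LEVEL'])"]
)
print(result.stdout.strip())
```

In a Kubernetes deployment the same variable would more likely be set in the container's `env` section of the pod YAML, so that every process in the container inherits it.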

@qifengz
Author

qifengz commented Aug 31, 2021

From the user's point of view, the 112M is confusing, because 112M of GPU memory is consumed before I have used anything.

@archlitchi
Member

@qifengz Let's discuss this on Slack.

@qifengz
Author

qifengz commented Sep 2, 2021

@qifengz Let's discuss this on Slack.
What is your Slack account?

@archlitchi
Member

@qifengz Just add me on WeChat: xuanzong4493

archlitchi added a commit that referenced this issue Jan 25, 2024
modify readme_cn to format gpu-pod yaml