[BUG] Display issue when running in Docker environment #98

Closed
3 tasks done
GhostArtyom opened this issue Sep 29, 2023 · 5 comments
Labels: bug (Something isn't working), cli / gui (Something related to the CLI)

Comments

@GhostArtyom

Required prerequisites

  • I have read the documentation https://nvitop.readthedocs.io.
  • I have searched the Issue Tracker that this hasn't already been reported. (comment there if it has.)
  • I have tried the latest version of nvitop in a new isolated virtual environment.

What version of nvitop are you using?

nvitop 1.3.0

Operating system and version

Ubuntu 22.04.3 LTS

NVIDIA driver version

537.42

NVIDIA-SMI

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.112                Driver Version: 537.42       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti     On  | 00000000:01:00.0  On |                  N/A |
|  0%   37C    P8              13W / 285W |   1450MiB / 12282MiB |     11%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        23      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+

Python environment

3.7.5 (default, May 29 2023, 13:54:16)
[GCC 7.5.0] linux
nvidia-ml-py==12.535.108
nvitop==1.3.0

Problem description

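Running nvitop inside Docker installed in WSL produces a garbled-looking interface.

image

After quitting with q, however, Unicode characters are displayed correctly.
image

Running nvitop -U, which draws the interface with ASCII characters, displays correctly.
image

Running nvitop directly in WSL works without any problem.
image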

Steps to Reproduce

The Python snippets (if any):

Command lines:

nvitop

Traceback

No response

Logs

[DEBUG] 2023-09-29 14:23:28,011 nvitop.api.libnvml::__determine_get_memory_info_version_suffix: Found symbol `nvmlDeviceGetMemoryInfo_v2`.
[DEBUG] 2023-09-29 14:23:28,011 nvitop.api.libnvml::__determine_get_memory_info_version_suffix: NVML get memory info version 2 is available.
[DEBUG] 2023-09-29 14:23:28,022 nvitop.api.libnvml::lookup: Found symbol `nvmlDeviceGetComputeRunningProcesses_v3`.
[DEBUG] 2023-09-29 14:23:28,023 nvitop.api.libnvml::lookup: Found symbol `nvmlDeviceGetConfComputeMemSizeInfo`.
[DEBUG] 2023-09-29 14:23:28,023 nvitop.api.libnvml::lookup: Found symbol `nvmlDeviceGetRunningProcessDetailList`.
[DEBUG] 2023-09-29 14:23:28,023 nvitop.api.libnvml::__determine_get_running_processes_version_suffix: NVML get running process version 3 API with v3 type struct is not available due to incompatible NVIDIA driver. Fallback to use get running process version 3 API with v2 type struct.

Expected behavior

The Unicode version of nvitop should display correctly, without garbled characters.

Additional context

I have tried every combination of setting LANG and LC_ALL to en_US.UTF-8 or C.UTF-8, and the garbled output reproduces consistently in all of them.
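
To confirm that the locale settings actually reach the Python process, a minimal sketch like the following can be run inside the container. The helper name locale_report is hypothetical and is not part of nvitop; it only reports what the standard library sees.

```python
import locale
import sys


def locale_report():
    """Report the locale-related encodings seen by this interpreter."""
    try:
        # Apply the locale from the environment (LANG, LC_ALL, ...).
        locale.setlocale(locale.LC_ALL, "")
        applied = True
    except locale.Error:
        # The requested locale is not installed in this environment.
        applied = False
    return {
        "locale_applied": applied,
        "preferred_encoding": locale.getpreferredencoding(False),
        "stdout_encoding": sys.stdout.encoding,
    }


print(locale_report())
```

If both encodings report UTF-8 and the interface is still garbled, the locale is probably not the cause and the problem lies elsewhere, for example in the curses build.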

@GhostArtyom GhostArtyom added the bug Something isn't working label Sep 29, 2023
@XuehaiPan XuehaiPan changed the title [BUG] Display issue when running in Docker envirnment [BUG] Display issue when running in Docker environment Sep 29, 2023
@XuehaiPan XuehaiPan added the cli / gui Something related to the CLI label Sep 29, 2023
@XuehaiPan
Owner

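Running nvitop inside Docker installed in WSL produces a garbled-looking interface.
I have tried every combination of setting LANG and LC_ALL to en_US.UTF-8 or C.UTF-8, and the garbled output reproduces consistently in all of them.

@GhostArtyom Could you provide details about the docker image so that I can reproduce the problem? Please also confirm whether the ncursesw library is installed, since it provides Unicode support for ncurses.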

@GhostArtyom
Author

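Running nvitop inside Docker installed in WSL produces a garbled-looking interface.
I have tried every combination of setting LANG and LC_ALL to en_US.UTF-8 or C.UTF-8, and the garbled output reproduces consistently in all of them.

@GhostArtyom Could you provide details about the docker image so that I can reproduce the problem? Please also confirm whether the ncursesw library is installed, since it provides Unicode support for ncurses.

I installed the MindSpore 2.1.1 + CUDA 11.6 image from https://www.mindspore.cn/install/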

docker pull swr.cn-south-1.myhuaweicloud.com/mindspore/mindspore-gpu-cuda11.6:2.1.1

image

Both the ncurses and ncursesw libraries are installed:

apt list | grep ncurses

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

fizmo-ncursesw/jammy 0.7.14-2 amd64
gambas3-gb-ncurses/jammy 3.16.3-3 amd64
lib32ncurses-dev/jammy-updates,jammy-security 6.3-2ubuntu0.1 amd64
lib32ncurses6/jammy-updates,jammy-security 6.3-2ubuntu0.1 amd64
lib32ncursesw6/jammy-updates,jammy-security 6.3-2ubuntu0.1 amd64
libcunit1-ncurses/jammy 2.1-3-dfsg-2.4 amd64
libcunit1-ncurses-dev/jammy 2.1-3-dfsg-2.4 amd64
libncurses-dev/jammy-updates,jammy-security,now 6.3-2ubuntu0.1 amd64 [installed,automatic]
libncurses-gst/jammy 3.2.5-1.3ubuntu1 all
libncurses5/jammy-updates,jammy-security 6.3-2ubuntu0.1 amd64 [upgradable from: 6.1-1ubuntu1.18.04.1]
libncurses5-dev/jammy-updates,jammy-security,now 6.3-2ubuntu0.1 amd64 [installed]
libncurses6/jammy-updates,jammy-security,now 6.3-2ubuntu0.1 amd64 [installed,automatic]
libncursesada-doc/jammy 6.2.20200212-4 all
libncursesada6.2.3/jammy 6.2.20200212-4 amd64
libncursesada9-dev/jammy 6.2.20200212-4 amd64
libncursesw5/jammy-updates,jammy-security 6.3-2ubuntu0.1 amd64 [upgradable from: 6.1-1ubuntu1.18.04.1]
libncursesw5-dev/jammy-updates,jammy-security,now 6.3-2ubuntu0.1 amd64 [installed]
libncursesw6/jammy-updates,jammy-security,now 6.3-2ubuntu0.1 amd64 [installed,automatic]
librust-ncurses-dev/jammy 5.99.0-3 amd64
ncurses-base/jammy-updates,jammy-security,now 6.3-2ubuntu0.1 all [installed]
ncurses-bin/jammy-updates,jammy-security 6.3-2ubuntu0.1 amd64 [upgradable from: 6.1-1ubuntu1.18.04]
ncurses-doc/jammy-updates,jammy-security 6.3-2ubuntu0.1 all
ncurses-examples/jammy-updates,jammy-security 6.3-2ubuntu0.1 amd64
ncurses-hexedit/jammy 0.9.7+orig-7.1 amd64
ncurses-term/jammy-updates,jammy-security 6.3-2ubuntu0.1 all
ruby-ncurses/jammy 1.4.9-1build7 amd64
wordgrinder-ncurses/jammy 0.8-1 amd64

@XuehaiPan
Owner

I installed the MindSpore 2.1.1 + CUDA 11.6 image from mindspore.cn/install

docker pull swr.cn-south-1.myhuaweicloud.com/mindspore/mindspore-gpu-cuda11.6:2.1.1

@GhostArtyom Thanks for the information. I can reproduce the problem by running pip3 install nvitop immediately after entering the docker container:

$ docker run --gpus=all --rm -it -h ubuntu swr.cn-south-1.myhuaweicloud.com/mindspore/mindspore-gpu-cuda11.6:2.1.1

==========
== CUDA ==
==========

CUDA Version 11.6.2

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

root@ubuntu:/# pip3 install nvitop
root@ubuntu:/# nvitop
image

I found the cause: the default Python interpreter on PATH inside the docker image was built without ncurses:

root@ubuntu:/# which -a python3
/usr/local/python-3.7.5/bin/python3
root@ubuntu:/# which -a python
/usr/local/bin/python
/usr/local/bin/python
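
One way to check a given interpreter for this is the sketch below. It rests on an assumption that should be verified for the interpreter in question: CPython only exposes curses.unget_wch when its curses module was compiled with wide-character (ncursesw) support. The helper name curses_unicode_support is hypothetical.

```python
import importlib
import sys


def curses_unicode_support():
    """Classify the curses build of the running interpreter.

    Returns "missing" if the curses module cannot be imported, "wide" if
    the wide-character helper is present, and "narrow" otherwise.
    """
    try:
        curses = importlib.import_module("curses")
    except ImportError:
        return "missing"
    # Assumption: unget_wch exists only in builds with wide-character support.
    return "wide" if hasattr(curses, "unget_wch") else "narrow"


print(sys.executable, curses_unicode_support())
```

Running the script once with /usr/local/python-3.7.5/bin/python3 and once with /usr/bin/python3 would show whether the two interpreters differ. A result other than "wide" would be consistent with the garbled Unicode interface.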

The solution is to install nvitop for the system Python interpreter instead:

apt update
apt install python3-dev python3-pip
/usr/bin/python3 -m pip install --upgrade pip setuptools
/usr/bin/python3 -m pip install nvitop
image

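Note: NVML inside docker calls the NVIDIA driver of the host system, so the PIDs it returns are PIDs on the host. This causes the No Such Process errors shown above inside docker. To display correct process information, add the --pid=host option to the docker run command when starting the container.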
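
The PID namespace mismatch can be illustrated with a small POSIX-only sketch. The helper pid_visible is hypothetical and not part of nvitop; it only asks the kernel whether a PID exists in the namespace of the current process.

```python
import os


def pid_visible(pid):
    """Return True if `pid` exists in the current PID namespace (POSIX only)."""
    try:
        # Signal 0 delivers nothing; the kernel only checks existence and permission.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


print(pid_visible(os.getpid()))
```

Inside a container started without --pid=host, passing a host PID reported by NVML to this function would typically return False, or match an unrelated container process that happens to share the number.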

@GhostArtyom
Author

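@XuehaiPan Thanks for the solution 👍 nvitop works great, and I have already recommended it to many people.

Also, the No Such Process error may be caused by WSL being unable to access the hardware 🤔 because I get the same No Such Process error when running nvitop directly in WSL.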

image

@XuehaiPan
Owner

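Also, the No Such Process error may be caused by WSL being unable to access the hardware 🤔 because I get the same No Such Process error when running nvitop directly in WSL.

That problem is caused upstream by WSL; see issue #49.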
