Question about cuda compatibility #18

pokerfaceSad · 2024-03-08T08:52:57Z

As CUDA and NVML have been upgraded, some functions have gained a variant with a "_v2" suffix, for example nvmlDeviceGetMemoryInfo and nvmlDeviceGetMemoryInfo_v2. Upper-level applications may prefer the v2 function and fall back to the v1 function when libcuda.so or libnvidia-ml.so does not export the v2 one, as in this code snippet https://github.com/XuehaiPan/nvitop/blob/470245dc3da0d9f4e3106b2c981d63d23440a5a5/nvitop/api/libnvml.py#L861-L879 .
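The caller-side fallback described above can be sketched roughly as follows. This is a minimal illustration and not nvitop's actual code: `resolve_versioned` is a hypothetical helper, and it only relies on the fact that looking up a missing symbol on a `ctypes.CDLL` object raises `AttributeError`.

```python
import ctypes


def resolve_versioned(lib, base_name, suffixes=("_v2", "")):
    """Return (name, function) for the first exported variant of base_name.

    Hypothetical helper: tries base_name + suffix in order, so the newest
    variant wins when the library exports it.
    """
    for suffix in suffixes:
        name = base_name + suffix
        try:
            # ctypes.CDLL raises AttributeError for a symbol it cannot find.
            return name, getattr(lib, name)
        except AttributeError:
            continue  # this library version does not export the variant
    raise LookupError(f"no variant of {base_name} found")


def load_memory_info(path="libnvidia-ml.so.1"):
    """Resolve nvmlDeviceGetMemoryInfo(_v2) from the real NVML library.

    The default path is an assumption about how the library is installed.
    """
    return resolve_versioned(ctypes.CDLL(path), "nvmlDeviceGetMemoryInfo")
```

A hook library sitting in front of the real one always exports the v2 name, so a caller using this pattern never reaches its v1 fallback. The decision about which real function to call therefore moves into the hook.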

However, this breaks down when we implement a hook library like nvshare. To stay compatible with newer versions, the hook library has to export the v2 function and forward the call to the v2 function in the real library. If the real library is an older version that lacks the v2 function, that forwarding fails, and the caller never falls back to v1 because the hook itself exported the v2 symbol.

For instance, the code at https://github.com/grgalex/nvshare/blob/main/src/hook.c#L598 returns CUDA_ERROR_NOT_INITIALIZED when the real libcuda.so has no cuGetProcAddress_v2 function, which might cause the user program to malfunction.


grgalex · 2024-03-20T12:09:45Z

@pokerfaceSad I see the problem.

Do you have any thoughts on how we can handle it?

pokerfaceSad · 2024-03-22T04:10:05Z

> @pokerfaceSad I see the problem.
>
> Do you have any thoughts on how we can handle it?

I haven't thought of a good way yet.

If we can't solve this problem inside the library, maybe we can work around it.

A simple but inelegant idea: detect the CUDA/NVML version when the container or CUDA process starts, then put a matching hook library on the library path.
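As a rough sketch of that workaround, a launcher could query the driver version and select a hook build from a table. Everything here is an assumption for illustration: the helper names, the hook paths, and the 12000 threshold are hypothetical and are not part of nvshare. The only real call is `cuDriverGetVersion`, which reports the version as 1000 * major + 10 * minor.

```python
import ctypes


def driver_version(libcuda_path="libcuda.so.1"):
    """Return the driver's CUDA version as an int, e.g. 12020 for 12.2."""
    libcuda = ctypes.CDLL(libcuda_path)
    version = ctypes.c_int(0)
    status = libcuda.cuDriverGetVersion(ctypes.byref(version))
    if status != 0:  # 0 is CUDA_SUCCESS
        raise RuntimeError(f"cuDriverGetVersion failed with status {status}")
    return version.value


def pick_hook_library(version, builds):
    """Pick the hook build with the highest minimum version <= version.

    builds maps a minimum driver version to a hook library path; both the
    mapping and the paths are hypothetical.
    """
    eligible = [minimum for minimum in builds if minimum <= version]
    if not eligible:
        raise LookupError(f"no hook build supports driver version {version}")
    return builds[max(eligible)]


# Hypothetical table: one build without the _v2 wrappers, one with them.
HOOK_BUILDS = {
    0: "/opt/hooks/libhook_legacy.so",
    12000: "/opt/hooks/libhook_v2.so",
}
```

The launcher would then export the result of `pick_hook_library(driver_version(), HOOK_BUILDS)`, for example through `LD_PRELOAD`, before starting the CUDA process. The cost of this approach is maintaining one hook build per supported version range.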
