Skip to content
Merged
Changes from 2 commits
Commits
Show all changes
18 commits
Select commit Hold shift + click to select a range
9ad6bcd
_decide_nvjitlink_or_driver(): catch RuntimeError (bug fix), use impo…
rwgk Sep 30, 2025
9390c85
Fix misunderstanding: RuntimeError is raised only from inner_nvjitlin…
rwgk Oct 1, 2025
2ccdc21
Merge branch 'main' into linker_decide_nvjitlink_or_driver_fix
rwgk Oct 1, 2025
6bf421e
Better way of formatting warning messages.
rwgk Oct 1, 2025
9220d58
Change from importlib.import_module() to plain import (the latter doe…
rwgk Oct 1, 2025
c9a95c0
Merge branch 'main' into linker_decide_nvjitlink_or_driver_fix
rwgk Oct 2, 2025
7e9fce4
Enhance to warning messages, to make them actionable.
rwgk Oct 2, 2025
4724769
Factor out _nvjitlink_has_version_symbol() for clarity and testability
rwgk Oct 2, 2025
6630d8e
Add test_linker_warnings.py
rwgk Oct 2, 2025
f1c29f6
Fix "the the" oversight
rwgk Oct 2, 2025
0948942
Replace "culink APIs" → "driver APIs" in warning message.
rwgk Oct 2, 2025
00e6421
Merge branch 'main' into linker_decide_nvjitlink_or_driver_fix
rwgk Oct 3, 2025
b237669
Fix oversight: test_linker_warnings.py needs to be updated after comm…
rwgk Oct 3, 2025
6787ddf
Merge branch 'main' into linker_decide_nvjitlink_or_driver_fix
rwgk Oct 3, 2025
4dc08e7
fix skipping the check for nvidia-smi (#1084)
leofang Oct 4, 2025
9721041
Merge branch 'main' into linker_decide_nvjitlink_or_driver_fix
leofang Oct 6, 2025
c1a767f
Merge branch 'main' into linker_decide_nvjitlink_or_driver_fix
rwgk Oct 7, 2025
5ba11be
rm cuda_core/tests/test_linker_warnings.py: see https://github.com/NV…
rwgk Oct 7, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 28 additions & 19 deletions cuda_core/cuda/core/experimental/_linker.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,8 @@
from __future__ import annotations

import ctypes
import importlib
import sys
import weakref
from contextlib import contextmanager
from dataclasses import dataclass
Expand Down Expand Up @@ -37,28 +39,35 @@ def _decide_nvjitlink_or_driver() -> bool:

_driver_ver = handle_return(driver.cuDriverGetVersion())
_driver_ver = (_driver_ver // 1000, (_driver_ver % 1000) // 10)

def libname():
return "nvJitLink*.dll" if sys.platform == "win32" else "libnvJitLink.so*"

therefore_not_usable = ". Therefore cuda.bindings.nvjitlink is not usable and"

try:
from cuda.bindings import nvjitlink as _nvjitlink
_nvjitlink = importlib.import_module("cuda.bindings.nvjitlink")
except ModuleNotFoundError:
problem = "cuda.bindings.nvjitlink is not available, therefore"
else:
from cuda.bindings._internal import nvjitlink as inner_nvjitlink
except ImportError:
# binding is not available

try:
if inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion"):
return False # Use nvjitlink
except RuntimeError:
problem = libname() + " is not available" + therefore_not_usable
else:
problem = libname() + " is is too old (<12.3)" + therefore_not_usable
_nvjitlink = None
else:
if inner_nvjitlink._inspect_function_pointer("__nvJitLinkVersion") == 0:
# binding is available, but nvJitLink is not installed
_nvjitlink = None

if _nvjitlink is None:
warn(
"nvJitLink is not installed or too old (<12.3). Therefore it is not usable "
"and the culink APIs will be used instead.",
stacklevel=3,
category=RuntimeWarning,
)
_driver = driver
return True
else:
return False

warn(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always be asking ourselves: "Is this warning actionable by our users?"

If the answer is "no" or "not really", then there's no reason to spam people with "informative" warnings.

This one looks like it is not actionable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is actionable (and more so once we figure out how to address #851) because users can update cuda-bindings and/or nvJitLink.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming it is actionable, the warning needs to contain the action that a user can take instead of just outputting the current state of their installation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be much easier if we don't actually want to produce a warning. I was just following the established path.

We could make the errors actionable in this way (for the three situations):

  • cuda.bindings.nvjitlink is not available, therefore the culink APIs will be used instead. For best results, please upgrade to a more recent cuda-bindings version.

  • libnvJitLink.so* is not available. Therefore cuda.bindings.nvjitlink is not usable and the culink APIs will be used instead. For best results, please install a recent version of nvJitLink.

  • libnvJitLink.so* is too old (<12.3). Therefore cuda.bindings.nvjitlink is not usable and the culink APIs will be used instead. For best results, please install a recent version of nvJitLink.

@leofang which way should I go?

A: Simply remove the warnings (and explain in source code comments instead)?

B: Append the "For best results" sentences?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now thinking based on this discussion that as a user of this I would 100% not care about this warning if my software is working. Why would I want to be warned to upgrade every time I run my program? We should avoid pestering our users with warnings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way I find useful to frame the utility of warnings is: if I were in a production on call setting and I saw this warning, would I ignore it, or would I do something?

In that setting, there's tons of information that you have to use to act (usually quickly) and warnings that don't indicate "something bad is about to happen" are an enormous nuisance.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

I want to defer to @leofang for this, because he has significantly more relevant background.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest difference is that the driver APIs do not support minor version compatibility or link LTO IRs, unlike nvJitLink which should be the majority of the use cases. It is useful to warn as a hint for debugging potential errors that are raised in later operations due to driver limitations, because the driver error msg is not super clear.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I worked "do not support minor version compatibility or linking LTO IRs" into the warning messages: commit 7e9fce4

I added two more commits, the first tiny one (4724769) to make testing easier, then the new tests, almost completely LLM generated (6630d8e).

problem + " the culink APIs will be used instead.",
stacklevel=2,
category=RuntimeWarning,
)
_driver = driver
return True


def _lazy_init():
Expand Down
Loading