-
Notifications
You must be signed in to change notification settings - Fork 176
Move pathfinder
to cuda-python top level
#723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…HOUT trying to make anything work.
… trying to make anything work.
…make anything work.
…ns.abc.Sequence (Python 3.9+)
…init__.py (to make the dynamic version work).
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
path_finder
to cuda-python top levelpathfinder
to cuda-python top level
def load_nvidia_dynamic_lib(libname: str) -> LoadedDL: | ||
"""Load a NVIDIA dynamic library by name. | ||
Args: | ||
libname: The name of the library to load (e.g. "cudart", "nvvm", etc.) | ||
Returns: | ||
A LoadedDL object containing the library handle and path | ||
Raises: | ||
RuntimeError: If the library cannot be found or loaded | ||
""" | ||
return _load_nvidia_dynamic_lib.load_lib(libname) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could define this wrapper function in the _dynamic_libs
module and just import it here. This way your __init__.py
here wouldn't have a ton of function definitions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is meant to be the public API in one view.
This way your
__init__.py
here wouldn't have a ton of function definitions.
I understood exactly that was your goal: a flat list of available APIs.
Note that I moved the docstring here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is meant to be the public API in one view.
Indeed, everything we define (or import) here that is not prefixed with _
constitutes the public API of the module cuda.path_finder
. My above suggestion is just that we import load_nvidia_dynamic_lib
rather than define it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make the public API far less obvious. E.g. to see the docstring, they'd need to open a private file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's true. They could still do:
from cuda.path_finder import load_nvidia_dynamic_lib
help(load_nvidia_dynamic_lib)
Same as they would do now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'll only work interactively.
What I have now can be inspected in obvious ways directly in the sources, e.g. when looking at the sources on github. I can send URLs pointing to specific APIs in this init.py file.
Each function here will just be:
def function(...) -> ...:
"""docstring""
return call_into_private_code(...)
That's exactly the public API, with a one-line call that's easy to ignore.
What's the point of hiding that away, especially hiding away the docstring and the type hints?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'll only work interactively.
The docstrings and type hints would also get picked up:
- Sphinx (which we use to generate API docs
-pydoc
- the developers' IDEs/lsp
Those are primarily the ways consumers of a package interact with docstrings or type hints, rather than looking directly at the source.
What's the point of hiding that away, especially hiding away the docstring and the type hints?
It tightens the scope of __init__.py
, whose job is:
- to include any initialization code for the module
- to import stuff from submodules that the module wants to expose
- to define
__all__
for the module if needed
As examples, we can look at the __init__.py
from some other popular libraries:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It tightens the scope of init.py, whose job is
The most important job: Show the public API
You didn't answer why you want to hide that away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm advocating for keeping function and type definitions outside of __init__.py
. I don't think that "hides" anything from the user as we still import them here.
The main reason I'm advocating for that is because __init__.py
typically only defines functions and types needed for module initialization, and imports anything else. I think the examples I linked to above are a good demonstration of that.
Defining functions and types beyond that serves to clutter __init__.py
.
I would argue it makes the code base less navigable than more for people looking at the source. Do I expect the function load_nvidia_dynamic_libs
to be defined in a file called _dynamic_libs.py
, or a file called __init__.py
?
If this all seems nitpicky, I apologize. While I do have a strong opinion here, I don't mind at all if another reviewer (or you, as the author of this PR) made the final call about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cuda.core's __init__.py
is a good example for what Ashwin suggested. As a developer who occasionally peeks into someone else's __init__.py
, I certainly get confused why we are defining things in there directly, as opposed to properly organizing them in respective public/private modules, and just import them. What's imported are considered public APIs (including types). Does it make sense?
I'm a bit lost, to be honest, and I’d appreciate some clarification so I can move this forward in the direction you want. The goals that seemed most important to me originally (from a Slack post last Wednesday) were:
After Ashwin’s comment on Slack, I interpreted the request as: prioritize a flat public API, even if that means giving up on some of that isolation — especially when type hints are involved. Then Leo wrote:
That seems more aligned with what I had before the last commit (e.g., a file like I’m happy to implement what you both want here — I just need a bit more clarity. If we go back to what I proposed last week: from cuda.pathfinder import nvidia_dynamic_libs, nvidia_static_libs, nvidia_headers
nvidia_dynamic_libs.load("nvrtc")
nvidia_static_libs.find("cudart")
nvidia_headers.find_file("cuda.h") The file structure was:
I thought that structure balanced modularity and clarity, and users could still import just what they needed. But it sounds like we might instead prefer: from cuda.pathfinder import load_nvidia_dynamic_lib, find_nvidia_static_lib, find_nvidia_header_file That keeps the API flat, but does give up on the isolation goal unless we resort to function-local imports or other workarounds. Could you please help me understand: What would be the preferred file/module structure to go along with the flat API? Should I keep separate modules like |
At this point we need to follow up offline. I think there is a gap that we unfortunately did not really cover during the weekly meeting. I suggest we don't cancel the dev meeting tomorrow and use the time to finalize it. This PR also needs at least @kkraus14's approval for the change of license. |
/ok to test |
/ok to test |
pip install cuda_python*.whl | ||
pip install cuda_pathfinder*.whl cuda_python*.whl | ||
else | ||
pip install $(ls cuda_python*.whl)[all] | ||
pip install $(ls cuda_pathfinder*.whl) $(ls cuda_python*.whl)[all] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to the cuda-python step should be reverted now that you install cuda-pathfinder the first.
pip install (Get-ChildItem -Filter cuda_python*.whl).FullName | ||
$pathfinder = (Get-ChildItem -Filter cuda_pathfinder*.whl).FullName | ||
$umbrella = (Get-ChildItem -Filter cuda_python*.whl).FullName | ||
pip install $pathfinder $umbrella | ||
} else { | ||
pip install "$((Get-ChildItem -Filter cuda_python*.whl).FullName)[all]" | ||
$pathfinder = (Get-ChildItem -Filter cuda_pathfinder*.whl).FullName | ||
$umbrella = (Get-ChildItem -Filter cuda_python*.whl).FullName | ||
pip install "$pathfinder" "$umbrella[all]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to the cuda-python step should be reverted now that you install cuda-pathfinder the first.
… wheels are installed already.
/ok to test |
…ironment variable.
…t_work|see_what_works (/ci/tools/run-tests).
/ok to test |
@leofang Does this ring any bells?
|
Assuming the last commit (bf48142) works as hoped, the Windows |
This error happens usually because the exe or dll is not compatible with the platform, which is indeed the case based on the error log
We're trying to load the 32-bit cudart ( Another bug is: Both wheel and local CTK pipelines hit the same error message, meaning we also installed wheels to the local CTK pipelines. There is at lease one bug in the CI update that caused this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug No.1: we should remove 32-bit cudart DLLs from line 253-255.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: commit 135a37d
# Run only after cuda.bindings tests, so that all wheels are installed already. | ||
- name: Run cuda.pathfinder tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Bug" No.2: cuda.pathfinder tests probably should be run after cuda.bindings but before cuda.core, because cuda.core needs the cudart wheel as a test dependency (it's used by CuPy, not by cuda.core).
In the long run, we should really move cuda.pathfinder to its own test workflow. This kind of execution-order-dependent issues is not obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I made the change locally, but then backed it out again: After removing the 32-bit DLLs (135a37d), I'd expect the tests to work, and it tests a little bit more.
/ok to test |
|
…quires 64-bit Python. Currently running: 32-bit Python
…hfinder/pyproject.toml
…gain with all_must_work
/ok to test |
…HFINDER_TEST_LOAD_NVIDIA_DYNAMIC_LIB_STRICTNESS is defined (to resolve `unbound variable` error).
/ok to test |
/ok to test |
…dows (to see if that resolves the cusolver failures)
/ok to test |
Description
Closes #719
Note: The name is changed from
path_finder
topathfinder
. The word "pathfinder" is very common, and it avoids underscore/dash confusion between directory names and package names.Locally tested under Linux and Windows.
TODO:
.github/workflows
updates (cuda_pathfinder
needs to be pip-installed first)cuda_bindings
pyproject.toml
→ unconditionally depend oncuda_pathfinder
toolshed/
directory