[DOC] How can I use logger for CAGRA search to find number of hops? #428

Open
Lion815 opened this issue Oct 26, 2024 · 1 comment
Labels
doc (Improvements or additions to documentation), question (Further information is requested)

Comments


Lion815 commented Oct 26, 2024


Report needed documentation

I am working with the pylibraft and cuVS packages. Specifically, I want to do some research on the CAGRA algorithm, and I need some intermediate information, such as how many comparisons are performed during a CAGRA search. So I wonder whether the pylibraft and cuVS packages have an embedded logger that I could take advantage of. Furthermore, I found that CAGRA has an update_dataset method in pylibraft but not in cuVS. Is there an equivalent method in cuVS?

Lion815 added the doc (Improvements or additions to documentation) label Oct 26, 2024
cjnolet changed the title from "[DOC]" to "[DOC] How can I use logger for CAGRA search to find number of hops?" Oct 28, 2024
cjnolet added the question (Further information is requested) label Oct 28, 2024
Member

cjnolet commented Oct 28, 2024

Hi @Lion815, thanks for opening an issue here. We use spdlog, but we compile for a specific log level (and above), so enabling anything below that level would require recompiling the codebase. I believe we compile for DEBUG and above at the moment, and that level should be possible to enable from the command line with an environment variable. However, I don't believe we currently log the number of hops taken while searching the CAGRA graph. This is also challenging because of the parallelism involved: the information is only available inside CUDA kernels, which would likely need to coordinate to compute the result and copy it back to host memory before it could be printed, so logging it would likely slow down search. This is something you could likely instrument specifically for your own case, of course, but it is not something that exists today.

update_dataset() is exposed in the C++ APIs, mostly for internal use. Please create a feature request and we can expose it in the Python API as well, if that would be helpful.
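
As a rough illustration of the custom instrumentation suggested above, here is a minimal CPU-only sketch in Python. It is not CAGRA or cuVS code and uses no pylibraft or cuVS APIs: it runs a simple greedy search over a brute-force kNN graph and counts hops (expanded nodes) and distance computations. All names (`build_knn_graph`, `instrumented_search`, `itopk`, `max_hops`) are hypothetical and chosen only for this example.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(points, k):
    """Brute-force kNN graph; a toy stand-in for a real CAGRA graph."""
    graph = []
    for i, p in enumerate(points):
        dists = sorted((sq_dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph.append([j for _, j in dists[:k]])
    return graph


def instrumented_search(points, graph, query, entry, itopk=8, max_hops=100):
    """Greedy graph search that also reports hop and distance counts."""
    stats = {"hops": 0, "distance_computations": 0}

    def dist(i):
        # Every distance evaluation goes through here, so it is counted once.
        stats["distance_computations"] += 1
        return sq_dist(points[i], query)

    visited = {entry}
    candidates = [(dist(entry), entry)]  # min-heap of nodes not yet expanded
    best = list(candidates)              # closest nodes found so far

    while candidates and stats["hops"] < max_hops:
        d, node = heapq.heappop(candidates)
        # Stop once the closest unexpanded node cannot improve the result list.
        if len(best) >= itopk and d > max(best)[0]:
            break
        stats["hops"] += 1  # one hop = expanding one node's neighbor list
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = dist(nb)
            heapq.heappush(candidates, (dn, nb))
            best.append((dn, nb))
        best = sorted(best)[:itopk]

    return best, stats


random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
graph = build_knn_graph(points, k=8)
results, stats = instrumented_search(points, graph, query=(0.5, 0.5), entry=0)
print(stats)
```

On a GPU, equivalent counters would presumably have to be accumulated inside the search kernel (for example, per thread block) and copied back to the host after the search, which is the coordination overhead described above.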
