[Bug]: Stop breaking backwards compatibility or at least warn #1386
Comments
Hello @danielzgtg, Thank you for flagging the need for clearer error messages with ROCm and library version mismatches. Your feedback is vital in refining our library's usability. Our team will investigate and refine the error notifications to offer guidance for resolving library version disparities. Additionally, we'll clarify any backward compatibility restrictions to assist users in navigating version conflicts more effectively. We'll keep you updated on our progress as we work to enhance the error messages. Your patience and any additional insights during this process are immensely valuable. Wasiq
@danielzgtg , |
That explains it. I spent the last week troubleshooting why ROCm suddenly stopped working, and it turns out to be a backwards compatibility issue. Quite frustrating.
@danielzgtg and @Trat8547, having said that, in general, when the major version changes (we follow semantic versioning), API breakage is expected. Upon reviewing the release notes, we see breaking changes in HIP, and the appropriate notification is published here. Those changes could have contributed to the issue reported here.
Here: TensorLibrary.txt. I think the
Your linked https://rocm.docs.amd.com/en/latest/about/release-notes.html#hip appears to list only API-breaking changes. What my issue is about is ABI-breaking changes. The problem is that the pytorch ROCm is bundling
This is why rebuilding pytorch was a workaround for this problem. But I would rather not wait for the long pytorch compile every time, and I also don't want the prepackaged pytorch builds to contain the
Describe the bug
rocBLAS 5.6 fails with a confusing error message when mixed with ROCm 6.0 libraries or TensileLibrary.
To Reproduce
Precise version of rocBLAS installed or rocBLAS commit hash if building from source.
Steps to reproduce the behavior:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.6
Expected behavior
I should not have to spend an hour debugging this, and only find the problem using gdb. rocBLAS 5.6 should either succeed or give a clear error message when loading the TensileLibrary from rocBLAS 6.0 or when loaded while mixed in with ROCm shared libraries.
Log-files
Environment
environment.txt
Workaround
Recompile pytorch manually. This will ensure that it loads shared libraries from /opt instead of venv.
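To check whether a process is mixing ROCm shared libraries from more than one location (for example, a venv and /opt), one option on Linux is to inspect /proc/self/maps after the framework has been imported. The following is only a rough sketch under stated assumptions: the helper names `rocm_libs_by_dir` and `loaded_rocm_libs` are hypothetical, and the list of library name prefixes in the regular expression is an illustrative guess, not an official list of ROCm components.

```python
import re
from collections import defaultdict

# Assumption: these library name prefixes are illustrative only; adjust as needed.
# Paths containing whitespace are not handled by this simple pattern.
_LIB_RE = re.compile(
    r"(/\S*/lib(?:rocblas|hipblas|amdhip64|hsa-runtime64)[^/\s]*\.so[^/\s]*)"
)


def rocm_libs_by_dir(maps_text):
    """Group ROCm-looking shared objects in /proc/<pid>/maps text by directory."""
    found = defaultdict(set)
    for line in maps_text.splitlines():
        match = _LIB_RE.search(line)
        if match:
            directory, _, name = match.group(1).rpartition("/")
            found[directory].add(name)
    return {d: sorted(names) for d, names in found.items()}


def loaded_rocm_libs():
    """Inspect the current process (Linux only, requires /proc)."""
    with open("/proc/self/maps") as f:
        return rocm_libs_by_dir(f.read())
```

If `loaded_rocm_libs()` returns more than one directory after importing the framework, libraries from different installations are loaded side by side, which is the kind of mix described in this issue. It does not prove an ABI mismatch by itself; it only shows where each library came from.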