-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CPU support with CUDA build #114
Comments
I checked and I managed to reproduce the bug |
I’ll try to get this fixed within the next few hours, will keep you updated. |
@sirmarcel Could you check that this PR resolves this particular issue for you? |
Yep, I will check tomorrow. Thanks for the fast turnaround! |
Okay, this is a weird one. If I directly import the calculator from In my own class, which wraps it, I get strange behaviour. My tests fail with stuff like:
Then I tried to write a minimal example: import torch
from torch.nn import Module
from sphericart.torch import SphericalHarmonics as Sph
class SphericalHarmonics(Module):
def __init__(self, max_angular):
"""Initialise SphericalHarmonics.
Args:
max_angular (int): The maximum spherical harmonic order ``l`` to compute.
"""
super().__init__()
self.max_angular = max_angular
self.m_per_l = [2 * l + 1 for l in range(max_angular + 1)]
self.sph = Sph(l_max=self.max_angular, normalized=True)
def forward(self, R):
"""Compute spherical harmonics for input vectors.
Args:
R (Tensor): Input vectors of shape ``[pair, 3]``.
Returns:
Spherical harmonics of shape
``[pair, sum([2*l+1 for i in range(max_angular+1)])]``.
"""
# R: [pair, 3 (x,y,z)]
# todo: make special case for "mps" backend (need to move to `cpu`)
return self.sph.compute(R) # -> [pair, (l=0 m=0) (l=1 m=-1) (l=1 m=0) ...]
og = Sph(3)
sph = SphericalHarmonics(3)
a = torch.rand(10, 3)
print(f"og out: {og.compute(a)}")
print(f"wrapped out: {sph(a)}") This, fascinatingly, prints the correct output but then fails:
I'd guess something weird happens if you store a reference to the calculator inside some other python construct? No idea. EDIT: This also happens on GPU (i.e. moving
|
OK I'll take another look at this tomorrow and do a deeper dive on whats happening. Thanks for the detailed bug report! |
Thanks @nickjbrowning ! |
Hey @sirmarcel I've just made another commit to #118 that should fix this issue. There was some constructor/destructor juggling that was happening previously since we instantiate the spherical harmonics (cuda) class with a default constructor, and the destructor was creating some oddities. I've now changed this to be pointer-based which allows much cleaner memory management and should fix the double-free that was occurring. I've ran your code sample and it works both for |
Hi @nickjbrowning , sorry for the delay -- I've got testing this on the todo list, but so far haven't had any luck getting an interactive node on our GPU cluster. I'll update you as soon as I get it built + tested. Thanks so much! |
Hi -- built it and ran my test suite. Looks good. No crashes! |
Hi! I've built
sphericart
with CUDA support (withpip install .[torch]
in a clone of this repo). Now, whenever I run tests on a compute node without GPU, the CPU fallback appears to not be working, and the program crashes withApparently,
sphericart
is trying to go through the GPU code path despite all involved tensors being on thecpu
device. This in in contrast with the docs, which state "Depending on the device the tensor is stored on, [...], the calculations will be performed [...] using the CPU or CUDA implementation."It'd be nice to fix this, as this makes it rather annoying to debug when a GPU is not readily available.
Cheers!
The text was updated successfully, but these errors were encountered: