CPU support with CUDA build #114

sirmarcel · 2024-04-29T13:09:25Z

Hi! I've built sphericart with CUDA support (with pip install .[torch] in a clone of this repo). Now, whenever I run tests on a compute node without GPU, the CPU fallback appears to not be working, and the program crashes with

CUDA error at /home/langer/software/sphericart/sphericart-torch/sphericart/src/sphericart_cuda.cu:42 - no CUDA-capable device is detected

Apparently, sphericart is trying to go through the GPU code path despite all involved tensors being on the cpu device. This in in contrast with the docs, which state "Depending on the device the tensor is stored on, [...], the calculations will be performed [...] using the CPU or CUDA implementation."

It'd be nice to fix this, as this makes it rather annoying to debug when a GPU is not readily available.

Cheers!

The text was updated successfully, but these errors were encountered:

frostedoyster · 2024-04-29T14:27:12Z

I checked and I managed to reproduce the bug

nickjbrowning · 2024-04-29T17:19:37Z

I’ll try to get this fixed within the next few hours, will keep you updated.

nickjbrowning · 2024-04-29T21:17:19Z

@sirmarcel Could you check that this PR resolves this particular issue for you?

#118

sirmarcel · 2024-04-29T21:24:35Z

Yep, I will check tomorrow. Thanks for the fast turnaround!

sirmarcel · 2024-04-30T14:32:33Z

Okay, this is a weird one. If I directly import the calculator from sphericart, this fix works.

In my own class, which wraps it, I get strange behaviour. My tests fail with stuff like:

double free or corruption (out)
Aborted (core dumped)

Then I tried to write a minimal example:

import torch
from torch.nn import Module

from sphericart.torch import SphericalHarmonics as Sph

class SphericalHarmonics(Module):
 
    def __init__(self, max_angular):
        """Initialise SphericalHarmonics.

        Args:
            max_angular (int): The maximum spherical harmonic order ``l`` to compute.
        """
        super().__init__()

        self.max_angular = max_angular

        self.m_per_l = [2 * l + 1 for l in range(max_angular + 1)]

        self.sph = Sph(l_max=self.max_angular, normalized=True)


    def forward(self, R):
        """Compute spherical harmonics for input vectors.

        Args:
            R (Tensor): Input vectors of shape ``[pair, 3]``.

        Returns:
            Spherical harmonics of shape
                ``[pair, sum([2*l+1 for i in range(max_angular+1)])]``.

        """
        # R: [pair, 3 (x,y,z)]
        # todo: make special case for "mps" backend (need to move to `cpu`)
        return self.sph.compute(R)  # -> [pair, (l=0 m=0) (l=1 m=-1) (l=1 m=0) ...]

og = Sph(3)

sph = SphericalHarmonics(3)

a = torch.rand(10, 3)

print(f"og out: {og.compute(a)}")
print(f"wrapped out: {sph(a)}")

This, fascinatingly, prints the correct output but then fails:

og out: ... (correct-looking)
wrapped out: ... (correct-looking)
CUDA error at /home/langer/software/sphericart/sphericart-torch/sphericart/src/sphericart_cuda.cu:84 - no CUDA-capable device is detected

I'd guess something weird happens if you store a reference to the calculator inside some other python construct? No idea.

EDIT:

This also happens on GPU (i.e. moving a to cuda above). I get errors looking like:

free(): double free detected in tcache 2
Aborted (core dumped)

nickjbrowning · 2024-05-02T12:48:54Z

OK I'll take another look at this tomorrow and do a deeper dive on whats happening. Thanks for the detailed bug report!

sirmarcel · 2024-05-02T15:46:07Z

Thanks @nickjbrowning !

nickjbrowning · 2024-05-03T10:46:18Z

Hey @sirmarcel

I've just made another commit to #118 that should fix this issue. There was some constructor/destructor juggling that was happening previously since we instantiate the spherical harmonics (cuda) class with a default constructor, and the destructor was creating some oddities.

I've now changed this to be pointer-based which allows much cleaner memory management and should fix the double-free that was occurring.

I've ran your code sample and it works both for export CUDA_VISIBLE_DEVICES=0 and export CUDA_VISIBLE_DEVICES=.

sirmarcel · 2024-05-07T17:42:47Z

Hi @nickjbrowning , sorry for the delay -- I've got testing this on the todo list, but so far haven't had any luck getting an interactive node on our GPU cluster. I'll update you as soon as I get it built + tested. Thanks so much!

sirmarcel · 2024-05-08T16:55:46Z

Hi -- built it and ran my test suite. Looks good. No crashes!

frostedoyster assigned nickjbrowning Apr 29, 2024

nickjbrowning closed this as completed Jun 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CPU support with CUDA build #114

CPU support with CUDA build #114

sirmarcel commented Apr 29, 2024

frostedoyster commented Apr 29, 2024

nickjbrowning commented Apr 29, 2024

nickjbrowning commented Apr 29, 2024 •

edited

Loading

sirmarcel commented Apr 29, 2024

sirmarcel commented Apr 30, 2024 •

edited

Loading

nickjbrowning commented May 2, 2024

sirmarcel commented May 2, 2024

nickjbrowning commented May 3, 2024 •

edited

Loading

sirmarcel commented May 7, 2024

sirmarcel commented May 8, 2024

CPU support with CUDA build #114

CPU support with CUDA build #114

Comments

sirmarcel commented Apr 29, 2024

frostedoyster commented Apr 29, 2024

nickjbrowning commented Apr 29, 2024

nickjbrowning commented Apr 29, 2024 • edited Loading

sirmarcel commented Apr 29, 2024

sirmarcel commented Apr 30, 2024 • edited Loading

nickjbrowning commented May 2, 2024

sirmarcel commented May 2, 2024

nickjbrowning commented May 3, 2024 • edited Loading

sirmarcel commented May 7, 2024

sirmarcel commented May 8, 2024

nickjbrowning commented Apr 29, 2024 •

edited

Loading

sirmarcel commented Apr 30, 2024 •

edited

Loading

nickjbrowning commented May 3, 2024 •

edited

Loading