Running on Apple M1/M2/M3 chips #105

Open
alisonpeard opened this issue Oct 1, 2024 · 0 comments

Hi,
I'm trying to run the PyTorch training implementation on an Apple M2 chip with MPS. I can run StyleGAN2-ADA image generation following these steps, but when I try to train DiffAugment I get this error:

/AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:3237: failed assertion `destination datatype must be fp32'

My steps so far:

  1. Clone the repo and cd into data-efficient-gans/DiffAugment-stylegan2-pytorch
  2. conda create -n DiffAug python=3.9
  3. conda activate DiffAug
  4. conda install pytorch torchvision torchaudio -c pytorch
  5. pip install click requests tqdm pyspng ninja imageio-ffmpeg==0.4.3
  6. pip install Pillow psutil scipy
  7. Following the advice here,
    • I replace all instances of torch.device('cuda') with torch.device('mps')
    • I replace random array generation with random_array = np.random.RandomState(seed).randn(1, G.z_dim).astype(np.float32) in generate.py as described.
    • In training_loop.py I replace instances of torch.cuda.Event(enable_timing=True) with time.perf_counter()
    • I remove torch.backends.cuda.matmul.allow_tf32 = allow_tf32, torch.backends.cudnn.allow_tf32 = allow_tf32, torch.cuda.reset_peak_memory_stats(), and all_gen_c = torch.from_numpy(np.stack(all_gen_c)).pin_memory().to(device) as they have no MPS equivalent.
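For reference, the timer replacement in step 7 could look roughly like the sketch below. `WallClockEvent` is a hypothetical name, not something from the repo or from PyTorch; it only imitates the `record()` / `elapsed_time()` calls of `torch.cuda.Event` so that existing call sites keep working. Because MPS work is queued asynchronously, a host-side timestamp is only approximate unless the device is synchronized first (recent PyTorch versions provide `torch.mps.synchronize()` for that).

```python
import time


class WallClockEvent:
    """Hypothetical stand-in for torch.cuda.Event(enable_timing=True)."""

    def __init__(self):
        self._t = None  # perf_counter() reading in seconds, set by record()

    def record(self, stream=None):
        # `stream` is accepted and ignored so CUDA-style call sites still work.
        self._t = time.perf_counter()

    def synchronize(self):
        # Nothing to wait for: the timestamp is taken on the host.
        pass

    def elapsed_time(self, end_event):
        # torch.cuda.Event.elapsed_time reports milliseconds, so match that unit.
        if self._t is None or end_event._t is None:
            raise RuntimeError("both events must be recorded first")
        return (end_event._t - self._t) * 1000.0


# Usage: time a short sleep the same way a training phase would be timed.
start, end = WallClockEvent(), WallClockEvent()
start.record()
time.sleep(0.01)
end.record()
elapsed_ms = start.elapsed_time(end)
```

Keeping the millisecond unit means any statistics code that consumed the CUDA event timings does not need to change.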

At this point I can generate images from the pretrained model, e.g.,

python generate.py --outdir=out --trunc=1 --seeds=85,265,297,849 \
    --network=https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metfaces.pkl

but training aborts with the message below:

python train.py --outdir=../training-runs --data=../datasets/100-shot-obama.zip --gpus=1 --kimg 1
# /AppleInternal/Library/BuildRoots/20d6c351-ee94-11ec-bcaf-7247572f23b4/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayConvolutionA14.mm:3237: failed assertion `destination datatype must be fp32'

Using pdb I can trace the error from

.../DiffAugment-stylegan2-pytorch/training/loss.py(80)accumulate_gradients()
-> loss_Gmain.mean().mul(gain).backward()

to

torch/autograd/graph.py(769)_engine_run_backward()
-> return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

but I can't work out what goes wrong beyond that point. I have checked that all tensors in the training loop are float32.
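A dtype check of this kind can be sketched as below. `non_fp32_entries` is a hypothetical helper, not part of the repo. It compares `str(t.dtype)` against `"torch.float32"`, which is how PyTorch prints that dtype, so the stand-in objects used here let the sketch run without PyTorch installed.

```python
from types import SimpleNamespace


def non_fp32_entries(named_tensors, expected="torch.float32"):
    """Return (name, dtype string) for every entry whose dtype is not `expected`."""
    return [(name, str(t.dtype)) for name, t in named_tensors
            if str(t.dtype) != expected]


# Stand-in objects (assumption: real use would pass module.named_parameters()
# or module.named_buffers(), which yield the same (name, tensor) pairs).
fake_params = [
    ("conv.weight", SimpleNamespace(dtype="torch.float32")),
    ("conv.bias", SimpleNamespace(dtype="torch.float16")),
]
offenders = non_fp32_entries(fake_params)
```

A check like this only covers stored parameters and buffers. Activations that a layer casts to another dtype inside its own forward pass would not show up here and would need to be inspected separately, for example with forward hooks.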

Any suggestions would be appreciated! I don't have access to NVIDIA GPUs at the moment and the Colab also seems to be outdated.
