Out of CUDA memory when training #14
Comments
I am also trying to train a model at 1920x1080. When I change the batch size and resolution, I see this error:
Try a lower batch size. Set it to 1 and see what happens. Would it also be possible to run training at half or a quarter of your resolution and then upscale the result? Transformers are notorious for scaling quadratically with input size, so an HD input with temporal attention is unlikely to fit in 24 GB of vRAM (in my opinion; I could be wrong).
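To make the quadratic-scaling point concrete, here is a back-of-envelope estimate of the self-attention score matrix at HD versus half resolution. The patch sizes and frame count below are hypothetical round numbers for illustration, not this repo's actual settings:

```python
# Self-attention builds a T x T score matrix, so memory grows
# quadratically in the token count T. Patch size (8x9) and frame
# count (5) are hypothetical, chosen to divide 1920x1080 evenly.

def attention_tokens(width, height, patch_w, patch_h, frames):
    """Token count when each frame is split into patches."""
    return (width // patch_w) * (height // patch_h) * frames

def score_matrix_gib(tokens, bytes_per_elem=4):
    """Memory for one T x T fp32 attention score matrix, in GiB."""
    return tokens * tokens * bytes_per_elem / 2**30

hd = attention_tokens(1920, 1080, 8, 9, frames=5)  # 144000 tokens, ~77 GiB
sd = attention_tokens(960, 540, 8, 9, frames=5)    # 36000 tokens, ~4.8 GiB

# Halving both spatial dimensions cuts tokens 4x and score memory 16x.
print(hd, sd, score_matrix_gib(hd) / score_matrix_gib(sd))
```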
I got this running at HD resolution (1920x1080) on an Nvidia A10G (24 GB vRAM). This is for inference; I haven't tried training a new model yet. Here are the patch sizes I used (in model/sttn.py), and here are the hyperparameters from test.py: The results aren't great, though. I think this is possibly because limiting the number of neighbour and reference frames inhibits the model's ability to infer inpainted regions. Changing the patch sizes away from those used in training probably isn't helping either.
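One thing to watch when picking new patch sizes: they should tile the frame exactly, or the patch-based attention will drop or misalign border pixels. A small checker (a hypothetical helper, not part of this repo):

```python
def valid_patch_sizes(width, height, candidates):
    """Return the (patch_w, patch_h) pairs that tile the frame exactly.
    Hypothetical helper for sanity-checking, not part of model/sttn.py."""
    return [(pw, ph) for pw, ph in candidates
            if width % pw == 0 and height % ph == 0]

# For 1920x1080, (7, 7) does not divide evenly, the others do.
print(valid_patch_sizes(1920, 1080, [(7, 7), (8, 8), (8, 9), (16, 12)]))
```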
This is on a single RTX 3090. Is there a parameter I can adjust to make this work?
Thanks
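If batch size 1 fits in memory but hurts training quality, gradient accumulation is one generic workaround (not specific to this repo): average gradients over several single-sample forward passes before updating, which emulates a larger batch at batch-size-1 memory cost. A toy sketch with a one-parameter quadratic model, purely to show the equivalence:

```python
# Toy model: minimize the mean of (w - x_i)^2 over a batch of targets.
# Illustrative only; names and the training loop are hypothetical.

def grad(w, x):
    """d/dw of (w - x)^2."""
    return 2 * (w - x)

def step_full_batch(w, batch, lr=0.1):
    """One SGD step using the whole batch at once (memory-hungry)."""
    g = sum(grad(w, x) for x in batch) / len(batch)
    return w - lr * g

def step_accumulated(w, batch, lr=0.1):
    """Same update, accumulating one sample at a time (batch-size-1 memory)."""
    acc = 0.0
    for x in batch:
        acc += grad(w, x) / len(batch)
    return w - lr * acc

batch = [1.0, 2.0, 3.0, 4.0]
# Both paths produce the same parameter update.
print(step_full_batch(0.0, batch), step_accumulated(0.0, batch))
```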