
Out of memory in evaluation after train epochs #18

Closed
caide199212 opened this issue Feb 22, 2023 · 3 comments


@caide199212

When reproducing the results on the REDS dataset, I reduced num_input_frames to 20 to fit into GPU memory, so training runs fine: about 8 GB is used on each 16 GB GPU.
But during evaluation, the GPU memory consumed by the training iterations does not seem to be released, so evaluation runs out of memory:
RuntimeError: CUDA out of memory. Tried to allocate 436.00 MiB (GPU 1; 15.90 GiB total capacity; 10.98 GiB already allocated; 189.81 MiB free; 14.78 GiB reserved in total by PyTorch)

The data config is as follows:

data = dict(
    workers_per_gpu=4,
    train_dataloader=dict(samples_per_gpu=1, drop_last=True),
    test_dataloader=dict(samples_per_gpu=1, workers_per_gpu=1),
)

BTW, a similar problem was reported in mmdet, but no feasible solution was found.
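
For context, a generic mitigation for this kind of OOM (not from this thread, and not guaranteed to help here) is to release PyTorch's cached allocator blocks and disable autograd before the evaluation loop starts. A minimal sketch, where model and data_loader stand in for the actual runner objects:

import gc
import torch

def run_evaluation(model, data_loader):
    # Drop leftover Python references from training and return unused
    # cached blocks to the driver before allocating evaluation tensors.
    gc.collect()
    torch.cuda.empty_cache()

    model.eval()
    results = []
    # Disable autograd so no activation buffers are retained for backward.
    with torch.no_grad():
        for data in data_loader:
            results.append(model(data))
    return results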

@ericzw
Member

ericzw commented Feb 23, 2023

I don't know how to solve this problem.
Maybe you can skip the evaluation during the training process.
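
One way to follow this suggestion in an mmcv 1.x style config (a sketch; the interval value and total_iters number are illustrative, not taken from this issue) is to push the evaluation interval past the total number of training iterations, so the evaluation hook never fires, and run the test script separately after training:

total_iters = 300000
# An interval larger than total_iters means the evaluation hook never triggers.
evaluation = dict(interval=total_iters + 1, save_image=False)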

@caide199212
Author

Good idea, thanks.

@caide199212 closed this as not planned on Feb 27, 2023
@sunyclj

sunyclj commented Jun 2, 2023

> I don't know how to solve this problem.
> Maybe you can skip the evaluation during the training process.

When setting evaluation = dict(interval=1000000, save_image=False, gpu_collect=True), the same error still occurs.
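
If evaluation does have to run during training, one knob that may be worth trying (an assumption, not verified in this thread; the interval value is illustrative) is gpu_collect=False, which makes distributed result collection go through the CPU and a temporary directory instead of GPU memory:

# Collect results across ranks on CPU rather than GPU to reduce GPU memory pressure.
evaluation = dict(interval=5000, save_image=False, gpu_collect=False)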
