
Training: CUDA: Out of Memory Optimizations #4

Open

Description

@raks097

Hi,
A wonderful paper, and thanks for providing the implementation so that the results can be reproduced.

I have tried training the privileged agent using the script mentioned in the README:
python train_birdview.py --dataset_dir=../data/sample --log_dir=../logs/sample

I get a RuntimeError: Tried to allocate 144.00 MiB (GPU 0; 10.73 GiB total capacity; 9.77 GiB already allocated; 74.62 MiB free; 69.10 MiB cached), followed by a ConnectionResetError.

I tried tracing the error with nvidia-smi and found that memory usage quickly builds up (reaching the maximum) before training begins.
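For reference, here is a minimal sketch (assuming the training code is standard PyTorch, as in this repo) of how one could log CUDA memory from inside the script to pinpoint where usage spikes. The helper name and the suggested call sites are hypothetical and not part of train_birdview.py.

import torch

def report_cuda_memory(tag, device=0):
    # Log current and peak allocated CUDA memory (in MiB) for the given device.
    allocated = torch.cuda.memory_allocated(device) / 2**20
    peak = torch.cuda.max_memory_allocated(device) / 2**20
    print('[{}] allocated: {:.1f} MiB, peak: {:.1f} MiB'.format(tag, allocated, peak))

# Hypothetical call sites, e.g. after moving the model to the GPU and after the first batch:
# report_cuda_memory('after model.to(device)')
# report_cuda_memory('after first forward/backward pass')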

Any leads and suggestions are much appreciated.
Thanks

Attaching the full stack trace for further reference
stack
