
Adding multiple object annotations for multiple frames leads OutOfMemory error. #345

Closed
mertikinci opened this issue Oct 1, 2024 · 6 comments



mertikinci commented Oct 1, 2024

I would like to segment multiple objects within short (1-2 min) video clips. But sometimes other objects enter the scene and I want to segment them as well. I have tried adding all annotations every 30 frames and then segmenting, but it raises a CUDA out of memory error like the one below.

segment-anything-2/sam2/modeling/sam/transformer.py", line 355, in forward
    out = F.scaled_dot_product_attention(q, k, v, dropout_p=dropout_p)
torch.cuda.OutOfMemoryError: CUDA out of memory.

I have tried resetting the inference state, increasing the frame interval for updating the annotations, and increasing the CUDA memory allocation, but none of them works.

Is there any way to free some of the memory during the propagate_in_video step? And why does it raise a memory error? If I want to give annotations for multiple frames, can I somehow prevent this CUDA memory error from happening?


heyoeyo commented Oct 1, 2024

One simple change you can try is setting the offload_video_to_cpu option to True when calling the init_state(...) function. That should move the video frames out of VRAM, which might give more headroom for adding more tracked objects (though it will take more regular RAM to support this).
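
For reference, here's a minimal sketch of what that call might look like (the config/checkpoint paths and frames directory are placeholders, not from this thread):

```python
import torch
from sam2.build_sam import build_sam2_video_predictor

# Placeholder paths - swap in your own config/checkpoint
predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml", "checkpoints/sam2_hiera_large.pt"
)

with torch.inference_mode():
    inference_state = predictor.init_state(
        video_path="./video_frames",   # directory of JPEG frames (placeholder)
        offload_video_to_cpu=True,     # keep raw frames in RAM instead of VRAM
        offload_state_to_cpu=True,     # optionally offload cached state too (slower, saves more VRAM)
    )
```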

If that's still not enough, there are some slightly more involved changes to the code itself that can reduce memory usage further (see issue #288). You can also try running at a lower resolution (see #257), though this will affect masking accuracy.
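
Here's an untested sketch of the lower-resolution route: the model's input resolution comes from the image_size field of the model config (1024 by default), so one option is a hydra override when building the predictor. The override key is my assumption; see #257 for the actual details:

```python
from sam2.build_sam import build_sam2_video_predictor

# Assumed override key (++ force-adds it to the composed config);
# lower resolution means less VRAM but coarser masks.
predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml",
    "checkpoints/sam2_hiera_large.pt",
    hydra_overrides_extra=["++model.image_size=512"],
)
```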

mertikinci commented Oct 9, 2024

Thanks for the help! One more question though: when I decrease the number of annotated frames, GPU memory usage increases, and if I add a bounding box to every frame it uses slightly less memory. Do you know why that is?

@mertikinci mertikinci reopened this Oct 9, 2024

heyoeyo commented Oct 9, 2024

The way the masks are generated from prompts vs. having the model auto-track/segment is different, and different data gets stored. It may be that more data gets computed/stored when the model auto-segments frames (e.g. since the memory encoding/attention components need to run) than when you provide prompts. So that might be why less memory is used when providing more annotated frames, although the difference should be pretty small.


mertikinci commented Oct 10, 2024

OK, I see. But as far as I understand, the num_maskmem=7 variable in sam2_base.py controls how many past frames are attended to when predicting the current frame in the propagate_in_video method of sam2_video_predictor.py. Since only 7 frames are used at a time, why not free the memory allocated for frame 1 once I'm on frame 8?

Is there a specific reason for keeping the features of already-processed frames that I'm missing, or is it just not implemented yet because it isn't crucial?


heyoeyo commented Oct 10, 2024

> why not free the memory allocated for frame 1 once I'm on frame 8?

Ya the implementation is very memory hungry! Probably it's done that way to make it more responsive (i.e. so when jumping back-and-forth between frames, less stuff needs to be re-computed) and/or it's a leftover side-effect of training the model (which maybe requires keeping old frames for updating the weights).

It is possible to delete the cached data as it's running with some more code modifications (see issue #196), though this will cause problems if it's being used interactively (i.e. trying to re-run old frames may cause an error).
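
For anyone wanting to try that, here's a rough, untested sketch of the idea from #196. It's a hypothetical helper, not part of the SAM 2 API; the key names follow the inference_state layout in sam2_video_predictor.py at the time of writing (output_dict / output_dict_per_obj, each with non_cond_frame_outputs), so verify against your version before relying on it:

```python
import torch

def release_old_frames(inference_state, current_frame_idx, num_maskmem=7):
    """Drop cached memory-bank entries for frames outside the num_maskmem
    window. Hypothetical helper - not part of the SAM 2 API."""
    oldest_to_keep = current_frame_idx - num_maskmem
    non_cond = inference_state["output_dict"]["non_cond_frame_outputs"]
    for t in list(non_cond.keys()):
        if t < oldest_to_keep:
            non_cond.pop(t)
    # The per-object dicts hold references to the same cached tensors
    for obj_output_dict in inference_state["output_dict_per_obj"].values():
        per_obj = obj_output_dict["non_cond_frame_outputs"]
        for t in list(per_obj.keys()):
            if t < oldest_to_keep:
                per_obj.pop(t)
    torch.cuda.empty_cache()

# Called inside the propagation loop:
for frame_idx, obj_ids, video_res_masks in predictor.propagate_in_video(inference_state):
    release_old_frames(inference_state, frame_idx)
```

As noted above, once entries are dropped you can't interactively re-run those older frames without errors.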

mertikinci commented Oct 10, 2024

Just in case there are any other wonderers :) The memory usage during propagation stays constant, so I believe it isn't open to further dynamic modifications to reduce GPU memory usage; it's already about as efficient as it can be right now.

That said, max_cond_frames_in_attn in sam2_base.py helped me a lot in reducing memory usage. If you are annotating a video at multiple frames, you can decrease this variable to cap how many conditioning (annotated) frames are attended to; you trade a little temporal context for lower memory usage, which may be enough to avoid the OOM error. A sketch of how to set it is below.
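
A hedged sketch of one way to set it (the override key is my assumption; sam2_base.py takes max_cond_frames_in_attn as a constructor argument, with -1 meaning attend to all conditioning frames):

```python
from sam2.build_sam import build_sam2_video_predictor

# Assumed override key; caps how many annotated (conditioning) frames
# are attended to per step instead of all of them (-1 = no cap).
predictor = build_sam2_video_predictor(
    "sam2_hiera_l.yaml",
    "checkpoints/sam2_hiera_large.pt",
    hydra_overrides_extra=["++model.max_cond_frames_in_attn=4"],
)
```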

@mertikinci mertikinci changed the title Adding multiple object annotations for multiple frames. Adding multiple object annotations for multiple frames leads OOM error. Oct 10, 2024
@mertikinci mertikinci changed the title Adding multiple object annotations for multiple frames leads OOM error. Adding multiple object annotations for multiple frames leads OutOfMemory error. Oct 10, 2024