I'm using SAM2 to segment videos where there is typically only one person in the foreground. I often get flawed masks with unsegmented patches, like in the image below.
Does anyone with a deeper understanding of the algorithm and the implementation have any suggestions? I've tried using the `max_hole_size` parameter in `SAM2ImagePredictor`, but I run into the same issue as #243. Even if it worked, it doesn't seem like a great solution, since there are cases where a small hole could be part of the actual background (depending on the person's pose).
Thanks,
Here's a similar example that maybe gives a sense of how these issues appear:
The mask is mostly solid except when the prompt point lands on the cat's eyes or nose. More generally, from what I can tell, if the prompt lands on an area that would make sense as a smaller segmentation (e.g. when hovering over the eyes, it makes sense to segment them rather than the whole cat), then the larger mask can end up with these artifacts. For the person, the prompt might be landing on a logo on the shirt or shorts, for example (or on their eyes/nose/mouth, of course).
The artifacts themselves seem to be due to the windowing within the image encoder. So one solution is to play with the window sizing of the model (called `window_spec` in the configs). For example, using a window size of 1 for the 2nd/4th stages seems to get rid of the artifacts (at least for the large model), though maybe at the cost of other errors. Here's an example with heavy artifacting that is mostly cleaned up just by adjusting the window sizes:
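To try the window-size change without editing the YAML by hand, here is a minimal sketch that builds a Hydra-style override string for `window_spec`. The helper name `window_spec_override`, the config key `model.image_encoder.trunk.window_spec`, and the baseline values in the example are all assumptions for illustration; check them against the config file you actually load.

```python
def window_spec_override(base_spec, stage_sizes,
                         key="model.image_encoder.trunk.window_spec"):
    """Return (new_spec, override_string) with selected stages replaced.

    base_spec   -- per-stage window sizes as listed in your config
    stage_sizes -- {stage_number: new_window_size}, stages counted from 1
    key         -- ASSUMED config path; verify it in your own YAML
    """
    spec = list(base_spec)
    for stage, size in stage_sizes.items():
        if not 1 <= stage <= len(spec):
            raise ValueError(f"stage {stage} is outside 1..{len(spec)}")
        spec[stage - 1] = int(size)
    values = ",".join(str(v) for v in spec)
    # "++" is Hydra's "add or override" prefix
    return spec, f"++{key}=[{values}]"


# Placeholder baseline values; read the real ones from your config file.
base = [8, 4, 16, 8]
new_spec, override = window_spec_override(base, {2: 1, 4: 1})
print(new_spec)
print(override)
```

If your setup builds the model through Hydra, the resulting string could presumably be passed as an extra override when constructing the model (for example via the `hydra_overrides_extra` argument of `build_sam2`); otherwise, copying `new_spec` into the YAML has the same effect.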
A more brute-force solution would be to adjust the prompt slightly, to hopefully move it off the areas that cause these artifacts. If you need to automate the process, I think it might be possible to guess when a mask is patchy by checking the stability score, which should be lower when these artifacts appear.
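As a rough sketch of the stability-score check, the snippet below computes the score in plain NumPy as the IoU between the mask binarized at a slightly higher and a slightly lower threshold. It assumes you have the raw mask logits (not the already-thresholded mask); the helper names, the offset of 1.0, and the 0.95 cutoff are illustrative assumptions to tune on your own data, and the synthetic "patchy" array only mimics what low-confidence holes might look like.

```python
import numpy as np


def stability_score(mask_logits, mask_threshold=0.0, offset=1.0):
    """IoU between the mask cut at (threshold + offset) and (threshold - offset).

    The stricter mask is always contained in the looser one, so the IoU
    reduces to a ratio of pixel counts.
    """
    logits = np.asarray(mask_logits, dtype=np.float32)
    strict_area = np.count_nonzero(logits > mask_threshold + offset)
    loose_area = np.count_nonzero(logits > mask_threshold - offset)
    if loose_area == 0:
        return 0.0  # empty mask: treat as unstable
    return strict_area / loose_area


def looks_patchy(mask_logits, min_score=0.95):
    # min_score is a hypothetical cutoff, not a recommended value
    return stability_score(mask_logits) < min_score


# Synthetic example: a confident square, then the same square with a
# low-confidence patch inside it.
solid = np.full((64, 64), -10.0, dtype=np.float32)
solid[16:48, 16:48] = 10.0
patchy = solid.copy()
patchy[24:32, 24:32] = -0.5  # weakly negative: shows up as a hole

print(stability_score(solid), looks_patchy(solid))
print(stability_score(patchy), looks_patchy(patchy))
```

When a mask is flagged, one option is to nudge the prompt point by a few pixels and re-run the prediction, keeping whichever result scores highest.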