Zihan-Liu-00/GroundingDINO-SAM2

Text-prompted Video Segmentation via GroundingDINO and SAM2

How to run

Start the program with python main.py.

Before that, set up the environment by following the instructions in the GroundingDINO and SAM2 repositories, and download the checkpoints for SAM2 and GroundingDINO separately. Place the SAM2 checkpoint at /checkpoints/sam2.1_hiera_large.pt and the GroundingDINO checkpoint at /weights/groundingdino_swinb_cogcoor.pth.
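
As a quick sanity check before launching, the following minimal sketch reports which of the two checkpoint files are missing. It assumes the paths above are relative to the project root; the helper name missing_checkpoints is hypothetical and not part of this repository.

```python
from pathlib import Path

# Locations taken from the instructions above. Treating them as relative
# to the project root is an assumption of this sketch.
EXPECTED_CHECKPOINTS = (
    "checkpoints/sam2.1_hiera_large.pt",
    "weights/groundingdino_swinb_cogcoor.pth",
)


def missing_checkpoints(project_root="."):
    """Return the expected checkpoint paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing checkpoint files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All checkpoints found.")
```

Running this from the project root before python main.py gives an early, readable error instead of a failure during model loading.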

Beyond that, the only extra dependency is Gradio for the UI, which you can install with pip install gradio. Downloading bert-base-uncased from Hugging Face into the root directory of this project is also recommended.

Tips

Avoid uploading long videos, because inference takes a long time: a 100-frame video takes about 8 minutes. This tool is mainly intended for semantic labeling jobs.
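
To get a rough feel for how long a clip might take, the sketch below extrapolates from the single reference point quoted above (100 frames in about 8 minutes). Linear scaling with frame count is an assumption, not a measured property of this project, and the function name estimate_minutes is hypothetical.

```python
# Reference point quoted in the tip above: 100 frames take about 8 minutes.
REFERENCE_FRAMES = 100
REFERENCE_MINUTES = 8.0


def estimate_minutes(num_frames):
    """Rough estimate, ASSUMING inference time grows linearly with frame count."""
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    return num_frames * REFERENCE_MINUTES / REFERENCE_FRAMES


# Example: a 30-second clip at 25 frames per second has 750 frames.
clip_frames = 30 * 25
print(f"{clip_frames} frames: about {estimate_minutes(clip_frames):.0f} minutes")
```

Actual timings depend on hardware, resolution, and the number of keywords, so treat the result as an order-of-magnitude guide only.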

Use keyword,keyword,... as your text prompt rather than a long sentence.
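
The keyword,keyword,... format can be illustrated with a small parsing sketch. This is not the project's own parser; split_prompt is a hypothetical helper that shows one reasonable way to turn such a prompt into a clean keyword list.

```python
def split_prompt(prompt):
    """Split a 'keyword,keyword,...' prompt into a list of clean keywords.

    Whitespace around each keyword is stripped, empty entries are dropped,
    and duplicates are removed while keeping the original order.
    """
    keywords = []
    for part in prompt.split(","):
        keyword = part.strip()
        if keyword and keyword not in keywords:
            keywords.append(keyword)
    return keywords


print(split_prompt("dog, cat,,dog ,person"))
```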

What's special

The official Grounded-SAM-2 project is a simpler implementation of the same pipeline. This project optimizes it in the following aspects:

  • GroundingDINO is run with a single keyword per inference, which significantly reduces missed detections
  • Grounded objects are searched for in all frames
  • Instances detected by GroundingDINO are searched for across the entire video
  • Better mask post-processing
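
The first two points can be sketched as a simple loop: one detector call per keyword, repeated on every frame. The code below is only an illustration of that control flow, not this project's implementation. The detect callable is a hypothetical stand-in for a GroundingDINO wrapper, and toy_detect is a fake detector used so the sketch runs without any model.

```python
def detect_all_frames(frames, keywords, detect):
    """Call detect(frame, keyword) once per keyword on every frame.

    `detect` is a hypothetical callable standing in for a GroundingDINO
    wrapper. It must return a list of (box, score) tuples for one keyword.
    """
    results = []
    for frame_index, frame in enumerate(frames):
        for keyword in keywords:
            for box, score in detect(frame, keyword):
                results.append(
                    {"frame": frame_index, "label": keyword, "box": box, "score": score}
                )
    return results


def toy_detect(frame, keyword):
    """Fake detector: 'frames' are strings, and a keyword is 'detected'
    whenever it appears in the string."""
    return [((0, 0, 10, 10), 1.0)] if keyword in frame else []


detections = detect_all_frames(["a dog", "a dog and a cat"], ["dog", "cat"], toy_detect)
for d in detections:
    print(d["frame"], d["label"])
```

Querying one keyword at a time costs one detector call per keyword per frame, which is consistent with the long inference times mentioned in the tips.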

Contact

Experienced developers are welcome to collaborate with me on this project. If you are interested, please send an email to [email protected].
