An implementation of WebSAM-Adapter from Ren et al.
The adapter consists of three parts: Edge-Component (EC) Tuning, Patch-Embedding (PE) Tuning, and k adapter units. EC Tuning and PE Tuning are learned linear layers applied to the Sobel-filtered input image and to the encoded patches, respectively. The i-th adapter unit passes these results through an MLP and feeds the output to the i-th block of SAM's image encoder.
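
To illustrate the Sobel filtering that precedes Edge-Component Tuning, here is a minimal pure-Python sketch. It is not the code used in this implementation: the function name `sobel_magnitude`, the grayscale list-of-rows input, and the replicated-border padding are all assumptions made for the example.

```python
import math

# Standard 3x3 Sobel kernels for the horizontal (x) and vertical (y) gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Return the Sobel gradient magnitude of a 2D grayscale image.

    `image` is a non-empty list of equal-length rows of numbers.
    Borders are handled by replicating the nearest edge pixel, which is
    an assumption of this sketch; other padding schemes are equally valid.
    """
    height, width = len(image), len(image[0])

    def pixel(r, c):
        # Clamp coordinates so out-of-range lookups reuse the border pixel.
        r = min(max(r, 0), height - 1)
        c = min(max(c, 0), width - 1)
        return image[r][c]

    result = []
    for r in range(height):
        row = []
        for c in range(width):
            gx = gy = 0.0
            # Cross-correlate the 3x3 neighbourhood with both kernels.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    value = pixel(r + dr, c + dc)
                    gx += SOBEL_X[dr + 1][dc + 1] * value
                    gy += SOBEL_Y[dr + 1][dc + 1] * value
            row.append(math.hypot(gx, gy))
        result.append(row)
    return result
```

The sketch responds only where intensity changes: a constant image yields all zeros, while a vertical step edge yields non-zero values in the columns adjacent to the step. For web pages, that corresponds to the boundaries of boxes, images, and text blocks, which is the signal the EC linear layer receives.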
We train the model on Webis-WebSeg-20, a dataset of 8,490 web page images with 42,450 ground-truth segmentations (five per page on average).