
About loss function #5

Open
wudongming97 opened this issue Jun 28, 2022 · 13 comments

@wudongming97

Hi, I found that the loss used in this repo is a cross-entropy loss between prediction and mask.

```python
loss = F.binary_cross_entropy_with_logits(pred, mask)
```

But the loss mentioned in the paper is a contrastive loss between visual and textual features.
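For what it's worth, the two can coincide: if each pixel's text-pixel dot product is treated as a logit, BCE-with-logits reduces to `-log σ(z_t·z_v)` for pixels inside the mask and `-log(1-σ(z_t·z_v))` outside, which matches the per-pixel contrastive terms as I read Eq. 9-10. A minimal stdlib check (the values are arbitrary):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_with_logits(logit, target):
    # naive single-value form of F.binary_cross_entropy_with_logits
    p = sigmoid(logit)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

# Eq. 9-style terms: a positive pixel (inside the mask, target = 1)
# and a negative pixel (outside the mask, target = 0)
pos = -math.log(sigmoid(2.0))        # -log sigma(z_t . z_v)
neg = -math.log(1 - sigmoid(-1.0))   # -log(1 - sigma(z_t . z_v))

assert abs(pos - bce_with_logits(2.0, 1.0)) < 1e-9
assert abs(neg - bce_with_logits(-1.0, 0.0)) < 1e-9
```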

@Deepayan137

I have the same query. Can the authors please clarify?

@Deepayan137

Deepayan137 commented Jul 9, 2022

Hello! I wrote the contrastive learning part by following the instructions in the paper. However, when training the model only with the contrastive loss, the training IOU doesn't seem to improve. Below, I am attaching the code snippet and the training IOU and precision curves. The training is done only for 1 epoch. The brown plots are for cross-entropy loss while the blue plots are for contrastive loss. I would be grateful if you could let me know what I am doing wrong and also if the contrastive loss is supposed to be used in addition to cross-entropy loss.
Thanks

```python
def forward(self, x, word, mask):
    x = self.vis(x)
    B, C, H, W = x.size()
    word = self.txt(word)
    x = x.permute(0, 2, 3, 1)  # B, H, W, C
    out = torch.einsum('nhwc,nc->nhw', x, word).unsqueeze(1)
    out = torch.sigmoid(out)  # sigmoid of zt dot zv
    loss = torch.zeros((H, W)).cuda()
    pos_count, neg_count = 0, 0
    for i in range(word.size(0)):
        zt = word[i].unsqueeze(0)        # 1, C
        for j in range(x.size(0)):
            # reshape(-1, C).t() keeps channels last, consistent
            # with the permute above, before transposing to C, H*W
            zv = x[j].reshape(-1, C).t()  # C, H*W
            prod = torch.mm(zt, zv).squeeze().reshape(H, W)
            if i == j:
                # positive pair: -log sigmoid(zt . zv)
                loss += -torch.log(torch.sigmoid(prod))
                pos_count += 1
            else:
                # negative pair: -log(1 - sigmoid(zt . zv))
                loss += -torch.log(1 - torch.sigmoid(prod))
                neg_count += 1
    total = pos_count + neg_count
    loss = torch.mean(loss)
    if out.shape[-2:] != mask.shape[-2:]:
        mask = F.interpolate(mask, out.shape[-2:],
                             mode='nearest').detach()
    return out, loss / total, mask
```

[W&B charts, 09/07/2022: training IoU and precision curves]
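As a side note, the double loop above can be collapsed into a single einsum; a vectorized sketch under the same assumptions (the function name is mine, not from the repo), using BCE-with-logits for numerical stability instead of explicit `log`/`sigmoid`:

```python
import torch
import torch.nn.functional as F

def all_pairs_contrastive(word, x):
    # word: (B, C) text embeddings; x: (B, C, H, W) pixel embeddings
    B, C, H, W = x.shape
    # all B*B text/pixel-map pairs at once
    logits = torch.einsum('ic,jchw->ijhw', word, x)            # B, B, H, W
    # i == j pairs are positives, everything else negative
    targets = torch.eye(B).view(B, B, 1, 1).expand(B, B, H, W)
    # reproduces -log sigmoid(.) for positives and -log(1 - sigmoid(.))
    # for negatives, but without overflow for large logits
    return F.binary_cross_entropy_with_logits(logits, targets)
```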

@DerrickWang005
Owner

Please follow our implementation:

```python
class Projector(nn.Module):
    def __init__(self, word_dim=1024, in_dim=256, kernel_size=3):
        super().__init__()
        self.in_dim = in_dim
        self.kernel_size = kernel_size
        # visual projector
        self.vis = nn.Sequential(  # os16 -> os4
            nn.Upsample(scale_factor=2, mode='bilinear'),
            conv_layer(in_dim * 2, in_dim * 2, 3, padding=1),
            nn.Upsample(scale_factor=2, mode='bilinear'),
            conv_layer(in_dim * 2, in_dim, 3, padding=1),
            nn.Conv2d(in_dim, in_dim, 1))
        # textual projector
        out_dim = 1 * in_dim * kernel_size * kernel_size + 1
        self.txt = nn.Linear(word_dim, out_dim)

    def forward(self, x, word):
        '''
        x: b, 512, 26, 26
        word: b, 512
        '''
        x = self.vis(x)
        B, C, H, W = x.size()
        # 1, b*256, 104, 104
        x = x.reshape(1, B * C, H, W)
        # txt: b, (256*3*3 + 1) -> b, 256, 3, 3 / b
        word = self.txt(word)
        weight, bias = word[:, :-1], word[:, -1]
        weight = weight.reshape(B, C, self.kernel_size, self.kernel_size)
        # Conv2d - 1, b*256, 104, 104 -> 1, b, 104, 104
        out = F.conv2d(x,
                       weight,
                       padding=self.kernel_size // 2,
                       groups=weight.size(0),
                       bias=bias)
        out = out.transpose(0, 1)
        # b, 1, 104, 104
        return out
```

@Deepayan137

Hello Derrick,

I had seen this implementation. In your paper, equations 9 and 10 describe the contrastive loss between pixel embeddings and text features. I don't understand how that is handled in the code snippet above.

@tiger990111

I have the same query. Can the authors please clarify?

@FabianRitter

No follow-up? The released code looks like plain supervised learning; I assume something is missing from it.

@Starboy-at-earth

@DerrickWang005 Could you please release the code snippet for the contrastive learning loss?

@clownrat6

The implementation is actually in line with the paper's description. However, it is not standard contrastive learning.

@Fake10086

You may take a closer look at the code posted by the author above: the `conv2d` effectively acts as a per-pixel product between the text and image features, which can be seen as equations 9 and 10.
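To illustrate that point, here is a minimal check (shapes and variable names are hypothetical; `kernel_size=1` for clarity) that a grouped `conv2d` with a text-derived kernel reduces to the dot product between the text vector and every pixel embedding, i.e. the `sim(z_t, z_v)` term:

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 8, 5, 5
x = torch.randn(B, C, H, W)   # pixel embeddings z_v
word = torch.randn(B, C)      # text embeddings z_t

# grouped conv, as in the Projector: one group per sample
out_conv = F.conv2d(x.reshape(1, B * C, H, W),
                    word.reshape(B, C, 1, 1),
                    groups=B).squeeze(0)          # B, H, W

# explicit dot product at every spatial location
out_dot = torch.einsum('bchw,bc->bhw', x, word)

assert torch.allclose(out_conv, out_dot, atol=1e-5)
```

With `kernel_size=3` the same conv additionally mixes a 3x3 neighborhood around each pixel, but the per-sample grouping is identical.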

@lyu-yx

lyu-yx commented Oct 19, 2023

I have the same question. Could the authors release the latest version of the code? @DerrickWang005

@DerrickWang005
Owner

I think this article can answer your question to some extent. @lyu-yx
https://arxiv.org/pdf/2303.15343.pdf
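For context, the linked paper (SigLIP) replaces the softmax-based InfoNCE objective with a pairwise sigmoid loss over all image-text pairs in the batch. A rough sketch of my reading of it (not the authors' code; `t` and `b` stand for the learnable temperature and bias):

```python
import torch
import torch.nn.functional as F

def sigmoid_pairwise_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    # img_emb, txt_emb: (B, D), assumed L2-normalized
    logits = img_emb @ txt_emb.t() * t + b         # B x B similarities
    labels = 2 * torch.eye(logits.size(0)) - 1     # +1 on diagonal, -1 off
    # independent binary decision per pair, no batch-wide softmax
    return -F.logsigmoid(labels * logits).mean()
```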

@ccccchenllll

I have the same question. I couldn't find the code about contrastive loss.

@Shaosifan

> I have the same question. I couldn't find the code about contrastive loss.

Me too...
