The following code is taken from model/modeling_bert.py. As I read it, the second sub-layer (i.e., the one right after the Attention sub-layer) performs the following computation:

BertIntermediate
1. fully-connected nn.Linear
2. activation function

BertOutput
1. fully-connected nn.Linear
2. dropout
3. residual connection + LayerNorm

That gives two nn.Linear layers, which does not match the model described in the paper. My understanding is that there should be only one nn.Linear.
class BertIntermediate(nn.Module):
    def __init__(self, config):
        super(BertIntermediate, self).__init__()
        self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
        if isinstance(config.hidden_act, str) or (sys.version_info[0] == 2 and isinstance(config.hidden_act, unicode)):
            self.intermediate_act_fn = ACT2FN[config.hidden_act]
        else:
            self.intermediate_act_fn = config.hidden_act

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.intermediate_act_fn(hidden_states)
        return hidden_states


class BertOutput(nn.Module):
    def __init__(self, config):
        super(BertOutput, self).__init__()
        self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
        self.LayerNorm = BertLayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class BertLayer(nn.Module):
    def __init__(self, config):
        super(BertLayer, self).__init__()
        self.attention = BertAttention(config)
        self.intermediate = BertIntermediate(config)
        self.output = BertOutput(config)

    def forward(self, hidden_states, attention_mask=None, head_mask=None):
        attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
        attention_output = attention_outputs[0]
        intermediate_output = self.intermediate(attention_output)
        layer_output = self.output(intermediate_output, attention_output)
        outputs = (layer_output,) + attention_outputs[1:]  # add attentions if we output them
        return outputs
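For reference, here is a minimal, self-contained sketch of the computation described above, with BertIntermediate and BertOutput composed into one module. The class name, the default sizes, and the use of nn.GELU are illustrative assumptions for this sketch, not code taken from modeling_bert.py:

# Minimal sketch of the second sub-layer as composed in the excerpt above.
# Module name, default sizes, and nn.GELU are illustrative assumptions.
import torch
import torch.nn as nn


class FeedForwardSublayer(nn.Module):
    def __init__(self, hidden_size=768, intermediate_size=3072, dropout_prob=0.1):
        super().__init__()
        self.dense_in = nn.Linear(hidden_size, intermediate_size)   # first nn.Linear (BertIntermediate.dense)
        self.act = nn.GELU()                                        # activation between the two projections
        self.dense_out = nn.Linear(intermediate_size, hidden_size)  # second nn.Linear (BertOutput.dense)
        self.dropout = nn.Dropout(dropout_prob)
        self.layer_norm = nn.LayerNorm(hidden_size, eps=1e-12)

    def forward(self, attention_output):
        x = self.act(self.dense_in(attention_output))
        x = self.dropout(self.dense_out(x))
        # residual connection around the whole feed-forward block, then LayerNorm
        return self.layer_norm(x + attention_output)


# quick shape check
ffn = FeedForwardSublayer()
out = ffn(torch.randn(2, 16, 768))  # (batch, seq_len, hidden_size)
print(out.shape)                    # torch.Size([2, 16, 768])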