The following code is taken from model/modeling_bert.py. As I read it, the second sub-layer (i.e., the one right after the Attention sub-layer) performs the following computation:

BertIntermediate
1. fully-connected nn.Linear
2. activation function

BertOutput
1. fully-connected nn.Linear
2. dropout
3. residual connection + LayerNorm

That gives two nn.Linear layers, which does not match the model described in the paper. My understanding is that there should be only one nn.Linear.
class BertIntermediate(nn.Module):
    def __init__(self, config):
        super(BertIntermediate, self).__init__()
        self.dense = nn.Linear(config.hidden_size, config.intermediate_size)
        if isinstance(config.hidden_act, str) or (sys.version_info[0] == 2 and isinstance(config.hidden_act, unicode)):
            self.intermediate_act_fn = ACT2FN[config.hidden_act]
        else:
            self.intermediate_act_fn = config.hidden_act

    def forward(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.intermediate_act_fn(hidden_states)
        return hidden_states


class BertOutput(nn.Module):
    def __init__(self, config):
        super(BertOutput, self).__init__()
        self.dense = nn.Linear(config.intermediate_size, config.hidden_size)
        self.LayerNorm = BertLayerNorm(config.hidden_size, eps=config.layer_norm_eps)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)

    def forward(self, hidden_states, input_tensor):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class BertLayer(nn.Module):
    def __init__(self, config):
        super(BertLayer, self).__init__()
        self.attention = BertAttention(config)
        self.intermediate = BertIntermediate(config)
        self.output = BertOutput(config)

    def forward(self, hidden_states, attention_mask=None, head_mask=None):
        attention_outputs = self.attention(hidden_states, attention_mask, head_mask)
        attention_output = attention_outputs[0]
        intermediate_output = self.intermediate(attention_output)
        layer_output = self.output(intermediate_output, attention_output)
        outputs = (layer_output,) + attention_outputs[1:]  # add attentions if we output them
        return outputs
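For reference, here is a minimal, self-contained sketch of the computation described above, with BertIntermediate and BertOutput composed into one module. The class name, the default sizes, and the use of nn.GELU are illustrative assumptions for this sketch, not code taken from modeling_bert.py:

# Minimal sketch of the second sub-layer as composed in the excerpt above.
# Module name, default sizes, and nn.GELU are illustrative assumptions.
import torch
import torch.nn as nn


class FeedForwardSublayer(nn.Module):
    def __init__(self, hidden_size=768, intermediate_size=3072, dropout_prob=0.1):
        super().__init__()
        self.dense_in = nn.Linear(hidden_size, intermediate_size)   # first nn.Linear (BertIntermediate.dense)
        self.act = nn.GELU()                                        # activation between the two projections
        self.dense_out = nn.Linear(intermediate_size, hidden_size)  # second nn.Linear (BertOutput.dense)
        self.dropout = nn.Dropout(dropout_prob)
        self.layer_norm = nn.LayerNorm(hidden_size, eps=1e-12)

    def forward(self, attention_output):
        x = self.act(self.dense_in(attention_output))
        x = self.dropout(self.dense_out(x))
        # residual connection around the whole feed-forward block, then LayerNorm
        return self.layer_norm(x + attention_output)


# quick shape check
ffn = FeedForwardSublayer()
out = ffn(torch.randn(2, 16, 768))  # (batch, seq_len, hidden_size)
print(out.shape)                    # torch.Size([2, 16, 768])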