PRM "True" probability #3

rawsh · 2024-09-20T04:26:39Z

Curious why both of the PRMs take the softmax probability of the "True" token? For example, in the Mistral PRM:

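import torch
import torch.nn as nn


class Mistral_PRM(nn.Module):
    def __init__(self, base):
        super().__init__()
        self.base_model = base  # causal LM whose output has .logits of shape (batch, seq_len, vocab)

    def forward(self, input_ids, attention_mask):
        logits = self.base_model(input_ids=input_ids, attention_mask=attention_mask).logits
        probs = torch.softmax(logits, dim=-1)  # distribution over the vocabulary at every position
        # Probability of token 'True' (id 7081) at the last position; result has shape (batch,).
        # Assumes left padding (or no padding), so position -1 is the last real token.
        output = probs[:, -1, 7081]
        return output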
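The indexing `probs[:, -1, 7081]` does three things at once. Below is a minimal pure-Python sketch (no torch) of what it computes. It assumes logits shaped `[batch][seq_len][vocab]`. The toy 4-token vocabulary, `TRUE_ID = 2`, and the helper names are hypothetical stand-ins for the real tokenizer id 7081.

```python
import math
from typing import List

# Hypothetical id standing in for 'True' (7081 in the snippet above); toy 4-token vocabulary.
TRUE_ID = 2


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def true_prob(batch_logits: List[List[List[float]]], token_id: int = TRUE_ID) -> List[float]:
    """Mirror of probs[:, -1, token_id]: for each sequence, softmax the logits at the
    last position and return the probability of token_id (one value per sequence)."""
    return [softmax(seq[-1])[token_id] for seq in batch_logits]


batch = [
    [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 3.0, 0.0]],  # sequence 1: 'True' favored at the last step
    [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],  # sequence 2: uniform at the last step
]
rewards = true_prob(batch)
print([round(r, 4) for r in rewards])  # prints [0.8098, 0.25]
```

Only the last position's logits matter, and the result is a single probability per sequence. That is why the `-1` index relies on the padding side.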


zhoubiansining · 2024-12-25T05:32:51Z

For this PRM implementation, we use the softmax probability of a single token to represent a reward in the range [0, 1]. The choice of that token is not very important, and 'True' is simply a natural choice. Also, this PRM implementation was only used for comparison during our research; it is not a necessary component of our approach.
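Written out, the reward described above is just one entry of the softmax over the vocabulary at the final position $T$. That is why it always lies in $[0, 1]$. Any other fixed token would give a value in the same range; only the numerator changes. The notation is introduced for this sketch: $z_{T,v}$ is the logit for vocabulary item $v$ at position $T$, $V$ is the vocabulary, and $t_{\text{True}}$ is the id of the chosen token.

```math
r = p_\theta\left(t_{\text{True}} \mid x_{\le T}\right)
  = \frac{\exp\left(z_{T,\,t_{\text{True}}}\right)}{\sum_{v \in V} \exp\left(z_{T,v}\right)}
  \in (0, 1) \subset [0, 1]
```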

zhangdan0602 added the about PRM label Dec 25, 2024
