In the verbalizer, if one label (e.g. Fruit) corresponds to multiple label words (e.g. apple, pear, watermelon), and the prompted sentence is:
This is a [MASK] comment: it tasted good.
then the loss function in your current code will boost the probability of predicting all three words {apple, pear, watermelon} at the [MASK] position. For a PLM, different contexts result in different probabilities for these three words. For example, "red" is more likely to appear around "apple" than around "pear". If you boost the probability of generating "pear" around "red", it might corrupt the knowledge in the PLM to some extent, or, on the contrary, make prompt-tuning harder.
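To make the concern concrete, here is a minimal sketch in plain PyTorch (not the repo's actual code; the tensor shapes and names such as `mask_logits` are assumptions) contrasting the per-word objective described above with a loss that marginalizes over a class's label words, so the model only needs some label word to fit the context rather than all of them:

```python
import torch
import torch.nn.functional as F

# Assumed shapes (hypothetical, for illustration only):
#   mask_logits:    (batch_size, vocab_size) logits at the [MASK] position
#   label_word_ids: label_word_ids[c] = vocab ids of the label words of class c
#   labels:         (batch_size,) gold class indices

def per_word_loss(mask_logits, label_word_ids, labels):
    """Behaviour described above: every label word of the gold class is a
    target, so "apple", "pear" and "watermelon" are all pushed up equally."""
    log_probs = F.log_softmax(mask_logits, dim=-1)                 # (B, V)
    loss = 0.0
    for i, c in enumerate(labels.tolist()):
        word_ids = torch.tensor(label_word_ids[c])
        loss = loss - log_probs[i, word_ids].mean()
    return loss / len(labels)

def marginalized_loss(mask_logits, label_word_ids, labels):
    """Alternative: sum the probabilities of a class's label words, so the
    loss is satisfied when any one of them fits the context."""
    log_probs = F.log_softmax(mask_logits, dim=-1)                 # (B, V)
    class_scores = torch.stack(
        [torch.logsumexp(log_probs[:, torch.tensor(ids)], dim=-1)  # log sum_w P(w)
         for ids in label_word_ids],
        dim=-1,                                                    # (B, num_classes)
    )
    # Renormalize over classes and take the usual negative log-likelihood.
    return F.nll_loss(F.log_softmax(class_scores, dim=-1), labels)
```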
Maybe this paper could help with designing the loss: https://aclanthology.org/2022.acl-long.158.pdf