The readme suggests using the GPT-NeoX-20B tokenizer. That tokenizer maps both its BoS and EoS tokens to token id 0. However, when I look at the model implementation in PaLM-rlhf-pytorch, it appears that token id 0 is also used as a padding/mask value.

When the model was pretrained, were any special tokens used? Was token id 0 used as a BoS token, a pad token, or not used at all?

I would like to experiment with this model for fine-tuning on document classification tasks because of its ability to accept very long sequences. In other LMs, special tokens like BoS have been helpful for certain document/sequence-level fine-tuning tasks. I appreciate your work on this project!
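For reference, here is a quick way to check how the GPT-NeoX-20B tokenizer maps its special tokens, assuming the Hugging Face `EleutherAI/gpt-neox-20b` checkpoint; the expected values in the comments are my assumption, not taken from this repo:

```python
from transformers import AutoTokenizer

# Load the GPT-NeoX-20B tokenizer from the Hugging Face Hub and print what
# its special tokens resolve to (values below are expectations, not verified
# against this repo's training setup).
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

print(tok.bos_token, tok.bos_token_id)  # expected: <|endoftext|> 0
print(tok.eos_token, tok.eos_token_id)  # expected: <|endoftext|> 0
print(tok.pad_token)                    # expected: None (no pad token configured)
```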
EOS was used during training. The EOS and PAD tokens being the same should not be an issue, since you normally just append EOS tokens to the end of an example when you want to pad it anyway.

I did not add a BOS token, although I could try this with something like the Llama tokenizer in another run.
The 2.1B model will hopefully be out tomorrow.
Best,
Enrico
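For anyone trying this, here is a minimal sketch of the pad-with-EOS approach described above, assuming the Hugging Face `EleutherAI/gpt-neox-20b` tokenizer; it is an illustration, not code from this repository:

```python
from transformers import AutoTokenizer

# Reuse EOS (id 0) as the pad token, so padded positions are just trailing
# EOS tokens, matching the "pad with EOS" approach described above.
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
tok.pad_token = tok.eos_token  # pad_token_id becomes 0, same as eos_token_id

batch = tok(
    ["a short example", "a somewhat longer example in the same batch"],
    padding="longest",
    return_tensors="pt",
)
print(batch["input_ids"])       # shorter sequence is right-padded with id 0
print(batch["attention_mask"])  # padded positions get attention_mask 0
```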