Hi! I'm trying to use a generative model from Hugging Face: ealx/gpt-2-pubmed-medium. It has an input limit of 1024 tokens, and I can't get it to work with my dataset. I am using a ManualTemplate object as follows:
template_text = (
    'Question: {"placeholder":"text_a", "shortenable": "True"} Answer: {"mask"}'
)
template = ManualTemplate(tokenizer=tokenizer, text=template_text)
My tokenizer wrapper is defined as follows:
wrapped_tokenizer = WrapperClass(
    max_seq_length=1024,
    decoder_max_length=1024,
    tokenizer=tokenizer,
    truncate_method="tail"
)
and my PromptDataLoader like this:
)
It took me a while to realize that the {"mask"} inside the template does not get truncated. Is it possible to set "shortenable": "True" for the mask as well? If so, how? The examples in my dataset are long, and each has a Question: <> and an Answer: <>. I want to mask the entire answer so that the model generates one, which will hopefully be a good one. Thank you!
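To illustrate the behavior I think I'm seeing, here is a simplified sketch in plain Python. It is not OpenPrompt's actual code: `truncate_tail`, the `(tokens, shortenable)` pairs, and the `<mask>` token are hypothetical names I made up for the example. The idea is that only parts marked shortenable lose tokens from the tail, while the mask is left alone.

```python
def truncate_tail(parts, max_len):
    """Trim tokens from the end of shortenable parts until the total fits.

    parts: list of (tokens, shortenable) pairs. This structure is a
    hypothetical stand-in for the pieces of a wrapped template.
    """
    total = sum(len(tokens) for tokens, _ in parts)
    overflow = max(0, total - max_len)
    result = []
    # Walk from the last part backwards, trimming only shortenable parts.
    for tokens, shortenable in reversed(parts):
        if shortenable and overflow > 0:
            cut = min(overflow, len(tokens))
            tokens = tokens[:len(tokens) - cut]
            overflow -= cut
        result.append((tokens, shortenable))
    result.reverse()
    return result


# Toy example: a 13-token prompt that must fit into 8 tokens.
parts = [
    (["Question:"], False),
    (["q"] * 10, True),    # the only shortenable part (text_a)
    (["Answer:"], False),
    (["<mask>"], False),   # never truncated in this sketch
]
fitted = truncate_tail(parts, max_len=8)
```

In this toy case the question text absorbs all of the truncation and the mask survives untouched, which matches what I observe with my template.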