Manual Template question #248

Chuck1823 · 2023-03-10T07:24:40Z

Hi! I'm trying to use a generative model from Hugging Face, ealx/gpt-2-pubmed-medium. It has an input limit of 1024 tokens, and I can't get it to work with my dataset. I am using a ManualTemplate object like this:

    template_text = 'Question: {"placeholder":"text_a", "shortenable": "True"} Answer: {"mask"}'
    template = ManualTemplate(tokenizer=tokenizer, text=template_text)

My tokenizer wrapper is defined like this:

    wrapped_tokenizer = WrapperClass(
        max_seq_length=1024,
        decoder_max_length=1024,
        tokenizer=tokenizer,
        truncate_method="tail",
    )

and my PromptDataLoader like this:

    train_dataloader = PromptDataLoader(
        dataset=data_dict["train"],
        template=template,
        tokenizer=tokenizer,
        tokenizer_wrapper_class=WrapperClass,
        max_seq_length=1024,
        decoder_max_length=1024,
        batch_size=4,
        shuffle=True,
        teacher_forcing=True,
        predict_eos_token=predict_eos_token,
        truncate_method="tail",
    )

It took me a while to realize that the {"mask"} in the template does not get truncated. Is it possible to set "shortenable": "True" for the mask as well? If so, how? The examples in my dataset are long, and each one has a Question: <> and an Answer: <>. I want to mask the entire answer so that the model generates one, hopefully a good one. Thank you!
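To make the behaviour I'm describing concrete, here is a minimal pure-Python sketch of what I understand "tail" truncation to do: only segments marked shortenable lose tokens, and everything else (including the mask) is left alone. This is a simplified illustration, not OpenPrompt's actual implementation; the `truncate_tail` helper, the `(tokens, shortenable)` segment layout, and the `<mask>` placeholder string are all made up for the example.

```python
from typing import List, Tuple

# Hypothetical representation: each template part is (tokens, shortenable).
Segment = Tuple[List[str], bool]


def truncate_tail(segments: List[Segment], max_len: int) -> List[List[str]]:
    """Trim tokens from the end of shortenable segments until the total fits.

    Non-shortenable segments are never touched, so if they alone exceed
    max_len the result is still too long.
    """
    total = sum(len(tokens) for tokens, _ in segments)
    excess = max(0, total - max_len)
    result = [(list(tokens), shortenable) for tokens, shortenable in segments]

    # Walk from the last segment to the first, cutting only shortenable ones.
    for i in range(len(result) - 1, -1, -1):
        if excess == 0:
            break
        tokens, shortenable = result[i]
        if not shortenable:
            continue
        cut = min(excess, len(tokens))
        result[i] = (tokens[: len(tokens) - cut], shortenable)
        excess -= cut

    return [tokens for tokens, _ in result]


# Toy version of my template: only the question text is shortenable.
segments = [
    (["Question:"], False),
    ([f"q{i}" for i in range(10)], True),
    (["Answer:"], False),
    (["<mask>"], False),
]

trimmed = truncate_tail(segments, max_len=8)
flat = [token for segment in trimmed for token in segment]
print(flat)
```

In this toy run the question is cut down while the mask stays in place, which matches what I see: whatever ends up in the mask position never counts as shortenable.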
