Trying to get .ipynb to run #3

Open
Tylersuard opened this issue Dec 23, 2020 · 4 comments

@Tylersuard

Hello! Thank you for posting. As per your blog, I opened up a Paperspace instance, then cloned your repo and installed its requirements:

git clone https://github.com/cdpierse/script_buddy_v2.git
pip install -r requirements.txt

It all worked perfectly.

Then I opened script_generation.ipynb. I tried to run it, and I got this error:

OSError: Model name './storage/models/' was not found in tokenizers model name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). We assumed './storage/models/' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.

@cdpierse (Owner)

Hey @Tylersuard, I know what the issue is here: the model and tokenizer were stored locally at this stage of the process. Thankfully, it's all up on the Transformers model hub now. If you make the change below, it should all work.

Change

tokenizer = GPT2Tokenizer.from_pretrained(output_dir)
model = GPT2LMHeadModel.from_pretrained(output_dir)

To:

tokenizer = GPT2Tokenizer.from_pretrained("cpierse/gpt2_film_scripts")
model = GPT2LMHeadModel.from_pretrained("cpierse/gpt2_film_scripts")
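For reference, a minimal end-to-end sketch of loading the hub-hosted model and sampling from it (the prompt text and generation settings below are illustrative, not taken from the thread):

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("cpierse/gpt2_film_scripts")
model = GPT2LMHeadModel.from_pretrained("cpierse/gpt2_film_scripts")

# Encode an illustrative prompt and sample a short continuation.
inputs = tokenizer("INT. COFFEE SHOP - DAY", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))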

@jkurlandski01

Hi. I'm running script_generation.ipynb from Google Drive as a Colab notebook. I amended the tokenizer and model lines above as you advised, but when I get to the cell that creates the ScriptData object I get this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-64b0d907a510> in <module>()
----> 1 dataset = ScriptData(tokenizer= tokenizer, file_path= FILE_PATH )
      2 script_loader = DataLoader(dataset,batch_size=4,shuffle=True)

/content/language_modelling.py in __init__(self, tokenizer, file_path, block_size, overwrite_cache)
     39 
     40         block_size = block_size - (
---> 41             tokenizer.max_len - tokenizer.max_len_single_sentence
     42         )
     43 

AttributeError: 'GPT2Tokenizer' object has no attribute 'max_len'

Any help you can provide would be appreciated.

@cdpierse (Owner)

Hi @jkurlandski01, back when I created this notebook I was using transformers version 2.6.0. It appears that somewhere along the line, in version 3.0 and now 4.0, tokenizer.max_len was replaced by tokenizer.model_max_length; see https://huggingface.co/transformers/master/main_classes/tokenizer.html for a description of all the tokenizers' default attributes.

If you change that one line in language_modelling.py to the new attribute name, it should do the trick.
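For clarity, the amended lines in language_modelling.py would look something like this (a sketch using the newer attribute name; the surrounding code is as shown in the traceback above):

# Use model_max_length instead of the removed max_len
# attribute (transformers >= 3.0).
block_size = block_size - (
    tokenizer.model_max_length - tokenizer.max_len_single_sentence
)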

@jkurlandski01

Thanks for the quick reply!

Your fix worked. Then, in a later cell of the Colab notebook (for epoch in range(EPOCHS): ...), it crashed during the first epoch with this error:

Your session crashed after using all available RAM. If you are interested in access to high-RAM runtimes, you may want to check out Colab Pro.

I think I'm not running this .ipynb file as expected. Should it work in Google's Colab notebooks? I can import only the .ipynb file into Colaboratory, but I have to manually upload the script_buddy_v2 project's files. This doesn't seem right to me. When I tried to run it as a Jupyter notebook locally, it just hung on the import statements. What am I doing wrong?
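(For reference, one way to make the project's files available in a Colab runtime without uploading them manually is to clone the repo in a notebook cell; a sketch assuming a standard Colab environment, not something suggested in the thread:)

# Hypothetical Colab setup cell: clone the repo so its modules
# (e.g. language_modelling.py) are importable from the runtime.
!git clone https://github.com/cdpierse/script_buddy_v2.git
%cd script_buddy_v2
!pip install -r requirements.txt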

Again, thanks for your help.
