Trying to get .ipynb to run #3

Open
Tylersuard opened this issue Dec 23, 2020 · 4 comments

@Tylersuard

Hello! Thank you for posting. As per your blog, I opened up a Paperspace instance, then cloned your repo and installed its requirements:

git clone https://github.com/cdpierse/script_buddy_v2.git
pip install -r requirements.txt

It all worked perfectly.

Then I opened script_generation.ipynb. I tried to run it, and I got this error:

OSError: Model name './storage/models/' was not found in tokenizers model name list (gpt2, gpt2-medium, gpt2-large, gpt2-xl, distilgpt2). We assumed './storage/models/' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.

@cdpierse (Owner)

Hey @Tylersuard, I know what the issue is here: the model and tokenizer were stored locally at this stage of the process. Thankfully, it's all up on the Transformers model hub now. If you make the change below, it should all work.

Change

tokenizer = GPT2Tokenizer.from_pretrained(output_dir)
model = GPT2LMHeadModel.from_pretrained(output_dir)

To:

tokenizer = GPT2Tokenizer.from_pretrained("cpierse/gpt2_film_scripts")
model = GPT2LMHeadModel.from_pretrained("cpierse/gpt2_film_scripts")
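For reference, a minimal end-to-end sketch of loading the hub-hosted model and sampling from it (the prompt text and generation settings below are illustrative, not taken from the thread):

from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("cpierse/gpt2_film_scripts")
model = GPT2LMHeadModel.from_pretrained("cpierse/gpt2_film_scripts")

# Encode an illustrative prompt and sample a short continuation.
inputs = tokenizer("INT. COFFEE SHOP - DAY", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))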

@jkurlandski01

Hi. I'm running script_generation.ipynb from Google Drive as a Colab notebook. I amended the tokenizer and model lines above as you advised, but when I get to the cell that creates the ScriptData object I get this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-9-64b0d907a510> in <module>()
----> 1 dataset = ScriptData(tokenizer= tokenizer, file_path= FILE_PATH )
      2 script_loader = DataLoader(dataset,batch_size=4,shuffle=True)

/content/language_modelling.py in __init__(self, tokenizer, file_path, block_size, overwrite_cache)
     39 
     40         block_size = block_size - (
---> 41             tokenizer.max_len - tokenizer.max_len_single_sentence
     42         )
     43 

AttributeError: 'GPT2Tokenizer' object has no attribute 'max_len'

Any help you can provide would be appreciated.

@cdpierse (Owner)

Hi @jkurlandski01, back when I created this notebook I was using transformers version 2.6.0. It appears that somewhere along the line, in version 3.0 and now 4.0, tokenizer.max_len was replaced by tokenizer.model_max_length; see https://huggingface.co/transformers/master/main_classes/tokenizer.html for a description of all the tokenizers' default attributes.

If you change that one line in language_modelling.py to the new attribute name, it should do the trick.
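For clarity, the amended lines in language_modelling.py would look something like this (a sketch using the newer attribute name; the surrounding code is as shown in the traceback above):

# Use model_max_length instead of the removed max_len
# attribute (transformers >= 3.0).
block_size = block_size - (
    tokenizer.model_max_length - tokenizer.max_len_single_sentence
)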

@jkurlandski01

Thanks for the quick reply!

Your fix worked. Then, in a later cell of the Colab notebook (for epoch in range(EPOCHS): ...), it crashed during the first epoch with this error:

Your session crashed after using all available RAM. If you are interested in access to high-RAM runtimes, you may want to check out Colab Pro.

I think I'm not running this .ipynb file as expected. Should it work in Google's Colab notebooks? I can import only the .ipynb file into Colaboratory, but I have to manually upload the script_buddy_v2 project's files. This doesn't seem right to me. When I tried to run it as a Jupyter notebook locally, it just hung on the import statements. What am I doing wrong?
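(For reference, one way to make the project's files available in a Colab runtime without uploading them manually is to clone the repo in a notebook cell; a sketch assuming a standard Colab environment, not something suggested in the thread:)

# Hypothetical Colab setup cell: clone the repo so its modules
# (e.g. language_modelling.py) are importable from the runtime.
!git clone https://github.com/cdpierse/script_buddy_v2.git
%cd script_buddy_v2
!pip install -r requirements.txt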

Again, thanks for your help.
