
an error occurs while training tdl #2

Open · cschy opened this issue Nov 7, 2022 · 9 comments

cschy commented Nov 7, 2022

```
(eqg) D:\Project\Educational-Question-Generation\tdl>python train.py
Traceback (most recent call last):
  File "train.py", line 15, in <module>
    from transformers import BertTokenizerFast as BertTokenizer, BertModel, AdamW, get_linear_schedule_with_warmup
  File "D:\Anaconda3\envs\eqg\lib\site-packages\transformers\__init__.py", line 21, in <module>
    from .configuration_albert import ALBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, AlbertConfig
  File "D:\Anaconda3\envs\eqg\lib\site-packages\transformers\configuration_albert.py", line 18, in <module>
    from .configuration_utils import PretrainedConfig
  File "D:\Anaconda3\envs\eqg\lib\site-packages\transformers\configuration_utils.py", line 24, in <module>
    from .file_utils import CONFIG_NAME, cached_path, hf_bucket_url, is_remote_url
  File "D:\Anaconda3\envs\eqg\lib\site-packages\transformers\file_utils.py", line 35, in <module>
    logger = logging.get_logger(__name__)  # pylint: disable=invalid-name
AttributeError: module 'transformers.utils.logging' has no attribute 'get_logger'
```

cschy (Author) commented Nov 7, 2022

I found that logging.py is empty. I don't know why, but it works when I use `pip install transformers==3.1.0` instead of `cd transformers & pip install .`

zhaozj89 (Owner) commented Nov 7, 2022

Thanks for catching this. `pip install transformers==3.1.0` may not work because we modified this version of transformers a bit. I have updated the logging.py file. Let me know if it works.
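For reference, the helper missing from the empty transformers/utils/logging.py is roughly the following (a simplified sketch of the 3.1.0 file; the real one also sets up a default handler and verbosity controls):

```python
# Simplified sketch of get_logger from transformers/utils/logging.py (v3.1.0).
import logging
from typing import Optional

def _get_library_name() -> str:
    # resolves to "transformers" when this module lives inside the package
    return __name__.split(".")[0]

def get_logger(name: Optional[str] = None) -> logging.Logger:
    """Return a logger with the given name, defaulting to the library root."""
    if name is None:
        name = _get_library_name()
    return logging.getLogger(name)
```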

cschy (Author) commented Nov 7, 2022

> Thanks for catching this. `pip install transformers==3.1.0` may not work because we modified this version of transformers a bit. I have updated the logging.py file. Let me know if it works.

The files under all the transformers subfolders (i.e. benchmark, commands, data, and utils) are empty too. Do these other empty files affect the training of the model?

zhaozj89 (Owner) commented Nov 7, 2022

> Thanks for catching this. `pip install transformers==3.1.0` may not work because we modified this version of transformers a bit. I have updated the logging.py file. Let me know if it works.
>
> The files under all the transformers subfolders (i.e. benchmark, commands, data, and utils) are empty too.

Sorry for the confusion. It might have been a network problem on my end, and I did not check after pushing. I have updated the transformers folder.

cschy (Author) commented Nov 8, 2022

> Thanks for catching this. `pip install transformers==3.1.0` may not work because we modified this version of transformers a bit. I have updated the logging.py file. Let me know if it works.
>
> The files under all the transformers subfolders (i.e. benchmark, commands, data, and utils) are empty too.
>
> Sorry for the confusion. It might have been a network problem on my end, and I did not check after pushing. I have updated the transformers folder.

Thank you very much! Here's another problem:
```
Traceback (most recent call last):
  File "tdl/train.py", line 334, in <module>
    trainer.fit(model, data_module)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/states.py", line 48, in wrapped_fn
    result = fn(self, *args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1073, in fit
    results = self.accelerator_backend.train(model)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu_backend.py", line 51, in train
    results = self.trainer.run_pretrain_routine(model)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1224, in run_pretrain_routine
    self._run_sanity_check(ref_model, model)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1257, in _run_sanity_check
    eval_results = self._evaluate(model, self.val_dataloaders, max_batches, False)
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/trainer/evaluation_loop.py", line 305, in _evaluate
    for batch_idx, batch in enumerate(dataloader):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "tdl/train.py", line 135, in __getitem__
    encoding = self.tokenizer.encode_plus(
  File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2027, in encode_plus
    return self._encode_plus(
  File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 440, in _encode_plus
    batched_output = self._batch_encode_plus(
  File "/opt/conda/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 372, in _batch_encode_plus
    encodings = self._tokenizer.encode(
  File "/opt/conda/lib/python3.8/site-packages/tokenizers/implementations/base_tokenizer.py", line 212, in encode
    return self._tokenizer.encode(sequence, pair, is_pretokenized, add_special_tokens)
ValueError: TextInputSequence must be str
```

So what caused this problem?

zhaozj89 (Owner) commented Nov 9, 2022

Sorry for the late reply. Did you use the uploaded transformers or `pip install transformers==3.1.0`? Essentially, the problem is that the tokenizer is not getting the correct input. This may be due to a wrong path/format of the data or a change in the transformers API. As mentioned previously, you need to use the uploaded transformers, as we modified it a bit. If you are using it, would you mind installing it in editable mode (`pip install -e .`) and debugging a bit?
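For example, a minimal probe along these lines would show what `encode_plus` actually receives (hypothetical: the exact way a sample is fetched inside `FairytaleQADataset.__getitem__` may differ):

```python
# Hypothetical probe for FairytaleQADataset.__getitem__ in tdl/train.py,
# placed just before the encode_plus call. Seeing a dict or None here,
# rather than a str, would explain "TextInputSequence must be str".
sample = self.data[index]  # assumption: however the dataset row is fetched
print(type(sample), repr(sample)[:200])
```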

cschy (Author) commented Nov 9, 2022

> Sorry for the late reply. Did you use the uploaded transformers or `pip install transformers==3.1.0`? Essentially, the problem is that the tokenizer is not getting the correct input. This may be due to a wrong path/format of the data or a change in the transformers API. As mentioned previously, you need to use the uploaded transformers, as we modified it a bit. If you are using it, would you mind installing it in editable mode (`pip install -e .`) and debugging a bit?

I did use the uploaded transformers. I am trying to debug it now. Thank you very much for your help!

cschy (Author) commented Nov 10, 2022

I found that this happens because the first argument passed to `self.tokenizer.encode_plus` (in tdl/train.py, class `FairytaleQADataset`, method `__getitem__`, at line 135) is a dict while a str is needed, so I changed `section` to `section['section']`, which is the same problem I mentioned in the email. Is it because your Python version does the conversion implicitly?
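Concretely, the change looks like this (a sketch; the other `encode_plus` arguments are assumed, not copied from the repo):

```python
# In FairytaleQADataset.__getitem__ (tdl/train.py, around line 135).
# `section` is a dict row from the dataset; encode_plus needs a plain str.
encoding = self.tokenizer.encode_plus(
    section["section"],       # was: section -> ValueError: TextInputSequence must be str
    max_length=512,           # assumed hyperparameters, not from the repo
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)
```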
File "train.py", line 250, in validation_step
self.log("val_loss", loss, prog_bar=True, logger=True)
File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 778, in getattr
raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'FairytaleTDL' object has no attribute 'log'

I will try upgrading pytorch-lightning to 0.10.0, following the solution referred to here.
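In case it helps others: `self.log` was only added to `LightningModule` in pytorch-lightning 0.10.0; older versions report metrics by returning a dict from the step. A sketch of both styles (`shared_step` is a placeholder, not the repo's actual method):

```python
# Sketch of the two reporting styles in pytorch-lightning.
def validation_step(self, batch, batch_idx):
    loss = self.shared_step(batch)   # placeholder forward/loss computation
    # pytorch-lightning >= 0.10.0: self.log exists on LightningModule
    self.log("val_loss", loss, prog_bar=True, logger=True)
    return loss

def validation_step_pre_010(self, batch, batch_idx):
    loss = self.shared_step(batch)   # placeholder forward/loss computation
    # before 0.10.0 there is no self.log; return a metrics dict instead
    return {"val_loss": loss, "progress_bar": {"val_loss": loss}, "log": {"val_loss": loss}}
```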

zhaozj89 (Owner) commented
I used Python 3.6 if I remember correctly. Sorry for the confusion. The code may need some debugging to make it work, but I did not expect it to have so many problems. You are welcome to post new problems, and I am happy to give useful input as much as I can.
