
Bug in ErnieMConverter Class #41

Open
YiandLi opened this issue Jan 27, 2024 · 0 comments
YiandLi commented Jan 27, 2024

Using the uie-m-large model, I hit a bug in the ErnieMConverter(Converter) class:

Traceback (most recent call last):
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/src/run.py", line 23, in <module>
    ie = UIEPredictor(model='uie-m-large', schema=schema, device="cuda" if torch.cuda.is_available() else "cpu")
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/uie_predictor.py", line 146, in __init__
    self._prepare_predictor()
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/uie_predictor.py", line 160, in _prepare_predictor
    self._tokenizer = ErnieMTokenizerFast.from_pretrained(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
    return cls._from_pretrained(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/tokenizer.py", line 477, in __init__
    super().__init__(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 114, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 1342, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/tokenizer.py", line 576, in __init__
    from transformers.utils import sentencepiece_model_pb2 as model_pb2
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/utils/sentencepiece_model_pb2.py", line 91, in <module>
    _descriptor.EnumValueDescriptor(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 789, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
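The traceback shows that workaround 2 from the error message can be applied without touching any installed packages: set the `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION` environment variable before `transformers` (and therefore protobuf's generated `_pb2` module) is imported. A minimal sketch, assuming it is placed at the very top of the entry script (`run.py` in the traceback) before any other imports:

```python
import os

# Workaround 2 from the protobuf error message: force the pure-Python
# protobuf implementation. This must run BEFORE `transformers` (or anything
# else that imports protobuf-generated _pb2 modules), so put it at the top
# of the entry script. Note: pure-Python parsing is slower than the C++
# implementation.
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"
```

Alternatively, workaround 1 (downgrading the protobuf package to 3.20.x or lower, e.g. `pip install "protobuf<=3.20.3"`) avoids the slowdown but changes the installed environment.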