parse cif to train structured data error #114

FlorientHuang · 2024-09-18T06:28:20Z

/HOME/scw6dlr/huangsuyuan/ProteinMPNN-main/training/parse_cif_noX.py:233: SyntaxWarning: invalid escape sequence '\('
  for expression in re.split('\(|\)', oper_expression) if expression]
Traceback (most recent call last):
  File "/HOME/scw6dlr/.conda/envs/mlfold/lib/python3.12/site-packages/pdbx/reader/PdbxReader.py", line 360, in __tokenizer
    line = next(fileIter)
           ^^^^^^^^^^^^^^
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/HOME/scw6dlr/huangsuyuan/ProteinMPNN-main/training/parse_cif_noX.py", line 457, in <module>
    chains,metadata = parse_mmcif(IN)
                      ^^^^^^^^^^^^^^^
  File "/HOME/scw6dlr/huangsuyuan/ProteinMPNN-main/training/parse_cif_noX.py", line 274, in parse_mmcif
    reader.read(data)
  File "/HOME/scw6dlr/.conda/envs/mlfold/lib/python3.12/site-packages/pdbx/reader/PdbxReader.py", line 72, in read
    self.__parser(self.__tokenizer(self.__ifh), containerList)
  File "/HOME/scw6dlr/.conda/envs/mlfold/lib/python3.12/site-packages/pdbx/reader/PdbxReader.py", line 275, in __parser
    curCatName, curAttName, curQuotedString, curWord = next(tokenizer)
                                                       ^^^^^^^^^^^^^^^
RuntimeError: generator raised StopIteration

Is this a package mismatch?


FlorianWieser1 · 2024-10-30T09:52:20Z

Dear FlorientHuang,
I have the same issue. I installed https://pypi.org/project/pdbx-mmcif/ and I'm passing a gzipped mmCIF file from the Protein Data Bank to the script: python parse_cif_noX.py 1UBQ.cif.gz out.
Were you able to fix the error? What am I missing? Thanks a lot in advance!

FlorianWieser1 · 2024-10-30T10:19:35Z

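The issue is likely that our Python version is too recent for this pdbx release. Since Python 3.7 (PEP 479), a StopIteration that escapes from inside a generator is converted into "RuntimeError: generator raised StopIteration", which matches the traceback above. I fixed it on my side by catching the exception in the __tokenizer() method of mlfold/lib/python3.10/site-packages/pdbx/reader/PdbxReader.py and returning from the generator instead: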

        # Tokenizer loop begins here ---
        while True:
            try:
                line = next(fileIter)
            except StopIteration:
                return
            self.__curLineNumber += 1
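
For anyone who wants to see the mechanism in isolation, here is a minimal sketch of the behaviour, not the actual pdbx code. The function names (tokenizer_legacy, tokenizer_patched, drain) and the sample lines are made up for illustration. It assumes Python 3.7 or newer, where PEP 479 is always active.

```python
def tokenizer_legacy(lines):
    """Old-style generator: lets StopIteration from next() escape the body."""
    file_iter = iter(lines)
    while True:
        line = next(file_iter)  # raises StopIteration once the input is exhausted
        yield line.strip()


def tokenizer_patched(lines):
    """Patched generator: ends cleanly with `return` when the input runs out."""
    file_iter = iter(lines)
    while True:
        try:
            line = next(file_iter)
        except StopIteration:
            return  # the sanctioned way to finish a generator
        yield line.strip()


def drain(gen):
    """Collect all items, or report the RuntimeError that PEP 479 produces."""
    try:
        return list(gen)
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"


# Hypothetical stand-in for the lines of an mmCIF file.
sample = ["data_1UBQ\n", "_entry.id 1UBQ\n"]

legacy_result = drain(tokenizer_legacy(sample))
patched_result = drain(tokenizer_patched(sample))

print(legacy_result)
print(patched_result)
```

The legacy variant fails only once the input is exhausted, which is why the error shows up at the end of parsing rather than on the first line. Editing a file inside site-packages is fragile, since reinstalling the package discards the change, so it may be worth keeping a note of the patch.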
