`ParserError: Error tokenizing data.` from `parse_lammps_dumps()` for only SOME files #3409

kimiaGF · 2023-10-16T18:17:53Z

I am trying to parse a LAMMPS dump file, containing all simulation snapshots, into a list of LammpsDump objects using the parse_lammps_dumps() function. This method works for some dump files, but not all, even though an identical LAMMPS input script was used to generate all the files.

I have linked two example files to recreate this error:

one that parses without error ('success.dump'),
one that results in the error ('fail.dump').

The error can be recreated by running:

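from pymatgen.io.lammps.outputs import parse_lammps_dumps

d = parse_lammps_dumps('fail.dump')  # generator of LammpsDump objects
dumps = [i for i in d]  # the error is raised while the generator is consumed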

Please note that these files are large, at around 1.5 GB each.

I have checked the line where the error is raised (line 62647): splitting it on whitespace gives 20 fields, the same as the header, not the 22 that the error reports. I have also checked for special characters, extra whitespace, and stray newlines, and found nothing in fail.dump that would add fields. Does anyone have experience with this?
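One thing that may be worth ruling out: the traceback shows `pd.read_csv` being called on `lines[9:]` of a single snapshot, so the line number in the error is presumably counted from the first atom row of the failing snapshot rather than from the top of fail.dump. The sketch below is a plain-Python check, not part of pymatgen. It walks the whole file and reports every atom row whose field count differs from its `ITEM: ATOMS` header. The names `BadRow` and `find_bad_rows` are hypothetical, and the sketch assumes the standard text dump layout in which every section starts with an `ITEM:` line.

```python
from typing import Iterator, NamedTuple, Optional


class BadRow(NamedTuple):
    timestep: int      # value found under "ITEM: TIMESTEP"
    file_line: int     # 1-based line number in the whole file
    snapshot_row: int  # 1-based row number counted from the first atom row
    n_fields: int      # whitespace-separated fields found on the row
    n_expected: int    # column names listed on the "ITEM: ATOMS" header


def find_bad_rows(path: str) -> Iterator[BadRow]:
    """Yield every atom row whose field count differs from its header."""
    timestep = -1
    n_expected: Optional[int] = None  # None while outside an ATOMS section
    row = 0
    expect_timestep = False
    with open(path) as fh:
        for file_line, line in enumerate(fh, start=1):
            if line.startswith("ITEM:"):
                expect_timestep = line.startswith("ITEM: TIMESTEP")
                if line.startswith("ITEM: ATOMS"):
                    # Drop the "ITEM:" and "ATOMS" tokens, keep the column names.
                    n_expected = len(line.split()) - 2
                    row = 0
                else:
                    n_expected = None
                continue
            if expect_timestep:
                timestep = int(line.split()[0])
                expect_timestep = False
                continue
            if n_expected is None or not line.strip():
                continue  # box bounds, atom count, or a blank line
            row += 1
            n_fields = len(line.split())
            if n_fields != n_expected:
                yield BadRow(timestep, file_line, row, n_fields, n_expected)
```

Running `for bad in find_bad_rows('fail.dump'): print(bad)` streams the file once without loading it into memory. If it prints nothing, every atom row matches its header and the mismatch would have to arise somewhere other than the raw field counts.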

Here is the full error message:

ParserError                               Traceback (most recent call last)
/home/kimia.gh/blue2/B4C_ML_Potential/analysis_scripts/shock/pymatgen_dump_analysis.py in line 9
      172 num_bins = 25
      174 d = parse_lammps_dumps(filename)
----> 175 dumps = [i for i in d]

/home/kimia.gh/blue2/B4C_ML_Potential/analysis_scripts/shock/pymatgen_dump_analysis.py in line 9, in <listcomp>(.0)
      172 num_bins = 25
      174 d = parse_lammps_dumps(filename)
----> 175 dumps = [i for i in d]

File ~/.conda/envs/pmg/lib/python3.9/site-packages/pymatgen/io/lammps/outputs.py:123, in parse_lammps_dumps(file_pattern)
    121 if line.startswith("ITEM: TIMESTEP"):
    122     if len(dump_cache) > 0:
--> 123         yield LammpsDump.from_str("".join(dump_cache))
    124     dump_cache = [line]
    125 else:

File ~/.conda/envs/pmg/lib/python3.9/site-packages/pymatgen/io/lammps/outputs.py:71, in LammpsDump.from_str(cls, string)
     69 box = LammpsBox(bounds, tilt)
     70 data_head = lines[8].replace("ITEM: ATOMS", "").split()
---> 71 data = pd.read_csv(StringIO("\n".join(lines[9:])), names=data_head, delim_whitespace=True)
     72 return cls(timestep, n_atoms, box, data)
...

File ~/.conda/envs/pmg/lib/python3.9/site-packages/pandas/_libs/parsers.pyx:2029, in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 20 fields in line 62647, saw 22
Output is truncated.
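Because the traceback shows `LammpsDump.from_str` being called once per snapshot, the failure can probably be narrowed to a single snapshot, which would also give a much smaller reproducer than a 1.5 GB file. The following is a minimal sketch in plain Python that mirrors the `ITEM: TIMESTEP` splitting visible in the traceback. The helper names `split_snapshots` and `extract_snapshot`, and the output filename used below, are hypothetical and not part of pymatgen.

```python
from typing import Iterable, Iterator, List


def split_snapshots(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group lines into snapshots, starting a new one at each ITEM: TIMESTEP."""
    cache: List[str] = []
    for line in lines:
        if line.startswith("ITEM: TIMESTEP") and cache:
            yield cache
            cache = []
        cache.append(line)
    if cache:
        yield cache  # last snapshot in the file


def extract_snapshot(src: str, dst: str, index: int) -> bool:
    """Copy the snapshot at 0-based position `index` from src to dst.

    Returns True if the snapshot was found and written, False otherwise.
    """
    with open(src) as fh:
        for i, snapshot in enumerate(split_snapshots(fh)):
            if i == index:
                with open(dst, "w") as out:
                    out.writelines(snapshot)
                return True
    return False
```

For example, `extract_snapshot('fail.dump', 'fail_snapshot.dump', index)` writes one snapshot to a new file, where `index` is the position of the snapshot that fails. If `parse_lammps_dumps('fail_snapshot.dump')` then raises the same error, that small file can be attached here instead of the full dump.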

Package versions:

pandas                    2.0.3            py39h1128e8f_0
pymatgen                  2023.10.4        py39h44dd56e_0    conda-forge
lammps                    2022.06.23      py39h896a7a4_mpich_12    conda-forge

Replies: 0 comments
