Some OUTCARs cannot be parsed #4251

Closed
yantar92 opened this issue Jan 13, 2025 · 12 comments
@yantar92
Contributor

yantar92 commented Jan 13, 2025

Python version

Python 3.12.4

Pymatgen version

2025.1.9

Operating system version

Linux

Current behavior

Traceback (most recent call last):
    outcar = Outcar(os.path.join(wdir, "OUTCAR")).as_dict()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/LUMI_TYKKY_NMfA5bl/miniconda/envs/env1/lib/python3.12/site-packages/pymatgen/io/vasp/outputs.py", line 2161, in __init__
    self.data["nbands"] = self.data["nbands"][0][0]
                          ~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

Expected Behavior

Outcar being parsed.

Minimal example

from pymatgen.io.vasp.outputs import Outcar
out = Outcar('OUTCAR')

Relevant files to reproduce this bug

OUTCAR.txt

@yantar92 yantar92 added the bug label Jan 13, 2025
@DanielYang59
Contributor

DanielYang59 commented Jan 14, 2025

Thanks for opening this. I will look into this ASAP, link to #4195


Update: I cannot reproduce this on macOS 15.2, Python 3.12.8

from pymatgen.io.vasp.outputs import Outcar


outcar = Outcar('OUTCAR.txt')

print(outcar.data["nbands"])  # 24 (seems to work fine)

@yantar92
Contributor Author

Oops. Sorry, attached a wrong OUTCAR file. Try this one
OUTCAR.txt

@DanielYang59
Contributor

DanielYang59 commented Jan 14, 2025

Thanks, this reproduces the issue now.

However, this VASP job seems to have failed owing to a bad KPOINTS file, so perhaps this should not be handled by pymatgen (I would expect a properly started VASP job to always have the NBANDS tag)?

Automatic generation of k-mesh.
 -----------------------------------------------------------------------------
|                                                                             |
|     EEEEEEE  RRRRRR   RRRRRR   OOOOOOO  RRRRRR      ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     EEEEE    RRRRRR   RRRRRR   O     O  RRRRRR       #       #       #      |
|     E        R   R    R   R    O     O  R   R                               |
|     E        R    R   R    R   O     O  R    R      ###     ###     ###     |
|     EEEEEEE  R     R  R     R  OOOOOOO  R     R     ###     ###     ###     |
|                                                                             |
|     Error reading KPOINTS file.                                             |
|     The error occurred at line: 4.                                          |
|                                                                             |
|       ---->  I REFUSE TO CONTINUE WITH THIS SICK JOB ... BYE!!! <----       |
|                                                                             |
 -----------------------------------------------------------------------------

But yes, I would be happy to check whether it makes sense to add a better default value in such cases instead of throwing an IndexError.

If you comment out the NBANDS parsing part, yet another error is thrown:

Traceback (most recent call last):
  File "/Users/yang/developer/pymatgen/test_read_outcar.py", line 4, in <module>
    outcar = Outcar('OUTCAR.txt')
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/yang/developer/pymatgen/src/pymatgen/io/vasp/outputs.py", line 2169, in __init__
    self.data["nplwv"] = [[int(self.data["nplwv"][0][0])]]
                               ~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

So perhaps it's better to provide a correctly started VASP job?

@yantar92
Contributor Author

Hmm. You are right. My confusion probably comes from the fact that pymatgen can actually read OUTCAR from failed jobs most of the time. It is just that in this particular case the job failed especially early.

It would be nice to throw a more readable exception (and document this fact) in the scenarios like in my example, when pymatgen cannot extract any kind of useful data from the OUTCAR.

@yantar92 yantar92 changed the title Some OUTCARs cannot be parsed after #4195 Some OUTCARs cannot be parsed Jan 14, 2025
@DanielYang59
Contributor

It would be nice to throw a more readable exception (and document this fact) in the scenarios like in my example, when pymatgen cannot extract any kind of useful data from the OUTCAR.

I agree that a descriptive exception message is always better. But currently it's unclear to me what a reliable criterion in the OUTCAR would be for indicating that a VASP job has been properly set up and started. Looks like NBANDS might be one of the first parameters after KPOINTS are setup, do you have any comment on this?


Looking at line 414 of your first OUTCAR (where the job started correctly)

 Dimension of arrays:
   k-points           NKPTS =     20   k-points in BZ     NKDIM =     20   number of bands    NBANDS=     24
   number of dos      NEDOS =    301   number of ions     NIONS =      4
   non local maximal  LDIM  =      6   non local SUM 2l+1 LMDIM =     18
   total plane-waves  NPLWV = 248832
   max r-space proj   IRMAX =      1   max aug-charges    IRDMAX=  34250
   dimension x,y,z NGX =    36 NGY =   36 NGZ =  192
   dimension x,y,z NGXF=    72 NGYF=   72 NGZF=  384
   support grid    NGXF=    72 NGYF=   72 NGZF=  384
   ions per type =               2   2
   NGX,Y,Z   is equivalent  to a cutoff of  13.30, 13.05, 13.50 a.u.
   NGXF,Y,Z  is equivalent  to a cutoff of  26.60, 26.09, 27.00 a.u.
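
As a rough illustration of the "better default value" idea, the NBANDS entry in a header like the one above could be read with a tolerant regex that returns None when the tag is absent. This is only a sketch: `find_nbands` is a hypothetical helper, not part of pymatgen, and the pattern is an assumption based on the excerpt above.

```python
import re
from typing import Optional

# Hypothetical helper (not pymatgen API); the pattern is inferred from the excerpt above.
_NBANDS_RE = re.compile(r"NBANDS\s*=\s*(\d+)")


def find_nbands(outcar_text: str) -> Optional[int]:
    """Return the first NBANDS value in the text, or None if the tag is missing."""
    match = _NBANDS_RE.search(outcar_text)
    return int(match.group(1)) if match else None


header = "   k-points           NKPTS =     20   k-points in BZ     NKDIM =     20   number of bands    NBANDS=     24"
print(find_nbands(header))  # 24
print(find_nbands("Error reading KPOINTS file."))  # None
```

Returning None leaves it to the caller to decide whether a missing tag is an error, instead of failing with an IndexError.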

@yantar92
Contributor Author

Looks like NBANDS might be one of the first parameters after KPOINTS are setup, do you have any comment on this?

IMHO, trying to deduce the undocumented OUTCAR format exactly is prone to errors.
I think that the best approach would be doing something similar to what the Outcar class docstring explains about special VASP run types:

Note, this class works a bit differently than most of the other
VASP objects, since OUTCAR can be very different depending on which
"type of run" performed.

Create the OUTCAR class with a filename reads "regular parameters" that
are always present.
...
One can then call a specific reader depending on the type of run being
performed. These are currently: read_igpar(), read_lepsilon() and
read_lcalcpol(), read_core_state_eign(), read_avg_core_pot().

According to the docstring, some parts of the OUTCAR are parsed conditionally.

Similar to the optional parts of the OUTCAR, the "always present" parts (not always, as we see in my OUTCAR.txt) might be parsed conditionally - if they can be parsed, they do get added to the class. If not - not added.

Practically, I imagine that the class constructor can be separated into subroutines, each parsing an individual part of the file. If any of the subroutines throws an error or detects in any other way that some information is not present, either an error is thrown or a warning is displayed.
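
A minimal sketch of that idea, assuming one small parser per section and a warning for anything that cannot be read. All names and patterns here are hypothetical and are not taken from the pymatgen source:

```python
import re
import warnings


# Hypothetical section parsers; names and patterns are illustrative only.
def _parse_nbands(text):
    return int(re.search(r"NBANDS\s*=\s*(\d+)", text).group(1))


def _parse_nplwv(text):
    return int(re.search(r"NPLWV\s*=\s*(\d+)", text).group(1))


SECTION_PARSERS = {"nbands": _parse_nbands, "nplwv": _parse_nplwv}


def parse_sections(text):
    """Run every section parser; keep what parses and warn about what does not."""
    data = {}
    for name, parser in SECTION_PARSERS.items():
        try:
            data[name] = parser(text)
        except (AttributeError, ValueError):
            # AttributeError: re.search found nothing and returned None.
            # ValueError: the captured value could not be converted to int.
            warnings.warn(f"OUTCAR section {name!r} is missing or unreadable", stacklevel=2)
    return data
```

With this layout, an OUTCAR that stopped before the "Dimension of arrays" block would yield an empty dict plus two warnings, and the caller can inspect which keys are present.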

@yantar92
Contributor Author

Looks like NBANDS might be one of the first parameters after KPOINTS are setup, do you have any comment on this?

If we do want some kind of simple criterion that VASP started a calculation (rather than stopping in the middle of setup), one heuristic might be searching for lines matching cpu time.+
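
A sketch of that heuristic, assuming that timing lines containing "cpu time" only appear once the calculation itself is running. The function name `calculation_started` and the sample line are illustrative, not pymatgen API:

```python
import re

# Assumption: timing lines such as "LOOP:  cpu time ..." appear only after setup has finished.
_CPU_TIME_RE = re.compile(r"cpu time.+")


def calculation_started(outcar_text: str) -> bool:
    """Return True if the text contains at least one 'cpu time' timing line."""
    return _CPU_TIME_RE.search(outcar_text) is not None


# Illustrative input only; the numbers are made up.
sample = "      LOOP:  cpu time    0.1234: real time    0.1240"
print(calculation_started(sample))  # True
print(calculation_started("Error reading KPOINTS file."))  # False
```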

@esoteric-ephemera
Contributor

NBANDS is not a required INCAR tag; VASP determines it on the fly if it's not set. That's usually what we do when we have no a priori knowledge of the electronic structure of a system.

pymatgen can usually parse OUTCARs from failed calculations only if they've reached at least one electronic step. Is there info in the OUTCAR that you need to extract in this case? Most of it is user input since it didn't reach k-point generation

Also, you may want to look into custodian, which can help in recovering from errors

@yantar92
Contributor Author

pymatgen can usually parse OUTCARs from failed calculations only if they've reached at least one electronic step. Is there info in the OUTCAR that you need to extract in this case? Most of it is user input since it didn't reach k-point generation

For example, whether WAVECAR/CHGCAR has been read successfully (or at all).

For context, I am using a class derived from the original Outcar with some extensions. The main focus is displaying key information about running VASP calculation.

That said, I am not saying that pymatgen should implement the more granular parsing as I described. Throwing a catchable exception would suffice for my needs.

Also, you may want to look into custodian, which can help in recovering from errors

I've looked at it. It is not suitable for my needs as it produces results that are not comparable when analyzing several related structures. It will also not recover from this particular problem with parsing the KPOINTS file (which is simply a VASP bug). But that's probably off topic for this particular bug report.

@esoteric-ephemera
Contributor

Throwing a catchable exception would suffice for my needs.

That's already possible no?

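from pymatgen.io.vasp.outputs import Outcar

try:
    outcar = Outcar("<filename>")
except Exception:  # deliberately broad: treat any parsing failure as "no OUTCAR data"
    outcar = None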

Restricting the exception catching to IndexError is probably too restrictive to catch all possible failure modes

There's some precedent for parsing partial VASP output in the Vasprun class by setting exception_on_bad_xml = False. But a similar option would have to be implemented for Outcar.

Maybe using the Vasprun with that option can help as well? The Vasprun class is overall better maintained and provides more detailed information about the calculation than the Outcar class

@yantar92
Contributor Author

That's already possible no?

Yes, of course. But it is much better when pymatgen signals a very specific exception type that tells what exactly is wrong.
Just IndexError does not tell much. It might as well be a bug in pymatgen - impossible to say without reading the source code and doing in-depth debugging.

And nothing in the docstring says that some Outcars cannot be read.
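
To illustrate what I mean by a specific exception type, here is a rough sketch of a wrapper that converts low-level lookup errors into one named error. `OutcarParseError` and `load_outcar` are hypothetical names, not existing pymatgen API, and the set of caught exception types is an assumption:

```python
class OutcarParseError(Exception):
    """Hypothetical exception: the OUTCAR lacks data the parser requires."""


def load_outcar(path, parser=None):
    """Parse `path`, converting low-level lookup errors into OutcarParseError."""
    if parser is None:
        # Imported lazily so the sketch can be exercised with a stand-in parser.
        from pymatgen.io.vasp.outputs import Outcar

        parser = Outcar
    try:
        return parser(path)
    except (IndexError, KeyError, ValueError) as exc:
        # Chain the original error so the underlying cause stays visible.
        raise OutcarParseError(f"{path}: incomplete or unparsable OUTCAR ({exc!r})") from exc
```

A caller can then write `except OutcarParseError` and still reach the original IndexError through `__cause__` when debugging.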

There's some precedent for parsing partial VASP output in the Vasprun class by setting exception_on_bad_xml = False. But a similar option would have to be implemented for Outcar.

That would be ideal, although I am not 100% sure it is worth the effort.

Maybe using the Vasprun with that option can help as well? The Vasprun class is overall better maintained and provides more detailed information about the calculation than the Outcar class

Oh. I wish it were true. But it is not:

  1. vasprun.xml does not provide magnetization info (AFAIK).
  2. vasprun.xml is not written at all when doing NEB calculations.

@shyuep
Member

shyuep commented Jan 17, 2025

I am closing this Issue since I don't think it is fruitful to try to implement. The main reason is that the OUTCAR is a poorly defined text file, unlike say XML. The OUTCAR can be incomplete for any number of reasons. E.g., say if the compute cluster just randomly terminates a run. At best, we can have a big try-catch and raise a "BadOutcar" error, but that is not going to be useful since there is no way to diagnose the reason for the error. In the extreme example, say a run results in a completely empty OUTCAR file - what then? A broad BadOutcar error is not much different from doing a try-catch with a BaseException.

@shyuep shyuep closed this as completed Jan 17, 2025