columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id'] #126

anthonyfok · 2021-05-28T22:26:52Z

2022-08-09 Update: @wkhchow encountered this error too, but could not reproduce it on a second run. I suspect an occasional network error that caused an incomplete download. Moved this issue from OpenDRR/opendrr-api to OpenDRR/model-factory. Perhaps adding a fail-safe mechanism in DSRA_outputs2postgres_lfs.py (verify checksum, retry download, etc.) would mitigate the issue?
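As a rough illustration of the "verify checksum" idea: a Git LFS pointer file records the SHA-256 digest (`oid`) and byte size of the real object, so a downloaded payload could be checked against it before parsing. This is only a sketch. It assumes the script can get hold of the pointer text for each file, and `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names that do not exist in DSRA_outputs2postgres_lfs.py.

```python
import hashlib


def parse_lfs_pointer(pointer_text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(payload: bytes, pointer_text: str) -> bool:
    """Return True if payload has the size and SHA-256 digest the pointer records."""
    expected = parse_lfs_pointer(pointer_text)
    if expected["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {expected['algo']}")
    return (
        len(payload) == expected["size"]
        and hashlib.sha256(payload).hexdigest() == expected["digest"]
    )


# Illustrative data only: a tiny CSV and a pointer built to match it.
payload = b"asset_id,structural,nonstructural,contents\n1,2.0,3.0,4.0\n"
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
complete_ok = matches_pointer(payload, pointer)        # full download
truncated_ok = matches_pointer(payload[:-10], pointer)  # simulated incomplete download
```

A failed check could then trigger a retry instead of handing a bad payload to pandas.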

2021-06-07 Update: I wasn't able to reproduce this in my June 4 to 5 local run (using the pipeline-optimization branch at commit 15c2d1889c8b28a04671fbc10a5a0436ba071289). To be investigated.

On 2021-05-28, Joost encountered ValueError: Usecols do not match columns, columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id'] during

[add_data] python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/scenario-catalogue/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=SCM7p0_MontrealNW

[2021-06-07 Update] Joost was using commit 6ae277be4189bfee6af52c82afde89cfdc2baabf, which is the tip of the branch that I renamed to pipeline-optimization_experimental_with_bubble-merged_gen-pygeoapi-config.

It does look like something that #53 intends to fix, i.e. flexible CSV header data loading.

The same routine that Anthony ran on 2021-05-22 was successful though, and there did not seem to be any recent change to the upstream repos.

Hypothesis 1: only have PSRA data enabled in the ENV?

In Joost's .env file, only loadPsraModels is set to true; all the other load* variables are set to false.

... but on closer look, the routines for loadPsraModels are run very first,
and DSRA_outputs2postgres_lfs.py comes before all of the above, so that's probably not it.

Hypothesis 2: Result of keeping volumes from previous run?

Nope, Joost nuked the volumes.

Hypothesis 3: Upstream repos changed?

Not at first glance, but maybe I missed something.

More info

Refer to the Slack DM log between Joost and me on 2021-05-21.

Failure log

[add_data] python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/scenario-catalogue/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=SCM7p0_MontrealNW
python-opendrr_1     | Traceback (most recent call last):
python-opendrr_1     |  File "DSRA_outputs2postgres_lfs.py", line 205, in <module>
python-opendrr_1     |   main() 
python-opendrr_1     |  File "DSRA_outputs2postgres_lfs.py", line 65, in main
python-opendrr_1     |   dfsr[retrofit] = GetDataframeForScenario(url, repo_list, retrofitPrefix, eqscenario, columnConfigParser, auth)
python-opendrr_1     |  File "DSRA_outputs2postgres_lfs.py", line 139, in GetDataframeForScenario
python-opendrr_1     |   dfLosses = pd.read_csv(StringIO(response.content.decode(response.encoding)),
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 676, in parser_f
python-opendrr_1     |   return _read(filepath_or_buffer, kwds)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 448, in _read
python-opendrr_1     |   parser = TextFileReader(fp_or_buf, **kwds)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 880, in __init__
python-opendrr_1     |   self._make_engine(self.engine)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1114, in _make_engine
python-opendrr_1     |   self._engine = CParserWrapper(self.f, **self.options)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1937, in __init__
python-opendrr_1     |   _validate_usecols_names(usecols, self.orig_names)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1232, in _validate_usecols_names
python-opendrr_1     |   raise ValueError(
python-opendrr_1     | ValueError: Usecols do not match columns, columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id']
python-opendrr_1     | Command exited with non-zero status 1
python-opendrr_1     | 24.71user 10.78system 1:49.17elapsed 32%CPU (0avgtext+0avgdata 1758380maxresident)k
python-opendrr_1     | 248inputs+0outputs (4major+1899875minor)pagefaults 0swaps
python-opendrr_1     | 
python-opendrr_1     | real	1m49.174s
python-opendrr_1     | user	0m24.716s
python-opendrr_1     | sys	0m10.788s
python-opendrr_1 exited with code 1
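For reference, this ValueError can be triggered whenever the text passed to `pd.read_csv` does not start with the expected header row, for example when the response body is something other than the CSV. The sketch below uses a made-up body (a Git LFS pointer in place of the data) purely as an assumption for illustration; it is not taken from the failing run, and the actual cause here remains unconfirmed.

```python
from io import StringIO

import pandas as pd

wanted = ["asset_id", "structural", "nonstructural", "contents"]

# Hypothetical response body: an LFS pointer returned instead of the CSV.
not_a_csv = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:0000\n"
    "size 123\n"
)

try:
    pd.read_csv(StringIO(not_a_csv), usecols=wanted)
    error_message = None
except ValueError as exc:
    # pandas validates usecols against the first line it treats as the header.
    error_message = str(exc)

print(error_message)
```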


drotheram · 2021-10-25T18:51:54Z

Is this issue still relevant?

anthonyfok · 2022-08-09T09:58:58Z

Apparently still relevant, for better or for worse.

Will (@wkhchow) ran into the same error (with Docker Desktop on Windows) on August 8, 2022 while testing the updates_july2022 branch at commit 141a5455b80ff3152159aad3ef634ed9bc12ba73:

[add_data:942:import_earthquake_scenarios] RUN: python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/earthquake-scenarios/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=ACM7p3_LeechRiverFullFault
Traceback (most recent call last):
  File "DSRA_outputs2postgres_lfs.py", line 210, in <module>
    main() 
  File "DSRA_outputs2postgres_lfs.py", line 65, in main
    dfsr[retrofit] = GetDataframeForScenario(url, repo_list, retrofitPrefix, eqscenario, columnConfigParser, auth)
  File "DSRA_outputs2postgres_lfs.py", line 139, in GetDataframeForScenario
    dfLosses = pd.read_csv(StringIO(response.content.decode(response.encoding)),
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1937, in __init__
    _validate_usecols_names(usecols, self.orig_names)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1232, in _validate_usecols_names
    raise ValueError(
ValueError: Usecols do not match columns, columns expected but not found: ['nonstructural', 'contents', 'structural', 'asset_id']
Command exited with non-zero status 1
3.11user 1.71system 0:24.46elapsed 19%CPU (0avgtext+0avgdata 863156maxresident)k
400inputs+0outputs (4major+139972minor)pagefaults 0swaps

real	0m24.468s
user	0m3.117s
sys	0m1.718s

Will started another run of the stack build to see if the error persists or not.

anthonyfok · 2022-08-09T16:09:21Z

Good news! @wkhchow reported at today's scrum that the re-run of the stack build proceeded smoothly past the DSRA import stage into the PostGIS-to-Elasticsearch export stage.

So, it would appear that

ValueError: Usecols do not match columns, columns expected but not found: ['nonstructural', 'contents', 'structural', 'asset_id']
is an intermittent albeit rare error.

Seeing how this happened twice, both times with DSRA_outputs2postgres_lfs.py from the OpenDRR/model-factory repo, and how DSRA_outputs2postgres_lfs.py uses the Python requests module to download data from GitHub LFS, I suspect that the error occurred when a network error caused an incomplete download of the data file.

Moving this issue to the OpenDRR/model-factory repo for further investigation of DSRA_outputs2postgres_lfs.py (and maybe other similar scripts too).
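One possible shape for a retry-based fail-safe, sketched under assumptions: check the header of the downloaded text for the required columns before the real `pd.read_csv` call, and fetch again if they are missing. `read_csv_with_retry` and its `fetch` parameter are hypothetical names introduced here, not part of the existing script; `fetch` stands in for whatever performs the HTTP download and returns the decoded text.

```python
import time
from io import StringIO
from typing import Callable, Iterable

import pandas as pd


def read_csv_with_retry(
    fetch: Callable[[], str],
    required_columns: Iterable[str],
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> pd.DataFrame:
    """Fetch CSV text, retrying when the header lacks the required columns."""
    required = list(required_columns)
    missing = required
    for attempt in range(1, attempts + 1):
        text = fetch()
        try:
            # Read only the header row to check the columns cheaply.
            header = pd.read_csv(StringIO(text), nrows=0).columns
            missing = [c for c in required if c not in header]
        except pd.errors.EmptyDataError:
            missing = required  # an empty body counts as a failed download
        if not missing:
            return pd.read_csv(StringIO(text), usecols=required)
        if attempt < attempts:
            time.sleep(delay_seconds)
    raise ValueError(f"columns still missing after {attempts} attempts: {missing}")


# Simulated downloads: the first body is bad, the second is the real CSV.
bodies = iter([
    "upstream error page\n",
    "asset_id,structural,nonstructural,contents\n1,2.0,3.0,4.0\n",
])
df = read_csv_with_retry(
    lambda: next(bodies),
    ["asset_id", "structural", "nonstructural", "contents"],
    delay_seconds=0,
)
```

In the real script, `fetch` would wrap the existing requests call. Injecting it as a callable keeps the retry logic testable without network access, which may also fit the CI tests proposed in #131.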

anthonyfok added the Bug and Task labels May 28, 2021

anthonyfok self-assigned this May 28, 2021

anthonyfok transferred this issue from OpenDRR/opendrr-api Aug 9, 2022

anthonyfok mentioned this issue Aug 22, 2022 in "Create CI tests workflow for the Python scripts in this repo" #131 (Open)
