columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id'] #126

Open
anthonyfok opened this issue May 28, 2021 · 3 comments

@anthonyfok
Member

anthonyfok commented May 28, 2021

2022-08-09 Update: @wkhchow encountered this error too, but could not reproduce it the second time. It is suspected to be an occasional network error that caused an incomplete download. Moved this issue from OpenDRR/opendrr-api to OpenDRR/model-factory. Perhaps adding a fail-safe mechanism in DSRA_outputs2postgres_lfs.py (verify checksum, retry download, etc.) would mitigate the issue?
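A minimal sketch of what such a fail-safe could look like, not the actual code in DSRA_outputs2postgres_lfs.py. The names `fetch_verified` and `IncompleteDownloadError` are hypothetical, and the sketch assumes the expected SHA-256 and size are available to the script, for example from the `oid sha256:` and `size` lines of the file's Git LFS pointer. The download itself is passed in as a callable so the retry logic stays independent of how the script calls requests.

```python
import hashlib
import time
from typing import Callable, Optional


class IncompleteDownloadError(RuntimeError):
    """Raised when every attempt returned data that failed verification."""


def fetch_verified(fetch: Callable[[], bytes],
                   expected_sha256: Optional[str] = None,
                   expected_size: Optional[int] = None,
                   attempts: int = 3,
                   delay: float = 1.0) -> bytes:
    """Call fetch() until the payload matches the expected size and digest.

    fetch is any zero-argument callable returning the downloaded bytes
    (hypothetical hook; the real script would wrap its own download call).
    """
    last_problem = "no attempt made"
    for attempt in range(1, attempts + 1):
        try:
            data = fetch()
        except OSError as exc:
            # Network-level failures generally surface as OSError subclasses.
            last_problem = f"fetch failed: {exc}"
        else:
            if expected_size is not None and len(data) != expected_size:
                last_problem = f"size {len(data)} != expected {expected_size}"
            elif (expected_sha256 is not None
                  and hashlib.sha256(data).hexdigest() != expected_sha256.lower()):
                last_problem = "sha256 mismatch"
            else:
                return data
        if attempt < attempts:
            # Linear back-off between attempts.
            time.sleep(delay * attempt)
    raise IncompleteDownloadError(
        f"gave up after {attempts} attempts: {last_problem}")
```

With requests, the callable could be something like `lambda: requests.get(url, timeout=60).content`, where the URL and timeout are placeholders. Passing `None` for both expected values turns the function into a plain retry wrapper.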


2021-06-07 Update: I wasn't able to reproduce this in my June 4 to 5 local run (using the pipeline-optimization branch at commit 15c2d1889c8b28a04671fbc10a5a0436ba071289). To be investigated.


On 2021-05-28, Joost encountered ValueError: Usecols do not match columns, columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id'] during

[add_data] python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/scenario-catalogue/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=SCM7p0_MontrealNW

[2021-06-07 Update] Joost was using commit 6ae277be4189bfee6af52c82afde89cfdc2baabf, which is the tip of the branch that I renamed to pipeline-optimization_experimental_with_bubble-merged_gen-pygeoapi-config.

It does look like something that #53 intends to fix, i.e. flexible CSV header data loading.

The same routine that Anthony ran on 2021-05-22 was successful though, and there did not seem to be any recent change to the upstream repos.

Hypothesis 1: only have PSRA data enabled in the ENV?

In Joost's .env file, only loadPsraModels is set to true; all the other load* variables are set to false.

... but on closer look, the routines for loadPsraModels are the first of these to run, and DSRA_outputs2postgres_lfs.py runs before all of them, so that's probably not it.

Hypothesis 2: Result of keeping volumes from previous run?

Nope, Joost nuked the volumes.

Hypothesis 3: Upstream repos changed?

Not at first glance, but maybe I missed something.

More info

Refer to the Slack DM log between Joost and me on 2021-05-21.

Failure log

[add_data] python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/scenario-catalogue/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=SCM7p0_MontrealNW
python-opendrr_1     | Traceback (most recent call last):
python-opendrr_1     |  File "DSRA_outputs2postgres_lfs.py", line 205, in <module>
python-opendrr_1     |   main() 
python-opendrr_1     |  File "DSRA_outputs2postgres_lfs.py", line 65, in main
python-opendrr_1     |   dfsr[retrofit] = GetDataframeForScenario(url, repo_list, retrofitPrefix, eqscenario, columnConfigParser, auth)
python-opendrr_1     |  File "DSRA_outputs2postgres_lfs.py", line 139, in GetDataframeForScenario
python-opendrr_1     |   dfLosses = pd.read_csv(StringIO(response.content.decode(response.encoding)),
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 676, in parser_f
python-opendrr_1     |   return _read(filepath_or_buffer, kwds)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 448, in _read
python-opendrr_1     |   parser = TextFileReader(fp_or_buf, **kwds)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 880, in __init__
python-opendrr_1     |   self._make_engine(self.engine)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1114, in _make_engine
python-opendrr_1     |   self._engine = CParserWrapper(self.f, **self.options)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1937, in __init__
python-opendrr_1     |   _validate_usecols_names(usecols, self.orig_names)
python-opendrr_1     |  File "/usr/local/lib/python3.8/dist-packages/pandas/io/parsers.py", line 1232, in _validate_usecols_names
python-opendrr_1     |   raise ValueError(
python-opendrr_1     | ValueError: Usecols do not match columns, columns expected but not found: ['structural', 'contents', 'nonstructural', 'asset_id']
python-opendrr_1     | Command exited with non-zero status 1
python-opendrr_1     | 24.71user 10.78system 1:49.17elapsed 32%CPU (0avgtext+0avgdata 1758380maxresident)k
python-opendrr_1     | 248inputs+0outputs (4major+1899875minor)pagefaults 0swaps
python-opendrr_1     | 
python-opendrr_1     | real	1m49.174s
python-opendrr_1     | user	0m24.716s
python-opendrr_1     | sys	0m10.788s
python-opendrr_1 exited with code 1
@anthonyfok anthonyfok added Bug Something isn't working Task labels May 28, 2021
@anthonyfok anthonyfok self-assigned this May 28, 2021
@drotheram
Contributor

Is this issue still relevant?

@anthonyfok
Member Author

anthonyfok commented Aug 9, 2022

Apparently still relevant, for better or for worse.

Will (@wkhchow) ran into the same error (with Docker Desktop on Windows) on August 8, 2022 while testing the updates_july2022 branch at commit 141a5455b80ff3152159aad3ef634ed9bc12ba73:

[add_data:942:import_earthquake_scenarios] RUN: python3 DSRA_outputs2postgres_lfs.py --dsraModelDir=https://github.com/OpenDRR/earthquake-scenarios/tree/master/FINISHED --columnsINI=DSRA_outputs2postgres.ini --eqScenario=ACM7p3_LeechRiverFullFault
Traceback (most recent call last):
  File "DSRA_outputs2postgres_lfs.py", line 210, in <module>
    main() 
  File "DSRA_outputs2postgres_lfs.py", line 65, in main
    dfsr[retrofit] = GetDataframeForScenario(url, repo_list, retrofitPrefix, eqscenario, columnConfigParser, auth)
  File "DSRA_outputs2postgres_lfs.py", line 139, in GetDataframeForScenario
    dfLosses = pd.read_csv(StringIO(response.content.decode(response.encoding)),
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1937, in __init__
    _validate_usecols_names(usecols, self.orig_names)
  File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 1232, in _validate_usecols_names
    raise ValueError(
ValueError: Usecols do not match columns, columns expected but not found: ['nonstructural', 'contents', 'structural', 'asset_id']
Command exited with non-zero status 1
3.11user 1.71system 0:24.46elapsed 19%CPU (0avgtext+0avgdata 863156maxresident)k
400inputs+0outputs (4major+139972minor)pagefaults 0swaps

real	0m24.468s
user	0m3.117s
sys	0m1.718s

Will started another run of the stack build to see if the error persists or not.

@anthonyfok
Member Author

Good news! @wkhchow reported at today's scrum that the re-run of the stack build proceeded smoothly past the DSRA import stage into the PostGIS-to-Elasticsearch export stage.

So, it would appear that

ValueError: Usecols do not match columns, columns expected but not found: ['nonstructural', 'contents', 'structural', 'asset_id']
is an intermittent albeit rare error.

Seeing how this happened twice both with DSRA_outputs2postgres_lfs.py from the OpenDRR/model-factory repo, and how DSRA_outputs2postgres_lfs.py uses the Python requests module to download stuff from GitHub LFS, I suspect that the error happened when a network error caused the incomplete download of the data file.
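To help confirm or rule out that suspicion the next time it happens, the script could inspect the downloaded text before handing it to pd.read_csv, so the log shows what was actually received. This is a rough sketch only: `check_csv_header` is a hypothetical helper, and it assumes the column names sit on the first line of the file (adjust if the CSVs carry a leading metadata line). The Git LFS pointer check relies on pointer files starting with the `version https://git-lfs.github.com/spec/v1` line.

```python
import csv
import io

# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = "version https://git-lfs.github.com/spec/v1"


def check_csv_header(text, required):
    """Return the header row, or raise ValueError describing what was received."""
    if not text.strip():
        raise ValueError("downloaded body is empty")
    if text.startswith(LFS_POINTER_PREFIX):
        raise ValueError("received a Git LFS pointer file, not the CSV itself")
    # Parse only the first row; the rest of the file is left to pandas.
    header = [name.strip() for name in next(csv.reader(io.StringIO(text)))]
    missing = [name for name in required if name not in header]
    if missing:
        # Include the start of the body so an HTML or JSON error page is obvious.
        raise ValueError(
            f"columns missing: {missing}; body starts with {text[:80]!r}")
    return header
```

A failed check could also be treated as a reason to retry the download rather than abort the whole stack build.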

Moving this issue to the OpenDRR/model-factory repo for further investigation of DSRA_outputs2postgres_lfs.py (and maybe other similar scripts too).

@anthonyfok anthonyfok transferred this issue from OpenDRR/opendrr-api Aug 9, 2022