add_data.sh related Python scripts - flexible data loading #53

Open
5 of 11 tasks
drotheram opened this issue Mar 1, 2021 · 3 comments

@drotheram

drotheram commented Mar 1, 2021

Dynamically load raw model datasets whose fields (and field types) may change, by reading the fields from the data itself and loading them. Implement tests on inputs, with some reasonable constraints on what the stack will load.

See also Issue #48

Major Priorities

PSRA:

  • PSRA_copyTables.py (Preview WIP flexible-csv-headers branch)
    • Shrink the list of expected_headers (though these should probably be removed altogether eventually)
    • pylint, flake8
    • Skip OpenQuake comment header (and deal with U+FEFF BOM). (Originally taken care of by sed in `add_data.sh`.)
    • Read CSV header from PostgreSQL dynamically, something like psql opendrr -c 'COPY (SELECT * FROM psra_BC.psra_BC_hcurves_pga WHERE FALSE ) TO STDOUT WITH CSV HEADER;'
    • Quote SQL identifiers when necessary for column names with upper-case letters, ., (), etc.
    • Restore reading config.ini and add error checking if file not found (POSTGRES_* variables not defined)
    • Write some kind of unit test?
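
As a rough illustration of the CSV-related items above, here is a minimal sketch of skipping the comment header and BOM, then quoting column names only when needed. The function names, the sample column names, and the assumption that the OpenQuake comment line starts with `#` are all hypothetical and not taken from PSRA_copyTables.py.

```python
import csv
import io
import re

# Lower-case names made of letters, digits and underscores need no quoting.
_PLAIN_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def read_csv_header(stream):
    """Return the column names from a CSV text stream.

    A leading U+FEFF byte order mark is dropped, and leading blank or
    '#' comment lines (assumed OpenQuake comment format) are skipped.
    """
    for index, line in enumerate(stream):
        if index == 0:
            line = line.lstrip("\ufeff")
        if not line.strip() or line.startswith("#"):
            continue
        return next(csv.reader([line]))
    raise ValueError("no header row found")


def quote_ident(name):
    """Double-quote a PostgreSQL identifier unless it is a plain lower-case name.

    Reserved keywords are not checked here; quoting every name is the
    safer choice if column names might collide with SQL keywords.
    """
    if _PLAIN_IDENTIFIER.fullmatch(name):
        return name
    return '"' + name.replace('"', '""') + '"'


# Hypothetical sample input: BOM, one comment line, then the real header.
sample = io.StringIO(
    "\ufeff#,,,\"generated_by='OpenQuake engine'\"\n"
    "site_id,Lon,Lat,PGA-0.05,poe(0.1)\n"
)
header = read_csv_header(sample)
quoted = [quote_ident(name) for name in header]
print(", ".join(quoted))
```

Because both helpers are pure functions over text, they could also serve as a starting point for the unit test mentioned in the last item.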

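For the config.ini item, one possible shape of the error checking is sketched below. The section name `rds` and the exact `POSTGRES_*` key names are assumptions made for illustration; the real config.ini layout may differ.

```python
import configparser
import os

# Hypothetical key names; configparser lower-cases option names by default,
# so POSTGRES_HOST in the file is looked up as postgres_host.
REQUIRED_KEYS = (
    "postgres_host",
    "postgres_port",
    "postgres_db",
    "postgres_user",
    "postgres_pass",
)


def load_db_config(path="config.ini", section="rds"):
    """Read the database settings, failing early with a clear message."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"configuration file not found: {path}")
    # Interpolation is disabled so that a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path, encoding="utf-8")
    if not parser.has_section(section):
        raise KeyError(f"section [{section}] missing from {path}")
    missing = [key for key in REQUIRED_KEYS if not parser.has_option(section, key)]
    if missing:
        raise KeyError(f"{path} is missing: {', '.join(missing)}")
    return {key: parser.get(section, key) for key in REQUIRED_KEYS}
```

Raising on a missing file or key, rather than continuing with undefined values, would surface the problem before any SQL is run.
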
DSRA:

Minor Priorities

Exposure:

@jvanulde

@anthonyfok is this done? If not please move to Sprint 33 Milestone.

@anthonyfok anthonyfok modified the milestones: Sprint 31, Sprint 33 Apr 27, 2021
@anthonyfok

@jvanulde Thanks for the reminder! Done, and am in the process of moving other outstanding tasks to Sprint 33 too.

@anthonyfok anthonyfok modified the milestones: Sprint 33, Sprint 34 May 7, 2021
@anthonyfok

anthonyfok commented May 10, 2021

Notes

Wed 2021-02-24 Skype meetings

IIRC, this issue was first discussed on Wednesday, 24 February 2021, over Skype meetings with Will and Drew.

add_data.sh is the main orchestration of the whole thing. It has gotten a lot better over time, but it used to be extremely brittle, so anytime anyone changed any little thing, the whole thing would break. So, we've been spending time and effort trying to make this more flexible...

For example: Will wrote the following SQL script to pull in the social vulnerability data:

If the upstream CSV file (created by e.g. Murray or Tiegan) were changed, e.g. the headers Lon and Lat were changed to lowercase lon and lat, this whole SQL script would break.

So, instead of explicitly defining those header fields (headerFields), we actually read them in (in some cases) from the CSV itself.

And that way when the CSV files are changed, then we end up just loading the whole CSV as it is with the headerFields dynamically generated.
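
To make the idea concrete, here is a minimal sketch of generating the SQL from the CSV header instead of hard-coding the column list. The function name, the table name, and the choice to create every column as `text` are assumptions for illustration only, not how the existing scripts work.

```python
import csv
import io


def statements_from_csv(stream, table):
    """Build CREATE TABLE and COPY statements whose columns mirror the CSV header."""
    header_fields = next(csv.reader(stream))
    # Always double-quote, so "Lon" and "lon" are passed through exactly as written.
    quoted = ['"' + field.replace('"', '""') + '"' for field in header_fields]
    # Assumption: every column is loaded as text; real types would need a later cast.
    create_sql = f"CREATE TABLE {table} ({', '.join(q + ' text' for q in quoted)})"
    copy_sql = f"COPY {table} ({', '.join(quoted)}) FROM STDIN WITH (FORMAT csv, HEADER)"
    return create_sql, copy_sql


# The same code handles a file whose headers were renamed upstream.
renamed = io.StringIO("Sauid,lon,lat\n1,-123.1,49.2\n")
create_sql, copy_sql = statements_from_csv(renamed, "example.sovi")
print(create_sql)
print(copy_sql)
```

With this approach a header change from Lon and Lat to lon and lat changes the generated statements rather than breaking a fixed script, though anything downstream that refers to the old column names would still need attention.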

... Some fields are critical...
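
One way to guard the critical fields while still loading everything else dynamically is a simple check on the header before loading. Which fields are actually critical is not stated here, so the names below are placeholders, and the case-insensitive comparison is an assumption.

```python
def missing_required_fields(header_fields, required_fields):
    """Return the required field names absent from the header, ignoring case."""
    present = {name.strip().lower() for name in header_fields}
    return [name for name in required_fields if name.lower() not in present]


# Placeholder field names for illustration only.
required = ["sauid", "lon", "lat"]
print(missing_required_fields(["Sauid", "Lon", "Lat", "extra"], required))
print(missing_required_fields(["Sauid", "Lon"], required))
```

A loader could refuse to continue when the returned list is not empty, which gives a reasonable constraint on what the stack will load without pinning down every column.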

Mon 2021-05-10 Zoom meeting

About model-factory/scripts/PSRA_copyTables.py

The tables are defined in model-factory/scripts/psra_1.Create_tables.sql.

These Python and SQL scripts are called from opendrr-api/python/add_data.sh like so:

# PSRA_1-8
for PT in ${PT_LIST[@]}
do
  python3 PSRA_runCreate_tables.py --province=${PT} --sqlScript="psra_1.Create_tables.sql"
  python3 PSRA_copyTables.py --province=${PT}
  python3 PSRA_sqlWrapper.py --province=${PT} --sqlScript="psra_2.Create_table_updates.sql"
  python3 PSRA_sqlWrapper.py --province=${PT} --sqlScript="psra_3.Create_psra_building_all_indicators.sql"
  python3 PSRA_sqlWrapper.py --province=${PT} --sqlScript="psra_4.Create_psra_sauid_all_indicators.sql"
  python3 PSRA_sqlWrapper.py --province=${PT} --sqlScript="psra_5.Create_psra_sauid_references_indicators.sql"
done

@anthonyfok anthonyfok changed the title add_data.sh - flexible data loading add_data.sh related Python scripts - flexible data loading May 11, 2021
anthonyfok added a commit to anthonyfok/model-factory that referenced this issue May 14, 2021
WIP, have yet to add code to read the header from CSV files.

[Eventually] Fixes OpenDRR#53
@jvanulde jvanulde modified the milestones: Sprint 34, Sprint 35 May 31, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 35, Sprint 36 Jun 7, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 36, Sprint 37 Jun 17, 2021
@drotheram drotheram removed this from the Sprint 37 milestone Jul 5, 2021
@drotheram drotheram added this to the Sprint 38 milestone Jul 5, 2021
@jvanulde jvanulde pinned this issue Jul 7, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 38, Sprint 39 Jul 15, 2021
@drotheram drotheram removed this from the Sprint 39 milestone Sep 13, 2021
@jvanulde jvanulde added this to the Sprint 44 milestone Oct 21, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 44, Sprint 45 Oct 21, 2021
@drotheram drotheram removed this from the Sprint 45 milestone Jan 17, 2022