insert_or_update can have problems with multiple "issue"s #1427

Open
@melange396

Description

insert_or_update() (which is really both insert_or_update_batch() and insert_or_update_bulk() right now; they should be consolidated into one in #989) can have problems when given a set of rows that includes more than one issue. Multiple issue values are not problematic in themselves, but the desired result may not be achieved when a single batch contains multiple rows for the same source/signal/time/geo that carry different issues (for example, one row with issue=5 and another with issue=6 for the same key).

The problem stems from the fact that the SQL in fix_is_latest_issue_sql naively assumes only one issue will be present for each (source, signal, geo_type, geo_value, time_type, time_value). Note that it presumes most inserted rows will be a latest issue, which is why the is_latest_issue column defaults to 1 in insert_into_loader_sql. Note also that "overwrites" happen not only when load.issue > latest.issue but ALSO when the two are equal, which lets us update/patch previously-written values that may have been incorrect (the SQL that moves the load table data to the latest and history tables then has an ON DUPLICATE clause that preserves the unique key epimetric_id for the same unique composite key tuple).
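To make the failure mode concrete, the fix-up performed by fix_is_latest_issue_sql has roughly the shape sketched below. This is a hedged illustration, not the actual statement from the codebase; the table names epimetric_load and epimetric_latest and the exact column list are assumptions:

```sql
-- Illustrative sketch only; table names (epimetric_load, epimetric_latest) are assumed.
-- Load rows arrive with is_latest_issue defaulted to 1; this clears the flag whenever
-- the latest table already holds a strictly newer issue for the same composite key.
UPDATE epimetric_load l
JOIN epimetric_latest v
  USING (source, `signal`, geo_type, geo_value, time_type, time_value)
SET l.is_latest_issue = 0
WHERE l.issue < v.issue;
-- Nothing here compares load rows against each other, so two load rows that share a
-- composite key but differ in issue can both keep is_latest_issue = 1.
```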

In practice, the problematic behavior does not occur, because the only place insert_or_update() is called (csv_to_database.py:upload_archive()) hands it rows returned by CsvImporter.load_csv(), which only uses a single issue value at a time. However, we might write something in the future that does not account for this expected invariant, so we should take steps to resolve it now.

Non-compliant calls to insert_or_update() can still happen in test code via calls to _insert_rows(). This GitHub search provides a list of them.

Possible ways to address this:

  • add a constraint to the load table to enforce that no multiple-issue collisions happen
    • can be done just by removing issue from its UNIQUE INDEX
    • doesn't actually solve the problem, but does prevent us from suffering the ill effects
  • fix the fix_is_latest_issue_sql calculation to work with multiple issues
    • perhaps by incorporating a GROUP BY
    • perhaps by inserting a SQL statement before it that does its own GROUP BY and sets is_latest_issue=0 for non-maximal issue rows (see the sketch after this list)
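For the second option, such a pre-pass could look roughly like the sketch below. Table and column names (epimetric_load, plus the composite-key columns listed above) are assumptions, and this illustrates the idea rather than proposing a final statement:

```sql
-- Hypothetical pre-pass over the load table only (epimetric_load is an assumed name):
-- for any composite key that appears with more than one issue, clear is_latest_issue
-- on every row except the one carrying the maximal issue.
UPDATE epimetric_load l
JOIN (
  SELECT source, `signal`, geo_type, geo_value, time_type, time_value,
         MAX(issue) AS max_issue
  FROM epimetric_load
  GROUP BY source, `signal`, geo_type, geo_value, time_type, time_value
) m USING (source, `signal`, geo_type, geo_value, time_type, time_value)
SET l.is_latest_issue = 0
WHERE l.issue < m.max_issue;
```

After a pre-pass like this, at most one row per composite key in the load table still carries is_latest_issue=1, restoring the single-issue assumption that fix_is_latest_issue_sql currently relies on.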

Unit/integration testing for these conditions should be relatively easy, and some work on this appears to have already been done in #925.
