Bulk import of identifier (and other?) table data from files #62

seasidesparrow · 2024-03-27T16:26:32Z

JournalsDB tables were initially populated by loading existing classic files into tables. Subsequent to this, additions to the tables are made by exporting the tables to Google Sheets, making edits, and reimporting the data. The export/import process records the editorial history and enforces some data requirements, but it is time-intensive if more than a few records are being added.

We need to implement a process in run.py that will trigger the bulk import of data from files, but modified so that the data undergo validity checks prior to db insert. This discussion will focus on the bulk import of ISSNs.

Starting from the assumption that the bibcodes for new records already exist in the master table, we need a task that:

will read a formatted text file containing a bibstem, an ISSN_type, and an ISSN
verify that the bibstems are already in journalsdb (i.e. have a masterid)
verify that the ISSN isn't already in journalsdb.idents
check whether there is an existing ISSN of the given ISSN_type for that bibstem

If the given ISSN is truly new, the task should import the masterid, ISSN_type, and ISSN into idents. If it is not new, or if there is some other issue, it should provide actionable information to the user about what is wrong.

The procedure for this will be something like:

read the file
search journalsdb for the bibstem, and whether that bibstem already has any ISSNs
search journalsdb for the ISSN, and return a warning if it has already been assigned to another bibstem, or the same bibstem but with a different ISSN_type
if bibstem exists, ISSN does not, and ISSN_type has not already been assigned to bibstem, insert (masterid, ISSN, ISSN_type)
if bibstem exists, ISSN does not, but ISSN_type has already been assigned (and you're sure the new record is a valid alternate ISSN for this bibstem), insert (masterid, ISSN, ISSN_type_altN), where the altN is generated from existing table data for alternate ISSNs
if bibstem does not exist, reject the row and warn the user
if bibstem exists and ISSN exists for the same bibstem, reject the row and notify the user the ISSN is already present
if bibstem exists, and ISSN exists but for a different bibstem and/or ISSN_type, reject the row and warn the user there's a problematic record that requires investigation

The code for inserting an identifier of given type already exists in journalsmanager.tasks.task_db_load_identifier, so the new coding required is for reading the file, doing the required sqlalchemy queries to validate the incoming data, and generating logging messages of the appropriate level (info/warning) that will direct the curator on what needs to be resolved.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bulk import of identifier (and other?) table data from files #62

Bulk import of identifier (and other?) table data from files #62

seasidesparrow commented Mar 27, 2024

Bulk import of identifier (and other?) table data from files #62

Bulk import of identifier (and other?) table data from files #62

Comments

seasidesparrow commented Mar 27, 2024