Batch edit for relationships #6283

sharadsw · 2025-02-26T18:19:16Z

Fixes #6126
Fixes #6248

Warning

Use a db that is already used for testing other batch edit PRs, or create a new one.

Adding docs from #4929:

Batch-editing

Implementation and design

The current implementation uses the workbench and the query builder.
The workbench and batch-edit datasets are differentiated at the user level using a new "isupdate" field in the spdataset table. DEV note: There is no difference at the code level; everything is as general as possible. In fact, the isupdate field is only used at the code level to follow a special rollback procedure.
The batch-edit datasets can be seen using the batch-edit overlay, accessible via the batch-edit menu item. Currently, the only way to create a new dataset is via the query builder interface.
To make a batch-edit dataset, go to the query builder, and add the relevant fields to the query. Some fields and relationships are not supported. Nested-to-manys, for instance, are supported in the workbench, but not in batch-edit. Special fields like nodenumber and highestchildnodenumber, and the fullname field in trees, are also not supported.
Other than the relationships mentioned above, every field is supported. If an unsupported field like nodenumber is added, it is rendered as readonly. However, you cannot make nested-to-manys visible (this is different from being able to map them): you can map nested-to-manys, and even filter on them. As long as they are hidden, they will not block the batch edit. You can also arbitrarily add formatted and aggregated fields (they are unmapped, and are ignored).

Batch edit behaviors

Make a query with columns in the base table, and select relationships to edit. In general, there are 4 different types of relationships. Some example relationships with Collection Object as the base table:
- To-one dependent (for ex. collectionobjectattribute)
- To-one independent (for ex. Cataloger, CollectingEvent [when not embedded])
- To-many dependent (for ex. determinations, preparations)
- To-many independent (none for CO as the base table)
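The four categories above are the combinations of two properties: to-one versus to-many, and dependent versus independent. A minimal sketch of that classification follows; `RelKind`, `Relationship`, and `classify` are hypothetical names for illustration and are not taken from the PR.

```python
from dataclasses import dataclass
from enum import Enum


class RelKind(Enum):
    TO_ONE_DEPENDENT = "to-one dependent"
    TO_ONE_INDEPENDENT = "to-one independent"
    TO_MANY_DEPENDENT = "to-many dependent"
    TO_MANY_INDEPENDENT = "to-many independent"


@dataclass(frozen=True)
class Relationship:
    # Hypothetical description of a relationship on the base table.
    name: str
    to_many: bool
    dependent: bool


def classify(rel: Relationship) -> RelKind:
    """Map the two boolean properties onto one of the four categories."""
    if rel.to_many:
        return RelKind.TO_MANY_DEPENDENT if rel.dependent else RelKind.TO_MANY_INDEPENDENT
    return RelKind.TO_ONE_DEPENDENT if rel.dependent else RelKind.TO_ONE_INDEPENDENT
```

For example, `classify(Relationship("determinations", to_many=True, dependent=True))` gives `RelKind.TO_MANY_DEPENDENT`.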
Fields

The following fields are readonly. All other simple fields are updated when changed.

[
    "timestampcreated",
    "timestampmodified",
    "version",
    "nodenumber",
    "highestchildnodenumber",
    "rankid",
    "fullname",
    "age",
]
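As an illustration of how such a list could be applied, the sketch below filters a row's edits down to the fields that are allowed to change. The helper names and the case-insensitive comparison are assumptions made for this example, not the PR's actual code.

```python
# Field names copied from the readonly list above.
READONLY_FIELDS = frozenset({
    "timestampcreated",
    "timestampmodified",
    "version",
    "nodenumber",
    "highestchildnodenumber",
    "rankid",
    "fullname",
    "age",
})


def is_editable(field_name: str) -> bool:
    # Assumption: field names are compared case-insensitively.
    return field_name.lower() not in READONLY_FIELDS


def changed_fields(before: dict, after: dict) -> dict:
    """Return only the editable fields whose value differs from the original."""
    return {
        name: value
        for name, value in after.items()
        if is_editable(name) and before.get(name) != value
    }
```

With this sketch, an edit to `version` is dropped while an edit to `remarks` is kept.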

To-one dependent (for ex. collectionobjectattribute)

These relationships get directly updated, and are not matched. If the to-one is not in the db, it'll create one.
This also includes collectingevent when embedded.

Test cases to consider:
- When mapped, the record is directly updated.
- When mapped, if the record is not present, it'll be created, provided non-null values are present.
- If the record previously had values, and the values are removed (making the cells completely empty), the to-one dependent record will be deleted. Because other fields may hold data in the database without being in the query, this could delete a record accidentally. Example: the user selects collectionobject -> collectionObjectAttribute -> remarks and sets remarks to empty in the spreadsheet, while the integer1 field in collectionObjectAttribute still has a value. To prevent accidental deletion, by default we look at all the fields in the database for that record (other than system fields) to determine whether the record can be deleted. This behaviour is controlled by a remote preference (described in a later section).
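The deletion rule in the last test case can be sketched as a small predicate. This is only an illustration under stated assumptions: `SYSTEM_FIELDS`, `is_blank`, and `should_delete_dependent` are hypothetical names, and `check_db_fields` stands in for whatever the remote preference resolves to; how the preference value maps onto this flag is not shown here.

```python
# Hypothetical set of system fields that are ignored by the null check.
SYSTEM_FIELDS = frozenset({"id", "timestampcreated", "timestampmodified", "version"})


def is_blank(value) -> bool:
    return value is None or (isinstance(value, str) and value.strip() == "")


def should_delete_dependent(sheet_values: dict, db_record: dict, check_db_fields: bool) -> bool:
    """Decide whether an emptied dependent record may be deleted.

    sheet_values: mapped fields as they appear in the spreadsheet row.
    db_record: all stored fields of the dependent record.
    check_db_fields: when True, unmapped database fields must also be blank.
    """
    if not all(is_blank(v) for v in sheet_values.values()):
        return False  # the row still carries data, so this is an update
    if not check_db_fields:
        return True
    unmapped = {
        name: value
        for name, value in db_record.items()
        if name not in sheet_values and name not in SYSTEM_FIELDS
    }
    return all(is_blank(v) for v in unmapped.values())
```

In the remarks/integer1 example above, the cautious mode keeps the record because integer1 still holds a value.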
To-many dependent (for ex. determinations)

Same as to-one dependent. These relationships get directly updated. If the corresponding record is not present, a new one gets created.

Test cases to consider:
- When mapped, the record is directly updated.
- When mapped, if the record is not present, it'll be created, provided non-null values are present.
- If the cell data is removed, and every other field is empty in the database (this check can be disabled via a preference), the record will be deleted.
To-one independent (for ex. cataloger)

These relationships get matched, and uploaded if a match is not found. During upload, it performs a clone of the record (cloning all the non-unique fields, and dependents). The clone also takes mapped relationships into account. That is, if an agent needs to be cloned, and you have mapped agentspecialty, it'll use the mapped agentspecialty (rather than cloning the previous agent's agentspecialty).

Test cases to consider:
- Start from a collectionobject with a cataloger, and map some fields. Change some of the values (say, lastname and firstname) to those of agents that are present in the db. Verify that the agent gets matched. Note that the match can be performed with just the visible fields, or can also include fields in the database that are not included in the query. This is controlled via a preference. By default, to be cautious in matching, it uses just the fields visible in the query.
- If it is unable to match, it'll clone the existing agent, with data from the sheet. Make an agent with addresses/specialties/variants. Make sure the workbench is able to clone the agent correctly, and that if you've provided some dependents in the mapping, it uses those.
- Similar to the workbench, you can customize the match behaviour by changing the matching options (like "never ignore", "ignore when blank", and "always ignore").
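The match-then-clone flow described above can be sketched with plain dictionaries. All names here (`UNIQUE_FIELDS`, `find_match`, `match_or_clone`) are hypothetical, records are simplified to dicts, and matching uses only the values present in the sheet; the real implementation works on database models and upload plans.

```python
from copy import deepcopy

# Assumption: fields that must not be copied into a clone.
UNIQUE_FIELDS = frozenset({"id", "guid"})


def find_match(candidates, sheet_values):
    """Return the first record whose fields equal every value from the sheet."""
    for record in candidates:
        if all(record.get(field) == value for field, value in sheet_values.items()):
            return record
    return None


def match_or_clone(current, candidates, sheet_values, mapped_dependents, new_id):
    """Return (record, created). Clone `current` only when nothing matches."""
    matched = find_match(candidates, sheet_values)
    if matched is not None:
        return matched, False
    # Clone non-unique fields and dependents of the current record.
    clone = {k: deepcopy(v) for k, v in current.items() if k not in UNIQUE_FIELDS}
    clone.update(sheet_values)
    # Dependents mapped in the sheet win over the cloned ones.
    clone.update(deepcopy(mapped_dependents))
    clone["id"] = new_id
    return clone, True
```

If the sheet maps a dependent such as specialties, the clone takes the mapped value; unmapped dependents such as addresses are copied from the current record.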
To-many independent

Same as to-many dependent. The only difference is that we always perform an update (we don't delete these). If a mapped record is not present, it'll create one, without any matching.

Test cases to consider:
- Make collection objects, and assign them a collectingevent. Do a query using collectingevent as the base table, and add fields of the CO table. Verify that resetting all fields does not delete the collection object (you'll also need to disable the preference that says to look at all fields for null checks).
- If a record is not present, it'll create one, if there is a non-empty field.
Trees

There are two different routes to perform tree updates.
- Workbench method:
  
  If you want to modify a specific rank, or say reassign species for determination, you'd want to add a specific rank in the query. In this case, it always matches and uploads (and possibly clones), so we don't have updates.
  The query builder enforces that you select a complete branch of the tree. That is, if your query contains the ranks "species" and "genus", it requires you to add every rank from "genus" down to "species". If the tree is used as part of a relationship, it requires every rank from "genus" down to the lowest rank in the tree.
- Update method:
  
  If there is no visible tree rank field in the query builder, it allows direct modifications (and, thus, updates) to the tree table. This is useful if you want to, say, update remarks for the ones that match the name "ploia".
In both of the above methods, fullname, nodenumber, and highestchildnodenumber are completely readonly.
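The complete-branch rule from the workbench method can be sketched as a check that lists the ranks still missing from a query. `missing_ranks` and its parameters are hypothetical names, and the rank list in the usage note is illustrative only.

```python
def missing_ranks(tree_ranks, selected, part_of_relationship=False):
    """List ranks that must be added so the selection forms a complete branch.

    tree_ranks: every rank of the tree, ordered from highest to lowest.
    selected: ranks currently present in the query.
    part_of_relationship: when True, the branch must reach the lowest rank.
    """
    if not selected:
        return []
    positions = [tree_ranks.index(rank) for rank in selected]
    start = min(positions)  # highest selected rank
    end = len(tree_ranks) - 1 if part_of_relationship else max(positions)
    chosen = set(selected)
    return [rank for rank in tree_ranks[start:end + 1] if rank not in chosen]
```

With ranks `["Kingdom", "Family", "Genus", "Subgenus", "Species", "Subspecies"]` and a query containing Genus and Species, the sketch reports Subgenus as missing, plus Subspecies when the tree is reached through a relationship.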

Results

There are 4 new types of results:

NoChange

Reported when the record was meant to be updated, but no change occurred. That is, all the values were the same as in the db. This is not visible to the user.

Updated

Reported when the record's fields were changed. This does not consider relationships (they are reported with a different result).

Deleted

Reported when a record is deleted. Happens when a dependent relationship's cells are all empty.

MatchedAndChanged

Reported when a to-one independent was matched to another record, different than the current one.

The side panel also shows the results per table, for different categories.
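A per-table summary like the one in the side panel can be sketched as a simple tally. The enum values reuse the result names above; `BatchEditResult` and `summarize` are hypothetical names, and skipping NoChange follows the note that it is not visible to the user.

```python
from collections import Counter
from enum import Enum


class BatchEditResult(Enum):
    NO_CHANGE = "NoChange"
    UPDATED = "Updated"
    DELETED = "Deleted"
    MATCHED_AND_CHANGED = "MatchedAndChanged"


def summarize(results):
    """Count results per table, given an iterable of (table, result) pairs."""
    counts = {}
    for table, result in results:
        if result is BatchEditResult.NO_CHANGE:
            continue  # not shown to the user
        counts.setdefault(table, Counter())[result.value] += 1
    return counts
```

Each table maps to a `Counter` keyed by result name, so the counts can be rendered directly per category.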

Preferences

There are three different preference options.

Remote Preferences (2)
Defer For Match
Set by sp7.batchEdit.deferForMatch. This preference controls whether database fields are included for matching or not. Defaults to false.
Defer For Null
Set by sp7.batchEdit.deferForNull. This preference controls whether database fields are included when determining if the record is null or not. For dependents, null records are deleted, so this preference controls how cautious batch-edit is.

The preferences can also be accessed by going to Data Mapper > Batch Edit Preferences
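As a rough illustration of reading one of these flags, the sketch below assumes the remote preferences are available as text with one `key=value` pair per line; that format and the helper name `parse_bool_pref` are assumptions for this example. Only the key names come from the description above.

```python
def parse_bool_pref(prefs_text: str, key: str, default: bool = False) -> bool:
    """Read a boolean preference from key=value text, falling back to a default."""
    for line in prefs_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().lower() == "true"
    return default


example = """
# hypothetical remote preferences content
sp7.batchEdit.deferForMatch=true
"""
defer_for_match = parse_bool_pref(example, "sp7.batchEdit.deferForMatch")
```

A missing `sp7.batchEdit.deferForMatch` key falls back to false, matching the documented default.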

User Preferences (1)
Number of query rows

Determines how many query results are used for batch-edit. Defaults to 5000.

Show rollback button
Can enable/disable rollback

Rollbacks

Rollbacks are complicated to perform. In the current design, whenever a user creates a batch-edit dataset via the query builder, two datasets are made. The user can only see one of them. The second is a "backer" of the first, and contains a FK to the first (so we can find the backer of a dataset later). When a rollback is requested, for every row in the main dataset, we find the original row in the backer and perform the regular batch-edit update on it. Essentially, it applies the original snapshot.
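The backer lookup and row pairing can be sketched with in-memory dictionaries. The field names `parent_id` and `record_id` and both helper functions are hypothetical; the sketch only shows which rows would be handed to the regular batch-edit update, not the update itself.

```python
def find_backer(datasets, main_id):
    """Find the hidden dataset whose foreign key points at the main dataset."""
    for dataset in datasets:
        if dataset.get("parent_id") == main_id:
            return dataset
    raise LookupError(f"no backer found for dataset {main_id}")


def rollback_rows(main, backer):
    """For every row in the main dataset, return its original snapshot row."""
    originals = {row["record_id"]: row for row in backer["rows"]}
    return [originals[row["record_id"]] for row in main["rows"]]
```

The rows returned by `rollback_rows` would then go through the same update path as any other batch edit, which restores the original values.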

This is highly experimental, so it is recommended to always take a backup of the db, but it should work in a good number of cases.

Misc

Queries from record set are supported.

Checklist

Self-review the PR after opening it to make sure the changes look good and self-explanatory (or properly documented)
Add automated tests
Add relevant issue to release milestone
Add relevant documentation (Tester - Dev)
Add a reverse migration if a migration is present in the PR

Testing instructions

Regression tests

Test Add a feature to choose type of taxon tree in WB #5091

Triggered by 350ee9c on branch refs/heads/issue-6126

sharadsw · 2025-02-26T19:40:13Z

TODO: Change upload plan construction for remote to ones

- This was caused because we treat remote to-ones as to-many in the upload plan (affects COGs)

sharadsw

~~Upload plan construction for COGs work but COJO records do not get cloned + updated. Any pointers on where I can look?~~

Okay nevermind, I was testing it incorrectly

specifyweb/stored_queries/batch_edit.py

specifyweb/workbench/upload/clone.py

sharadsw · 2025-03-10T16:45:00Z

TODO:

Consider parsing 'fake' decimals to integer for lat/long values

BUG:

Mapping lines with trees do not have treedef info in the Data Mapper
Handle mapping for any rank

sharadsw · 2025-03-12T20:44:44Z

NOTE:
Test missing ranks error dialog when using different tree tables together in the query.
Example:

CO -> Determination -> Taxon
CO -> CollectingEvent -> Locality -> Geography

Enable relationships

350ee9c

sharadsw changed the base branch from production to issue-6127 February 26, 2025 18:20

Lint code with ESLint and Prettier

2fc7842

Triggered by 350ee9c on branch refs/heads/issue-6126

sharadsw changed the base branch from issue-6127 to issue-2331 February 26, 2025 19:02

Merge remote-tracking branch 'origin/issue-2331' into issue-6126

b1ef1eb

sharadsw added 4 commits February 27, 2025 11:54

Enable data mapper and batch edit preferences

c1f092d

Fix localizations

ec7424c

Consider remote to ones as to many in upload plan

aee76e5

Add remote to ones method

daabdc4

sharadsw mentioned this pull request Mar 3, 2025

Batch edit for multiple trees #6196

Open

12 tasks

sharadsw and others added 7 commits March 3, 2025 16:09

Merge branch 'issue-2331' into issue-6126

5992ae5

Merge remote-tracking branch 'origin/issue-6127' into issue-6126

f4d02ca

Merge remote-tracking branch 'origin/issue-6127' into issue-6126

fc7be1f

Avoid cloning to-ones when committing

d6c608c

- This was caused because we treat remote to-ones as to-many in the upload plan (affects COGs)

Merge branch 'issue-2331' into issue-6126

3c9b344

Fix to many for tree in relationships

82ec65c

Merge remote-tracking branch 'origin/issue-6126' into issue-6126

019e932

sharadsw commented Mar 6, 2025

specifyweb/stored_queries/batch_edit.py

specifyweb/workbench/upload/clone.py

Change revert to rollback in pref localization

8c6d897

sharadsw added 2 commits March 11, 2025 15:09

Use TreeRankRecord in upload plan

cc5cf55

Fix multiple rank in row error

4e93d78

sharadsw added this to the 7.10.3 milestone Mar 11, 2025

melton-jason mentioned this pull request Mar 21, 2025

Add tooltip when batch edit is disabled for trees #6349

Merged

9 tasks

emenslin mentioned this pull request Mar 24, 2025

Cannot batch edit CO dataset with non default cat # #6359

Closed
