Batch edit for relationships #6283

Draft: wants to merge 17 commits into base: issue-2331

@sharadsw sharadsw commented Feb 26, 2025

Fixes #6126
Fixes #6248

Warning

Use a db already used for testing other batch edit PRs, or create a new one.

Adding docs from #4929:

Batch-editing

Implementation and design

  1. The current implementation builds on the workbench and the query builder.
  2. Workbench and batch-edit datasets are differentiated at the user level by a new "isupdate" field in the spdataset table. DEV note: There is no difference at the code level; everything is as general as possible. In fact, the isupdate field is only used at the code level to follow a special rollback procedure.
  3. Batch-edit datasets can be seen in the batch-edit overlay, accessible via the batch-edit menu item. Currently, the only way to create a new dataset is via the query builder interface.
  4. To make a batch-edit dataset, go to the query builder and add the relevant fields to the query. Some fields and relationships are not supported. Nested to-manys, for instance, are supported in the workbench but not in batch-edit. Special fields like nodenumber and highestchildnodenumber, and the fullname field in trees, are also not supported.
  5. Other than the relationships mentioned above, every field is supported. If an unsupported field like nodenumber is added, it is rendered as read-only. However, you cannot make nested to-manys visible (this is different from being able to map them): you can map nested to-manys, and even filter on them. As long as they are hidden, they won't block the batch edit. You can also add arbitrary formatted and aggregated fields (they are unmapped and ignored).
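
As a rough illustration of the read-only rendering described in point 5, here is a minimal sketch that splits query columns into editable and read-only ones. The field names come from the read-only list in this description; the helper names (`is_read_only`, `split_columns`) and the case-insensitive comparison are assumptions made for the example, not the actual implementation.

```python
# Field names copied from the read-only list in this PR description.
READ_ONLY_FIELDS = frozenset({
    "timestampcreated", "timestampmodified", "version", "nodenumber",
    "highestchildnodenumber", "rankid", "fullname", "age",
})


def is_read_only(field_name: str) -> bool:
    """Hypothetical helper: True if the column should be rendered read-only.

    Comparing in lower case is an assumption of this sketch.
    """
    return field_name.lower() in READ_ONLY_FIELDS


def split_columns(field_names):
    """Return (editable, read_only) lists, preserving the query's column order."""
    editable = [name for name in field_names if not is_read_only(name)]
    read_only = [name for name in field_names if is_read_only(name)]
    return editable, read_only
```

For instance, `split_columns(["catalognumber", "nodeNumber", "remarks"])` keeps `catalognumber` and `remarks` editable and marks `nodeNumber` as read-only.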

Batch edit behaviors

  • Make a query with columns in the base table, and select relationships to edit. There are 4 different types of relationships, in general. Some example relationships for Collection Object as base table:

    • To-one dependent (for ex. collectionobjectattribute),
    • To-one independent (for ex. Cataloger, CollectingEvent [when not embedded])
    • To-many dependent (for ex. determinations, preparations),
    • To-many independent (for ex. None for CO as the base table)
  • Fields

    The following fields are read-only. All other simple fields are updated when changed.

[
    "timestampcreated",
    "timestampmodified",
    "version",
    "nodenumber",
    "highestchildnodenumber",
    "rankid",
    "fullname",
    "age",
]
  • To-one dependent (for ex. collectionobjectattribute)

    These relationships get directly updated, and are not matched. If the to-one is not in the db, it'll create one.
    This also includes collectingevent when embedded.

    Test cases to consider:

    • When mapped, the record is directly updated.
    • When mapped, if the record is not present, it will be created if non-null values are present.
    • If the record previously had values and they are removed (making the cells completely empty), the to-one dependent record will be deleted. Because the record may have values in other fields that are in the database but not in the query, this could delete it accidentally. Example: the user selected collectionobject -> collectionObjectAttribute -> remarks and set remarks to empty in the spreadsheet, but the integer1 field in collectionObjectAttribute may still have a value. To prevent accidental deletion, by default we look at all the fields in the database for that record (other than system fields) to determine whether the record can be deleted. This behaviour is controlled by a remote preference (described in a later section).
  • To-many dependent (for ex. determinations)

    Same as to-one dependent. These relationships get directly updated. If the corresponding record is not present, a new one gets created.

    Test cases to consider:

    • When mapped, the record is directly updated.
    • When mapped, if the record is not present, it will be created if non-null values are present.
    • If the cell data is removed and every other field is empty in the database (this check can be disabled via a preference), the record will be deleted.
  • To-one independent (for ex. cataloger)

    These relationships get matched, and uploaded if no match is found. During upload, it clones the record (cloning all the non-unique fields, and dependents). The clone also takes mapped relationships into account. That is, if an agent needs to be cloned and you have mapped agentspecialty, it will use the mapped agentspecialty (rather than cloning the previous agent's agentspecialty).

    Test cases to consider:

    • Start from a collectionobject with a cataloger, and map some fields. Change some of the values (say, lastname and firstname) to those of agents that are present in the db. Verify that the agent gets matched. Note that the match can be performed with just the visible fields, or can also include fields in the database that are not included in the query. This is controlled via a preference. By default, to be cautious in matching, it uses just the fields visible in the query.
    • If it is unable to match, it'll clone the existing agent, with data from the sheet. Make an agent with addresses/specialties/variants. Make sure the workbench is able to clone the agent correctly, and if you've provided some dependents in the mapping, it takes it.
    • Similar to workbench, you could customize the match behaviour by changing the matching options (like "never ignore", "ignore when blank", and "always ignore")
  • To-many independent

    Same as to-many dependent. The only difference is that we always perform an update (we don't delete these). If a mapped record is not present, it'll create one, without any matching.

    Test cases to consider:

    • Make collection objects, and assign them collectingevent. Do a query using collectingevent as the base table, and add fields of the CO table. Verify that resetting all fields does not delete the collection object (you'll also need to disable the preference that says to look at all fields for null checks)
    • If a record is not present, it'll create one, if there is a non-empty field.
  • Trees

    There are two different routes to perform tree updates.

    • Workbench method:

      If you want to modify a specific rank, or say reassign species for determination, you'd want to add a specific rank in the query. In this case, it always matches and uploads (and possibly clones), so no updates are performed.
      In the query builder, it will enforce that you select a complete branch of the tree. That is, if your query contains the ranks "species" and "genus", it will require you to add every rank from "genus" down to "species". If used as part of a relationship, it will require going all the way down from "genus" to the lowest rank in the tree.

    • Update method:

      If in the query builder, there is no visible tree rank field, it allows direct modifications (and, thus, updates) to the tree table. This will be useful if you want to, say, update remarks for ones that match name "ploia"

    In both of the above methods, fullname, nodenumber, and highestchildnodenumber are completely read-only.
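
The complete-branch rule from the workbench method above can be sketched as follows. This is only an illustration under stated assumptions: the function name `missing_ranks`, its parameters, and the example rank names are hypothetical, and the real check lives in the query builder front end.

```python
def missing_ranks(tree_ranks, selected, part_of_relationship=False):
    """Return the ranks that still have to be added to the query.

    tree_ranks: all ranks of the tree, ordered from highest to lowest.
    selected: set of rank names currently visible in the query.
    part_of_relationship: if True, the branch must extend down to the
        lowest rank in the tree instead of the lowest selected rank.
    """
    chosen = [rank for rank in tree_ranks if rank in selected]
    if not chosen:
        # No visible rank: nothing to enforce (the "update method" case).
        return []
    start = tree_ranks.index(chosen[0])
    if part_of_relationship:
        end = len(tree_ranks) - 1
    else:
        end = tree_ranks.index(chosen[-1])
    return [rank for rank in tree_ranks[start:end + 1] if rank not in selected]
```

With the hypothetical ranks `["Family", "Genus", "Subgenus", "Species", "Subspecies"]` and a query containing Genus and Species, the sketch reports `Subgenus` as missing, and additionally `Subspecies` when the tree is used as part of a relationship.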

Results

There are 4 new types of results:

image

  • NoChange

Reported when the record was meant to be updated, but no change occurred. That is, all the values from the db were the same. This is not visible to the user.

  • Updated

Reported when the record's fields were changed. This does not consider relationships (they are reported with a different result).

  • Deleted

Reported when a record is deleted. Happens when a dependent relationship's cells are all empty.

  • MatchedAndChanged

Reported when a to-one independent was matched to another record, different than the current one.

  • The side panel also shows the results per table, for different categories.
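
To make the result categories concrete, here is a small sketch of how they could be modelled and tallied per table, similar to what the side panel shows. The enum values reuse the result names above; the class and function names and the dictionary-based record shape are assumptions for illustration only.

```python
from collections import Counter
from enum import Enum


class BatchEditResult(Enum):
    NO_CHANGE = "NoChange"
    UPDATED = "Updated"
    DELETED = "Deleted"
    MATCHED_AND_CHANGED = "MatchedAndChanged"


def classify_simple_update(db_values: dict, sheet_values: dict) -> BatchEditResult:
    """Compare only the record's own fields; relationships are reported separately."""
    changed = any(db_values.get(field) != value for field, value in sheet_values.items())
    return BatchEditResult.UPDATED if changed else BatchEditResult.NO_CHANGE


def tally(results):
    """results: iterable of (table_name, BatchEditResult) pairs.

    Returns a Counter keyed by (table_name, result_name).
    """
    return Counter((table, result.value) for table, result in results)
```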

Preferences

There are three different preference options.

  • Remote Preferences (2)

  • Defer For Match
    Set by sp7.batchEdit.deferForMatch. This preference controls whether database fields are included for matching or not. Defaults to false.

  • Defer For Null
    Set by sp7.batchEdit.deferForNull. This preference controls whether database fields are included for determining if the record is null or not. For dependents, null records are deleted, so this preference controls how cautious batch-edit is when deleting.

The preferences can also be accessed by going to Data Mapper > Batch Edit Preferences
image
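
The null check that decides whether a dependent record gets deleted can be sketched roughly like this. It is a simplified illustration, not the code in this PR: `should_delete_dependent`, `check_db_fields`, and the `SYSTEM_FIELDS` set are hypothetical names, and how `check_db_fields` maps onto the value of sp7.batchEdit.deferForNull is deliberately left open.

```python
# Assumption: which fields count as "system fields" is not spelled out here.
SYSTEM_FIELDS = frozenset({"timestampcreated", "timestampmodified", "version"})


def is_blank(value) -> bool:
    """Treat None and whitespace-only strings as empty."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def should_delete_dependent(sheet_values: dict, db_values: dict, check_db_fields: bool = True) -> bool:
    """Decide whether an emptied dependent record may be deleted.

    sheet_values: values of the mapped (visible) fields after editing.
    db_values: all stored values of the record, including unmapped fields.
    check_db_fields: when True, unmapped non-system fields must be empty too.
    """
    if not all(is_blank(value) for value in sheet_values.values()):
        return False  # the user left data in the row, so it is an update
    if not check_db_fields:
        return True
    unmapped = (
        value for field, value in db_values.items()
        if field not in sheet_values and field not in SYSTEM_FIELDS
    )
    return all(is_blank(value) for value in unmapped)
```

With the collectionObjectAttribute example from earlier, clearing remarks while integer1 still holds a value keeps the record when `check_db_fields` is on, and deletes it when the check is off.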

  • User Preferences (1)

  • Number of query rows

Determines how many query results are used for batch-edit. Defaults to 5000.

  • Show rollback button
    Can enable/disable rollback

image

Rollbacks

Rollbacks are complicated to perform. In the current design, whenever a user creates a batch-edit dataset via the query builder, two datasets are made. The user can only see one of them. The second is a "backer" of the first, and contains a FK to the first (so we can find the backer of a dataset later). When a rollback is requested, for every row in the main dataset, we find the original row in the backer and perform the regular batch-edit update with it. Essentially, it re-applies the original snapshot.
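
The backer idea can be sketched as below. This is a toy model only: rows are plain dictionaries keyed by a row id, `apply_update` stands in for the regular batch-edit update, and skipping rows without a snapshot is an assumption of the sketch rather than documented behaviour.

```python
def rollback_rows(main_rows: dict, backer_rows: dict, apply_update):
    """Re-apply the original snapshot for every row of the main dataset.

    main_rows: row_id -> current values of the visible dataset.
    backer_rows: row_id -> original values stored in the hidden backer.
    apply_update: callable(row_id, values) performing the regular update.
    Returns the list of row ids that were restored.
    """
    restored = []
    for row_id in main_rows:
        original = backer_rows.get(row_id)
        if original is None:
            continue  # assumption: rows without a snapshot are left alone
        apply_update(row_id, original)
        restored.append(row_id)
    return restored
```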

This is highly experimental, so it is recommended to always take a backup of the db first, but it should work in a good number of cases.

Misc

  • Queries from record set are supported.

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add automated tests
  • Add relevant issue to release milestone
  • Add relevant documentation (Tester - Dev)
  • Add a reverse migration if a migration is present in the PR

Testing instructions

Regression tests

@sharadsw sharadsw changed the base branch from production to issue-6127 February 26, 2025 18:20
Triggered by 350ee9c on branch refs/heads/issue-6126
@sharadsw sharadsw changed the base branch from issue-6127 to issue-2331 February 26, 2025 19:02
@sharadsw

TODO: Change upload plan construction for remote to ones

@sharadsw sharadsw mentioned this pull request Mar 3, 2025
12 tasks

@sharadsw sharadsw left a comment

Upload plan construction for COGs works, but COJO records do not get cloned + updated. Any pointers on where I can look?

Okay nevermind, I was testing it incorrectly

@sharadsw

sharadsw commented Mar 10, 2025

TODO:

  • Consider parsing 'fake' decimals to integer for lat/long values

BUG:

  • Mapping lines with trees do not have treedef info in the Data Mapper
  • Handle mapping for any rank

@sharadsw sharadsw added this to the 7.10.3 milestone Mar 11, 2025
@sharadsw

NOTE:
Test missing ranks error dialog when using different tree tables together in the query.
Example:

  • CO -> Determination -> Taxon
  • CO -> CollectingEvent -> Locality -> Geography

Projects
Status: 📋Back Log