Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for 116 #236

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from
Open

Fix for 116 #236

wants to merge 3 commits into from

Conversation

vjrj
Copy link
Contributor

@vjrj vjrj commented Apr 15, 2024

This PR fix #116. Basically:

  • Takes into account other contact fields
  • Update also contacts without email
  • Prevent to incorrect associate drs to the first contact without email as Contact.findByEmail(empty) was picked in this case.
  • Do not fail if an email has a trailing or leading space.

Copy link
Contributor

@adam-collins adam-collins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a question.

if (!emails.contains(contact.email)) {
def existingContact = old.getContacts().find {
(it.contact.email && !it.contact.email.isEmpty() && it.contact.email == contact.email) ||
(it.contact.firstName == contact.firstName && it.contact.lastName == contact.lastName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the context so this might be misinformed.

The firstName/lastName comparison may be a problem. For example:

  1. A contact has a new email (same firstName and lastName). This no longer adds a new contact with the new email. I would always want to add the new email.
  2. A contact has same firstName and lastName as an old contact. The new contact is not added. While unlikely, I do not want prevent two contacts with the same name from existing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's problematic, but as email is not mandatory and we don't have a primary key, we should use the names.

For instance, this is the IPT:
image
Duplicated contacts but with different names or emails results in duplicate contacts. The IPT only fix these duplicates if you use exactly the same names and email.

image

I was thinking that probably a better solution for us to maintain contacts in sync, is to always remove all contact_for from a data resource and re-add with the new contacts instead of trying to compare and update. In this case we can have "orphans" old contacts without any dr, so maybe we can also check this and remove the orphans contacts (imagine a contact without email or a misspelled name that is corrected in a new version of a dr).

I was thinking of adding in a future PR and doing this PR something similar to what we have but improved and less drastic and easy to review.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR, when merging new information, it must not exclude more new contacts that it already does. I now understand it is already excluding those without an email or those that share an email (given that it is not the primary key).

At the very least, this name comparison line needs a change. I'll leave it to you if you want to address it in this PR or remove this line and address it a new PR.

I agree with your suggestion. It is preferable to:

  1. Add all new contacts. There will be a comparison of all contact fields to determine what is new.
  2. Remove old contacts (the ContactFor) that are no longer in the IPT source. This makes the assumption no contact information, for an IPT resource, will be manually altered in the collectory.
  3. Deleting of contacts that are orphaned (those without a ContactFor)

Having looked at the code a little longer I have another question. Previously the creator contact was added as a primary=true. The additional dataset.contacts and dataset.associatedParties are also added as primary=true. Should these be primary=false?

Copy link
Contributor Author

@vjrj vjrj Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we have a list of primary contacts and we check if it should be marked as primary or not. See below the part:
boolean isPrimaryContact = update.primaryContacts.contains(contact)
I think is better to continue with the rest of contactFor part in a new PR to simplify the review.

This PR already addresses the original issues described in #116. Currently the drs with contacts without email are all assigned (incorrectly) to the first contact without email. In our case:

image

@vjrj
Copy link
Contributor Author

vjrj commented Apr 16, 2024

I think we can refactor this code a bit like this for better results:

First the EMLimportService gets a list a current eml contacts in a dr and tries to find the contacts in the db.

  1. If the contact already exist (looking for the email or the full name), update them if we have more info (a name or an email, etc)
  2. If a contact is not in the contact table, add it
  3. take into consideration if a contact is primary or not and maintain an additional list of primary contacts
    return the current updated contact list and the primary contact list

In the IptService we receive that new contact lists and:
4) maintain a list of old contacts in a variable
5) remove all contact_for for that dr
6) relink the new in sync contacts list created in EMLImportService to that dr in contacts_for (as primary or not as we have a list created in 3)
7) loop the old contacts list of step 4) and remove orphan contacts that does not have an entry in contact_for.

With this I think we'll have the contacts exactly with the same info as the EML and the IPT, and contacts not longer linked to a dr will be removed. I think I'm not missing anything, but...

@vjrj
Copy link
Contributor Author

vjrj commented Apr 18, 2024

I added a first part that update existing contacts too taking into consideration if the contact is primary or not.

I can continue with the rest or do it in a new PR in the near future.

This PR solves a lot of issues (the most severe is that currently the first contact without email get associated all drs of contacts without emails, that is a big list for one contact).

def contact = addOrUpdateContact(it)
if (contact){
contacts << contact
primaryContacts << contact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change to only use eml.dataset.contacts as primaryContacts is OK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what my data manager suggested.

boolean hasEmail = emlElement?.electronicMailAddress?.text()?.trim()?.isEmpty() == false
boolean hasName = emlElement?.individualName?.surName?.text()?.trim()?.isEmpty() == false

if (!contact && (hasEmail || hasName)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inclusion of && (hasEmail || hasName) might cause null contact exceptions further along, if it is ever false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the case in our 500 contacts. Looking to the eml expecification:

image

The only case I can imagine is a contact with only organizationName and no email and no individualName that seems to me something quite useless for a "contact" (Are you suggesting that we should take into account the case of "contact: organizationName: CSIRO"?).

}
return null
} else {
return null
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adam-collins I added this to prevent the NPE if a contact does not have name or email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IPT contacts sync issues
2 participants