Downweight cases where org unit doesn't match #523

Open
paulalbert1 opened this issue Nov 7, 2023 · 0 comments
Background

In a number of cases, a user's profile lists org units that don't come close to matching the org unit in the article's affiliation. Until now, we've ignored such cases, but we may be able to use this data to cut down on false positives.

An example is personIdentifier = sue2002 and PMID = 36630615. Psychiatry (sue2002's org unit) is very different from Cell and Developmental Biology (the article's affiliation).

[Screenshot, 2023-11-07 5:25:19 PM]

For our data set, I estimate this will improve accuracy by about 0.5% by reducing the number of false positives. Because we use organizational synonyms, though, the only way to know for certain is to run this for everyone.

Requirements

This Java file outputs, among other things, a value called strategy.orgUnitScoringStrategy.organizationalUnitDepartmentMatchingScore. This is the score for a positive departmental match. I want to update the code so it also outputs an organizationalUnitDepartmentNegativeMatchingScore when all of the following are true:

  1. identity.getOrganizationalUnits() != null
  2. articleAffiliation != null
  3. The words "Department of ", "Division of ", etc. appear in the articleAffiliation string, but the match against the identity's org units fails.
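
The three conditions above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name `NegativeOrgUnitSketch`, the method `negativeMatchingScore`, the regular expression, and the plain substring comparison are all assumptions made for this example. In particular, it ignores the organizational synonyms the real scoring strategy uses, so it would flag more mismatches than the actual implementation should.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegativeOrgUnitSketch {

    // Hypothetical pattern: captures the unit name that follows
    // "Department of " or "Division of " up to the next comma, period, or semicolon.
    private static final Pattern UNIT_PATTERN =
            Pattern.compile("\\b(?:Department|Division) of ([^,.;]+)", Pattern.CASE_INSENSITIVE);

    /**
     * Returns -downweight when the affiliation names a department or division
     * and none of the identity's org units match it; otherwise returns 0.
     */
    public static double negativeMatchingScore(List<String> identityOrgUnits,
                                               String articleAffiliation,
                                               double downweight) {
        // Conditions 1 and 2: both inputs must be present.
        if (identityOrgUnits == null || articleAffiliation == null) {
            return 0.0;
        }
        Matcher matcher = UNIT_PATTERN.matcher(articleAffiliation);
        boolean sawUnit = false;
        while (matcher.find()) {
            sawUnit = true;
            String articleUnit = matcher.group(1).trim().toLowerCase(Locale.ROOT);
            for (String unit : identityOrgUnits) {
                if (unit == null || unit.trim().isEmpty()) {
                    continue;
                }
                String identityUnit = unit.trim().toLowerCase(Locale.ROOT);
                // Assumption: a simple substring test stands in for the real matching logic.
                if (identityUnit.contains(articleUnit) || articleUnit.contains(identityUnit)) {
                    return 0.0; // positive match, so no downweight
                }
            }
        }
        // Condition 3: a department or division was named, but nothing matched.
        return sawUnit ? -downweight : 0.0;
    }

    public static void main(String[] args) {
        List<String> units = Arrays.asList("Payne Whitney (Psychiatry)");
        String affiliation =
                "Department of Cell and Developmental Biology, University College London, London, UK.";
        System.out.println(negativeMatchingScore(units, affiliation, 0.4)); // prints -0.4
    }
}
```

An affiliation that contains no "Department of " or "Division of " phrase yields 0 here, since there is nothing to contradict the identity's org units.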

See this PR. It hasn't been "tested" and it probably doesn't "work," but I think it's on the right track.

Here's how a given downweight affects the number of true and false positives and negatives. This is from a set of ~200,000 articles. The error count is the sum of false negatives and false positives.

| Downweight | Error count | False negative | False positive | True negative | True positive |
|---|---|---|---|---|---|
| 0 | 7657 | 3779 | 3878 | 11094 | 26427 |
| 0.1 | 7560 | 3976 | 3584 | 11388 | 26230 |
| 0.2 | 7442 | 4193 | 3249 | 11723 | 26013 |
| 0.3 | 7279 | 4445 | 2834 | 12138 | 25761 |
| 0.4 | 7303 | 4675 | 2628 | 12344 | 25531 |
| 0.5 | 7374 | 5051 | 2323 | 12649 | 25155 |
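
To compare the downweights above, the error count and accuracy can be recomputed from the four counts. The sketch below copies the figures from this issue; the class name `DownweightSweep` and its helper methods are hypothetical and are not part of the project.

```java
public class DownweightSweep {

    // Downweights tried in this issue.
    static final double[] DOWNWEIGHTS = {0.0, 0.1, 0.2, 0.3, 0.4, 0.5};

    // Rows copied from this issue, in the order:
    // {false negatives, false positives, true negatives, true positives}.
    static final int[][] COUNTS = {
        {3779, 3878, 11094, 26427},
        {3976, 3584, 11388, 26230},
        {4193, 3249, 11723, 26013},
        {4445, 2834, 12138, 25761},
        {4675, 2628, 12344, 25531},
        {5051, 2323, 12649, 25155},
    };

    // Error count = false negatives + false positives.
    public static int errorCount(int[] row) {
        return row[0] + row[1];
    }

    // Accuracy = (true negatives + true positives) / all classified cases.
    public static double accuracy(int[] row) {
        int total = row[0] + row[1] + row[2] + row[3];
        return (double) (row[2] + row[3]) / total;
    }

    // Index of the downweight with the fewest errors (first one wins on ties).
    public static int bestIndex() {
        int best = 0;
        for (int i = 1; i < COUNTS.length; i++) {
            if (errorCount(COUNTS[i]) < errorCount(COUNTS[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (int i = 0; i < COUNTS.length; i++) {
            System.out.printf("downweight=%.1f errors=%d accuracy=%.4f%n",
                    DOWNWEIGHTS[i], errorCount(COUNTS[i]), accuracy(COUNTS[i]));
        }
        System.out.printf("fewest errors at downweight=%.1f%n", DOWNWEIGHTS[bestIndex()]);
    }
}
```

On these figures the lowest error count is 7279, at a downweight of 0.3, with 0.4 close behind at 7303. Which value to prefer also depends on how false positives are weighed against false negatives, since a larger downweight trades one for the other.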

Test case

The combination of personIdentifier = sue2002 and PMID = 36630615 should return the following:

        "organizationalUnitEvidence": [
          {
            "identityOrganizationalUnit": "Payne Whitney (Psychiatry)",
            "articleAffiliation": "Department of Cell and Developmental Biology, University College London, London, UK.",
            "organizationalUnitType": "DEPARTMENT",
            "organizationalUnitMatchingScore": -0.4,
            "organizationalUnitModifierScore": 0
          }
        ],
paulalbert1 changed the title from "Down-weight cases where org unit doesn't match" to "Downweight cases where org unit doesn't match" on Nov 7, 2023