Downweight cases where org unit doesn't match #523

paulalbert1 · 2023-11-07T23:25:40Z

Background

There are a number of cases where a user will have org units in their profile and they don't even come close to matching the org unit on file. To this point, we've ignored such cases. But maybe we can use this data to cut down on false positives.

An example is personIdentifier = sue2002 and PMID = 36630615. Psychiatry (sue2002's org unit) is very different from Cell and Developmental Biology (the department in the article's affiliation).

For our data set, I estimate this will improve accuracy by 0.5%, by reducing the number of false positives. But given our use of organizational synonyms, the only way to tell for certain would be to run this for everyone.

Requirements

This Java file outputs, in part, a value called strategy.orgUnitScoringStrategy.organizationalUnitDepartmentMatchingScore. This is for a positive departmental match. I want to update the code so it also outputs an organizationalUnitDepartmentNegativeMatchingScore when all of these conditions hold:

identity.getOrganizationalUnits() != null
articleAffiliation != null
The words "Department of ", "Division of ", etc. exist in articleAffiliation string but that match fails.
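The three conditions above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name, method name, prefix list, and the plain substring comparison are all assumptions, and the real matching also consults organizational synonyms.

```java
import java.util.List;
import java.util.Locale;

public class NegativeOrgUnitMatchSketch {

    // Hypothetical prefix list; the issue only names "Department of " and "Division of ", "etc."
    private static final List<String> UNIT_PREFIXES = List.of("department of ", "division of ");

    /**
     * Returns negativeScore when the identity has org units, the article has an
     * affiliation that names a department-like unit, and none of the identity's
     * org units appears in that affiliation. Returns 0 in every other case.
     */
    public static double negativeMatchingScore(List<String> identityOrgUnits,
                                               String articleAffiliation,
                                               double negativeScore) {
        // Conditions 1 and 2: both inputs must be present.
        if (identityOrgUnits == null || articleAffiliation == null) {
            return 0.0;
        }
        String affiliation = articleAffiliation.toLowerCase(Locale.ROOT);

        // Condition 3a: the affiliation must actually mention an org unit.
        boolean mentionsUnit = UNIT_PREFIXES.stream().anyMatch(affiliation::contains);
        if (!mentionsUnit) {
            return 0.0;
        }

        // Condition 3b: the match fails (simplified here to a case-insensitive substring test).
        boolean matched = identityOrgUnits.stream()
                .map(unit -> unit.toLowerCase(Locale.ROOT))
                .anyMatch(affiliation::contains);
        return matched ? 0.0 : negativeScore;
    }
}
```

Returning 0 when the affiliation names no department keeps the current behavior for articles that carry no usable org unit information.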

See this PR. It hasn't been "tested" and it probably doesn't "work," but I think it's on the right track.

Here's how a particular downweight affects the number of true / false positives / negatives. This is from a set of ~200,000 articles.

Downweight	FALSE NEGATIVE	FALSE POSITIVE	TRUE NEGATIVE	TRUE POSITIVE	Error count
0	3779	3878	11094	26427	7657
0.1	3976	3584	11388	26230	7560
0.2	4193	3249	11723	26013	7442
0.3	4445	2834	12138	25761	7279
0.4	4675	2628	12344	25531	7303
0.5	5051	2323	12649	25155	7374
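For anyone who wants to compare the downweights programmatically, here is a small sketch that recomputes the error count (false negatives plus false positives) and accuracy from the counts above and picks the downweight with the fewest errors. The class and method names are made up for illustration; only the numbers come from this issue.

```java
import java.util.Comparator;
import java.util.List;

public class DownweightComparison {

    /** One result row: a downweight and its confusion-matrix counts. */
    public static final class Outcome {
        public final double downweight;
        public final int falseNegatives;
        public final int falsePositives;
        public final int trueNegatives;
        public final int truePositives;

        public Outcome(double downweight, int fn, int fp, int tn, int tp) {
            this.downweight = downweight;
            this.falseNegatives = fn;
            this.falsePositives = fp;
            this.trueNegatives = tn;
            this.truePositives = tp;
        }

        /** Error count as reported in the issue: false negatives + false positives. */
        public int errorCount() {
            return falseNegatives + falsePositives;
        }

        /** Share of all cases that were classified correctly. */
        public double accuracy() {
            int total = falseNegatives + falsePositives + trueNegatives + truePositives;
            return (double) (truePositives + trueNegatives) / total;
        }
    }

    // Counts copied from the results listed in this issue.
    public static final List<Outcome> OUTCOMES = List.of(
            new Outcome(0.0, 3779, 3878, 11094, 26427),
            new Outcome(0.1, 3976, 3584, 11388, 26230),
            new Outcome(0.2, 4193, 3249, 11723, 26013),
            new Outcome(0.3, 4445, 2834, 12138, 25761),
            new Outcome(0.4, 4675, 2628, 12344, 25531),
            new Outcome(0.5, 5051, 2323, 12649, 25155));

    /** Returns the row with the smallest error count. */
    public static Outcome fewestErrors() {
        return OUTCOMES.stream()
                .min(Comparator.comparingInt(Outcome::errorCount))
                .orElseThrow();
    }

    public static void main(String[] args) {
        for (Outcome o : OUTCOMES) {
            System.out.printf("%.1f  errors=%d  accuracy=%.4f%n",
                    o.downweight, o.errorCount(), o.accuracy());
        }
        System.out.println("Fewest errors at downweight " + fewestErrors().downweight);
    }
}
```

Note the tradeoff visible in the counts: every increase in the downweight removes false positives but adds false negatives, so the lowest total error count is not necessarily the right choice if the two kinds of error are not equally costly.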

Test case

The combination of personIdentifier = sue2002 and PMID = 36630615 should return this...

        "organizationalUnitEvidence": [
          {
            "identityOrganizationalUnit": "Payne Whitney (Psychiatry)",
            "articleAffiliation": "Department of Cell and Developmental Biology, University College London, London, UK.",
            "organizationalUnitType": "DEPARTMENT",
            "organizationalUnitMatchingScore": -0.4,
            "organizationalUnitModifierScore": 0
          }
        ],

paulalbert1 assigned mrj4001 Nov 7, 2023

paulalbert1 changed the title ~~Down-weight cases where org unit doesn't match~~ Downweight cases where org unit doesn't match Nov 7, 2023
