Error in computing Local Citations with Scopus Input #467
Comments
Unfortunately, Scopus has changed the way it stores references and, to date, there is no way to identify them uniquely (the reference string does not include the DOI!). We are of course open to proposals for alternative strategies for identifying local citations in Scopus.
Indeed, the reference string does not include the DOI and so lacks a unique key. I have coded a solution that works quite well but is not fast: extract the title from each reference (assuming it is the longest substring, which is most often the case), then compute the similarity (cosine or Levenshtein) with the TI fields of the local corpus. I got very good results on my corpus. To make it faster and to avoid problems with truncated titles (which I had in one instance), the matching could be done on truncated titles (e.g. the first 100 characters).
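A minimal base-R sketch of the title-matching idea described above, not the actual code mentioned in this comment. The helper names (`guess_title`, `normalise_title`, `title_similarity`, `match_local`), the comma-separated reference layout, the 0.9 threshold and the toy inputs are all assumptions made for illustration; only `utils::adist` (Levenshtein distance) is an existing R function.

```r
# Hypothetical helper: treat the longest comma-separated chunk of a
# reference string as the title (the heuristic described above).
guess_title <- function(ref) {
  parts <- trimws(strsplit(ref, ",", fixed = TRUE)[[1]])
  parts[which.max(nchar(parts))]
}

# Upper-case, drop punctuation, collapse spaces, truncate to max_chars.
normalise_title <- function(x, max_chars = 100) {
  x <- gsub("[^A-Z0-9 ]", " ", toupper(x))
  x <- gsub(" +", " ", trimws(x))
  substr(x, 1, max_chars)
}

# Levenshtein similarity in [0, 1]: 1 minus distance over the longer length.
title_similarity <- function(a, b) {
  d <- utils::adist(a, b)                      # length(a) x length(b) matrix
  longest <- outer(nchar(a), nchar(b), pmax)
  1 - d / pmax(longest, 1)
}

# For each reference, return the best-matching local title index,
# or NA when the best similarity is below the (assumed) threshold.
match_local <- function(refs, local_titles, threshold = 0.9) {
  ref_titles <- normalise_title(
    vapply(refs, guess_title, character(1), USE.NAMES = FALSE)
  )
  sim <- title_similarity(ref_titles, normalise_title(local_titles))
  best <- apply(sim, 1, which.max)
  score <- sim[cbind(seq_along(best), best)]
  data.frame(
    ref = seq_along(refs),
    local = ifelse(score >= threshold, best, NA_integer_),
    similarity = score
  )
}

# Toy inputs (illustrative only, not a real Scopus export).
refs <- c(
  "VAN DER WAAL J.W.H., THIJSSENS T., CORPORATE INVOLVEMENT IN SUSTAINABLE DEVELOPMENT GOALS: EXPLORING THE TERRITORY, JOURNAL OF CLEANER PRODUCTION, 252, (2020)",
  "DOE J., AN UNRELATED STUDY OF SOMETHING ELSE ENTIRELY, SOME JOURNAL, 1, (2019)"
)
local_titles <- c(
  "A DIFFERENT PAPER IN THE LOCAL CORPUS",
  "CORPORATE INVOLVEMENT IN SUSTAINABLE DEVELOPMENT GOALS: EXPLORING THE TERRITORY"
)
res <- match_local(refs, local_titles)
print(res)
```

The full distance matrix grows with (number of references) x (number of local titles), which is likely why this approach is slow on large corpora; truncating titles only reduces the cost per comparison. Titles that themselves contain commas would break the longest-chunk heuristic, so results should be spot-checked.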
Thanks, I appreciate your suggestion.
I have noticed an error when computing Local Citations with a Scopus input file. The problem does not occur with a Web of Science input file. In the histNetwork function there is a filter, based on the PP field, that removes false positives. However, not all papers have a PP value (page numbers). For example, Journal of Cleaner Production uses only an article identifier: Van der Waal, Johannes WH, and Thomas Thijssens. "Corporate involvement in sustainable development goals: Exploring the territory." Journal of Cleaner Production 252 (2020): 119625.
It is here:

    CR <- CR %>%
      dplyr::filter(!is.na(PY), (substr(CR$PP,1,1) %in% 0:9))

or here:

    CR <- CR %>%
      left_join(M_merge, join_by("PY", "AU"), relationship = "many-to-many") %>%
      dplyr::filter(!is.na(Included)) %>%
      group_by(PY,AU) %>%
      mutate(toRemove = ifelse(!is.na(PP.y) & PP.x!=PP.y, TRUE,FALSE)) %>% # to remove FALSE POSITIVE
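
To show how these two expressions treat a missing PP, here is a small base-R sketch on made-up values. The vectors below are illustrative assumptions, not data from a real Scopus export, and the snippet only reproduces the two conditions in isolation, not histNetwork itself.

```r
# Made-up page values: a page number, a missing value, an article number,
# and an identifier starting with a letter.
PP <- c("123", NA, "119625", "e0123")

# First filter condition: keep rows whose PP starts with a digit.
# substr(NA, 1, 1) is NA, and NA %in% 0:9 is FALSE, so rows without PP are dropped.
keep <- substr(PP, 1, 1) %in% 0:9
print(keep)

# Second condition: flag a false positive when both PP values exist and differ.
# A missing PP.x next to a present PP.y yields NA rather than TRUE or FALSE.
PP.x <- c("10", NA, "10")
PP.y <- c("10", "10", NA)
to_remove <- ifelse(!is.na(PP.y) & PP.x != PP.y, TRUE, FALSE)
print(to_remove)
```

In this toy case the record with a missing PP is removed by the first condition before any matching happens, which is consistent with the behaviour reported above for journals that use only article identifiers.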