write_citation_pairs with less human intervention #43

Open
jeanetteclark opened this issue Sep 17, 2024 · 0 comments

Comments

@jeanetteclark
Collaborator

This is all a bit of a mess; there is definitely a better way to do it.

write_citation_pairs takes a data frame with one column for article id and one for dataset id. It loops through each row and uses crossref::cr_cn to retrieve a full citation for the paper from the article id. We need information such as authors, title, etc. to send to the metrics service.

crossref::cr_cn returns the citation in bibtex format (it can optionally return json and other formats). That bibtex is then passed to bib2df::bib2df, which parses the text string into a data frame. Parsing this text string is somewhat of a nightmare though, and I ended up refactoring bib2df to accommodate single-line bibtex docs, which for some reason crossref::cr_cn started returning. So I did that here, but the method I had to use requires that you know what the fields of the bibtex entry are. Occasionally, a bibtex entry will come back with a really oddball field in it. That field name has to be passed to the extra_fields argument of bib2df and the function run again to get the correct parsing; otherwise the rest of the document is thrown off. This is all especially frustrating because we only need certain fields to pass to the metrics service, but the ENTIRE doc needs to be processed correctly.

So some options to make this require no human intervention:

  1. Capture the warning output from the first pass, parse it, feed the fields back in for a second pass
    • this seems ridiculous
  2. Have crossref::cr_cn just return the json, parse it, and extract what we need, bypassing bib2df entirely
  3. Find a more straightforward way to retrieve just the information we need, probably by querying the crossref API more directly
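
As a rough illustration of option 2, here is a minimal sketch of pulling only the needed fields out of a JSON citation record. It assumes the JSON follows the CSL-JSON layout (`DOI`, `title`, `author`, `container-title`, `issued`), which should be checked against what crossref::cr_cn actually returns. The function name `extract_citation_fields`, the output column names, and the sample record are all made up for the example; the real set of fields depends on what the metrics service expects.

```r
library(jsonlite)

# Hypothetical helper: extract a fixed set of fields from one CSL-JSON record
# that has already been parsed into a nested list. Unknown or "oddball" fields
# are simply ignored, so they cannot throw off the rest of the parse.
extract_citation_fields <- function(csl) {
  # Return the first element as a string, or NA if the field is absent.
  or_na <- function(x) {
    if (is.null(x) || length(x) == 0) NA_character_ else as.character(x[[1]])
  }

  # Collapse each author entry to "Given Family"; `literal` covers
  # organizational authors that have no given/family split.
  authors <- vapply(
    csl$author,
    function(a) paste(c(a$given, a$family, a$literal), collapse = " "),
    character(1)
  )

  # CSL-JSON stores dates as issued$`date-parts` = [[year, month, day]].
  date_parts <- csl$issued[["date-parts"]]
  year <- NA_integer_
  if (length(date_parts) > 0 && length(date_parts[[1]]) > 0 &&
      !is.null(date_parts[[1]][[1]])) {
    year <- as.integer(date_parts[[1]][[1]])
  }

  data.frame(
    doi = or_na(csl$DOI),
    title = or_na(csl$title),
    authors = if (length(authors) > 0) paste(authors, collapse = "; ") else NA_character_,
    journal = or_na(csl[["container-title"]]),
    year = year,
    stringsAsFactors = FALSE
  )
}

# Made-up record, standing in for the JSON text of a single citation.
example_json <- '{
  "DOI": "10.1234/example",
  "title": "An example article",
  "author": [
    {"given": "Ada", "family": "Lovelace"},
    {"given": "Grace", "family": "Hopper"}
  ],
  "container-title": "Journal of Examples",
  "issued": {"date-parts": [[2020, 1, 15]]},
  "some-oddball-field": "ignored"
}'

record <- jsonlite::fromJSON(example_json, simplifyVector = FALSE)
extract_citation_fields(record)
```

Because the fields are looked up by name, an unexpected field in the record has no effect on the others, which is the main advantage over the bibtex route. Missing fields come back as NA rather than raising an error, so the rows can still be bound together across all article ids.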