This is all a bit of a mess; there is definitely a better way to do it.
write_citation_pairs takes a data frame with one column for the article id and one for the dataset id. It loops through each row and uses crossref::cr_cn to retrieve a full citation for the paper from the article id. We need information such as the authors, title, etc. to send to the metrics service.
crossref::cr_cn returns the citation in bibtex format (it can optionally return json and other formats). That bibtex is then passed to bib2df::bib2df, which parses the text string into a data frame. Parsing this text string is somewhat of a nightmare, though, and I ended up refactoring bib2df to accommodate single-line bibtex docs, which for some reason crossref::cr_cn started returning. So I did that here, but the method I had to use requires that you know in advance what the fields of the bibtex entry are. Occasionally, a bibtex entry will come back with a really oddball field in it, and that field name has to be passed to the extra_fields argument of bib2df and the function run again to get the correct parsing; otherwise the rest of the document is thrown off. This is all especially frustrating because we only need certain fields to pass to the metrics service, but the ENTIRE doc needs to be processed correctly.
So some options to make this require no human intervention:
1. Capture the warning output from the first pass, parse it, and feed the fields back in for a second pass
   - this seems ridiculous
2. Have crossref::cr_cn just return the json, parse it, and extract what we need, bypassing bib2df entirely
3. Find a more straightforward way to retrieve just the information we need, probably by querying the crossref API more directly
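The last two options could look roughly like the sketch below, which skips bibtex entirely and reads only a handful of fields from the JSON record. It is a minimal sketch under several assumptions, not a drop-in replacement: it assumes the article id is a DOI, that the Crossref REST API endpoint `https://api.crossref.org/works/{DOI}` is used, and that the jsonlite package is installed. The function names `fetch_crossref_work` and `extract_citation_fields` are hypothetical, and the set of fields returned is a guess at what the metrics service needs.

```r
# Hypothetical helper: fetch one work record from the Crossref REST API.
# Assumes `doi` is a DOI string and that jsonlite is installed.
fetch_crossref_work <- function(doi) {
  url <- paste0("https://api.crossref.org/works/", utils::URLencode(doi))
  # simplifyVector = FALSE keeps the response as plain nested lists,
  # so odd or missing fields never change the shape of the result.
  jsonlite::fromJSON(url, simplifyVector = FALSE)$message
}

# Hypothetical helper: keep only the fields we care about.
# `work` is the nested list returned by fetch_crossref_work().
# Unknown fields are simply ignored, so nothing like extra_fields is needed.
extract_citation_fields <- function(work) {
  first_or_na <- function(x) if (length(x) > 0) x[[1]] else NA_character_

  # Authors arrive as records with optional "given" and "family" names.
  authors <- vapply(work$author, function(a) {
    paste(c(a$given, a$family), collapse = " ")
  }, character(1))

  # The publication date is stored as issued -> date-parts -> [[year, month, day]].
  date_parts <- work$issued[["date-parts"]]
  year <- if (length(date_parts) > 0 && length(date_parts[[1]]) > 0) {
    as.integer(date_parts[[1]][[1]])
  } else {
    NA_integer_
  }

  list(
    doi     = first_or_na(work$DOI),
    title   = first_or_na(work$title),
    journal = first_or_na(work[["container-title"]]),
    year    = year,
    authors = authors
  )
}

# Usage (requires network access; article_id is a placeholder DOI string):
# fields <- extract_citation_fields(fetch_crossref_work(article_id))
```

Because the extraction step only looks up named elements, an unexpected field in the record is ignored rather than breaking the parse, which is the failure mode that currently forces the second bib2df pass.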