-
Notifications
You must be signed in to change notification settings - Fork 2
/
CollaboratorList.Rmd
129 lines (89 loc) · 4.03 KB
/
CollaboratorList.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
---
title: "NSF Collaboration List from PubMed"
author: "Elana J Fertig"
date: "August 15, 2017"
output: word_document
---
# R session
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r}
library('easyPubMed')
library('plyr')
library('xlsx')
sessionInfo()
```
# Set author for query
This code sets the author from which to query. By default it uses my name, Elana Fertig, in the variable `author`. Please format your name as `"LAST NAME" "FIRST INITIAL"` to match pubmed query structure.
`authorFilter` contains the full names as `"LAST NAME" "INITALS"` to filter out additional authors with other initials. This may require further customization depending on your name.
`affiliationFilter` is a strong with an affiliation to filter for authors with similar names.
```{r}
author <- 'Fertig E'
authorFilter <- 'Fertig ET'
affiliationFilter <- 'Northeast Regional Epilepsy Group'
```
# Establish date range
NSF requires a "list of all persons alphabetically who are current or past collaborators on a project, book, article, rpt, abstract or paper for past 48 mos." By default, we assume that the 48 month period starts on the date of running this software. This can be changed to a different date by storing the desired date in the `currentDate` variable as a `Date` object.
```{r}
currentDate <- Sys.Date()
queryDate <- seq(Sys.Date(), length = 2, by = "-48 months")[2]
```
# Query pubmed
```{r}
pmquery <- sapply(articles_to_list(fetch_pubmed_data(get_pubmed_ids(paste0(author,'[Author]')))),article_to_df)
names(pmquery) <- NULL
# filter out oddly formatted references
pmquery <- pmquery[sapply(pmquery,function(x){any(colnames(x)=='year')})]
```
# Find publications from valid date range
```{r}
# get date of publication
datePMID <- as.Date(sapply(pmquery,function(x){paste(x[1,'year'],
x[1,'month'],
x[1,'day'],sep="-")}))
pmquery <- pmquery[datePMID >= queryDate]
```
# Exclude manuscripts by authors with similar names
Note -- this part of the code may need to be customized depending on your name and additional publications that arise from a pubmed query.
```{r}
getInitials <- function(s) {
sapply(strsplit(s,split=" "), function(x){
toupper(paste(substring(x, 1, 1), collapse = ""))
})
}
validIDs <- !as.logical(sapply(sapply(pmquery,function(x){paste(x$lastname,getInitials(x$firstname))}),function(x){length(grep(authorFilter,x)>0)})) &
!as.logical(sapply(sapply(pmquery,function(x){x$address}),function(x){length(grep(affiliationFilter,x)>0)}))
pmquery <- pmquery[validIDs]
```
# Sort into data frame
```{r}
pmquery.dataframe <- ldply(pmquery,data.frame)
# format author names
pmquery.dataframe$Initials <- getInitials(pmquery.dataframe$firstname)
pmquery.dataframe$ColabType <- 'Co-Author'
```
```{r}
morecollab <- read.xlsx('CollaboratorsNew.xlsx',sheetIndex = 1,header=T)
morecollab$ColabType <- 'Collaborator'
pmquery.dataframe <- rbind(pmquery.dataframe,morecollab)
```
```{r}
# find unique co-authors
pmquery.dataframe <- pmquery.dataframe[!duplicated(paste(pmquery.dataframe$lastname,pmquery.dataframe$Initials)),]
# alphabetize the list
pmquery.dataframe <- pmquery.dataframe[order(pmquery.dataframe$lastname,pmquery.dataframe$Initials),]
```
# Word formatted collaboration list
`r paste(paste(paste(pmquery.dataframe$Initials,pmquery.dataframe$lastname),pmquery.dataframe$address,sep=", "),collapse='; ')`
# CSV for COI
```{r}
csvOut <- data.frame('FastLane Proposal Number'=rep('',nrow(pmquery.dataframe)),
'Last Name'=pmquery.dataframe$lastname,
'First Name'=pmquery.dataframe$firstname,
'Institution or Affiliation'=pmquery.dataframe$address,
'Conflicted with Person (Last Name)'=strsplit(author,split=" ")[[1]][1],
'Conflicted with Person (First Name)'=strsplit(author,split=" ")[[1]][2],
'Type of Conflict'=pmquery.dataframe$ColabType)
write.csv(csvOut,row.names = F,file='ConflictOfInterest.csv')
```