This repository contains code for investigating how often manuscripts in Ecology and Evolutionary Biology that cite the R software language make their R code available. The R scripts that this work relies upon are contained in the folder 'R_scripts'. The data generated by this work (which includes both scripted and manual components) are stored in the 'data' folder. The 'figures' folder contains figures produced for a related manuscript (in review). For more information, see the preprint at: https://www.authorea.com/doi/full/10.22541/au.170003886.68548206/v1
The main data file in this repository is cite_data.RDS. This is an RDS file containing citation counts for the publications examined, along with associated predictor variables. Many of the variables are returned by the rscopus package (https://cran.r-project.org/web/packages/rscopus/index.html) using the Scopus API (https://dev.elsevier.com/sc_apis.html). Metadata on fields returned by the Scopus API is available at https://dev.elsevier.com/sc_apis.html. Below, we provide information on fields which are NOT returned by the Scopus API (i.e., data which we collected).
- uid = A unique ID assigned to each record.
- r_scripts_available = A binary variable (yes/no) describing whether any R code was shared as part of the publication.
- r_used = A binary variable (yes/no) describing whether R was used in the publication (as opposed to simply referenced without being used).
- data_available = A binary variable (yes/no) describing whether the full data underlying the publication were included.
- comments = Unstructured comments about the record. This may contain information about why a judgement was made or where code was found.
- code location = Text string describing where the code was located. Options include: NA, "SI", "figshare", "website", "appendix", "dryad", "github", "Github", "zenodo", "environmental data initiative", "sciencebase.gov", "mendeley data", "osf", "bitbucket"
- code format = Text string describing the format in which the code was shared. Options include: NA, "word", "pdf", "R", "typeset text", "rtf", "txt", "rmd"
- code license = Text string describing the license for the shared code, if any. Note that the string "NA" means that we checked and no license was specified, whereas a missing value (NA) means that we did not check. Options include: NA, "NA", "GPL", "CC0", "CC-BY", "MIT", "Open", "copyright"
- n = A numeric index variable used to stratify randomization.
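
Because the code license field uses both the string "NA" and a true missing value, it can help to separate the two cases explicitly before summarising. The following is a minimal sketch on a toy vector; the values and the variable names (`license`, `license_status`) are illustrative assumptions and are not taken from cite_data.RDS.

```r
# Toy stand-in for the license field; values are illustrative, not real records.
license <- c("MIT", "NA", NA, "GPL", "NA", NA)

# A true NA (license not checked) is distinct from the string "NA"
# (checked, but no license specified).
not_checked   <- is.na(license)
not_specified <- !is.na(license) & license == "NA"
has_license   <- !is.na(license) & license != "NA"

# Collapse the three cases into one descriptive label per record.
license_status <- ifelse(not_checked, "not checked",
                  ifelse(not_specified, "none specified", "licensed"))
table(license_status)
```

Note that readRDS() preserves this distinction, whereas exporting to CSV and re-reading with read.csv() would, by default, convert the string "NA" into a missing value.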
See https://dev.elsevier.com/sc_apis.html for information on the following fields, which are returned by the Scopus API:
- title
- author
- year
- doi
- journal
- issn
- volume
- pages
- date
- display_date
- citations
- article_type
- open_access
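
As a rough illustration of how the fields above might be used, the sketch below loads cite_data.RDS and computes the share of R-using publications that shared code. The path `data/cite_data.RDS` is an assumption based on the folder description, and the fallback data frame contains made-up values so that the example runs even when the file is absent.

```r
# Assumed path: the data are described as living in 'data', but the exact
# location of cite_data.RDS is not stated.
cite_path <- file.path("data", "cite_data.RDS")

if (file.exists(cite_path)) {
  cite_data <- readRDS(cite_path)
} else {
  # Toy stand-in with made-up values so the sketch runs without the real file.
  cite_data <- data.frame(
    uid = 1:4,
    r_used = c("yes", "yes", "no", "yes"),
    r_scripts_available = c("yes", "no", "no", "no"),
    stringsAsFactors = FALSE
  )
}

# Keep only records where R was actually used (not merely cited).
used_r <- cite_data[cite_data$r_used %in% "yes", ]

# Proportion of those records for which any R code was shared.
share_with_code <- mean(used_r$r_scripts_available %in% "yes")
share_with_code
```

Using `%in%` rather than `==` means that records with a missing value are counted as "not yes" instead of propagating NA into the result; whether that is the appropriate treatment depends on the analysis.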
The other important data file in this repository is impact_factor.csv. This is a CSV file containing information on the impact factors of journals used in this work, as recorded on June 16, 2023. This information on impact factor was provided by the R package "scholar" (https://cran.r-project.org/web/packages/scholar/index.html). Below we provide information on the fields included.
- needed_journals = The list of journals submitted to the scholar R package. These were extracted from the "journal" field of the file cite_data.RDS (see above).
- Journal = The journal title matched by scholar.
- Cites = The number of citations of that journal.
- ImpactFactor = The journal's impact factor.
- Eigenfactor = The journal's Eigenfactor.
- dist = The distance between the submitted journal name and the returned journal name, as calculated by scholar.
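
Since needed_journals was extracted from the journal field of cite_data.RDS, the two files can be linked on those columns. The sketch below shows one possible way to do so on toy data; the journal names, impact factors, and the `max_dist` cutoff are all hypothetical, and this is not necessarily how the repository's scripts perform the join.

```r
# Toy stand-ins; all values are made up for illustration only.
cite_data <- data.frame(
  uid = 1:3,
  journal = c("Journal A", "Journal B", "Jrnl C"),
  stringsAsFactors = FALSE
)
impact_factor <- data.frame(
  needed_journals = c("Journal A", "Journal B", "Jrnl C"),
  Journal = c("JOURNAL A", "JOURNAL B", "AN UNRELATED JOURNAL"),
  ImpactFactor = c(1.5, 2.5, 3.5),
  dist = c(0, 0, 12),
  stringsAsFactors = FALSE
)

# Hypothetical cutoff: treat large name distances as unreliable matches.
max_dist <- 5
good_matches <- impact_factor[impact_factor$dist <= max_dist, ]

# Left join: every publication is kept; poorly matched journals get NA.
merged <- merge(cite_data, good_matches,
                by.x = "journal", by.y = "needed_journals", all.x = TRUE)
merged[, c("uid", "journal", "ImpactFactor")]
```

Filtering on dist before merging keeps a doubtful name match from silently attaching the wrong journal's impact factor to a publication.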
There are two important R scripts in this repository: 1_data_collection.R and 2_analyses_and_figures.R. The former file was used to select publications for the study (along with relevant metadata). The latter file contains code underlying analyses and visualizations.