Data from: Making headlines: An analysis of US government-funded cancer research mentioned in online media
This repository provides the external input data, code, and analysis scripts needed to fully reproduce the results.
The produced dataset consists of six files, found in data/results:
- articles_metadata.csv: basic metadata (doi, title, journal, year)
- articles_mesh_term_dummies.csv: dummy variables for the 13 selected MeSH terms
- articles_mesh_subterm_dummies.csv: dummy variables for 12 MeSH subterms
- articles_funding_dummies.csv: dummy variables for 7 funding types
- articles_news_coverage.csv: news mention counts and dummy variables for the classified tiers
- news_mentions_details.csv: details about individual news articles (PMID, title, venue_name, venue_url, date, URL, summary, and tier)
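The dummy files record, for each article, whether a given MeSH term, subterm, or funding type applies, as a 0/1 column. Below is a minimal sketch of that encoding. It is not the code in 03_export_data.py, and it assumes each article is identified by a PMID and carries a list of assigned MeSH headings. The helper names make_dummies and to_csv, the pmid column, and the example terms are all hypothetical.

```python
import csv
import io
from typing import Dict, Iterable, List, Union

Row = Dict[str, Union[str, int]]


def make_dummies(articles: Dict[str, Iterable[str]], selected_terms: List[str]) -> List[Row]:
    """Return one row per article with a 0/1 column for each selected term.

    `articles` maps a PMID (hypothetical key) to the MeSH headings assigned to it.
    """
    rows: List[Row] = []
    for pmid, terms in articles.items():
        assigned = set(terms)
        row: Row = {"pmid": pmid}
        for term in selected_terms:
            # 1 if the article was indexed with this term, else 0
            row[term] = int(term in assigned)
        rows.append(row)
    return rows


def to_csv(rows: List[Row], columns: List[str]) -> str:
    """Serialize rows as CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative input only; not taken from the dataset.
    articles = {"100": ["Breast Neoplasms", "Humans"], "200": ["Lung Neoplasms"]}
    terms = ["Breast Neoplasms", "Lung Neoplasms"]
    print(to_csv(make_dummies(articles, terms), ["pmid"] + terms), end="")
    # pmid,Breast Neoplasms,Lung Neoplasms
    # 100,1,0
    # 200,0,1
```

Headings that are assigned to an article but not selected (such as "Humans" above) are simply ignored. A selected term that an article lacks yields 0 rather than an empty cell.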
The folder plots contains the figures and the scripts used to create them.
External input data:
- MeSH terms
  - q2018.bin (available from the NIH FTP server)
  - d2018.bin (available from the NIH FTP server)
- List of news outlets (provided by Altmetric.com)
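The two .bin files are MeSH in NLM's ASCII format: d2018.bin holds descriptors and q2018.bin holds qualifiers. Each record starts with a *NEWRECORD line followed by KEY = value lines, such as MH for a descriptor's heading and MN for each of its tree numbers. The following is a minimal parser sketch under that assumption. parse_mesh_records and descriptors_under are hypothetical helpers that are not part of this repository, and the sample records are abbreviated and illustrative.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, List[str]]


def parse_mesh_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one dict per MeSH ASCII record, mapping each field key to its values.

    Keys such as MN (tree number) can repeat within a record, so values are lists.
    """
    record: Record = {}
    for raw in lines:
        line = raw.strip()  # also drops any trailing "\r" or "\n"
        if line == "*NEWRECORD":
            if record:
                yield record
            record = {}
        elif " = " in line:
            # Split only on the first " = " so values containing it stay intact.
            key, value = line.split(" = ", 1)
            record.setdefault(key, []).append(value)
    if record:  # the last record has no following *NEWRECORD line
        yield record


def descriptors_under(records: Iterable[Record], tree_prefix: str) -> List[str]:
    """Return the headings (MH) that have a tree number (MN) starting with tree_prefix."""
    return [
        rec["MH"][0]
        for rec in records
        if "MH" in rec and any(tn.startswith(tree_prefix) for tn in rec.get("MN", []))
    ]


if __name__ == "__main__":
    # Abbreviated, illustrative records; real files contain many more fields.
    sample = [
        "*NEWRECORD", "MH = Neoplasms", "MN = C04",
        "*NEWRECORD", "MH = Breast Neoplasms", "MN = C04.588.180",
        "*NEWRECORD", "MH = Heart Diseases", "MN = C14.280",
    ]
    print(descriptors_under(parse_mesh_records(sample), "C04"))
    # ['Neoplasms', 'Breast Neoplasms']
```

To read a real file, pass an open file handle instead of the sample list, for example open("d2018.bin", encoding="utf-8"). The encoding shown is an assumption worth checking against NLM's documentation.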
To reproduce the results:
- Clone the repository, including submodules:
  git clone --recurse-submodules git@github.com:ScholCommLab/cancer-news.git
- Install the requirements.
- Copy example_config.yml to config.yml and insert your Altmetric API key.
- Run the following scripts:
  - Collect PubMed data:
    Rscript 01_collect_pubmed.R
  - Collect altmetrics from Altmetric.com:
    python 02_collect_altmetrics.py
  - Create dummy variables and export the data:
    python 03_export_data.py
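The internals of 01_collect_pubmed.R and 02_collect_altmetrics.py are not described here. The Python below is a rough, hypothetical sketch of the kind of requests such collection steps make. It builds a PubMed search URL for NCBI's E-utilities esearch endpoint and an Altmetric details URL for a single PMID. The Altmetric path v1/fetch/pmid/ and the key query parameter are assumptions based on Altmetric's public API documentation. That endpoint requires a licensed key, such as the one inserted into config.yml. The search term is illustrative and is not the query used in the study.

```python
import json
from typing import Any
from urllib.parse import quote, urlencode
from urllib.request import urlopen

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
# Assumed endpoint layout for Altmetric's details ("fetch") API.
ALTMETRIC_FETCH = "https://api.altmetric.com/v1/fetch/pmid/"


def pubmed_search_url(term: str, retmax: int = 100, retstart: int = 0) -> str:
    """Build an esearch URL that returns matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmode": "json",
        "retmax": retmax,      # page size
        "retstart": retstart,  # offset for paging through large result sets
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


def altmetric_fetch_url(pmid: str, api_key: str) -> str:
    """Build an Altmetric details URL for a single PMID (path layout assumed)."""
    return f"{ALTMETRIC_FETCH}{quote(pmid)}?{urlencode({'key': api_key})}"


def get_json(url: str, timeout: float = 30.0) -> Any:
    """Fetch a URL and decode its JSON body (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Illustrative query only; not the search strategy used for this dataset.
    url = pubmed_search_url("neoplasms[mh] AND 2016[dp]", retmax=20)
    ids = get_json(url)["esearchresult"]["idlist"]
    print(len(ids), "PMIDs on the first page")
```

NCBI asks clients without an API key to stay under three requests per second. A real collection loop over many pages or PMIDs should therefore pause between requests.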
The created dataset (the files in data/results listed above) is released into the public domain. No rights reserved.
Data provided by the National Library of Medicine (NLM) is distributed under the same conditions as specified by the NLM: https://www.nlm.nih.gov/databases/download/terms_and_conditions.html
Data provided by Altmetric.com LLC cannot be further redistributed without their permission.