CoVaxxy: A collection of English-language Twitter posts about COVID-19 vaccines

This project, conducted by Indiana University’s Observatory on Social Media (OSoMe) in collaboration with colleagues from Politecnico di Milano, aims to track and investigate how online information impacts COVID-19 vaccine uptake and health outcomes. We offer public access to a large collection of vaccine-related tweets that are gathered in real time and updated daily (see our data collection paper for more details). We combine this with vaccine uptake and survey data to create the CoVaxxy dashboard, a web page that allows anyone to visualize descriptive statistics and preliminary results.

Twitter Data

An ongoing collection of English-language Twitter posts about COVID-19 vaccines is available here.

To create as complete a set of Twitter posts related to COVID-19 vaccines as possible, we carefully selected a list of keywords through a snowball sampling technique. We started with the two most relevant keywords, covid and vaccine, as our initial seeds. Next, we gathered tweets using the filtered stream endpoint of the Twitter API for three hours. From these tweets, we identified potential keywords that frequently co-occurred with the seeds and added them to the seed list only after manually confirming that they were closely related to our topic. This process was repeated six times between Dec. 15, 2020 and Jan. 2, 2021, with each iteration's data collection taking place at a different time of day to capture tweets from different geographic areas and demographics. The resulting seed list served as our initial keyword list. We further refined the keyword list by manually combining certain keywords into composites (e.g., covid19 pfizer), so that the dataset is broad enough to include most relevant English conversations while excluding tweets that are not related to the vaccine discussion.
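The co-occurrence step of one snowball iteration can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `candidate_keywords`, the tokenization rule, and the `top_n` parameter are all assumptions, and the manual relevance check described above still has to be done by a person on the returned candidates.

```python
import re
from collections import Counter


def candidate_keywords(tweets, seeds, top_n=5):
    """Rank tokens by how many seed-matching tweets they appear in.

    Hypothetical helper: `tweets` is a list of tweet texts collected in one
    iteration, `seeds` is the current seed keyword list.
    """
    seeds = [s.lower() for s in seeds]
    counts = Counter()
    for text in tweets:
        # Crude tokenization (an assumption): lowercase alphanumeric runs.
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        # A token "hits" a seed if the seed is a substring of it,
        # so covid19 counts as a hit for the seed covid.
        seed_hits = {t for t in tokens if any(s in t for s in seeds)}
        if not seed_hits:
            continue  # tweet does not mention any seed, skip it
        # Every other token in the tweet co-occurs with a seed once.
        for token in tokens - seed_hits:
            counts[token] += 1
    return counts.most_common(top_n)
```

For example, calling `candidate_keywords(tweets, ["covid", "vaccine"])` returns the most frequent co-occurring tokens with their counts. Common words such as "my" or "from" will also appear in the output, which is one reason the candidates need manual review before joining the seed list.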

Some notes on the query syntax of Twitter's filtered stream API:

  • Queries that include keywords also match hashtags, URLs, and substrings. For example, covid matches cnn.com/covid and #covid19.
  • Using covid19 pfizer as a composite matching phrase will capture tweets that contain covid19 and pfizer. On the other hand, including covid19, pfizer as separate keywords will capture tweets that contain covid19 or pfizer.
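The two notes above can be modeled with a small sketch. This is only a simplified model of the matching behavior as described in this README, not Twitter's implementation; the function names `matches_phrase` and `matches_any` are hypothetical.

```python
def matches_phrase(text, phrase):
    """True if every space-separated term of `phrase` occurs in `text`.

    Terms are matched as case-insensitive substrings, so the term covid
    also matches #covid19 and cnn.com/covid.
    """
    lowered = text.lower()
    terms = phrase.lower().split()
    return bool(terms) and all(term in lowered for term in terms)


def matches_any(text, phrases):
    """True if at least one phrase in `phrases` matches `text`."""
    return any(matches_phrase(text, phrase) for phrase in phrases)
```

Under this model, `matches_any(text, ["covid19 pfizer"])` requires both terms to be present, while `matches_any(text, ["covid19", "pfizer"])` is satisfied by either one.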

Iffy+

To categorize tweets as low credibility, we use the Iffy+ Misinfo/Disinfo list created by Iffy.news. As stated on the Iffy+ page, "Iffy+ merges lists of sites that regularly publish mis/disinformation, as identified by major fact-checking and journalism organizations, into a single dataset." Please check out the description of the list for more information.
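This README does not describe how tweets are matched against the list. One plausible approach, shown here purely as an assumption, is to compare the domain of each URL shared in a tweet with the listed domains. The function names and the placeholder domain `flagged.example` are hypothetical and are not taken from Iffy+.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercase host of `url` without a leading www."""
    # hostname is None when the URL has no scheme (e.g. "example.com/x").
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_low_credibility(urls, flagged_domains):
    """True if any URL points at a flagged domain or one of its subdomains."""
    flagged = {d.lower() for d in flagged_domains}
    for url in urls:
        host = domain_of(url)
        if any(host == d or host.endswith("." + d) for d in flagged):
            return True
    return False
```

Here `urls` would be the expanded links attached to a tweet and `flagged_domains` the set of domains loaded from the list. Shortened links would need to be expanded first, which this sketch does not attempt.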

Paper

More details on the data collection can be found in our paper describing the collection of English-language Twitter posts about COVID-19 vaccines:

If you use this data, please cite this reference paper.

Dashboard

We have developed a live dashboard to allow people to visualize descriptive statistics and preliminary results. It is available here: https://osome.iu.edu/tools/covaxxy

Complementary data sources used by the CoVaxxy dashboard:

  • Vaccination data from the Centers for Disease Control and Prevention, found here, as compiled by Our World in Data here.
  • Vaccine acceptance and refusal survey data from Carnegie Mellon University's Delphi Epidata API, created by the Delphi Research Group.

VaccinItaly:

  • A member of the CoVaxxy team, Francesco Pierri, has also developed the VaccinItaly dashboard, which is similar to CoVaxxy but specifically monitors Italian conversations around vaccines on Facebook and Twitter.

Team

Acknowledgments

This project is supported in part by the Knight Foundation and Craig Newmark Philanthropies. We used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562.