Preparation of Data and Statistical Analysis

Ben Sabath June 16, 2021

This directory contains code covering the creation of the data set used for analysis in the Health Effects Institute Final Report "Assessing Adverse Health Effects of Long-Term Exposure to Low Levels of Ambient Pollution: Implementation of Causal Inference Methods."

The Confounders directory covers how the zip code level demographic data, smoking and BMI data, and weather data are acquired and prepared for use. Exposures describes the preparation of the PM2.5 data. Please note that we are unable to provide the code and workflow used to produce the exposure data, as that data is provided by our collaborators. However, the PM2.5 exposure data we used is available for download here. HealthOutcomes contains the code used to select data from the Medicare Beneficiary Summary Files. Finally, MergedData contains the process by which all these data sources are combined and cleaned so that they can be analyzed by the code in StatisticalAnalysis.
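As a rough illustration of the kind of combination step MergedData performs, the sketch below joins health outcome records to exposure and confounder records. It is not the code in this repository: the join key (zip code and year), the function name `merge_sources`, all field names, and all values are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical join key: (zip code, year). The real pipeline may key on
# different or additional fields.
Key = Tuple[str, int]


def merge_sources(
    outcomes: List[dict],
    exposures: Dict[Key, dict],
    confounders: Dict[Key, dict],
) -> List[dict]:
    """Attach exposure and confounder fields to each outcome row.

    Rows whose key is missing from either lookup are dropped, which is
    one simple cleaning rule; the actual pipeline may handle gaps differently.
    """
    merged = []
    for row in outcomes:
        key = (row["zip"], row["year"])
        if key not in exposures or key not in confounders:
            continue  # incomplete record: skip it
        merged.append({**row, **exposures[key], **confounders[key]})
    return merged


# Toy, made-up values used only to exercise the function.
outcomes = [
    {"zip": "02138", "year": 2010, "deaths": 3, "person_years": 410},
    {"zip": "99999", "year": 2010, "deaths": 1, "person_years": 50},
]
exposures = {("02138", 2010): {"pm25": 8.4}}
confounders = {("02138", 2010): {"mean_bmi": 26.1, "smoke_rate": 0.12}}

analysis_rows = merge_sources(outcomes, exposures, confounders)
```

Here the second outcome row is dropped because no exposure or confounder record shares its key, so `analysis_rows` holds a single combined record.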

We have included as much data as we are allowed to share and can feasibly include in a GitHub repository (some files are too large to include). Where we are unable to share data, we have provided instructions on how to acquire the source data and prepare it for use with the data pipelines.

Table of Contents

The directories can be read in any order; however, reading in the following order reflects how data flows in the pipeline.