From e601a703ad021eef4cfe17b491b3ecbcf302d28f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Lucas=20Rod=C3=A9s-Guirao?= Date: Mon, 16 Sep 2024 18:31:28 +0200 Subject: [PATCH] =?UTF-8?q?=F0=9F=93=8A=20covid:=20deaths=20by=20vax=20sta?= =?UTF-8?q?tus=20(#3297)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * 📊 covid: deaths by vax status * snapshot wip * meadow * garden * change entity->country colname * add title_public * grapher wip * infections: add title_public * add country name to indicator title * remove import --- dag/covid.yml | 11 ++ .../covid/latest/deaths_vax_status.meta.yml | 120 ++++++++++++++++++ .../garden/covid/latest/deaths_vax_status.py | 28 ++++ .../covid/latest/infections_model.meta.yml | 8 ++ .../grapher/covid/latest/deaths_vax_status.py | 32 +++++ .../meadow/covid/latest/deaths_vax_status.py | 85 +++++++++++++ snapshots/covid/latest/deaths_vax_status.py | 41 ++++++ .../latest/deaths_vax_status_chile.csv.dvc | 33 +++++ .../latest/deaths_vax_status_england.csv.dvc | 39 ++++++ .../deaths_vax_status_switzerland.csv.dvc | 32 +++++ .../covid/latest/deaths_vax_status_us.csv.dvc | 30 +++++ 11 files changed, 459 insertions(+) create mode 100644 etl/steps/data/garden/covid/latest/deaths_vax_status.meta.yml create mode 100644 etl/steps/data/garden/covid/latest/deaths_vax_status.py create mode 100644 etl/steps/data/grapher/covid/latest/deaths_vax_status.py create mode 100644 etl/steps/data/meadow/covid/latest/deaths_vax_status.py create mode 100644 snapshots/covid/latest/deaths_vax_status.py create mode 100644 snapshots/covid/latest/deaths_vax_status_chile.csv.dvc create mode 100644 snapshots/covid/latest/deaths_vax_status_england.csv.dvc create mode 100644 snapshots/covid/latest/deaths_vax_status_switzerland.csv.dvc create mode 100644 snapshots/covid/latest/deaths_vax_status_us.csv.dvc diff --git a/dag/covid.yml b/dag/covid.yml index 09590f53712..2f396dafef7 100644 --- a/dag/covid.yml +++ b/dag/covid.yml @@ -268,3 
+268,14 @@ steps: - data://meadow/covid/latest/infections_model data://grapher/covid/latest/infections_model: - data://garden/covid/latest/infections_model + + # Deaths by vaccination status + data://meadow/covid/latest/deaths_vax_status: + - snapshot://covid/latest/deaths_vax_status_england.csv + - snapshot://covid/latest/deaths_vax_status_us.csv + - snapshot://covid/latest/deaths_vax_status_chile.csv + - snapshot://covid/latest/deaths_vax_status_switzerland.csv + data://garden/covid/latest/deaths_vax_status: + - data://meadow/covid/latest/deaths_vax_status + data://grapher/covid/latest/deaths_vax_status: + - data://garden/covid/latest/deaths_vax_status diff --git a/etl/steps/data/garden/covid/latest/deaths_vax_status.meta.yml b/etl/steps/data/garden/covid/latest/deaths_vax_status.meta.yml new file mode 100644 index 00000000000..2b4311fe036 --- /dev/null +++ b/etl/steps/data/garden/covid/latest/deaths_vax_status.meta.yml @@ -0,0 +1,120 @@ +# NOTE: To learn more about the fields, hover over their names. +definitions: + common: + description_short: |- + Death rates are calculated as the number of deaths in each group, divided by the total number of people in this group. This is given per 100,000 people. + unit: doses + presentation: + topic_tags: + - COVID-19 + + +# Learn more about the available fields: +# http://docs.owid.io/projects/etl/architecture/metadata/reference/ +dataset: + update_period_days: 0 + title: COVID-19, deaths by vaccination status + + +tables: + us: + common: + description_key: + - The mortality rate for the 'All ages' group is age-standardized to account for the different vaccination rates of older and younger people. 
+ variables: + us_unvaccinated: + title: Death rate (weekly) of unvaccinated people - United States, by age + presentation: + title_public: Death rate (weekly) of unvaccinated people - United States, by age + display: + name: Unvaccinated + us_vaccinated_no_biv_booster: + title: Death rate (weekly) of fully vaccinated people (without bivalent booster) - United States, by age + presentation: + title_public: Death rate (weekly) of fully vaccinated people (without bivalent booster) - United States, by age + display: + name: Vaccinated without bivalent booster + us_vaccinated_with_biv_booster: + title: Death rate (weekly) of fully vaccinated people (with bivalent booster) - United States, by age + presentation: + title_public: Death rate (weekly) of fully vaccinated people (with bivalent booster) - United States, by age + display: + name: Vaccinated with bivalent booster + + chile: + common: + description_key: + - The mortality rate for the 'All ages' group is age-standardized to account for the different vaccination rates of older and younger people. + variables: + chile_0_1_dose: + title: Death rate (weekly) of people with 0 or 1 dose - Chile, by age + presentation: + title_public: Death rate (weekly) of people with 0 or 1 dose - Chile, by age + display: + name: 0 or 1 dose + chile_2_doses: + title: Death rate (weekly) of people with 2 doses - Chile, by age + presentation: + title_public: Death rate (weekly) of people with 2 doses - Chile, by age + display: + name: 2 doses + chile_3_doses: + title: Death rate (weekly) of people with 3 doses - Chile, by age + presentation: + title_public: Death rate (weekly) of people with 3 doses - Chile, by age + display: + name: 3 doses + chile_4_doses: + title: Death rate (weekly) of people with 4 doses - Chile, by age + presentation: + title_public: Death rate (weekly) of people with 4 doses - Chile, by age + display: + name: 4 doses + + england: + common: + description_key: + - Unvaccinated people have not received any dose. 
+ - Partially-vaccinated people are excluded. + - Fully-vaccinated people have received all doses prescribed by the initial vaccination protocol. + - The mortality rate is age-standardized to account for the different vaccination rates of older and younger people. + variables: + england_unvaccinated: + title: Death rate (monthly) of unvaccinated people - England, by age + presentation: + title_public: Death rate (monthly) of unvaccinated people - England, by age + display: + name: Unvaccinated + england_fully_vaccinated: + title: Death rate (monthly) of fully vaccinated people - England, by age + presentation: + title_public: Death rate (monthly) of fully vaccinated people - England, by age + display: + name: Fully vaccinated + + switzerland: + common: + description_key: + - Data coverage includes both Switzerland and Liechtenstein. Unvaccinated people have not received any dose. Partially-vaccinated people are excluded. + - Fully-vaccinated people have received all doses prescribed by the initial vaccination protocol. + - The mortality rate for the 'All ages' group is age-standardized to account for the different vaccination rates of older and younger people. 
+ variables: + swi_unvaccinated: + title: Death rate (weekly) of unvaccinated people - Switzerland, by age + presentation: + title_public: Death rate (weekly) of unvaccinated people - Switzerland, by age + display: + name: Unvaccinated + swi_vaccinated_no_booster: + title: Death rate (weekly) of fully vaccinated people (without booster) - Switzerland, by age + presentation: + title_public: Death rate (weekly) of fully vaccinated people (without booster) - Switzerland, by age + display: + name: Fully vaccinated, no booster + swi_vaccinated_with_booster: + title: Death rate (weekly) of fully vaccinated people (with booster) - Switzerland, by age + presentation: + title_public: Death rate (weekly) of fully vaccinated people (with booster) - Switzerland, by age + display: + name: Fully vaccinated, with booster + diff --git a/etl/steps/data/garden/covid/latest/deaths_vax_status.py b/etl/steps/data/garden/covid/latest/deaths_vax_status.py new file mode 100644 index 00000000000..9457e10e832 --- /dev/null +++ b/etl/steps/data/garden/covid/latest/deaths_vax_status.py @@ -0,0 +1,28 @@ +"""Load a meadow dataset and create a garden dataset.""" + +from etl.helpers import PathFinder, create_dataset + +# Get paths and naming conventions for current step. +paths = PathFinder(__file__) + + +def run(dest_dir: str) -> None: + # + # Load inputs. + # + # Load meadow dataset. + ds_meadow = paths.load_dataset("deaths_vax_status") + + # Read table from meadow dataset. + tables = list(ds_meadow) + + # + # Save outputs. + # + # Create a new garden dataset with the same metadata as the meadow dataset. + ds_garden = create_dataset( + dest_dir, tables=tables, check_variables_metadata=True, default_metadata=ds_meadow.metadata + ) + + # Save changes in the new garden dataset. 
+ ds_garden.save() diff --git a/etl/steps/data/garden/covid/latest/infections_model.meta.yml b/etl/steps/data/garden/covid/latest/infections_model.meta.yml index 5f6c0617c61..427fea89640 100644 --- a/etl/steps/data/garden/covid/latest/infections_model.meta.yml +++ b/etl/steps/data/garden/covid/latest/infections_model.meta.yml @@ -41,23 +41,31 @@ tables: variables: icl_infections: title: Daily new estimated COVID-19 infections (ICL, <> estimate) + presentation: + title_public: Daily new estimated COVID-19 infections (ICL, <> estimate) description: |- <% set model_name = "ICL" %> {definitions.others.description} ihme_infections: title: Daily new estimated COVID-19 infections (IHME, <> estimate) + presentation: + title_public: Daily new estimated COVID-19 infections (IHME, <> estimate) description: |- <% set model_name = "IHME" %> {definitions.others.description} lshtm_infections: title: Daily new estimated COVID-19 infections (LSHTM, <> estimate) + presentation: + title_public: Daily new estimated COVID-19 infections (LSHTM, <> estimate) description: |- <% set model_name = "LSHTM" %> {definitions.others.description} yyg_infections: title: Daily new estimated COVID-19 infections (Youyang Gu, <> estimate) + presentation: + title_public: Daily new estimated COVID-19 infections (Youyang Gu, <> estimate) description: |- <% set model_name = "Youyang Gu" %> {definitions.others.description} diff --git a/etl/steps/data/grapher/covid/latest/deaths_vax_status.py b/etl/steps/data/grapher/covid/latest/deaths_vax_status.py new file mode 100644 index 00000000000..d1785282920 --- /dev/null +++ b/etl/steps/data/grapher/covid/latest/deaths_vax_status.py @@ -0,0 +1,32 @@ +"""Load a garden dataset and create a grapher dataset.""" + +from etl.helpers import PathFinder, create_dataset + +# Get paths and naming conventions for current step. +paths = PathFinder(__file__) + + +def run(dest_dir: str) -> None: + # + # Load inputs. + # + # Load garden dataset. 
+ ds_garden = paths.load_dataset("deaths_vax_status") + + # Read table from garden dataset. + tables = list(ds_garden) + + # + # Process data. + # + + # + # Save outputs. + # + # Create a new grapher dataset with the same metadata as the garden dataset. + ds_grapher = create_dataset( + dest_dir, tables=tables, check_variables_metadata=True, default_metadata=ds_garden.metadata + ) + + # Save changes in the new grapher dataset. + ds_grapher.save() diff --git a/etl/steps/data/meadow/covid/latest/deaths_vax_status.py b/etl/steps/data/meadow/covid/latest/deaths_vax_status.py new file mode 100644 index 00000000000..177367db99d --- /dev/null +++ b/etl/steps/data/meadow/covid/latest/deaths_vax_status.py @@ -0,0 +1,85 @@ +"""Load a snapshot and create a meadow dataset.""" + +from etl.helpers import PathFinder, create_dataset + +# Get paths and naming conventions for current step. +paths = PathFinder(__file__) + + +def run(dest_dir: str) -> None: + # + # Load inputs. + # + # Retrieve tables from snapshots + tb_en = paths.read_snap_table("deaths_vax_status_england.csv") + tb_us = paths.read_snap_table("deaths_vax_status_us.csv") + tb_swi = paths.read_snap_table("deaths_vax_status_switzerland.csv") + tb_cl = paths.read_snap_table("deaths_vax_status_chile.csv") + + # + # Process data. 
+ # + # US + rename_cols = { + "Entity": "country", + "Day": "date", + "unvaccinated": "us_unvaccinated", + "vaccinated_without": "us_vaccinated_no_biv_booster", + "vaccinated_with": "us_vaccinated_with_biv_booster", + } + tb_us = tb_us.rename(columns=rename_cols)[rename_cols.values()] + tb_us = tb_us.format(["country", "date"], short_name="us") + + # England + rename_cols = { + "Entity": "country", + "Day": "date", + "Unvaccinated": "england_unvaccinated", + "Fully vaccinated": "england_fully_vaccinated", + } + tb_en = tb_en.rename(columns=rename_cols)[rename_cols.values()] + tb_en = tb_en.format(["country", "date"], short_name="england") + + # Switzerland + rename_cols = { + "Entity": "country", + "Day": "date", + "Unvaccinated": "swi_unvaccinated", + "Fully vaccinated, no booster": "swi_vaccinated_no_booster", + "Fully vaccinated + booster": "swi_vaccinated_with_booster", + } + tb_swi = tb_swi.rename(columns=rename_cols)[rename_cols.values()] + tb_swi = tb_swi.format(["country", "date"], short_name="switzerland") + + # Chile + rename_cols = { + "Entity": "country", + "Day": "date", + "0 or 1 dose": "chile_0_1_dose", + "2 doses": "chile_2_doses", + "3 doses": "chile_3_doses", + "4 doses": "chile_4_doses", + } + tb_cl = tb_cl.rename(columns=rename_cols)[rename_cols.values()] + tb_cl = tb_cl.format(["country", "date"], short_name="chile") + + # Table list + tables = [ + tb_us, + tb_en, + tb_cl, + tb_swi, + ] + + # + # Save outputs. + # + # Create a new meadow dataset with the same metadata as the snapshot. + ds_meadow = create_dataset( + dest_dir, + tables=tables, + check_variables_metadata=True, + ) + + # Save changes in the new meadow dataset. + ds_meadow.save() diff --git a/snapshots/covid/latest/deaths_vax_status.py b/snapshots/covid/latest/deaths_vax_status.py new file mode 100644 index 00000000000..7d1a62f696c --- /dev/null +++ b/snapshots/covid/latest/deaths_vax_status.py @@ -0,0 +1,41 @@ +"""Script to create a snapshot of dataset. 
+ +This data was downloaded from Grapher. It had been imported to Grapher before covid-19-data repository was created. +""" + +from pathlib import Path + +import click + +from etl.snapshot import Snapshot + +# Version for current snapshot dataset. +SNAPSHOT_VERSION = Path(__file__).parent.name + + +@click.command() +@click.option("--upload/--skip-upload", default=True, type=bool, help="Upload dataset to Snapshot") +@click.option("--england", default=None, type=str, help="Path to England local data file.") +@click.option("--us", default=None, type=str, help="Path to US local data file.") +@click.option("--switzerland", default=None, type=str, help="Path to Switzerland local data file.") +@click.option("--chile", default=None, type=str, help="Path to Chile local data file.") +def main(england: str, us: str, switzerland: str, chile: str, upload: bool) -> None: + estimates = [ + ("england", england), + ("us", us), + ("switzerland", switzerland), + ("chile", chile), + ] + # Create new snapshots. + for estimate in estimates: + name = estimate[0] + filename = estimate[1] + + if filename is not None: + snap = Snapshot(f"covid/{SNAPSHOT_VERSION}/deaths_vax_status_{name}.csv") + # Copy local data file to snapshots data folder, add file to DVC and upload to S3. + snap.create_snapshot(filename=filename, upload=upload) + + +if __name__ == "__main__": + main() diff --git a/snapshots/covid/latest/deaths_vax_status_chile.csv.dvc b/snapshots/covid/latest/deaths_vax_status_chile.csv.dvc new file mode 100644 index 00000000000..d81fdeac271 --- /dev/null +++ b/snapshots/covid/latest/deaths_vax_status_chile.csv.dvc @@ -0,0 +1,33 @@ +# Learn more at: +# http://docs.owid.io/projects/etl/architecture/metadata/reference/ +meta: + origin: + # Data product / Snapshot + title: COVID-19, Incidencia de casos según estado de vacunación, grupo de edad, y semana epidemiológica (Chile) + description: |- + Incidence of deaths according to vaccination status, age group, and epidemiological week.
+ + Vaccination status is classified as "Fully vaccinated" for those people who have received two doses and more than 14 days have passed since their second dose, or have received a vaccine from a vaccination protocol that includes only a single dose and more than 28 days have elapsed since inoculation. This variable takes the value "Unvaccinated or not fully vaccinated" if people do not have a complete vaccination schedule. + + The mortality rate corresponds to the incidence rate of deaths per 100,000 inhabitants for the age group, corresponding vaccination status and corresponding epidemiological week. + + The mortality rate for the "All ages" group is age-standardized by Our World in Data, using single-year age estimates from the 2022 revision of the United Nations World Population Prospects for Chile. Rates for specific age groups are calculated as crude incidence rates. + date_published: "2023" + + # Citation + producer: Departamento de Epidemiología, Ministerio de Salud de Chile. + citation_full: |- + Departamento de Epidemiología, Ministerio de Salud de Chile. Accessed via GitHub (https://github.com/MinCiencia/Datos-COVID19). 2023.
+ + # Files + url_main: https://web.archive.org/web/20230408120752/https://github.com/MinCiencia/Datos-COVID19/tree/master/output/producto89 + date_accessed: 2024-09-16 + + # License + license: + name: CC BY 4.0 + +outs: + - md5: 42f4ead672fe4284bc5d8a59ecfac666 + size: 51346 + path: deaths_vax_status_chile.csv diff --git a/snapshots/covid/latest/deaths_vax_status_england.csv.dvc b/snapshots/covid/latest/deaths_vax_status_england.csv.dvc new file mode 100644 index 00000000000..929394ee2e9 --- /dev/null +++ b/snapshots/covid/latest/deaths_vax_status_england.csv.dvc @@ -0,0 +1,39 @@ +# Learn more at: +# http://docs.owid.io/projects/etl/architecture/metadata/reference/ +meta: + origin: + # Data product / Snapshot + title: COVID-19, Deaths by vaccination status (England) + description: |- + Weekly age-standardized mortality rates and age-specific rates for deaths involving COVID-19 and all deaths by vaccination status. + + Age and vaccination status are defined on the date of death where a death has occurred, and on the last day of the week if not. + + These figures represent death occurrences, there can be a delay between the date a death occurred and the date a death was registered. + + The data represents age-standardized mortality rates per 100,000 person-years, standardized to the 2013 European Standard Population using five-year age groups from those aged 10 years and over. "Person-years" take into account both the number of people and the amount of time spent in each vaccination status. + + Deaths were defined using the International Classification of Diseases, tenth revision (ICD-10). Deaths involving the coronavirus (COVID-19) are defined as those with an underlying cause, or any mention of, ICD-10 codes U07.1 (COVID-19 virus identified) or U07.2 (COVID-19, virus not identified). The source notes that this differs from the definition used in the majority of mortality outputs. 
+ + Figures are based on provisional mortality data and the Public Health Data Asset (PHDA), a linked dataset of people resident in England who could be linked to the 2011 Census and GP Patient Register. Therefore, the number of deaths and related population differ from other ONS mortality publications. + + Rates marked as unreliable due to small numbers of deaths have been removed. + date_published: "2023-08-05" + + # Citation + producer: Office for National Statistics + citation_full: |- + Office for National Statistics, National Immunisation Management Service + + # Files + url_main: https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/datasets/deathsbyvaccinationstatusengland + date_accessed: 2024-09-16 + + # License + license: + name: CC BY 4.0 + +outs: + - md5: 209fd66d825210bbf39384e525d3db64 + size: 1837 + path: deaths_vax_status_england.csv diff --git a/snapshots/covid/latest/deaths_vax_status_switzerland.csv.dvc b/snapshots/covid/latest/deaths_vax_status_switzerland.csv.dvc new file mode 100644 index 00000000000..3944a0e7cc0 --- /dev/null +++ b/snapshots/covid/latest/deaths_vax_status_switzerland.csv.dvc @@ -0,0 +1,32 @@ +# Learn more at: +# http://docs.owid.io/projects/etl/architecture/metadata/reference/ +meta: + origin: + # Data product / Snapshot + title: COVID-19, Switzerland and Liechtenstein + description: |- + The Federal Office of Public Health publishes data on mortality by vaccination status for Switzerland and Liechtenstein, broken down by age groups. + + The information on incidence is per 100,000 population, with the corresponding vaccination status of "fully vaccinated" and "not vaccinated". + + The mortality rate for the "All ages" group is age-standardized by Our World in Data, using single-year age estimates from the 2022 revision of the United Nations World Population Prospects for Chile. Rates for specific age groups are calculated as crude incidence rates.
+ date_published: "2022" + + # Citation + producer: Federal Office of Public Health + citation_full: |- + Federal Office of Public Health (FOPH). (2023). COVID-19 Switzerland and Liechtenstein [Data set]. OpenData.swiss. Accessed online https://opendata.swiss/en/dataset/covid-19-schweiz + + # Files + url_main: https://opendata.swiss/en/dataset/covid-19-schweiz + date_accessed: 2024-09-16 + + # License + license: + name: opendata.swiss terms of use + url: https://opendata.swiss/en/terms-of-use#terms_by + +outs: + - md5: 3b6741095e7904c9843eec56b88a03fb + size: 32386 + path: deaths_vax_status_switzerland.csv diff --git a/snapshots/covid/latest/deaths_vax_status_us.csv.dvc b/snapshots/covid/latest/deaths_vax_status_us.csv.dvc new file mode 100644 index 00000000000..f6f360e0762 --- /dev/null +++ b/snapshots/covid/latest/deaths_vax_status_us.csv.dvc @@ -0,0 +1,30 @@ +# Learn more at: +# http://docs.owid.io/projects/etl/architecture/metadata/reference/ +meta: + origin: + # Data product / Snapshot + title: Rates of COVID-19 Cases or Deaths by Age Group and Updated (Bivalent) Booster Status + description: |- + Data for CDC’s COVID Data Tracker site on Rates of COVID-19 Cases and Deaths by Updated (Bivalent) Booster Status. + + These data were posted and archived on May 30, 2023 and reflect cases among persons with a positive specimen collection date through April 22, 2023, and deaths among persons with a positive specimen collection date through April 1, 2023. These data will no longer be updated after May 2023. + + date_published: "2023-06-01" + + # Citation + producer: Centers for Disease Control and Prevention + citation_full: |- + Centers for Disease Control and Prevention, COVID-19 Response. Rates of COVID-19 Cases or Deaths by Age Group and Updated (Bivalent) Booster Status Public Use Data. 2023.
+ + # Files + url_main: https://data.cdc.gov/Public-Health-Surveillance/Rates-of-COVID-19-Cases-or-Deaths-by-Age-Group-and/54ys-qyzm + date_accessed: 2024-09-16 + + # License + license: + name: CC BY 4.0 + +outs: + - md5: ac0e4788e1e803013fb6eaca215173ca + size: 27322 + path: deaths_vax_status_us.csv