📊 covid: compact dataset #3164

Merged (8 commits) on Aug 21, 2024
52 changes: 39 additions & 13 deletions dag/covid.yml
@@ -1,4 +1,43 @@
steps:
  # Explorer
  data-private://explorers/covid/latest/covid:
    - data://grapher/covid/latest/cases_deaths
    - data://grapher/covid/latest/vaccinations_global
    - data://grapher/covid/latest/hospital
    - data-private://grapher/covid/latest/sequence
    - data://grapher/covid/latest/oxcgrt_policy
    - data://grapher/covid/latest/tracking_r
    - data://grapher/covid/latest/testing
    - data://grapher/covid/latest/combined
    - data-private://grapher/excess_mortality/latest/excess_mortality_economist
    - data://grapher/excess_mortality/latest/excess_mortality

  # Compact dataset (similar to former megafile)
  data://garden/covid/latest/compact:
    # COVID
    - data://garden/covid/latest/cases_deaths
    - data://garden/covid/latest/vaccinations_global
    - data://garden/covid/latest/hospital
    - data://garden/covid/latest/oxcgrt_policy
    - data://garden/covid/latest/tracking_r
    - data://garden/covid/latest/testing
    - data://garden/covid/latest/combined
    - data://garden/excess_mortality/latest/excess_mortality
    # Regions
    - data://garden/regions/2023-01-01/regions
    # Demography
    - data://garden/demography/2024-07-15/population
    - data://garden/demography/2023-10-09/life_expectancy
    - data://garden/un/2024-07-12/un_wpp
    # Econ
    - data://garden/wb/2024-03-27/world_bank_pip
    - data://garden/un/2024-04-09/undp_hdr
    # Health
    - data://garden/wash/2024-01-06/who
    - data://garden/who/2024-07-30/ghe
    # WDI
    - data://garden/worldbank_wdi/2024-05-20/wdi

  # Sequencing (variants)
  data-private://meadow/covid/latest/sequence:
    - snapshot-private://covid/latest/sequence.json
@@ -164,19 +203,6 @@ steps:
  data://grapher/covid/latest/vaccinations_global:
    - data://garden/covid/latest/vaccinations_global

  # Explorer
  data-private://explorers/covid/latest/covid:
    - data://grapher/covid/latest/cases_deaths
    - data://grapher/covid/latest/vaccinations_global
    - data://grapher/covid/latest/hospital
    - data-private://grapher/covid/latest/sequence
    - data://grapher/covid/latest/oxcgrt_policy
    - data://grapher/covid/latest/tracking_r
    - data://grapher/covid/latest/testing
    - data://grapher/covid/latest/combined
    - data-private://grapher/excess_mortality/latest/excess_mortality_economist
    - data://grapher/excess_mortality/latest/excess_mortality

  # Excess Mortality (HMD, WMD, Karlinsky and Kobak)
  data://meadow/excess_mortality/latest/hmd_stmf:
    - snapshot://excess_mortality/latest/hmd_stmf.csv
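
Each key under `steps` names a step, and the list beneath it names the steps it depends on. As a rough illustration of how such a mapping implies a build order, the sketch below hand-copies a few entries from this diff into a Python dict and sorts them with the standard-library `graphlib`. The `build_order` helper is hypothetical, and this is not necessarily how the ETL itself schedules steps.

```python
from graphlib import TopologicalSorter

# A few entries copied by hand from dag/covid.yml (step -> its dependencies).
dag = {
    "data-private://explorers/covid/latest/covid": [
        "data://grapher/covid/latest/cases_deaths",
        "data://grapher/covid/latest/vaccinations_global",
    ],
    "data://grapher/covid/latest/vaccinations_global": [
        "data://garden/covid/latest/vaccinations_global",
    ],
    "data://garden/covid/latest/compact": [
        "data://garden/covid/latest/cases_deaths",
        "data://garden/covid/latest/vaccinations_global",
    ],
}


def build_order(steps):
    """Return one valid order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(steps).static_order())


order = build_order(dag)
for step in order:
    print(step)
```

Several orders can be valid for the same graph; the only guarantee is that a step never appears before something it depends on.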
41 changes: 28 additions & 13 deletions docs/api/covid.md
@@ -2,33 +2,43 @@

This page is a compact summary of our COVID-19 work, with all the relevant links to download our COVID-19 datasets.

!!! tip "I just want [the data](#download-data)"
!!! tip "I just want [the data](#download-data)!"

## Our work

At Our World in Data, we have been collecting COVID-19 data from various domains since the pandemic started. We believe that to make progress against the outbreak of the Coronavirus disease – COVID-19 – we need to understand how the pandemic is developing. And for this, we need reliable and timely data. Therefore, we have focused our work on bringing together the research and statistics on the COVID-19 outbreak.

### Legacy data work

We started working on COVID-19 data in early 2020 when we developed and implemented several data pipelines to process and publish the data.
We started working on COVID-19 data in early 2020 when we developed and implemented several data pipelines to process and publish the data. All this work has been live and shared with the public via our GitHub repository [https://github.com/owid/covid-19-data](https://github.com/owid/covid-19-data), and our [old COVID documentation](https://docs.owid.io/projects/covid/en/latest/). We have complemented our data work with extensive research articles, which have been shared on our [topic page](https://ourworldindata.org/coronavirus).

All this work has been live and shared with the public via our GitHub repository [https://github.com/owid/covid-19-data](https://github.com/owid/covid-19-data), and our [old documentation](https://docs.owid.io/projects/covid/en/latest/).
### Publications

We have complemented our data work with extensive research articles, which have been shared on our Topic page: [https://ourworldindata.org/coronavirus](https://ourworldindata.org/coronavirus).
!!! abstract ""

### Publications
:material-file-document: Hasell, J., Mathieu, E., Beltekian, D. et al. **A cross-country database of COVID-19 testing**. _Sci Data_ 7, 345 (2020). [https://doi.org/10.1038/s41597-020-00688-8](https://doi.org/10.1038/s41597-020-00688-8)

!!! abstract ""

:material-file-document: Mathieu, E., Ritchie, H., Ortiz-Ospina, E. et al. **A global database of COVID-19 vaccinations**. _Nat Hum Behav_ 5, 947–953 (2021). [https://doi.org/10.1038/s41562-021-01122-8](https://doi.org/10.1038/s41562-021-01122-8)

!!! abstract ""

- Hasell, J., Mathieu, E., Beltekian, D. et al. **A cross-country database of COVID-19 testing**. _Sci Data_ 7, 345 (2020). [https://doi.org/10.1038/s41597-020-00688-8](https://doi.org/10.1038/s41597-020-00688-8)
- Mathieu, E., Ritchie, H., Ortiz-Ospina, E. et al. **A global database of COVID-19 vaccinations**. _Nat Hum Behav_ 5, 947–953 (2021). [https://doi.org/10.1038/s41562-021-01122-8](https://doi.org/10.1038/s41562-021-01122-8)
- Herre, B., Rodés-Guirao, L., Mathieu, E. et al. **Best practices for government agencies to publish data: lessons from COVID-19**. _The Lancet Public Health_, Viewpoint, Volume 9, ISSUE 6, e407-e410 (2024). [https://doi.org/10.1016/S2468-2667(24)00073-2](<https://doi.org/10.1016/S2468-2667(24)00073-2>)
:material-file-document: Herre, B., Rodés-Guirao, L., Mathieu, E. et al. **Best practices for government agencies to publish data: lessons from COVID-19**. _The Lancet Public Health_, Viewpoint, Volume 9, ISSUE 6, e407-e410 (2024). [https://doi.org/10.1016/S2468-2667(24)00073-2](<https://doi.org/10.1016/S2468-2667(24)00073-2>)

### Transition to ETL

All our COVID-19 data work was done before we had developed our [ETL system](../../architecture). In mid-2024, we decided to migrate all our COVID-19 data work into ETL, and make our data available from our catalog.

## Download data

Find below all COVID-19 data that we have collected. These files are direct exports from our ETL. We also provide metadata, which is essential to understand the various indicators and data licenses.
Our _compact COVID-19 dataset_ is a compilation of the most relevant COVID-19 indicators we have collected over the past few years. It consolidates indicators from various datasets into a single file and comes with metadata that explains every indicator in detail. In the past, this dataset was generated and shared in our [GitHub](https://github.com/owid/covid-19-data/blob/master/public/data) repository.

[:material-download: Download our compact dataset (CSV)](https://catalog.ourworldindata.org/garden/covid/latest/compact/compact.csv){ .md-button .md-button--primary }
[:material-download: Download metadata](https://catalog.ourworldindata.org/garden/covid/latest/compact/compact.meta.json){ .md-button }
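
For readers who prefer to fetch the file programmatically, here is a minimal sketch that uses only the Python standard library. The URL is the one behind the button above. The helper names (`read_rows`, `download_compact`) and the sample column names (`country`, `date`, `new_cases`) are assumptions made for illustration, so check the downloaded header for the real columns.

```python
import csv
import io
import urllib.request

COMPACT_CSV_URL = (
    "https://catalog.ourworldindata.org/garden/covid/latest/compact/compact.csv"
)


def read_rows(text_stream):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(text_stream))


def download_compact(url=COMPACT_CSV_URL):
    """Download the compact dataset and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        return read_rows(text)


# Offline illustration with made-up columns and values.
sample = io.StringIO(
    "country,date,new_cases\n"
    "Spain,2020-03-01,10\n"
    "Spain,2020-03-02,20\n"
)
rows = read_rows(sample)
print(rows[0]["country"], rows[0]["date"], rows[0]["new_cases"])
```

`csv.DictReader` returns every value as a string, so numeric columns need an explicit conversion before analysis.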

In addition to our compact dataset, we also provide individual datasets, with all our COVID-19 indicators. These files are direct exports from our ETL.


| | **:material-database: Data** | **:material-book: Metadata** |
| ------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -46,15 +56,20 @@ Find below all COVID-19 data that we have collected. These files are direct expo

All our COVID-19 data pipelines are specified in [our DAG](https://github.com/owid/etl/blob/master/dag/covid.yml).

!!! note "We rely on data providers for this data"
### Data providers

The data produced by third parties and made available by Our World in Data is subject to the license terms from the original third-party authors. We will always indicate the original source of the data in our database, and you should always check the license of any such third-party data before use.

Learn more about licensing in the metadata files.

The data produced by third parties and made available by Our World in Data is subject to the license terms from the original third-party authors. We will always indicate the original source of the data in our database, and you should always check the license of any such third-party data before use.
### Understanding our metadata
Our metadata contains all the relevant information about an indicator. This includes licenses, descriptions, units, etc. We use this metadata to bake our charts on our site.

Learn more about licensing in the metadata files.
!!! info "Learn more in our [metadata reference](../architecture/metadata/reference/)."
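
Because licenses live inside the metadata, one way to inspect them without knowing the exact file layout is to search the parsed JSON for every `license` key. The sketch below is a structure-agnostic helper. The `find_key` name and the sample document are invented for illustration and do not describe the real schema of `compact.meta.json`.

```python
import json


def find_key(node, key, path=""):
    """Yield (path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for name, value in node.items():
            child = f"{path}/{name}"
            if name == key:
                yield child, value
            yield from find_key(value, key, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_key(item, key, f"{path}/{index}")


# Made-up document; the real metadata layout may differ.
sample = json.loads(
    '{"tables": [{"origins": [{"license": {"name": "CC BY 4.0"}}]}]}'
)
matches = list(find_key(sample, "license"))
for location, value in matches:
    print(location, value)
```

To run it on the real file, parse the downloaded metadata with `json.load` and pass the result to `find_key`.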

## Access the data with our catalog

!!! warning "Our catalog library is in alpha"
!!! warning "Our catalog library is in alpha."

### Install our catalog package

58 changes: 58 additions & 0 deletions etl/steps/data/garden/covid/latest/compact.meta.yml
@@ -0,0 +1,58 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
  common:
    presentation:
      topic_tags:
        - COVID-19


# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/
dataset:
  update_period_days: 14


tables:
  compact:
    variables:
      # testing_variable:
      #   title: Testing variable title
      #   unit: arbitrary units
      #   short_unit: au
      #   description_short: Short description of testing variable.
      #   description_processing: Description of processing of testing variable.
      #   description_key: List of key points about the indicator.
      #   description_from_producer: Description of testing variable from producer.
      #   processing_level: minor
      #   type:
      #   sort:
      #   presentation:
      #     attribution:
      #     attribution_short:
      #     faqs:
      #     grapher_config:
      #     title_public:
      #     title_variant:
      #     topic_tags:
      #   display:
      #     name: Testing variable
      #     numDecimalPlaces: 0
      #     tolerance: 0
      #     color:
      #     conversionFactor: 1
      #     description:
      #     entityAnnotationsMap: Test annotation
      #     includeInTable:
      #     isProjection: false
      #     unit: arbitrary units
      #     shortUnit: au
      #     tableDisplay:
      #       hideAbsoluteChange:
      #       hideRelativeChange:
      #     yearIsDay: false
      #     zeroDay:
      #     roundingMode:
      #     numSignificantFigures:
      #
      {}
