Commit

Merge remote-tracking branch 'origin/master' into wizard-anomalist
Marigold committed Oct 21, 2024
2 parents f0c131c + 667116d commit b00bf99
Showing 47 changed files with 1,743 additions and 78 deletions.
8 changes: 8 additions & 0 deletions dag/archive/main.yml
@@ -308,6 +308,14 @@ steps:
data://grapher/un/2023-09-19/long_run_child_mortality:
- data://garden/un/2023-08-29/long_run_child_mortality

# Oil Spills
data://meadow/itopf/2023-05-18/oil_spills:
- snapshot://itopf/2023-05-18/oil_spills.pdf
data://garden/itopf/2023-05-18/oil_spills:
- data://meadow/itopf/2023-05-18/oil_spills
data://grapher/itopf/2023-05-18/oil_spills:
- data://garden/itopf/2023-05-18/oil_spills

include:
# Include all active steps plus all archive steps.
- dag/main.yml
12 changes: 12 additions & 0 deletions dag/archive/urbanization.yml
@@ -0,0 +1,12 @@
steps:
#
# GHSL degree of urbanization.
#
data://meadow/urbanization/2024-01-26/ghsl_degree_of_urbanisation:
- snapshot://urbanization/2024-01-26/ghsl_degree_of_urbanisation.zip
data://garden/urbanization/2024-01-26/ghsl_degree_of_urbanisation:
- data://meadow/urbanization/2024-01-26/ghsl_degree_of_urbanisation
- data://garden/wb/2023-04-30/income_groups
- data://garden/regions/2023-01-01/regions
data://grapher/urbanization/2024-01-26/ghsl_degree_of_urbanisation:
- data://garden/urbanization/2024-01-26/ghsl_degree_of_urbanisation
17 changes: 10 additions & 7 deletions dag/health.yml
@@ -844,15 +844,18 @@ steps:
- data://garden/antibiotics/2024-10-09/gram_children


# Cervical cancer incidence rates GCO - Cancer Today (2022)
data://meadow/cancer/2024-10-13/gco_cancer_today_cervical:
- snapshot://cancer/2024-10-13/gco_cancer_today_cervical.csv
# Cervical cancer incidence rates GCO - Cancer Over Time
data://meadow/cancer/2024-10-13/gco_cancer_over_time_cervical:
- snapshot://cancer/2024-10-13/gco_cancer_over_time_cervical.csv
data://garden/cancer/2024-10-13/gco_cervical_cancer:
- data://meadow/cancer/2024-10-13/gco_cancer_today_cervical
data://garden/cancer/2024-10-13/gco_cancer_over_time_cervical:
- data://meadow/cancer/2024-10-13/gco_cancer_over_time_cervical
data://grapher/cancer/2024-10-13/gco_cervical_cancer:
- data://garden/cancer/2024-10-13/gco_cervical_cancer
data://grapher/cancer/2024-10-13/gco_cancer_over_time_cervical:
- data://garden/cancer/2024-10-13/gco_cancer_over_time_cervical

# Cervical cancer incidence rates GCO - Cancer Today (2022)
data://meadow/cancer/2024-10-13/gco_cancer_today_cervical:
- snapshot://cancer/2024-10-13/gco_cancer_today_cervical.csv
data://garden/cancer/2024-10-13/gco_cancer_today_cervical:
- data://meadow/cancer/2024-10-13/gco_cancer_today_cervical
data://grapher/cancer/2024-10-13/gco_cancer_today_cervical:
- data://garden/cancer/2024-10-13/gco_cancer_today_cervical
15 changes: 7 additions & 8 deletions dag/main.yml
@@ -268,14 +268,6 @@ steps:
data://grapher/eth/2023-03-15/ethnic_power_relations:
- data://garden/eth/2023-03-15/ethnic_power_relations

# Oil Spills
data://meadow/itopf/2023-05-18/oil_spills:
- snapshot://itopf/2023-05-18/oil_spills.pdf
data://garden/itopf/2023-05-18/oil_spills:
- data://meadow/itopf/2023-05-18/oil_spills
data://grapher/itopf/2023-05-18/oil_spills:
- data://garden/itopf/2023-05-18/oil_spills

# International Monetary Fund, World Economic Outlook
data://meadow/imf/2024-05-02/world_economic_outlook:
- snapshot://imf/2024-05-02/world_economic_outlook.xls
@@ -802,6 +794,13 @@ steps:
data://grapher/oecd/2024-08-21/official_development_assistance:
- data://garden/oecd/2024-08-21/official_development_assistance

# Oil Spills
data://meadow/itopf/2024-10-16/oil_spills:
- snapshot://itopf/2024-10-16/oil_spills.pdf
data://garden/itopf/2024-10-16/oil_spills:
- data://meadow/itopf/2024-10-16/oil_spills
data://grapher/itopf/2024-10-16/oil_spills:
- data://garden/itopf/2024-10-16/oil_spills
include:
- dag/open_numbers.yml
- dag/faostat.yml
12 changes: 12 additions & 0 deletions dag/urbanization.yml
@@ -64,3 +64,15 @@ steps:
- data://meadow/un/2024-02-14/sdgs_urbanization
data://grapher/un/2024-02-14/sdgs_urbanization:
- data://garden/un/2024-02-14/sdgs_urbanization

#
# GHSL degree of urbanization.
#
data://meadow/urbanization/2024-10-14/ghsl_degree_of_urbanisation:
- snapshot://urbanization/2024-10-14/ghsl_degree_of_urbanisation.xlsx
data://garden/urbanization/2024-10-14/ghsl_degree_of_urbanisation:
- data://meadow/urbanization/2024-10-14/ghsl_degree_of_urbanisation
- data://garden/wb/2024-07-29/income_groups
- data://garden/regions/2023-01-01/regions
data://grapher/urbanization/2024-10-14/ghsl_degree_of_urbanisation:
- data://garden/urbanization/2024-10-14/ghsl_degree_of_urbanisation
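
Each entry in the hunk above maps a step to the steps or snapshots it is built from. As a rough illustration only, and not the ETL's own scheduler, the sketch below orders the GHSL steps with Python's standard `graphlib` module. The `dag` dict is hand-copied from the hunk, and `execution_order` is a hypothetical helper introduced for this example.

```python
from graphlib import TopologicalSorter

# Hand-copied from the dag/urbanization.yml hunk: keys are steps,
# values are the dependencies each step is built from.
dag = {
    "data://meadow/urbanization/2024-10-14/ghsl_degree_of_urbanisation": [
        "snapshot://urbanization/2024-10-14/ghsl_degree_of_urbanisation.xlsx",
    ],
    "data://garden/urbanization/2024-10-14/ghsl_degree_of_urbanisation": [
        "data://meadow/urbanization/2024-10-14/ghsl_degree_of_urbanisation",
        "data://garden/wb/2024-07-29/income_groups",
        "data://garden/regions/2023-01-01/regions",
    ],
    "data://grapher/urbanization/2024-10-14/ghsl_degree_of_urbanisation": [
        "data://garden/urbanization/2024-10-14/ghsl_degree_of_urbanisation",
    ],
}


def execution_order(dag):
    """Hypothetical helper: list steps so every dependency precedes its dependents."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(dag)
for step in order:
    print(step)
```

The exact order of independent steps (for example the snapshot versus `income_groups`) is not fixed; only the meadow, garden, grapher ordering is guaranteed by the dependencies.
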
2 changes: 1 addition & 1 deletion etl/config.py
@@ -427,7 +427,7 @@ def admin_api(self) -> str:
elif self.env_remote == "staging":
return f"http://{self.conf.DB_HOST}.tail6e23.ts.net/admin/api"
elif self.env_remote == "dev":
return "http://localhost:3000/admin/api"
return "http://localhost:3030/admin/api"
else:
raise ValueError(f"Unknown environment: {self.env}")

Original file line number Diff line number Diff line change
@@ -21,6 +21,7 @@ tables:
variables:
antibiotic_usage__pct:
title: Antibiotic usage in children
description_short: The caregiver reported share of children under five years old, with symptoms of lower respiratory tract infection, who received antibiotics for this illness.
lower_uncertainty_interval__pct:
title: Antibiotic usage in children, lower uncertainty interval
upper_uncertainty_interval__pct:
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
{
"Argentina": "Argentina",
"Australia": "Australia",
"Austria": "Austria",
"Bahrain": "Bahrain",
"Belarus": "Belarus",
"Canada": "Canada",
"Chile": "Chile",
"China": "China",
"Colombia": "Colombia",
"Costa Rica": "Costa Rica",
"Croatia": "Croatia",
"Cyprus": "Cyprus",
"Czechia": "Czechia",
"Denmark": "Denmark",
"Ecuador": "Ecuador",
"Estonia": "Estonia",
"Finland": "Finland",
"Germany": "Germany",
"Iceland": "Iceland",
"India": "India",
"Ireland": "Ireland",
"Israel": "Israel",
"Italy": "Italy",
"Japan": "Japan",
"Korea, Republic of": "South Korea",
"Kuwait": "Kuwait",
"Latvia": "Latvia",
"Lithuania": "Lithuania",
"Malta": "Malta",
"New Zealand": "New Zealand",
"Norway": "Norway",
"Philippines": "Philippines",
"Poland": "Poland",
"Puerto Rico": "Puerto Rico",
"Qatar": "Qatar",
"Slovenia": "Slovenia",
"Spain": "Spain",
"Sweden": "Sweden",
"Switzerland": "Switzerland",
"Thailand": "Thailand",
"USA": "United States",
"Uganda": "Uganda",
"France (metropolitan)": "France",
"France, Martinique": "Martinique",
"The Netherlands": "Netherlands",
"T\u00fcrkiye": "Turkey",
"UK, England": "England",
"UK, Northern Ireland": "Northern Ireland",
"UK, Scotland": "Scotland",
"UK, Wales": "Wales",
"USA: Black": "United States (Black)",
"USA: White": "United States (White)"
}
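
The mapping above pairs each source country name with a standardized one. A minimal sketch of how such a mapping could be applied, assuming a plain dict lookup: `harmonize` is a hypothetical stand-in and not the implementation of `geo.harmonize_countries`, whose handling of unmapped names may differ. Raising on unknown names is an assumption made here for illustration.

```python
# A few entries copied from the mapping above; the full file has more.
COUNTRY_MAPPING = {
    "Korea, Republic of": "South Korea",
    "USA": "United States",
    "France (metropolitan)": "France",
    "UK, England": "England",
}


def harmonize(names, mapping):
    """Hypothetical stand-in: map each source name, failing loudly on unmapped names."""
    missing = sorted({name for name in names if name not in mapping})
    if missing:
        # Assumption for this sketch: unmapped names are treated as an error.
        raise KeyError(f"No mapping for: {missing}")
    return [mapping[name] for name in names]


harmonized = harmonize(["USA", "UK, England", "USA"], COUNTRY_MAPPING)
print(harmonized)
```
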
Original file line number Diff line number Diff line change
@@ -13,13 +13,13 @@ dataset:


tables:
gco_cancer_today_cervical:
gco_cancer_over_time_cervical:
variables:
asr:
title: Age-standardized cervical cancer incidence rate per 100,000 women
unit: 'per 100,000 women'
description_short: |-
Estimated number of new cervical [cancer](#dod:cancer) cases per 100,000 women.
Reported number of new cervical [cancer](#dod:cancer) cases per 100,000 women, based on data from cancer registries. Comparisons may be affected by differences in measurement, including screening and diagnosis.
description_from_producer: |-
An age-standardized rate (ASR) is a summary measure of the rate that would have been observed if the population had a standard age structure. Standardization is necessary when comparing several populations that differ with respect to age, because age has a strong influence on the risk of cancer. An ASR is a weighted mean of the age-specific rates; the weighting is based on the population distribution of a standard population. The most frequently used standard population is the World (W) Standard Population. The calculated incidence rate is then called the age-standardized incidence or mortality rate (W), and is expressed per 100 000 person-years. The World Standard Population used in GLOBOCAN was first proposed by Segi (1960)a and later modified by Doll et al. (1966)b.
presentation:
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
"""Load a meadow dataset and create a garden dataset."""

import owid.catalog.processing as pr

from etl.data_helpers import geo
from etl.helpers import PathFinder, create_dataset
@@ -13,18 +12,15 @@ def run(dest_dir: str) -> None:
#
# Load inputs.
#
# Load meadow datasets for Cancer Today and Cancer Over Time datasets.
ds_meadow_over_time = paths.load_dataset("gco_cancer_over_time_cervical")
ds_meadow_today = paths.load_dataset("gco_cancer_today_cervical")
# Load meadow dataset.
ds_meadow = paths.load_dataset("gco_cancer_over_time_cervical")

# Read table.
tb = ds_meadow["gco_cancer_over_time_cervical"].reset_index()

# Read tables from meadow datasets.
tb_over_time = ds_meadow_over_time["gco_cancer_over_time_cervical"].reset_index()
tb_today = ds_meadow_today["gco_cancer_today_cervical"].reset_index()
#
# Process data.
#
tb = pr.merge(tb_today, tb_over_time, on=["country", "year", "asr"], how="outer")

tb = geo.harmonize_countries(df=tb, countries_file=paths.country_mapping_path)
tb = tb.format(["country", "year"])

@@ -33,7 +29,7 @@ def run(dest_dir: str) -> None:
#
# Create a new garden dataset with the same metadata as the meadow dataset.
ds_garden = create_dataset(
dest_dir, tables=[tb], check_variables_metadata=True, default_metadata=ds_meadow_over_time.metadata
dest_dir, tables=[tb], check_variables_metadata=True, default_metadata=ds_meadow.metadata
)

# Save changes in the new garden dataset.
Original file line number Diff line number Diff line change
@@ -160,7 +160,6 @@
"Trinidad and Tobago": "Trinidad and Tobago",
"Tunisia": "Tunisia",
"Turkmenistan": "Turkmenistan",
"USA": "United States",
"Uganda": "Uganda",
"Ukraine": "Ukraine",
"United Arab Emirates": "United Arab Emirates",
@@ -184,11 +183,5 @@
"Korea, Democratic People Republic of": "North Korea",
"The Netherlands": "Netherlands",
"The Republic of the Gambia": "Gambia",
"T\u00fcrkiye": "Turkey",
"UK, England": "England",
"UK, Northern Ireland": "Northern Ireland",
"UK, Scotland": "Scotland",
"UK, Wales": "Wales",
"USA: Black": "United States (Black)",
"USA: White": "United States (White)"
"T\u00fcrkiye": "Turkey"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
common:
presentation:
topic_tags:
- Cancer


# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/
dataset:
update_period_days: 365


tables:
gco_cancer_today_cervical:
variables:
asr:
title: Age-standardized cervical cancer incidence rate per 100,000 women
unit: 'per 100,000 women'
description_short: |-
Estimated number of new cervical [cancer](#dod:cancer) cases per 100,000 women.
description_from_producer: |-
An age-standardized rate (ASR) is a summary measure of the rate that would have been observed if the population had a standard age structure. Standardization is necessary when comparing several populations that differ with respect to age, because age has a strong influence on the risk of cancer. An ASR is a weighted mean of the age-specific rates; the weighting is based on the population distribution of a standard population. The most frequently used standard population is the World (W) Standard Population. The calculated incidence rate is then called the age-standardized incidence or mortality rate (W), and is expressed per 100 000 person-years. The World Standard Population used in GLOBOCAN was first proposed by Segi (1960)a and later modified by Doll et al. (1966)b.
presentation:
grapher_config:
note: To allow for comparisons between countries and over time, this metric is [age-standardized](#dod:age_standardized). The methods of estimation are country-specific, and the quality of the national estimates depends on the coverage, accuracy, and timeliness of the recorded incidence and mortality data in a given country.
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
"""Load a meadow dataset and create a garden dataset."""

from etl.data_helpers import geo
from etl.helpers import PathFinder, create_dataset

# Get paths and naming conventions for current step.
paths = PathFinder(__file__)


def run(dest_dir: str) -> None:
#
# Load inputs.
#
# Load meadow dataset.
ds_meadow = paths.load_dataset("gco_cancer_today_cervical")

# Read table.

tb = ds_meadow["gco_cancer_today_cervical"].reset_index()
#
# Process data.
#

tb = geo.harmonize_countries(df=tb, countries_file=paths.country_mapping_path)
tb = tb.format(["country", "year"])

#
# Save outputs.
#
# Create a new garden dataset with the same metadata as the meadow dataset.
ds_garden = create_dataset(
dest_dir, tables=[tb], check_variables_metadata=True, default_metadata=ds_meadow.metadata
)

# Save changes in the new garden dataset.
ds_garden.save()