Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 75b7f9b commit d7c77ff
Showing 4 changed files with 346 additions and 4 deletions.
149 changes: 149 additions & 0 deletions
etl/steps/data/garden/war/2024-08-26/ucdp_prio.meta.yml
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,149 @@ | ||
# NOTE: To learn more about the fields, hover over their names. | ||
definitions: | ||
all: | ||
description_short: |- | ||
<%- if conflict_type == "all" -%> | ||
The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in interstate, intrastate, extrasystemic, non-state conflicts, and one-sided violence that were ongoing that year<< per_capita >>. | ||
<%- elif conflict_type == "state-based" -%> | ||
The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in interstate, intrastate, and extrasystemic conflicts that were ongoing that year<< per_capita >>. | ||
<%- elif conflict_type == "intrastate (internationalized)" -%> | ||
The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in internationalized intrastate conflicts that were ongoing that year<< per_capita >>. | ||
<%- elif conflict_type == "intrastate (non-internationalized)" -%> | ||
The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in non-internationalized intrastate conflicts that were ongoing that year<< per_capita >>. | ||
<%- elif conflict_type == "one-sided violence" -%> | ||
The << estimate >> estimate of the number of deaths of civilians from one-sided violence that was ongoing that year<< per_capita >>. | ||
<%- elif conflict_type == "non-state conflict" -%> | ||
The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in non-state conflicts that were ongoing that year<< per_capita >>. | ||
<%- else -%> | ||
The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in << conflict_type >> conflicts that were ongoing that year<< per_capita >>. | ||
<%- endif -%> | ||
description_short_per_capita: <% set per_capita = ", per 100,000 people" %> | ||
{definitions.all.description_short} | ||
conflict_type_base: |- | ||
This includes combatant and civilian deaths due to fighting | ||
conflict_type: |- | ||
<%- if conflict_type == "all" -%> | ||
An armed conflict is a disagreement between organized groups, or between one organized group and civilians, that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}. | ||
<%- elif conflict_type == "state-based" -%> | ||
A state-based conflict is a conflict between two armed groups, at least one of which is a state, that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}. | ||
<%- elif conflict_type == "interstate" -%> | ||
An interstate conflict is a conflict between states that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}. | ||
<%- elif conflict_type == "intrastate" -%> | ||
An intrastate conflict is a conflict between a state and a non-state armed group that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}. If a foreign state is involved, it is called "internationalized", and "non-internationalized" otherwise. | ||
<%- elif conflict_type == "intrastate (internationalized)" -%> | ||
An internationalized intrastate conflict is a conflict between a state and a non-state armed group, with involvement of a foreign state, that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}. | ||
<%- elif conflict_type == "intrastate (non-internationalized)" -%> | ||
A non-internationalized intrastate conflict is a conflict between a state and a non-state armed group, without involvement of a foreign state, that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}. | ||
<%- elif conflict_type == "extrasystemic" -%> | ||
An extrasystemic conflict is a conflict between a state and a non-state armed group outside its territory that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}. | ||
<%- elif conflict_type == "non-state conflict" -%> | ||
A non-state conflict is a conflict between non-state armed groups, such as rebel groups, criminal organizations, or ethnic groups, that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}. | ||
<%- elif conflict_type == "one-sided violence" -%> | ||
One-sided violence is the use of armed force by a state or non-state armed group against civilians that causes at least 25 civilian deaths during a year. | ||
<%- endif -%> | ||
common: | ||
presentation: | ||
topic_tags: | ||
- War & Peace | ||
grapher_config: | ||
selectedEntityNames: | ||
- Africa | ||
- Americas | ||
- Asia and Oceania | ||
- Europe | ||
- Middle East | ||
description_key: | ||
- |- | ||
{definitions.all.conflict_type} | ||
description_processing: |- | ||
Data prior to 1989 is sourced from PRIO. Data since 1989 is sourced from UCDP. | ||
display: | ||
numDecimalPlaces: 0 | ||
|
||
# Learn more about the available fields: | ||
# http://docs.owid.io/projects/etl/architecture/metadata/reference/dataset/ | ||
dataset: | ||
update_period_days: 365 | ||
title: UCDP/PRIO, History of war | ||
|
||
# Learn more about the available fields: | ||
# http://docs.owid.io/projects/etl/architecture/metadata/reference/tables/ | ||
tables: | ||
# MAIN INDICATORS | ||
ucdp_prio: | ||
variables: | ||
################## | ||
# Ongoing deaths # | ||
################## | ||
number_deaths_ongoing_conflicts: | ||
title: Deaths in ongoing conflicts (best estimate) | ||
unit: deaths | ||
description_short: |- | ||
<% set estimate = "best" %> | ||
{definitions.all.description_short} | ||
description_processing: |- | ||
{definitions.common.description_processing} | ||
For conflict years without a best deaths estimate in the PRIO data, we conservatively coded the low estimate. | ||
number_deaths_ongoing_conflicts_high: | ||
title: Deaths in ongoing conflicts (high estimate) | ||
unit: deaths | ||
description_short: |- | ||
<% set estimate = "high" %> | ||
{definitions.all.description_short} | ||
number_deaths_ongoing_conflicts_low: | ||
title: Deaths in ongoing conflicts (low estimate) | ||
unit: deaths | ||
description_short: |- | ||
<% set estimate = "low" %> | ||
{definitions.all.description_short} | ||
number_deaths_ongoing_conflicts_per_capita: | ||
title: Death rate in ongoing conflicts (best estimate) | ||
unit: deaths per 100,000 people | ||
description_short: |- | ||
<% set estimate = "best" %> | ||
{definitions.all.description_short_per_capita} | ||
description_processing: |- | ||
{definitions.common.description_processing} | ||
For conflict years without a best deaths estimate in the PRIO data, we conservatively coded the low estimate. | ||
display: | ||
numDecimalPlaces: 1 | ||
|
||
number_deaths_ongoing_conflicts_high_per_capita: | ||
title: Death rate in ongoing conflicts (high estimate) | ||
unit: deaths per 100,000 people | ||
description_short: |- | ||
<% set estimate = "high" %> | ||
{definitions.all.description_short_per_capita} | ||
display: | ||
numDecimalPlaces: 1 | ||
|
||
number_deaths_ongoing_conflicts_low_per_capita: | ||
title: Death rate in ongoing conflicts (low estimate) | ||
unit: deaths per 100,000 people | ||
description_short: |- | ||
<% set estimate = "low" %> | ||
{definitions.all.description_short_per_capita} | ||
display: | ||
numDecimalPlaces: 1 |
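The `description_short` template above branches on `conflict_type` and fills in `estimate` and `per_capita`. As a rough illustration of the text it produces, here is a plain-Python stand-in. The function name `describe_deaths` and its signature are hypothetical; the ETL renders the YAML template itself and does not use this helper.

```python
def describe_deaths(conflict_type: str, estimate: str, per_capita: bool = False) -> str:
    """Hypothetical stand-in mirroring the branches of the `description_short` template."""
    # Mirrors `description_short_per_capita`, which sets the ", per 100,000 people" suffix.
    suffix = ", per 100,000 people" if per_capita else ""

    # One-sided violence is the only branch that counts civilian deaths alone.
    if conflict_type == "one-sided violence":
        return (
            f"The {estimate} estimate of the number of deaths of civilians "
            f"from one-sided violence that was ongoing that year{suffix}."
        )

    # Branches with a hand-written list of conflict types.
    scopes = {
        "all": "interstate, intrastate, extrasystemic, non-state conflicts, and one-sided violence",
        "state-based": "interstate, intrastate, and extrasystemic conflicts",
        "intrastate (internationalized)": "internationalized intrastate conflicts",
        "intrastate (non-internationalized)": "non-internationalized intrastate conflicts",
        "non-state conflict": "non-state conflicts",
    }
    # Fallback (the template's `else` branch): interpolate the conflict type directly.
    scope = scopes.get(conflict_type, f"{conflict_type} conflicts")
    return (
        f"The {estimate} estimate of the number of deaths of combatants and civilians "
        f"due to fighting in {scope} that were ongoing that year{suffix}."
    )


if __name__ == "__main__":
    print(describe_deaths("interstate", "best"))
    print(describe_deaths("state-based", "high", per_capita=True))
```

Conflict types without a dedicated branch, such as `interstate` and `extrasystemic`, fall through to the generic wording.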
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,147 @@ | ||
"""Load a meadow dataset and create a garden dataset.""" | ||
|
||
import owid.catalog.processing as pr | ||
from owid.catalog import Table | ||
from shared import add_indicators_extra | ||
|
||
from etl.helpers import PathFinder, create_dataset | ||
|
||
# Get paths and naming conventions for current step. | ||
paths = PathFinder(__file__) | ||
# Index columns | ||
COLUMNS_INDEX = ["year", "region", "conflict_type"] | ||
COLUMNS_INDEX_COUNTRY = ["year", "country", "conflict_type"] | ||
# Column renames, one dictionary per dataset. All dictionaries should have the same number of entries and map to the same set of target names. | ||
COLUMNS_RENAME = { | ||
"ucdp": { | ||
"number_deaths_ongoing_conflicts": "number_deaths_ongoing_conflicts", | ||
"number_deaths_ongoing_conflicts_high": "number_deaths_ongoing_conflicts_high", | ||
"number_deaths_ongoing_conflicts_low": "number_deaths_ongoing_conflicts_low", | ||
}, | ||
"prio": { | ||
"number_deaths_ongoing_conflicts_battle_low": "number_deaths_ongoing_conflicts_low", | ||
"number_deaths_ongoing_conflicts_battle_high": "number_deaths_ongoing_conflicts_high", | ||
"number_deaths_ongoing_conflicts_battle": "number_deaths_ongoing_conflicts", | ||
}, | ||
} | ||
# Indicator columns | ||
COLUMNS_INDICATORS = list(COLUMNS_RENAME["ucdp"].values()) | ||
# First year in UCDP | ||
YEAR_UCDP_MIN = 1989 | ||
|
||
|
||
def run(dest_dir: str) -> None: | ||
# | ||
# Load inputs. | ||
# | ||
# Load meadow dataset. | ||
ds_ucdp = paths.load_dataset("ucdp") | ||
# Read table from meadow dataset. | ||
tb_ucdp = ds_ucdp["ucdp"].reset_index() | ||
# tb_ucdp_countries = ds_ucdp["ucdp_country"].reset_index() | ||
|
||
# Load meadow dataset. | ||
ds_prio = paths.load_dataset("prio_v31") | ||
# Read table from meadow dataset. | ||
tb_prio = ds_prio["prio_v31"].reset_index() | ||
# tb_prio_countries = ds_prio["prio_v31_country"].reset_index() | ||
|
||
# Read regions table from the Gleditsch dataset | ||
ds_gw = paths.load_dataset("gleditsch") | ||
tb_regions = ds_gw["gleditsch_regions"].reset_index() | ||
|
||
# | ||
# Process data. | ||
# | ||
## Remove suffix (PRIO) or (UCDP/PRIO) | ||
tb_ucdp["region"] = tb_ucdp["region"].str.replace(r" \(.+\)", "", regex=True) | ||
tb_prio["region"] = tb_prio["region"].str.replace(r" \(.+\)", "", regex=True) | ||
|
||
## In PRIO, change conflict_type 'all' to 'state-based' | ||
tb_prio["conflict_type"] = tb_prio["conflict_type"].replace({"all": "state-based"}) | ||
|
||
# Sanity checks | ||
assert set(tb_ucdp["region"]) == set(tb_prio["region"]), "Mismatch in regions between UCDP and PRIO" | ||
expected_mismatch = {"non-state conflict", "one-sided violence", "all"} | ||
assert ( | ||
set(tb_ucdp["conflict_type"]) - set(tb_prio["conflict_type"]) == expected_mismatch | ||
), "Mismatch in conflict_type between UCDP and PRIO not as expected!" | ||
|
||
# Rename columns, keep relevant indicators | ||
tb_ucdp = tb_ucdp.rename(columns=COLUMNS_RENAME["ucdp"])[COLUMNS_INDEX + COLUMNS_INDICATORS] | ||
tb_prio = tb_prio.rename(columns=COLUMNS_RENAME["prio"])[COLUMNS_INDEX + COLUMNS_INDICATORS] | ||
|
||
# Keep relevant years for each dataset | ||
tb_ucdp = tb_ucdp.dropna(subset=COLUMNS_INDICATORS, how="all") | ||
assert tb_ucdp["year"].min() == YEAR_UCDP_MIN, "UCDP year min is not as expected!" | ||
tb_prio = tb_prio[tb_prio["year"] < YEAR_UCDP_MIN] | ||
|
||
# Concatenate | ||
tb = pr.concat([tb_ucdp, tb_prio], axis=0, ignore_index=True, short_name=paths.short_name) | ||
|
||
# Add conflict rates | ||
tb = add_indicators_extra( | ||
tb, | ||
tb_regions, | ||
columns_conflict_mortality=[ | ||
"number_deaths_ongoing_conflicts", | ||
"number_deaths_ongoing_conflicts_high", | ||
"number_deaths_ongoing_conflicts_low", | ||
], | ||
) | ||
|
||
# tb_country = make_tb_country(tb_ucdp_countries, tb_prio_countries) | ||
|
||
# Set index | ||
tb = tb.format(COLUMNS_INDEX) | ||
# tb_country = tb_country.set_index(COLUMNS_INDEX_COUNTRY, verify_integrity=True) | ||
|
||
# | ||
# Save outputs. | ||
# | ||
tables = [ | ||
tb, | ||
# tb_country, | ||
] | ||
# Create a new garden dataset with the same metadata as the meadow dataset. | ||
ds_garden = create_dataset( | ||
dest_dir, tables=tables, check_variables_metadata=True, default_metadata=ds_ucdp.metadata | ||
) | ||
|
||
# Save changes in the new garden dataset. | ||
ds_garden.save() | ||
|
||
|
||
def make_tb_country(tb_ucdp_countries: Table, tb_prio_countries: Table) -> Table: | ||
"""Combine UCDP and PRIO country data.""" | ||
|
||
# PRIO 'all' conflict is actually 'state-based' | ||
tb_prio_countries["conflict_type"] = tb_prio_countries["conflict_type"].replace({"all": "state-based"}) | ||
|
||
# Sanity checks | ||
assert set(tb_ucdp_countries["conflict_type"]) - set(tb_prio_countries["conflict_type"]) == { | ||
"one-sided violence" | ||
}, "Mismatch in conflict_type between UCDP and PRIO (country) not as expected!" | ||
assert set(tb_prio_countries["conflict_type"]) - set(tb_ucdp_countries["conflict_type"]) == { | ||
"extrasystemic" | ||
}, "Mismatch in conflict_type between UCDP and PRIO (country) not as expected!" | ||
|
||
# Preserve only pre-UCDP-time data in PRIO | ||
assert tb_ucdp_countries["year"].min() == YEAR_UCDP_MIN, "UCDP year min is not as expected!" | ||
tb_prio_countries = tb_prio_countries[tb_prio_countries["year"] < YEAR_UCDP_MIN] | ||
|
||
# Fix extrasystemic: UCDP has no data for extrasystemic, we add zeroes. | ||
## Sanity check: no extrasystemic coming from UCDP | ||
assert "extrasystemic" not in set(tb_ucdp_countries["conflict_type"]), "Extrasystemic conflicts found in UCDP!" | ||
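# Build extrasystemic data for UCDP (all zeroes). Copy first to avoid assigning into a slice of the original table. | ||
tb_extra = tb_ucdp_countries[tb_ucdp_countries["conflict_type"] == "interstate"].copy() | ||
tb_extra["conflict_type"] = "extrasystemic" | ||
tb_extra["participated_in_conflict"] = 0 | ||
## Concatenate with the original UCDP table, so the zero rows are kept in the final concatenation below | ||
tb_ucdp_countries = pr.concat([tb_ucdp_countries, tb_extra], ignore_index=True) | ||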
|
||
# Concatenate | ||
tb = pr.concat( | ||
[tb_ucdp_countries, tb_prio_countries], axis=0, ignore_index=True, short_name=f"{paths.short_name}_country" | ||
) | ||
return tb |
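The `run` step above splices two sources at a cutoff year: PRIO rows before 1989, UCDP rows from 1989 onwards, after stripping the source suffix from region names and recoding PRIO's `all` as `state-based`. A minimal, dependency-free sketch of that logic follows. It uses plain dictionaries in place of `owid.catalog` tables; the helper names `clean_region` and `splice`, the single `deaths` column, and the sample values are all made up for illustration.

```python
import re

YEAR_UCDP_MIN = 1989  # first year covered by UCDP, as in the step above


def clean_region(name: str) -> str:
    """Drop a trailing parenthesised source tag, e.g. 'Europe (PRIO)' -> 'Europe'."""
    return re.sub(r" \(.+\)", "", name)


def splice(ucdp_rows: list[dict], prio_rows: list[dict], cutoff: int = YEAR_UCDP_MIN) -> list[dict]:
    """Hypothetical helper: keep PRIO rows before `cutoff` and UCDP rows that have data."""
    prio = [
        {
            **row,
            "region": clean_region(row["region"]),
            # PRIO's 'all' is treated as 'state-based', as in the step above.
            "conflict_type": "state-based" if row["conflict_type"] == "all" else row["conflict_type"],
        }
        for row in prio_rows
        if row["year"] < cutoff
    ]
    # Stand-in for dropping rows where every indicator is missing.
    ucdp = [
        {**row, "region": clean_region(row["region"])}
        for row in ucdp_rows
        if row["deaths"] is not None
    ]
    return sorted(prio + ucdp, key=lambda row: row["year"])


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    ucdp_sample = [
        {"year": 1988, "region": "Europe (UCDP/PRIO)", "conflict_type": "state-based", "deaths": None},
        {"year": 1989, "region": "Europe (UCDP/PRIO)", "conflict_type": "state-based", "deaths": 20},
    ]
    prio_sample = [
        {"year": 1988, "region": "Europe (PRIO)", "conflict_type": "all", "deaths": 10},
        {"year": 1989, "region": "Europe (PRIO)", "conflict_type": "all", "deaths": 99},
    ]
    for row in splice(ucdp_sample, prio_sample):
        print(row)
```

Filtering PRIO strictly below the cutoff means each year comes from exactly one source, which is why the step asserts that the earliest UCDP year equals `YEAR_UCDP_MIN`.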
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
"""Load a garden dataset and create a grapher dataset.""" | ||
|
||
from etl.helpers import PathFinder, create_dataset | ||
|
||
# Get paths and naming conventions for current step. | ||
paths = PathFinder(__file__) | ||
|
||
|
||
def run(dest_dir: str) -> None: | ||
# | ||
# Load inputs. | ||
# | ||
# Load garden dataset. | ||
ds_garden = paths.load_dataset("ucdp_prio") | ||
|
||
# Read table from garden dataset. | ||
tb = ds_garden["ucdp_prio"] | ||
# tb_country = ds_garden["ucdp_prio_country"] | ||
|
||
# | ||
# Process data. | ||
# | ||
tb = tb.rename_index_names({"region": "country"}) | ||
|
||
# | ||
# Save outputs. | ||
# | ||
tables = [ | ||
tb, | ||
# tb_country, | ||
] | ||
# Create a new grapher dataset with the same metadata as the garden dataset. | ||
ds_grapher = create_dataset( | ||
dest_dir, tables=tables, check_variables_metadata=True, default_metadata=ds_garden.metadata | ||
) | ||
|
||
# Save changes in the new grapher dataset. | ||
ds_grapher.save() |