Commit

update ucdp_prio
lucasrodes committed Aug 29, 2024
1 parent 75b7f9b commit d7c77ff
Showing 4 changed files with 346 additions and 4 deletions.
16 changes: 12 additions & 4 deletions dag/war.yml
@@ -117,12 +117,12 @@ steps:
     - data://garden/war/2023-09-27/peace_diehl
 
   # UCDP/PRIO
-  data://garden/war/2023-09-21/ucdp_prio:
-    - data://garden/war/2023-09-21/ucdp
+  data://garden/war/2024-08-26/ucdp_prio:
+    - data://garden/war/2024-08-26/ucdp
     - data://garden/war/2023-09-21/prio_v31
     - data://garden/countries/2023-09-25/gleditsch
-  data://grapher/war/2023-10-24/ucdp_prio:
-    - data://garden/war/2023-09-21/ucdp_prio
+  data://grapher/war/2024-08-26/ucdp_prio:
+    - data://garden/war/2024-08-26/ucdp_prio
 
   # Chupilkin and Koczan (Supplementary dataset to CoW)
   data://meadow/war/2023-11-29/chupilkin_koczan:
@@ -248,3 +248,11 @@ steps:
     - data://garden/demography/2023-03-31/population
   data://grapher/war/2023-09-21/ucdp:
     - data://garden/war/2023-09-21/ucdp
+
+  # UCDP/PRIO
+  data://garden/war/2023-09-21/ucdp_prio:
+    - data://garden/war/2023-09-21/ucdp
+    - data://garden/war/2023-09-21/prio_v31
+    - data://garden/countries/2023-09-25/gleditsch
+  data://grapher/war/2023-10-24/ucdp_prio:
+    - data://garden/war/2023-09-21/ucdp_prio
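Each dag entry above lists the steps it depends on. As a rough illustration of the ordering those entries imply, the new 2024-08-26 UCDP/PRIO steps can be sorted so that every dependency comes first. This is a sketch using Python's standard `graphlib`, not the etl's own scheduler; the `DAG` dict is simply a hand copy of the new entries.

```python
from graphlib import TopologicalSorter

# Hand copy of the new UCDP/PRIO dag entries.
# Each key is a step; its value is the set of steps it depends on.
DAG = {
    "data://garden/war/2024-08-26/ucdp_prio": {
        "data://garden/war/2024-08-26/ucdp",
        "data://garden/war/2023-09-21/prio_v31",
        "data://garden/countries/2023-09-25/gleditsch",
    },
    "data://grapher/war/2024-08-26/ucdp_prio": {
        "data://garden/war/2024-08-26/ucdp_prio",
    },
}

# static_order() yields every node (leaf dependencies included) such that
# each step appears after all of the steps it depends on.
order = list(TopologicalSorter(DAG).static_order())

for step in order:
    print(step)
```

If a step ever ended up depending on itself through another step, `static_order()` would raise `graphlib.CycleError` instead of returning an order.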
149 changes: 149 additions & 0 deletions etl/steps/data/garden/war/2024-08-26/ucdp_prio.meta.yml
@@ -0,0 +1,149 @@
# NOTE: To learn more about the fields, hover over their names.
definitions:
  all:
    description_short: |-
      <%- if conflict_type == "all" -%>
      The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in interstate, intrastate, extrasystemic, non-state conflicts, and one-sided violence that were ongoing that year<< per_capita >>.
      <%- elif conflict_type == "state-based" -%>
      The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in interstate, intrastate, and extrasystemic conflicts that were ongoing that year<< per_capita >>.
      <%- elif conflict_type == "intrastate (internationalized)" -%>
      The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in internationalized intrastate conflicts that were ongoing that year<< per_capita >>.
      <%- elif conflict_type == "intrastate (non-internationalized)" -%>
      The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in non-internationalized intrastate conflicts that were ongoing that year<< per_capita >>.
      <%- elif conflict_type == "one-sided violence" -%>
      The << estimate >> estimate of the number of deaths of civilians from one-sided violence that was ongoing that year<< per_capita >>.
      <%- elif conflict_type == "non-state conflict" -%>
      The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in non-state conflicts that were ongoing that year<< per_capita >>.
      <%- else -%>
      The << estimate >> estimate of the number of deaths of combatants and civilians due to fighting in << conflict_type >> conflicts that were ongoing that year<< per_capita >>.
      <%- endif -%>
    description_short_per_capita: <% set per_capita = ", per 100,000 people" %>
      {definitions.all.description_short}
    conflict_type_base: |-
      This includes combatant and civilian deaths due to fighting
    conflict_type: |-
      <%- if conflict_type == "all" -%>
      An armed conflict is a disagreement between organized groups, or between one organized group and civilians, that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}.
      <%- elif conflict_type == "state-based" -%>
      A state-based conflict is a conflict between two armed groups, at least one of which is a state, that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}.
      <%- elif conflict_type == "interstate" -%>
      An interstate conflict is a conflict between states that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}.
      <%- elif conflict_type == "intrastate" -%>
      An intrastate conflict is a conflict between a state and a non-state armed group that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}. If a foreign state is involved, it is called "internationalized", and "non-internationalized" otherwise.
      <%- elif conflict_type == "intrastate (internationalized)" -%>
      An internationalized intrastate conflict is a conflict between a state and a non-state armed group, with involvement of a foreign state, that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}.
      <%- elif conflict_type == "intrastate (non-internationalized)" -%>
      A non-internationalized intrastate conflict is a conflict between a state and a non-state armed group, without involvement of a foreign state, that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}.
      <%- elif conflict_type == "extrasystemic" -%>
      An extrasystemic conflict is a conflict between a state and a non-state armed group outside its territory that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}.
      <%- elif conflict_type == "non-state conflict" -%>
      A non-state conflict is a conflict between non-state armed groups, such as rebel groups, criminal organizations, or ethnic groups, that causes at least 25 deaths during a year. {definitions.all.conflict_type_base}.
      <%- elif conflict_type == "one-sided violence" -%>
      One-sided violence is the use of armed force by a state or non-state armed group against civilians that causes at least 25 civilian deaths during a year.
      <%- endif -%>
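In the `definitions` block above, values reference one another through `{definitions.<group>.<key>}` placeholders. A minimal sketch of how such references could be expanded, assuming plain recursive string substitution over the `definitions` mapping (`expand_refs` is a hypothetical helper, not the etl's implementation, and the long templates are shortened). Expanding `description_short_per_capita` shows why its `<% set per_capita = ... %>` prefix would work: after substitution it sits directly in front of the whole `description_short` template, so the template would see `per_capita` when rendered.

```python
import re

# Shortened copy of the `definitions.all` values above.
DEFINITIONS = {
    "all": {
        "description_short": "The << estimate >> estimate of deaths<< per_capita >>.",
        "description_short_per_capita": '<% set per_capita = ", per 100,000 people" %> {definitions.all.description_short}',
        "conflict_type_base": "This includes combatant and civilian deaths due to fighting",
        "conflict_type": "An interstate conflict is a conflict between states. {definitions.all.conflict_type_base}.",
    }
}

# Matches references such as {definitions.all.conflict_type_base}.
REF_PATTERN = re.compile(r"\{definitions\.([a-z_]+)\.([a-z_]+)\}")


def expand_refs(text: str, definitions: dict, depth: int = 0) -> str:
    """Hypothetical helper: replace {definitions.<group>.<key>} references recursively."""
    if depth > 10:
        raise ValueError("Reference nesting too deep (possible cycle)")

    def substitute(match: re.Match) -> str:
        group, key = match.group(1), match.group(2)
        # The referenced value may itself contain references, so expand it too.
        return expand_refs(definitions[group][key], definitions, depth + 1)

    return REF_PATTERN.sub(substitute, text)


print(expand_refs(DEFINITIONS["all"]["conflict_type"], DEFINITIONS))
print(expand_refs(DEFINITIONS["all"]["description_short_per_capita"], DEFINITIONS))
```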
  common:
    presentation:
      topic_tags:
        - War & Peace
      grapher_config:
        selectedEntityNames:
          - Africa
          - Americas
          - Asia and Oceania
          - Europe
          - Middle East
    description_key:
      - |-
        {definitions.all.conflict_type}
    description_processing: |-
      Data prior to 1989 is sourced from PRIO. Data since 1989 is sourced from UCDP.
    display:
      numDecimalPlaces: 0
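The `common` block above holds defaults shared by the indicators under `tables`. Assuming it behaves like a recursive merge in which indicator-level fields take precedence, the per-capita indicators would keep the shared `topic_tags` but override `numDecimalPlaces`. This is a sketch with a hypothetical `merge_defaults` helper and abridged metadata, not the etl's actual code.

```python
import copy

# Shared defaults, mirroring `definitions.common` (grapher_config omitted).
COMMON = {
    "presentation": {"topic_tags": ["War & Peace"]},
    "description_processing": "Data prior to 1989 is sourced from PRIO. Data since 1989 is sourced from UCDP.",
    "display": {"numDecimalPlaces": 0},
}

# Abridged metadata of number_deaths_ongoing_conflicts_per_capita.
INDICATOR = {
    "title": "Death rate in ongoing conflicts (best estimate)",
    "unit": "deaths per 100,000 people",
    "display": {"numDecimalPlaces": 1},
}


def merge_defaults(defaults: dict, overrides: dict) -> dict:
    """Hypothetical helper: recursively merge dicts; values in `overrides` win."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(merged[key], value)
        else:
            # Non-dict values (strings, lists, numbers) replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


meta = merge_defaults(COMMON, INDICATOR)
print(meta["display"]["numDecimalPlaces"], meta["presentation"]["topic_tags"])
```

Under this assumption a string field such as `description_processing` is replaced rather than concatenated, which is consistent with the best-estimate indicators repeating `{definitions.common.description_processing}` explicitly before adding their own note.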

# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/dataset/
dataset:
  update_period_days: 365
  title: UCDP/PRIO, History of war

# Learn more about the available fields:
# http://docs.owid.io/projects/etl/architecture/metadata/reference/tables/
tables:
  # MAIN INDICATORS
  ucdp_prio:
    variables:
      ##################
      # Ongoing deaths #
      ##################
      number_deaths_ongoing_conflicts:
        title: Deaths in ongoing conflicts (best estimate)
        unit: deaths
        description_short: |-
          <% set estimate = "best" %>
          {definitions.all.description_short}
        description_processing: |-
          {definitions.common.description_processing}
          For conflict years without a best deaths estimate in the PRIO data, we conservatively coded the low estimate.
      number_deaths_ongoing_conflicts_high:
        title: Deaths in ongoing conflicts (high estimate)
        unit: deaths
        description_short: |-
          <% set estimate = "high" %>
          {definitions.all.description_short}
      number_deaths_ongoing_conflicts_low:
        title: Deaths in ongoing conflicts (low estimate)
        unit: deaths
        description_short: |-
          <% set estimate = "low" %>
          {definitions.all.description_short}
      number_deaths_ongoing_conflicts_per_capita:
        title: Death rate in ongoing conflicts (best estimate)
        unit: deaths per 100,000 people
        description_short: |-
          <% set estimate = "best" %>
          {definitions.all.description_short_per_capita}
        description_processing: |-
          {definitions.common.description_processing}
          For conflict years without a best deaths estimate in the PRIO data, we conservatively coded the low estimate.
        display:
          numDecimalPlaces: 1

      number_deaths_ongoing_conflicts_high_per_capita:
        title: Death rate in ongoing conflicts (high estimate)
        unit: deaths per 100,000 people
        description_short: |-
          <% set estimate = "high" %>
          {definitions.all.description_short_per_capita}
        display:
          numDecimalPlaces: 1

      number_deaths_ongoing_conflicts_low_per_capita:
        title: Death rate in ongoing conflicts (low estimate)
        unit: deaths per 100,000 people
        description_short: |-
          <% set estimate = "low" %>
          {definitions.all.description_short_per_capita}
        display:
          numDecimalPlaces: 1
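The per-capita indicators are expressed in deaths per 100,000 people and displayed with one decimal place. The arithmetic implied by that unit is the death count divided by population, scaled by 100,000. A minimal sketch; the function name and example numbers are illustrative and not taken from the data:

```python
def deaths_per_100k(deaths: float, population: float) -> float:
    """Convert a death count into a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to limit floating-point error on round inputs.
    return deaths * 100_000 / population


# Illustrative values only: 3,000 deaths in a population of 40 million.
rate = deaths_per_100k(3_000, 40_000_000)
print(round(rate, 1))  # rounded to one decimal, matching numDecimalPlaces: 1
```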
147 changes: 147 additions & 0 deletions etl/steps/data/garden/war/2024-08-26/ucdp_prio.py
@@ -0,0 +1,147 @@
"""Load the UCDP and PRIO garden datasets and combine them into a single garden dataset."""

import owid.catalog.processing as pr
from owid.catalog import Table
from shared import add_indicators_extra

from etl.helpers import PathFinder, create_dataset

# Get paths and naming conventions for current step.
paths = PathFinder(__file__)
# Index columns
COLUMNS_INDEX = ["year", "region", "conflict_type"]
COLUMNS_INDEX_COUNTRY = ["year", "country", "conflict_type"]
# Rename columns: one entry per source dataset. Every entry must map onto the same set of
# target column names, so that both tables end up with identical indicator columns.
COLUMNS_RENAME = {
    "ucdp": {
        "number_deaths_ongoing_conflicts": "number_deaths_ongoing_conflicts",
        "number_deaths_ongoing_conflicts_high": "number_deaths_ongoing_conflicts_high",
        "number_deaths_ongoing_conflicts_low": "number_deaths_ongoing_conflicts_low",
    },
    "prio": {
        "number_deaths_ongoing_conflicts_battle_low": "number_deaths_ongoing_conflicts_low",
        "number_deaths_ongoing_conflicts_battle_high": "number_deaths_ongoing_conflicts_high",
        "number_deaths_ongoing_conflicts_battle": "number_deaths_ongoing_conflicts",
    },
}
# Indicator columns
COLUMNS_INDICATORS = list(COLUMNS_RENAME["ucdp"].values())
# First year in UCDP
YEAR_UCDP_MIN = 1989
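`COLUMNS_INDICATORS` is taken from the UCDP entry alone and then used to select columns from both tables, so the PRIO mapping must produce exactly the same target names. A standalone sketch of a check for that invariant (it could equally sit as an `assert` next to the constants; this is a suggestion, not part of the commit):

```python
# Copy of the rename mapping defined in the step.
COLUMNS_RENAME = {
    "ucdp": {
        "number_deaths_ongoing_conflicts": "number_deaths_ongoing_conflicts",
        "number_deaths_ongoing_conflicts_high": "number_deaths_ongoing_conflicts_high",
        "number_deaths_ongoing_conflicts_low": "number_deaths_ongoing_conflicts_low",
    },
    "prio": {
        "number_deaths_ongoing_conflicts_battle_low": "number_deaths_ongoing_conflicts_low",
        "number_deaths_ongoing_conflicts_battle_high": "number_deaths_ongoing_conflicts_high",
        "number_deaths_ongoing_conflicts_battle": "number_deaths_ongoing_conflicts",
    },
}

# Target names produced by each source's mapping.
targets = {source: set(mapping.values()) for source, mapping in COLUMNS_RENAME.items()}

# Every source must rename into exactly the same set of indicator columns.
reference = targets["ucdp"]
mismatched = [source for source, names in targets.items() if names != reference]
assert not mismatched, f"Rename targets differ for: {mismatched}"

COLUMNS_INDICATORS = list(COLUMNS_RENAME["ucdp"].values())
print(COLUMNS_INDICATORS)
```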


def run(dest_dir: str) -> None:
    #
    # Load inputs.
    #
    # Load UCDP garden dataset.
    ds_ucdp = paths.load_dataset("ucdp")
    # Read table from UCDP garden dataset.
    tb_ucdp = ds_ucdp["ucdp"].reset_index()
    # tb_ucdp_countries = ds_ucdp["ucdp_country"].reset_index()

    # Load PRIO garden dataset.
    ds_prio = paths.load_dataset("prio_v31")
    # Read table from PRIO garden dataset.
    tb_prio = ds_prio["prio_v31"].reset_index()
    # tb_prio_countries = ds_prio["prio_v31_country"].reset_index()

    # Read regions table from the Gleditsch dataset
    ds_gw = paths.load_dataset("gleditsch")
    tb_regions = ds_gw["gleditsch_regions"].reset_index()

    #
    # Process data.
    #
    ## Remove region suffixes such as " (PRIO)" or " (UCDP/PRIO)"
    tb_ucdp["region"] = tb_ucdp["region"].str.replace(r" \(.+\)", "", regex=True)
    tb_prio["region"] = tb_prio["region"].str.replace(r" \(.+\)", "", regex=True)

    ## In PRIO, change conflict_type 'all' to 'state-based'
    tb_prio["conflict_type"] = tb_prio["conflict_type"].replace({"all": "state-based"})

    # Sanity checks
    assert set(tb_ucdp["region"]) == set(tb_prio["region"]), "Mismatch in regions between UCDP and PRIO"
    expected_mismatch = {"non-state conflict", "one-sided violence", "all"}
    assert (
        set(tb_ucdp["conflict_type"]) - set(tb_prio["conflict_type"]) == expected_mismatch
    ), "Mismatch in conflict_type between UCDP and PRIO not as expected!"

    # Rename columns, keep relevant indicators
    tb_ucdp = tb_ucdp.rename(columns=COLUMNS_RENAME["ucdp"])[COLUMNS_INDEX + COLUMNS_INDICATORS]
    tb_prio = tb_prio.rename(columns=COLUMNS_RENAME["prio"])[COLUMNS_INDEX + COLUMNS_INDICATORS]

    # Keep relevant years for each dataset: UCDP from 1989 onwards, PRIO before 1989
    tb_ucdp = tb_ucdp.dropna(subset=COLUMNS_INDICATORS, how="all")
    assert tb_ucdp["year"].min() == YEAR_UCDP_MIN, "UCDP year min is not as expected!"
    tb_prio = tb_prio[tb_prio["year"] < YEAR_UCDP_MIN]

    # Concatenate
    tb = pr.concat([tb_ucdp, tb_prio], axis=0, ignore_index=True, short_name=paths.short_name)

    # Add conflict rates
    tb = add_indicators_extra(
        tb,
        tb_regions,
        columns_conflict_mortality=[
            "number_deaths_ongoing_conflicts",
            "number_deaths_ongoing_conflicts_high",
            "number_deaths_ongoing_conflicts_low",
        ],
    )

    # tb_country = make_tb_country(tb_ucdp_countries, tb_prio_countries)

    # Set index
    tb = tb.format(COLUMNS_INDEX)
    # tb_country = tb_country.set_index(COLUMNS_INDEX_COUNTRY, verify_integrity=True)

    #
    # Save outputs.
    #
    tables = [
        tb,
        # tb_country,
    ]
    # Create a new garden dataset with the same metadata as the UCDP garden dataset.
    ds_garden = create_dataset(
        dest_dir, tables=tables, check_variables_metadata=True, default_metadata=ds_ucdp.metadata
    )

    # Save changes in the new garden dataset.
    ds_garden.save()
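The core of `run` is a splice: harmonize labels, then take PRIO for years before `YEAR_UCDP_MIN` and UCDP from that year on. A stdlib-only sketch of the same logic on toy rows; the rows and numbers are invented, and the real step works on `owid.catalog` tables and also calls the `add_indicators_extra` helper, which is not reproduced here.

```python
import re

YEAR_UCDP_MIN = 1989

# Toy rows standing in for the two tables (values are invented).
ucdp = [
    {"year": 1989, "region": "Africa (UCDP/PRIO)", "conflict_type": "state-based", "deaths": 120},
    {"year": 1990, "region": "Africa (UCDP/PRIO)", "conflict_type": "state-based", "deaths": 90},
]
prio = [
    {"year": 1988, "region": "Africa (PRIO)", "conflict_type": "all", "deaths": 300},
    {"year": 1989, "region": "Africa (PRIO)", "conflict_type": "all", "deaths": 310},
]


def strip_suffix(region: str) -> str:
    # Same pattern as the step: drop a " (...)" suffix from the region name.
    return re.sub(r" \(.+\)", "", region)


ucdp_h = [{**row, "region": strip_suffix(row["region"])} for row in ucdp]
# Only PRIO's 'all' is relabeled: in PRIO it covers state-based conflicts only.
prio_h = [
    {
        **row,
        "region": strip_suffix(row["region"]),
        "conflict_type": "state-based" if row["conflict_type"] == "all" else row["conflict_type"],
    }
    for row in prio
]

# Splice: UCDP from 1989 onwards, PRIO strictly before 1989.
assert min(row["year"] for row in ucdp_h) == YEAR_UCDP_MIN
combined = ucdp_h + [row for row in prio_h if row["year"] < YEAR_UCDP_MIN]
combined.sort(key=lambda row: row["year"])  # sorted only for readability

for row in combined:
    print(row)
```

Because PRIO rows from 1989 on are discarded, the overlapping year 1989 takes its value from UCDP only.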


def make_tb_country(tb_ucdp_countries: Table, tb_prio_countries: Table) -> Table:
    """Combine UCDP and PRIO country data."""

    # PRIO 'all' conflict is actually 'state-based'
    tb_prio_countries["conflict_type"] = tb_prio_countries["conflict_type"].replace({"all": "state-based"})

    # Sanity checks
    assert set(tb_ucdp_countries["conflict_type"]) - set(tb_prio_countries["conflict_type"]) == {
        "one-sided violence"
    }, "Mismatch in conflict_type between UCDP and PRIO (country) not as expected!"
    assert set(tb_prio_countries["conflict_type"]) - set(tb_ucdp_countries["conflict_type"]) == {
        "extrasystemic"
    }, "Mismatch in conflict_type between UCDP and PRIO (country) not as expected!"

    # Preserve only pre-UCDP-time data in PRIO
    assert tb_ucdp_countries["year"].min() == YEAR_UCDP_MIN, "UCDP year min is not as expected!"
    tb_prio_countries = tb_prio_countries[tb_prio_countries["year"] < YEAR_UCDP_MIN]

    # Fix extrasystemic: UCDP has no data for extrasystemic conflicts, so we add zeroes.
    ## Sanity check: no extrasystemic coming from UCDP
    assert "extrasystemic" not in set(tb_ucdp_countries["conflict_type"]), "Extrasystemic conflicts found in UCDP!"
    ## Build extrasystemic data for UCDP (all zeroes), using the interstate rows as a template.
    ## Copy the slice so that the assignments below do not touch the original table.
    tb_extra = tb_ucdp_countries[tb_ucdp_countries["conflict_type"] == "interstate"].copy()
    tb_extra["conflict_type"] = "extrasystemic"
    tb_extra["participated_in_conflict"] = 0
    ## Append the extrasystemic rows to the UCDP table (previously the result was discarded)
    tb_ucdp_countries = pr.concat([tb_ucdp_countries, tb_extra], ignore_index=True)

    # Concatenate UCDP (1989 onwards) and PRIO (before 1989)
    tb = pr.concat(
        [tb_ucdp_countries, tb_prio_countries], axis=0, ignore_index=True, short_name=f"{paths.short_name}_country"
    )
    return tb
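`make_tb_country` fills the extrasystemic gap in UCDP by cloning the interstate rows, relabeling them, and setting `participated_in_conflict` to 0; the combined rows then need to be carried forward into the final concatenation. The same idea on toy rows, stdlib only (country names and values are invented). Building each new row as a fresh dict keeps the original interstate rows intact.

```python
# Toy UCDP country rows (invented values); UCDP has no extrasystemic category.
ucdp_countries = [
    {"year": 1990, "country": "Country A", "conflict_type": "interstate", "participated_in_conflict": 1},
    {"year": 1990, "country": "Country A", "conflict_type": "intrastate", "participated_in_conflict": 0},
    {"year": 1990, "country": "Country B", "conflict_type": "interstate", "participated_in_conflict": 0},
]

# Sanity check mirroring the step: nothing extrasystemic in UCDP yet.
assert all(row["conflict_type"] != "extrasystemic" for row in ucdp_countries)

# Use the interstate rows as a template: one new row per country-year,
# relabeled as extrasystemic and zeroed. {**row, ...} builds a new dict.
extra = [
    {**row, "conflict_type": "extrasystemic", "participated_in_conflict": 0}
    for row in ucdp_countries
    if row["conflict_type"] == "interstate"
]

# Keep the result by rebinding the UCDP rows to the extended list.
ucdp_countries = ucdp_countries + extra

for row in ucdp_countries:
    print(row)
```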
38 changes: 38 additions & 0 deletions etl/steps/data/grapher/war/2024-08-26/ucdp_prio.py
@@ -0,0 +1,38 @@
"""Load a garden dataset and create a grapher dataset."""

from etl.helpers import PathFinder, create_dataset

# Get paths and naming conventions for current step.
paths = PathFinder(__file__)


def run(dest_dir: str) -> None:
    #
    # Load inputs.
    #
    # Load garden dataset.
    ds_garden = paths.load_dataset("ucdp_prio")

    # Read table from garden dataset.
    tb = ds_garden["ucdp_prio"]
    # tb_country = ds_garden["ucdp_prio_country"]

    #
    # Process data.
    #
    # Grapher expects the entity dimension to be named 'country'.
    tb = tb.rename_index_names({"region": "country"})

    #
    # Save outputs.
    #
    tables = [
        tb,
        # tb_country,
    ]
    # Create a new grapher dataset with the same metadata as the garden dataset.
    ds_grapher = create_dataset(
        dest_dir, tables=tables, check_variables_metadata=True, default_metadata=ds_garden.metadata
    )

    # Save changes in the new grapher dataset.
    ds_grapher.save()
