📊 war: add brecke dataset #1367
Hi @lucasrodes! I agree with distributing deaths evenly across years for conflicts that last several years. I also agree that we should code conflict #3233 as region 'World' and distribute the deaths accordingly.

I worry that the approach you propose for dealing with the many conflicts for which death estimates are missing will make it difficult to visualize the data, and may confuse our users. We will have missing values for a fair (if not large) share of years. This will distort line charts, as the many years with missing data (which most likely have few deaths) will be skipped, and the lines drawn straight across them. This will make the (implicit) area under the line incorrectly large. Bar charts are not a good alternative: because the dataset covers many years, users will struggle to see that many years are skipped, or which ones are not included.

At the same time, I agree that entirely ignoring these conflicts and setting their deaths to zero (which is what our previous work with the data did) also seems wrong.

I therefore propose another approach: Brecke writes that he only includes major violent conflicts. Among other characteristics, this means for him that there were at least 32 deaths per year. So what we could do for conflicts with missing death estimates is to create a (possibly very low) lower-bound estimate of 32 deaths for conflicts that lasted one year, 64 for those lasting two years, and so on. This would take the source seriously, allow us to calculate aggregates while still including these conflicts, and let us use line charts to visualize the data. I would definitely make that clear in the indicator description, and probably even add a disclaimer to each chart using the data.

This also means that, for now, we focus entirely on all fatalities and set military fatalities aside, because we cannot make any analogous inferences about the latter.

What do you think about this approach?
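To make the proposal concrete, here is a minimal sketch of the lower-bound rule combined with the even distribution of deaths across years. The function name and arguments are hypothetical (they are not taken from the actual ETL code), and it assumes that a conflict's duration is counted inclusively, so that 1539-40 lasts two years. The threshold of 32 deaths per year is the one from Brecke mentioned above.

```python
from typing import Dict, Optional

MIN_DEATHS_PER_YEAR = 32  # Brecke's inclusion threshold, as described above


def deaths_per_year(
    start_year: int, end_year: int, total_deaths: Optional[float]
) -> Dict[int, float]:
    """Spread a conflict's deaths evenly over its years (hypothetical helper).

    If the death estimate is missing, fall back to a lower bound of
    32 deaths for every year the conflict lasted.
    """
    # Assumption: start and end years are both included in the duration.
    n_years = end_year - start_year + 1
    if total_deaths is None:
        total_deaths = MIN_DEATHS_PER_YEAR * n_years
    return {year: total_deaths / n_years for year in range(start_year, end_year + 1)}


# A two-year conflict without an estimate gets 64 deaths, i.e. 32 per year.
print(deaths_per_year(1539, 1540, None))
# A three-year conflict with a made-up estimate of 3000 deaths gets 1000 per year.
print(deaths_per_year(1600, 1602, 3000))
```

With this rule every conflict contributes a value to every year it covers, so yearly aggregates have no gaps, at the cost of the imputed values being lower bounds only.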
@bastianherre
Hi @bastianherre, in Brecke's dataset, there are 26 conflicts with unspecified end years.
My first intuition is that while these conflicts may have end years later than 2000, we should assume that the dataset only considers data until 2000. This means that:
I haven't found much in Brecke's documentation on missing values for end-years.
Hi @lucasrodes! Thanks for checking in. Yes, let's do it as you say!
Hey @lucasrodes
The code generally looks good, but I think there are some issues with the World aggregate.
For some of the variables, the World value is less than the sum of all the other regions, but only between 1937 and 1945. I guess this is because of the 'Unknown' flag around WW2? It just looks a bit wrong if you do a stacked area chart, e.g. here.
The affected variables are:
- Number of ongoing conflicts - Conflict_type: all
- Number of new conflicts - Conflict_type: all
- Number of ongoing conflicts - Conflict_type: internal
- Number of new conflicts - Conflict_type: internal
For some other variables, the value for the World is greater than the sum of all the other regions, also just between 1937 and 1945. The affected variables are:
- Number of deaths in ongoing conflicts - Conflict_type: internal
- Number of deaths in ongoing conflicts - Conflict_type: all
And for the soldier deaths, the World value is only equal to the sum of the other regions for the years 1937-1945, and for all other years, it is 0. These variables:
Thanks for reviewing, Fiona; it helps a lot, really <3!
The number of ongoing conflicts in the World may not be the sum of all conflicts in all regions. This is because the same conflict may occur in multiple regions (e.g. WWII) but should only be counted as +1 globally. The same happens with the number of new conflicts. In this particular period, we have +1 conflict in all regions (we consider WWII as an 'ongoing conflict' in all regions). So, if you add the numbers for all regions, you'd count this conflict several times. I have added a clarification to the indicator
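As an illustration of why the two numbers differ, here is a small sketch with made-up records; the conflict IDs, year and region names are hypothetical and not taken from the dataset. Counting distinct conflicts per region and then summing counts a multi-region conflict once per region, while counting distinct conflicts globally counts it once.

```python
from collections import defaultdict

# Hypothetical records: (conflict_id, year, region).
# Conflict 1 takes place in two regions in the same year.
records = [
    (1, 1940, "Europe"),
    (1, 1940, "Asia and Oceania"),
    (2, 1940, "Americas"),
]

conflicts_by_region = defaultdict(set)  # (year, region) -> distinct conflict ids
conflicts_world = defaultdict(set)      # year -> distinct conflict ids

for conflict_id, year, region in records:
    conflicts_by_region[(year, region)].add(conflict_id)
    conflicts_world[year].add(conflict_id)

sum_of_regions = sum(
    len(ids) for (year, _region), ids in conflicts_by_region.items() if year == 1940
)
world_count = len(conflicts_world[1940])

print(sum_of_regions, world_count)  # 3 2
```

In a stacked area chart the regional series would therefore add up to more than the World series for such years.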
Good catch! I just found a critical bug in the code. It should be fixed now.
I have just removed these metrics for now. They are not needed.
parent issue: https://github.com/owid/owid-issues/issues/446
👀 dataset preview
📝 Notes
(mostly for @bastianherre)
`conflict_type="intrastate"`. For example, take the year 1539: there are three conflicts, _Spain-Yucatan, 1539_ (inter), _Spain (Ghent), 1539-40_ (intra) and _Spain-Florida, 1539_ (inter). The field `total_fatalities` is only filled for the intra-conflict; for the inter-conflicts it is null (and I think assuming zero would be wrong?).
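One possible way to keep "missing" distinct from "zero" when aggregating is sketched below. The helper name is hypothetical and the figures are made up; this is not what the ETL step currently does, only an illustration of the distinction.

```python
from typing import Iterable, Optional


def sum_known(values: Iterable[Optional[float]]) -> Optional[float]:
    """Sum the non-missing values; return None if every value is missing."""
    known = [v for v in values if v is not None]
    return sum(known) if known else None


# Made-up figures for one year: one estimate present, two missing.
print(sum_known([1000, None, None]))  # 1000
# If no conflict in the group has an estimate, the total stays missing.
print(sum_known([None, None]))  # None
```

Note that a total computed this way from partly missing estimates is only a lower bound, so it would need to be flagged as such.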