[Fix] recovered cases in China - _subset_by_area() records selection #484

Inglezos · 2021-01-03T00:40:16Z

Summary

For some reason, in the covid19dh.csv file, recovered cases for China exist only in the province-level records; they are not aggregated into the country-level "China, -" records. When no province is specified, the _subset_by_area() method selects only the "China, -" records. As a result, the recovered cases for China are wrongly reported as zero and full complement is applied, even though the province-level records do contain the recovered cases.
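To make the selection problem concrete, here is a minimal, library-free sketch. The toy records and the helper names `select_country_level` and `sum_provinces` are hypothetical and do not reproduce covsirphy's `_subset_by_area()`; the sketch only illustrates the behaviour described above: keeping the "China, -" rows yields zero recovered cases, while summing the province rows does not.

```python
from collections import defaultdict

# Hypothetical toy records: (country, province, date, confirmed, fatal, recovered).
# The country-level row ("-") carries no recovered cases, mimicking the issue.
RECORDS = [
    ("China", "-", "2021-01-01", 100, 5, 0),
    ("China", "Hubei", "2021-01-01", 70, 4, 60),
    ("China", "Beijing", "2021-01-01", 30, 1, 25),
]


def select_country_level(records, country):
    """Naive selection: keep only the country-level ("-") rows."""
    return {
        date: (conf, fatal, rec)
        for c, prov, date, conf, fatal, rec in records
        if c == country and prov == "-"
    }


def sum_provinces(records, country):
    """Alternative: add up the province-level rows for each date."""
    totals = defaultdict(lambda: [0, 0, 0])
    for c, prov, date, conf, fatal, rec in records:
        if c == country and prov != "-":
            totals[date][0] += conf
            totals[date][1] += fatal
            totals[date][2] += rec
    return {date: tuple(vals) for date, vals in totals.items()}


print(select_country_level(RECORDS, "China"))  # recovered is 0
print(sum_provinces(RECORDS, "China"))         # recovered is 85
```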

Codes and outputs:

import covsirphy as cs
# Dataset preparation
data_loader = cs.DataLoader("input")
jhu_data = data_loader.jhu()
population_data = data_loader.population()
# Scenario analysis
chn_scenario = cs.Scenario(jhu_data, population_data, "China")

Environment

CovsirPhy version: 2.13.3-iota
Python version: 3.8
Installation: Anaconda/pipenv
System: Windows


lisphilar · 2021-01-03T13:29:43Z

I tried.

# Total of the province-level records in China for each date
df = jhu_data.cleaned()
sum_df = df.loc[(df["Country"] == "China") & (df["Province"] != "-")].groupby("Date").sum()
sum_df.tail()
cs.line_plot(sum_df, title="Total value of provinces in China", y_integer=True)

Date	Confirmed	Infected	Fatal	Recovered
2020/12/30	95876	1282	4781	89813
2020/12/31	95963	1258	4782	89923
2021/1/1	96023	1210	4782	90031
2021/1/2	96086	1203	4784	90099
2021/1/3	96086	1203	4784	90099

chn_scenario = cs.Scenario(jhu_data, population_data, "China")
chn_scenario.records(variables=["Confirmed", "Infected", "Fatal", "Recovered"]).tail()

Date	Confirmed	Infected	Fatal	Recovered
2020/12/30	96592	1614	4784	90194
2020/12/31	96673	1579	4788	90306
2021/1/1	96762	1567	4789	90406
2021/1/2	96829	1524	4790	90515
2021/1/3	96829	1428	4790	90611

lisphilar · 2021-01-03T13:37:46Z

Based on the results above, I think we can use the total of China's provinces for the recovered data in JHUData._cleaning().
Because the confirmed/fatal values also differ between the first table and the second table, I recommend applying the values of the first table (sum of provinces) as the country-level data for China.
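The proposal above (use the sum of China's provinces as the country-level record) can be sketched without covsirphy as follows. The helper `replace_with_province_totals` and the toy rows are hypothetical; the column names mirror the tables in this thread, but this is not how `JHUData._cleaning()` or PR #491 is necessarily implemented.

```python
from collections import defaultdict

VALUE_COLS = ("Confirmed", "Fatal", "Recovered")

# Hypothetical toy records: the country-level "China, -" row has no recovered
# cases and a Confirmed value that differs from the province total.
RECORDS = [
    {"Country": "China", "Province": "-", "Date": "2021-01-03",
     "Confirmed": 120, "Fatal": 6, "Recovered": 0},
    {"Country": "China", "Province": "Province A", "Date": "2021-01-03",
     "Confirmed": 80, "Fatal": 4, "Recovered": 70},
    {"Country": "China", "Province": "Province B", "Date": "2021-01-03",
     "Confirmed": 30, "Fatal": 1, "Recovered": 25},
    {"Country": "Japan", "Province": "-", "Date": "2021-01-03",
     "Confirmed": 50, "Fatal": 2, "Recovered": 40},
]


def replace_with_province_totals(records, country):
    """Hypothetical helper: swap the country-level ("-") rows of `country`
    for per-date sums of its province-level rows."""
    totals = defaultdict(lambda: dict.fromkeys(VALUE_COLS, 0))
    for row in records:
        if row["Country"] == country and row["Province"] != "-":
            for col in VALUE_COLS:
                totals[row["Date"]][col] += row[col]
    if not totals:
        # No province-level data: leave the records untouched
        return list(records)
    # Keep every row except the original country-level rows of `country`
    kept = [
        row for row in records
        if not (row["Country"] == country and row["Province"] == "-")
    ]
    # Rebuild the country-level rows from the province totals
    summed = [
        {"Country": country, "Province": "-", "Date": date, **values}
        for date, values in sorted(totals.items())
    ]
    return kept + summed


for row in replace_with_province_totals(RECORDS, "China"):
    print(row)
```

Note that all three value columns are replaced, not only Recovered, in line with the recommendation to take confirmed/fatal from the province sum as well.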

Inglezos · 2021-01-03T18:18:54Z

Yes I agree, the province data seem more correct and hold all the recovered cases information we need.

lisphilar · 2021-01-04T14:02:54Z

I created pull request #491. Please review it.
However, full complement of the recovered data is still applied to the China dataset.
This may be a separate issue, but we may need to investigate it. (Could we divide up this work?)

Full complement is performed for many countries as follows.

import covsirphy as cs
# Load the JHU dataset
data_loader = cs.DataLoader()
jhu_data = data_loader.jhu()
# List the countries whose recovered data were fully complemented
df = jhu_data.show_complement()
print(df.loc[df["Full_recovered"]].Country.tolist())

['Andorra', 'United Arab Emirates', 'American Samoa', 'Antigua and Barbuda', 'Burundi', 'Benin', 'Bahrain', 'Belarus', 'Bermuda', 'Barbados', 'Brunei', 'Bhutan', 'Chile', "Cote d'Ivoire", 'Cameroon', 'Democratic Republic of the Congo', 'Colombia', 'Comoros', 'Cape Verde', 'Cuba', 'Germany', 'Djibouti', 'Dominica', 'Ecuador', 'Egypt', 'Finland', 'Fiji', 'France', 'Gabon', 'United Kingdom', 'Georgia', 'Ghana', 'Gambia', 'Guinea-Bissau', 'Equatorial Guinea', 'Grand Princess', 'Grenada', 'Guam', 'Croatia', 'Iran', 'Iceland', 'Jordan', 'Kyrgyzstan', 'Cambodia', 'Saint Kitts and Nevis', 'Laos', 'Liechtenstein', 'Madagascar', 'Marshall Islands', 'Malta', 'Montenegro', 'Northern Mariana Islands', 'Mauritania', 'MS Zaandam', 'Mauritius', 'Malaysia', 'Namibia', 'Niger', 'Nicaragua', 'Netherlands', 'Norway', 'New Zealand', 'Pakistan', 'Peru', 'Papua New Guinea', 'Puerto Rico', 'Qatar', 'Saudi Arabia', 'Senegal', 'Singapore', 'Solomon Islands', 'San Marino', 'Serbia', 'South Sudan', 'Sao Tome and Principe', 'Suriname', 'Slovenia', 'Sweden', 'Swaziland', 'Seychelles', 'Chad', 'Togo', 'Thailand', 'Timor-Leste', 'Turkey', 'Taiwan', 'Uzbekistan', 'Holy See', 'Saint Vincent and the Grenadines', 'Virgin Islands, U.S.', 'Vanuatu', 'Samoa', 'Yemen', 'Zambia', 'Zimbabwe', 'China']

Inglezos · 2021-01-04T14:33:38Z

Sure, I will look into this too. I don't think France has full complement, though, only partial (it just caught my eye).

lisphilar · 2021-01-04T14:47:47Z

Do you have the "COVID-19 Data Hub" dataset as of 31 Dec 2020 (or earlier)?
This appears to be caused by irregular records in the raw dataset from January 2021, and I found a related issue:
covid19datahub/COVID19#145

Inglezos · 2021-01-04T15:23:39Z

Yes, I just noticed the same problem with the current dataset.

lisphilar · 2021-01-05T15:08:54Z

I confirmed the issue for France has been solved thanks to "COVID-19 Data Hub" with the latest data.
(We need not create a GitHub issue for this problem.)

lisphilar · 2021-01-05T15:22:56Z

I do not think Singapore's recovered data need full complement. What do you think?
Can we create a new issue for this problem (Singapore, China)?

country = "Singapore"
cs.line_plot(jhu_data.subset(country).set_index("Date"), f"Subset for {country} without complement")

Inglezos · 2021-01-05T15:59:56Z

No, no, we need to revise the conditions. The problem is the 99% threshold and how to identify when the recovered data stop being updated.
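One hypothetical way to express the two conditions mentioned above, offered as a sketch and not as covsirphy's actual rule: treat the recovered series as "stopped" when its trailing values have repeated for at least `min_stall_days` records, and compare Recovered / (Confirmed - Fatal) against a threshold such as 0.99. The function names, the parameter names and the way the two checks are combined are all assumptions.

```python
def stalled_days(values):
    """Count trailing records that repeat the previous value."""
    count = 0
    for i in range(len(values) - 1, 0, -1):
        if values[i] != values[i - 1]:
            break
        count += 1
    return count


def recovered_ratio(confirmed, fatal, recovered):
    """Recovered as a share of non-fatal confirmed cases (Confirmed - Fatal)."""
    non_fatal = confirmed - fatal
    return recovered / non_fatal if non_fatal > 0 else 0.0


def needs_full_complement(confirmed, fatal, recovered,
                          ratio_threshold=0.99, min_stall_days=7):
    """Hypothetical rule: flag a series only when Recovered has stopped
    changing AND is still clearly below Confirmed - Fatal."""
    stopped = stalled_days(recovered) >= min_stall_days
    ratio = recovered_ratio(confirmed[-1], fatal[-1], recovered[-1])
    return stopped and ratio < ratio_threshold


confirmed, fatal = [100] * 9, [5] * 9
print(needs_full_complement(confirmed, fatal, [10] + [50] * 8))           # frozen far below: True
print(needs_full_complement(confirmed, fatal, list(range(10, 100, 10))))  # still updating: False
print(needs_full_complement(confirmed, fatal, [10] + [95] * 8))           # frozen at 95/95: False
```

With a rule of this shape, a series whose Recovered is frozen close to Confirmed - Fatal sits right at the boundary, so a small change in `ratio_threshold` can flip the decision.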

Inglezos · 2021-01-08T13:34:55Z

> I confirmed the issue for France has been solved thanks to "COVID-19 Data Hub" with the latest data.
> (We need not create a GitHub issue for this problem.)

Unfortunately, the France issue remains.

Inglezos · 2021-01-08T13:45:39Z

I notified covid19datahub team for this in covid19datahub/COVID19#145.

lisphilar · 2021-01-08T14:09:26Z

Thank you for notifying the team.
This is also discussed in the original dataset repository. opencovid19-fr/data#564

Inglezos · 2021-01-08T14:14:16Z

Yes, it seems to depend on when we download the dataset. If the covid19datahub team has already applied preprocessing, then we are okay. Preferably, this should be handled by the original source, opencovid19-fr.

lisphilar · 2021-01-08T15:06:26Z

Shall we create a new issue for the full-complement threshold?
While debugging the China data, I found it difficult to choose a specific threshold value. Around June, Recovered is close to Confirmed - Fatal because, according to the dataset, the outbreak ended very quickly.
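To see why a fixed threshold is awkward for China, the ratio Recovered / (Confirmed - Fatal) can be computed from the two tables posted earlier in this thread, using their 2021/1/3 rows. Reading the "99% threshold" as this ratio is an assumption made only for this illustration.

```python
# 2021/1/3 values copied from the two tables earlier in this thread
ROWS = {
    "sum of provinces": {"Confirmed": 96086, "Fatal": 4784, "Recovered": 90099},
    "Scenario records": {"Confirmed": 96829, "Fatal": 4790, "Recovered": 90611},
}
THRESHOLD = 0.99  # assumption: the "99% threshold" read as this ratio


def recovered_ratio(row):
    """Recovered / (Confirmed - Fatal) for one record."""
    return row["Recovered"] / (row["Confirmed"] - row["Fatal"])


for label, row in ROWS.items():
    ratio = recovered_ratio(row)
    print(f"{label}: {ratio:.4f} (reaches threshold: {ratio >= THRESHOLD})")
# sum of provinces: 0.9868 (reaches threshold: False)
# Scenario records: 0.9845 (reaches threshold: False)
```

In both tables the gap (Confirmed - Fatal) - Recovered equals the Infected column (1203 and 1428), so the ratio stays just under 0.99 even though the outbreak is largely over; whether the series is complemented then hinges on a fraction of a percentage point.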

Inglezos · 2021-01-08T15:11:46Z

Yes, we should. If you have time, please create a new issue; otherwise I will do it later.

lisphilar · 2021-01-08T15:43:15Z

We will continue the discussion of the full-complement problem in [Fix] un-expected full complement of JHU data (e.g. China) #514.
We will keep an eye on the France data in this issue (or create a new issue before release 2.15.0).

Inglezos · 2021-01-09T19:25:22Z

Yes, we will continue in #514. I will close this issue.

Inglezos added the bug Something isn't working label Jan 3, 2021

Inglezos changed the title ~~[Fix] _subset_by_area() ignores all province recovered records if no province specified~~ [Fix] recovered cases in China - _subset_by_area() records selection Jan 3, 2021

lisphilar added this to the Release v2.15.0 milestone Jan 3, 2021

lisphilar mentioned this issue Jan 4, 2021

update: use total of province level in China #491

Merged

Inglezos mentioned this issue Jan 8, 2021

[Fix] jhu_data.total() last record issue #483

Closed

Inglezos closed this as completed Jan 9, 2021
