[Fix] recovered cases in China - _subset_by_area() records selection #484
Comments
I tried the following.
df = jhu_data.cleaned()
sum_df = df.loc[(df["Country"] == "China") & (df["Province"] != "-")].groupby("Date").sum()
sum_df.tail()
cs.line_plot(sum_df, title="Total value of provinces in China", y_integer=True)
chn_scenario = cs.Scenario(jhu_data, population_data, "China")
chn_scenario.records(variables=["Confirmed", "Infected", "Fatal", "Recovered"]).tail()
|
With the results above, I think we can use the total value of the provinces in China for recovered data in |
Yes, I agree. The province data seem more accurate and hold all the recovered cases information we need. |
I created pull request #491. Please review it. Full complement is performed for many countries, as follows.
import covsirphy as cs
data_loader = cs.DataLoader()
jhu_data = data_loader.jhu()
df = jhu_data.show_complement()
print(df.loc[df["Full_recovered"]].Country.tolist())
['Andorra', 'United Arab Emirates', 'American Samoa', 'Antigua and Barbuda', 'Burundi', 'Benin', 'Bahrain', 'Belarus', 'Bermuda', 'Barbados', 'Brunei', 'Bhutan', 'Chile', "Cote d'Ivoire", 'Cameroon', 'Democratic Republic of the Congo', 'Colombia', 'Comoros', 'Cape Verde', 'Cuba', 'Germany', 'Djibouti', 'Dominica', 'Ecuador', 'Egypt', 'Finland', 'Fiji', 'France', 'Gabon', 'United Kingdom', 'Georgia', 'Ghana', 'Gambia', 'Guinea-Bissau', 'Equatorial Guinea', 'Grand Princess', 'Grenada', 'Guam', 'Croatia', 'Iran', 'Iceland', 'Jordan', 'Kyrgyzstan', 'Cambodia', 'Saint Kitts and Nevis', 'Laos', 'Liechtenstein', 'Madagascar', 'Marshall Islands', 'Malta', 'Montenegro', 'Northern Mariana Islands', 'Mauritania', 'MS Zaandam', 'Mauritius', 'Malaysia', 'Namibia', 'Niger', 'Nicaragua', 'Netherlands', 'Norway', 'New Zealand', 'Pakistan', 'Peru', 'Papua New Guinea', 'Puerto Rico', 'Qatar', 'Saudi Arabia', 'Senegal', 'Singapore', 'Solomon Islands', 'San Marino', 'Serbia', 'South Sudan', 'Sao Tome and Principe', 'Suriname', 'Slovenia', 'Sweden', 'Swaziland', 'Seychelles', 'Chad', 'Togo', 'Thailand', 'Timor-Leste', 'Turkey', 'Taiwan', 'Uzbekistan', 'Holy See', 'Saint Vincent and the Grenadines', 'Virgin Islands, U.S.', 'Vanuatu', 'Samoa', 'Yemen', 'Zambia', 'Zimbabwe', 'China'] |
Sure, I will look into this too. I don't think France has full complement though, only partial (it just caught my eye). |
Do you have the "COVID-19 Data Hub" dataset as of 31Dec2020 (or earlier)? |
Yes I just realized the same problem with the actual dataset. |
I confirmed the issue for France has been solved thanks to "COVID-19 Data Hub" with the latest data. |
No, we need to revise the conditions. The problem is the 99% threshold and how to identify when it stops. |
I notified the covid19datahub team about this in covid19datahub/COVID19#145. |
Thank you for notifying the team. |
Yes, it seems to depend on when we download the dataset. If the covid19datahub team has applied preprocessing first, then we are okay. Preferably, this should be handled by the original source, opencovid19-fr. |
Shall we create a new issue for the threshold of full complement? |
Yes, we should. If you have some time, please create a new issue; otherwise I will do that later. |
|
Yes, we will continue in #514. I will close this issue.
Summary
For some reason, in the covid19dh.csv file, the recovered cases for China exist only in the province-level records; they are not accumulated in the "China, -" records. The _subset_by_area() method selects only the "China, -" records when no province has been specified. This leads to the wrong result that the recovered cases for China are zero, and full complement is then applied even though the province records hold the recovered cases information.
Codes and outputs:
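The behaviour described in the summary, and the province-total approach discussed in the comments, can be illustrated with a minimal sketch. This is not the covsirphy implementation of _subset_by_area(): the function name `subset_country`, the fallback rule, and the toy numbers are assumptions made up for illustration. Only the column names (Country, Province, Date, Confirmed, Fatal, Recovered) and the "-" marker for country-level records are taken from the code in this thread.

```python
import pandas as pd


def subset_country(df, country):
    """Hypothetical helper: country-level records by date, using the sum of
    province records for Recovered where the country-level value is zero."""
    variables = ["Confirmed", "Fatal", "Recovered"]
    country_df = df.loc[df["Country"] == country]
    # Country-level records are marked with Province == "-"
    total = country_df.loc[country_df["Province"] == "-"].groupby("Date")[variables].sum()
    # Total of the province-level records, aligned to the same dates
    provinces = country_df.loc[country_df["Province"] != "-"].groupby("Date")[variables].sum()
    provinces = provinces.reindex(total.index, fill_value=0)
    # Assumed rule: fall back to the province total only when the
    # country-level Recovered is zero and the provinces report some cases
    use_provinces = (total["Recovered"] == 0) & (provinces["Recovered"] > 0)
    total.loc[use_provinces, "Recovered"] = provinces.loc[use_provinces, "Recovered"]
    return total


# Toy data (made up): Recovered is reported only at province level
records = pd.DataFrame({
    "Date": ["2020-12-30", "2020-12-31"] * 3,
    "Country": ["China"] * 6,
    "Province": ["-", "-", "Hubei", "Hubei", "Beijing", "Beijing"],
    "Confirmed": [100, 110, 60, 65, 40, 45],
    "Fatal": [5, 5, 4, 4, 1, 1],
    "Recovered": [0, 0, 50, 55, 30, 35],
})
result = subset_country(records, "China")
print(result)
```

With these toy records, selecting only the "China, -" rows would give zero recovered cases on both dates, while the fallback takes the values from the sum over Hubei and Beijing.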
Environment