IMPORTANT: Wrong lat, long values. #43

Open
juancalvof opened this issue Apr 6, 2020 · 6 comments

@juancalvof

juancalvof commented Apr 6, 2020

Description

Please review the lat, long values. They are wrong for some countries. I have made this visualization to help show the situation. Click a dot to see the country, lat and long values:
https://juancalvo.carto.com/builder/3ad41c17-bc07-4889-b047-5903300806c4/embed

ManuelAlvarezC self-assigned this Apr 6, 2020
@ManuelAlvarezC
Collaborator

Hi @JuanCalvoFerrandiz, thanks again for your bug report.

Would you mind detailing which data source returns these values?

Thanks!

ManuelAlvarezC added the bug (Something isn't working) label Apr 6, 2020
@juancalvof
Author

I hope this data helps :)

Countries data.zip

@ManuelAlvarezC
Collaborator

This has already been reported to CoronaDataScraper: covidatlas/coronadatascraper#528

This afternoon I will try to find more cases using the data you provided, and I will also send them the visualization you made in case it helps them.
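
One way to look for more cases automatically is to compare each reported coordinate with a reference centroid and flag rows that are far apart. The following is only a minimal sketch: the function names, the 1000 km threshold, the column names and the toy data are assumptions for illustration, not the scraper's real output or anything task-geo provides.

```python
import numpy as np
import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


def flag_suspicious(reported, reference, max_km=1000.0):
    """Return the rows of `reported` that lie more than `max_km` from the
    reference coordinates of the same country (hypothetical helper)."""
    merged = reported.merge(reference, on="country", suffixes=("", "_ref"))
    merged["distance_km"] = haversine_km(
        merged["lat"], merged["long"], merged["lat_ref"], merged["long_ref"])
    return merged.loc[merged["distance_km"] > max_km].reset_index(drop=True)


# Toy data for illustration only; these are not values from the data source.
reported = pd.DataFrame({
    "country": ["Spain", "France"],
    "lat": [40.4, -46.2],   # the second latitude has a flipped sign
    "long": [-3.7, 2.2],
})
reference = pd.DataFrame({
    "country": ["Spain", "France"],
    "lat": [40.0, 46.2],
    "long": [-4.0, 2.2],
})

suspicious = flag_suspicious(reported, reference)
```

The threshold is a judgment call: large countries can legitimately have centroids that differ by hundreds of kilometres between sources, so a loose value mainly catches swapped or sign-flipped coordinates.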

@juancalvof
Author

juancalvof commented Apr 9, 2020

Hi guys,

This is my exploration code for fixing the visualization: it corrects the aggregation and the lat, long values, and adds an ISO 3 column and an official name column. Hope that helps:

import task_geo.data_sources as ds
import pandas as pd


# Return the sorted unique values of a column of df as a one-column DataFrame
def series_unique(df, column):
    unique_country_base = df.loc[:, column].unique()
    return pd.DataFrame(data=unique_country_base,
                        columns=["unique_" + column]).sort_values("unique_" + column, ignore_index=True)


# Build a {country: value} dictionary from a column of df_carto.
# Relies on the module-level df_carto and df_unique_country_cl defined below.
def create_dict(column):
    mapping = {}  # avoid shadowing the built-in dict
    for value in df_unique_country_cl.loc[:, "unique_country"]:
        # Take the first matching row for each country name
        mapping[value] = df_carto.loc[df_carto["country"] == value, column].iloc[0]
    return mapping

# 0_Correction of aggregate values in countries
data_cds = ds.cds()
data_cds.loc[(data_cds["state"].isnull()) & (data_cds["county"].isnull()) & (data_cds["city"].isnull()), "aggregate"]\
    = "country"

# Getting unique values from country column
data_cds_country_raw = data_cds.loc[(data_cds["aggregate"] == "country")]
df_unique_country = series_unique(data_cds_country_raw, "country")

# Getting df_carto (raw string so the backslashes are not read as escapes)
df_carto = pd.read_csv(r"..\DATA\RAW\Countries data\world_borders.csv", sep=",")
df_carto.rename(columns={"name": "country"}, inplace=True)

# 1_Getting country_carto column
# Getting unique values from country column
df_unique_country_cl = series_unique(df_carto, "country")

# Getting values with no direct equivalence in df_carto
df_left = df_unique_country.merge(df_unique_country_cl, how='outer', indicator=True).loc[
    lambda x: x['_merge'] == 'left_only']

# CDS country names with no direct match in df_carto, in sorted order
cds_names = df_left.loc[:, "unique_country"]
# Equivalent df_carto names, written by hand in the same order as cds_names
carto_names = ["Brunei Darussalam", "Congo", "Czech Republic", "Cote d'Ivoire", "Timor-Leste", "Swaziland",
               "Iran (Islamic Republic of)", "Kosovo", "Lao People's Democratic Republic", "Libyan Arab Jamahiriya",
               "Republic of Moldova", "Burma", "The former Yugoslav Republic of Macedonia", "Palestine",
               "Western Sahara", "Korea, Democratic People's Republic of", "South Sudan", "Syrian Arab Republic",
               "Sao Tome and Principe", "United Republic of Tanzania", "Bahamas", "Gambia", "Holy See (Vatican City)",
               "Viet Nam"]

# Pair the two lists by position and build the CDS -> df_carto name mapping
name_map = dict(zip(cds_names, carto_names))
data_cds.insert(4, "country_carto", data_cds.loc[:, "country"].map(name_map).fillna(data_cds.loc[:, "country"]))

# 2_Getting iso
dict_iso = create_dict("iso3")
dict_iso["Kosovo"] = "RKS"
dict_iso["South Sudan"] = "SSD"
data_cds.insert(5, "iso3", data_cds.loc[:, "country_carto"].map(dict_iso))

# Data_cds_country
# .copy() so the lat/long assignments below do not write into a slice of data_cds
data_cds_country = data_cds.loc[data_cds["aggregate"] == "country"].copy()

# 3_Getting lat just in countries
dict_lat = create_dict("lat")
dict_lat["Kosovo"] = 42.667542
dict_lat["South Sudan"] = 6.8769908
data_cds_country['lat'] = data_cds_country.loc[:, "country_carto"].map(dict_lat)

# 4_Getting long just in countries
dict_long = create_dict("lon")
dict_long["Kosovo"] = 21.166191
dict_long["South Sudan"] = 31.3069782
data_cds_country['long'] = data_cds_country["country_carto"].map(dict_long)

data_cds_country.to_csv(r"C:\Users\juanc\Google Drive\CORONAWHY\DATASETS\data_cds_countries.csv", encoding="UTF-8")


[world_borders.zip](https://github.com/CoronaWhy/task-geo/files/4456383/world_borders.zip)
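
The script above pairs the unmatched names with their equivalents by position, which breaks silently if the data source adds or renames a country. A possibly more robust variant keeps the mapping as an explicit dictionary and reports whatever is still unmatched. This is a sketch only: the helper names are hypothetical, and the CDS-side spellings in `CDS_TO_CARTO` are guesses for illustration (only the right-hand names come from the list above).

```python
import pandas as pd

# Explicit name mapping. The keys are ASSUMED spellings on the data-source
# side; the values are names taken from the hand-written list in the script.
CDS_TO_CARTO = {
    "Vietnam": "Viet Nam",
    "Czechia": "Czech Republic",
}


def add_country_carto(df, name_map):
    """Return a copy of df with a country_carto column; unmapped names pass through."""
    out = df.copy()
    out["country_carto"] = out["country"].map(name_map).fillna(out["country"])
    return out


def unmapped_countries(df, known_names):
    """List country_carto values that do not appear in the reference names."""
    return sorted(set(df["country_carto"]) - set(known_names))


# Toy frames for illustration only.
toy_cds = pd.DataFrame({"country": ["Vietnam", "Spain", "Atlantis"]})
toy_reference_names = ["Viet Nam", "Spain", "Czech Republic"]

mapped = add_country_carto(toy_cds, CDS_TO_CARTO)
still_missing = unmapped_countries(mapped, toy_reference_names)
```

Checking `still_missing` after every data refresh makes a new or renamed country show up as an explicit entry instead of as a shifted mapping.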

@ManuelAlvarezC
Collaborator

While reading the docs I realized that the values of the aggregate field are completely correct; we should be looking at the level field instead. More info

I will upload this fix along with the addition of the ISO codes.
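
As a rough illustration of that change, country rows could be selected on the level field rather than by overwriting aggregate. The sketch below assumes that the data contains a `level` column whose value is `"country"` for country-level rows; the helper name and the toy data are made up for the example.

```python
import pandas as pd


def select_countries(df):
    """Keep only country-level rows, based on the `level` field (assumed name and value)."""
    return df.loc[df["level"] == "country"].copy()


# Toy data for illustration only; not real output of the data source.
toy = pd.DataFrame({
    "country": ["Spain", "United States", "United States"],
    "state": [None, None, "New York"],
    "level": ["country", "country", "state"],
})

countries = select_countries(toy)
```

Compared with the null checks on state, county and city in the exploration code, this leaves the aggregate column untouched.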

@ManuelAlvarezC
Collaborator

Update from CDS team:

@ManuelAlvarezC @JuanCalvoFerrandiz we are soon migrating to totally different coordinates, calculated in country-levels. https://github.com/hyperknot/country-levels

Please review if this issue is still present in a few days.

Source: covidatlas/coronadatascraper#528 (comment)
