-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix us census #63
base: master
Are you sure you want to change the base?
Fix us census #63
Conversation
I created the test to check the dataset format but it was failing and its a problem with columns "county" and "state" that are not taking into consideration. I believe the dataset has a correct format, I tried to improve the tester but I don't want to break something. What should I do? @ManuelAlvarezC |
Hi @cgomez9, and thanks for your submission. The issue you mention is caused because you are not following the Data Model. Basically, Please fix it, and I'll review the rest of the submission |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of things are wrong here:
-
The example notebook should contain just an example call for your data source, should be as short as possible, is intended for documentation purposes for non-team members on how to use the data source.
-
The
fips_codes.py
should have been included incountry_codes.py
, and the process will the following:- Watch this PR and wait for it to be resolved and merged.
- Add those changes to your branch.
- Add the contents of this mappings into the file in the following way:
- Rename the column
name
tocountry
and change the calls totranslate_codes
ontask_geo.common.country_codes
so nothing crashes. - Add two columns
region
andsub_region
to the csvcountry_codes.csv
file. - Put the states with with the information of the country, but their ISO codes, so from a state name we can retrieve the name of the country or the codes for the state.
- Put the counties with the state and country information.
- Document the new behavior.
- Rename the column
-
The fixture should be a csv file, of 50 lines of length at maximum.
myzip.extract(listFiles[5]) | ||
data = pd.read_csv(listFiles[5], low_memory=False) | ||
|
||
data = pd.read_json(URL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to assign the dataframe to a variable and then return it, you can return it directly:
return pd.read_json(URL)
|
||
|
||
def county_fips_to_name(fips_code): | ||
return COUNTY_FIPS_CODES[fips_code] if fips_code in COUNTY_FIPS_CODES else 'Unknown' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bad practice:
- If the value is not present, you should return
np.nan
orNone
instead ofUnknown
- The behavior of what you are doing with this if/else can be replaced with
dict.get
, like this:
return COUNTRY_FIPS_CODES.get(fips_code)
|
||
URL = 'https://api.census.gov/data/2019/pep/population?get=LASTUPDATE,POP,' \ | ||
'DENSITY&for=county:*&in=state:*&key=5436a8b95e523baaa40c22ec906af88a93f405eb ' | ||
API_KEY = '5436a8b95e523baaa40c22ec906af88a93f405eb' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the usage limits and politics of this API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to Terms of Service: "Right to Limit
Your use of the API may be subject to certain limitations on access, calls, or use as set forth within this Agreement or otherwise provided by the Census Bureau. If the Census Bureau reasonably believes that you have attempted to exceed or circumvent these limits, your ability to use the API may be temporarily or permanently blocked. The Census Bureau may monitor your use of the API to improve the service or to insure compliance with this Agreement."
Also, when you fix all this, find a way to delete from the git history the fixture and long notebook you included, as git includes the history it will make the repository unnecesary large and slow. |
I made all the requested changes, waiting for #62 to resolve. |
Description
A rework was made of the US Census module. Previously a zip file was downloaded and processed, now we get the dataset directly from an API in JSON format. Also a new common file was included with FIPS codes of all states and counties of USA.
Fixes #37
Checklist: