Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix us census #63

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Fix us census #63

wants to merge 13 commits into from

Conversation

cgomez9
Copy link

@cgomez9 cgomez9 commented Apr 15, 2020

Description

A rework was made of the US Census module. Previously a zip file was downloaded and processed, now we get the dataset directly from an API in JSON format. Also a new common file was included with FIPS codes of all states and counties of USA.

Fixes #37

Checklist:

  • I have read the CONTRIBUTING GUIDE and made sure this Pull Request is compliant with it.
  • I have read the DATA MODEL and DATA SOURCES and made sure this Pull Request is compliant with it.
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@cgomez9
Copy link
Author

cgomez9 commented Apr 15, 2020

I created the test to check the dataset format but it was failing and its a problem with columns "county" and "state" that are not taking into consideration. I believe the dataset has a correct format, I tried to improve the tester but I don't want to break something. What should I do? @ManuelAlvarezC

@ManuelAlvarezC
Copy link
Collaborator

Hi @cgomez9, and thanks for your submission.

The issue you mention is caused because you are not following the Data Model. Basically, state column should be renamed to region and county to sub_region.

Please fix it, and I'll review the rest of the submission

@ManuelAlvarezC ManuelAlvarezC self-requested a review April 19, 2020 21:06
@ManuelAlvarezC ManuelAlvarezC added the ci-fail When the CI fails for a PR label Apr 19, 2020
@ManuelAlvarezC ManuelAlvarezC added ci-ok When the CI passes for a PR and removed ci-fail When the CI fails for a PR labels Apr 22, 2020
Copy link
Collaborator

@ManuelAlvarezC ManuelAlvarezC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of things are wrong here:

  1. The example notebook should contain just an example call for your data source, should be as short as possible, is intended for documentation purposes for non-team members on how to use the data source.

  2. The fips_codes.py should have been included in country_codes.py, and the process will the following:

    • Watch this PR and wait for it to be resolved and merged.
    • Add those changes to your branch.
    • Add the contents of this mappings into the file in the following way:
      • Rename the column name to country and change the calls to translate_codes on task_geo.common.country_codes so nothing crashes.
      • Add two columns region and sub_region to the csv country_codes.csv file.
      • Put the states with with the information of the country, but their ISO codes, so from a state name we can retrieve the name of the country or the codes for the state.
      • Put the counties with the state and country information.
      • Document the new behavior.
  3. The fixture should be a csv file, of 50 lines of length at maximum.

myzip.extract(listFiles[5])
data = pd.read_csv(listFiles[5], low_memory=False)

data = pd.read_json(URL)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to assign the dataframe to a variable and then return it, you can return it directly:

return pd.read_json(URL)



def county_fips_to_name(fips_code):
return COUNTY_FIPS_CODES[fips_code] if fips_code in COUNTY_FIPS_CODES else 'Unknown'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bad practice:

  1. If the value is not present, you should return np.nan or None instead of Unknown
  2. The behavior of what you are doing with this if/else can be replaced with dict.get, like this:
return COUNTRY_FIPS_CODES.get(fips_code)


URL = 'https://api.census.gov/data/2019/pep/population?get=LASTUPDATE,POP,' \
'DENSITY&for=county:*&in=state:*&key=5436a8b95e523baaa40c22ec906af88a93f405eb '
API_KEY = '5436a8b95e523baaa40c22ec906af88a93f405eb'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the usage limits and politics of this API?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Terms of Service: "Right to Limit
Your use of the API may be subject to certain limitations on access, calls, or use as set forth within this Agreement or otherwise provided by the Census Bureau. If the Census Bureau reasonably believes that you have attempted to exceed or circumvent these limits, your ability to use the API may be temporarily or permanently blocked. The Census Bureau may monitor your use of the API to improve the service or to insure compliance with this Agreement."

@ManuelAlvarezC
Copy link
Collaborator

Also, when you fix all this, find a way to delete from the git history the fixture and long notebook you included, as git includes the history it will make the repository unnecesary large and slow.

@cgomez9
Copy link
Author

cgomez9 commented Apr 23, 2020

I made all the requested changes, waiting for #62 to resolve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocked ci-ok When the CI passes for a PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error 500 when calling us_census()
2 participants