Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code Review for Project #1

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Code Review for Project #1

wants to merge 1 commit into from

Conversation

learningsomethingnew
Copy link
Collaborator

WonderingStar to perform code review & LearningSomethingNew to update code on feedback.

Copy link
Collaborator

@WanderingStar WanderingStar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall comments:

  • Good division of work into bite-sized functions
  • Unfortunately, many of the functions are single-purpose because they hard-code application logic in them. Think about whether you can factor things like the names of the directories out to make more generally useful functions that you could reuse elsewhere

Most of the things I comment on below are stylistic or for greater intelligibility of the code. However the use of \ for directory separators means that this code doesn't work on non-Windows OSes (and fails in a way that gives the runner very little information about what went wrong)


def verify_dirs_exist():
#This notebook requires a few directories
dirs = ["download", "download\csv", "download\excel"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these directories intended to be literally named download\csv and download\excel? Or are csv and excel intended to be subdirectories under download? On non-Windows OSes, \ is not the directory separator character, so this does the former. You can use os.path.join('download', 'csv') to work cross-platform.

If the excel and csv directories are meant to be subdirectories of download, you don't need to make download separately. os.makedirs operates recursively.

full_path = os.path.join(curpath, d) # join cwd with proposed d
create_dir_if_not_exists(full_path)

def create_dir_if_not_exists(full_path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.makedirs already does the file existence check, essentially.

try:
    os.makedirs(full_path)
    print(f"Created directory {full_path}")
except FileExistsError:
    print(f"Directory {full_path} already existed")

The try/except version is slightly preferable because in the if version, there's a tiny chance that someone could create the directory between when you do the if and when you do the makedirs.

import glob
import datetime as dt

def verify_dirs_exist():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming of this function is a bit misleading. It doesn't just verify that they exist, it actually creates them if they don't exist. I think it should be called ensure_dirs_exist.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also make this function take in the list of directories to create, so that it's re-usable.

else:
print("Directory ", full_path, " already existed")

def generate_file_name_from_url(url):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment with a sample URL and the file name you're generating from it would make this much easier to understand.

print("Directory ", full_path, " already existed")

def generate_file_name_from_url(url):
month_year = url.split("\\")[-1].split("_")[-1].split(".")[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does url.split("\\") do anything for these URLs? It looks like a sample url is https://www.huduser.gov/portal/datasets/usps/ZIP_COUNTY_032010.xlsx, which doesn't contain the \ character.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the (rare?) cases where a regular expression might make things clearer...

m = re.search('ZIP_COUNTY_(\d\d)(\d\d\d\d)\.xlsx', url) 
month, year = m.groups()

#open and get the excel dataframe
excel_df = process_excel_file(full_file_path)
#merge the excel file with the fips data
merged_df = fips_df.merge(excel_df)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better if this function did not refer to fits_df, which is defined outside of the function. That should be passed in as an argument or defined within the function.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same is true for cur_year

print(csv_path)
try:
merged_df.to_csv(csv_path, encoding='utf-8', index=False)
except:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This except is too broad. When I first tried to run this, the code did nothing because it was trying to write to a directory that didn't exist. It didn't show me the error because this clause swallowed it. If you're expecting a particular error, you should catch it as specifically as possible.

merged_df.to_csv(csv_path, encoding='utf-8', index=False)
except:
#once we get to a Q that hasn't happened yet, we'll get an XLDRerror
print("Operation has completed")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which operation has completed? This message could be more descriptive. "All files have been downloaded and processed"?

try:
merged_df.to_csv(csv_path, encoding='utf-8', index=False)
except:
#once we get to a Q that hasn't happened yet, we'll get an XLDRerror
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more straightforward way to detect this? Check to see if the input is empty? Check if year > now.year or year == now.year and month > now.month?

import os
import time
import pandas as pd
import glob
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you use glob in this code.

@WanderingStar
Copy link
Collaborator

Oh, one thing I forgot to mention. Your requirements.txt needs xlrd

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants