-
Notifications
You must be signed in to change notification settings - Fork 16
Final Project 3 #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, nicely done! Your code is functional and well-documented. There are some stylistic suggestions I've made in this comment. It would be ideal to implement all the functions as fruitful functions because they allow you to write unit tests. And please try to avoid using a global variable.
text_mining.py
Outdated
'New Mexico', 'New York (state)', 'North Carolina', 'North Dakota', 'Ohio', | ||
'Oklahoma', 'Oregon', 'Pennsylvania', 'Rhode Island', 'South Carolina', | ||
'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Vermont', | ||
'Virginia', 'Washington (state)', 'West Virginia', 'Wisconsin', 'Wyoming'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you were feeling really really lazy, you could've used Wikipedia to retrieve these states( and the distances) as well! You can use Pickle to save those data so that you don't send GET requests to Wikipedia everytime you are running this code
text_mining.py
Outdated
# Orders by frequency without repeat terms | ||
seen = {} | ||
ordered_set = [seen.setdefault(x, x) for x in ordered if x not in seen] | ||
freq_words.append(ordered_set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice way of using dictionary to remove duplicates!
and 👍 for the list comprehension
text_mining.py
Outdated
# Delete words that appear in 10 or more articles | ||
for word in hist.keys(): | ||
if hist[word] >= 10: | ||
for i in range(50): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful if you add more comments for this function. I don't understand why you are running this 50 times just from reading your comments. Where is this magic 50 coming from?
text_mining.py
Outdated
find_freq_words(name) | ||
print(name) | ||
|
||
remove_matches() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, this function should only return common words from state 1 and 2. I recommend you move remove_matches()
outside this function to better modularize your code.
text_mining.py
Outdated
seen = {} | ||
ordered_set = [seen.setdefault(x, x) for x in ordered if x not in seen] | ||
freq_words.append(ordered_set) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a void function that uses a global variable. Generally, I would recommend implementing a fruitful function instead that does not rely on a global variable. Please refer to this StackOverflow article to find why https://stackoverflow.com/questions/19158339/why-are-global-variables-evil
text_mining.py
Outdated
for i in range(50): | ||
if word in freq_words[i]: | ||
freq_words[i].remove(word) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, void function vs fruitful function. https://stackoverflow.com/questions/19158339/why-are-global-variables-evil
text_mining.py
Outdated
|
||
|
||
if __name__ == "__main__": | ||
comp_al() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be easier to follow your code if you had all the function executed here in if __name__ == "__main__":
. Please Check out this StackOverFlow article as well. https://stackoverflow.com/questions/19158339/why-are-global-variables-evil
text_mining.py
Outdated
|
||
import wikipedia | ||
from collections import Counter | ||
from itertools import chain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see where you are using this chain. If you are not using this library anymore, please delete it from your final code!
Nicely done revision! |
No description provided.