Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a pull request #12

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

Subeen-Kim
Copy link

No description provided.

# commonwords.py
Required packages:

$ pip install wikipedia

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put more documentation on README file.
Besides required packages, you can put descriptions, relevant links, or how to use it

for word in text_1:
geometric_mean = math.sqrt(histogram_1.get(word, 0)*histogram_2.get(word, 0))
if geometric_mean != 0:
if word not in preposition and word not in pronoun and word not in article and word not in commonverb and word not in alphabet and word not in conjunction and word not in number:
Copy link

@mightydeveloper mightydeveloper Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stopwords = preposition+pronoun+number+article+commonverb+alphabet+conjunction
if word not in stopwords will be more concise code. I would make them as one list before using in the line 63. Also, some libraries already have those commonly ignored list of words as "stopwords". If you use those libraries, you don't have to write out all the words and might save your time.

text = text.replace(')','')
text = text.replace("'",'')
text = text.replace(".",'')
text = text.replace(",",'')
Copy link

@mightydeveloper mightydeveloper Oct 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a minor suggestion, but if you want to make code a bit more scalable and concise code,
you might want to consider

for x, y in [('-',' '), ('(',''), (')',''), ("'",''), (".",''), (",",'')]:
    text = text.replace(x, y)

count[word] = 1
else:
count[word] += 1
return count

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also minor suggestion)
In python library, there is a special dictionary called Counter.
By default, this initializes the count to be 0, so that you wouldn't need to check if the keyword is inside the dictionary or not.
An equivalent code using Counter will be like following

from collections import Counter
count = Counter()
for word in unsorted_words:
    count[word] += 1
return count

count[word] += 1
return count

#histogram(text_to_word('Subeen-is-(an)-idiot'))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove the unnecessary comments before submitting

geometric_mean = math.sqrt(histogram_1.get(word, 0)*histogram_2.get(word, 0))
if geometric_mean != 0:
common_count[word] = geometric_mean
else:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else statement is unnecessary here

@mightydeveloper
Copy link

Overall, I think you did a great job on documenting functions using docstrings, but still there can be an improvement on README file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants