Commit

fix typos in experiences
orchid00 committed Dec 6, 2018
1 parent 7c61c27 commit 9975109
Showing 2 changed files with 6 additions and 6 deletions.
4 changes: 2 additions & 2 deletions docs/experiences_shared.html

8 changes: 4 additions & 4 deletions experiences_shared.md
@@ -35,16 +35,16 @@ found for several job titles.
 Once we found some jobs to parse, the other question was, what to look for. For
 text mining, there are a few options, sentences, words or group of words. The first
 test was with sentences, but that came up to be the first problem. Many ads use
-bullet points, which don't end up with a period `.`. In any case, I continue to look
-for words in an attempt to see which ones the most common words and what does that
+bullet points, which don't end up with a period. In any case, I continued to look
+for words in an attempt to see which ones were the most common words and what does that
 tell me. Most common words for ads related to "Data Steward" are data, business,
 management and experience, [with 96% and above occurrence in the job ads searched](https://github.com/orchid00/jobsScrapping/blob/master/figures/top20words.pdf).
-That was a bit interesting but didn't say much to continue. I've also looked for
+That was a bit interesting, but didn't say much to continue. I've also looked for
 word groups, 2 or 3 up to 6, trying to make sense of the search.
 Apart from not giving me any interesting results, I came across with the problem
 of duplicated ads, which I just decided to avoid.
 
-Finally after some weeks of leaving it aside. The idea of cleaning the HTML before
+Finally, after some weeks of leaving it aside. The idea of cleaning the HTML before
 parsing it was what lead to the current implementation. From the HTML code the process is:
 1. to replace the end of lines with a period (which is super useful with lists, and
 helps down to the road to have sentences).
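
For readers who want to try the ideas in the changed text, here is a minimal Python sketch. It is not the repository's code. It assumes each ad is already available as plain text with one bullet per line (the original works from HTML), and the helper names `newlines_to_periods`, `word_occurrence` and `ngrams`, as well as the two sample ads, are hypothetical.

```python
import re
from collections import Counter


def newlines_to_periods(text):
    """End every non-empty line with a period so bullet items read as sentences."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return " ".join(
        ln if ln.endswith((".", "!", "?")) else ln + "." for ln in lines
    )


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def word_occurrence(ads):
    """Percentage of ads in which each word appears at least once."""
    counts = Counter()
    for ad in ads:
        counts.update(set(tokenize(ad)))  # count each word once per ad
    return {word: 100.0 * n / len(ads) for word, n in counts.items()}


def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample ads, for illustration only.
ads = [
    "Data Steward\n- manage business data\n- experience with data management",
    "Data management role\n- business experience required",
]
cleaned = [newlines_to_periods(ad) for ad in ads]
occurrence = word_occurrence(cleaned)
bigrams = Counter(g for ad in cleaned for g in ngrams(tokenize(ad), 2))
```

Calling `ngrams(tokens, n)` with `n` from 2 to 6 gives the word groups mentioned in the text. Because `tokenize` drops punctuation, these word groups can span sentence boundaries; splitting on the inserted periods first would avoid that.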
