Appreciation and Curiosity #2

KevinDanikowski · 2021-02-26T17:57:25Z

Just wanted to say that I think this is an amazing package you created. I'm really curious what sources you used for the pre-processing. I've found various resources that support everything you're doing, but I haven't found one succinct approach like this aside from yours.

hhhhhhhhhn · 2021-12-31T02:00:53Z

First of all, sorry for the very late reply, and thanks for the appreciation.

For the pre-processing, stop words are removed, words are stemmed using Snowball stemmers, and finally the remaining words are divided into n-grams. After that, matching n-grams in both texts are clustered together based on their Chebyshev distance, and each cluster is given a score equal to its match length times its density.
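The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the package's actual code. `STOP_WORDS` is a hypothetical toy list, and `crude_stem` is a hypothetical suffix stripper standing in for a real Snowball stemmer (for example NLTK's `SnowballStemmer("english")`). A "match" is assumed to be a pair `(i, j)` where n-gram `i` of text A equals n-gram `j` of text B. Clustering is assumed to be single-linkage with a hypothetical `max_gap` threshold on Chebyshev distance. For scoring, "length" is assumed to be the number of matched n-grams and "density" the matches per side of the cluster's bounding box.

```python
from collections import defaultdict

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "it"}


def crude_stem(word: str) -> str:
    """Hypothetical placeholder for a Snowball stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str, n: int = 3) -> list[tuple[str, ...]]:
    """Lowercase, drop stop words, stem, and split into overlapping n-grams."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    stems = [crude_stem(w) for w in words if w and w not in STOP_WORDS]
    return [tuple(stems[i : i + n]) for i in range(len(stems) - n + 1)]


def find_matches(a, b):
    """All (i, j) pairs where n-gram i of text A equals n-gram j of text B."""
    positions = defaultdict(list)
    for j, gram in enumerate(b):
        positions[gram].append(j)
    return [(i, j) for i, gram in enumerate(a) for j in positions.get(gram, [])]


def chebyshev(p, q):
    """Chebyshev distance between two match points (i, j)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def cluster(matches, max_gap: int = 2):
    """Single-linkage clustering (assumed): matches within max_gap share a cluster."""
    parent = list(range(len(matches)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x in range(len(matches)):
        for y in range(x + 1, len(matches)):
            if chebyshev(matches[x], matches[y]) <= max_gap:
                parent[find(x)] = find(y)

    groups = defaultdict(list)
    for x, m in enumerate(matches):
        groups[find(x)].append(m)
    return list(groups.values())


def score(points):
    """Assumed score: length (matched n-grams) times density (matches per box side)."""
    length = len(points)
    span_a = max(i for i, _ in points) - min(i for i, _ in points) + 1
    span_b = max(j for _, j in points) - min(j for _, j in points) + 1
    density = length / max(span_a, span_b)
    return length * density


a = preprocess("The quick brown fox jumps over the lazy dog near the river bank")
b = preprocess("A quick brown fox jumped over a lazy dog near an old river bank")
clusters = cluster(find_matches(a, b))
print([score(c) for c in clusters])  # → [6.0]
```

Here the first six trigrams line up exactly, forming one dense diagonal cluster, while the trailing trigrams differ because of the extra word "old" in the second text. Under these assumptions, a cluster of scattered matches covering the same area would score lower than a tight diagonal run.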
