Description
Hi, many thanks for this amazing script!
I tested your "PHP Rapid Automatic Keyword Extraction" example (shown here: https://github.com/yooper/php-text-analysis/wiki/PHP-Rapid-Automatic-Keyword-Extraction) and noticed that there are issues with special characters such as the German umlauts.
I tested it with the German stop word list ("stop-words_german_1_de.txt").
It listed [verst rkte] => 8 as a keyword/score pair (n-gram = 2), where it should be [verstärkte] => 8. It seems to treat every word that contains a German umlaut as multiple words by replacing each umlaut with a space " ": in the example above, "verst" and "rkte" instead of "verstärkte".
Is there any way to fix this? I tried converting the input text to UTF-8, but that had no effect on the issue.
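A minimal sketch of one way this symptom can arise, offered as a guess because I have not traced the library's tokenizer: a regular expression that strips non-letters without the `u` modifier works byte by byte, so both bytes of a UTF-8 "ä" are replaced. The two function names below are hypothetical and are not part of php-text-analysis.

```php
<?php
// Hypothetical helpers for illustration only; not part of php-text-analysis.

// Byte-oriented pattern: without the /u modifier the subject is matched byte
// by byte, so the two UTF-8 bytes of "ä" (0xC3 0xA4) both fall outside [a-zA-Z].
function stripNonLettersBytewise(string $text): string
{
    return preg_replace('/[^a-zA-Z]+/', ' ', $text);
}

// Unicode-aware pattern: /u treats pattern and subject as UTF-8, and \p{L}
// matches any Unicode letter, umlauts included.
function stripNonLettersUnicode(string $text): string
{
    return preg_replace('/[^\p{L}]+/u', ' ', $text);
}

// "\u{00E4}" is "ä"; the escape keeps the example independent of file encoding.
$word = "verst\u{00E4}rkte";

echo stripNonLettersBytewise($word), PHP_EOL; // prints "verst rkte"
echo stripNonLettersUnicode($word), PHP_EOL;  // prints "verstärkte"
```

If the tokenizer or a filter in the pipeline uses a pattern of the first kind, that would reproduce the output above even when the input is valid UTF-8.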