because it preserves language-specific characters of European languages, such as German umlauts (ä, ö, ü), Spanish characters (á, é, í, ñ), French characters (é, è, ê, ç), etc.
Hi, many thanks for this amazing script!
I tested your "PHP Rapid Automatic Keyword Extraction" example (shown here https://github.com/yooper/php-text-analysis/wiki/PHP-Rapid-Automatic-Keyword-Extraction) and noticed that there are issues with special characters like the German umlauts. I tested it with the German stop word list ("stop-words_german_1_de.txt").
It listed [verst rkte] => 8 as a keyword/score (n-gram = 2), which should be [verstärkte] => 8. It seems to interpret every word that contains a German umlaut as multiple words by replacing each umlaut with a space " "; see the aforementioned example, "verst" and "rkte" instead of "verstärkte".
Is there any way to fix this? I tried converting the input text to UTF-8, without any impact on this issue.
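One possible explanation, offered as an assumption rather than a confirmed diagnosis of php-text-analysis: a cleanup regex that runs without the `u` modifier works on bytes, so the two UTF-8 bytes of "ä" look like non-letters and get replaced by a space. The sketch below reproduces that symptom and shows a Unicode-aware variant. Both function names are hypothetical and not part of the library, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helpers for illustration only; not part of php-text-analysis.

// Byte-oriented cleanup: without the /u modifier the pattern is applied to
// bytes, so both bytes of a UTF-8 character such as "ä" match [^a-zA-Z].
function stripNonLettersBytewise(string $text): string
{
    return preg_replace('/[^a-zA-Z]+/', ' ', $text);
}

// Unicode-aware cleanup: /u treats pattern and subject as UTF-8, and
// \p{L} matches any letter, including ä, ö, ü, ñ, é, ç.
function stripNonLettersUnicode(string $text): string
{
    return preg_replace('/[^\p{L}]+/u', ' ', $text);
}

$input = 'verstärkte';
echo stripNonLettersBytewise($input), PHP_EOL; // verst rkte
echo stripNonLettersUnicode($input), PHP_EOL;  // verstärkte
```

If the library's tokenizer or filter does something similar internally, converting the input to UTF-8 would not help, because the input is already valid UTF-8 and the splitting happens inside the pattern.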