Issue with German Umlauts using "PHP Rapid Automatic Keyword Extraction" #78

Hi, many thanks for this amazing script!

I tested your "`PHP Rapid Automatic Keyword Extraction`" example (shown here: https://github.com/yooper/php-text-analysis/wiki/PHP-Rapid-Automatic-Keyword-Extraction) and noticed issues with special characters such as the German umlauts.

I tested it with the German stop word list ("stop-words_german_1_de.txt"). 

With n-gram = 2, it listed `[verst rkte] => 8` as a keyword/score, which should be `[verstärkte] => 8`. It seems to treat every word containing a German umlaut as multiple words, replacing each umlaut with a space " ". In the example above, that gives `verst` and `rkte` instead of `verstärkte`.

Is there any way to fix this? I tried converting the input text to UTF-8, but that had no effect on the issue.
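
For reference, here is a minimal sketch of one way such a split could arise. It assumes the text cleanup step uses an ASCII-only character class without the `/u` modifier; I have not confirmed this in the library source, and both function names below are hypothetical, not part of php-text-analysis. It also assumes the file and the input are UTF-8 encoded.

```php
<?php
// Hypothetical helpers for illustration only; not part of php-text-analysis.

// ASCII-only cleanup: every byte outside a-z / 0-9 is replaced by a space,
// so the two UTF-8 bytes of "ä" collapse into a single space.
function cleanAsciiOnly(string $text): string
{
    return preg_replace('/[^a-z0-9]+/i', ' ', $text);
}

// Unicode-aware cleanup: the /u modifier makes PCRE treat the input as UTF-8,
// and \p{L} / \p{N} match letters and digits in any script.
function cleanUnicodeAware(string $text): string
{
    return preg_replace('/[^\p{L}\p{N}]+/u', ' ', $text);
}

$input = 'verstärkte Zusammenarbeit';

echo cleanAsciiOnly($input), PHP_EOL;    // verst rkte Zusammenarbeit
echo cleanUnicodeAware($input), PHP_EOL; // verstärkte Zusammenarbeit
```

If the library does something along the lines of the first function, that would explain why converting the input to UTF-8 makes no difference: the input is already valid, but the pattern only knows ASCII letters.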
