perform unicode normalization #319

Open
maxbachmann opened this issue Apr 20, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@maxbachmann
Member

Matching results could be improved by applying Unicode normalization to the input strings. This should be a processor function, since users might want the distance without normalization. It would also be odd if `Levenshtein.distance(s1, s2)` differed from `len(Levenshtein.editops(s1, s2))`. At the same time, normalization cannot be applied inside `Levenshtein.editops`, since each edit operation needs to map to a specific character position in the original source string.
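
To illustrate the effect, here is a minimal sketch using only the standard library. The `levenshtein` helper is a plain textbook implementation used as a stand-in, not the library's code, and the choice of NFC is an assumption (the issue does not name a normalization form). `normalize_nfc` is a hypothetical name for the kind of function that could be passed as a `processor`.

```python
import unicodedata


def normalize_nfc(s: str) -> str:
    """Hypothetical processor: compose characters into NFC form."""
    return unicodedata.normalize("NFC", s)


def levenshtein(a: str, b: str) -> int:
    """Textbook Levenshtein distance over code points (stand-in only)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution or match
            ))
        prev = cur
    return prev[-1]


composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

# Visually identical strings, but they differ at the code point level.
raw = levenshtein(composed, decomposed)

# After normalization both strings have the same code points.
normalized = levenshtein(normalize_nfc(composed), normalize_nfc(decomposed))
```

Here `raw` is nonzero while `normalized` is zero. The sketch also shows the problem for edit operations: `normalize_nfc(decomposed)` is one code point shorter than `decomposed`, so positions computed on the normalized string no longer index into the original.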

It would probably make sense to update utils.default_process to normalize strings as well.
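
One way this could look, as a rough sketch: wrap an existing processor so the input is normalized first. `simple_process` below is a hypothetical stand-in and does not reproduce what `utils.default_process` actually does. `with_normalization` is likewise an illustrative name, and NFKC is an assumed choice of form.

```python
import unicodedata
from typing import Callable


def simple_process(s: str) -> str:
    """Hypothetical stand-in for an existing processor (lowercase and trim)."""
    return s.lower().strip()


def with_normalization(
    processor: Callable[[str], str], form: str = "NFKC"
) -> Callable[[str], str]:
    """Return a processor that normalizes its input before delegating."""
    def wrapped(s: str) -> str:
        return processor(unicodedata.normalize(form, s))
    return wrapped


process = with_normalization(simple_process)

# Decomposed accent is composed, then the usual processing applies.
cafe = process("  Cafe\u0301 ")

# NFKC also folds compatibility characters such as the "fi" ligature.
ligature = process("\ufb01le")
```

Keeping normalization as a wrapper leaves the unnormalized processor available for users who want the raw behaviour.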

@maxbachmann maxbachmann added the enhancement New feature or request label Apr 20, 2023