

Add option to exclude identity matches #405

Open
silenceOfTheLambda opened this issue Sep 2, 2024 · 0 comments

silenceOfTheLambda commented Sep 2, 2024

It would be great to have an argument in the process module functions to exclude identity matches/scores from the returned output, something like include_identity=False or only_fuzzy=True. Usually we use fuzzy matching precisely to find fuzzy (rather than exact) matches, since exact matches can already be found with simple equality checks.
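A minimal sketch of how such an option could behave, in plain Python. Everything here is an assumption for illustration: extract and only_fuzzy are hypothetical stand-ins, not the library's actual API, and difflib.SequenceMatcher is swapped in as the scorer so the example runs without extra dependencies (its scores differ from RapidFuzz's).

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (difflib, NOT RapidFuzz's fuzz.ratio)."""
    return SequenceMatcher(None, a, b).ratio() * 100


def extract(query, choices, limit=5, only_fuzzy=False):
    """Return up to `limit` (choice, score, index) tuples, best first.

    `only_fuzzy` is the hypothetical option proposed in this issue: when True,
    choices that are exactly equal to the query are skipped.
    """
    results = []
    for index, choice in enumerate(choices):
        if only_fuzzy and choice == query:
            continue  # identity match: an equality check already covers this case
        results.append((choice, similarity(query, choice), index))
    # Stable sort, so ties keep their original order in `choices`
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:limit]
```

For example, extract("apple", ["apple", "apples", "maple"], only_fuzzy=True) skips the exact "apple" and returns only "apples" and "maple", while the default call ranks the identity match first.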

A use case is fuzzy-matching a list of strings against itself: for each input string, we want to find the best-matching string other than itself. Currently, one has to remove that input string from the list of choices before calling extract(). But with multi-input calls to extract() (see #188), that is no longer possible. Alternatively, if the input string occurs only once in the choices list, one has to take the second-best match returned by extract(). Or, if one needs the entire similarity matrix, one has to set the elements corresponding to identity matches to some value < 0 before, e.g., applying np.max or np.argmax to find the maximum similarity score or its index.
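The similarity-matrix workaround can be sketched as follows, again with difflib standing in for the RapidFuzz scorer and a hypothetical helper name (best_other_match). With RapidFuzz and NumPy, the matrix would more likely come from rapidfuzz.process.cdist(strings, strings) and be masked with numpy.fill_diagonal(matrix, -1) before numpy.argmax(matrix, axis=1); the pure-Python version below just shows the same steps.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (difflib, NOT RapidFuzz)."""
    return SequenceMatcher(None, a, b).ratio() * 100


def best_other_match(strings):
    """For each string, return the index of its best match among the OTHER positions."""
    n = len(strings)
    # Full self-similarity matrix: row i holds the scores of strings[i] vs every string
    matrix = [[similarity(a, b) for b in strings] for a in strings]
    for i in range(n):
        matrix[i][i] = -1.0  # mask the identity match with a value below the 0-100 range
    # Row-wise argmax; on ties, max() returns the first (lowest) index
    return [max(range(n), key=row.__getitem__) for row in matrix]
```

Note that this masks by position, not by value: in ["apple", "apple", "maple"], each "apple" still picks the other "apple" as its best match, which may or may not be what an exclude-identity option should do.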

Having a dedicated argument that takes care of excluding identity matches under the hood of the process module functions would improve convenience and user-friendliness :)
