It would be great if the functions had an argument to exclude identity matches/scores from the returned output, something like include_identity=False or only_fuzzy=True. Typically, fuzzy matching is used to find inexact rather than exact matches, since exact matches can be found via equality checks.
A use case is fuzzy-matching a list of strings against itself. Suppose that for each input string we want to find the best-matching string other than itself. Currently, one has to remove the (single) input string from the list of choices before calling extract(). But with multi-input calls to extract() (see #188) that is no longer possible. Alternatively, if the input string occurs only once in the choices list, one can take the second-best match returned by extract(). Or, if one is interested in the entire similarity matrix, one needs to set the elements corresponding to identity matches to some number < 0 before, e.g., applying np.(arg)max to find the (index of the) maximum similarity score.
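To make the similarity-matrix workaround concrete, here is a minimal sketch of it. It is pure Python and uses difflib.SequenceMatcher from the standard library as a stand-in scorer, so the scores will differ from those of the library's own scorers; the helper names similarity and best_non_self_matches are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer on a 0-100 scale (difflib, not the library's scorer)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def best_non_self_matches(strings):
    """For each string, return (best other string, score), ignoring its own position."""
    n = len(strings)
    # Full similarity matrix of the list against itself.
    matrix = [[similarity(a, b) for b in strings] for a in strings]
    # Mask the diagonal (identity matches) with a value below any real score.
    for i in range(n):
        matrix[i][i] = -1.0
    results = []
    for row in matrix:
        # Index of the row maximum, i.e. a plain-Python argmax.
        j = max(range(n), key=row.__getitem__)
        results.append((strings[j], row[j]))
    return results


names = ["apple", "apples", "banana", "bananas"]
matches = best_non_self_matches(names)
```

Note that masking the diagonal only hides each string's own position. If the same string appears twice in the list, the other copy still shows up as a perfect match, and the list needs at least two entries for the result to be meaningful.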
Having a dedicated argument that excludes identity matches under the hood of the process module functions would improve convenience and user-friendliness :)
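For illustration only, the requested behaviour could look roughly like the sketch below. The function extract_sketch is hypothetical and is not the library's extract(); the include_identity parameter is the one proposed in this issue and does not exist yet; scoring again uses difflib.SequenceMatcher as a stand-in.

```python
from difflib import SequenceMatcher


def extract_sketch(query, choices, limit=5, include_identity=True):
    """Hypothetical sketch: up to `limit` (choice, score, index) tuples, best first."""
    scored = []
    for index, choice in enumerate(choices):
        if not include_identity and choice == query:
            continue  # skip exact (identity) matches when requested
        score = 100.0 * SequenceMatcher(None, query, choice).ratio()
        scored.append((choice, score, index))
    # Highest score first; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


choices = ["apple", "apples", "banana"]
with_identity = extract_sketch("apple", choices, limit=1)
without_identity = extract_sketch("apple", choices, limit=1, include_identity=False)
```

In this sketch, with_identity holds the exact match "apple", while without_identity holds "apples". Filtering by string equality also drops duplicate copies of the query, which differs from masking only the query's own index; which of the two is wanted would need to be decided for a real implementation.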