This repository has been archived by the owner on Oct 3, 2022. It is now read-only.
Fuzzy tag match #59
Accounting for spelling mistakes would lead to too much noise. Use substring matching.
@bakape I would recommend researching string metrics (https://en.wikipedia.org/wiki/String_metric); there are many algorithms that account for spelling mistakes. Then again, a simpler way would be to use phonetic encoding (https://en.wikipedia.org/wiki/Phonetic_encoding), which reduces complexity (assuming you know what most tags look like phonetically).
Thanks for the suggestions, but I only intend to use the facilities
available in the database system. Whatever I'd pick would also have to be
indexed off of the tags in the DB. Substring matching fits this use case.
On Mon, 11 Mar 2019, 20:08 Donald Tsang, ***@***.***> wrote:
@bakape <https://github.com/bakape> I would recommend doing research on
String metrics https://en.wikipedia.org/wiki/String_metric and that there
are many algorithms that account for spelling mistakes... but then again a
simpler way would be to use phonetic encoding
https://en.wikipedia.org/wiki/Phonetic_encoding which reduces complexity
(assuming you know what most tags look like phonetically)
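For illustration, the substring matching proposed in the reply above could look roughly like the sketch below. It assumes SQLite and a hypothetical `tags` table with a single `name` column; neither the schema nor the database engine is stated in this thread. The helper names `escape_like`, `find_tags`, and `make_demo_db` are invented for the example.

```python
import sqlite3


def escape_like(term: str, esc: str = "\\") -> str:
    """Escape LIKE wildcards so user input is matched literally."""
    return (
        term.replace(esc, esc + esc)
        .replace("%", esc + "%")
        .replace("_", esc + "_")
    )


def find_tags(conn: sqlite3.Connection, fragment: str) -> list[str]:
    """Return all tag names that contain `fragment` as a substring."""
    pattern = "%" + escape_like(fragment) + "%"
    rows = conn.execute(
        # Hypothetical schema: tags(name TEXT PRIMARY KEY)
        "SELECT name FROM tags WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,),
    )
    return [row[0] for row in rows]


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a few made-up tags."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (name TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO tags (name) VALUES (?)",
        [("blue_eyes",), ("blue_hair",), ("red_eyes",), ("blueberry",)],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    print(find_tags(demo, "blue"))
    print(find_tags(demo, "_e"))
```

Escaping matters here because tags often contain `_`, which `LIKE` would otherwise treat as a single-character wildcard. Note that a pattern with a leading `%` generally cannot use an ordinary B-tree index; whether the lookup is actually indexed depends on the database (PostgreSQL, for example, offers trigram indexes through the `pg_trgm` extension).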
@bakape In this case, to avoid adding string metric functions, phonetic-encoded substrings would be useful. All that is required is an extra column in the tag database holding a phonetic encoding of each tag.
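A minimal sketch of the extra-column idea, assuming American Soundex as the phonetic encoding and SQLite as the database. The thread names neither, so both are stand-ins, and the `tags` schema and the helpers `add_tag` and `find_similar` are hypothetical. The code is computed in application code at insert time, so the database only needs an ordinary index on the new column.

```python
import sqlite3

# Soundex digit for each consonant; vowels, h, w and y have no code.
_CODES = {}
for _digit, _letters in enumerate(
    ("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1
):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
            if len(out) == 4:
                break
        prev = code  # a vowel resets prev, so repeats after it count
    return "".join(out).ljust(4, "0")


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the tag plus its precomputed phonetic code.
    conn.execute(
        "CREATE TABLE tags (name TEXT PRIMARY KEY, phonetic TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX tags_phonetic ON tags (phonetic)")
    return conn


def add_tag(conn: sqlite3.Connection, name: str) -> None:
    conn.execute(
        "INSERT INTO tags (name, phonetic) VALUES (?, ?)",
        (name, soundex(name)),
    )


def find_similar(conn: sqlite3.Connection, query: str) -> list[str]:
    """Return tags whose phonetic code equals that of `query`."""
    rows = conn.execute(
        "SELECT name FROM tags WHERE phonetic = ? ORDER BY name",
        (soundex(query),),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = make_db()
    for tag in ("kimono", "skirt", "stocking"):
        add_tag(db, tag)
    print(find_similar(db, "kimmono"))
```

Soundex was designed for English surnames, so how well it suits a given tag vocabulary is an open question; it also collapses many distinct words into one code, which may bring back some of the noise mentioned earlier in the thread.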
Matching tags with similar pronunciation or spelling
Similar to https://gitgud.io/Dizmal/borehole