Skip to content

Latest commit

 

History

History

levenshtein-distance

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 
 
 

Levenshtein Distance

The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.

Definition

Mathematically, the Levenshtein distance between two strings a and b (of length |a| and |b| respectively) is given by Levenshtein where

Levenshtein

where Levenshtein is the indicator function equal to 0 when Levenshtein and equal to 1 otherwise, and Levenshtein is the distance between the first i characters of a and the first j characters of b.

Note that the first element in the minimum corresponds to deletion (from a to b), the second to insertion and the third to match or mismatch, depending on whether the respective symbols are the same.

Example

For example, the Levenshtein distance between kitten and sitting is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:

  1. kitten → sitten (substitution of "s" for "k")
  2. sitten → sittin (substitution of "i" for "e")
  3. sittin → sitting (insertion of "g" at the end).

References