The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.
Mathematically, the Levenshtein distance between two strings
a
and b
(of length |a|
and |b|
respectively) is given by
where
where
is the indicator function equal to 0
when
and equal to 1 otherwise, and
is the distance between the first i
characters of a
and the first
j
characters of b
.
Note that the first element in the minimum corresponds to
deletion (from a
to b
), the second to insertion and
the third to match or mismatch, depending on whether the
respective symbols are the same.
For example, the Levenshtein distance between kitten
and
sitting
is 3
, since the following three edits change one
into the other, and there is no way to do it with fewer than
three edits:
- kitten → sitten (substitution of "s" for "k")
- sitten → sittin (substitution of "i" for "e")
- sittin → sitting (insertion of "g" at the end).