Single-character words in inputs get ignored with White Similarity #23

ranierorusso · 2019-10-29T14:59:37Z

The following returns an exact match:

irb(main):164:0> Text::WhiteSimilarity.new.similarity("John F Kennedy", "John A B C D E F G H I J K L M N O P Q R S T U V W X Y Z Kennedy")
1.0

I would not expect it to match exactly.

This returns NaN:

irb(main):165:0> Text::WhiteSimilarity.new.similarity("C J", "C J")
NaN

I would expect this to return 1.0.

The issue is in the private method word_letter_pairs of Text::WhiteSimilarity, which assumes that every word parsed from the input string is at least two characters long. A single-character word produces no letter pairs, so it contributes nothing to the comparison.
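To illustrate both symptoms, here is a minimal standalone sketch of the pair extraction as I understand it. The helper name letter_pairs is hypothetical and this is not the gem's source; it only assumes that each word is split into adjacent two-character pairs.

```ruby
# Hypothetical helper (not the gem's code): split each word into adjacent
# two-character pairs, which is the behavior described in this issue.
def letter_pairs(str)
  str.upcase.split(/\s+/).flat_map do |word|
    # For a one-character word the range is 0...0, so no pairs are produced.
    (0...(word.length - 1)).map { |i| word[i, 2] }
  end
end

short_name = "John F Kennedy"
long_name  = "John #{('A'..'Z').to_a.join(' ')} Kennedy"

# The single letters vanish, so both inputs reduce to the same pairs.
puts letter_pairs(short_name) == letter_pairs(long_name)

# "C J" yields no pairs at all, so the score becomes 0.0 / 0.
pairs = letter_pairs("C J")
total = pairs.length + pairs.length
score = (2.0 * 0) / total
puts score.nan?
```

With no pairs on either side, the float division by zero gives NaN rather than raising an error, which would explain the second result above.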

One possible fix is to check for single-character words and keep each one as its own token:

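  def word_letter_pairs(str)
    @word_letter_pairs[str] ||=
      str.upcase.split(/\s+/).map{ |word|
        if word.length == 1
          # Keep a single-character word as its own token so it is not dropped.
          [word]
        else
          # Adjacent two-character pairs, e.g. "JOHN" -> "JO", "OH", "HN".
          (0 ... (word.length - 1)).map { |i| word[i, 2] }
        end
      }.flatten.freeze
  end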
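As a rough check of the idea, the sketch below applies the same single-character handling to a standalone similarity function. The names patched_pairs and patched_similarity are hypothetical, and the scoring (twice the shared pairs divided by the total number of pairs) is my assumption about how the comparison works, not a copy of the gem.

```ruby
# Standalone sketch, not the gem's source. Single-character words are kept
# as their own tokens; longer words become adjacent two-character pairs.
def patched_pairs(str)
  str.upcase.split(/\s+/).flat_map do |word|
    if word.length == 1
      [word]
    else
      (0...(word.length - 1)).map { |i| word[i, 2] }
    end
  end
end

# Assumed scoring: 2 * shared tokens / total tokens across both inputs.
def patched_similarity(str1, str2)
  pairs1 = patched_pairs(str1)
  pairs2 = patched_pairs(str2)
  total = pairs1.length + pairs2.length

  shared = 0
  pairs1.each do |pair|
    index = pairs2.index(pair)
    next unless index
    shared += 1
    pairs2.delete_at(index) # each token in pairs2 can be matched only once
  end

  (2.0 * shared) / total
end

puts patched_similarity("C J", "C J") # => 1.0

long_name = "John #{('A'..'Z').to_a.join(' ')} Kennedy"
puts patched_similarity("John F Kennedy", long_name) # well below 1.0
```

Under these assumptions the two cases from this report behave as I would expect. Two empty strings would still produce NaN, since there are no tokens on either side, so that case may need separate handling.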

I am using version 1.3.1.
