jSimilarity

jSimilarity is a library that implements various similarity measures.

String Character-based Similarities:
Jaro
Jaro-Winkler
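
For orientation, here is a minimal Python sketch of the two character-based measures, following their standard textbook definitions. It is not jSimilarity's API: the function names `jaro` and `jaro_winkler`, the prefix scale `p = 0.1`, and the prefix cap of 4 are assumptions chosen for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] (standard definition, illustrative only)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro-Winkler: boosts the Jaro score for a shared prefix (assumed p, cap)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


print(round(jaro("MARTHA", "MARHTA"), 4))
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

The Winkler adjustment only ever raises the score, and only for strings that agree on their first few characters, which is why it suits short identifiers such as names.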

String Token-based Similarities:
Jaccard
Cosine similarity
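
As a rough illustration of the token-based measures, the following Python sketch computes Jaccard similarity over token sets and cosine similarity over token-count vectors. The whitespace tokenizer and the function names are hypothetical stand-ins and do not reflect how jSimilarity tokenizes or exposes these measures.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of tokens."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 1.0  # convention assumed here: two empty texts are identical
    return len(sa & sb) / len(sa | sb)


def cosine(a: str, b: str) -> float:
    """Cosine of the angle between the two token-count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


x = "the quick brown fox"
y = "the lazy brown dog"
print(jaccard(x, y))
print(cosine(x, y))
```

Jaccard ignores how often a token occurs, while cosine takes counts into account, so the two can rank the same pair of texts differently.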

Document-based Similarities:
TF-IDF
SoftTFIDF

Implemented Utilities:
TextDocument
Corpus
BasicTokenizer

jSimilarity focuses mainly on TF-IDF and implements several of its weighting variants (smooth IDF, max IDF, normalized TF, double normalization 0.5, etc.).
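
To make the variant names concrete, the sketch below shows one common formulation of each in Python. These formulas follow widely used textbook definitions and are an assumption; jSimilarity's own definitions may differ in detail. The toy `corpus` and all function names are hypothetical.

```python
import math
from collections import Counter

# Toy corpus of pre-tokenized documents (made up for illustration).
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat".split(),
    "a cat and a dog".split(),
]


def doc_freq(term, docs):
    # Number of documents that contain the term at least once.
    return sum(1 for d in docs if term in d)


def tf_normalized(term, doc):
    # Normalized TF: raw count divided by document length (doc must be non-empty).
    return Counter(doc)[term] / len(doc)


def tf_double_norm(term, doc, k=0.5):
    # Double normalization K (K = 0.5 here): k + (1 - k) * f / max f.
    counts = Counter(doc)
    return k + (1 - k) * counts[term] / max(counts.values())


def idf_plain(term, docs):
    # Plain IDF: log(N / n_t); the term must occur somewhere in the corpus.
    return math.log(len(docs) / doc_freq(term, docs))


def idf_smooth(term, docs):
    # Smooth IDF: log(N / (1 + n_t)) + 1; safe for unseen terms.
    return math.log(len(docs) / (1 + doc_freq(term, docs))) + 1


def idf_max(term, doc, docs):
    # Max IDF: log(max document frequency among the doc's terms / (1 + n_t)).
    max_df = max(doc_freq(t, docs) for t in set(doc))
    return math.log(max_df / (1 + doc_freq(term, docs)))


doc = corpus[0]
print(tf_normalized("cat", doc) * idf_plain("cat", corpus))
print(tf_double_norm("cat", doc) * idf_smooth("cat", corpus))
```

Any TF variant can be paired with any IDF variant; the product is the term's weight in that document.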