This project aims to find the most similar data in news categories using different similarity algorithms.

siddharthsky/document-similarity-algorithms-NL-p

Finding Similar Data in News Categories

About

This project aims to find the most similar data in news categories using four different similarity algorithms.

The project implements four measures for comparing news items. Two are similarity scores, Cosine Similarity and Jaccard Similarity, where a higher value means two items are more alike. The other two are distance metrics, Euclidean Distance and Manhattan Distance, where a lower value means two items are more alike.
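For reference, these are the standard textbook definitions of the four measures. They assume each news item is represented as a term-count vector x of length n (for cosine, Euclidean, and Manhattan) or as a set of distinct tokens A (for Jaccard). The repository may use a different representation, such as TF-IDF weights.

```latex
\begin{aligned}
\cos(\mathbf{x}, \mathbf{y}) &= \frac{\mathbf{x} \cdot \mathbf{y}}{\lVert \mathbf{x} \rVert_2 \, \lVert \mathbf{y} \rVert_2} \\
J(A, B) &= \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert} \\
d_E(\mathbf{x}, \mathbf{y}) &= \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \\
d_M(\mathbf{x}, \mathbf{y}) &= \sum_{i=1}^{n} \lvert x_i - y_i \rvert
\end{aligned}
```

For non-negative counts, cosine and Jaccard both lie in [0, 1]. A value of 1 means the vectors point in the same direction (cosine) or the token sets are identical (Jaccard). Both distances are 0 for identical vectors and have no upper bound.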

Dataset

The dataset used in this project is the News Category Dataset, which can be found on Kaggle at https://www.kaggle.com/rmisra/news-category-dataset. Each record is a HuffPost news item with a headline, short description, category, and other metadata such as authors, link, and date. The categories include business, entertainment, politics, sports, and technology.
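Below is a minimal sketch of grouping the data by category before any comparison. It assumes the Kaggle file is in JSON-lines format, with one JSON object per line that carries "category" and "headline" keys. The function name `group_headlines_by_category` is hypothetical and not taken from the repository.

```python
import json
from collections import defaultdict


def group_headlines_by_category(lines):
    """Group headlines by their category.

    Assumes `lines` yields JSON-lines records with "category" and
    "headline" keys (hypothetical helper, not the repository's code).
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        record = json.loads(line)
        groups[record["category"]].append(record["headline"])
    return dict(groups)


# Example usage (the file name is hypothetical; use the file you downloaded):
# with open("News_Category_Dataset.json", encoding="utf-8") as f:
#     groups = group_headlines_by_category(f)
```

Because a file object yields one line at a time, the function never holds the raw file in memory. Only the grouped headlines are kept.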

Similarity Algorithms

The project uses the following four measures to find the most similar data in news categories:

  • Cosine Similarity
  • Jaccard Similarity
  • Euclidean Distance
  • Manhattan Distance
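Below is a minimal pure-Python sketch of the four measures on bag-of-words term counts, with a simple lowercase tokenizer. `tokenize`, `MEASURES`, and `most_similar` are hypothetical helpers, not the repository's code. The project may instead use TF-IDF vectors or a library such as scikit-learn.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    # a, b are Counter term-count vectors; missing terms count as 0.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    # Compares the sets of distinct terms and ignores counts.
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def euclidean_distance(a, b):
    terms = set(a) | set(b)
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))


def manhattan_distance(a, b):
    terms = set(a) | set(b)
    return sum(abs(a[t] - b[t]) for t in terms)


# (function, higher_is_more_similar): similarities rank high-first,
# distances rank low-first.
MEASURES = {
    "cosine": (cosine_similarity, True),
    "jaccard": (jaccard_similarity, True),
    "euclidean": (euclidean_distance, False),
    "manhattan": (manhattan_distance, False),
}


def most_similar(query, documents, measure="cosine"):
    """Return the index of the document closest to `query` under `measure`."""
    func, higher_is_better = MEASURES[measure]
    q = Counter(tokenize(query))
    scores = [func(q, Counter(tokenize(doc))) for doc in documents]
    pick = max if higher_is_better else min
    return pick(range(len(documents)), key=lambda i: scores[i])


docs = [
    "stocks rally as markets rise",
    "team wins the championship game",
    "senate passes new budget bill",
]
for name in MEASURES:
    print(name, most_similar("the senate budget bill", docs, name))  # index 2 for each
```

The sketch stores both kinds of measure in one table and flips between `max` and `min`, so a caller can swap measures without remembering their direction. On raw counts, the two distances grow with document length, so longer texts tend to look less similar. Cosine normalizes for length, which is one common reason it is preferred for text.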
