Rank-based information retrieval system. Documents are ranked using the TF-IDF scores of documents and queries.

Ojas1804/Rank-Retrieval-Model

Create folders: For the program to work, you need to create the folders Dataset, Indexes, and tfidf_weights; otherwise it will throw an error. Store the dataset in the Dataset folder. The program builds the posting lists and stores them in the Indexes folder. The tfidf_weights folder stores the TF-IDF weights of each document.
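The folder setup described above could be automated with a small helper like the one below. This is only a sketch: the folder names come from this README, but `ensure_folders` and `REQUIRED_FOLDERS` are hypothetical names and are not part of the repository.

```python
from pathlib import Path

# Folder names taken from this README; the constant and helper are hypothetical.
REQUIRED_FOLDERS = ("Dataset", "Indexes", "tfidf_weights")


def ensure_folders(root="."):
    """Create any missing required folder under root and return the names created."""
    created = []
    for name in REQUIRED_FOLDERS:
        path = Path(root) / name
        if not path.exists():
            path.mkdir(parents=True)
            created.append(name)
    return created
```

Calling `ensure_folders()` once from the project root before running main.py should be enough; a second call creates nothing and returns an empty list.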

This assignment has 7 Python files and 4 folders. The 7 Python files are:

  • CosineSimilarity.py: Calculates the cosine similarity between a document and the query.
  • Lemmatizer.py: Lemmatizes the words in the document.
  • main.py: Entry point that runs the whole program.
  • PostingList.py: Creates a posting list for each word in the document.
  • PreprocessQuery.py: Preprocesses the query.
  • Stopwords.py: Removes stopwords from the document.
  • TfIdf.py: Calculates TF-IDF weights for each document and query.
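To illustrate what the posting-list and TF-IDF steps listed above compute, here is a minimal sketch. It assumes documents are already tokenized (lemmatized, stopwords removed) and uses raw term frequency times log10(N / df) as the weight; the actual weighting scheme, function names, and file formats used by PostingList.py and TfIdf.py may differ.

```python
import math
from collections import Counter, defaultdict


def build_posting_lists(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, tokens in docs.items():
        for term in tokens:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def tfidf_weights(docs, postings):
    """Return {doc_id: {term: weight}} with weight = tf * log10(N / df).

    The weighting variant is an assumption made for this sketch.
    """
    n_docs = len(docs)
    weights = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        weights[doc_id] = {
            term: count * math.log10(n_docs / len(postings[term]))
            for term, count in tf.items()
        }
    return weights


# Toy, already-preprocessed documents (hypothetical data).
docs = {
    "d1": ["rank", "retrieval", "model"],
    "d2": ["rank", "rank", "query"],
}
postings = build_posting_lists(docs)
weights = tfidf_weights(docs, postings)
```

With this weighting, a term that appears in every document (here "rank") gets weight 0, so it does not influence the ranking.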

To test this assignment, run main.py. It asks for a query and returns the top 10 documents with the highest cosine similarity to that query.
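The ranking step can be sketched as follows, assuming the query and each document are represented as sparse TF-IDF vectors stored in dicts. The names `cosine_similarity` and `top_k` are hypothetical and do not necessarily match the functions in CosineSimilarity.py or main.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as {term: weight}."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector matches nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=10):
    """Return up to k (doc_id, score) pairs, highest score first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    # Sort by descending score; break ties by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Toy vectors (hypothetical data).
doc_vecs = {
    "d1": {"rank": 1.0, "model": 1.0},
    "d2": {"query": 2.0},
    "d3": {"rank": 3.0},
}
results = top_k({"rank": 1.0}, doc_vecs, k=2)
```

Because cosine similarity normalizes by vector length, "d3" ranks above "d1" here even though both contain the query term: "d1" spreads its weight over a second term that the query does not mention.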
