InformationRetrieval

Instructor: Dr. A. Nikabadi

Course content: CS276, Stanford University

Semester: Fall 2022

This project, for an Information Retrieval course, implements a search engine that supports both phrase queries and free-text queries on the Fars News dataset.

First Phase

  1. Preprocessing the data (normalization, tokenization, stemming, stopword removal)

  2. Working with the two most widely used Persian NLP toolkits: hazm and parsivar

  3. Created a positional inverted index

  4. Used Zipf's law

zip2.png

  5. Used Heaps' law

zip2.png

  6. Searching with normal queries, phrase queries (using a permuterm index), and Boolean queries

  7. Ranking results
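
As an illustration of how a positional inverted index can answer phrase queries, here is a minimal sketch. It is not the project's actual code: the function names `build_positional_index` and `phrase_search` and the toy English documents are assumptions made for the example, and the input is assumed to be already preprocessed into token lists.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} for already-tokenized documents."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for position, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(position)
    return index


def phrase_search(index, phrase_tokens):
    """Return sorted ids of documents that contain the tokens consecutively."""
    if not phrase_tokens or any(t not in index for t in phrase_tokens):
        return []
    # Only documents that contain every term can match the phrase.
    candidates = set(index[phrase_tokens[0]])
    for term in phrase_tokens[1:]:
        candidates &= set(index[term])
    matches = []
    for doc_id in candidates:
        # A start position p matches if the k-th following term occurs at p + k.
        later = [set(index[t][doc_id]) for t in phrase_tokens[1:]]
        for start in index[phrase_tokens[0]][doc_id]:
            if all(start + k + 1 in positions for k, positions in enumerate(later)):
                matches.append(doc_id)
                break
    return sorted(matches)


# Toy, pre-tokenized documents (hypothetical data, not from the Fars News dataset).
docs = {
    1: ["new", "search", "engine", "released"],
    2: ["engine", "search", "is", "slow"],
    3: ["a", "fast", "search", "engine"],
}
index = build_positional_index(docs)
print(phrase_search(index, ["search", "engine"]))
```

Storing positions rather than only document ids is what lets the index distinguish "search engine" from "engine search"; a plain inverted index would return all three documents for either query.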

Second Phase

  1. Represent words as vectors

  2. Compute tf-idf

  3. Compute cosine similarity between query terms and documents

  4. Used index elimination techniques such as champion lists

  5. Rank results by relevance

phase2.png
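
The second-phase steps can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the project's implementation: it assumes log-scaled term frequency with `log10(N / df)` as idf, length-normalized document vectors, and champion lists holding the `r` highest-weighted documents per term. The names `build_tfidf`, `build_champion_lists`, and `search`, and the toy documents, are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_tfidf(docs):
    """Return (idf, length-normalized tf-idf vectors) for tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    idf = {term: math.log10(n_docs / df) for term, df in doc_freq.items()}
    vectors = {}
    for doc_id, tokens in docs.items():
        weights = {t: (1 + math.log10(tf)) * idf[t] for t, tf in Counter(tokens).items()}
        # Guard against an all-zero vector (every term appears in every document).
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors[doc_id] = {t: w / norm for t, w in weights.items()}
    return idf, vectors


def build_champion_lists(vectors, r):
    """For each term, keep the r documents with the highest weight for that term."""
    postings = defaultdict(list)
    for doc_id, vec in vectors.items():
        for term, weight in vec.items():
            postings[term].append((weight, doc_id))
    return {t: [d for _, d in sorted(p, reverse=True)[:r]] for t, p in postings.items()}


def search(query_tokens, idf, vectors, champions, k):
    """Score only champion-list documents by cosine similarity; return the top k."""
    q = {t: (1 + math.log10(tf)) * idf[t]
         for t, tf in Counter(query_tokens).items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q.values()))
    if q_norm == 0:
        return []
    # Index elimination: candidates come from the query terms' champion lists only.
    candidates = {d for t in q for d in champions.get(t, [])}
    scores = {d: sum(w * vectors[d].get(t, 0.0) for t, w in q.items()) / q_norm
              for d in candidates}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy, pre-tokenized documents (hypothetical data, not from the Fars News dataset).
docs = {
    1: ["football", "match", "result"],
    2: ["football", "football", "league"],
    3: ["election", "result", "news"],
    4: ["weather", "news"],
}
idf, vectors = build_tfidf(docs)
champions = build_champion_lists(vectors, r=2)
for doc_id, score in search(["football", "league"], idf, vectors, champions, k=3):
    print(doc_id, round(score, 3))
```

Because champion lists are computed once at index time, a query only scores the few documents in the lists of its own terms. The trade-off is that a relevant document outside every champion list is never scored, so `r` is usually chosen larger than the number of results requested.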


Contributors: Rojina Kashefi & Leili Barekatein