A search engine for both phrase and free-text queries on Fars Persian news, built with concepts such as TF-IDF, an inverted index, and champion lists.

rojinakashefi/InformationRetrieval

InformationRetrieval

Instructor: Dr. A. Nikabadi

Course content: CS276, Stanford University

Semester: Fall 2022

This project was built for an Information Retrieval course. It implements a search engine for both phrase queries and free-text queries on the Fars News dataset.

First Phase

  1. Preprocessed the data (normalization, tokenization, stemming, stopword removal)

  2. Worked with the two most widely used Persian NLP toolkits: hazm and parsivar

  3. Created a positional inverted index

  4. Used Zipf's law

zip2.png

  5. Used Heaps' law

zip2.png

  6. Supported normal queries, phrase queries (using a permuterm index), and Boolean queries

  7. Ranked results
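As an illustration of the positional inverted index and phrase search listed above, here is a minimal sketch. It is not the repository's code: the function names `build_positional_index` and `phrase_search` are hypothetical, tokenization is a plain whitespace split rather than hazm or parsivar, and the toy English documents stand in for preprocessed Fars News articles.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: [positions]} (hypothetical helper)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Assumption: text is already normalized; split on whitespace only.
        for pos, term in enumerate(text.split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, phrase):
    """Return sorted doc ids in which the terms of `phrase` occur consecutively."""
    terms = phrase.split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents containing every term can match the phrase.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        for start in index[terms[0]][doc_id]:
            # The k-th term of the phrase must appear at position start + k.
            if all(start + k in index[t][doc_id]
                   for k, t in enumerate(terms[1:], start=1)):
                hits.append(doc_id)
                break
    return hits


# Toy stand-ins for preprocessed news articles.
docs = {
    1: "tehran stock market rises",
    2: "market stock falls in tehran",
    3: "stock market report",
}
index = build_positional_index(docs)
print(phrase_search(index, "stock market"))
```

Storing positions is what makes the phrase check possible: document 2 contains both terms but in the wrong order, so it is rejected even though a non-positional index would return it.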

Second Phase

  1. Represented words as vectors

  2. Computed tf-idf weights

  3. Computed cosine similarity between the query and documents

  4. Used index elimination techniques such as champion lists

  5. Ranked results by relevance

phase2.png
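The second-phase steps can be sketched roughly as follows. This is an assumption-laden illustration rather than the project's implementation: it uses the common `(1 + log10 tf) * log10(N / df)` weighting, whitespace tokenization, and hypothetical helper names (`build_tfidf`, `champion_lists`, `cosine`, `search`). The champion list keeps, for each term, only the `r` documents with the highest weight for that term, and only those documents are scored.

```python
import math
from collections import Counter, defaultdict


def build_tfidf(docs):
    """Return ({doc_id: {term: weight}}, {term: idf}) using log-tf times idf."""
    n = len(docs)
    tfs = {d: Counter(text.split()) for d, text in docs.items()}
    df = Counter(t for tf in tfs.values() for t in tf)
    idf = {t: math.log10(n / df_t) for t, df_t in df.items()}
    weights = {
        d: {t: (1 + math.log10(c)) * idf[t] for t, c in tf.items()}
        for d, tf in tfs.items()
    }
    return weights, idf


def champion_lists(weights, r):
    """For each term, keep the r doc ids with the highest weight for that term."""
    postings = defaultdict(list)
    for d, vec in weights.items():
        for t, w in vec.items():
            postings[t].append((w, d))
    return {
        t: [d for _, d in sorted(p, key=lambda x: (-x[0], x[1]))[:r]]
        for t, p in postings.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, weights, idf, champions, k=10):
    """Score only documents found in the champion lists of the query terms."""
    q_tf = Counter(t for t in query.split() if t in idf)
    q_vec = {t: (1 + math.log10(c)) * idf[t] for t, c in q_tf.items()}
    candidates = {d for t in q_vec for d in champions.get(t, [])}
    scored = sorted(
        ((cosine(q_vec, weights[d]), d) for d in candidates),
        key=lambda x: (-x[0], x[1]),
    )
    return [(d, s) for s, d in scored[:k]]


# Toy stand-ins for preprocessed news articles.
docs = {1: "oil price rises", 2: "oil oil export", 3: "football match result"}
weights, idf = build_tfidf(docs)
champions = champion_lists(weights, r=2)
for doc_id, score in search("oil export", weights, idf, champions):
    print(doc_id, round(score, 3))
```

The trade-off is speed against recall: a smaller `r` means fewer cosine computations per query, but a relevant document that is not among the top `r` for any query term is never scored.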


Contributors: Rojina Kashefi & Leili Barekatein
