A command-line search engine written in Java that uses advanced data structures and algorithms, web crawling, and local file management and search techniques to analyze prices on rental websites.

Ramish-Amir/RentalRadar

RentalRadar

Concepts Used:

  1. Sorting (Merge Sort)
  2. Ternary Search Trie
  3. Hash Maps
  4. Text Processing (JSoup, String Functions)
  5. Memory Management (Caching)
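
As a minimal sketch of the Ternary Search Trie concept listed above (not the repository's actual code), the class below stores words character by character and counts how often each word is inserted. The names `TernarySearchTrie`, `insert`, and `frequency` are assumptions made for illustration.

```java
// Hypothetical sketch: a ternary search trie that counts word frequencies.
class TernarySearchTrie {
    private static final class Node {
        final char ch;
        Node left, mid, right;
        int count; // how many times a word ending at this node was inserted

        Node(char ch) {
            this.ch = ch;
        }
    }

    private Node root;

    /** Inserts a word and increments its frequency; null or empty words are ignored. */
    void insert(String word) {
        if (word == null || word.isEmpty()) {
            return;
        }
        root = insert(root, word, 0);
    }

    private Node insert(Node node, String word, int i) {
        char c = word.charAt(i);
        if (node == null) {
            node = new Node(c);
        }
        if (c < node.ch) {
            node.left = insert(node.left, word, i);       // smaller character: go left
        } else if (c > node.ch) {
            node.right = insert(node.right, word, i);     // larger character: go right
        } else if (i < word.length() - 1) {
            node.mid = insert(node.mid, word, i + 1);     // match: advance to next character
        } else {
            node.count++;                                 // last character: record occurrence
        }
        return node;
    }

    /** Returns how many times the word was inserted, or 0 if it is absent. */
    int frequency(String word) {
        if (word == null || word.isEmpty()) {
            return 0;
        }
        Node node = root;
        int i = 0;
        while (node != null) {
            char c = word.charAt(i);
            if (c < node.ch) {
                node = node.left;
            } else if (c > node.ch) {
                node = node.right;
            } else if (i < word.length() - 1) {
                node = node.mid;
                i++;
            } else {
                return node.count;
            }
        }
        return 0;
    }
}
```

Inserting every token of a parsed page and then calling `frequency` for a search keyword gives that keyword's count for the page. A prefix that was never inserted as a whole word reports 0.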

Flow of Execution of the Search Engine:

  1. A Java web crawler recursively crawls 3 different rental websites and retrieves around 1500 URLs.
  2. Each URL is parsed into a text file using JSoup.
  3. Stop words are removed from the search string given by the user.
  4. The search string is split into tokens using Java's StringTokenizer.
  5. All URLs are indexed in a Hash Map.
  6. A TST is generated for each text file, and the frequency of each keyword is extracted.
  7. To implement page ranking, the frequency of these keywords is stored in a Hash Map along with the URL index.
  8. The page ranking Hash Map is sorted in decreasing order of keyword frequency.
  9. The page ranking Hash Map is kept in memory as a cache, which speeds up subsequent searches.
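
The query side of the flow above (steps 3, 4, and 7 to 9) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation. The class name `SearchFlowSketch`, the stop-word list, and the example URLs are hypothetical. Pages are passed in as already-parsed text, and keyword counts come from a plain Hash Map rather than a TST to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical sketch of the search flow: tokenize, rank by frequency, sort, cache.
class SearchFlowSketch {
    // Illustrative stop words only; the project's real list is not shown in the README.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "in", "of", "for", "to", "and");

    // Ranked results kept in memory, keyed by the normalized query (step 9).
    private final Map<String, List<Map.Entry<String, Integer>>> cache = new HashMap<>();

    /** Lowercases the text, splits it with StringTokenizer, and drops stop words. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text.toLowerCase(), " \t\n\r\f.,;:!?");
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Ranks pages (URL -> parsed text) by how often the query keywords occur. */
    List<Map.Entry<String, Integer>> search(String query, Map<String, String> pages) {
        List<String> keywords = tokenize(query);
        String key = String.join(" ", keywords);
        List<Map.Entry<String, Integer>> cached = cache.get(key);
        if (cached != null) {
            return cached; // cache hit: skip counting and sorting
        }
        List<Map.Entry<String, Integer>> ranking = new ArrayList<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            Map<String, Integer> freq = new HashMap<>();
            for (String token : tokenize(page.getValue())) {
                freq.merge(token, 1, Integer::sum);
            }
            int score = 0;
            for (String keyword : keywords) {
                score += freq.getOrDefault(keyword, 0);
            }
            if (score > 0) {
                ranking.add(Map.entry(page.getKey(), score));
            }
        }
        List<Map.Entry<String, Integer>> sorted = mergeSort(ranking);
        cache.put(key, sorted);
        return sorted;
    }

    /** Merge sort in decreasing order of frequency. */
    static List<Map.Entry<String, Integer>> mergeSort(List<Map.Entry<String, Integer>> list) {
        if (list.size() <= 1) {
            return list;
        }
        int mid = list.size() / 2;
        List<Map.Entry<String, Integer>> left = mergeSort(new ArrayList<>(list.subList(0, mid)));
        List<Map.Entry<String, Integer>> right =
                mergeSort(new ArrayList<>(list.subList(mid, list.size())));
        List<Map.Entry<String, Integer>> merged = new ArrayList<>(list.size());
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).getValue() >= right.get(j).getValue()) {
                merged.add(left.get(i++));
            } else {
                merged.add(right.get(j++));
            }
        }
        while (i < left.size()) {
            merged.add(left.get(i++));
        }
        while (j < right.size()) {
            merged.add(right.get(j++));
        }
        return merged;
    }
}
```

Keying the cache on the tokenized query means that queries differing only in stop words or letter case share one cached ranking. The sketch requires Java 9 or later because it uses `Set.of` and `Map.entry`.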
