C++ Search Engine

Why?

  • Learn C++ (this was my first C++ project)
  • Learn how search engines work

Setup

Prerequisites

  • CMake
  • A C++ compiler (MSVC, GCC, MinGW, etc.)

Building

git clone https://github.com/bensengupta/search
cd search
# Configure the project, then compile it
cmake .
cmake --build .
# Run the executable: ./search <title file> <query>
./search titles_100k.txt France

titles_100k.txt is the first 100K Wikipedia page titles extracted from enwiki-latest-all-titles-in-ns0.gz.

TODO

  • Indexing a document should first remove any previously indexed document with the same ID. Two options:
    • Scan the entire index and remove every occurrence of the ID
    • Or pop the old document with that ID from storage, look up which words it contains, and remove the ID only from those words' index entries
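
The second option could look roughly like the sketch below. It is not the project's code: the class `InvertedIndex`, its members `index_` and `storage_`, and the whitespace tokenizer are all assumptions made for illustration. The idea is that storage remembers which words each document contained, so re-indexing an ID only touches those words' posting lists instead of scanning the whole index.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical inverted index; names are illustrative, not taken from the repository.
class InvertedIndex {
public:
    // Index `text` under `id`, replacing any document previously stored under that ID.
    void add(int id, const std::string& text) {
        remove(id);  // drop stale postings before inserting the new ones

        // Assumed tokenizer: split on whitespace and deduplicate.
        std::unordered_set<std::string> words;
        std::istringstream stream(text);
        for (std::string word; stream >> word;) {
            words.insert(word);
        }

        for (const std::string& word : words) {
            index_[word].insert(id);
        }
        storage_[id] = std::move(words);
    }

    // Remove a document by visiting only the posting lists of the words it contained.
    void remove(int id) {
        auto stored = storage_.find(id);
        if (stored == storage_.end()) return;  // nothing indexed under this ID

        for (const std::string& word : stored->second) {
            auto posting = index_.find(word);
            assert(posting != index_.end());  // every stored word has a posting list
            posting->second.erase(id);
            if (posting->second.empty()) index_.erase(posting);
        }
        storage_.erase(stored);
    }

    // IDs of documents containing `word` (exact, case-sensitive match; order unspecified).
    std::vector<int> search(const std::string& word) const {
        auto posting = index_.find(word);
        if (posting == index_.end()) return {};
        return std::vector<int>(posting->second.begin(), posting->second.end());
    }

private:
    std::unordered_map<std::string, std::unordered_set<int>> index_;    // word -> document IDs
    std::unordered_map<int, std::unordered_set<std::string>> storage_;  // document ID -> words
};
```

With this layout, calling `add(1, "France national team")` and later `add(1, "Germany")` leaves document 1 reachable only through "Germany". The cost is the extra memory for the per-document word sets, in exchange for removal time proportional to the number of words in the old document rather than the size of the index.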

References