This Text Retrieval system indexes the files located in src/main/resources/data/
and lets you search through them using a single query. It applies the Porter2
stemming algorithm to each word in the files and in the query so that related
word forms (such as generous and generosity) are grouped together.
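The grouping idea can be sketched in a few lines of Java. This is only an illustration under stated assumptions: StemGroupingSketch, toyStem and groupByStem are hypothetical names, and toyStem is a crude suffix stripper standing in for a real stemmer. It is not the Porter2 algorithm and will not produce the same stems as this project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

final class StemGroupingSketch {

    // Crude stand-in for a real stemmer: strips one of a few common suffixes.
    // This is NOT Porter2; it only makes the grouping step concrete.
    static String toyStem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Groups words by the key the stemmer returns, so that variants of a word
    // end up under a single entry.
    static Map<String, List<String>> groupByStem(List<String> words,
                                                 UnaryOperator<String> stemmer) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String word : words) {
            groups.computeIfAbsent(stemmer.apply(word), k -> new ArrayList<>()).add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = List.of("index", "indexes", "indexed", "indexing", "query");
        // prints {index=[index, indexes, indexed, indexing], query=[query]}
        System.out.println(groupByStem(words, StemGroupingSketch::toyStem));
    }
}
```

Because the same function is applied to both the indexed text and the query, a query for one form of a word can match documents that contain another form.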
- Put the .txt files that you would like to search through into src/main/resources/data/
- Add or remove words from src/main/java/process/stoplist.txt to have them ignored. Stop words do not contribute to the cosine normalization.
- Run src/main/java/index/Invert.java to index the files
- Run one of these files:
  - Run src/main/java/search/Driver.java to run a normal query
  - Run src/main/java/search/VSMTester.java to perform cosine normalization and return the top 1000 documents
- Your query's results should be saved in the top-level directory
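As a rough illustration of what the indexing and ranking steps involve, the Java sketch below builds a small in-memory inverted index that skips stop words, then ranks documents by the term-frequency dot product divided by the document vector length. VsmSketch and its methods are hypothetical names for this example and are not taken from Invert.java or VSMTester.java. Stemming and any term weighting the project may apply are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class VsmSketch {
    // term -> (document name -> term frequency)
    final Map<String, Map<String, Integer>> postings = new HashMap<>();
    final Set<String> stopWords;

    VsmSketch(Set<String> stopWords) {
        this.stopWords = stopWords;
    }

    // Lowercases, splits on non-letters and drops stop words (no stemming here).
    List<String> terms(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Adds one document to the index. Stop words never reach the postings,
    // so they cannot contribute to the vector length used below.
    void add(String doc, String text) {
        for (String t : terms(text)) {
            postings.computeIfAbsent(t, k -> new HashMap<>()).merge(doc, 1, Integer::sum);
        }
    }

    // Returns up to k document names, best first. The query's own length is the
    // same for every document, so it is omitted from the score.
    List<String> rank(String query, int k) {
        Map<String, Double> dot = new HashMap<>();
        for (String t : terms(query)) {
            for (Map.Entry<String, Integer> e : postings.getOrDefault(t, Map.of()).entrySet()) {
                double tf = e.getValue();
                dot.merge(e.getKey(), tf, Double::sum);
            }
        }
        Map<String, Double> sumSquares = new HashMap<>();
        for (Map<String, Integer> docs : postings.values()) {
            for (Map.Entry<String, Integer> e : docs.entrySet()) {
                double tf = e.getValue();
                sumSquares.merge(e.getKey(), tf * tf, Double::sum);
            }
        }
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Double> e : dot.entrySet()) {
            score.put(e.getKey(), e.getValue() / Math.sqrt(sumSquares.get(e.getKey())));
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((x, y) -> {
            int byScore = Double.compare(score.get(y), score.get(x)); // higher score first
            return byScore != 0 ? byScore : x.compareTo(y);           // ties: by name
        });
        return ranked.subList(0, Math.min(k, ranked.size()));
    }

    public static void main(String[] args) {
        VsmSketch index = new VsmSketch(Set.of("the", "a", "of", "and"));
        index.add("a.txt", "the cat sat");
        index.add("b.txt", "the cat and the dog chased a cat of the park near town");
        System.out.println(index.rank("cat", 1000)); // prints [a.txt, b.txt]
    }
}
```

With the two sample documents in main, a.txt ranks above b.txt for the query cat even though b.txt contains the word twice, because b.txt has the longer vector (2/3 against 1/√2).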
- NOT: NOT x returns all documents that do not contain x
- AND: x AND y returns all documents that contain x and y
- OR: x OR y returns all documents that contain x, y, or both
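The three operators map directly onto set operations over the documents that contain each term. The Java sketch below shows that mapping under stated assumptions: BooleanQuerySketch, its fields and its methods are hypothetical names for illustration, and the project's own query parsing is not reproduced here.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class BooleanQuerySketch {
    final Set<String> allDocs;               // every indexed document
    final Map<String, Set<String>> postings; // term -> documents containing it

    BooleanQuerySketch(Set<String> allDocs, Map<String, Set<String>> postings) {
        this.allDocs = allDocs;
        this.postings = postings;
    }

    // Documents containing the term, or an empty set for an unknown term.
    Set<String> docsWith(String term) {
        return postings.getOrDefault(term, Set.of());
    }

    // NOT x: all documents minus those containing x.
    Set<String> not(String x) {
        Set<String> result = new TreeSet<>(allDocs);
        result.removeAll(docsWith(x));
        return result;
    }

    // x AND y: intersection of the two document sets.
    Set<String> and(String x, String y) {
        Set<String> result = new TreeSet<>(docsWith(x));
        result.retainAll(docsWith(y));
        return result;
    }

    // x OR y: union of the two document sets.
    Set<String> or(String x, String y) {
        Set<String> result = new TreeSet<>(docsWith(x));
        result.addAll(docsWith(y));
        return result;
    }

    public static void main(String[] args) {
        BooleanQuerySketch q = new BooleanQuerySketch(
                Set.of("a.txt", "b.txt", "c.txt"),
                Map.of("cat", Set.of("a.txt", "b.txt"),
                       "dog", Set.of("b.txt", "c.txt")));
        System.out.println(q.not("cat"));        // prints [c.txt]
        System.out.println(q.and("cat", "dog")); // prints [b.txt]
        System.out.println(q.or("cat", "dog"));  // prints [a.txt, b.txt, c.txt]
    }
}
```

Note that NOT needs the full set of indexed documents, since the postings for a term only record where that term does appear.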