This Text Retrieval system indexes files located in `src/main/resources/data/` and allows you to search through
them using a single query. It applies the Porter2 stemming algorithm
to each word in the files and in the query
so that related words (such as generous and generosity) are grouped together.
- Put the .txt files that you would like to search through into `src/main/resources/data/`
- Add or remove words from `src/main/java/process/stoplist.txt` to have them ignored. Stop words do not contribute to the cosine normalization.
- Run `src/main/java/index/Invert.java` to index the files
- Run one of these files
  - Run `src/main/java/search/Driver.java` to run a normal query
  - Run `src/main/java/search/VSMTester.java` to perform cosine normalization and return the top 1000 documents
- Your query's results should be saved in the top-level directory
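
To illustrate how stop words stay out of the cosine computation, here is a minimal sketch. It is not the code in `VSMTester.java`: the class name `CosineSketch`, its methods, and the use of raw term counts as weights are all assumptions made for the example, and the project may weight terms differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: cosine similarity over raw term counts, skipping stop words. */
final class CosineSketch {

    /** Counts how often each token occurs, ignoring anything in the stop list. */
    static Map<String, Integer> termFrequencies(List<String> tokens, Set<String> stopWords) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Euclidean length of a term-count vector. */
    static double norm(Map<String, Integer> tf) {
        double sumOfSquares = 0.0;
        for (int count : tf.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    /** Dot product divided by both vector lengths; 0 if either vector is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * (double) b.getOrDefault(e.getKey(), 0);
        }
        double denominator = norm(a) * norm(b);
        return denominator == 0.0 ? 0.0 : dot / denominator;
    }

    public static void main(String[] args) {
        // Assumed stop list for the example; the real one lives in stoplist.txt.
        Set<String> stopWords = Set.of("the", "a");
        Map<String, Integer> query = termFrequencies(List.of("the", "quick", "fox"), stopWords);
        Map<String, Integer> doc = termFrequencies(List.of("a", "quick", "quick", "dog"), stopWords);
        System.out.println(cosine(query, doc));
    }
}
```

Because stop words are dropped before counting, they affect neither the dot product nor the vector lengths, so adding or removing them from a document leaves its score unchanged.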
- NOT: `NOT x` returns all documents that do not contain `x`
- AND: `x AND y` returns all documents that contain `x` and `y`
- OR: `x OR y` returns all documents that contain `x`, `y` or both
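
The three operators can be read as set operations on the lists of documents that contain each term. The sketch below shows that reading on a tiny in-memory inverted index. The class `BooleanQuerySketch`, its method names, and the integer document ids are hypothetical and are not taken from `Invert.java` or `Driver.java`.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: NOT, AND and OR as set operations over an inverted index. */
final class BooleanQuerySketch {

    /** Maps each term to the sorted set of ids of the documents that contain it. */
    static Map<String, Set<Integer>> buildIndex(Map<Integer, List<String>> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> doc : docs.entrySet()) {
            for (String term : doc.getValue()) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(doc.getKey());
            }
        }
        return index;
    }

    /** Documents containing the term, or an empty set if the term is unknown. */
    static Set<Integer> postings(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, Set.of());
    }

    /** x AND y: documents present in both sets. */
    static Set<Integer> and(Set<Integer> x, Set<Integer> y) {
        Set<Integer> result = new TreeSet<>(x);
        result.retainAll(y);
        return result;
    }

    /** x OR y: documents present in either set. */
    static Set<Integer> or(Set<Integer> x, Set<Integer> y) {
        Set<Integer> result = new TreeSet<>(x);
        result.addAll(y);
        return result;
    }

    /** NOT x: every known document that is absent from x. */
    static Set<Integer> not(Set<Integer> allDocs, Set<Integer> x) {
        Set<Integer> result = new TreeSet<>(allDocs);
        result.removeAll(x);
        return result;
    }

    public static void main(String[] args) {
        // Assumed toy collection: document id -> already tokenized terms.
        Map<Integer, List<String>> docs = Map.of(
                1, List.of("cat", "dog"),
                2, List.of("dog"),
                3, List.of("bird"));
        Map<String, Set<Integer>> index = buildIndex(docs);
        Set<Integer> cat = postings(index, "cat");
        Set<Integer> dog = postings(index, "dog");
        System.out.println("cat AND dog: " + and(cat, dog));
        System.out.println("cat OR dog: " + or(cat, dog));
        System.out.println("NOT dog: " + not(docs.keySet(), dog));
    }
}
```

Note that `NOT` needs the full set of document ids as a reference, which is why `not` takes `allDocs` as a separate argument.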