Keywords Search Engine

A mini search engine implemented using the C programming language.

The implementation of this project was part of the coursework for the National and Kapodistrian University of Athens, Department of Informatics and Telecommunications Systems Programming undergraduate course.

What is it?

The main functionality of the program is that it acts as a search engine that responds to keyword queries from a single user. These keywords are then traced back to a series of texts located inside a document that the user must have provided before the start of the program's execution. Whenever a word is found inside one text, that entire text is returned and displayed on the user's screen alongside a unique number that identifies it and its relevance score.

The program's operations also include calculating a document's frequence vector and a word's term frequency.

How does it work?

At the beginning of its execution, the program reads a series of texts from a user-provided document and stores them in memory. Specifically, it places them in a dynamic array, that's created at runtime, along with an ID and their word count. Afterward, an inverted index is implemented using a Trie data structure, in which every word that has been encountered is stored.

All nodes of the above tree structure hold a single character that might be the last letter of a word. If so, there's a pointer variable that leads to a linked list, where each node holds the ID of a text that the word appears in and the number of its occurrences in that text.

Whenever a user searches for a certain keyword, the program traverses the trie structure looking for it. If the keyword is found, the program moves on to the linked list and stores the IDs of the texts in descending order in a new list, based on the word's relevance score.

Once all the keywords have been searched, the top K results are displayed on the user's screen, where K is a user-defined integer given at the start of the program.

What technologies were used?

C
Inverted index
Trie
Linked list

How do I run it?

Before we start running the program, we need to compile all its files. For that purpose a makefile has been created and we execute it by typing the following on the command prompt:

make

Now we're ready to begin. To start the program we can use either one of the following commands:

./minisearch -i docfile -k K

./minisearch -k K -i docfile

./minisearch -i docfile

where docfile is the name of the file that holds the texts we'll be using and K the number of the top results that we want to be displayed on the screen.

We are then greeted by a message that informs us of the operations that the program is capable of.

Searching for keywords

We can search for certain keywords in the provided texts by using the /search operation. To do that we type:

/search q1 q2 q3 q4 ... q4

Keep in mind that:

at least one keyword must be given at all times,
if more than 10 keywords are given, only the first 10 will be searched for.

The search results will look something like this:

where the number enclosed in parentheses is the ID of the text and the number enclosed in square brackets is the relevance score.

If no results are found, nothing will be displayed on screen:

Finding the document frequency vector

We can use the /df operation to find the document frequency vector of either all words or a single word by typing

/df

or

/df q1

If the word doesn't appear in any of the given texts, we get the following:

Finding the term frequency of a word

To find the term frequency of a word in a certain text, we use the /tf operation by typing:

/tf id word

where id refers to that certain text.

Exiting the program

To terminate the program we type:

/exit

(*) We can also use the make clean command to remove all of the object files and the executable that were created.

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
Images		Images
Original		Original
workloads		workloads
.gitignore		.gitignore
EffrosyniPanteliadi-project1.tar.gz		EffrosyniPanteliadi-project1.tar.gz
Makefile		Makefile
README.md		README.md
README.pdf		README.pdf
Test.txt		Test.txt
bigDataset.txt		bigDataset.txt
docfile		docfile
functions.c		functions.c
functions.h		functions.h
main.c		main.c
project1-S18-K24.pdf		project1-S18-K24.pdf
smallDataset.txt		smallDataset.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Keywords Search Engine

What is it?

How does it work?

What technologies were used?

How do I run it?

Searching for keywords

Finding the document frequency vector

Finding the term frequency of a word

Exiting the program

About

Releases

Packages

Languages

frinipanteliadi/Keywords-Search-Engine

Folders and files

Latest commit

History

Repository files navigation

Keywords Search Engine

What is it?

How does it work?

What technologies were used?

How do I run it?

Searching for keywords

Finding the document frequency vector

Finding the term frequency of a word

Exiting the program

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages