- Content-based filtering is one of the two common techniques used in recommender systems. As the name suggests, it uses the content of the entity to be recommended to find other items similar to it. In simpler terms, the system finds the keywords or attributes of a product that the user likes, then uses this information to recommend other products with similar attributes.
- In a book recommendation system, the recommender takes a book title and suggests books that are similar to it. The choice is based on concise information about the book, such as its theme, author, series, and a summary of its description.
- The succinct keyword data supplied to the recommender system is generated using NLP techniques such as word embeddings. The keywords that best describe a book are extracted from its description using BERT embeddings. This word collection is then reduced using TF-IDF, a frequency-based feature extraction method that ranks words by their frequency in the book and in the corpus.
- Once a numeric vector representation has been generated for every book, each book's vector is compared against the others, and similar vectors (books) are found using cosine similarity.
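
The TF-IDF and cosine-similarity steps described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the BERT-based keyword extraction is skipped, and the function names (`tfidf_vectors`, `cosine_similarity`, `recommend`), the book titles, and the toy descriptions are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one TF-IDF vector per document over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = max(len(tokens), 1)  # guard against empty descriptions
        vectors.append([
            (counts[term] / total) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, titles, vectors, top_n=1):
    """Return the top_n titles whose vectors are closest to the given title."""
    query = vectors[titles.index(title)]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in zip(titles, vectors)
        if other != title
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:top_n]]


# Hypothetical toy data, for illustration only.
titles = ["Book A", "Book B", "Book C"]
descriptions = [
    "young wizard attends magic school",
    "wizard apprentice learns magic spells",
    "detective solves murder in london",
]

vocab, vectors = tfidf_vectors(descriptions)
recommendations = recommend("Book A", titles, vectors)
print(recommendations)
```

With this toy data, "Book A" shares the terms "wizard" and "magic" with "Book B" and nothing with "Book C", so "Book B" is returned. A real deployment would more likely precompute the full similarity matrix once at startup rather than score each request from scratch.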
- For the Node.js application: run `ssh ubuntu@<Your Public Ip address>` on the command line to log in to the EC2 Node server.
- For the recommendation model running on Sanic: run `ssh ubuntu@<Your Public Ip address>` on the command line to log in to a separate EC2 Python server (kept separate to avoid out-of-memory errors), then run `pm2 start 0` to load `app.py`.
- Run `sudo apt install python-is-python3 python3-pip` to install pip and make the `python` command point to Python 3.
- Check that a file named `requirements.txt` exists in the root directory.
- Run `pip install -r requirements.txt` to re-create the environment that `app.py` needs.
- Run `python3 app.py` in the terminal to start the Python application.
- Wait around 3 minutes for the Sanic server to start, until a startup message such as `Dataset and similarity matrix loaded successfully` is shown.
- Test the endpoints in Insomnia to verify that they respond correctly (tested with Insomnia 2023.5.7).
- Run `pytest` to execute the unit tests.
- Whenever you encounter an out-of-RAM error, consider adding swap space to increase the amount of virtual memory available to your applications. Swap is space on disk that is used when physical RAM is full: when a Linux system runs out of RAM, inactive pages are moved from RAM to the swap space.
The following page explains step by step how to enable a swap file:
https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-20-04