
tEchhigh4U/Online-Book-Reader-Model


Online-Book-Reader-Search-Engine

Content-Based Filtering: NLP Based Book Recommender Using BERT-Embeddings

  • Content-based filtering is one of the two common techniques used in recommender systems. As the name suggests, it uses the content of the item to be recommended to find other items similar to it. In simpler terms, the system identifies the keywords or attributes of a product the user likes, then uses this information to recommend other products with similar attributes.

  • For a book recommendation system, given a book name, the recommender suggests books that are similar to it. The choice is based on concise information about the book, such as its theme, author, series, and a summary of its description.

Principle of the Recommendation Engine

  • The concise keyword data provided to the recommender system is generated using NLP techniques such as word embeddings. The keywords that best describe a book are extracted from its description using BERT embeddings. This word collection is then reduced using TF-IDF, a frequency-based feature extraction method that ranks words by their frequency in the book and in the corpus.

  • Once a numeric vector representation of every book has been generated, each book's vector is compared against the others, and similar vectors (books) are found using cosine similarity.
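
The ranking and comparison steps described above can be illustrated with a minimal sketch. It assumes the keywords have already been extracted for each book and does not reproduce the BERT-based extraction; it only shows TF-IDF weighting followed by cosine similarity, using the Python standard library. The book titles, keyword lists, and function names (tfidf_vectors, cosine_similarity, most_similar) are hypothetical and are not taken from app.py.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Term frequency times a smoothed inverse document frequency.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(title, titles, vectors, k=2):
    """Return the k titles whose vectors are closest to the given title's vector."""
    i = titles.index(title)
    scores = [
        (cosine_similarity(vectors[i], vec), other)
        for other, vec in zip(titles, vectors)
        if other != title
    ]
    scores.sort(reverse=True)
    return [other for _, other in scores[:k]]


# Hypothetical data: keywords assumed to be already extracted per book.
titles = ["Book A", "Book B", "Book C"]
keywords = [
    ["wizard", "magic", "school", "friendship"],
    ["wizard", "magic", "dragon", "quest"],
    ["detective", "murder", "london", "mystery"],
]
vectors = tfidf_vectors(keywords)
print(most_similar("Book A", titles, vectors, k=1))
```

Because "Book A" and "Book B" share two keywords while "Book C" shares none, "Book B" is ranked first. A real deployment would more likely precompute the full similarity matrix once and look up rows at request time.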

Environment Setup on 2 EC2 Servers

  1. For the Node.js application: run ssh ubuntu@<Your Public Ip address> in the terminal to log in to the EC2 Node.js server.

  2. For the recommendation model running on Sanic: run ssh ubuntu@<Your Public Ip address> in the terminal to log in to the second EC2 server, which runs Python. The model is hosted on a separate server to avoid out-of-memory errors. Then run pm2 start 0 to load app.py.

Testing Steps on the Sanic Server

  1. Run sudo apt install python-is-python3 python3-pip to install Python and pip
  2. Check that a file named requirements.txt exists in the root directory
  3. Run pip install -r requirements.txt to re-create the environment that app.py needs
  4. Run python3 app.py in the terminal to start the application
  5. Wait around 3 minutes for the Sanic server to start, until a status message such as Dataset and similarity matrix loaded successfully is shown
  6. Test the endpoint in Insomnia (Insomnia 2023.5.7) to verify that it is running smoothly
  7. Run pytest to execute the unit tests
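
As an illustration of the kind of unit test pytest would collect in the last step, here is a minimal sketch. It assumes a hypothetical helper recommend that looks up a precomputed similarity matrix; the helper, the sample titles, and the matrix values are made up for the example and are not the project's actual code or tests.

```python
def recommend(title, titles, similarity, k=5):
    """Hypothetical helper: return the k titles most similar to `title`."""
    if title not in titles:
        raise KeyError(title)
    i = titles.index(title)
    # Rank every other book by its similarity score to the query book.
    ranked = sorted(
        (j for j in range(len(titles)) if j != i),
        key=lambda j: similarity[i][j],
        reverse=True,
    )
    return [titles[j] for j in ranked[:k]]


# Made-up fixture data: a symmetric 3x3 similarity matrix.
TITLES = ["A", "B", "C"]
SIMILARITY = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]


def test_recommend_orders_by_similarity():
    assert recommend("A", TITLES, SIMILARITY, k=2) == ["B", "C"]


def test_recommend_excludes_query_title():
    assert "A" not in recommend("A", TITLES, SIMILARITY, k=3)


def test_unknown_title_raises_key_error():
    try:
        recommend("Z", TITLES, SIMILARITY)
    except KeyError:
        return
    assert False, "expected KeyError for an unknown title"
```

Saved in a file whose name starts with test_, these functions are discovered automatically when pytest is run from the project root.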

Reference

  1. Whenever you encounter an out-of-RAM error, consider adding swap space to increase the amount of virtual memory available to your applications. Swap is space on disk that is used when physical RAM is full. When a Linux system runs out of RAM, inactive pages are moved from RAM to the swap space. The following page explains step by step how to enable a swap file: https://www.digitalocean.com/community/tutorials/how-to-add-swap-space-on-ubuntu-20-04
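
To check whether swap is already active before or after following that guide, one option is to read the SwapTotal and SwapFree fields that Linux exposes in /proc/meminfo. The sketch below parses text in that format; the function names parse_meminfo and swap_summary are hypothetical, and the sample text contains made-up values so the example runs anywhere.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: value in kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" lines.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_summary(info):
    """Summarize swap usage from a parsed meminfo dict."""
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    return {
        "swap_enabled": total > 0,
        "swap_total_mib": total // 1024,
        "swap_used_mib": (total - free) // 1024,
    }


# Made-up sample in the /proc/meminfo format, for illustration only.
sample = (
    "MemTotal:        1000000 kB\n"
    "MemFree:          200000 kB\n"
    "SwapTotal:       2097148 kB\n"
    "SwapFree:        2097148 kB\n"
)
print(swap_summary(parse_meminfo(sample)))
```

On the EC2 server itself, the contents of /proc/meminfo can be passed in instead of the sample string, for example via open("/proc/meminfo").read().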
