vivek-rd/deep-rec-sys-amazon-reviews

Deep Learning-Based Recommendation System

Project Goal

This project implements and compares three different deep learning-based recommendation systems (DeepCoNN, NRCMA, and HSACN) that utilize the natural language of user reviews from the Amazon User Reviews dataset. The primary goal is to improve rating predictions and provide relevant product recommendations by leveraging the insights contained in review text. The system predicts user ratings for products and uses these predictions to rank relevant products based on a user's search query and past review history.

Tech Stack

Python, PyTorch, Weights & Biases, NumPy, Pandas, NLTK, HuggingFace Datasets, Gensim, Surprise, Matplotlib, Torchinfo, TQDM

Approach and Methodology

Models Implemented:

  1. DeepCoNN (Deep Cooperative Neural Networks): This model uses two parallel convolutional neural networks (CNNs), one over the reviews written by a user and one over the reviews written about an item. Each network extracts semantic features from review text using convolutions and max-pooling; a Factorization Machine (FM) layer then models the interaction between the user and item latent representations to predict the rating.
  2. NRCMA (Neural Recommendation with Cross-Modality Mutual Attention): NRCMA improves upon the two-tower model by introducing cross-modality mutual attention mechanisms at both word and review levels. This allows the user and item encoders to exchange information, focusing on the most relevant words and reviews for a given user-item interaction. Embeddings are generated using pre-trained GloVe vectors and processed through CNNs before attention layers. The final prediction is made using an FM layer.
  3. HSACN (Hierarchical Self-attentive Convolution Network): HSACN uses a hierarchical approach, encoding words into sentences, sentences into reviews, and reviews into final user/item representations. It combines CNNs for local feature extraction and self-attention mechanisms for aggregation at different levels (sentence, review, entity). This structure allows the model to weigh different parts of the text based on their importance.
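
Both DeepCoNN and NRCMA end in a Factorization Machine (FM) layer. The following is a minimal sketch of what such a layer computes, assuming a standard second-order FM applied to the concatenated user and item vectors. The function name `fm_predict`, the toy weights, and the vector sizes are hypothetical and are not taken from this repository, where the layer would be a trainable PyTorch module.

```python
from typing import List


def fm_predict(x: List[float], w0: float, w: List[float], V: List[List[float]]) -> float:
    """Second-order factorization machine score for one feature vector.

    x  : concatenated user and item latent features (length n)
    w0 : global bias
    w  : linear weights (length n)
    V  : factor matrix with one k-dimensional row per feature (n x k)
    """
    n = len(x)
    k = len(V[0])
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    # Pairwise term via the O(n*k) identity:
    # sum_{i<j} <V_i, V_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i (V_if x_i)^2 ]
    pairwise = 0.0
    for f in range(k):
        total = sum(V[i][f] * x[i] for i in range(n))
        total_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (total * total - total_sq)
    return linear + pairwise


# Toy inputs (assumed values, for illustration only).
user_vec = [0.2, -0.1]
item_vec = [0.4, 0.3]
x = user_vec + item_vec
w0 = 3.0
w = [0.1, 0.0, -0.2, 0.5]
V = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [-0.2, 0.4]]
rating = fm_predict(x, w0, w, V)
print(f"predicted rating: {rating:.4f}")
```

The pairwise term is what lets the layer capture user-item interactions that a plain linear layer over the concatenated vector would miss.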

Data Processing Pipeline:

  • Raw review data filtered to include only unique user-item pairs.
  • Reviews embedded using pretrained word embeddings (Google-Word2Vec-300).
  • Data structured separately for each model’s specific requirements.
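
The first two pipeline steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in `utils/`. The field names `user_id`, `item_id`, and `text`, the keep-the-first-review policy, the zero vector for unknown words, and the tiny 3-dimensional table standing in for the 300-dimensional pretrained vectors are all hypothetical.

```python
from typing import Dict, Iterable, List

Review = Dict[str, str]


def keep_unique_pairs(reviews: Iterable[Review]) -> List[Review]:
    """Keep the first review seen for each (user_id, item_id) pair."""
    seen = set()
    unique = []
    for review in reviews:
        key = (review["user_id"], review["item_id"])
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique


def embed_tokens(tokens: List[str], table: Dict[str, List[float]], dim: int) -> List[List[float]]:
    """Look up one vector per token; unknown tokens map to a zero vector (an assumption)."""
    return [table.get(token, [0.0] * dim) for token in tokens]


# Toy data (assumed values, for illustration only).
raw_reviews = [
    {"user_id": "u1", "item_id": "i1", "text": "great dryer"},
    {"user_id": "u1", "item_id": "i1", "text": "great dryer again"},
    {"user_id": "u2", "item_id": "i1", "text": "noisy"},
]
toy_table = {"great": [0.1, 0.2, 0.3], "dryer": [0.0, 0.5, -0.1]}

filtered = keep_unique_pairs(raw_reviews)
vectors = embed_tokens(filtered[1]["text"].split(), toy_table, dim=3)
print(len(filtered), vectors)
```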

Training and Evaluation:

  • Models trained on Nvidia GPUs (V100-SXM2, T4) and Apple silicon (M1, M2).
  • Evaluated using Mean Squared Error (MSE).
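
MSE is the mean of the squared differences between predicted and actual ratings, so lower is better. A small worked example is shown here, with a hypothetical helper name and made-up ratings; in training code the same quantity is typically computed on tensors.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between true and predicted ratings."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one rating is required")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up ratings on the 1-5 star scale.
actual = [5.0, 3.0, 4.0, 1.0]
predicted = [4.0, 3.0, 5.0, 3.0]
mse = mean_squared_error(actual, predicted)
print(mse)  # squared errors are 1, 0, 1, 4, so the mean is 1.5
```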

How to Run the Code

Step-by-step Guide

  1. Clone the Repository
git clone https://github.com/vivek-rd/deep-rec-sys-amazon-reviews.git
cd deep-rec-sys-amazon-reviews
  2. Set Up the Environment
  • Ensure Python 3.12 is installed.
  • Create a virtual environment and install dependencies:
python -m venv venv
source venv/bin/activate
pip install .
  3. Data Preparation
  • Obtain the Amazon Reviews dataset (Appliances category) from HuggingFace.
  • Preprocess and filter the data using scripts in the utils folder:
python utils/data_loading.py
  4. Model Training
  • Train DeepCoNN, NRCMA, and HSACN models:
python -m modeling.DeepCoNN_train
python -m modeling.NRCMA_train
python -m modeling.HSACN_train
  5. Generate Embeddings
  • Generate and store embeddings:
python inference/generate_embeddings.py
  6. Run the Application
  • Launch the Streamlit app:
streamlit run app.py

Requirements and Dependencies

  • Python 3.12
  • PyTorch
  • numpy, pandas, nltk, datasets, gensim, surprise, matplotlib, torchinfo, tqdm, wandb
  • Detailed dependencies available in pyproject.toml

Results and Outputs

  • Best-performing model: NRCMA. Its cross-modality mutual attention improved user-item interaction modeling, and it achieved the lowest MSE of the three models (1.57).
  • Evaluation results stored and visualized using Weights and Biases (W&B).

Limitations

  • Evaluation limited to the 'Appliances' category.
  • Does not explicitly handle the cold-start problem for new users or items (those with fewer than 2 reviews).
  • Computational cost, particularly for HSACN, might limit scalability.
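
To gauge how much the cold-start limitation matters for a given dataset, one option is to count reviews per user and per item and flag those below the threshold. The sketch below is not part of this project. The helper name `cold_start_ids` and the sample pairs are hypothetical, and the default threshold of 2 mirrors the figure stated above.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def cold_start_ids(pairs: Iterable[Tuple[str, str]], min_reviews: int = 2) -> Tuple[Set[str], Set[str]]:
    """Return (users, items) that have fewer than min_reviews reviews."""
    pairs = list(pairs)
    user_counts = Counter(user for user, _ in pairs)
    item_counts = Counter(item for _, item in pairs)
    cold_users = {u for u, count in user_counts.items() if count < min_reviews}
    cold_items = {i for i, count in item_counts.items() if count < min_reviews}
    return cold_users, cold_items


# Hypothetical (user_id, item_id) review pairs.
review_pairs = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3")]
cold_users, cold_items = cold_start_ids(review_pairs)
print(sorted(cold_users), sorted(cold_items))
```

Users and items flagged this way could be reported separately in evaluation or routed to a simpler fallback, such as a global or per-item average rating.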

Further Development

  • Exploring additional architectures and enhancing scalability.
  • Integration of richer metadata for improved recommendation accuracy.

For further details, please refer to the complete Final Project Report.
