Deep Learning-Based Recommendation System

Project Goal

This project implements and compares three deep learning-based recommendation models (DeepCoNN, NRCMA, and HSACN) that use the text of user reviews from the Amazon User Reviews dataset. The primary goal is to improve rating prediction and recommend relevant products by exploiting the information contained in review text. The system predicts the rating a user would give a product, then uses those predictions to rank products that match the user's search query and past review history.
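The ranking step described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `rank_candidates`, `toy_predict`, and the toy scores are hypothetical names, and the candidate list is assumed to have already been retrieved for the user's search query.

```python
from typing import Callable, Iterable


def rank_candidates(
    user_id: str,
    candidate_items: Iterable[str],
    predict_rating: Callable[[str, str], float],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Score each candidate with the rating model and return the top_k items."""
    scored = [(item, predict_rating(user_id, item)) for item in candidate_items]
    # Highest predicted rating first; ties are broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Toy stand-in for a trained rating model (hypothetical scores).
toy_scores = {("u1", "kettle"): 4.2, ("u1", "toaster"): 3.1, ("u1", "blender"): 4.7}


def toy_predict(user_id: str, item_id: str) -> float:
    return toy_scores[(user_id, item_id)]


ranked = rank_candidates("u1", ["kettle", "toaster", "blender"], toy_predict, top_k=2)
print(ranked)
```

Any of the three models could in principle be passed in as `predict_rating`, as long as it maps a user-item pair to a predicted rating.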

Tech Stack

Approach and Methodology

Models Implemented:

DeepCoNN (Deep Co-Operative Neural Networks): This model uses two parallel convolutional neural networks (CNNs), one for user reviews and one for item reviews. It extracts semantic features from review text using convolutions and max-pooling, then uses a Factorization Machine (FM) layer to model the interaction between user and item latent representations for rating prediction.
NRCMA (Neural Recommendation with Cross-Modality Mutual Attention): NRCMA improves upon the two-tower model by introducing cross-modality mutual attention mechanisms at both word and review levels. This allows the user and item encoders to exchange information, focusing on the most relevant words and reviews for a given user-item interaction. Embeddings are generated using pre-trained GloVe vectors and processed through CNNs before attention layers. The final prediction is made using an FM layer.
HSACN (Hierarchical Self-attentive Convolution Network): HSACN uses a hierarchical approach, encoding words into sentences, sentences into reviews, and reviews into final user/item representations. It combines CNNs for local feature extraction and self-attention mechanisms for aggregation at different levels (sentence, review, entity). This structure allows the model to weigh different parts of the text based on their importance.
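Both DeepCoNN and NRCMA end in a Factorization Machine layer. The sketch below shows a standard second-order FM score in plain Python, assuming the user and item latent vectors are concatenated into one feature vector before the FM is applied. The function names, vector sizes, and random weights are illustrative assumptions, not values taken from this repository.

```python
import random


def fm_predict(z, w0, w, V):
    """Second-order Factorization Machine score for a dense feature vector z.

    z: list of n features (here, concatenated user and item latent vectors)
    w0: global bias, w: n linear weights, V: n x k factor matrix
    """
    n, k = len(z), len(V[0])
    linear = sum(w[i] * z[i] for i in range(n))
    # Pairwise term via the O(n*k) identity:
    # sum_{i<j} <V_i, V_j> z_i z_j
    #   = 0.5 * sum_f [(sum_i V_if z_i)^2 - sum_i (V_if z_i)^2]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * z[i] for i in range(n))
        s_sq = sum((V[i][f] * z[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(z, w0, w, V):
    """Same score computed with the explicit double loop, used as a check."""
    n = len(z)
    total = w0 + sum(w[i] * z[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * z[i] * z[j]
    return total


rng = random.Random(0)
user_vec = [rng.uniform(-1, 1) for _ in range(4)]  # hypothetical user encoder output
item_vec = [rng.uniform(-1, 1) for _ in range(4)]  # hypothetical item encoder output
z = user_vec + item_vec
w0 = 3.5
w = [rng.uniform(-0.1, 0.1) for _ in z]
V = [[rng.uniform(-0.1, 0.1) for _ in range(3)] for _ in z]

fast = fm_predict(z, w0, w, V)
slow = fm_predict_naive(z, w0, w, V)
print(round(fast, 6), round(slow, 6))
```

The two functions return the same value; the first avoids the quadratic loop over feature pairs, which matters when the latent vectors are wide.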

Data Processing Pipeline:

Raw review data filtered to include only unique user-item pairs.
Reviews embedded using pretrained word embeddings (Google-Word2Vec-300).
Data structured separately for each model’s specific requirements.
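As a rough illustration of the first and last pipeline steps, the sketch below filters a toy list of reviews down to unique user-item pairs and then groups review texts per user and per item. The field names (`user_id`, `item_id`, `text`, `rating`) and the keep-first-occurrence rule are assumptions made for this example; the scripts in the utils folder may handle duplicates differently.

```python
from collections import defaultdict

# Toy records with hypothetical field names.
reviews = [
    {"user_id": "u1", "item_id": "i1", "text": "Works well", "rating": 5.0},
    {"user_id": "u1", "item_id": "i1", "text": "Works well (duplicate)", "rating": 5.0},
    {"user_id": "u2", "item_id": "i1", "text": "Stopped working", "rating": 2.0},
    {"user_id": "u1", "item_id": "i2", "text": "Too loud", "rating": 3.0},
]


def keep_unique_pairs(rows):
    """Keep the first review seen for each (user_id, item_id) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["user_id"], row["item_id"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def group_texts(rows, key):
    """Collect review texts per user or per item, in input order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row["text"])
    return dict(grouped)


unique_reviews = keep_unique_pairs(reviews)
user_docs = group_texts(unique_reviews, "user_id")
item_docs = group_texts(unique_reviews, "item_id")
print(len(unique_reviews), user_docs["u1"], item_docs["i1"])
```

The per-user and per-item groupings correspond to the two inputs a two-encoder model reads: everything a user has written, and everything written about an item.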

Training and Evaluation:

Models trained on Nvidia GPUs (V100-SXM2, T4) and Apple silicon (M1, M2).
Evaluated using Mean Squared Error (MSE).
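For reference, MSE is the average squared difference between predicted and actual ratings. The snippet below is a plain-Python sketch with made-up numbers; the training scripts presumably compute the same quantity on tensors.

```python
def mean_squared_error(predicted, actual):
    """Average of squared differences between predictions and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


predicted = [4.5, 3.0, 2.0, 5.0]  # made-up model outputs
actual = [5.0, 3.0, 1.0, 4.0]     # made-up true ratings
mse = mean_squared_error(predicted, actual)
print(mse)  # 0.5625
```

Because the errors are squared, the square root of the MSE (RMSE) is easier to read in rating units: an MSE of 1.57 corresponds to an RMSE of roughly 1.25 stars.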

How to Run the Code

Step-by-step Guide

Clone the Repository

git clone https://github.com/vivek-rd/deep-rec-sys-amazon-reviews.git
cd deep-rec-sys-amazon-reviews

Set Up the Environment

Ensure Python 3.12 is installed.
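Create a virtual environment and install the dependencies listed in pyproject.toml. The install step uses uv, because pip's -r option accepts only requirements files:

python -m venv venv
source venv/bin/activate
pip install uv
uv pip install -r pyproject.toml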

Data Preparation

Obtain the Amazon Reviews dataset (Appliances category) from HuggingFace.
Preprocess and filter the data using scripts in the utils folder:

python utils/data_loading.py

Model Training

Train DeepCoNN, NRCMA, and HSACN models:

python -m modeling.DeepCoNN_train
python -m modeling.NRCMA_train
python -m modeling.HSACN_train

Generate Embeddings

Generate and store embeddings:

python inference/generate_embeddings.py

Run the Application

Launch the Streamlit app:

streamlit run app.py

Requirements and Dependencies

Python 3.12
PyTorch
numpy, pandas, nltk, datasets, gensim, surprise, matplotlib, torchinfo, tqdm, wandb
Detailed dependencies available in pyproject.toml

Results and Outputs

Best Performing Model: NRCMA, whose cross-attention improves user-item interaction modeling, achieved the lowest MSE of 1.57.
Evaluation results stored and visualized using Weights and Biases (W&B).

Limitations

Evaluation limited to the 'Appliances' category.
Does not explicitly handle the cold-start problem for new users/items (< 2 reviews).
Computational cost, particularly for HSACN, might limit scalability.
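To make the cold-start limitation concrete, the sketch below flags users and items with fewer than two reviews in a list of user-item interactions. The function name and the toy data are hypothetical; only the threshold of two reviews comes from the limitation above.

```python
from collections import Counter

MIN_REVIEWS = 2  # threshold stated in the limitation above


def cold_start_ids(pairs, min_reviews=MIN_REVIEWS):
    """Return (users, items) that appear in fewer than min_reviews interactions."""
    user_counts = Counter(user for user, _ in pairs)
    item_counts = Counter(item for _, item in pairs)
    cold_users = {u for u, count in user_counts.items() if count < min_reviews}
    cold_items = {i for i, count in item_counts.items() if count < min_reviews}
    return cold_users, cold_items


# Toy interactions as (user_id, item_id) tuples.
pairs = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u3", "i3")]
cold_users, cold_items = cold_start_ids(pairs)
print(sorted(cold_users), sorted(cold_items))
```

One possible use is to route the flagged ids to a simpler fallback, such as a popularity-based ranking, instead of the review-based models.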

Further Development

Exploring additional architectures and enhancing scalability.
Integration of richer metadata for improved recommendation accuracy.

For further details, please refer to the complete Final Project Report.

Repository: vivek-rd/deep-rec-sys-amazon-reviews
