Skip to content

Using a hybrid model to predict the e-commerce transaction probability. Shap and Lime to add model explainability

Notifications You must be signed in to change notification settings

rjagait/E-commerce-Transaction-Prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 

Repository files navigation

E-commerce Transaction Prediction

Not all visits to a website conclude in an eCommerce transaction. Only a relatively small number of total visits to an e-retail site proceeds to make a transaction. As the company needs to maximize the sales and revenue, they need to analyze the website traffic, which helps the company to make the correct business decisions.

The source of the dataset is Kaggle, https://www.kaggle.com/roshansharma/online-shoppers-intention and consists of feature vectors belonging to 12,330 sessions. Each session belongs to a different user in one year so that there is no pattern concerning user profile, special days or season.

Project Flow

Data Preprocessing

  1. Removing missing points
  2. One Hot Encoding the ‘month’ feature.
  3. Label Encoding ‘weekend’ and ‘revenue’ features
  4. Outlier Detection and Removal from numerical features

Feature Selection/Dimension Reduction

  1. SelectKBest
  2. ExtraTreesClassifier
  3. PCA

comparison

Comparing the techniques over F1 score for the XGB and MLP models, Extra Trees Classifier gave the highest accuracy in comparison PCA and Select K-Best. Hence, Extra Trees Classifier was used as the Selection criteria for training and test the models being considered.

Data Split and Normalize

  1. Train_test_split to get a balanced split in the ratio of 80:20
  2. Normalizing the data after splitting to prevent data leakage using MinMaxScaler

Prediction Models

  1. Logistic Regression
  2. K-Nearest Neighbors
  3. Random Forest
  4. Boosted Trees – XGBoost
  5. Neural Network – MLP Classifier
  6. Hybrid Model – Voting Classifier

voting classifier

To experiment and determine the combined probability, we fed all the other 5 models implemented to the voting classifier. This probability was determined using the soft voting type functionality provided by the voting classifier, which gets the average of the weighted probabilities of the individual models. Since we can control the weights given to individual models and hence their importance in the final probability determination, we used Bayesian optimization to determine the optimal combination of weights by optimizing it over the F1 score.

Bayesian Optimization

Bayesian

This method is based on building a surrogate probability model in the form of P (Score | Hyperparameters), to get the set of best performing hyper-parameters, by running the optimiser over the iterations with the random values for hyperparameters selected from the range defined.

Model Interpretability and Explicability

Shapley Additive explainations (SHAP)

SHAP We implemented SHAP on the XGB model using TreeShap and plotted both the local and global interpretation. The shown figure is the local interpretation of a specific data point. In a classification problem, the output value in the figure is log odds and not the prediction, and the base value is the average of all prediction. In the plot, every feature is an arrow that pushes to increase or decrease the prediction. Feature Values can be seen at the bottom. Since the output value is less than the base value, the model has predicted that an eCommerce transaction did not take place.

Local Interpretable Model Agnostic (LIME)

Lime

The figure shows the LIME output of a specific datapoint explaining the prediction and contribution of features. The underlying model predicted 1, a successful eCommerce Transaction, with the feature, PageValue, being the driving force behind it. Prediction probabilities per feature and the corresponding values can be seen in the figure as well. For the given data point, the probability for a successful transaction is .94, whereas only 0.06 probability for no transaction.

About

Using a hybrid model to predict the e-commerce transaction probability. Shap and Lime to add model explainability

Resources

Stars

Watchers

Forks