Not every visit to a website ends in an eCommerce transaction; only a small fraction of sessions on an e-retail site lead to a purchase. To maximize sales and revenue, the company needs to analyze its website traffic, which helps it make the right business decisions.
The dataset comes from Kaggle (https://www.kaggle.com/roshansharma/online-shoppers-intention) and consists of feature vectors for 12,330 sessions. Each session belongs to a different user over a one-year period, so there is no bias toward a particular user profile, special day, or season.
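For orientation, the dataset can be loaded directly from the Kaggle CSV; the file name below matches the Kaggle download, but the local path is an assumption:

```python
import pandas as pd

# Load the Kaggle "Online Shoppers Purchasing Intention" dataset.
# Adjust the path to wherever the CSV was downloaded.
df = pd.read_csv("online_shoppers_intention.csv")

print(df.shape)                       # expected: (12330, 18)
print(df["Revenue"].value_counts())   # class balance of the target
```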
- Removing missing data points
- One-hot encoding the 'Month' feature
- Label encoding the 'Weekend' and 'Revenue' features
- Outlier detection and removal from the numerical features (a sketch of these steps follows this list)
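A minimal sketch of these preprocessing steps, assuming the Kaggle column names and a 1.5 × IQR rule for outliers (the report does not specify the detection method):

```python
import pandas as pd

df = pd.read_csv("online_shoppers_intention.csv").dropna()  # drop missing points

# One-hot encode 'Month'; label encode the boolean 'Weekend'/'Revenue' columns.
df = pd.get_dummies(df, columns=["Month"])
df["Weekend"] = df["Weekend"].astype(int)
df["Revenue"] = df["Revenue"].astype(int)

# Outlier removal on an illustrative subset of the continuous numerical
# features, using the 1.5 * IQR rule (an assumption; the report does not
# name the exact method).
num_cols = ["Administrative_Duration", "Informational_Duration",
            "ProductRelated_Duration", "BounceRates", "ExitRates", "PageValues"]
q1, q3 = df[num_cols].quantile(0.25), df[num_cols].quantile(0.75)
iqr = q3 - q1
mask = ~((df[num_cols] < q1 - 1.5 * iqr) | (df[num_cols] > q3 + 1.5 * iqr)).any(axis=1)
df = df[mask]
```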
- SelectKBest
- ExtraTreesClassifier
- PCA
Comparing the techniques on F1 score for the XGB and MLP models, the Extra Trees Classifier gave the highest score compared with PCA and SelectKBest. Hence, the Extra Trees Classifier was used as the feature-selection method for training and testing the models under consideration.
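A minimal sketch of importance-based selection with ExtraTreesClassifier, using scikit-learn's SelectFromModel with its default mean-importance threshold (the threshold and the number of kept features are assumptions):

```python
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X, y = df.drop(columns=["Revenue"]), df["Revenue"]

# Fit an Extra Trees ensemble and keep features whose impurity-based
# importance exceeds the mean importance (SelectFromModel's default rule).
et = ExtraTreesClassifier(n_estimators=100, random_state=42)
selector = SelectFromModel(et).fit(X, y)

selected = X.columns[selector.get_support()]
print(list(selected))
X_selected = selector.transform(X)
```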
- Using train_test_split to obtain a balanced 80:20 train-test split
- Normalizing the data after splitting with MinMaxScaler, fitted only on the training set to prevent data leakage (see the sketch after this list)
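A minimal sketch of the split-then-scale order; stratify=y is an assumption about what "balanced" means here:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# 80:20 split; stratifying on y keeps the class ratio the same in both sets
# (an assumption about what "balanced" means in the report).
X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.2, stratify=y, random_state=42)

# Fit the scaler on the training data only, then apply it to both sets,
# so no information from the test set leaks into the scaling parameters.
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```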
- Logistic Regression
- K-Nearest Neighbors
- Random Forest
- Boosted Trees – XGBoost
- Neural Network – MLP Classifier
- Hybrid Model – Voting Classifier
To experiment with a combined prediction, we fed the other five models into the voting classifier. The combined probability was computed with the classifier's soft-voting mode, which averages the weighted predicted class probabilities of the individual models. Since the weights control each model's influence on the final probability, we used Bayesian optimization to find the combination of weights that maximizes the F1 score.
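A minimal sketch of the soft-voting ensemble, assuming default-ish hyperparameters for the base models (the report's tuned settings are not given) and equal placeholder weights:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier
from sklearn.metrics import f1_score

# Five base models; soft voting averages their weighted class probabilities.
estimators = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("rf", RandomForestClassifier(random_state=42)),
    ("xgb", XGBClassifier(eval_metric="logloss")),
    ("mlp", MLPClassifier(max_iter=500, random_state=42)),
]

# Placeholder equal weights; the report tunes these with Bayesian optimization.
vc = VotingClassifier(estimators=estimators, voting="soft",
                      weights=[1, 1, 1, 1, 1])
vc.fit(X_train, y_train)
print("F1:", f1_score(y_test, vc.predict(X_test)))
```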
This method builds a surrogate probability model of the form P(score | hyperparameters) to find the best-performing set of hyperparameters: at each iteration the optimizer evaluates candidate values drawn from the defined ranges, updates the surrogate with the observed score, and uses it to propose increasingly promising candidates rather than sampling purely at random.
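A minimal sketch of tuning the voting weights this way, reusing vc and the split data from the previous sketch and using scikit-optimize's gp_minimize as the Bayesian optimizer (the report does not name a library, so this choice, the weight bounds, and the call budget are assumptions):

```python
from skopt import gp_minimize
from skopt.space import Real
from sklearn.metrics import f1_score

# Objective: negative F1 of the soft-voting ensemble for a given weight vector.
# gp_minimize maintains a Gaussian-process surrogate of P(score | weights) and
# uses it to propose the next weights to try, rather than sampling at random.
def objective(weights):
    vc.set_params(weights=weights)
    vc.fit(X_train, y_train)
    return -f1_score(y_test, vc.predict(X_test))

# One weight per base model; lower bound > 0 keeps the weighted average defined.
space = [Real(0.1, 1.0, name=f"w{i}") for i in range(5)]
result = gp_minimize(objective, space, n_calls=30, random_state=42)
print("best weights:", result.x, "best F1:", -result.fun)
```

For brevity the objective here scores directly on the test split; cross-validating on the training data would avoid tuning the weights to the test set.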
We implemented SHAP on the XGB model using TreeSHAP and plotted both the local and global interpretations. The figure shows the local interpretation of a specific data point. In a classification problem, the output value in the figure is in log odds rather than a probability, and the base value is the average of all predictions. In the plot, every feature is an arrow that pushes the prediction higher or lower; the feature values are shown at the bottom. Since the output value is below the base value, the model predicted that an eCommerce transaction did not take place.
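A minimal sketch of the SHAP workflow; the XGB settings are illustrative and index 0 is an arbitrary data point, not the one from the report's figure:

```python
import shap
from xgboost import XGBClassifier

# Train the XGB model (illustrative settings) and explain it with TreeSHAP.
xgb_model = XGBClassifier(eval_metric="logloss").fit(X_train, y_train)
explainer = shap.TreeExplainer(xgb_model)
shap_values = explainer.shap_values(X_test)

# Local interpretation: force plot for a single data point, in log odds.
shap.force_plot(explainer.expected_value, shap_values[0], X_test[0],
                matplotlib=True)

# Global interpretation: summary plot over the whole test set.
shap.summary_plot(shap_values, X_test)
```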
The figure shows the LIME output for a specific data point, explaining the prediction and the contribution of each feature. The underlying model predicted 1, a successful eCommerce transaction, with the feature PageValues being the driving force behind it. The per-feature contributions and the corresponding feature values are shown in the figure as well. For this data point, the probability of a successful transaction is 0.94, versus only 0.06 for no transaction.
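A minimal sketch of the LIME workflow; the report does not say which model LIME wraps, so the voting classifier vc and the selected feature names carried over from the earlier sketches are assumptions, and index 0 is again an arbitrary data point:

```python
from lime.lime_tabular import LimeTabularExplainer

# Tabular explainer fitted on the training data; feature_names holds the
# names of the columns kept by the Extra Trees selection step.
explainer = LimeTabularExplainer(
    X_train, feature_names=list(selected),
    class_names=["No Revenue", "Revenue"], mode="classification")

# Explain one test point via the model's predict_proba; the output lists
# the features pushing the prediction toward each class.
exp = explainer.explain_instance(X_test[0], vc.predict_proba, num_features=10)
exp.show_in_notebook(show_table=True)
```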