Skip to content

SnehaaDurairaj/Store-Sales-Prediction

Repository files navigation

Store Sales Prediction

Abstract:
Accurate store sales prediction is essential for retailers, influencing inventory management, resource allocation,
and overall business success. This project employs machine learning techniques to predict future sales for a retail
chain in a South American country, utilizing historical sales data to develop a predictive model.

1. Introduction:
Predicting store sales accurately remains a significant challenge for retailers, impacting stocked and understocked 
items from a consumer perspective and overall business operations from a seller's perspective. The objective of this 
project is to identify the most accurate sales forecasting method to optimize inventory management, staffing 
decisions, and targeted promotions.

2. Literature Review:
The literature review highlights the effectiveness of various machine learning algorithms for sales prediction, 
including studies on the impact of feature correlation, the use of XGBoost for disaggregated demand forecasting, 
and the comparison of different algorithms like Gradient Boosting Machines (GBM) and Random Forests (RF).

3. Methodology:

  3.1 Data Source:
  A comprehensive dataset from the Kaggle website includes three tables: Stores, Sales, and Transactions, providing 
  detailed information on store metadata, daily sales transactions, and total transactions.

  3.2 Data Cleaning and Preparation:
  Data cleaning addressed missing values and merged the three tables into a single dataframe for a unified view of 
  sales patterns. The data was transformed through feature engineering, including merging similar categories and 
  splitting date columns.

  3.3 Feature Selection:
  Eleven features were initially identified for analysis, but due to dataset limitations, traditional feature 
  selection techniques were not feasible. All features were used as inputs.
  
  3.4 Data Visualization:
  Data visualization involved univariate and bivariate analysis to understand variable distributions and 
  relationships, identifying key factors influencing sales.
  
  3.5 Model Building:
  The project employed both non-parametric (Decision Trees, Random Forests, Gradient Boosting, AdaBoost) and 
  parametric (Support Vector Regression) regression techniques to build predictive models.

4. Results:
The models were evaluated based on Root Mean Squared Error (RMSE) and Explained Variance. Random Forest with feature 
selection achieved the best performance, with an RMSE of 0.69 and an Explained Variance of 0.47. Decision Trees 
without feature selection also performed well, while Gradient Boosting, ADA Boosting, and SVM showed lower 
performance.

5. Discussion:
The study found that the Random Forest model, particularly with feature selection, was the most effective in 
predicting store sales. The robustness of Random Forest, even without feature selection, highlighted its potential 
for accurate predictions.

6. Conclusion:
Machine learning techniques, especially Random Forest, demonstrated feasibility for store sales prediction. Accurate 
sales forecasting empowers retailers to optimize operations and profitability.

7. Future Work:
Future improvements include incorporating additional relevant features like weather data, holidays, and promotional 
activities. Leveraging big data tools like Apache Spark for distributed computing could enhance data processing 
and model training efficiency.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published