Store Sales Prediction
Abstract:
Accurate store sales prediction is essential for retailers, influencing inventory management, resource allocation,
and overall business success. This project employs machine learning techniques to predict future sales for a retail
chain in a South American country, utilizing historical sales data to develop a predictive model.
1. Introduction:
Predicting store sales accurately remains a significant challenge for retailers, impacting stocked and understocked
items from a consumer perspective and overall business operations from a seller's perspective. The objective of this
project is to identify the most accurate sales forecasting method to optimize inventory management, staffing
decisions, and targeted promotions.
2. Literature Review:
The literature review highlights the effectiveness of various machine learning algorithms for sales prediction,
including studies on the impact of feature correlation, the use of XGBoost for disaggregated demand forecasting,
and the comparison of different algorithms like Gradient Boosting Machines (GBM) and Random Forests (RF).
3. Methodology:
3.1 Data Source:
A comprehensive dataset from the Kaggle website includes three tables: Stores, Sales, and Transactions, providing
detailed information on store metadata, daily sales transactions, and total transactions.
3.2 Data Cleaning and Preparation:
Data cleaning addressed missing values and merged the three tables into a single dataframe for a unified view of
sales patterns. The data was transformed through feature engineering, including merging similar categories and
splitting date columns.
3.3 Feature Selection:
Eleven features were initially identified for analysis, but due to dataset limitations, traditional feature
selection techniques were not feasible. All features were used as inputs.
3.4 Data Visualization:
Data visualization involved univariate and bivariate analysis to understand variable distributions and
relationships, identifying key factors influencing sales.
3.5 Model Building:
The project employed both non-parametric (Decision Trees, Random Forests, Gradient Boosting, AdaBoost) and
parametric (Support Vector Regression) regression techniques to build predictive models.
4. Results:
The models were evaluated based on Root Mean Squared Error (RMSE) and Explained Variance. Random Forest with feature
selection achieved the best performance, with an RMSE of 0.69 and an Explained Variance of 0.47. Decision Trees
without feature selection also performed well, while Gradient Boosting, ADA Boosting, and SVM showed lower
performance.
5. Discussion:
The study found that the Random Forest model, particularly with feature selection, was the most effective in
predicting store sales. The robustness of Random Forest, even without feature selection, highlighted its potential
for accurate predictions.
6. Conclusion:
Machine learning techniques, especially Random Forest, demonstrated feasibility for store sales prediction. Accurate
sales forecasting empowers retailers to optimize operations and profitability.
7. Future Work:
Future improvements include incorporating additional relevant features like weather data, holidays, and promotional
activities. Leveraging big data tools like Apache Spark for distributed computing could enhance data processing
and model training efficiency.
-
Notifications
You must be signed in to change notification settings - Fork 0
SnehaaDurairaj/Store-Sales-Prediction
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
No description, website, or topics provided.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published