Walmart is an American multinational retail corporation that operates a chain of hypermarkets, department stores, and grocery stores. As of July 2019, Walmart operated 11,200 stores in 27 countries, with revenues exceeding $500 billion. A challenge facing retailers such as Walmart is optimizing supply chain and warehouse space usage so that supply meets demand effectively, especially during spikes such as the holiday season. This is where accurate sales forecasting enables companies to make informed business decisions. Companies can base their forecasts on past sales data, industry-wide comparisons, and economic trends. However, one forecasting challenge is the need to make decisions based on limited history: if Christmas comes but once a year, so does the chance to see how strategic decisions impacted the bottom line.
Historical sales data for 45 Walmart stores located in different regions has been provided. Each store contains many departments, and the sales for each department in each store needs to be projected. Additionally, Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labor Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. These markdowns are known to affect sales, but it is challenging to predict which departments are affected and the extent of the impact.
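For reference, the competition scores predictions with a weighted mean absolute error (WMAE), in which holiday weeks carry a weight of 5 and all other weeks a weight of 1. A minimal sketch of that metric (the function name and array inputs are illustrative):

```python
import numpy as np

def wmae(y_true, y_pred, is_holiday):
    """Weighted MAE: holiday weeks are weighted 5x relative to non-holiday weeks."""
    weights = np.where(is_holiday, 5.0, 1.0)
    errors = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.sum(weights * errors) / np.sum(weights)
```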
This problem is sourced from Walmart's Kaggle Competition.
This project requires Python 3.x and the following Python libraries installed:
You will also need to have software installed to run and execute a Jupyter Notebook.
This project contains five files:

- WalmartStoreSales.ipynb: The main Jupyter Notebook with the project code.
- stores.csv: The stores dataset. Will be processed by the notebook.
- features.csv: The features dataset. Will be processed by the notebook.
- train.csv: The training dataset. Will be processed by the notebook.
- test.csv: The testing dataset. Will be processed by the notebook.
In a terminal or command window, navigate to the top-level project directory (the one that contains this README) and run one of the following commands:
ipython notebook WalmartStoreSales.ipynb
or
jupyter notebook WalmartStoreSales.ipynb
This will open the Jupyter Notebook software and project file in your browser.
The dataset for this project is provided by Walmart on Kaggle and contains 4 files.
- stores.csv: Contains anonymized information about the 45 stores, indicating the type and size of each store. (Row count: 45, file size: 1 KB)
- features.csv: Contains additional data related to the store, department, and regional activity for the given dates. (Row count: 8,191, file size: 579 KB)
- train.csv: The historical training data, covering 2010-02-05 to 2012-11-01. (Row count: 422,000, file size: 12,542 KB)
- test.csv: Identical to train.csv, except weekly sales data is withheld. (Row count: 115,000, file size: 2,538 KB)
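To illustrate how the files fit together, the sketch below loads and joins them with pandas. It assumes the CSVs sit next to the notebook and uses the column names listed in the Features section below.

```python
import pandas as pd

# Assumes stores.csv, features.csv, and train.csv are in the working directory.
stores = pd.read_csv("stores.csv")
features = pd.read_csv("features.csv", parse_dates=["Date"])
train = pd.read_csv("train.csv", parse_dates=["Date"])

# Attach store metadata (Type, Size) and the regional/markdown features
# to each (Store, Dept, Date) sales record.
df = (train
      .merge(stores, on="Store", how="left")
      .merge(features, on=["Store", "Date", "IsHoliday"], how="left"))

print(df.shape)
print(df.head())
```

Joining on IsHoliday as well as Store and Date avoids duplicated IsHoliday_x/IsHoliday_y columns, since the flag appears in both train.csv and features.csv.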
Features

- Store: Store number
- Type: Store type (A, B & C)
- Size: Store size
- Date: The week (YYYY-MM-DD)
- Temperature: Average temperature
- Fuel_Price: Cost of fuel
- MarkDown1: Promotional markdown #1
- MarkDown2: Promotional markdown #2
- MarkDown3: Promotional markdown #3
- MarkDown4: Promotional markdown #4
- MarkDown5: Promotional markdown #5
- CPI: Consumer Price Index
- Unemployment: Unemployment rate
- IsHoliday: Is holiday week (Y or N)
- Dept: Department number

Target Variable

- Weekly_Sales: Weekly sales
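As a rough illustration of how these features can drive a forecast of Weekly_Sales (not necessarily the approach used in the notebook), the sketch below fits a simple random-forest baseline on the merged frame `df` from the loading example; the feature subset and hyperparameters are arbitrary choices for demonstration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Numeric features plus one-hot store type and a week-of-year column for seasonality.
X = df[["Store", "Dept", "Size", "Temperature", "Fuel_Price",
        "CPI", "Unemployment"]].copy()
X["IsHoliday"] = df["IsHoliday"].astype(int)
X["Week"] = df["Date"].dt.isocalendar().week.astype(int)
X = X.join(pd.get_dummies(df["Type"], prefix="Type"))
X = X.fillna(0)  # crude fill for any missing values, good enough for a sketch
y = df["Weekly_Sales"]

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=50, n_jobs=-1, random_state=42)
model.fit(X_train, y_train)
print("Validation MAE:", mean_absolute_error(y_val, model.predict(X_val)))
```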
My solution to the problem is also available as a Kaggle Kernel.