vicky60629/Walmart-Store-Sales-Forecasting

🛒 Walmart Store Sales Forecasting

Predicting weekly department-level sales across 45 Walmart stores using historical markdown and holiday data, deployed as a live Flask web application.


📌 Problem Statement

Retail chains like Walmart face a critical challenge: accurately forecasting sales to optimize inventory, staffing, and supply chain decisions, especially during high-impact holiday seasons.

This project builds an end-to-end machine learning solution to predict weekly store sales using 2+ years of historical data from 45 Walmart stores across different US regions. The model accounts for promotional markdowns, seasonal spikes (Super Bowl, Thanksgiving, Christmas, Labour Day), and store-level features.


🎯 Business Impact

  • Helps Walmart plan stock levels and warehouse space before high-demand periods
  • Identifies which stores and departments are most affected by holiday markdowns
  • Enables data-driven decisions on regional promotions and resource allocation

📊 Dataset

Source: Kaggle (Walmart Store Sales Forecasting)

| File | Description |
|------|-------------|
| train.csv | Historical weekly sales per store & department (Feb 2010 – Oct 2012) |
| test.csv | Target period for prediction |
| features.csv | Store-level features: Temperature, Fuel Price, CPI, Unemployment, MarkDowns |
| stores.csv | Store metadata: type (A/B/C) and size |

Key Challenge: Holiday weeks carry 5× the weight of non-holiday weeks in the evaluation metric (WMAE), so accurate holiday predictions are critical.


πŸ—οΈ Project Architecture

Walmart-Store-Sales-Forecasting/
├── data/
│   ├── train.csv
│   ├── test.csv
│   ├── features.csv
│   └── stores.csv
├── templates/
│   └── index.html          # Flask frontend
├── static/
│   └── style.css
├── Walmart Store Sales Forecasting.ipynb   # Full EDA + Modelling
├── app.py                  # Flask application
├── model.pkl               # Serialized trained model
├── requirements.txt
└── Dockerfile

πŸ” Approach

1. Exploratory Data Analysis (EDA)

  • Analyzed seasonal patterns in weekly sales across stores and departments
  • Identified holiday spikes: Thanksgiving week consistently outsells Christmas week
  • Found that January sales drop sharply post-holiday season
  • Examined impact of external factors: CPI, fuel price, unemployment, temperature
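
The seasonal comparison described above can be sketched with a simple group-by. This is a minimal illustration, not the notebook's actual code: the function name `mean_sales_by_month` is hypothetical, and the column names `Date` and `Weekly_Sales` are assumed to match the Kaggle `train.csv` schema.

```python
import pandas as pd


def mean_sales_by_month(sales: pd.DataFrame) -> pd.Series:
    """Average weekly sales per calendar month (1-12), across all stores and departments."""
    # Parse dates once; the CSV is assumed to store them as ISO strings.
    dates = pd.to_datetime(sales["Date"])
    return sales.groupby(dates.dt.month)["Weekly_Sales"].mean().rename_axis("Month")
```

Comparing the November, December, and January entries of the result is one way to check the holiday spike and the post-holiday drop.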

2. Feature Engineering

  • Extracted time features: Week, Month, Year from date
  • Engineered holiday flags for Super Bowl, Labour Day, Thanksgiving, Christmas
  • Merged store metadata (type, size) with weekly sales and external features
  • Handled missing MarkDown values (up to 4,140 nulls in some columns) using median imputation
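
The steps above could look roughly like the following pandas sketch. It is an illustration under assumptions, not the notebook's code: `build_features` and `HOLIDAY_WEEKS` are hypothetical names, the column names follow the Kaggle CSV schema, and the holiday week dates are taken from the competition description and should be checked against the data.

```python
import pandas as pd

# Holiday week dates as listed in the Kaggle competition description
# (assumption: verify against the Date column before relying on them).
HOLIDAY_WEEKS = {
    "SuperBowl": ["2010-02-12", "2011-02-11", "2012-02-10", "2013-02-08"],
    "LabourDay": ["2010-09-10", "2011-09-09", "2012-09-07", "2013-09-06"],
    "Thanksgiving": ["2010-11-26", "2011-11-25", "2012-11-23", "2013-11-29"],
    "Christmas": ["2010-12-31", "2011-12-30", "2012-12-28", "2013-12-27"],
}


def build_features(sales, features, stores):
    """Merge the three tables, then add time features, holiday flags, and imputed markdowns."""
    # Weekly sales + external features (keyed by store and week) + store metadata.
    df = sales.merge(features, on=["Store", "Date", "IsHoliday"], how="left")
    df = df.merge(stores, on="Store", how="left")

    # Time features extracted from the date.
    df["Date"] = pd.to_datetime(df["Date"])
    df["Week"] = df["Date"].dt.isocalendar().week.astype(int)
    df["Month"] = df["Date"].dt.month
    df["Year"] = df["Date"].dt.year

    # One 0/1 flag per named holiday.
    for name, dates in HOLIDAY_WEEKS.items():
        df[name] = df["Date"].isin(pd.to_datetime(dates)).astype(int)

    # Median imputation for the MarkDown columns.
    markdown_cols = [c for c in df.columns if c.startswith("MarkDown")]
    df[markdown_cols] = df[markdown_cols].fillna(df[markdown_cols].median())
    return df
```

Note that this sketch computes the medians on whatever frame it receives. When validating, it is safer to compute them on the training portion only and reuse them for later weeks.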

3. Modelling

  • Evaluated multiple regression approaches for time-series forecasting
  • Applied Random Forest Regressor as the primary model due to its ability to capture non-linear relationships and feature interactions
  • Used cross-validation with time-aware splits to prevent data leakage
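
One way to set up time-aware validation is scikit-learn's `TimeSeriesSplit`, which always trains on earlier rows and validates on later ones. The sketch below is an assumption about how this could be done, not the project's confirmed setup: `time_aware_cv` is a hypothetical helper, the hyperparameters are placeholders, and the rows of `X` and `y` are assumed to be sorted by date.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit


def time_aware_cv(X, y, n_splits=3):
    """Return the MAE of each fold; every fold trains on the past and validates on the future."""
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    scores = []
    for train_idx, val_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        # Placeholder hyperparameters; tune them for the real data.
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        scores.append(float(np.mean(np.abs(y[val_idx] - pred))))
    return scores
```

Because `TimeSeriesSplit` cuts by row position, a fold boundary can fall in the middle of a week when many store/department rows share one date. Splitting on unique dates instead avoids that.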

4. Evaluation Metric

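WMAE (Weighted Mean Absolute Error), with holiday weeks weighted 5×:

WMAE = (1 / Σ wᵢ) × Σ wᵢ |yᵢ - ŷᵢ|

where wᵢ = 5 for holiday weeks and 1 otherwise, yᵢ is the actual weekly sales, and ŷᵢ is the predicted weekly sales.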
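
As a minimal sketch, the metric can be computed with NumPy as follows. The function name `wmae` and its parameters are illustrative; only the 5× holiday weight comes from the description above.

```python
import numpy as np


def wmae(y_true, y_pred, is_holiday, holiday_weight=5.0):
    """Weighted mean absolute error: holiday rows get `holiday_weight`, all others get 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.where(np.asarray(is_holiday, dtype=bool), holiday_weight, 1.0)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))
```

For example, with actual sales of 100 and 200, predictions of 110 and 180, and only the second week a holiday, the result is (1×10 + 5×20) / 6 ≈ 18.33, against a plain MAE of 15.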

5. Deployment

  • Serialized model using pickle → model.pkl
  • Built a Flask web app for real-time sales prediction
  • Containerized with Docker for portability
  • Deployed on web (previously Heroku, migrating to Render)
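
The serialization and serving steps can be sketched as below. This is a hedged illustration rather than the repository's `app.py`: the `/predict` JSON endpoint, the `features` and `weekly_sales` field names, and the helpers `save_model`, `load_model`, and `create_app` are all assumptions (the real app renders `templates/index.html`).

```python
import pickle

from flask import Flask, jsonify, request


def save_model(model, path="model.pkl"):
    """Serialize a trained model to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path="model.pkl"):
    """Load a pickled model; only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def create_app(model):
    """Build a Flask app that serves predictions from the given model."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical endpoint
    def predict():
        payload = request.get_json(force=True)
        # One row of features, in the column order the model was trained on.
        prediction = model.predict([payload["features"]])[0]
        return jsonify({"weekly_sales": float(prediction)})

    return app
```

To serve it locally on the port used in this README, something like `create_app(load_model()).run(host="0.0.0.0", port=5000)` would work. Taking the model as an argument also makes the app easy to test with a stub.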

📈 Results

| Metric | Value |
|--------|-------|
| Evaluation | WMAE (Weighted MAE) |
| Holiday Weight | 5× non-holiday weeks |
| Stores Covered | 45 stores, multiple departments |
| Prediction Period | Weekly |

Key Finding: Holiday markdown events, especially Thanksgiving, are the single strongest predictor of sales spikes. Store type and size also significantly influence baseline weekly sales.


πŸ› οΈ Tech Stack

| Category | Tools |
|----------|-------|
| Language | Python 3.8+ |
| Data Processing | Pandas, NumPy |
| Visualisation | Matplotlib, Seaborn |
| Modelling | Scikit-learn (Random Forest, Linear Regression) |
| Web Framework | Flask |
| Deployment | Docker, Heroku → Render |
| Notebook | Jupyter Notebook |

🚀 Running Locally

Option 1: Standard Setup

# Clone the repository
git clone https://github.com/vicky60629/Walmart-Store-Sales-Forecasting.git
cd Walmart-Store-Sales-Forecasting

# Install dependencies
pip install -r requirements.txt

# Run the Flask app
python app.py

Then open http://localhost:5000 in your browser.

Option 2: Docker

docker build -t walmart-forecasting .
docker run -p 5000:5000 walmart-forecasting

📸 App Preview

[Screenshots: Input Form, Prediction Output]

💡 Key Learnings & Future Improvements

What worked well:

  • Feature engineering around holidays significantly improved prediction accuracy
  • Random Forest handled the non-linear interactions between store size, type, and markdown events effectively

Future enhancements:

  • Integrate XGBoost / LightGBM for potential performance gains
  • Add SARIMA / Prophet for pure time-series baseline comparison
  • Build interactive dashboard with Streamlit or Plotly Dash
  • Incorporate external data: weather APIs, economic indicators
  • Track retraining runs and experiments with MLflow

πŸ‘¨β€πŸ’» About the Author

Vicky Gupta, Data Engineering Analyst @ Accenture (4.5 years) | Aspiring Data Scientist

Skilled in PySpark, ETL pipelines, and end-to-end ML systems. Passionate about building data products that solve real business problems.

🔗 LinkedIn | GitHub


📄 License

This project is licensed under the MIT License; see the LICENSE file for details.


⭐ If you found this project useful, please consider starring the repository!
