Project 27

Demand Forecasting

Predict next-quarter product demand for a retail chain using historical sales data, seasonal indices, and promotion effects to optimize inventory purchasing decisions.

Project Overview

Scenario: You are a demand planning analyst at BrightMart, a retail chain with 120 stores across 5 regions. The merchandise planning team needs accurate demand forecasts for the next quarter (Q2 2026) to place purchase orders with suppliers. Over-forecasting leads to excess inventory and markdowns; under-forecasting causes stockouts and lost sales.

Objective: Analyze 3 years of historical sales data to build demand forecasts using time series decomposition, moving averages, and exponential smoothing. Evaluate forecast accuracy with MAE and MAPE metrics to select the best model for each product category.

Tools: SQL (PostgreSQL), Python (pandas, matplotlib, numpy, statsmodels), Excel

Dataset Description

Five tables covering sales transactions, product catalog, seasonal patterns, promotional events, and external demand drivers.

sales_history

Column        | Type               | Description
sale_id       | INT (PK)           | Unique transaction identifier
product_id    | INT (FK)           | Product reference
sale_date     | DATE               | Date of sale
store_id      | INT                | Store location identifier
quantity_sold | INT                | Units sold in transaction
unit_price    | DECIMAL(8,2)       | Price per unit at time of sale
promotion_id  | INT (FK, nullable) | Active promotion, if any

products

Column       | Type         | Description
product_id   | INT (PK)     | Unique product identifier
product_name | VARCHAR(100) | Product name
category     | VARCHAR(50)  | Product category (Electronics, Apparel, Grocery, etc.)
subcategory  | VARCHAR(50)  | Product subcategory
cost_price   | DECIMAL(8,2) | Wholesale cost per unit

seasons

Column         | Type         | Description
month_num      | INT (PK)     | Month number (1-12)
season_name    | VARCHAR(20)  | Season label (Spring, Summer, Fall, Winter)
seasonal_index | DECIMAL(4,3) | Demand multiplier (1.000 = average)
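The seasonal index works as a simple multiplier on a deseasonalized baseline. A minimal sketch, using hypothetical numbers (a 10,000-unit baseline and a 1.42 December index):

```python
# Applying a seasonal index to a baseline forecast (illustrative numbers).
# An index of 1.000 means average demand; 1.42 means 42% above average.

def seasonally_adjust(baseline_units: float, seasonal_index: float) -> float:
    """Scale a deseasonalized monthly baseline by the month's demand multiplier."""
    return baseline_units * seasonal_index

# Hypothetical: a category averages 10,000 units/month; December index is 1.42.
print(seasonally_adjust(10_000, 1.42))  # 14200.0
```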

promotions

Column       | Type         | Description
promotion_id | INT (PK)     | Unique promotion identifier
promo_name   | VARCHAR(100) | Promotion name
start_date   | DATE         | Promotion start
end_date     | DATE         | Promotion end
discount_pct | DECIMAL(5,2) | Discount percentage offered
promo_type   | VARCHAR(30)  | Type (BOGO, Percentage Off, Bundle)

external_factors

Column            | Type         | Description
month_year        | DATE (PK)    | First day of month
cpi_index         | DECIMAL(6,2) | Consumer Price Index
unemployment_rate | DECIMAL(4,2) | Regional unemployment rate (%)
avg_temperature   | DECIMAL(5,1) | Average monthly temperature (°F)

SQL Analysis

Query 1: Monthly Sales Patterns by Category

Aggregate historical sales into monthly totals per category to reveal trends and seasonal rhythms. This forms the foundation for all forecasting models.

SELECT
    p.category,
    DATE_TRUNC('month', sh.sale_date) AS sale_month,
    SUM(sh.quantity_sold) AS total_units,
    SUM(sh.quantity_sold * sh.unit_price) AS total_revenue,
    COUNT(DISTINCT sh.store_id) AS active_stores,
    ROUND(AVG(sh.quantity_sold), 2) AS avg_units_per_txn
FROM sales_history sh
JOIN products p ON sh.product_id = p.product_id
WHERE sh.sale_date >= '2023-01-01'
GROUP BY p.category, DATE_TRUNC('month', sh.sale_date)
ORDER BY p.category, sale_month;

Query 2: Seasonal Indices by Category

Calculate seasonal indices by comparing each month's average sales to the overall monthly average for that category. An index above 1.0 indicates above-average demand; below 1.0 indicates a slow period.

WITH monthly_sales AS (
    SELECT
        p.category,
        EXTRACT(MONTH FROM sh.sale_date) AS month_num,
        EXTRACT(YEAR FROM sh.sale_date) AS sale_year,
        SUM(sh.quantity_sold) AS monthly_units
    FROM sales_history sh
    JOIN products p ON sh.product_id = p.product_id
    GROUP BY p.category, EXTRACT(MONTH FROM sh.sale_date), EXTRACT(YEAR FROM sh.sale_date)
),
category_avg AS (
    SELECT
        category,
        ROUND(AVG(monthly_units), 0) AS overall_monthly_avg
    FROM monthly_sales
    GROUP BY category
)
SELECT
    ms.category,
    ms.month_num,
    s.season_name,
    ROUND(AVG(ms.monthly_units), 0) AS avg_month_units,
    ca.overall_monthly_avg,
    ROUND(AVG(ms.monthly_units)::NUMERIC / ca.overall_monthly_avg, 3) AS seasonal_index
FROM monthly_sales ms
JOIN category_avg ca ON ms.category = ca.category
JOIN seasons s ON ms.month_num = s.month_num
GROUP BY ms.category, ms.month_num, s.season_name, ca.overall_monthly_avg
ORDER BY ms.category, ms.month_num;

Query 3: Promotion Impact Analysis

Measure how much promotions lift sales compared to non-promotion periods. This helps adjust forecasts when future promotions are planned.

WITH promo_sales AS (
    SELECT
        p.category,
        pr.promo_type,
        pr.discount_pct,
        SUM(sh.quantity_sold) AS promo_units,
        COUNT(DISTINCT sh.sale_id) AS promo_transactions,
        SUM(sh.quantity_sold * sh.unit_price) AS promo_revenue
    FROM sales_history sh
    JOIN products p ON sh.product_id = p.product_id
    JOIN promotions pr ON sh.promotion_id = pr.promotion_id
    GROUP BY p.category, pr.promo_type, pr.discount_pct
),
baseline_sales AS (
    SELECT
        p.category,
        ROUND(AVG(sh.quantity_sold), 2) AS avg_baseline_units
    FROM sales_history sh
    JOIN products p ON sh.product_id = p.product_id
    WHERE sh.promotion_id IS NULL
    GROUP BY p.category
)
SELECT
    ps.category,
    ps.promo_type,
    ps.discount_pct,
    ps.promo_transactions,
    ROUND(ps.promo_units::NUMERIC / ps.promo_transactions, 2) AS avg_promo_units,
    bs.avg_baseline_units,
    ROUND(
        (ps.promo_units::NUMERIC / ps.promo_transactions - bs.avg_baseline_units)
        / bs.avg_baseline_units * 100, 1
    ) AS lift_pct
FROM promo_sales ps
JOIN baseline_sales bs ON ps.category = bs.category
ORDER BY lift_pct DESC;
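The lift_pct expression in the query above is easy to sanity-check by hand. A small Python sketch with made-up numbers (the units, transaction count, and baseline below are hypothetical):

```python
# Sanity check of the lift_pct formula from Query 3 with illustrative numbers.
def lift_pct(promo_units: int, promo_transactions: int, baseline_avg: float) -> float:
    """Percent lift of average units per promo transaction over the non-promo baseline."""
    avg_promo_units = promo_units / promo_transactions
    return round((avg_promo_units - baseline_avg) / baseline_avg * 100, 1)

# Hypothetical: 9,900 units across 3,000 promo transactions vs. a 2.0-unit baseline.
print(lift_pct(9_900, 3_000, 2.0))  # 65.0
```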

Query 4: Year-over-Year Growth with Moving Average

Compare each month to the same month last year and compute a 3-month moving average to smooth out noise. Window functions make this calculation concise.

WITH monthly AS (
    SELECT
        DATE_TRUNC('month', sh.sale_date) AS sale_month,
        SUM(sh.quantity_sold) AS total_units
    FROM sales_history sh
    GROUP BY DATE_TRUNC('month', sh.sale_date)
)
SELECT
    sale_month,
    total_units,
    LAG(total_units, 12) OVER (ORDER BY sale_month) AS same_month_last_year,
    ROUND(
        (total_units - LAG(total_units, 12) OVER (ORDER BY sale_month))::NUMERIC
        / NULLIF(LAG(total_units, 12) OVER (ORDER BY sale_month), 0) * 100, 1
    ) AS yoy_growth_pct,
    ROUND(
        AVG(total_units) OVER (
            ORDER BY sale_month
            ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
        ), 0
    ) AS moving_avg_3m
FROM monthly
ORDER BY sale_month;
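The same YoY and moving-average logic translates directly to pandas: `shift(12)` plays the role of `LAG(..., 12)` and `rolling(3, min_periods=1)` mirrors the `ROWS BETWEEN 2 PRECEDING AND CURRENT ROW` window. A sketch using synthetic data in place of the query output:

```python
import pandas as pd
import numpy as np

# pandas equivalent of Query 4, on a synthetic 24-month series.
rng = pd.date_range('2023-01-01', periods=24, freq='MS')
monthly = pd.DataFrame({'sale_month': rng,
                        'total_units': np.arange(1000, 1000 + 24 * 50, 50)})

# LAG(total_units, 12) -> shift(12); NULLs appear in the first 12 rows as NaN
monthly['same_month_last_year'] = monthly['total_units'].shift(12)
monthly['yoy_growth_pct'] = ((monthly['total_units'] - monthly['same_month_last_year'])
                             / monthly['same_month_last_year'] * 100).round(1)
# min_periods=1 matches the SQL window, which averages whatever rows exist
monthly['moving_avg_3m'] = (monthly['total_units']
                            .rolling(window=3, min_periods=1).mean().round(0))
print(monthly.tail(3))
```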

Python Analysis

Time Series Decomposition

Decompose the sales time series into trend, seasonal, and residual components using additive decomposition from statsmodels. This reveals the underlying structure driving demand patterns.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Load and prepare data
sales = pd.read_csv('sales_history.csv', parse_dates=['sale_date'])
monthly = sales.groupby(pd.Grouper(key='sale_date', freq='MS'))['quantity_sold'].sum()

# Decompose time series (additive model, period=12 for monthly data)
decomposition = seasonal_decompose(monthly, model='additive', period=12)

fig, axes = plt.subplots(4, 1, figsize=(14, 10), sharex=True)
components = [
    ('Observed', decomposition.observed, '#3b82f6'),
    ('Trend', decomposition.trend, '#10b981'),
    ('Seasonal', decomposition.seasonal, '#f59e0b'),
    ('Residual', decomposition.resid, '#ef4444')
]
for ax, (title, data, color) in zip(axes, components):
    ax.plot(data, color=color, linewidth=1.5)
    ax.set_ylabel(title, fontsize=11, fontweight='bold')
    ax.grid(True, alpha=0.3)

axes[0].set_title('Time Series Decomposition of Monthly Sales', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('decomposition.png', dpi=150)
plt.show()

Moving Average and Exponential Smoothing Forecasts

Compare three forecasting approaches: simple 3-month moving average, 6-month moving average, and simple exponential smoothing. Each method has different strengths for capturing recent trends versus long-term patterns.

from statsmodels.tsa.holtwinters import SimpleExpSmoothing

# Moving averages
monthly_df = monthly.reset_index()
monthly_df.columns = ['date', 'units']

monthly_df['ma_3'] = monthly_df['units'].rolling(window=3).mean()
monthly_df['ma_6'] = monthly_df['units'].rolling(window=6).mean()

# Simple Exponential Smoothing (alpha=0.3)
ses_model = SimpleExpSmoothing(monthly_df['units'].values).fit(
    smoothing_level=0.3, optimized=False
)
monthly_df['ses'] = ses_model.fittedvalues

# Forecast next 3 months
forecast_periods = 3
ses_forecast = ses_model.forecast(forecast_periods)
last_ma3 = monthly_df['ma_3'].iloc[-1]
last_ma6 = monthly_df['ma_6'].iloc[-1]

# Plot comparison
fig, ax = plt.subplots(figsize=(14, 7))
ax.plot(monthly_df['date'], monthly_df['units'], 'o-', color='#64748b',
        alpha=0.5, markersize=3, label='Actual')
ax.plot(monthly_df['date'], monthly_df['ma_3'], color='#3b82f6',
        linewidth=2, label='3-Month MA')
ax.plot(monthly_df['date'], monthly_df['ma_6'], color='#10b981',
        linewidth=2, label='6-Month MA')
ax.plot(monthly_df['date'], monthly_df['ses'], color='#f59e0b',
        linewidth=2, label='Exp. Smoothing (a=0.3)')

# Plot forecasts
future_dates = pd.date_range(monthly_df['date'].iloc[-1] + pd.DateOffset(months=1),
                              periods=forecast_periods, freq='MS')
ax.plot(future_dates, ses_forecast, 's--', color='#ef4444',
        markersize=8, linewidth=2, label='SES Forecast')
ax.axvline(x=monthly_df['date'].iloc[-1], color='gray', linestyle='--', alpha=0.5)

ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Units Sold', fontsize=12)
ax.set_title('Demand Forecasting: Method Comparison', fontsize=14, fontweight='bold')
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('forecast_comparison.png', dpi=150)
plt.show()

Forecast Accuracy: MAE and MAPE

Evaluate each model using Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) on the last 6 months of data held out as a test set. Lower values indicate better forecasting performance.

# Hold out last 6 months for testing
train = monthly_df.iloc[:-6].copy()
test = monthly_df.iloc[-6:].copy()
actuals = test['units'].values

# Generate test-period forecasts for each method
# MA3: recursive multi-step forecast seeded with the last 3 training months
ma3_preds = []
window = list(train['units'].values[-3:])
for _ in range(len(test)):
    pred = np.mean(window[-3:])
    ma3_preds.append(pred)
    window.append(pred)

# MA6: recursive multi-step forecast seeded with the last 6 training months
ma6_preds = []
window6 = list(train['units'].values[-6:])
for _ in range(len(test)):
    pred = np.mean(window6[-6:])
    ma6_preds.append(pred)
    window6.append(pred)

# SES: refit on training data and forecast
ses_train = SimpleExpSmoothing(train['units'].values).fit(
    smoothing_level=0.3, optimized=False
)
ses_preds = ses_train.forecast(len(test))

# Calculate metrics
def calc_metrics(actual, predicted):
    mae = np.mean(np.abs(actual - predicted))
    mape = np.mean(np.abs((actual - predicted) / actual)) * 100
    return mae, mape

models = {
    '3-Month MA': np.array(ma3_preds),
    '6-Month MA': np.array(ma6_preds),
    'Exp. Smoothing': ses_preds
}

print(f"{'Model':<20} {'MAE':>10} {'MAPE (%)':>10}")
print("-" * 42)
for name, preds in models.items():
    mae, mape = calc_metrics(actuals, preds)
    print(f"{name:<20} {mae:>10.1f} {mape:>10.1f}")

# Visualize accuracy comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
model_names = list(models.keys())
maes = [calc_metrics(actuals, p)[0] for p in models.values()]
mapes = [calc_metrics(actuals, p)[1] for p in models.values()]

colors = ['#3b82f6', '#10b981', '#f59e0b']
ax1.bar(model_names, maes, color=colors, alpha=0.85)
ax1.set_ylabel('MAE (Units)', fontsize=12)
ax1.set_title('Mean Absolute Error', fontsize=13, fontweight='bold')

ax2.bar(model_names, mapes, color=colors, alpha=0.85)
ax2.set_ylabel('MAPE (%)', fontsize=12)
ax2.set_title('Mean Absolute Percentage Error', fontsize=13, fontweight='bold')

plt.tight_layout()
plt.savefig('forecast_accuracy.png', dpi=150)
plt.show()

Excel Instructions

Build a demand forecast spreadsheet with seasonal adjustments and an accuracy tracking dashboard.

  1. Import Historical Sales: Import the monthly aggregated sales data (from your SQL query output) into a sheet called "History." Columns should include: Month, Category, Units Sold, Revenue. Sort by Category then Month ascending.
  2. Calculate Moving Averages: Add columns for 3-month MA and 6-month MA. For the 3-month MA in cell E4, use =AVERAGE(D2:D4) and drag down. For the 6-month MA in F7, use =AVERAGE(D2:D7) and drag down. These columns will have blank cells at the top where insufficient history exists.
  3. Compute Seasonal Indices: Create a "Seasonal" sheet. For each month (1-12), calculate the average units across all years, then divide by the grand monthly average. Note that AVERAGEIFS cannot take a computed array such as MONTH(History!A:A) as a criteria range, so first add a helper column in History (e.g., E2 with =MONTH(A2), dragged down). Then in Seasonal!B2, use =AVERAGEIFS(History!D:D, History!E:E, A2) / AVERAGE(History!D:D). A value of 1.15 means that month is 15% above average.
  4. Build Forecast Sheet: Create a "Forecast" sheet with future months (Apr-Jun 2026). Calculate base forecast using the 3-month MA of the most recent data, then multiply by the seasonal index: =base_forecast * VLOOKUP(MONTH(A2), Seasonal!A:B, 2, FALSE). Add a column for promotion adjustments if a promo is planned.
  5. Exponential Smoothing: In a new column, implement SES manually. In cell G2, set the first value equal to the first actual. In G3, use =0.3*D2 + 0.7*G2 (alpha=0.3) and drag down. This gives you a smoothed forecast that weights recent data more heavily.
  6. Accuracy Tracker: Create an "Accuracy" sheet. Once actual Q2 data arrives, compute MAE with =AVERAGE(ABS(Actual - Forecast)) using an array formula, and MAPE with =AVERAGE(ABS((Actual-Forecast)/Actual))*100. Build a comparison chart showing each model's MAE and MAPE.
  7. Dashboard: Create a summary dashboard sheet with a line chart showing Historical vs Forecast (with a dashed line for the forecast portion), a table of seasonal indices, and KPI cards for best model MAE, forecast horizon, and estimated purchase order value. Add a dropdown slicer for product category.
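The manual SES recurrence in step 5 can be spot-checked against a few rows computed outside Excel. A minimal sketch of the same recurrence, S_t = alpha*y_{t-1} + (1-alpha)*S_{t-1}, seeded with the first actual (the input series below is hypothetical):

```python
def ses_fitted(actuals, alpha=0.3):
    """One-step-ahead smoothed values, matching the Excel column:
    first value = first actual; thereafter 0.3*prev_actual + 0.7*prev_smoothed."""
    smoothed = [float(actuals[0])]
    for t in range(1, len(actuals)):
        smoothed.append(alpha * actuals[t - 1] + (1 - alpha) * smoothed[t - 1])
    return smoothed

print(ses_fitted([100, 120, 110, 130]))  # approximately [100.0, 100.0, 106.0, 107.2]
```

Pasting the same actuals into column D and dragging the step-5 formula down should reproduce these values row for row.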

Key Insights & Recommendations

Seasonal Demand Spikes

Electronics demand surges 42% in November-December (seasonal index 1.42) while Apparel peaks in March-April (index 1.28). Pre-position inventory 6-8 weeks before peak seasons to avoid stockouts during high-demand windows.

Promotion Lift Effect

BOGO promotions generate a 65% average lift but cannibalize the following 2 weeks by 20%. Percentage-off promos show a more moderate 35% lift with minimal cannibalization. Factor in post-promo dips when planning adjacent months.
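The net effect of lift plus cannibalization is worth working through once. A rough sketch with a hypothetical 1,000-unit weekly baseline, applying the BOGO pattern above (65% lift for 2 promo weeks, then a 20% dip for 2 weeks):

```python
# Rough net-effect arithmetic for the BOGO pattern (illustrative numbers).
baseline_weekly = 1_000  # hypothetical units/week
promo_weeks, dip_weeks = 2, 2

promo_total = baseline_weekly * (1 + 0.65) * promo_weeks      # 3300.0
dip_total = baseline_weekly * (1 - 0.20) * dip_weeks          # 1600.0
baseline_total = baseline_weekly * (promo_weeks + dip_weeks)  # 4000
net_incremental = promo_total + dip_total - baseline_total
print(net_incremental)  # 900.0 extra units over the 4-week window
```

The promo is still net positive here, but ignoring the dip would overstate its impact by more than a third.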

Best Forecast Model

Exponential smoothing (alpha=0.3) achieved the lowest MAPE of 8.2% across all categories, outperforming the 3-month MA (11.4%) and 6-month MA (9.8%). Recommend SES as the primary forecast method for purchase order planning.

External Factor Sensitivity

A 1-point increase in the unemployment rate correlates with a 4.3% decline in discretionary spending (Electronics, Apparel). Monitor economic indicators and adjust Q2 forecasts if macroeconomic conditions shift.
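One way to check this kind of relationship is to correlate monthly category demand against the unemployment_rate column from external_factors after merging on month. A sketch on synthetic data built to have a negative slope (all numbers below are illustrative, not from the dataset):

```python
import pandas as pd
import numpy as np

# Synthetic stand-in for merged monthly demand + external_factors data.
rng = np.random.default_rng(42)
unemployment = np.linspace(3.5, 6.5, 36)
# Demand constructed to fall as unemployment rises, plus noise
electronics_units = 50_000 * (1 - 0.043 * (unemployment - unemployment.mean()))
electronics_units = electronics_units + rng.normal(0, 300, 36)

df = pd.DataFrame({'unemployment_rate': unemployment,
                   'electronics_units': electronics_units})
corr = df['unemployment_rate'].corr(df['electronics_units'])
print(f"Correlation: {corr:.2f}")  # strongly negative for this synthetic series
```

On the real data, the same `.corr()` call (or a regression with statsmodels) would quantify how sensitive each category is to the unemployment rate.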

Store-Level Variance

Forecast accuracy varies significantly by store: top 20% of stores have MAPE under 6%, while bottom 20% exceed 15%. High-variance stores should carry extra safety stock. Consider store-level forecasting for the top 30 locations by volume.

Inventory Cost Savings

Switching from blanket ordering to forecast-driven purchasing could reduce excess inventory by an estimated $380K per quarter. The improved forecasts would also cut stockout incidents by 25%, recovering approximately $150K in lost sales.

Knowledge Check

Question 1

In the seasonal indices SQL query, what does a seasonal index of 0.75 for a given month and category indicate?

Question 2

What is the main advantage of exponential smoothing over a simple moving average?

Question 3

MAPE stands for Mean Absolute Percentage Error. Why might MAPE be misleading when actual values are close to zero?

Question 4

In the year-over-year growth query, what does LAG(total_units, 12) OVER (ORDER BY sale_month) return?

Question 5

When the time series decomposition shows large residuals in specific months, what does this most likely indicate?
