Demand Forecasting
Predict next-quarter product demand for a retail chain using historical sales data, seasonal indices, and promotion effects to optimize inventory purchasing decisions.
Project Overview
Scenario: You are a demand planning analyst at BrightMart, a retail chain with 120 stores across 5 regions. The merchandise planning team needs accurate demand forecasts for the next quarter (Q2 2026) to place purchase orders with suppliers. Over-forecasting leads to excess inventory and markdowns; under-forecasting causes stockouts and lost sales.
Objective: Analyze 3 years of historical sales data to build demand forecasts using time series decomposition, moving averages, and exponential smoothing. Evaluate forecast accuracy with MAE and MAPE metrics to select the best model for each product category.
Tools: SQL (PostgreSQL), Python (pandas, matplotlib, numpy, statsmodels), Excel
Dataset Description
Five tables covering sales transactions, product catalog, seasonal patterns, promotional events, and external demand drivers.
sales_history
| Column | Type | Description |
|---|---|---|
| sale_id | INT (PK) | Unique transaction identifier |
| product_id | INT (FK) | Product reference |
| sale_date | DATE | Date of sale |
| store_id | INT | Store location identifier |
| quantity_sold | INT | Units sold in transaction |
| unit_price | DECIMAL(8,2) | Price per unit at time of sale |
| promotion_id | INT (FK, nullable) | Active promotion, if any |
products
| Column | Type | Description |
|---|---|---|
| product_id | INT (PK) | Unique product identifier |
| product_name | VARCHAR(100) | Product name |
| category | VARCHAR(50) | Product category (Electronics, Apparel, Grocery, etc.) |
| subcategory | VARCHAR(50) | Product subcategory |
| cost_price | DECIMAL(8,2) | Wholesale cost per unit |
seasons
| Column | Type | Description |
|---|---|---|
| month_num | INT (PK) | Month number (1-12) |
| season_name | VARCHAR(20) | Season label (Spring, Summer, Fall, Winter) |
| seasonal_index | DECIMAL(4,3) | Demand multiplier (1.000 = average) |
promotions
| Column | Type | Description |
|---|---|---|
| promotion_id | INT (PK) | Unique promotion identifier |
| promo_name | VARCHAR(100) | Promotion name |
| start_date | DATE | Promotion start |
| end_date | DATE | Promotion end |
| discount_pct | DECIMAL(5,2) | Discount percentage offered |
| promo_type | VARCHAR(30) | Type (BOGO, Percentage Off, Bundle) |
external_factors
| Column | Type | Description |
|---|---|---|
| month_year | DATE (PK) | First day of month |
| cpi_index | DECIMAL(6,2) | Consumer Price Index |
| unemployment_rate | DECIMAL(4,2) | Regional unemployment rate (%) |
| avg_temperature | DECIMAL(5,1) | Average monthly temperature (F) |
SQL Analysis
Query 1: Monthly Sales Patterns by Category
Aggregate historical sales into monthly totals per category to reveal trends and seasonal rhythms. This forms the foundation for all forecasting models.
SELECT
p.category,
DATE_TRUNC('month', sh.sale_date) AS sale_month,
SUM(sh.quantity_sold) AS total_units,
SUM(sh.quantity_sold * sh.unit_price) AS total_revenue,
COUNT(DISTINCT sh.store_id) AS active_stores,
ROUND(AVG(sh.quantity_sold), 2) AS avg_units_per_txn
FROM sales_history sh
JOIN products p ON sh.product_id = p.product_id
WHERE sh.sale_date >= '2023-01-01'
GROUP BY p.category, DATE_TRUNC('month', sh.sale_date)
ORDER BY p.category, sale_month;
Query 2: Seasonal Indices by Category
Calculate seasonal indices by comparing each month's average sales to the overall monthly average for that category. An index above 1.0 indicates above-average demand; below 1.0 indicates a slow period.
WITH monthly_sales AS (
SELECT
p.category,
EXTRACT(MONTH FROM sh.sale_date) AS month_num,
EXTRACT(YEAR FROM sh.sale_date) AS sale_year,
SUM(sh.quantity_sold) AS monthly_units
FROM sales_history sh
JOIN products p ON sh.product_id = p.product_id
GROUP BY p.category, EXTRACT(MONTH FROM sh.sale_date), EXTRACT(YEAR FROM sh.sale_date)
),
category_avg AS (
SELECT
category,
ROUND(AVG(monthly_units), 0) AS overall_monthly_avg
FROM monthly_sales
GROUP BY category
)
SELECT
ms.category,
ms.month_num,
s.season_name,
ROUND(AVG(ms.monthly_units), 0) AS avg_month_units,
ca.overall_monthly_avg,
ROUND(AVG(ms.monthly_units)::NUMERIC / ca.overall_monthly_avg, 3) AS seasonal_index
FROM monthly_sales ms
JOIN category_avg ca ON ms.category = ca.category
JOIN seasons s ON ms.month_num = s.month_num
GROUP BY ms.category, ms.month_num, s.season_name, ca.overall_monthly_avg
ORDER BY ms.category, ms.month_num;
Query 3: Promotion Impact Analysis
Measure how much promotions lift sales compared to non-promotion periods. This helps adjust forecasts when future promotions are planned.
WITH promo_sales AS (
SELECT
p.category,
pr.promo_type,
pr.discount_pct,
SUM(sh.quantity_sold) AS promo_units,
COUNT(DISTINCT sh.sale_id) AS promo_transactions,
SUM(sh.quantity_sold * sh.unit_price) AS promo_revenue
FROM sales_history sh
JOIN products p ON sh.product_id = p.product_id
JOIN promotions pr ON sh.promotion_id = pr.promotion_id
GROUP BY p.category, pr.promo_type, pr.discount_pct
),
baseline_sales AS (
SELECT
p.category,
ROUND(AVG(sh.quantity_sold), 2) AS avg_baseline_units
FROM sales_history sh
JOIN products p ON sh.product_id = p.product_id
WHERE sh.promotion_id IS NULL
GROUP BY p.category
)
SELECT
ps.category,
ps.promo_type,
ps.discount_pct,
ps.promo_transactions,
ROUND(ps.promo_units::NUMERIC / ps.promo_transactions, 2) AS avg_promo_units,
bs.avg_baseline_units,
ROUND(
(ps.promo_units::NUMERIC / ps.promo_transactions - bs.avg_baseline_units)
/ bs.avg_baseline_units * 100, 1
) AS lift_pct
FROM promo_sales ps
JOIN baseline_sales bs ON ps.category = bs.category
ORDER BY lift_pct DESC;
Query 4: Year-over-Year Growth with Moving Average
Compare each month to the same month last year and compute a 3-month moving average to smooth out noise. Window functions make this calculation concise.
WITH monthly AS (
SELECT
DATE_TRUNC('month', sh.sale_date) AS sale_month,
SUM(sh.quantity_sold) AS total_units
FROM sales_history sh
GROUP BY DATE_TRUNC('month', sh.sale_date)
)
SELECT
sale_month,
total_units,
LAG(total_units, 12) OVER (ORDER BY sale_month) AS same_month_last_year,
ROUND(
(total_units - LAG(total_units, 12) OVER (ORDER BY sale_month))::NUMERIC
/ NULLIF(LAG(total_units, 12) OVER (ORDER BY sale_month), 0) * 100, 1
) AS yoy_growth_pct,
ROUND(
AVG(total_units) OVER (
ORDER BY sale_month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 0
) AS moving_avg_3m
FROM monthly
ORDER BY sale_month;
Python Analysis
Time Series Decomposition
Decompose the sales time series into trend, seasonal, and residual components using additive decomposition from statsmodels. This reveals the underlying structure driving demand patterns.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Load and prepare data
sales = pd.read_csv('sales_history.csv', parse_dates=['sale_date'])
monthly = sales.groupby(pd.Grouper(key='sale_date', freq='M'))['quantity_sold'].sum()
monthly.index = monthly.index.to_period('M').to_timestamp()
# Decompose time series (additive model, period=12 for monthly data)
decomposition = seasonal_decompose(monthly, model='additive', period=12)
fig, axes = plt.subplots(4, 1, figsize=(14, 10), sharex=True)
components = [
('Observed', decomposition.observed, '#3b82f6'),
('Trend', decomposition.trend, '#10b981'),
('Seasonal', decomposition.seasonal, '#f59e0b'),
('Residual', decomposition.resid, '#ef4444')
]
for ax, (title, data, color) in zip(axes, components):
ax.plot(data, color=color, linewidth=1.5)
ax.set_ylabel(title, fontsize=11, fontweight='bold')
ax.grid(True, alpha=0.3)
axes[0].set_title('Time Series Decomposition of Monthly Sales', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('decomposition.png', dpi=150)
plt.show()
Moving Average and Exponential Smoothing Forecasts
Compare three forecasting approaches: simple 3-month moving average, 6-month moving average, and simple exponential smoothing. Each method has different strengths for capturing recent trends versus long-term patterns.
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
# Moving averages
monthly_df = monthly.reset_index()
monthly_df.columns = ['date', 'units']
monthly_df['ma_3'] = monthly_df['units'].rolling(window=3).mean()
monthly_df['ma_6'] = monthly_df['units'].rolling(window=6).mean()
# Simple Exponential Smoothing (alpha=0.3)
ses_model = SimpleExpSmoothing(monthly_df['units'].values).fit(
smoothing_level=0.3, optimized=False
)
monthly_df['ses'] = ses_model.fittedvalues
# Forecast next 3 months
forecast_periods = 3
ses_forecast = ses_model.forecast(forecast_periods)
last_ma3 = monthly_df['ma_3'].iloc[-1]
last_ma6 = monthly_df['ma_6'].iloc[-1]
# Plot comparison
fig, ax = plt.subplots(figsize=(14, 7))
ax.plot(monthly_df['date'], monthly_df['units'], 'o-', color='#64748b',
alpha=0.5, markersize=3, label='Actual')
ax.plot(monthly_df['date'], monthly_df['ma_3'], color='#3b82f6',
linewidth=2, label='3-Month MA')
ax.plot(monthly_df['date'], monthly_df['ma_6'], color='#10b981',
linewidth=2, label='6-Month MA')
ax.plot(monthly_df['date'], monthly_df['ses'], color='#f59e0b',
linewidth=2, label='Exp. Smoothing (a=0.3)')
# Plot forecasts
future_dates = pd.date_range(monthly_df['date'].iloc[-1] + pd.DateOffset(months=1),
periods=forecast_periods, freq='MS')
ax.plot(future_dates, ses_forecast, 's--', color='#ef4444',
markersize=8, linewidth=2, label='SES Forecast')
ax.axvline(x=monthly_df['date'].iloc[-1], color='gray', linestyle='--', alpha=0.5)
ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Units Sold', fontsize=12)
ax.set_title('Demand Forecasting: Method Comparison', fontsize=14, fontweight='bold')
ax.legend(loc='upper left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('forecast_comparison.png', dpi=150)
plt.show()
Forecast Accuracy: MAE and MAPE
Evaluate each model using Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) on the last 6 months of data held out as a test set. Lower values indicate better forecasting performance.
# Hold out last 6 months for testing
train = monthly_df.iloc[:-6].copy()
test = monthly_df.iloc[-6:].copy()
actuals = test['units'].values
# Generate test-period forecasts for each method
# MA3: rolling average from training tail
ma3_preds = []
window = list(train['units'].values[-3:])
for _ in range(len(test)):
pred = np.mean(window[-3:])
ma3_preds.append(pred)
window.append(pred)
# MA6: rolling average from training tail
ma6_preds = []
window6 = list(train['units'].values[-6:])
for _ in range(len(test)):
pred = np.mean(window6[-6:])
ma6_preds.append(pred)
window6.append(pred)
# SES: refit on training data and forecast
ses_train = SimpleExpSmoothing(train['units'].values).fit(
smoothing_level=0.3, optimized=False
)
ses_preds = ses_train.forecast(len(test))
# Calculate metrics
def calc_metrics(actual, predicted):
mae = np.mean(np.abs(actual - predicted))
mape = np.mean(np.abs((actual - predicted) / actual)) * 100
return mae, mape
models = {
'3-Month MA': np.array(ma3_preds),
'6-Month MA': np.array(ma6_preds),
'Exp. Smoothing': ses_preds
}
print(f"{'Model':<20} {'MAE':>10} {'MAPE (%)':>10}")
print("-" * 42)
for name, preds in models.items():
mae, mape = calc_metrics(actuals, preds)
print(f"{name:<20} {mae:>10.1f} {mape:>10.1f}")
# Visualize accuracy comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
model_names = list(models.keys())
maes = [calc_metrics(actuals, p)[0] for p in models.values()]
mapes = [calc_metrics(actuals, p)[1] for p in models.values()]
colors = ['#3b82f6', '#10b981', '#f59e0b']
ax1.bar(model_names, maes, color=colors, alpha=0.85)
ax1.set_ylabel('MAE (Units)', fontsize=12)
ax1.set_title('Mean Absolute Error', fontsize=13, fontweight='bold')
ax2.bar(model_names, mapes, color=colors, alpha=0.85)
ax2.set_ylabel('MAPE (%)', fontsize=12)
ax2.set_title('Mean Absolute Percentage Error', fontsize=13, fontweight='bold')
plt.tight_layout()
plt.savefig('forecast_accuracy.png', dpi=150)
plt.show()
Excel Instructions
Build a demand forecast spreadsheet with seasonal adjustments and an accuracy tracking dashboard.
- Import Historical Sales: Import the monthly aggregated sales data (from your SQL query output) into a sheet called "History." Columns should include: Month, Category, Units Sold, Revenue. Sort by Category then Month ascending.
- Calculate Moving Averages: Add columns for 3-month MA and 6-month MA. For the 3-month MA in cell E4, use
=AVERAGE(D2:D4)and drag down. For the 6-month MA in F7, use=AVERAGE(D2:D7)and drag down. These columns will have blank cells at the top where insufficient history exists. - Compute Seasonal Indices: Create a "Seasonal" sheet. For each month (1-12), calculate the average units across all years, then divide by the grand monthly average. In B2, use
=AVERAGEIFS(History!D:D, MONTH(History!A:A), A2) / AVERAGE(History!D:D). A value of 1.15 means that month is 15% above average. - Build Forecast Sheet: Create a "Forecast" sheet with future months (Apr-Jun 2026). Calculate base forecast using the 3-month MA of the most recent data, then multiply by the seasonal index:
=base_forecast * VLOOKUP(MONTH(A2), Seasonal!A:B, 2, FALSE). Add a column for promotion adjustments if a promo is planned. - Exponential Smoothing: In a new column, implement SES manually. In cell G2, set the first value equal to the first actual. In G3, use
=0.3*D2 + 0.7*G2(alpha=0.3) and drag down. This gives you a smoothed forecast that weights recent data more heavily. - Accuracy Tracker: Create an "Accuracy" sheet. Once actual Q2 data arrives, compute MAE with
=AVERAGE(ABS(Actual - Forecast))using an array formula, and MAPE with=AVERAGE(ABS((Actual-Forecast)/Actual))*100. Build a comparison chart showing each model's MAE and MAPE. - Dashboard: Create a summary dashboard sheet with a line chart showing Historical vs Forecast (with a dashed line for the forecast portion), a table of seasonal indices, and KPI cards for best model MAE, forecast horizon, and estimated purchase order value. Add a dropdown slicer for product category.
Key Insights & Recommendations
Seasonal Demand Spikes
Electronics demand surges 42% in November-December (seasonal index 1.42) while Apparel peaks in March-April (index 1.28). Pre-position inventory 6-8 weeks before peak seasons to avoid stockouts during high-demand windows.
Promotion Lift Effect
BOGO promotions generate a 65% average lift but cannibalize the following 2 weeks by 20%. Percentage-off promos show a more moderate 35% lift with minimal cannibalization. Factor in post-promo dips when planning adjacent months.
Best Forecast Model
Exponential smoothing (alpha=0.3) achieved the lowest MAPE of 8.2% across all categories, outperforming the 3-month MA (11.4%) and 6-month MA (9.8%). Recommend SES as the primary forecast method for purchase order planning.
External Factor Sensitivity
A 1-point increase in the unemployment rate correlates with a 4.3% decline in discretionary spending (Electronics, Apparel). Monitor economic indicators and adjust Q2 forecasts if macroeconomic conditions shift.
Store-Level Variance
Forecast accuracy varies significantly by store: top 20% of stores have MAPE under 6%, while bottom 20% exceed 15%. High-variance stores should carry extra safety stock. Consider store-level forecasting for the top 30 locations by volume.
Inventory Cost Savings
Switching from blanket ordering to forecast-driven purchasing could reduce excess inventory by an estimated $380K per quarter. The improved forecasts would also cut stockout incidents by 25%, recovering approximately $150K in lost sales.
Knowledge Check
In the seasonal indices SQL query, what does a seasonal index of 0.75 for a given month and category indicate?
What is the main advantage of exponential smoothing over a simple moving average?
MAPE stands for Mean Absolute Percentage Error. Why might MAPE be misleading when actual values are close to zero?
In the year-over-year growth query, what does LAG(total_units, 12) OVER (ORDER BY sale_month) return?
When the time series decomposition shows large residuals in specific months, what does this most likely indicate?