Revenue Forecasting
The CFO of a mid-size financial services company needs quarterly revenue forecasts to support budgeting, hiring plans, and investor communications. You will analyze historical revenue data with SQL, build an ARIMA time series model in Python, and create an Excel forecast workbook with what-if scenario analysis.
Project Overview
Business Scenario
Meridian Financial Group generates revenue from three streams: advisory fees, transaction commissions, and interest income. The company has 5 years of monthly revenue data and needs to forecast the next 12 months. Historically, revenue shows clear seasonal patterns (Q4 is strongest due to year-end advisory activity, Q1 is weakest). The CFO has asked for a base-case forecast with optimistic and pessimistic scenarios to present to the board.
- Goal: Produce a 12-month revenue forecast with confidence intervals and scenario analysis
- Tools: SQL (trend and seasonal analysis), Python (ARIMA time series model), Excel (forecast dashboard)
- Stakeholder: CFO and Finance Planning team
- Deliverables: Forecast model, seasonal decomposition chart, accuracy metrics (MAPE), scenario workbook
Dataset
Database Schema
The financial database contains monthly revenue records, expense breakdowns, market indicators, and seasonal adjustment factors.
revenue_monthly
| Column | Type | Description |
|---|---|---|
| revenue_id | INT (PK) | Record identifier |
| month_date | DATE | First day of the month (e.g., 2021-01-01) |
| revenue_stream | VARCHAR(30) | advisory_fees, commissions, interest_income |
| amount | DECIMAL(14,2) | Monthly revenue amount |
| client_count | INT | Number of active clients that month |
expenses
| Column | Type | Description |
|---|---|---|
| expense_id | INT (PK) | Record identifier |
| month_date | DATE | Month of expense |
| category | VARCHAR(30) | salaries, technology, marketing, rent, other |
| amount | DECIMAL(14,2) | Monthly expense amount |
market_indicators
| Column | Type | Description |
|---|---|---|
| indicator_id | INT (PK) | Record identifier |
| month_date | DATE | Month |
| sp500_return | DECIMAL(6,4) | S&P 500 monthly return |
| fed_funds_rate | DECIMAL(5,2) | Federal Funds rate (%) |
| vix | DECIMAL(6,2) | CBOE Volatility Index |
seasonal_factors
| Column | Type | Description |
|---|---|---|
| month_num | INT (PK) | Month number (1-12) |
| seasonal_index | DECIMAL(6,4) | Seasonal adjustment multiplier (1.0 = average) |
| description | VARCHAR(100) | Explanation of seasonal pattern |
SQL Analysis
Query 1: Monthly Revenue Trends with Year-over-Year Growth
Calculate total monthly revenue and compare each month to the same month in the prior year to identify growth trends.
WITH monthly_totals AS (
SELECT
month_date,
YEAR(month_date) AS yr,
MONTH(month_date) AS mo,
SUM(amount) AS total_revenue
FROM revenue_monthly
GROUP BY month_date
)
SELECT
curr.month_date,
curr.total_revenue,
prev.total_revenue AS prev_year_revenue,
ROUND((curr.total_revenue - prev.total_revenue) / prev.total_revenue * 100, 2) AS yoy_growth_pct
FROM monthly_totals curr
LEFT JOIN monthly_totals prev
ON curr.yr = prev.yr + 1 AND curr.mo = prev.mo
ORDER BY curr.month_date;
Query 2: Revenue by Stream with Rolling 3-Month Average
Break down revenue by stream and smooth out month-to-month noise using a rolling average. This reveals the underlying trend for each revenue line.
SELECT
month_date,
revenue_stream,
amount,
ROUND(AVG(amount) OVER (
PARTITION BY revenue_stream
ORDER BY month_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
), 2) AS rolling_3mo_avg
FROM revenue_monthly
ORDER BY revenue_stream, month_date;
Query 3: Seasonal Pattern Analysis
Calculate the average revenue for each calendar month across all years. This quantifies the seasonal pattern that the ARIMA model will need to capture.
SELECT
MONTH(rm.month_date) AS month_num,
MONTHNAME(rm.month_date) AS month_name,
ROUND(AVG(total_rev), 2) AS avg_monthly_revenue,
ROUND(AVG(total_rev) / (SELECT AVG(amount_sum)
FROM (SELECT SUM(amount) AS amount_sum FROM revenue_monthly GROUP BY month_date) t
), 4) AS seasonal_index
FROM (
SELECT month_date, SUM(amount) AS total_rev
FROM revenue_monthly
GROUP BY month_date
) rm
GROUP BY MONTH(rm.month_date), MONTHNAME(rm.month_date)
ORDER BY month_num;
Python Analysis
Time Series Decomposition
Decompose the revenue series into trend, seasonal, and residual components. This helps verify the seasonal patterns found in SQL and confirms the data is suitable for ARIMA modeling.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
# Load monthly revenue totals
df = pd.read_csv('revenue_monthly_totals.csv', parse_dates=['month_date'])
df = df.set_index('month_date').sort_index()
df = df.asfreq('MS') # Ensure monthly start frequency
# Multiplicative decomposition (seasonal effect is proportional)
decomp = seasonal_decompose(df['total_revenue'], model='multiplicative', period=12)
fig, axes = plt.subplots(4, 1, figsize=(14, 10), sharex=True)
decomp.observed.plot(ax=axes[0], title='Observed Revenue')
decomp.trend.plot(ax=axes[1], title='Trend')
decomp.seasonal.plot(ax=axes[2], title='Seasonal Pattern')
decomp.resid.plot(ax=axes[3], title='Residual')
for ax in axes:
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('decomposition.png', dpi=150)
plt.show()
ARIMA Revenue Forecast
Fit a Seasonal ARIMA (SARIMA) model to capture both the trend and the 12-month seasonal cycle. The model is trained on all but the last 6 months, which are held out for validation.
from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
warnings.filterwarnings('ignore')
# Split: train on all data except last 6 months
train = df['total_revenue'][:-6]
test = df['total_revenue'][-6:]
# SARIMA(1,1,1)(1,1,1,12) - common starting point for monthly seasonal data
model = SARIMAX(train,
order=(1, 1, 1),
seasonal_order=(1, 1, 1, 12),
enforce_stationarity=False,
enforce_invertibility=False)
results = model.fit(disp=False)
print(results.summary().tables[1])
# Forecast 18 months ahead (6 validation + 12 future)
forecast = results.get_forecast(steps=18)
forecast_mean = forecast.predicted_mean
forecast_ci = forecast.conf_int(alpha=0.05)
# Plot
plt.figure(figsize=(14, 7))
plt.plot(train.index, train, label='Training Data', color='steelblue')
plt.plot(test.index, test, label='Actual (Holdout)', color='green', linewidth=2)
plt.plot(forecast_mean.index, forecast_mean, label='Forecast', color='darkorange', linewidth=2)
plt.fill_between(forecast_ci.index,
forecast_ci.iloc[:, 0],
forecast_ci.iloc[:, 1],
color='orange', alpha=0.2, label='95% Confidence Interval')
plt.title('Revenue Forecast - SARIMA Model')
plt.xlabel('Date')
plt.ylabel('Monthly Revenue ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('revenue_forecast.png', dpi=150)
plt.show()
MAPE Calculation (Model Accuracy)
Mean Absolute Percentage Error (MAPE) measures the average forecast error as a percentage. A MAPE below 10% is generally considered a good forecast for business revenue.
# Evaluate on the 6-month holdout period
validation_forecast = forecast_mean[:6]
mape = np.mean(np.abs((test.values - validation_forecast.values) / test.values)) * 100
mae = np.mean(np.abs(test.values - validation_forecast.values))
print(f"MAPE: {mape:.2f}%")
print(f"MAE: ${mae:,.2f}")
print()
# Monthly comparison
comparison = pd.DataFrame({
'Actual': test.values,
'Forecast': validation_forecast.values,
'Error %': ((validation_forecast.values - test.values) / test.values * 100).round(2)
}, index=test.index)
print(comparison.to_string())
Forecast Visualization with Scenarios
Create optimistic (+10%), base, and pessimistic (-10%) scenarios to give the CFO a range of outcomes for planning.
# Extract future 12-month forecast (beyond validation period)
future_forecast = forecast_mean[6:]
scenarios = pd.DataFrame({
'Month': future_forecast.index,
'Pessimistic': (future_forecast * 0.90).values,
'Base Case': future_forecast.values,
'Optimistic': (future_forecast * 1.10).values
})
plt.figure(figsize=(12, 6))
plt.plot(scenarios['Month'], scenarios['Pessimistic'], 'r--', label='Pessimistic (-10%)')
plt.plot(scenarios['Month'], scenarios['Base Case'], 'b-', linewidth=2, label='Base Case')
plt.plot(scenarios['Month'], scenarios['Optimistic'], 'g--', label='Optimistic (+10%)')
plt.fill_between(scenarios['Month'], scenarios['Pessimistic'], scenarios['Optimistic'],
alpha=0.1, color='blue')
plt.title('12-Month Revenue Forecast Scenarios')
plt.xlabel('Month')
plt.ylabel('Revenue ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('forecast_scenarios.png', dpi=150)
plt.show()
# Annual totals
for col in ['Pessimistic', 'Base Case', 'Optimistic']:
print(f"{col}: ${scenarios[col].sum():,.0f}")
Excel Analysis
=base_forecast * (1 + growth_rate). This lets the CFO quickly see how different growth assumptions affect the total.=forecast - budget and variance percentage. Use conditional formatting to highlight months where the forecast falls below budget (red) or exceeds it (green).Key Insights
Seasonality Drives Planning
Revenue consistently peaks in Q4 (November-December) and dips in Q1 (January-February). This seasonal pattern should directly inform hiring timelines, marketing spend allocation, and cash reserve planning throughout the year.
MAPE Communicates Confidence
A MAPE of 6-8% means the forecast is typically within $50K-$80K of actual revenue each month. Presenting this error range to the CFO builds trust: it says "here is what we expect, and here is how much we might be off by."
Scenarios Beat Point Estimates
A single forecast number creates false precision. Providing pessimistic, base, and optimistic scenarios allows the finance team to plan for the likely outcome while preparing contingencies for the worst case.
Model Selection Matters
SARIMA works well here because revenue has both a clear trend and a repeating 12-month cycle. For revenue with no seasonality, a simpler ARIMA would suffice. For highly irregular data, exponential smoothing or Prophet may be more robust.
Quiz
What does ARIMA stand for?
What does a MAPE of 7% mean for this revenue forecast?
In the SARIMA model (1,1,1)(1,1,1,12), what does the "12" represent?
Why is multiplicative decomposition used instead of additive for this revenue data?
What is the purpose of the holdout validation approach used in the forecast?