Project 14

Revenue Forecasting

The CFO of a mid-size financial services company needs quarterly revenue forecasts to support budgeting, hiring plans, and investor communications. You will analyze historical revenue data with SQL, build an ARIMA time series model in Python, and create an Excel forecast workbook with what-if scenario analysis.

Project Overview

Business Scenario

Meridian Financial Group generates revenue from three streams: advisory fees, transaction commissions, and interest income. The company has 5 years of monthly revenue data and needs to forecast the next 12 months. Historically, revenue shows clear seasonal patterns (Q4 is strongest due to year-end advisory activity, Q1 is weakest). The CFO has asked for a base-case forecast with optimistic and pessimistic scenarios to present to the board.

  • Goal: Produce a 12-month revenue forecast with confidence intervals and scenario analysis
  • Tools: SQL (trend and seasonal analysis), Python (ARIMA time series model), Excel (forecast dashboard)
  • Stakeholder: CFO and Finance Planning team
  • Deliverables: Forecast model, seasonal decomposition chart, accuracy metrics (MAPE), scenario workbook

Dataset

Database Schema

The financial database contains monthly revenue records, expense breakdowns, market indicators, and seasonal adjustment factors.

revenue_monthly

ColumnTypeDescription
revenue_idINT (PK)Record identifier
month_dateDATEFirst day of the month (e.g., 2021-01-01)
revenue_streamVARCHAR(30)advisory_fees, commissions, interest_income
amountDECIMAL(14,2)Monthly revenue amount
client_countINTNumber of active clients that month

expenses

ColumnTypeDescription
expense_idINT (PK)Record identifier
month_dateDATEMonth of expense
categoryVARCHAR(30)salaries, technology, marketing, rent, other
amountDECIMAL(14,2)Monthly expense amount

market_indicators

ColumnTypeDescription
indicator_idINT (PK)Record identifier
month_dateDATEMonth
sp500_returnDECIMAL(6,4)S&P 500 monthly return
fed_funds_rateDECIMAL(5,2)Federal Funds rate (%)
vixDECIMAL(6,2)CBOE Volatility Index

seasonal_factors

ColumnTypeDescription
month_numINT (PK)Month number (1-12)
seasonal_indexDECIMAL(6,4)Seasonal adjustment multiplier (1.0 = average)
descriptionVARCHAR(100)Explanation of seasonal pattern

SQL Analysis

Query 1: Monthly Revenue Trends with Year-over-Year Growth

Calculate total monthly revenue and compare each month to the same month in the prior year to identify growth trends.

WITH monthly_totals AS (
    SELECT
        month_date,
        YEAR(month_date) AS yr,
        MONTH(month_date) AS mo,
        SUM(amount) AS total_revenue
    FROM revenue_monthly
    GROUP BY month_date
)
SELECT
    curr.month_date,
    curr.total_revenue,
    prev.total_revenue AS prev_year_revenue,
    ROUND((curr.total_revenue - prev.total_revenue) / prev.total_revenue * 100, 2) AS yoy_growth_pct
FROM monthly_totals curr
LEFT JOIN monthly_totals prev
    ON curr.yr = prev.yr + 1 AND curr.mo = prev.mo
ORDER BY curr.month_date;

Query 2: Revenue by Stream with Rolling 3-Month Average

Break down revenue by stream and smooth out month-to-month noise using a rolling average. This reveals the underlying trend for each revenue line.

SELECT
    month_date,
    revenue_stream,
    amount,
    ROUND(AVG(amount) OVER (
        PARTITION BY revenue_stream
        ORDER BY month_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ), 2) AS rolling_3mo_avg
FROM revenue_monthly
ORDER BY revenue_stream, month_date;

Query 3: Seasonal Pattern Analysis

Calculate the average revenue for each calendar month across all years. This quantifies the seasonal pattern that the ARIMA model will need to capture.

SELECT
    MONTH(rm.month_date) AS month_num,
    MONTHNAME(rm.month_date) AS month_name,
    ROUND(AVG(total_rev), 2) AS avg_monthly_revenue,
    ROUND(AVG(total_rev) / (SELECT AVG(amount_sum)
        FROM (SELECT SUM(amount) AS amount_sum FROM revenue_monthly GROUP BY month_date) t
    ), 4) AS seasonal_index
FROM (
    SELECT month_date, SUM(amount) AS total_rev
    FROM revenue_monthly
    GROUP BY month_date
) rm
GROUP BY MONTH(rm.month_date), MONTHNAME(rm.month_date)
ORDER BY month_num;

Python Analysis

Time Series Decomposition

Decompose the revenue series into trend, seasonal, and residual components. This helps verify the seasonal patterns found in SQL and confirms the data is suitable for ARIMA modeling.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Load monthly revenue totals
df = pd.read_csv('revenue_monthly_totals.csv', parse_dates=['month_date'])
df = df.set_index('month_date').sort_index()
df = df.asfreq('MS')  # Ensure monthly start frequency

# Multiplicative decomposition (seasonal effect is proportional)
decomp = seasonal_decompose(df['total_revenue'], model='multiplicative', period=12)

fig, axes = plt.subplots(4, 1, figsize=(14, 10), sharex=True)
decomp.observed.plot(ax=axes[0], title='Observed Revenue')
decomp.trend.plot(ax=axes[1], title='Trend')
decomp.seasonal.plot(ax=axes[2], title='Seasonal Pattern')
decomp.resid.plot(ax=axes[3], title='Residual')

for ax in axes:
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('decomposition.png', dpi=150)
plt.show()

ARIMA Revenue Forecast

Fit a Seasonal ARIMA (SARIMA) model to capture both the trend and the 12-month seasonal cycle. The model is trained on all but the last 6 months, which are held out for validation.

from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
warnings.filterwarnings('ignore')

# Split: train on all data except last 6 months
train = df['total_revenue'][:-6]
test = df['total_revenue'][-6:]

# SARIMA(1,1,1)(1,1,1,12) - common starting point for monthly seasonal data
model = SARIMAX(train,
                order=(1, 1, 1),
                seasonal_order=(1, 1, 1, 12),
                enforce_stationarity=False,
                enforce_invertibility=False)
results = model.fit(disp=False)

print(results.summary().tables[1])

# Forecast 18 months ahead (6 validation + 12 future)
forecast = results.get_forecast(steps=18)
forecast_mean = forecast.predicted_mean
forecast_ci = forecast.conf_int(alpha=0.05)

# Plot
plt.figure(figsize=(14, 7))
plt.plot(train.index, train, label='Training Data', color='steelblue')
plt.plot(test.index, test, label='Actual (Holdout)', color='green', linewidth=2)
plt.plot(forecast_mean.index, forecast_mean, label='Forecast', color='darkorange', linewidth=2)
plt.fill_between(forecast_ci.index,
                 forecast_ci.iloc[:, 0],
                 forecast_ci.iloc[:, 1],
                 color='orange', alpha=0.2, label='95% Confidence Interval')
plt.title('Revenue Forecast - SARIMA Model')
plt.xlabel('Date')
plt.ylabel('Monthly Revenue ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('revenue_forecast.png', dpi=150)
plt.show()

MAPE Calculation (Model Accuracy)

Mean Absolute Percentage Error (MAPE) measures the average forecast error as a percentage. A MAPE below 10% is generally considered a good forecast for business revenue.

# Evaluate on the 6-month holdout period
validation_forecast = forecast_mean[:6]

mape = np.mean(np.abs((test.values - validation_forecast.values) / test.values)) * 100
mae = np.mean(np.abs(test.values - validation_forecast.values))

print(f"MAPE: {mape:.2f}%")
print(f"MAE:  ${mae:,.2f}")
print()

# Monthly comparison
comparison = pd.DataFrame({
    'Actual': test.values,
    'Forecast': validation_forecast.values,
    'Error %': ((validation_forecast.values - test.values) / test.values * 100).round(2)
}, index=test.index)
print(comparison.to_string())

Forecast Visualization with Scenarios

Create optimistic (+10%), base, and pessimistic (-10%) scenarios to give the CFO a range of outcomes for planning.

# Extract future 12-month forecast (beyond validation period)
future_forecast = forecast_mean[6:]

scenarios = pd.DataFrame({
    'Month': future_forecast.index,
    'Pessimistic': (future_forecast * 0.90).values,
    'Base Case': future_forecast.values,
    'Optimistic': (future_forecast * 1.10).values
})

plt.figure(figsize=(12, 6))
plt.plot(scenarios['Month'], scenarios['Pessimistic'], 'r--', label='Pessimistic (-10%)')
plt.plot(scenarios['Month'], scenarios['Base Case'], 'b-', linewidth=2, label='Base Case')
plt.plot(scenarios['Month'], scenarios['Optimistic'], 'g--', label='Optimistic (+10%)')
plt.fill_between(scenarios['Month'], scenarios['Pessimistic'], scenarios['Optimistic'],
                 alpha=0.1, color='blue')
plt.title('12-Month Revenue Forecast Scenarios')
plt.xlabel('Month')
plt.ylabel('Revenue ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('forecast_scenarios.png', dpi=150)
plt.show()

# Annual totals
for col in ['Pessimistic', 'Base Case', 'Optimistic']:
    print(f"{col}: ${scenarios[col].sum():,.0f}")

Excel Analysis

  • Revenue Forecast Sheet: Import the Python forecast output (month, base, optimistic, pessimistic) into Excel. Add a line chart with all three scenarios. Format the base case as a solid line and the scenarios as dashed lines. Add data labels on the base case showing monthly values.
  • What-If Scenario Table: Create a two-variable data table where rows represent growth rate adjustments (-15% to +15% in 5% increments) and columns represent the three revenue streams. Each cell calculates the annual total using =base_forecast * (1 + growth_rate). This lets the CFO quickly see how different growth assumptions affect the total.
  • Budget Comparison: Add a column for the approved budget next to the forecast. Calculate variance using =forecast - budget and variance percentage. Use conditional formatting to highlight months where the forecast falls below budget (red) or exceeds it (green).
  • Seasonal Pattern Chart: Create a bar chart showing the seasonal index for each month (from the SQL analysis). Overlay it with the actual monthly averages. This gives the finance team an intuitive view of when revenue peaks and troughs.
  • Profit Margin Projection: Combine the revenue forecast with expense projections (assume expenses grow at 5% annually). Calculate projected monthly profit and margin percentage. Add a break-even line to the chart showing the minimum revenue needed to cover costs.
  • Executive Summary Tab: Create a single-page summary with key metrics in large font: forecasted annual revenue, YoY growth rate, best and worst months, MAPE accuracy score, and a small sparkline for the trend. This tab is designed for board presentation.
  • Key Insights

    Seasonality Drives Planning

    Revenue consistently peaks in Q4 (November-December) and dips in Q1 (January-February). This seasonal pattern should directly inform hiring timelines, marketing spend allocation, and cash reserve planning throughout the year.

    MAPE Communicates Confidence

    A MAPE of 6-8% means the forecast is typically within $50K-$80K of actual revenue each month. Presenting this error range to the CFO builds trust: it says "here is what we expect, and here is how much we might be off by."

    Scenarios Beat Point Estimates

    A single forecast number creates false precision. Providing pessimistic, base, and optimistic scenarios allows the finance team to plan for the likely outcome while preparing contingencies for the worst case.

    Model Selection Matters

    SARIMA works well here because revenue has both a clear trend and a repeating 12-month cycle. For revenue with no seasonality, a simpler ARIMA would suffice. For highly irregular data, exponential smoothing or Prophet may be more robust.

    Quiz

    Question 1

    What does ARIMA stand for?

    Question 2

    What does a MAPE of 7% mean for this revenue forecast?

    Question 3

    In the SARIMA model (1,1,1)(1,1,1,12), what does the "12" represent?

    Question 4

    Why is multiplicative decomposition used instead of additive for this revenue data?

    Question 5

    What is the purpose of the holdout validation approach used in the forecast?

    ← Previous Project Next Project →