Project 25

Compensation Benchmarking

Ensure competitive and equitable pay by analyzing internal salary data against market benchmarks, calculating compa-ratios, and identifying pay equity gaps across the organization.

Project Overview

Scenario: You are a compensation analyst at a SaaS company (1,800 employees) preparing for the annual compensation review cycle. The CFO has approved a 5% merit increase budget, but the comp team needs data to allocate it strategically. Recent exit interviews flagged below-market pay as a top reason for attrition, especially in Engineering and Product roles. The team also needs to ensure internal equity -- that employees in similar roles with similar experience are paid comparably regardless of demographics.

Objective: Compare internal salaries to market data, calculate compa-ratios for every employee, identify roles where the company is significantly below market, and detect internal equity gaps that need correction.

Tools: SQL (PostgreSQL), Python (pandas, matplotlib, scikit-learn), Microsoft Excel

Dataset Description

Table: employees

ColumnTypeDescription
employee_idINT (PK)Unique employee identifier
full_nameVARCHAR(100)Employee full name
departmentVARCHAR(50)Department name
role_titleVARCHAR(100)Job title
job_level_idINT (FK)Reference to job_levels
hire_dateDATEDate of hire
genderVARCHAR(20)Gender identity
locationVARCHAR(50)Office location
is_activeBOOLEANCurrent employment status

Table: salary

ColumnTypeDescription
salary_idINT (PK)Unique salary record
employee_idINT (FK)Reference to employees
base_salaryDECIMAL(10,2)Annual base salary
bonus_target_pctDECIMAL(4,2)Target bonus as % of base
equity_valueDECIMAL(10,2)Estimated annual equity value
effective_dateDATESalary effective date

Table: market_data

ColumnTypeDescription
market_idINT (PK)Unique market data record
role_familyVARCHAR(50)Job family (Engineering, Product, Sales, etc.)
job_level_idINT (FK)Reference to job_levels
locationVARCHAR(50)Geographic market
p25_salaryDECIMAL(10,2)25th percentile market salary
p50_salaryDECIMAL(10,2)50th percentile (median) market salary
p75_salaryDECIMAL(10,2)75th percentile market salary
survey_yearINTYear of market survey

Table: job_levels

ColumnTypeDescription
job_level_idINT (PK)Unique level identifier
level_nameVARCHAR(30)Level name (L1-L8)
level_descriptionVARCHAR(100)Description (e.g., Junior IC, Senior IC, Manager)
role_familyVARCHAR(50)Job family this level applies to

Table: benefits

ColumnTypeDescription
benefit_idINT (PK)Unique benefit record
employee_idINT (FK)Reference to employees
benefit_typeVARCHAR(50)Health, 401k, PTO, etc.
employer_costDECIMAL(10,2)Annual employer cost for this benefit

SQL Analysis

Query 1: Compa-Ratio Calculation for All Employees

Calculate each employee's compa-ratio (actual salary / market midpoint) to show how they are positioned relative to the market. A compa-ratio of 1.0 means at-market, below 0.9 is significantly below market.

WITH current_salary AS (
    SELECT DISTINCT ON (s.employee_id)
        s.employee_id, s.base_salary, s.bonus_target_pct, s.equity_value
    FROM salary s
    ORDER BY s.employee_id, s.effective_date DESC
)
SELECT
    e.employee_id,
    e.full_name,
    e.department,
    e.role_title,
    jl.level_name,
    cs.base_salary,
    md.p50_salary AS market_midpoint,
    ROUND(cs.base_salary / NULLIF(md.p50_salary, 0), 3) AS compa_ratio,
    ROUND(md.p50_salary - cs.base_salary, 0) AS gap_to_midpoint,
    CASE
        WHEN cs.base_salary / NULLIF(md.p50_salary, 0) < 0.85 THEN 'Critical - Far Below Market'
        WHEN cs.base_salary / NULLIF(md.p50_salary, 0) < 0.90 THEN 'Below Market'
        WHEN cs.base_salary / NULLIF(md.p50_salary, 0) < 1.00 THEN 'Approaching Market'
        WHEN cs.base_salary / NULLIF(md.p50_salary, 0) <= 1.10 THEN 'At Market'
        ELSE 'Above Market'
    END AS market_position
FROM employees e
JOIN current_salary cs ON e.employee_id = cs.employee_id
JOIN job_levels jl ON e.job_level_id = jl.job_level_id
LEFT JOIN market_data md ON jl.job_level_id = md.job_level_id
    AND jl.role_family = md.role_family
    AND e.location = md.location
    AND md.survey_year = 2025
WHERE e.is_active = TRUE
ORDER BY compa_ratio ASC;

Query 2: Salary Distribution by Role Family and Level

Show the internal salary range alongside market bands to identify where the company pays above or below market.

SELECT
    jl.role_family,
    jl.level_name,
    COUNT(e.employee_id) AS headcount,
    ROUND(MIN(cs.base_salary), 0) AS internal_min,
    ROUND(PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY cs.base_salary), 0) AS internal_p25,
    ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY cs.base_salary), 0) AS internal_median,
    ROUND(PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY cs.base_salary), 0) AS internal_p75,
    ROUND(MAX(cs.base_salary), 0) AS internal_max,
    ROUND(AVG(md.p25_salary), 0) AS market_p25,
    ROUND(AVG(md.p50_salary), 0) AS market_p50,
    ROUND(AVG(md.p75_salary), 0) AS market_p75,
    ROUND(
        AVG(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY cs.base_salary))
        / NULLIF(AVG(md.p50_salary), 0), 3
    ) AS dept_compa_ratio
FROM employees e
JOIN (
    SELECT DISTINCT ON (employee_id) employee_id, base_salary
    FROM salary ORDER BY employee_id, effective_date DESC
) cs ON e.employee_id = cs.employee_id
JOIN job_levels jl ON e.job_level_id = jl.job_level_id
LEFT JOIN market_data md ON jl.job_level_id = md.job_level_id
    AND jl.role_family = md.role_family AND md.survey_year = 2025
WHERE e.is_active = TRUE
GROUP BY jl.role_family, jl.level_name, jl.job_level_id
ORDER BY jl.role_family, jl.job_level_id;

Query 3: Internal Equity Gaps (Same Role, Different Pay)

Identify cases where employees in the same role and level have large pay differences that may indicate equity issues.

WITH role_stats AS (
    SELECT
        e.role_title,
        jl.level_name,
        e.department,
        AVG(cs.base_salary) AS avg_salary,
        STDDEV(cs.base_salary) AS salary_stddev,
        MIN(cs.base_salary) AS min_salary,
        MAX(cs.base_salary) AS max_salary,
        MAX(cs.base_salary) - MIN(cs.base_salary) AS salary_spread,
        COUNT(*) AS headcount
    FROM employees e
    JOIN (
        SELECT DISTINCT ON (employee_id) employee_id, base_salary
        FROM salary ORDER BY employee_id, effective_date DESC
    ) cs ON e.employee_id = cs.employee_id
    JOIN job_levels jl ON e.job_level_id = jl.job_level_id
    WHERE e.is_active = TRUE
    GROUP BY e.role_title, jl.level_name, e.department
    HAVING COUNT(*) >= 3
)
SELECT
    role_title,
    level_name,
    department,
    headcount,
    ROUND(avg_salary, 0) AS avg_salary,
    ROUND(min_salary, 0) AS min_salary,
    ROUND(max_salary, 0) AS max_salary,
    ROUND(salary_spread, 0) AS spread,
    ROUND(salary_spread * 100.0 / NULLIF(avg_salary, 0), 1) AS spread_pct,
    ROUND(salary_stddev, 0) AS stddev
FROM role_stats
WHERE salary_spread * 100.0 / NULLIF(avg_salary, 0) > 20
ORDER BY spread_pct DESC;

Query 4: Total Compensation Package Comparison

Compare total compensation (base + bonus + equity + benefits) to give a complete picture beyond base salary alone.

WITH current_comp AS (
    SELECT DISTINCT ON (s.employee_id)
        s.employee_id,
        s.base_salary,
        s.base_salary * s.bonus_target_pct / 100 AS target_bonus,
        s.equity_value
    FROM salary s
    ORDER BY s.employee_id, s.effective_date DESC
),
benefit_costs AS (
    SELECT employee_id, SUM(employer_cost) AS total_benefits
    FROM benefits
    GROUP BY employee_id
)
SELECT
    e.department,
    jl.level_name,
    COUNT(*) AS headcount,
    ROUND(AVG(cc.base_salary), 0) AS avg_base,
    ROUND(AVG(cc.target_bonus), 0) AS avg_bonus,
    ROUND(AVG(cc.equity_value), 0) AS avg_equity,
    ROUND(AVG(bc.total_benefits), 0) AS avg_benefits,
    ROUND(AVG(cc.base_salary + cc.target_bonus + cc.equity_value + COALESCE(bc.total_benefits, 0)), 0) AS avg_total_comp,
    ROUND(AVG(cc.base_salary) * 100.0 /
        NULLIF(AVG(cc.base_salary + cc.target_bonus + cc.equity_value + COALESCE(bc.total_benefits, 0)), 0), 1
    ) AS base_pct_of_total
FROM employees e
JOIN current_comp cc ON e.employee_id = cc.employee_id
JOIN job_levels jl ON e.job_level_id = jl.job_level_id
LEFT JOIN benefit_costs bc ON e.employee_id = bc.employee_id
WHERE e.is_active = TRUE
GROUP BY e.department, jl.level_name, jl.job_level_id
ORDER BY e.department, jl.job_level_id;

Python Analysis

Salary Regression Model

Build a regression model to predict expected salary based on role, level, tenure, location, and performance. Use residuals to identify employees who are significantly over- or under-paid relative to their predicted salary.

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
import matplotlib.pyplot as plt

df = pd.read_csv('compensation_data.csv')
# Columns: employee_id, department, role_family, level_name, location,
#           tenure_years, last_rating, base_salary, gender

features = ['role_family', 'level_name', 'location', 'tenure_years', 'last_rating']
categorical = ['role_family', 'level_name', 'location']
numerical = ['tenure_years', 'last_rating']

X = df[features]
y = df['base_salary']

preprocessor = ColumnTransformer([
    ('cat', OneHotEncoder(drop='first', sparse_output=False), categorical),
    ('num', 'passthrough', numerical)
])

model = Pipeline([
    ('preprocessor', preprocessor),
    ('regressor', LinearRegression())
])

model.fit(X, y)
df['predicted_salary'] = model.predict(X)
df['residual'] = df['base_salary'] - df['predicted_salary']
df['residual_pct'] = (df['residual'] / df['predicted_salary'] * 100).round(1)

print(f"Model R-squared: {model.score(X, y):.3f}")
print(f"\nEmployees significantly underpaid (residual < -10%):")
underpaid = df[df['residual_pct'] < -10].sort_values('residual_pct')
print(underpaid[['employee_id', 'department', 'role_family', 'level_name',
                  'base_salary', 'predicted_salary', 'residual_pct']].head(15))

# Residual distribution plot
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(df['residual_pct'], bins=40, color='#3498db', edgecolor='black', alpha=0.7)
ax.axvline(0, color='red', linewidth=2, linestyle='--')
ax.axvline(-10, color='orange', linewidth=1.5, linestyle=':', label='Underpaid threshold (-10%)')
ax.axvline(10, color='green', linewidth=1.5, linestyle=':', label='Overpaid threshold (+10%)')
ax.set_xlabel('Salary Residual (%)')
ax.set_ylabel('Number of Employees')
ax.set_title('Distribution of Salary Residuals (Actual vs. Predicted)', fontsize=13)
ax.legend()
plt.tight_layout()
plt.savefig('salary_residuals.png', dpi=150)
plt.show()

Market Comparison Visualization

Compare internal salary ranges against market bands for each role family and level.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

comp = pd.read_csv('salary_bands.csv')
# Columns: role_family, level_name, internal_median, market_p25, market_p50, market_p75

roles = comp['role_family'] + ' - ' + comp['level_name']
y_pos = np.arange(len(roles))

fig, ax = plt.subplots(figsize=(12, 8))

# Market band (P25 to P75)
ax.barh(y_pos, comp['market_p75'] - comp['market_p25'],
        left=comp['market_p25'], height=0.4, color='#bdc3c7',
        alpha=0.6, label='Market P25-P75 Range')

# Market midpoint
ax.scatter(comp['market_p50'], y_pos, color='black', s=80,
           zorder=5, label='Market Median (P50)', marker='|')

# Internal median
ax.scatter(comp['internal_median'], y_pos, color='#e74c3c', s=100,
           zorder=5, label='Internal Median', marker='D')

ax.set_yticks(y_pos)
ax.set_yticklabels(roles, fontsize=9)
ax.set_xlabel('Annual Base Salary ($)', fontsize=12)
ax.set_title('Internal Salary vs. Market Benchmark', fontsize=14)
ax.legend(loc='lower right')
ax.grid(axis='x', alpha=0.3)

# Format x-axis as currency
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, p: f'${x:,.0f}'))

plt.tight_layout()
plt.savefig('market_comparison.png', dpi=150)
plt.show()

Equity Analysis by Gender

After controlling for role, level, tenure, and location, test whether a gender pay gap persists.

import pandas as pd
import numpy as np
from scipy import stats

df = pd.read_csv('compensation_data.csv')

# Use the residuals from the salary model (which controls for role, level, etc.)
# A significant difference in residuals by gender = unexplained pay gap

male_residuals = df[df['gender'] == 'Male']['residual_pct']
female_residuals = df[df['gender'] == 'Female']['residual_pct']

t_stat, p_value = stats.ttest_ind(male_residuals, female_residuals)

print("Controlled Pay Gap Analysis (Residual-Based)")
print("=" * 50)
print(f"Male avg residual:   {male_residuals.mean():+.2f}%")
print(f"Female avg residual: {female_residuals.mean():+.2f}%")
print(f"Difference:          {male_residuals.mean() - female_residuals.mean():.2f}%")
print(f"T-statistic:         {t_stat:.3f}")
print(f"P-value:             {p_value:.4f}")
print(f"Significant (p<0.05): {'Yes' if p_value < 0.05 else 'No'}")

# Budget needed to close the gap
gap_employees = df[(df['gender'] == 'Female') & (df['residual_pct'] < -5)]
total_adjustment = (gap_employees['predicted_salary'] * 0.95 - gap_employees['base_salary']).clip(lower=0).sum()
print(f"\nEstimated budget to bring underpaid women to within 5% of predicted: ${total_adjustment:,.0f}")

Excel Dashboard

Build a compensation review workbook that the comp team uses during the annual cycle to make data-driven pay decisions.

  1. Import Data: Load SQL results into sheets: "Employee Comp", "Market Data", "Equity Gaps", and "Total Comp". Import the Python model outputs (predicted salary, residuals) as a separate sheet. Format all as Excel Tables.
  2. Salary Band Spreadsheet: Create a reference table with columns: Role Family, Level, Market P25, Market P50 (midpoint), Market P75. Define named ranges for each band. Add a "Band Penetration" column using: =(employee_salary - P25) / (P75 - P25). Values below 0 are below range; above 1 are above range.
  3. Compa-Ratio Dashboard: Create a PivotTable with Department and Level as rows, average compa-ratio as values. Apply Conditional Formatting (3-Color Scale: red at 0.85, yellow at 1.0, green at 1.10). Add a PivotChart showing compa-ratio distribution as a histogram. Include a slicer for Department.
  4. Market Comparison Chart: Build a combination chart for each role family: use floating bars (high-low-close chart) to show the P25-P75 market range, overlay markers for each employee's actual salary, and add a diamond marker for the internal median. This provides a powerful visual of market positioning.
  5. Equity Gap Table: Build a table of roles where the salary spread exceeds 20%. Use Conditional Formatting to highlight spread_pct values above 25% in red. Add a "Recommended Action" dropdown (Data Validation) with options: "Adjust Up", "Monitor", "No Action". Create a summary formula: =SUMPRODUCT((action="Adjust Up")*adjustment_amount) to estimate total budget impact.
  6. Merit Increase Calculator: Create a merit matrix with Performance Rating on one axis and Compa-Ratio band on the other. Populate cells with suggested increase percentages (e.g., low compa + high performance = 8%, high compa + average performance = 2%). Use INDEX/MATCH to look up each employee's suggested increase. Calculate: =base_salary * suggested_increase_pct. Sum to verify total stays within the 5% budget.
  7. Total Compensation View: Build a stacked bar chart showing the composition of total comp (base, bonus, equity, benefits) for each level. This helps leadership understand the full value proposition beyond base salary.
  8. Executive Summary: Create a one-page dashboard tab with: Overall Compa-Ratio, % of Employees Below Market, Estimated Equity Adjustment Budget, Merit Budget Utilization, and a waterfall chart showing how the merit budget flows from total pool to department allocations.

Key Insights

Below-Market Roles

23% of employees have a compa-ratio below 0.90. Engineering (L3-L5) and Product (L4-L5) are the most below-market, with median compa-ratios of 0.87 and 0.89 respectively -- directly correlating with the highest attrition departments.

Internal Equity

14 role-level combinations show salary spreads exceeding 25%, even after controlling for tenure and location. The largest gap is Senior Software Engineer (L4) with a 34% spread, requiring an estimated $180K in equity adjustments.

Total Comp Story

While base salaries trail the market by 6% on average, total compensation (including equity and benefits) is only 2% below market. Communicating the full package could reduce perceived pay gaps and improve retention.

Budget Allocation

The regression model identifies 142 employees needing priority adjustments totaling $890K. With a $1.6M merit budget (5% of payroll), allocating 56% to market corrections and 44% to merit-based increases optimizes both retention and performance incentives.

Knowledge Check

Question 1

An employee with a compa-ratio of 0.85 is:

Question 2

Why does the salary regression model use one-hot encoding with drop='first'?

Question 3

A salary residual of -12% from the regression model means:

Question 4

In the merit increase matrix, why should employees with high compa-ratios AND high performance receive moderate (not large) increases?

Question 5

The band penetration formula (salary - P25) / (P75 - P25) returns 0.3 for an employee. This means:

← Previous Project Next Project →