Project 22

Workforce Diversity Dashboard

Build comprehensive visibility into diversity, equity, and inclusion metrics to help a D&I officer track representation, pay equity, and promotion fairness across the organization.

Project Overview

Scenario: You are supporting the Diversity & Inclusion officer at a financial services firm with 4,000 employees across 8 offices. The board has requested quarterly diversity reporting. The D&I team needs a repeatable analytics pipeline that tracks gender and ethnicity representation at every job level, identifies pay gaps, and monitors whether promotion rates are equitable across demographic groups.

Objective: Analyze workforce composition data to surface representation gaps, quantify pay disparities controlling for role and tenure, and track year-over-year trends to measure the impact of diversity initiatives.

Tools: SQL (PostgreSQL), Python (pandas, matplotlib, seaborn, statsmodels), Microsoft Excel

Dataset Description

Table: employees

ColumnTypeDescription
employee_idINT (PK)Unique employee identifier
genderVARCHAR(20)Gender identity
ethnicityVARCHAR(50)Self-reported ethnicity
department_idINT (FK)Reference to departments
job_levelVARCHAR(20)Level: Junior, Mid, Senior, Lead, Director, VP
hire_dateDATEDate of hire
is_activeBOOLEANCurrent employment status

Table: departments

ColumnTypeDescription
department_idINT (PK)Unique department identifier
department_nameVARCHAR(100)Department name

Table: promotions

ColumnTypeDescription
promotion_idINT (PK)Unique promotion record
employee_idINT (FK)Employee promoted
promotion_dateDATEDate of promotion
from_levelVARCHAR(20)Previous job level
to_levelVARCHAR(20)New job level

Table: salary

ColumnTypeDescription
salary_idINT (PK)Unique salary record
employee_idINT (FK)Reference to employees
annual_salaryDECIMAL(10,2)Current annual salary
effective_dateDATESalary effective date

Table: hiring

ColumnTypeDescription
hire_idINT (PK)Unique hire record
employee_idINT (FK)Reference to employees
sourceVARCHAR(50)Recruiting source/channel
offer_dateDATEDate offer was extended
start_dateDATEFirst day of employment

SQL Analysis

Query 1: Gender and Ethnicity Distribution by Job Level

Break down the current active workforce by job level, gender, and ethnicity to show where representation gaps exist at senior levels.

SELECT
    e.job_level,
    e.gender,
    e.ethnicity,
    COUNT(*) AS headcount,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY e.job_level), 1) AS pct_of_level
FROM employees e
WHERE e.is_active = TRUE
GROUP BY e.job_level, e.gender, e.ethnicity
ORDER BY
    CASE e.job_level
        WHEN 'Junior' THEN 1 WHEN 'Mid' THEN 2 WHEN 'Senior' THEN 3
        WHEN 'Lead' THEN 4 WHEN 'Director' THEN 5 WHEN 'VP' THEN 6
    END,
    headcount DESC;

The window function SUM(COUNT(*)) OVER (PARTITION BY job_level) calculates the total headcount at each level, allowing us to compute the percentage representation within each tier.

Query 2: Pay Gap Analysis by Gender (Controlled for Role and Tenure)

Calculate the median salary for each gender within the same job level and tenure band to isolate pay gaps from structural differences.

WITH current_salary AS (
    SELECT DISTINCT ON (s.employee_id)
        s.employee_id, s.annual_salary
    FROM salary s
    ORDER BY s.employee_id, s.effective_date DESC
),
employee_data AS (
    SELECT
        e.employee_id, e.gender, e.job_level,
        cs.annual_salary,
        EXTRACT(YEAR FROM AGE(CURRENT_DATE, e.hire_date)) AS tenure_years
    FROM employees e
    JOIN current_salary cs ON e.employee_id = cs.employee_id
    WHERE e.is_active = TRUE
)
SELECT
    job_level,
    gender,
    COUNT(*) AS employees,
    ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY annual_salary), 0) AS median_salary,
    ROUND(AVG(annual_salary), 0) AS avg_salary,
    ROUND(AVG(tenure_years), 1) AS avg_tenure
FROM employee_data
GROUP BY job_level, gender
ORDER BY
    CASE job_level
        WHEN 'Junior' THEN 1 WHEN 'Mid' THEN 2 WHEN 'Senior' THEN 3
        WHEN 'Lead' THEN 4 WHEN 'Director' THEN 5 WHEN 'VP' THEN 6
    END,
    gender;

Query 3: Promotion Rate by Demographic Group

Compare the percentage of each demographic group that received a promotion in the past calendar year to identify disparities in advancement opportunities.

WITH eligible AS (
    SELECT
        e.employee_id, e.gender, e.ethnicity, e.job_level
    FROM employees e
    WHERE e.is_active = TRUE
      AND e.hire_date < DATE_TRUNC('year', CURRENT_DATE)
),
promoted AS (
    SELECT DISTINCT p.employee_id
    FROM promotions p
    WHERE p.promotion_date >= DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 year'
      AND p.promotion_date < DATE_TRUNC('year', CURRENT_DATE)
)
SELECT
    el.gender,
    el.ethnicity,
    COUNT(el.employee_id) AS eligible_count,
    COUNT(pr.employee_id) AS promoted_count,
    ROUND(COUNT(pr.employee_id) * 100.0 / NULLIF(COUNT(el.employee_id), 0), 1) AS promotion_rate_pct
FROM eligible el
LEFT JOIN promoted pr ON el.employee_id = pr.employee_id
GROUP BY el.gender, el.ethnicity
ORDER BY promotion_rate_pct DESC;

Query 4: Hiring Source Diversity

Evaluate which recruiting channels produce the most diverse candidate pools by analyzing the demographic composition of hires by source.

SELECT
    h.source,
    e.gender,
    COUNT(*) AS hires,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY h.source), 1) AS pct_of_source,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY e.gender), 1) AS pct_of_gender_total
FROM hiring h
JOIN employees e ON h.employee_id = e.employee_id
WHERE h.start_date >= CURRENT_DATE - INTERVAL '2 years'
GROUP BY h.source, e.gender
ORDER BY h.source, hires DESC;

Python Analysis

Representation Visualization

Create a stacked bar chart showing demographic composition at each job level to visually highlight where representation narrows at senior tiers.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
df = pd.read_csv('diversity_by_level.csv')
# Columns: job_level, gender, ethnicity, headcount, pct_of_level

level_order = ['Junior', 'Mid', 'Senior', 'Lead', 'Director', 'VP']

# Gender representation by level
gender_pivot = df.groupby(['job_level', 'gender'])['headcount'].sum().unstack(fill_value=0)
gender_pct = gender_pivot.div(gender_pivot.sum(axis=1), axis=0) * 100
gender_pct = gender_pct.reindex(level_order)

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Gender stacked bar
gender_pct.plot(kind='bar', stacked=True, ax=axes[0],
                color=['#3498db', '#e74c3c', '#95a5a6'])
axes[0].set_title('Gender Representation by Job Level', fontsize=13)
axes[0].set_ylabel('Percentage (%)')
axes[0].set_xlabel('')
axes[0].legend(title='Gender', loc='upper right')
axes[0].axhline(50, color='gray', linestyle='--', alpha=0.5)
axes[0].set_xticklabels(level_order, rotation=0)

# Ethnicity representation by level
eth_pivot = df.groupby(['job_level', 'ethnicity'])['headcount'].sum().unstack(fill_value=0)
eth_pct = eth_pivot.div(eth_pivot.sum(axis=1), axis=0) * 100
eth_pct = eth_pct.reindex(level_order)

eth_pct.plot(kind='bar', stacked=True, ax=axes[1],
             colormap='Set2')
axes[1].set_title('Ethnicity Representation by Job Level', fontsize=13)
axes[1].set_ylabel('Percentage (%)')
axes[1].set_xlabel('')
axes[1].legend(title='Ethnicity', loc='upper right', fontsize=8)
axes[1].set_xticklabels(level_order, rotation=0)

plt.tight_layout()
plt.savefig('representation_by_level.png', dpi=150)
plt.show()

Pay Equity Regression Analysis

Use OLS regression to measure the pay gap while controlling for job level, tenure, and department -- isolating the effect of demographic variables on salary.

import pandas as pd
import statsmodels.formula.api as smf

# Load merged employee + salary data
df = pd.read_csv('employee_salary_data.csv')
# Columns: employee_id, gender, ethnicity, job_level, department_name,
#           tenure_years, annual_salary

# Log-transform salary for better regression fit
import numpy as np
df['log_salary'] = np.log(df['annual_salary'])

# OLS regression controlling for level, tenure, department
model = smf.ols(
    'log_salary ~ C(gender) + C(ethnicity) + C(job_level) + tenure_years + C(department_name)',
    data=df
).fit()

print(model.summary())

# Extract gender and ethnicity coefficients
coefs = model.params.filter(like='gender').append(
    model.params.filter(like='ethnicity')
)
conf = model.conf_int().loc[coefs.index]

# Convert log coefficients to percentage differences
pct_diff = (np.exp(coefs) - 1) * 100
pct_conf_low = (np.exp(conf[0]) - 1) * 100
pct_conf_high = (np.exp(conf[1]) - 1) * 100

results = pd.DataFrame({
    'pct_salary_diff': pct_diff.round(2),
    'ci_low': pct_conf_low.round(2),
    'ci_high': pct_conf_high.round(2),
    'p_value': model.pvalues.loc[coefs.index].round(4)
})
print('\nPay Gap (% difference vs. reference group):')
print(results)

Year-over-Year Diversity Trend

Track how representation has changed over the past 5 years to measure progress from D&I initiatives.

import pandas as pd
import matplotlib.pyplot as plt

# Load annual snapshots
trends = pd.read_csv('diversity_trends.csv')
# Columns: year, gender, pct_representation

fig, ax = plt.subplots(figsize=(10, 5))
for gender in trends['gender'].unique():
    subset = trends[trends['gender'] == gender]
    ax.plot(subset['year'], subset['pct_representation'],
            marker='o', linewidth=2, label=gender)

ax.set_title('Gender Representation Trend (2021-2025)', fontsize=14)
ax.set_xlabel('Year')
ax.set_ylabel('Representation (%)')
ax.set_ylim(0, 70)
ax.axhline(50, color='gray', linestyle='--', alpha=0.5, label='Parity')
ax.legend()
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('diversity_trend.png', dpi=150)
plt.show()

Excel Dashboard

Create a quarterly diversity report that the D&I officer can share with the board.

  1. Import Data: Load SQL query results into four sheets: "Representation", "Pay Gap", "Promotions", and "Hiring Sources". Format each as an Excel Table.
  2. Diversity Dashboard Tab: Create a summary dashboard. At the top, place KPI cards using merged cells: Total Headcount, Female %, Underrepresented Minority %, and Overall Pay Gap %. Use formulas like =COUNTIFS(gender,"Female")/COUNTA(gender).
  3. Representation Chart: Build a 100% stacked bar chart with job levels on the X-axis and gender percentage on the Y-axis. Add a reference line at 50% to highlight parity. Duplicate for ethnicity breakdown.
  4. Pay Band Comparison: Create a grouped bar chart showing median salary by gender for each job level. Add error bars using the confidence interval values from the regression output. Highlight statistically significant gaps with a different color.
  5. Promotion Equity Table: Build a formatted table with Conditional Formatting. Apply Color Scales (red-white-green) to the promotion rate column. Add sparklines showing 3-year promotion rate trends for each group.
  6. Hiring Pipeline: Create a Sunburst chart (Insert > Sunburst) with Source as the outer ring and Gender as the inner ring. This shows which channels produce the most balanced candidate pools at a glance.
  7. Trend Line Chart: Plot year-over-year representation for each demographic group. Add linear trendlines with equations to project when parity milestones might be reached at current rates.
  8. Interactive Filters: Add slicers for Department, Job Level, and Year. Connect all PivotCharts to the slicers using Report Connections so the entire dashboard filters simultaneously.

Key Insights

Leadership Gap

While women represent 47% of Junior roles, they hold only 22% of Director and VP positions. The "leaky pipeline" narrows most sharply at the Senior-to-Lead transition.

Controlled Pay Gap

After controlling for job level, tenure, and department, women earn 3.2% less than men (p < 0.01). The gap is largest at the Lead level (5.1%) and smallest at Junior (0.8%).

Promotion Disparity

Promotion rates for underrepresented minorities are 11.4% vs. 15.8% for the majority group. The gap persists even after controlling for performance ratings.

Source Matters

Employee referrals produce 68% majority-group hires, while university partnerships and diversity job boards yield significantly more balanced candidate pools (48% / 52%).

Knowledge Check

Question 1

Why is it important to control for job level and tenure when analyzing pay gaps?

Question 2

In the pay equity regression, why is salary log-transformed before fitting the model?

Question 3

What does SUM(COUNT(*)) OVER (PARTITION BY e.job_level) compute in the representation query?

Question 4

Which chart type best shows the hierarchical breakdown of hiring sources by gender composition?

Question 5

A regression coefficient for gender of -0.032 on log(salary) means:

← Previous Project Next Project →