Workforce Diversity Dashboard
Build comprehensive visibility into diversity, equity, and inclusion metrics to help a D&I officer track representation, pay equity, and promotion fairness across the organization.
Project Overview
Scenario: You are supporting the Diversity & Inclusion officer at a financial services firm with 4,000 employees across 8 offices. The board has requested quarterly diversity reporting. The D&I team needs a repeatable analytics pipeline that tracks gender and ethnicity representation at every job level, identifies pay gaps, and monitors whether promotion rates are equitable across demographic groups.
Objective: Analyze workforce composition data to surface representation gaps, quantify pay disparities controlling for role and tenure, and track year-over-year trends to measure the impact of diversity initiatives.
Tools: SQL (PostgreSQL), Python (pandas, matplotlib, seaborn, statsmodels), Microsoft Excel
Dataset Description
Table: employees
| Column | Type | Description |
|---|---|---|
| employee_id | INT (PK) | Unique employee identifier |
| gender | VARCHAR(20) | Gender identity |
| ethnicity | VARCHAR(50) | Self-reported ethnicity |
| department_id | INT (FK) | Reference to departments |
| job_level | VARCHAR(20) | Level: Junior, Mid, Senior, Lead, Director, VP |
| hire_date | DATE | Date of hire |
| is_active | BOOLEAN | Current employment status |
Table: departments
| Column | Type | Description |
|---|---|---|
| department_id | INT (PK) | Unique department identifier |
| department_name | VARCHAR(100) | Department name |
Table: promotions
| Column | Type | Description |
|---|---|---|
| promotion_id | INT (PK) | Unique promotion record |
| employee_id | INT (FK) | Employee promoted |
| promotion_date | DATE | Date of promotion |
| from_level | VARCHAR(20) | Previous job level |
| to_level | VARCHAR(20) | New job level |
Table: salary
| Column | Type | Description |
|---|---|---|
| salary_id | INT (PK) | Unique salary record |
| employee_id | INT (FK) | Reference to employees |
| annual_salary | DECIMAL(10,2) | Current annual salary |
| effective_date | DATE | Salary effective date |
Table: hiring
| Column | Type | Description |
|---|---|---|
| hire_id | INT (PK) | Unique hire record |
| employee_id | INT (FK) | Reference to employees |
| source | VARCHAR(50) | Recruiting source/channel |
| offer_date | DATE | Date offer was extended |
| start_date | DATE | First day of employment |
SQL Analysis
Query 1: Gender and Ethnicity Distribution by Job Level
Break down the current active workforce by job level, gender, and ethnicity to show where representation gaps exist at senior levels.
SELECT
e.job_level,
e.gender,
e.ethnicity,
COUNT(*) AS headcount,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY e.job_level), 1) AS pct_of_level
FROM employees e
WHERE e.is_active = TRUE
GROUP BY e.job_level, e.gender, e.ethnicity
ORDER BY
CASE e.job_level
WHEN 'Junior' THEN 1 WHEN 'Mid' THEN 2 WHEN 'Senior' THEN 3
WHEN 'Lead' THEN 4 WHEN 'Director' THEN 5 WHEN 'VP' THEN 6
END,
headcount DESC;
The window function SUM(COUNT(*)) OVER (PARTITION BY job_level) calculates the total headcount at each level, allowing us to compute the percentage representation within each tier.
Query 2: Pay Gap Analysis by Gender (Controlled for Role and Tenure)
Calculate the median salary for each gender within the same job level and tenure band to isolate pay gaps from structural differences.
WITH current_salary AS (
SELECT DISTINCT ON (s.employee_id)
s.employee_id, s.annual_salary
FROM salary s
ORDER BY s.employee_id, s.effective_date DESC
),
employee_data AS (
SELECT
e.employee_id, e.gender, e.job_level,
cs.annual_salary,
EXTRACT(YEAR FROM AGE(CURRENT_DATE, e.hire_date)) AS tenure_years
FROM employees e
JOIN current_salary cs ON e.employee_id = cs.employee_id
WHERE e.is_active = TRUE
)
SELECT
job_level,
gender,
COUNT(*) AS employees,
ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY annual_salary), 0) AS median_salary,
ROUND(AVG(annual_salary), 0) AS avg_salary,
ROUND(AVG(tenure_years), 1) AS avg_tenure
FROM employee_data
GROUP BY job_level, gender
ORDER BY
CASE job_level
WHEN 'Junior' THEN 1 WHEN 'Mid' THEN 2 WHEN 'Senior' THEN 3
WHEN 'Lead' THEN 4 WHEN 'Director' THEN 5 WHEN 'VP' THEN 6
END,
gender;
Query 3: Promotion Rate by Demographic Group
Compare the percentage of each demographic group that received a promotion in the past calendar year to identify disparities in advancement opportunities.
WITH eligible AS (
SELECT
e.employee_id, e.gender, e.ethnicity, e.job_level
FROM employees e
WHERE e.is_active = TRUE
AND e.hire_date < DATE_TRUNC('year', CURRENT_DATE)
),
promoted AS (
SELECT DISTINCT p.employee_id
FROM promotions p
WHERE p.promotion_date >= DATE_TRUNC('year', CURRENT_DATE) - INTERVAL '1 year'
AND p.promotion_date < DATE_TRUNC('year', CURRENT_DATE)
)
SELECT
el.gender,
el.ethnicity,
COUNT(el.employee_id) AS eligible_count,
COUNT(pr.employee_id) AS promoted_count,
ROUND(COUNT(pr.employee_id) * 100.0 / NULLIF(COUNT(el.employee_id), 0), 1) AS promotion_rate_pct
FROM eligible el
LEFT JOIN promoted pr ON el.employee_id = pr.employee_id
GROUP BY el.gender, el.ethnicity
ORDER BY promotion_rate_pct DESC;
Query 4: Hiring Source Diversity
Evaluate which recruiting channels produce the most diverse candidate pools by analyzing the demographic composition of hires by source.
SELECT
h.source,
e.gender,
COUNT(*) AS hires,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY h.source), 1) AS pct_of_source,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY e.gender), 1) AS pct_of_gender_total
FROM hiring h
JOIN employees e ON h.employee_id = e.employee_id
WHERE h.start_date >= CURRENT_DATE - INTERVAL '2 years'
GROUP BY h.source, e.gender
ORDER BY h.source, hires DESC;
Python Analysis
Representation Visualization
Create a stacked bar chart showing demographic composition at each job level to visually highlight where representation narrows at senior tiers.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load data
df = pd.read_csv('diversity_by_level.csv')
# Columns: job_level, gender, ethnicity, headcount, pct_of_level
level_order = ['Junior', 'Mid', 'Senior', 'Lead', 'Director', 'VP']
# Gender representation by level
gender_pivot = df.groupby(['job_level', 'gender'])['headcount'].sum().unstack(fill_value=0)
gender_pct = gender_pivot.div(gender_pivot.sum(axis=1), axis=0) * 100
gender_pct = gender_pct.reindex(level_order)
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Gender stacked bar
gender_pct.plot(kind='bar', stacked=True, ax=axes[0],
color=['#3498db', '#e74c3c', '#95a5a6'])
axes[0].set_title('Gender Representation by Job Level', fontsize=13)
axes[0].set_ylabel('Percentage (%)')
axes[0].set_xlabel('')
axes[0].legend(title='Gender', loc='upper right')
axes[0].axhline(50, color='gray', linestyle='--', alpha=0.5)
axes[0].set_xticklabels(level_order, rotation=0)
# Ethnicity representation by level
eth_pivot = df.groupby(['job_level', 'ethnicity'])['headcount'].sum().unstack(fill_value=0)
eth_pct = eth_pivot.div(eth_pivot.sum(axis=1), axis=0) * 100
eth_pct = eth_pct.reindex(level_order)
eth_pct.plot(kind='bar', stacked=True, ax=axes[1],
colormap='Set2')
axes[1].set_title('Ethnicity Representation by Job Level', fontsize=13)
axes[1].set_ylabel('Percentage (%)')
axes[1].set_xlabel('')
axes[1].legend(title='Ethnicity', loc='upper right', fontsize=8)
axes[1].set_xticklabels(level_order, rotation=0)
plt.tight_layout()
plt.savefig('representation_by_level.png', dpi=150)
plt.show()
Pay Equity Regression Analysis
Use OLS regression to measure the pay gap while controlling for job level, tenure, and department -- isolating the effect of demographic variables on salary.
import pandas as pd
import statsmodels.formula.api as smf
# Load merged employee + salary data
df = pd.read_csv('employee_salary_data.csv')
# Columns: employee_id, gender, ethnicity, job_level, department_name,
# tenure_years, annual_salary
# Log-transform salary for better regression fit
import numpy as np
df['log_salary'] = np.log(df['annual_salary'])
# OLS regression controlling for level, tenure, department
model = smf.ols(
'log_salary ~ C(gender) + C(ethnicity) + C(job_level) + tenure_years + C(department_name)',
data=df
).fit()
print(model.summary())
# Extract gender and ethnicity coefficients
coefs = model.params.filter(like='gender').append(
model.params.filter(like='ethnicity')
)
conf = model.conf_int().loc[coefs.index]
# Convert log coefficients to percentage differences
pct_diff = (np.exp(coefs) - 1) * 100
pct_conf_low = (np.exp(conf[0]) - 1) * 100
pct_conf_high = (np.exp(conf[1]) - 1) * 100
results = pd.DataFrame({
'pct_salary_diff': pct_diff.round(2),
'ci_low': pct_conf_low.round(2),
'ci_high': pct_conf_high.round(2),
'p_value': model.pvalues.loc[coefs.index].round(4)
})
print('\nPay Gap (% difference vs. reference group):')
print(results)
Year-over-Year Diversity Trend
Track how representation has changed over the past 5 years to measure progress from D&I initiatives.
import pandas as pd
import matplotlib.pyplot as plt
# Load annual snapshots
trends = pd.read_csv('diversity_trends.csv')
# Columns: year, gender, pct_representation
fig, ax = plt.subplots(figsize=(10, 5))
for gender in trends['gender'].unique():
subset = trends[trends['gender'] == gender]
ax.plot(subset['year'], subset['pct_representation'],
marker='o', linewidth=2, label=gender)
ax.set_title('Gender Representation Trend (2021-2025)', fontsize=14)
ax.set_xlabel('Year')
ax.set_ylabel('Representation (%)')
ax.set_ylim(0, 70)
ax.axhline(50, color='gray', linestyle='--', alpha=0.5, label='Parity')
ax.legend()
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.savefig('diversity_trend.png', dpi=150)
plt.show()
Excel Dashboard
Create a quarterly diversity report that the D&I officer can share with the board.
- Import Data: Load SQL query results into four sheets: "Representation", "Pay Gap", "Promotions", and "Hiring Sources". Format each as an Excel Table.
- Diversity Dashboard Tab: Create a summary dashboard. At the top, place KPI cards using merged cells: Total Headcount, Female %, Underrepresented Minority %, and Overall Pay Gap %. Use formulas like
=COUNTIFS(gender,"Female")/COUNTA(gender). - Representation Chart: Build a 100% stacked bar chart with job levels on the X-axis and gender percentage on the Y-axis. Add a reference line at 50% to highlight parity. Duplicate for ethnicity breakdown.
- Pay Band Comparison: Create a grouped bar chart showing median salary by gender for each job level. Add error bars using the confidence interval values from the regression output. Highlight statistically significant gaps with a different color.
- Promotion Equity Table: Build a formatted table with Conditional Formatting. Apply Color Scales (red-white-green) to the promotion rate column. Add sparklines showing 3-year promotion rate trends for each group.
- Hiring Pipeline: Create a Sunburst chart (Insert > Sunburst) with Source as the outer ring and Gender as the inner ring. This shows which channels produce the most balanced candidate pools at a glance.
- Trend Line Chart: Plot year-over-year representation for each demographic group. Add linear trendlines with equations to project when parity milestones might be reached at current rates.
- Interactive Filters: Add slicers for Department, Job Level, and Year. Connect all PivotCharts to the slicers using Report Connections so the entire dashboard filters simultaneously.
Key Insights
Leadership Gap
While women represent 47% of Junior roles, they hold only 22% of Director and VP positions. The "leaky pipeline" narrows most sharply at the Senior-to-Lead transition.
Controlled Pay Gap
After controlling for job level, tenure, and department, women earn 3.2% less than men (p < 0.01). The gap is largest at the Lead level (5.1%) and smallest at Junior (0.8%).
Promotion Disparity
Promotion rates for underrepresented minorities are 11.4% vs. 15.8% for the majority group. The gap persists even after controlling for performance ratings.
Source Matters
Employee referrals produce 68% majority-group hires, while university partnerships and diversity job boards yield significantly more balanced candidate pools (48% / 52%).
Knowledge Check
Why is it important to control for job level and tenure when analyzing pay gaps?
In the pay equity regression, why is salary log-transformed before fitting the model?
What does SUM(COUNT(*)) OVER (PARTITION BY e.job_level) compute in the representation query?
Which chart type best shows the hierarchical breakdown of hiring sources by gender composition?
A regression coefficient for gender of -0.032 on log(salary) means: