Project 24

Employee Performance Scorecard

Build a data-driven performance management system that tracks goal completion, analyzes rating distributions, and identifies the relationship between performance, tenure, and training.

Project Overview

Scenario: You are working with the HR Business Partner team at a healthcare company (3,200 employees). The current performance review process has been criticized as subjective and inconsistent. Some managers rate everyone highly while others maintain strict distributions. HR wants to move toward a data-informed approach that combines goal completion metrics with manager reviews and productivity data.

Objective: Analyze performance review data to detect rating inflation and leniency bias, correlate performance with tenure and training participation, and design a scorecard framework that standardizes evaluation across departments.

Tools: SQL (PostgreSQL), Python (pandas, matplotlib, scipy), Microsoft Excel

Dataset Description

Table: employees

| Column | Type | Description |
| --- | --- | --- |
| employee_id | INT (PK) | Unique employee identifier |
| full_name | VARCHAR(100) | Employee full name |
| department | VARCHAR(50) | Department name |
| role_title | VARCHAR(100) | Job title |
| hire_date | DATE | Date of hire |
| manager_id | INT (FK) | Direct manager employee_id |
| training_hours | INT | Total training hours completed |

Table: goals

| Column | Type | Description |
| --- | --- | --- |
| goal_id | INT (PK) | Unique goal identifier |
| employee_id | INT (FK) | Reference to employees |
| goal_description | TEXT | Description of the goal |
| target_date | DATE | Goal deadline |
| completion_date | DATE | Actual completion date (NULL if incomplete) |
| status | VARCHAR(20) | Completed, In Progress, Overdue, Cancelled |
| weight | DECIMAL(3,2) | Goal weight (0.00 - 1.00) |

Table: reviews

| Column | Type | Description |
| --- | --- | --- |
| review_id | INT (PK) | Unique review identifier |
| employee_id | INT (FK) | Employee being reviewed |
| reviewer_id | INT (FK) | Manager conducting review |
| review_period | VARCHAR(10) | Review period (e.g., 2025-H1) |
| review_date | DATE | Date review was submitted |

Table: ratings

| Column | Type | Description |
| --- | --- | --- |
| rating_id | INT (PK) | Unique rating identifier |
| review_id | INT (FK) | Reference to reviews |
| category | VARCHAR(50) | Rating category (Technical, Communication, Leadership, etc.) |
| score | DECIMAL(2,1) | Score (1.0-5.0) |

Table: productivity_metrics

| Column | Type | Description |
| --- | --- | --- |
| metric_id | INT (PK) | Unique metric record |
| employee_id | INT (FK) | Reference to employees |
| month | DATE | Month of measurement |
| tasks_completed | INT | Number of tasks completed |
| projects_delivered | INT | Number of projects delivered |
| quality_score | DECIMAL(4,1) | Quality metric (0-100) |
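Before writing any SQL, it helps to pin down the weight semantics in the goals table: an employee's weighted completion is the completed weight divided by the total non-cancelled weight. A minimal pandas sketch with made-up rows (not the project dataset):

```python
import pandas as pd

# Hypothetical sample rows mirroring the goals table schema
goals = pd.DataFrame({
    'goal_id': [1, 2, 3, 4],
    'employee_id': [101, 101, 101, 102],
    'status': ['Completed', 'Overdue', 'Completed', 'Completed'],
    'weight': [0.50, 0.30, 0.20, 1.00],
})

# Weighted completion: completed weight / total weight, excluding cancelled goals
active = goals[goals['status'] != 'Cancelled']
completed_weight = active['weight'].where(active['status'] == 'Completed', 0.0)
summary = (
    active.assign(completed_weight=completed_weight)
          .groupby('employee_id')
          .agg(total_weight=('weight', 'sum'),
               completed_weight=('completed_weight', 'sum'))
)
summary['weighted_completion_pct'] = (
    summary['completed_weight'] / summary['total_weight'] * 100
).round(1)
print(summary)
# Employee 101: (0.50 + 0.20) / 1.00 = 70.0%; employee 102: 100.0%
```

The same ratio appears later as the weighted_completion_pct column in the department-level SQL.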

SQL Analysis

Query 1: Rating Distribution by Manager (Leniency Detection)

Compare the average rating each manager gives to their direct reports to identify managers who consistently rate high (leniency bias) or low (severity bias).

WITH manager_ratings AS (
    SELECT
        m.full_name AS manager_name,
        m.department,
        COUNT(DISTINCT r.employee_id) AS direct_reports_reviewed,
        ROUND(AVG(rt.score), 2) AS avg_rating_given,
        ROUND(STDDEV(rt.score), 2) AS rating_stddev,
        ROUND(
            AVG(rt.score) - AVG(AVG(rt.score)) OVER (),
            2
        ) AS deviation_from_company_avg
    FROM reviews r
    JOIN ratings rt ON r.review_id = rt.review_id
    JOIN employees m ON r.reviewer_id = m.employee_id
    WHERE r.review_period = '2025-H2'
    GROUP BY m.employee_id, m.full_name, m.department
    HAVING COUNT(DISTINCT r.employee_id) >= 3
)
SELECT
    manager_name,
    department,
    direct_reports_reviewed,
    avg_rating_given,
    rating_stddev,
    deviation_from_company_avg,
    CASE
        WHEN deviation_from_company_avg > 0.5 THEN 'Lenient'
        WHEN deviation_from_company_avg < -0.5 THEN 'Strict'
        ELSE 'Calibrated'
    END AS bias_flag
FROM manager_ratings
ORDER BY deviation_from_company_avg DESC;
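In the query above, AVG(AVG(rt.score)) OVER () first averages scores within each manager group, then averages those per-manager means across every manager that survives the HAVING filter, so each manager counts equally regardless of team size. The same leniency check can be sketched in pandas (toy numbers, not the project dataset):

```python
import pandas as pd

# Hypothetical flat export: one rating row per manager-given score
df = pd.DataFrame({
    'manager_name': ['Ana', 'Ana', 'Ben', 'Ben', 'Cy', 'Cy'],
    'score': [4.8, 4.6, 3.0, 3.2, 3.9, 4.1],
})

per_manager = df.groupby('manager_name')['score'].mean()
# Mirrors AVG(AVG(score)) OVER (): the mean of per-manager means,
# weighting every manager equally regardless of how many ratings they gave
company_avg = per_manager.mean()
deviation = (per_manager - company_avg).round(2)

# Same thresholds as the SQL CASE expression
bias = deviation.apply(
    lambda d: 'Lenient' if d > 0.5 else 'Strict' if d < -0.5 else 'Calibrated'
)
print(pd.DataFrame({'deviation': deviation, 'bias_flag': bias}))
```

With these numbers Ana's mean (4.7) sits well above the mean of manager means (about 3.93), so she is flagged Lenient, while Ben is flagged Strict.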

Query 2: Goal Completion Rates by Department

Measure how effectively each department completes assigned goals, broken down by status.

SELECT
    e.department,
    COUNT(g.goal_id) AS total_goals,
    SUM(CASE WHEN g.status = 'Completed' THEN 1 ELSE 0 END) AS completed,
    SUM(CASE WHEN g.status = 'In Progress' THEN 1 ELSE 0 END) AS in_progress,
    SUM(CASE WHEN g.status = 'Overdue' THEN 1 ELSE 0 END) AS overdue,
    ROUND(
        SUM(CASE WHEN g.status = 'Completed' THEN g.weight ELSE 0 END)
        / NULLIF(SUM(g.weight), 0) * 100, 1
    ) AS weighted_completion_pct,
    ROUND(AVG(
        CASE WHEN g.status = 'Completed'
        THEN g.completion_date - g.target_date END
    ), 1) AS avg_days_early_late  -- positive = completed late, negative = early
FROM employees e
JOIN goals g ON e.employee_id = g.employee_id
WHERE g.status != 'Cancelled'
GROUP BY e.department
ORDER BY weighted_completion_pct DESC;

Query 3: Manager Rating vs. Team Productivity

Compare manager-given ratings against objective productivity metrics to see if subjective reviews align with measurable output.

WITH emp_ratings AS (
    SELECT r.employee_id, AVG(rt.score) AS avg_score
    FROM reviews r
    JOIN ratings rt ON r.review_id = rt.review_id
    WHERE r.review_period = '2025-H2'
    GROUP BY r.employee_id
),
emp_productivity AS (
    SELECT employee_id, AVG(tasks_completed) AS avg_tasks, AVG(quality_score) AS avg_quality
    FROM productivity_metrics
    WHERE month BETWEEN '2025-07-01' AND '2025-12-31'
    GROUP BY employee_id
),
emp_goals AS (
    SELECT employee_id,
           SUM(CASE WHEN status = 'Completed' THEN 1 ELSE 0 END) * 100.0
               / NULLIF(COUNT(goal_id), 0) AS completion_pct
    FROM goals
    WHERE status != 'Cancelled'
    GROUP BY employee_id
),
-- Aggregate per employee first to avoid join fan-out: joining ratings,
-- monthly metrics, and goals in one pass multiplies rows and skews every average
team_performance AS (
    SELECT
        e.manager_id,
        ROUND(AVG(er.avg_score), 2) AS avg_manager_rating,
        ROUND(AVG(ep.avg_tasks), 1) AS avg_tasks_per_month,
        ROUND(AVG(ep.avg_quality), 1) AS avg_quality_score,
        ROUND(AVG(eg.completion_pct), 1) AS goal_completion_pct
    FROM employees e
    JOIN emp_ratings er ON e.employee_id = er.employee_id
    LEFT JOIN emp_productivity ep ON e.employee_id = ep.employee_id
    LEFT JOIN emp_goals eg ON e.employee_id = eg.employee_id
    WHERE e.manager_id IS NOT NULL
    GROUP BY e.manager_id
)
SELECT
    m.full_name AS manager_name,
    m.department,
    tp.avg_manager_rating,
    tp.avg_tasks_per_month,
    tp.avg_quality_score,
    tp.goal_completion_pct,
    ROUND(tp.avg_manager_rating - (tp.avg_quality_score / 20), 2) AS rating_vs_quality_gap
FROM team_performance tp
JOIN employees m ON tp.manager_id = m.employee_id
ORDER BY rating_vs_quality_gap DESC;
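The rating_vs_quality_gap divides quality_score by 20 to map the 0-100 quality scale onto the 0-5 rating scale, so a large positive gap means subjective ratings outrun measured quality. A quick sanity check of that rescaling in Python (made-up numbers):

```python
def rating_quality_gap(avg_manager_rating: float, avg_quality_score: float) -> float:
    """Gap between a subjective rating (1-5) and quality rescaled from 0-100 to 0-5."""
    return round(avg_manager_rating - avg_quality_score / 20, 2)

# A quality score of 90/100 rescales to 4.5, so a 4.2 rating gives a -0.3 gap
print(rating_quality_gap(4.2, 90.0))   # -> -0.3
# A 4.8 rating against 70/100 quality (3.5 rescaled) flags possible inflation
print(rating_quality_gap(4.8, 70.0))   # -> 1.3
```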

Python Analysis

Performance Rating Distribution

Visualize the distribution of performance ratings across the company and compare against an ideal bell curve.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

ratings = pd.read_csv('all_ratings.csv')
# Columns: review_id, employee_id, category, score

# Overall distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Histogram of per-review average scores
scores = ratings.groupby('review_id')['score'].mean()
axes[0].hist(scores, bins=20, density=True, alpha=0.7, color='#3498db', edgecolor='black')

# Overlay normal distribution
mu, sigma = scores.mean(), scores.std()
x = np.linspace(1, 5, 100)
axes[0].plot(x, stats.norm.pdf(x, mu, sigma), 'r--', linewidth=2, label=f'Normal (mu={mu:.2f}, sigma={sigma:.2f})')
axes[0].set_title('Rating Distribution vs. Normal Curve', fontsize=13)
axes[0].set_xlabel('Average Rating')
axes[0].set_ylabel('Density')
axes[0].legend()

# Box plot by department, ordered by median score
# (build the groups explicitly: DataFrame.boxplot(by=...) plots departments
# alphabetically, which would mislabel the reordered tick labels)
dept_scores = ratings.merge(
    pd.read_csv('employees.csv')[['employee_id', 'department']],
    on='employee_id'
)
dept_order = dept_scores.groupby('department')['score'].median().sort_values().index
box_data = [dept_scores.loc[dept_scores['department'] == d, 'score'] for d in dept_order]
axes[1].boxplot(box_data)  # matplotlib preserves the list order
axes[1].set_title('Rating Distribution by Department', fontsize=13)
axes[1].set_xlabel('')
axes[1].set_ylabel('Score')
axes[1].set_xticklabels(dept_order, rotation=45, ha='right', fontsize=9)

plt.tight_layout()
plt.savefig('performance_distribution.png', dpi=150)
plt.show()

Performance vs. Tenure and Training Correlation

Analyze whether longer tenure or more training hours correlate with higher performance ratings.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import pearsonr

df = pd.read_csv('employee_performance_summary.csv')
# Columns: employee_id, department, tenure_months, training_hours,
#           avg_rating, goal_completion_pct

fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Tenure vs Rating
axes[0].scatter(df['tenure_months'], df['avg_rating'], alpha=0.4, s=20, color='#3498db')
z = np.polyfit(df['tenure_months'], df['avg_rating'], 2)
p = np.poly1d(z)
x_line = np.linspace(df['tenure_months'].min(), df['tenure_months'].max(), 100)
axes[0].plot(x_line, p(x_line), 'r-', linewidth=2)
r_tenure, p_val = pearsonr(df['tenure_months'], df['avg_rating'])
axes[0].set_title(f'Tenure vs. Performance (r={r_tenure:.3f}, p={p_val:.4f})', fontsize=12)
axes[0].set_xlabel('Tenure (months)')
axes[0].set_ylabel('Average Rating')

# Training vs Rating
axes[1].scatter(df['training_hours'], df['avg_rating'], alpha=0.4, s=20, color='#2ecc71')
z2 = np.polyfit(df['training_hours'], df['avg_rating'], 1)
p2 = np.poly1d(z2)
x_line2 = np.linspace(df['training_hours'].min(), df['training_hours'].max(), 100)
axes[1].plot(x_line2, p2(x_line2), 'r-', linewidth=2)
r_training, p_val2 = pearsonr(df['training_hours'], df['avg_rating'])
axes[1].set_title(f'Training Hours vs. Performance (r={r_training:.3f}, p={p_val2:.4f})', fontsize=12)
axes[1].set_xlabel('Training Hours')
axes[1].set_ylabel('Average Rating')

plt.tight_layout()
plt.savefig('performance_correlations.png', dpi=150)
plt.show()

Bell Curve Calibration Analysis

Test whether each manager's rating distribution follows the expected bell curve, using a chi-squared test.

import pandas as pd
from scipy.stats import chisquare
import numpy as np

ratings = pd.read_csv('ratings_by_manager.csv')
# Columns: manager_name, department, score

# Expected distribution: 5% (1-2), 20% (2-3), 50% (3-4), 20% (4-4.5), 5% (4.5-5)
expected_pcts = np.array([0.05, 0.20, 0.50, 0.20, 0.05])
bins = [0, 2.0, 3.0, 4.0, 4.5, 5.01]
labels = ['Needs Improvement', 'Below Expectations', 'Meets Expectations',
          'Exceeds Expectations', 'Outstanding']

results = []
for manager in ratings['manager_name'].unique():
    mgr_data = ratings[ratings['manager_name'] == manager]['score']
    if len(mgr_data) < 10:
        continue
    observed = np.histogram(mgr_data, bins=bins)[0]
    expected = expected_pcts * len(mgr_data)
    # Skip managers whose expected bin counts fall below 5, where the
    # chi-squared approximation becomes unreliable
    if any(expected < 5):
        continue
    stat, p_value = chisquare(observed, f_exp=expected)
    results.append({
        'manager': manager,
        'n_ratings': len(mgr_data),
        'chi2_stat': round(stat, 2),
        'p_value': round(p_value, 4),
        'follows_bell_curve': p_value > 0.05
    })

results_df = pd.DataFrame(results).sort_values('p_value')
print(results_df.to_string(index=False))
print(f"\nManagers aligned with bell curve: "
      f"{results_df['follows_bell_curve'].sum()} / {len(results_df)}")

Excel Dashboard

Design a performance scorecard template and dashboard that HR Business Partners can use during calibration sessions.

  1. Import Data: Load SQL results into sheets: "Ratings by Manager", "Goal Completion", and "Team Productivity". Also import the Python correlation results. Format each as an Excel Table.
  2. Individual Scorecard Template: Create a template sheet with sections: Employee Info (name, department, tenure), Goal Summary (a table of goals, weights, and status with a weighted score formula such as =SUMPRODUCT(weight, --(status="Completed"))/SUM(weight)), Rating Summary (a spider/radar chart of category scores), and a composite score combining 40% goal completion + 40% average rating + 20% productivity score.
  3. Rating Distribution Chart: Build a histogram of all ratings using the Analysis ToolPak or FREQUENCY function. Overlay the expected bell curve using a secondary series. Use Conditional Formatting to highlight managers whose distributions deviate significantly (from the chi-squared results).
  4. Manager Calibration View: Create a PivotTable with Managers as rows, Rating Bins (1-2, 2-3, 3-4, 4-5) as columns, and Count of ratings as values. Add a PivotChart (stacked bar). This lets HR see at a glance which managers skew high or low.
  5. Goal Tracker: Build a department-level goal tracker with columns: Department, Total Goals, Completed, Overdue, Weighted %. Add sparklines showing monthly completion trends. Use Data Bars on the Weighted % column for quick visual comparison.
  6. Correlation Dashboard: Create scatter charts for Tenure vs. Rating and Training vs. Rating. Add linear trendlines with R-squared values displayed. Place these alongside a summary table showing correlation coefficients.
  7. Bias Detection Alert: Create a summary table of managers flagged as Lenient or Strict. Use Conditional Formatting with custom formulas (e.g., =$F2="Lenient" to highlight rows in orange and =$F2="Strict" in blue, where column F holds the bias flag). Add a comment column where the HRBP can note calibration actions.
  8. Dashboard Tab: Combine KPI cards (Company Avg Rating, Goal Completion %, Training Hours/Employee, Managers Needing Calibration) with slicers for Department and Review Period. Include a leaderboard table of top 10 and bottom 10 performers by composite score.
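The composite score from step 2 can be prototyped before wiring it into the template. A minimal sketch, assuming goal completion and quality are 0-100 scales and the 1-5 rating is rescaled to 0-100 (the exact normalization is a design choice the brief leaves open):

```python
def composite_score(goal_completion_pct: float, avg_rating: float,
                    quality_score: float) -> float:
    """Blend: 40% goals + 40% rating + 20% productivity.

    goal_completion_pct and quality_score are assumed 0-100;
    avg_rating is on the 1.0-5.0 scale and is rescaled to 0-100.
    """
    rating_pct = (avg_rating - 1.0) / 4.0 * 100  # 1.0 -> 0, 5.0 -> 100
    return round(0.4 * goal_completion_pct + 0.4 * rating_pct
                 + 0.2 * quality_score, 1)

# Example: strong goal delivery, solid rating, decent quality metric
print(composite_score(goal_completion_pct=85.0, avg_rating=4.0,
                      quality_score=78.0))   # -> 79.6
```

Rescaling the rating before blending keeps all three components on a common 0-100 scale, so the 40/40/20 weights mean what they say.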

Key Insights

Manager Bias

34% of managers rate their teams an average of 0.5+ points above the company mean. These lenient ratings dilute the value of performance reviews and create equity concerns when tied to compensation.

Training Impact

Employees with 40+ training hours score 0.4 points higher on average (r=0.31, p<0.001). The correlation is strongest for employees in their first 2 years, suggesting training has the greatest ROI during onboarding.

Goal Disconnect

While 78% of goals are marked "Completed", only 61% were completed by the target date. The Operations department has the highest weighted completion at 89%, while Marketing lags at 64%.

Rating Inflation

The company-wide rating mean is 3.8 out of 5.0, with only 3% of employees rated below 3.0. The distribution is left-skewed rather than following the expected bell curve, suggesting systemic leniency.
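A left-skewed (negatively skewed) distribution can be confirmed numerically: when most ratings cluster near the top with a thin tail of low scores, the sample skewness is negative. A quick illustration with made-up counts (not the project data):

```python
import numpy as np
from scipy.stats import skew

# Illustrative inflated ratings: 50 employees at 4.5, 30 at 4.0,
# 15 at 3.5, and only 5 low scores at 2.5
scores = np.repeat([4.5, 4.0, 3.5, 2.5], [50, 30, 15, 5])

print(f"mean = {scores.mean():.2f}")       # mean = 4.10
print(f"skewness = {skew(scores):.2f}")    # negative, i.e. left-skewed
```

The negative skewness reflects the long tail pointing toward low scores, exactly the shape rating inflation produces.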

Knowledge Check

Question 1

What does a chi-squared test measure in the context of performance rating calibration?

Question 2

In the composite performance score formula (40% goal + 40% rating + 20% productivity), why might goal completion and rating receive equal weight?

Question 3

The SQL leniency detection query uses a window function AVG(AVG(rt.score)) OVER (). What does this compute?

Question 4

Why is a radar (spider) chart appropriate for displaying an individual employee's category ratings?

Question 5

The weighted goal completion formula SUM(weight * completed) / SUM(weight) is preferred over a simple count because:
