Employee Performance Scorecard
Build a data-driven performance management system that tracks goal completion, analyzes rating distributions, and identifies the relationship between performance, tenure, and training.
Project Overview
Scenario: You are working with the HR Business Partner team at a healthcare company (3,200 employees). The current performance review process has been criticized as subjective and inconsistent. Some managers rate everyone highly while others maintain strict distributions. HR wants to move toward a data-informed approach that combines goal completion metrics with manager reviews and productivity data.
Objective: Analyze performance review data to detect rating inflation and leniency bias, correlate performance with tenure and training participation, and design a scorecard framework that standardizes evaluation across departments.
Tools: SQL (PostgreSQL), Python (pandas, matplotlib, scipy), Microsoft Excel
Dataset Description
Table: employees
| Column | Type | Description |
|---|---|---|
| employee_id | INT (PK) | Unique employee identifier |
| full_name | VARCHAR(100) | Employee full name |
| department | VARCHAR(50) | Department name |
| role_title | VARCHAR(100) | Job title |
| hire_date | DATE | Date of hire |
| manager_id | INT (FK) | Direct manager employee_id |
| training_hours | INT | Total training hours completed |
Table: goals
| Column | Type | Description |
|---|---|---|
| goal_id | INT (PK) | Unique goal identifier |
| employee_id | INT (FK) | Reference to employees |
| goal_description | TEXT | Description of the goal |
| target_date | DATE | Goal deadline |
| completion_date | DATE | Actual completion date (NULL if incomplete) |
| status | VARCHAR(20) | Completed, In Progress, Overdue, Cancelled |
| weight | DECIMAL(3,2) | Goal weight (0.00 - 1.00) |
Table: reviews
| Column | Type | Description |
|---|---|---|
| review_id | INT (PK) | Unique review identifier |
| employee_id | INT (FK) | Employee being reviewed |
| reviewer_id | INT (FK) | Manager conducting review |
| review_period | VARCHAR(10) | Review period (e.g., 2025-H1) |
| review_date | DATE | Date review was submitted |
Table: ratings
| Column | Type | Description |
|---|---|---|
| rating_id | INT (PK) | Unique rating identifier |
| review_id | INT (FK) | Reference to reviews |
| category | VARCHAR(50) | Rating category (Technical, Communication, Leadership, etc.) |
| score | DECIMAL(2,1) | Score (1.0-5.0) |
Table: productivity_metrics
| Column | Type | Description |
|---|---|---|
| metric_id | INT (PK) | Unique metric record |
| employee_id | INT (FK) | Reference to employees |
| month | DATE | Month of measurement |
| tasks_completed | INT | Number of tasks completed |
| projects_delivered | INT | Number of projects delivered |
| quality_score | DECIMAL(4,1) | Quality metric (0-100) |
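The tables connect through two keys: ratings attach to employees via reviews (review_id), while goals and productivity_metrics reference employee_id directly. A minimal pandas sketch of these relationships (the CSV file names are assumptions; any export of the tables above works the same way):
import pandas as pd
# Load the five tables (illustrative file names)
employees = pd.read_csv('employees.csv')
goals = pd.read_csv('goals.csv')
reviews = pd.read_csv('reviews.csv')
ratings = pd.read_csv('ratings.csv')
productivity = pd.read_csv('productivity_metrics.csv')
# Ratings reach employees through reviews: ratings.review_id -> reviews.review_id -> reviews.employee_id
employee_ratings = (ratings.merge(reviews[['review_id', 'employee_id', 'review_period']], on='review_id')
                    .merge(employees[['employee_id', 'department', 'manager_id']], on='employee_id'))
# Goals and monthly productivity attach directly by employee_id
employee_goals = goals.merge(employees[['employee_id', 'department']], on='employee_id')
employee_productivity = productivity.merge(employees[['employee_id', 'department']], on='employee_id')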
SQL Analysis
Query 1: Rating Distribution by Manager (Leniency Detection)
Compare the average rating each manager gives to their direct reports to identify managers who consistently rate high (leniency bias) or low (severity bias).
WITH manager_ratings AS (
SELECT
m.full_name AS manager_name,
m.department,
COUNT(DISTINCT r.employee_id) AS direct_reports_reviewed,
ROUND(AVG(rt.score), 2) AS avg_rating_given,
ROUND(STDDEV(rt.score), 2) AS rating_stddev,
ROUND(
AVG(rt.score) - AVG(AVG(rt.score)) OVER (),
2
) AS deviation_from_company_avg
FROM reviews r
JOIN ratings rt ON r.review_id = rt.review_id
JOIN employees m ON r.reviewer_id = m.employee_id
WHERE r.review_period = '2025-H2'
GROUP BY m.employee_id, m.full_name, m.department
HAVING COUNT(DISTINCT r.employee_id) >= 3
)
SELECT
manager_name,
department,
direct_reports_reviewed,
avg_rating_given,
rating_stddev,
deviation_from_company_avg,
CASE
WHEN deviation_from_company_avg > 0.5 THEN 'Lenient'
WHEN deviation_from_company_avg < -0.5 THEN 'Strict'
ELSE 'Calibrated'
END AS bias_flag
FROM manager_ratings
ORDER BY deviation_from_company_avg DESC;
Query 2: Goal Completion Rates by Department
Measure how effectively each department completes assigned goals, broken down by status. The final column shows how many days, on average, completed goals landed after (positive) or before (negative) their target date.
SELECT
e.department,
COUNT(g.goal_id) AS total_goals,
SUM(CASE WHEN g.status = 'Completed' THEN 1 ELSE 0 END) AS completed,
SUM(CASE WHEN g.status = 'In Progress' THEN 1 ELSE 0 END) AS in_progress,
SUM(CASE WHEN g.status = 'Overdue' THEN 1 ELSE 0 END) AS overdue,
ROUND(
SUM(CASE WHEN g.status = 'Completed' THEN g.weight ELSE 0 END)
/ NULLIF(SUM(g.weight), 0) * 100, 1
) AS weighted_completion_pct,
ROUND(AVG(
CASE WHEN g.status = 'Completed'
THEN g.completion_date - g.target_date END
), 1) AS avg_days_early_late
FROM employees e
JOIN goals g ON e.employee_id = g.employee_id
WHERE g.status != 'Cancelled'
GROUP BY e.department
ORDER BY weighted_completion_pct DESC;
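Weighting matters because finishing one high-weight goal counts for more than several minor ones. A quick illustration with hypothetical numbers (Python):
# One employee, two goals: a major goal (weight 0.8) still open, a minor goal (weight 0.2) completed
weights = [0.8, 0.2]
completed = [0, 1]  # 1 = Completed
simple_pct = sum(completed) / len(completed) * 100                                   # 50.0%
weighted_pct = sum(w * c for w, c in zip(weights, completed)) / sum(weights) * 100    # 20.0%
print(f"Simple completion: {simple_pct:.0f}%  |  Weighted completion: {weighted_pct:.0f}%")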
Query 3: Manager Rating vs. Team Productivity
Compare manager-given ratings against objective productivity metrics to see whether subjective reviews align with measurable output. The 0-100 quality score is divided by 20 to put it on the same 1-5 scale as the ratings, so rating_vs_quality_gap shows how far above (or below) measured quality each manager rates their team.
WITH emp_ratings AS (
    SELECT r.employee_id, AVG(rt.score) AS avg_rating
    FROM reviews r
    JOIN ratings rt ON r.review_id = rt.review_id
    WHERE r.review_period = '2025-H2'
    GROUP BY r.employee_id
),
emp_productivity AS (
    SELECT employee_id,
           AVG(tasks_completed) AS avg_tasks,
           AVG(quality_score) AS avg_quality
    FROM productivity_metrics
    WHERE month BETWEEN '2025-07-01' AND '2025-12-31'
    GROUP BY employee_id
),
emp_goals AS (
    SELECT employee_id,
           SUM(CASE WHEN status = 'Completed' THEN 1 ELSE 0 END) AS completed_goals,
           COUNT(*) AS total_goals
    FROM goals
    GROUP BY employee_id
),
-- Aggregate per employee first so ratings, productivity months, and goals are each counted once
team_performance AS (
    SELECT
        e.manager_id,
        ROUND(AVG(er.avg_rating), 2) AS avg_manager_rating,
        ROUND(AVG(ep.avg_tasks), 1) AS avg_tasks_per_month,
        ROUND(AVG(ep.avg_quality), 1) AS avg_quality_score,
        ROUND(SUM(eg.completed_goals) * 100.0
              / NULLIF(SUM(eg.total_goals), 0), 1) AS goal_completion_pct
    FROM employees e
    JOIN emp_ratings er ON e.employee_id = er.employee_id
    LEFT JOIN emp_productivity ep ON e.employee_id = ep.employee_id
    LEFT JOIN emp_goals eg ON e.employee_id = eg.employee_id
    WHERE e.manager_id IS NOT NULL
    GROUP BY e.manager_id
)
SELECT
m.full_name AS manager_name,
m.department,
tp.avg_manager_rating,
tp.avg_tasks_per_month,
tp.avg_quality_score,
tp.goal_completion_pct,
ROUND(tp.avg_manager_rating - (tp.avg_quality_score / 20), 2) AS rating_vs_quality_gap
FROM team_performance tp
JOIN employees m ON tp.manager_id = m.employee_id
ORDER BY rating_vs_quality_gap DESC;
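To quantify how closely ratings track output rather than scanning the gap column, the query result can be correlated directly. A minimal sketch, assuming the output above is exported as team_performance.csv with the same column aliases:
import pandas as pd
from scipy.stats import pearsonr
# Query 3 output exported to CSV (file name is an assumption)
team = pd.read_csv('team_performance.csv').dropna(subset=['avg_manager_rating', 'avg_quality_score'])
# Correlation between subjective manager ratings and objective quality scores across teams
r, p = pearsonr(team['avg_manager_rating'], team['avg_quality_score'])
print(f"Manager rating vs. team quality: r={r:.3f}, p={p:.4f}")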
Python Analysis
Performance Rating Distribution
Visualize the distribution of performance ratings across the company and compare against an ideal bell curve.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
ratings = pd.read_csv('all_ratings.csv')
# Columns: review_id, employee_id, category, score, department
# Overall distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Histogram of per-review average scores with a fitted normal curve overlaid
scores = ratings.groupby('review_id')['score'].mean()
axes[0].hist(scores, bins=20, density=True, alpha=0.7, color='#3498db', edgecolor='black')
# Overlay normal distribution
mu, sigma = scores.mean(), scores.std()
x = np.linspace(1, 5, 100)
axes[0].plot(x, stats.norm.pdf(x, mu, sigma), 'r--', linewidth=2, label=f'Normal (mu={mu:.2f}, sigma={sigma:.2f})')
axes[0].set_title('Rating Distribution vs. Normal Curve', fontsize=13)
axes[0].set_xlabel('Average Rating')
axes[0].set_ylabel('Density')
axes[0].legend()
# Box plot by department
dept_scores = ratings.merge(
pd.read_csv('employees.csv')[['employee_id', 'department']],
on='employee_id'
)
dept_order = dept_scores.groupby('department')['score'].median().sort_values().index
dept_scores.boxplot(column='score', by='department', ax=axes[1],
positions=range(len(dept_order)))
axes[1].set_title('Rating Distribution by Department', fontsize=13)
axes[1].set_xlabel('')
axes[1].set_ylabel('Score')
axes[1].set_xticklabels(dept_order, rotation=45, ha='right', fontsize=9)
plt.suptitle('')
plt.tight_layout()
plt.savefig('performance_distribution.png', dpi=150)
plt.show()
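The visual comparison can be backed by numbers. A short follow-up sketch that quantifies skew and tests normality on the same per-review averages (scipy's D'Agostino-Pearson test is one reasonable choice):
import pandas as pd
from scipy import stats
ratings = pd.read_csv('all_ratings.csv')
scores = ratings.groupby('review_id')['score'].mean()  # same per-review averages as above
# Negative skew means ratings pile up at the high end of the scale
print(f"Skewness: {stats.skew(scores):.2f}")
stat, p_norm = stats.normaltest(scores)  # D'Agostino-Pearson K^2 test of normality
print(f"Normality test: statistic={stat:.2f}, p={p_norm:.4f} "
      f"({'departs from normal' if p_norm < 0.05 else 'consistent with normal'})")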
Performance vs. Tenure and Training Correlation
Analyze whether longer tenure or more training hours correlate with higher performance ratings.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import pearsonr
df = pd.read_csv('employee_performance_summary.csv')
# Columns: employee_id, department, tenure_months, training_hours,
# avg_rating, goal_completion_pct
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Tenure vs Rating
axes[0].scatter(df['tenure_months'], df['avg_rating'], alpha=0.4, s=20, color='#3498db')
z = np.polyfit(df['tenure_months'], df['avg_rating'], 2)
p = np.poly1d(z)
x_line = np.linspace(df['tenure_months'].min(), df['tenure_months'].max(), 100)
axes[0].plot(x_line, p(x_line), 'r-', linewidth=2)
r_tenure, p_val = pearsonr(df['tenure_months'], df['avg_rating'])
axes[0].set_title(f'Tenure vs. Performance (r={r_tenure:.3f}, p={p_val:.4f})', fontsize=12)
axes[0].set_xlabel('Tenure (months)')
axes[0].set_ylabel('Average Rating')
# Training vs Rating
axes[1].scatter(df['training_hours'], df['avg_rating'], alpha=0.4, s=20, color='#2ecc71')
z2 = np.polyfit(df['training_hours'], df['avg_rating'], 1)
p2 = np.poly1d(z2)
x_line2 = np.linspace(df['training_hours'].min(), df['training_hours'].max(), 100)
axes[1].plot(x_line2, p2(x_line2), 'r-', linewidth=2)
r_training, p_val2 = pearsonr(df['training_hours'], df['avg_rating'])
axes[1].set_title(f'Training Hours vs. Performance (r={r_training:.3f}, p={p_val2:.4f})', fontsize=12)
axes[1].set_xlabel('Training Hours')
axes[1].set_ylabel('Average Rating')
plt.tight_layout()
plt.savefig('performance_correlations.png', dpi=150)
plt.show()
Bell Curve Calibration Analysis
Test whether each manager's rating distribution follows the expected bell curve, using a chi-squared test.
import pandas as pd
from scipy.stats import chisquare
import numpy as np
ratings = pd.read_csv('ratings_by_manager.csv')
# Columns: manager_name, department, score
# Expected distribution: 5% (1-2), 20% (2-3), 50% (3-4), 20% (4-4.5), 5% (4.5-5)
expected_pcts = np.array([0.05, 0.20, 0.50, 0.20, 0.05])
bins = [0, 2.0, 3.0, 4.0, 4.5, 5.01]
labels = ['Needs Improvement', 'Below Expectations', 'Meets Expectations',
'Exceeds Expectations', 'Outstanding']
results = []
for manager in ratings['manager_name'].unique():
    mgr_data = ratings.loc[ratings['manager_name'] == manager, 'score']
    if len(mgr_data) < 10:
        continue
    observed = np.histogram(mgr_data, bins=bins)[0]
    expected = expected_pcts * len(mgr_data)
    # Skip managers whose expected cell counts are too small for a valid chi-squared test
    if (expected < 5).any():
        continue
    stat, p_value = chisquare(observed, f_exp=expected)
    results.append({
        'manager': manager,
        'n_ratings': len(mgr_data),
        'chi2_stat': round(stat, 2),
        'p_value': round(p_value, 4),
        'follows_bell_curve': p_value > 0.05
    })
results_df = pd.DataFrame(results).sort_values('p_value')
print(results_df.to_string(index=False))
print(f"\nManagers aligned with bell curve: "
f"{results_df['follows_bell_curve'].sum()} / {len(results_df)}")
Excel Dashboard
Design a performance scorecard template and dashboard that HR Business Partners can use during calibration sessions.
- Import Data: Load SQL results into sheets: "Ratings by Manager", "Goal Completion", and "Team Productivity". Also import the Python correlation results. Format each as an Excel Table.
- Individual Scorecard Template: Create a template sheet with sections: Employee Info (name, department, tenure), Goal Summary (a table of goals, weights, and status, with the weighted score formula =SUMPRODUCT(weight, --(status="Completed"))/SUM(weight)), Rating Summary (a spider/radar chart of category scores), and a composite score combining 40% goal completion + 40% average rating + 20% productivity score (prototyped in the Python sketch after this list).
- Rating Distribution Chart: Build a histogram of all ratings using the Analysis ToolPak or the FREQUENCY function. Overlay the expected bell curve as a secondary series. Use Conditional Formatting to highlight managers whose distributions deviate significantly (from the chi-squared results).
- Manager Calibration View: Create a PivotTable with Managers as rows, Rating Bins (1-2, 2-3, 3-4, 4-5) as columns, and Count of ratings as values. Add a PivotChart (stacked bar). This lets HR see at a glance which managers skew high or low.
- Goal Tracker: Build a department-level goal tracker with columns: Department, Total Goals, Completed, Overdue, Weighted %. Add sparklines showing monthly completion trends. Use Data Bars on the Weighted % column for quick visual comparison.
- Correlation Dashboard: Create scatter charts for Tenure vs. Rating and Training vs. Rating. Add linear trendlines with R-squared values displayed. Place these alongside a summary table showing correlation coefficients.
- Bias Detection Alert: Create a summary table of managers flagged as Lenient or Strict. Use Conditional Formatting with custom formulas: =bias_flag="Lenient" to highlight rows in orange and =bias_flag="Strict" to highlight them in blue. Add a comments column where the HRBP can note calibration actions.
- Dashboard Tab: Combine KPI cards (Company Avg Rating, Goal Completion %, Training Hours per Employee, Managers Needing Calibration) with slicers for Department and Review Period. Include a leaderboard table of the top 10 and bottom 10 performers by composite score.
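The 40/40/20 composite score used in the scorecard template can be prototyped outside Excel to sanity-check the weights. A minimal Python sketch, assuming a per-employee summary with goal_completion_pct (0-100), avg_rating (1-5), and a 0-100 quality_score column (the quality_score column is an assumption; the earlier summary file listed only the first two):
import pandas as pd
df = pd.read_csv('employee_performance_summary.csv')
# Normalize every component to a 0-100 scale before blending
goal_component = df['goal_completion_pct']                 # already 0-100
rating_component = (df['avg_rating'] - 1) / 4 * 100        # map 1-5 onto 0-100
productivity_component = df['quality_score']               # assumed 0-100 quality metric
df['composite_score'] = (0.40 * goal_component
                         + 0.40 * rating_component
                         + 0.20 * productivity_component).round(1)
# Leaderboard preview for the Dashboard tab
print(df.nlargest(10, 'composite_score')[['employee_id', 'composite_score']])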
Key Insights
Manager Bias
34% of managers rate their teams an average of 0.5+ points above the company mean. These lenient ratings dilute the value of performance reviews and create equity concerns when tied to compensation.
Training Impact
Employees with 40+ training hours score 0.4 points higher on average (r=0.31, p<0.001). The correlation is strongest for employees in their first 2 years, suggesting training has the greatest ROI during onboarding.
Goal Disconnect
While 78% of goals are marked "Completed", only 61% were completed by the target date. The Operations department has the highest weighted completion at 89%, while Marketing lags at 64%.
Rating Inflation
The company-wide rating mean is 3.8 out of 5.0, with only 3% of employees rated below 3.0. Rather than following the expected bell curve, the distribution is left-skewed, with ratings piled up at the top of the scale, which points to systemic leniency.
Knowledge Check
What does a chi-squared test measure in the context of performance rating calibration?
In the composite performance score formula (40% goal + 40% rating + 20% productivity), why might goal completion and rating receive equal weight?
The SQL leniency detection query uses a window function AVG(AVG(rt.score)) OVER (). What does this compute?
Why is a radar (spider) chart appropriate for displaying an individual employee's category ratings?
Why is the weighted goal completion formula SUM(weight * completed) / SUM(weight) preferred over a simple count of completed goals?