Employee Performance Scorecard
Build a data-driven performance management system that tracks goal completion, analyzes rating distributions, and identifies the relationship between performance, tenure, and training.
Project Overview
Scenario: You are working with the HR Business Partner team at a healthcare company (3,200 employees). The current performance review process has been criticized as subjective and inconsistent. Some managers rate everyone highly while others maintain strict distributions. HR wants to move toward a data-informed approach that combines goal completion metrics with manager reviews and productivity data.
Objective: Analyze performance review data to detect rating inflation and leniency bias, correlate performance with tenure and training participation, and design a scorecard framework that standardizes evaluation across departments.
Tools: SQL (PostgreSQL), Python (pandas, matplotlib, scipy), Microsoft Excel
Dataset Description
Table: employees
| Column | Type | Description |
|---|---|---|
| employee_id | INT (PK) | Unique employee identifier |
| full_name | VARCHAR(100) | Employee full name |
| department | VARCHAR(50) | Department name |
| role_title | VARCHAR(100) | Job title |
| hire_date | DATE | Date of hire |
| manager_id | INT (FK) | Direct manager employee_id |
| training_hours | INT | Total training hours completed |
Table: goals
| Column | Type | Description |
|---|---|---|
| goal_id | INT (PK) | Unique goal identifier |
| employee_id | INT (FK) | Reference to employees |
| goal_description | TEXT | Description of the goal |
| target_date | DATE | Goal deadline |
| completion_date | DATE | Actual completion date (NULL if incomplete) |
| status | VARCHAR(20) | Completed, In Progress, Overdue, Cancelled |
| weight | DECIMAL(3,2) | Goal weight (0.00 - 1.00) |
Table: reviews
| Column | Type | Description |
|---|---|---|
| review_id | INT (PK) | Unique review identifier |
| employee_id | INT (FK) | Employee being reviewed |
| reviewer_id | INT (FK) | Manager conducting review |
| review_period | VARCHAR(10) | Review period (e.g., 2025-H1) |
| review_date | DATE | Date review was submitted |
Table: ratings
| Column | Type | Description |
|---|---|---|
| rating_id | INT (PK) | Unique rating identifier |
| review_id | INT (FK) | Reference to reviews |
| category | VARCHAR(50) | Rating category (Technical, Communication, Leadership, etc.) |
| score | DECIMAL(2,1) | Score (1.0-5.0) |
Table: productivity_metrics
| Column | Type | Description |
|---|---|---|
| metric_id | INT (PK) | Unique metric record |
| employee_id | INT (FK) | Reference to employees |
| month | DATE | Month of measurement |
| tasks_completed | INT | Number of tasks completed |
| projects_delivered | INT | Number of projects delivered |
| quality_score | DECIMAL(4,1) | Quality metric (0-100) |
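The tables connect through two keys: ratings attach to employees via reviews (review_id), while goals and productivity_metrics reference employee_id directly. A minimal pandas sketch of these relationships (the CSV file names are assumptions; any export of the tables above works the same way):
import pandas as pd
# Load the five tables (illustrative file names)
employees = pd.read_csv('employees.csv')
goals = pd.read_csv('goals.csv')
reviews = pd.read_csv('reviews.csv')
ratings = pd.read_csv('ratings.csv')
productivity = pd.read_csv('productivity_metrics.csv')
# Ratings reach employees through reviews: ratings.review_id -> reviews.review_id -> reviews.employee_id
employee_ratings = (ratings.merge(reviews[['review_id', 'employee_id', 'review_period']], on='review_id')
                    .merge(employees[['employee_id', 'department', 'manager_id']], on='employee_id'))
# Goals and monthly productivity attach directly by employee_id
employee_goals = goals.merge(employees[['employee_id', 'department']], on='employee_id')
employee_productivity = productivity.merge(employees[['employee_id', 'department']], on='employee_id')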
SQL Analysis
Query 1: Rating Distribution by Manager (Leniency Detection)
Compare the average rating each manager gives to their direct reports to identify managers who consistently rate high (leniency bias) or low (severity bias).
WITH manager_ratings AS (
SELECT
m.full_name AS manager_name,
m.department,
COUNT(DISTINCT r.employee_id) AS direct_reports_reviewed,
ROUND(AVG(rt.score), 2) AS avg_rating_given,
ROUND(STDDEV(rt.score), 2) AS rating_stddev,
ROUND(
AVG(rt.score) - AVG(AVG(rt.score)) OVER (),
2
) AS deviation_from_company_avg
FROM reviews r
JOIN ratings rt ON r.review_id = rt.review_id
JOIN employees m ON r.reviewer_id = m.employee_id
WHERE r.review_period = '2025-H2'
GROUP BY m.employee_id, m.full_name, m.department
HAVING COUNT(DISTINCT r.employee_id) >= 3
)
SELECT
manager_name,
department,
direct_reports_reviewed,
avg_rating_given,
rating_stddev,
deviation_from_company_avg,
CASE
WHEN deviation_from_company_avg > 0.5 THEN 'Lenient'
WHEN deviation_from_company_avg < -0.5 THEN 'Strict'
ELSE 'Calibrated'
END AS bias_flag
FROM manager_ratings
ORDER BY deviation_from_company_avg DESC;
Query 2: Goal Completion Rates by Department
Measure how effectively each department completes assigned goals, broken down by status. The final column shows how many days, on average, completed goals landed after (positive) or before (negative) their target date.
SELECT
e.department,
COUNT(g.goal_id) AS total_goals,
SUM(CASE WHEN g.status = 'Completed' THEN 1 ELSE 0 END) AS completed,
SUM(CASE WHEN g.status = 'In Progress' THEN 1 ELSE 0 END) AS in_progress,
SUM(CASE WHEN g.status = 'Overdue' THEN 1 ELSE 0 END) AS overdue,
ROUND(
SUM(CASE WHEN g.status = 'Completed' THEN g.weight ELSE 0 END)
/ NULLIF(SUM(g.weight), 0) * 100, 1
) AS weighted_completion_pct,
ROUND(AVG(
CASE WHEN g.status = 'Completed'
THEN g.completion_date - g.target_date END
), 1) AS avg_days_early_late
FROM employees e
JOIN goals g ON e.employee_id = g.employee_id
WHERE g.status != 'Cancelled'
GROUP BY e.department
ORDER BY weighted_completion_pct DESC;
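Weighting matters because finishing one high-weight goal counts for more than several minor ones. A quick illustration with hypothetical numbers (Python):
# One employee, two goals: a major goal (weight 0.8) still open, a minor goal (weight 0.2) completed
weights = [0.8, 0.2]
completed = [0, 1]  # 1 = Completed
simple_pct = sum(completed) / len(completed) * 100                                   # 50.0%
weighted_pct = sum(w * c for w, c in zip(weights, completed)) / sum(weights) * 100    # 20.0%
print(f"Simple completion: {simple_pct:.0f}%  |  Weighted completion: {weighted_pct:.0f}%")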
Query 3: Manager Rating vs. Team Productivity
Compare manager-given ratings against objective productivity metrics to see whether subjective reviews align with measurable output. The 0-100 quality score is divided by 20 to put it on the same 1-5 scale as the ratings, so rating_vs_quality_gap shows how far above (or below) measured quality each manager rates their team.
WITH emp_ratings AS (
    SELECT r.employee_id, AVG(rt.score) AS avg_rating
    FROM reviews r
    JOIN ratings rt ON r.review_id = rt.review_id
    WHERE r.review_period = '2025-H2'
    GROUP BY r.employee_id
),
emp_productivity AS (
    SELECT employee_id,
           AVG(tasks_completed) AS avg_tasks,
           AVG(quality_score) AS avg_quality
    FROM productivity_metrics
    WHERE month BETWEEN '2025-07-01' AND '2025-12-31'
    GROUP BY employee_id
),
emp_goals AS (
    SELECT employee_id,
           SUM(CASE WHEN status = 'Completed' THEN 1 ELSE 0 END) AS completed_goals,
           COUNT(*) AS total_goals
    FROM goals
    GROUP BY employee_id
),
-- Aggregate per employee first so ratings, productivity months, and goals are each counted once
team_performance AS (
    SELECT
        e.manager_id,
        ROUND(AVG(er.avg_rating), 2) AS avg_manager_rating,
        ROUND(AVG(ep.avg_tasks), 1) AS avg_tasks_per_month,
        ROUND(AVG(ep.avg_quality), 1) AS avg_quality_score,
        ROUND(SUM(eg.completed_goals) * 100.0
              / NULLIF(SUM(eg.total_goals), 0), 1) AS goal_completion_pct
    FROM employees e
    JOIN emp_ratings er ON e.employee_id = er.employee_id
    LEFT JOIN emp_productivity ep ON e.employee_id = ep.employee_id
    LEFT JOIN emp_goals eg ON e.employee_id = eg.employee_id
    WHERE e.manager_id IS NOT NULL
    GROUP BY e.manager_id
)
SELECT
m.full_name AS manager_name,
m.department,
tp.avg_manager_rating,
tp.avg_tasks_per_month,
tp.avg_quality_score,
tp.goal_completion_pct,
ROUND(tp.avg_manager_rating - (tp.avg_quality_score / 20), 2) AS rating_vs_quality_gap
FROM team_performance tp
JOIN employees m ON tp.manager_id = m.employee_id
ORDER BY rating_vs_quality_gap DESC;
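To quantify how closely ratings track output rather than scanning the gap column, the query result can be correlated directly. A minimal sketch, assuming the output above is exported as team_performance.csv with the same column aliases:
import pandas as pd
from scipy.stats import pearsonr
# Query 3 output exported to CSV (file name is an assumption)
team = pd.read_csv('team_performance.csv').dropna(subset=['avg_manager_rating', 'avg_quality_score'])
# Correlation between subjective manager ratings and objective quality scores across teams
r, p = pearsonr(team['avg_manager_rating'], team['avg_quality_score'])
print(f"Manager rating vs. team quality: r={r:.3f}, p={p:.4f}")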
Python Analysis
Performance Rating Distribution
Visualize the distribution of performance ratings across the company and compare against an ideal bell curve.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
ratings = pd.read_csv('all_ratings.csv')
# Columns: review_id, employee_id, category, score, department
# Overall distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Histogram of per-review average scores with a fitted normal curve overlaid
scores = ratings.groupby('review_id')['score'].mean()
axes[0].hist(scores, bins=20, density=True, alpha=0.7, color='#3498db', edgecolor='black')
# Overlay normal distribution
mu, sigma = scores.mean(), scores.std()
x = np.linspace(1, 5, 100)
axes[0].plot(x, stats.norm.pdf(x, mu, sigma), 'r--', linewidth=2, label=f'Normal (mu={mu:.2f}, sigma={sigma:.2f})')
axes[0].set_title('Rating Distribution vs. Normal Curve', fontsize=13)
axes[0].set_xlabel('Average Rating')
axes[0].set_ylabel('Density')
axes[0].legend()
# Box plot by department
dept_scores = ratings.merge(
pd.read_csv('employees.csv')[['employee_id', 'department']],
on='employee_id'
)
dept_order = dept_scores.groupby('department')['score'].median().sort_values().index
dept_scores.boxplot(column='score', by='department', ax=axes[1],
positions=range(len(dept_order)))
axes[1].set_title('Rating Distribution by Department', fontsize=13)
axes[1].set_xlabel('')
axes[1].set_ylabel('Score')
axes[1].set_xticklabels(dept_order, rotation=45, ha='right', fontsize=9)
plt.suptitle('')
plt.tight_layout()
plt.savefig('performance_distribution.png', dpi=150)
plt.show()
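The visual comparison can be backed by numbers. A short follow-up sketch that quantifies skew and tests normality on the same per-review averages (scipy's D'Agostino-Pearson test is one reasonable choice):
import pandas as pd
from scipy import stats
ratings = pd.read_csv('all_ratings.csv')
scores = ratings.groupby('review_id')['score'].mean()  # same per-review averages as above
# Negative skew means ratings pile up at the high end of the scale
print(f"Skewness: {stats.skew(scores):.2f}")
stat, p_norm = stats.normaltest(scores)  # D'Agostino-Pearson K^2 test of normality
print(f"Normality test: statistic={stat:.2f}, p={p_norm:.4f} "
      f"({'departs from normal' if p_norm < 0.05 else 'consistent with normal'})")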
Performance vs. Tenure and Training Correlation
Analyze whether longer tenure or more training hours correlate with higher performance ratings.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import pearsonr
df = pd.read_csv('employee_performance_summary.csv')
# Columns: employee_id, department, tenure_months, training_hours,
# avg_rating, goal_completion_pct
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Tenure vs Rating
axes[0].scatter(df['tenure_months'], df['avg_rating'], alpha=0.4, s=20, color='#3498db')
z = np.polyfit(df['tenure_months'], df['avg_rating'], 2)
p = np.poly1d(z)
x_line = np.linspace(df['tenure_months'].min(), df['tenure_months'].max(), 100)
axes[0].plot(x_line, p(x_line), 'r-', linewidth=2)
r_tenure, p_val = pearsonr(df['tenure_months'], df['avg_rating'])
axes[0].set_title(f'Tenure vs. Performance (r={r_tenure:.3f}, p={p_val:.4f})', fontsize=12)
axes[0].set_xlabel('Tenure (months)')
axes[0].set_ylabel('Average Rating')
# Training vs Rating
axes[1].scatter(df['training_hours'], df['avg_rating'], alpha=0.4, s=20, color='#2ecc71')
z2 = np.polyfit(df['training_hours'], df['avg_rating'], 1)
p2 = np.poly1d(z2)
x_line2 = np.linspace(df['training_hours'].min(), df['training_hours'].max(), 100)
axes[1].plot(x_line2, p2(x_line2), 'r-', linewidth=2)
r_training, p_val2 = pearsonr(df['training_hours'], df['avg_rating'])
axes[1].set_title(f'Training Hours vs. Performance (r={r_training:.3f}, p={p_val2:.4f})', fontsize=12)
axes[1].set_xlabel('Training Hours')
axes[1].set_ylabel('Average Rating')
plt.tight_layout()
plt.savefig('performance_correlations.png', dpi=150)
plt.show()
Bell Curve Calibration Analysis
Test whether each manager's rating distribution follows the expected bell curve, using a chi-squared test.
import pandas as pd
from scipy.stats import chisquare
import numpy as np
ratings = pd.read_csv('ratings_by_manager.csv')
# Columns: manager_name, department, score
# Expected distribution: 5% (1-2), 20% (2-3), 50% (3-4), 20% (4-4.5), 5% (4.5-5)
expected_pcts = np.array([0.05, 0.20, 0.50, 0.20, 0.05])
bins = [0, 2.0, 3.0, 4.0, 4.5, 5.01]
labels = ['Needs Improvement', 'Below Expectations', 'Meets Expectations',
'Exceeds Expectations', 'Outstanding']
results = []
for manager in ratings['manager_name'].unique():
    mgr_data = ratings.loc[ratings['manager_name'] == manager, 'score']
    if len(mgr_data) < 10:
        continue
    observed = np.histogram(mgr_data, bins=bins)[0]
    expected = expected_pcts * len(mgr_data)
    # Skip managers whose expected cell counts are too small for a valid chi-squared test
    if (expected < 5).any():
        continue
    stat, p_value = chisquare(observed, f_exp=expected)
    results.append({
        'manager': manager,
        'n_ratings': len(mgr_data),
        'chi2_stat': round(stat, 2),
        'p_value': round(p_value, 4),
        'follows_bell_curve': p_value > 0.05
    })
results_df = pd.DataFrame(results).sort_values('p_value')
print(results_df.to_string(index=False))
print(f"\nManagers aligned with bell curve: "
f"{results_df['follows_bell_curve'].sum()} / {len(results_df)}")
Excel Dashboard
Design a performance scorecard template and dashboard that HR Business Partners can use during calibration sessions.
- Import Data: Load SQL results into sheets: "Ratings by Manager", "Goal Completion", and "Team Productivity". Also import the Python correlation results. Format each as an Excel Table.
- Individual Scorecard Template: Create a template sheet with sections: Employee Info (name, department, tenure), Goal Summary (a table of goals, weights, and status, with the weighted score formula =SUMPRODUCT(weight, --(status="Completed"))/SUM(weight)), Rating Summary (a spider/radar chart of category scores), and a composite score combining 40% goal completion + 40% average rating + 20% productivity score (prototyped in the Python sketch after this list).
- Rating Distribution Chart: Build a histogram of all ratings using the Analysis ToolPak or the FREQUENCY function. Overlay the expected bell curve as a secondary series. Use Conditional Formatting to highlight managers whose distributions deviate significantly (from the chi-squared results).
- Manager Calibration View: Create a PivotTable with Managers as rows, Rating Bins (1-2, 2-3, 3-4, 4-5) as columns, and Count of ratings as values. Add a PivotChart (stacked bar). This lets HR see at a glance which managers skew high or low.
- Goal Tracker: Build a department-level goal tracker with columns: Department, Total Goals, Completed, Overdue, Weighted %. Add sparklines showing monthly completion trends. Use Data Bars on the Weighted % column for quick visual comparison.
- Correlation Dashboard: Create scatter charts for Tenure vs. Rating and Training vs. Rating. Add linear trendlines with R-squared values displayed. Place these alongside a summary table showing correlation coefficients.
- Bias Detection Alert: Create a summary table of managers flagged as Lenient or Strict. Use Conditional Formatting with custom formulas: =bias_flag="Lenient" to highlight rows in orange and =bias_flag="Strict" to highlight them in blue. Add a comments column where the HRBP can note calibration actions.
- Dashboard Tab: Combine KPI cards (Company Avg Rating, Goal Completion %, Training Hours per Employee, Managers Needing Calibration) with slicers for Department and Review Period. Include a leaderboard table of the top 10 and bottom 10 performers by composite score.
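The 40/40/20 composite score used in the scorecard template can be prototyped outside Excel to sanity-check the weights. A minimal Python sketch, assuming a per-employee summary with goal_completion_pct (0-100), avg_rating (1-5), and a 0-100 quality_score column (the quality_score column is an assumption; the earlier summary file listed only the first two):
import pandas as pd
df = pd.read_csv('employee_performance_summary.csv')
# Normalize every component to a 0-100 scale before blending
goal_component = df['goal_completion_pct']                 # already 0-100
rating_component = (df['avg_rating'] - 1) / 4 * 100        # map 1-5 onto 0-100
productivity_component = df['quality_score']               # assumed 0-100 quality metric
df['composite_score'] = (0.40 * goal_component
                         + 0.40 * rating_component
                         + 0.20 * productivity_component).round(1)
# Leaderboard preview for the Dashboard tab
print(df.nlargest(10, 'composite_score')[['employee_id', 'composite_score']])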
Key Insights
Manager Bias
34% of managers rate their teams an average of 0.5+ points above the company mean. These lenient ratings dilute the value of performance reviews and create equity concerns when tied to compensation.
Training Impact
Employees with 40+ training hours score 0.4 points higher on average (r=0.31, p<0.001). The correlation is strongest for employees in their first 2 years, suggesting training has the greatest ROI during onboarding.
Goal Disconnect
While 78% of goals are marked "Completed", only 61% were completed by the target date. The Operations department has the highest weighted completion at 89%, while Marketing lags at 64%.
Rating Inflation
The company-wide rating mean is 3.8 out of 5.0, with only 3% of employees rated below 3.0. Rather than following the expected bell curve, the distribution is left-skewed, with ratings piled up at the top of the scale, which points to systemic leniency.
Knowledge Check
What does a chi-squared test measure in the context of performance rating calibration?
In the composite performance score formula (40% goal + 40% rating + 20% productivity), why might goal completion and rating receive equal weight?
The SQL leniency detection query uses a window function AVG(AVG(rt.score)) OVER (). What does this compute?
Why is a radar (spider) chart appropriate for displaying an individual employee's category ratings?
Why is the weighted goal completion formula SUM(weight * completed) / SUM(weight) preferred over a simple count of completed goals?