Recruitment Funnel Analytics
Optimize the hiring pipeline by analyzing conversion rates at each funnel stage, evaluating source effectiveness, and identifying bottlenecks that slow time-to-hire.
Project Overview
Scenario: You are a data analyst embedded in the talent acquisition team at a fast-growing e-commerce company. The team processed 12,000 applications last year but filled only 340 positions. The VP of People wants to know where candidates are dropping off, which recruiting sources deliver the best return on investment, and why certain roles take 3x longer to fill than others.
Objective: Map the end-to-end recruitment funnel from application to hire, quantify conversion rates at each stage, compare source effectiveness, and pinpoint time-to-hire bottlenecks by role and department.
Tools: SQL (PostgreSQL), Python (pandas, matplotlib, plotly), Microsoft Excel
Dataset Description
Table: applications
| Column | Type | Description |
|---|---|---|
| application_id | INT (PK) | Unique application identifier |
| candidate_name | VARCHAR(100) | Candidate full name |
| job_posting_id | INT (FK) | Reference to job_postings |
| source_id | INT (FK) | Reference to sources |
| applied_date | DATE | Date application submitted |
| status | VARCHAR(30) | Current status: Applied, Screened, Interviewed, Offered, Hired, Rejected |
Table: interviews
| Column | Type | Description |
|---|---|---|
| interview_id | INT (PK) | Unique interview identifier |
| application_id | INT (FK) | Reference to applications |
| interview_date | DATE | Date of interview |
| interview_round | INT | Round number (1, 2, 3...) |
| interviewer_id | INT | Employee who conducted interview |
| score | DECIMAL(3,1) | Interview score (1.0-10.0) |
| outcome | VARCHAR(20) | Pass, Fail, or Pending |
Table: offers
| Column | Type | Description |
|---|---|---|
| offer_id | INT (PK) | Unique offer identifier |
| application_id | INT (FK) | Reference to applications |
| offer_date | DATE | Date offer extended |
| salary_offered | DECIMAL(10,2) | Annual salary offered |
| accepted | BOOLEAN | Whether candidate accepted |
| response_date | DATE | Date candidate responded |
Table: hires
| Column | Type | Description |
|---|---|---|
| hire_id | INT (PK) | Unique hire record |
| application_id | INT (FK) | Reference to applications |
| start_date | DATE | Employee start date |
Table: job_postings
| Column | Type | Description |
|---|---|---|
| job_posting_id | INT (PK) | Unique posting identifier |
| role_title | VARCHAR(100) | Job title |
| department | VARCHAR(50) | Hiring department |
| posted_date | DATE | Date posting went live |
| closed_date | DATE | Date posting was closed |
| positions_open | INT | Number of positions to fill |
Table: sources
| Column | Type | Description |
|---|---|---|
| source_id | INT (PK) | Unique source identifier |
| source_name | VARCHAR(50) | Channel name (LinkedIn, Indeed, Referral, etc.) |
| annual_cost | DECIMAL(10,2) | Annual spend on this channel |
SQL Analysis
Query 1: Overall Funnel Conversion Rates
Calculate the number of candidates at each funnel stage and the conversion rate from one stage to the next.
WITH funnel AS (
SELECT
COUNT(DISTINCT a.application_id) AS applied,
COUNT(DISTINCT CASE WHEN a.status IN ('Screened','Interviewed','Offered','Hired')
THEN a.application_id END) AS screened,
COUNT(DISTINCT i.application_id) AS interviewed,
COUNT(DISTINCT o.application_id) AS offered,
COUNT(DISTINCT h.application_id) AS hired
FROM applications a
LEFT JOIN interviews i ON a.application_id = i.application_id
LEFT JOIN offers o ON a.application_id = o.application_id
LEFT JOIN hires h ON a.application_id = h.application_id
)
SELECT
applied,
screened,
ROUND(screened * 100.0 / NULLIF(applied, 0), 1) AS applied_to_screen_pct,
interviewed,
ROUND(interviewed * 100.0 / NULLIF(screened, 0), 1) AS screen_to_interview_pct,
offered,
ROUND(offered * 100.0 / NULLIF(interviewed, 0), 1) AS interview_to_offer_pct,
hired,
ROUND(hired * 100.0 / NULLIF(offered, 0), 1) AS offer_acceptance_pct,
ROUND(hired * 100.0 / NULLIF(applied, 0), 1) AS overall_conversion_pct
FROM funnel;
This gives a single-row summary of the entire funnel. The overall conversion from application to hire is typically 2-4% for competitive roles.
Query 2: Source Effectiveness and Cost-Per-Hire
Compare recruiting sources by volume, conversion rate, and cost efficiency.
SELECT
s.source_name,
COUNT(DISTINCT a.application_id) AS applications,
COUNT(DISTINCT h.application_id) AS hires,
ROUND(COUNT(DISTINCT h.application_id) * 100.0
/ NULLIF(COUNT(DISTINCT a.application_id), 0), 1) AS conversion_pct,
s.annual_cost,
ROUND(s.annual_cost / NULLIF(COUNT(DISTINCT h.application_id), 0), 0) AS cost_per_hire,
ROUND(AVG(CASE WHEN h.application_id IS NOT NULL
THEN h.start_date - a.applied_date END), 0) AS avg_days_to_hire
FROM applications a
JOIN sources s ON a.source_id = s.source_id
LEFT JOIN hires h ON a.application_id = h.application_id
GROUP BY s.source_name, s.annual_cost
ORDER BY cost_per_hire ASC NULLS LAST;
Query 3: Time-to-Hire by Role and Department
Identify which roles and departments experience the longest hiring cycles.
SELECT
jp.department,
jp.role_title,
COUNT(DISTINCT h.hire_id) AS total_hires,
ROUND(AVG(h.start_date - a.applied_date), 0) AS avg_days_to_hire,
ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY h.start_date - a.applied_date), 0) AS median_days,
MIN(h.start_date - a.applied_date) AS fastest_hire_days,
MAX(h.start_date - a.applied_date) AS slowest_hire_days
FROM hires h
JOIN applications a ON h.application_id = a.application_id
JOIN job_postings jp ON a.job_posting_id = jp.job_posting_id
GROUP BY jp.department, jp.role_title
HAVING COUNT(DISTINCT h.hire_id) >= 3
ORDER BY avg_days_to_hire DESC;
Query 4: Interview Stage Bottleneck Analysis
Measure the average delay between interview rounds to find where the process stalls.
WITH round_gaps AS (
SELECT
i1.application_id,
i1.interview_round AS from_round,
i2.interview_round AS to_round,
i2.interview_date - i1.interview_date AS days_between_rounds,
jp.department
FROM interviews i1
JOIN interviews i2
ON i1.application_id = i2.application_id
AND i2.interview_round = i1.interview_round + 1
JOIN applications a ON i1.application_id = a.application_id
JOIN job_postings jp ON a.job_posting_id = jp.job_posting_id
)
SELECT
department,
from_round || ' to ' || to_round AS transition,
COUNT(*) AS candidates,
ROUND(AVG(days_between_rounds), 1) AS avg_days_gap,
MAX(days_between_rounds) AS max_days_gap
FROM round_gaps
GROUP BY department, from_round, to_round
ORDER BY department, from_round;
Python Analysis
Recruitment Funnel Visualization
Build a funnel chart showing candidate volume and drop-off at each stage.
import pandas as pd
import plotly.graph_objects as go
# Funnel data from SQL query
stages = ['Applied', 'Screened', 'Interviewed', 'Offered', 'Hired']
counts = [12000, 4200, 1680, 520, 340]
fig = go.Figure(go.Funnel(
y=stages,
x=counts,
textinfo='value+percent initial+percent previous',
textposition='inside',
marker=dict(color=['#3498db', '#2ecc71', '#f39c12', '#e74c3c', '#9b59b6']),
connector=dict(line=dict(color='#bdc3c7', width=2))
))
fig.update_layout(
title='Recruitment Funnel (Annual)',
font=dict(size=14),
width=700, height=500
)
fig.write_image('recruitment_funnel.png', scale=2)
fig.show()
Source ROI Analysis
Scatter plot comparing each source's cost-per-hire against hire quality (measured by average interview score of hired candidates).
import pandas as pd
import matplotlib.pyplot as plt
source_data = pd.read_csv('source_effectiveness.csv')
# Columns: source_name, applications, hires, conversion_pct,
# annual_cost, cost_per_hire, avg_interview_score
fig, ax = plt.subplots(figsize=(10, 7))
scatter = ax.scatter(
source_data['cost_per_hire'],
source_data['avg_interview_score'],
s=source_data['hires'] * 5, # bubble size = number of hires
alpha=0.7,
edgecolors='black',
linewidths=0.5
)
for _, row in source_data.iterrows():
ax.annotate(row['source_name'],
(row['cost_per_hire'], row['avg_interview_score']),
textcoords='offset points', xytext=(8, 5), fontsize=10)
ax.set_xlabel('Cost per Hire ($)', fontsize=12)
ax.set_ylabel('Avg Interview Score (Hired Candidates)', fontsize=12)
ax.set_title('Source ROI: Cost vs. Quality', fontsize=14)
ax.axhline(source_data['avg_interview_score'].mean(), color='gray',
linestyle='--', alpha=0.5, label='Avg Score')
ax.axvline(source_data['cost_per_hire'].mean(), color='gray',
linestyle=':', alpha=0.5, label='Avg Cost')
ax.legend(fontsize=10)
# Quadrant labels
ax.text(0.02, 0.98, 'High Quality / Low Cost\n(Best)', transform=ax.transAxes,
fontsize=9, va='top', color='green', style='italic')
ax.text(0.75, 0.02, 'Low Quality / High Cost\n(Worst)', transform=ax.transAxes,
fontsize=9, va='bottom', color='red', style='italic')
plt.tight_layout()
plt.savefig('source_roi.png', dpi=150)
plt.show()
Bottleneck Identification: Time Between Stages
Box plot showing the distribution of days spent at each funnel stage to identify where delays cluster.
import pandas as pd
import matplotlib.pyplot as plt
# Load stage transition times
transitions = pd.read_csv('stage_transitions.csv')
# Columns: application_id, stage, days_in_stage
stage_order = ['Application to Screen', 'Screen to Interview',
'Interview to Offer', 'Offer to Hire']
fig, ax = plt.subplots(figsize=(10, 6))
data_to_plot = [transitions[transitions['stage'] == s]['days_in_stage']
for s in stage_order]
bp = ax.boxplot(data_to_plot, labels=stage_order, patch_artist=True,
medianprops=dict(color='black', linewidth=2))
colors = ['#3498db', '#2ecc71', '#f39c12', '#9b59b6']
for patch, color in zip(bp['boxes'], colors):
patch.set_facecolor(color)
patch.set_alpha(0.7)
ax.set_ylabel('Days', fontsize=12)
ax.set_title('Time Distribution at Each Funnel Stage', fontsize=14)
ax.grid(axis='y', alpha=0.3)
# Add median annotations
medians = [d.median() for d in data_to_plot]
for i, med in enumerate(medians):
ax.annotate(f'{med:.0f}d', (i + 1, med),
textcoords='offset points', xytext=(25, 5),
fontsize=11, fontweight='bold', color=colors[i])
plt.tight_layout()
plt.savefig('stage_bottlenecks.png', dpi=150)
plt.show()
Excel Dashboard
Build a recruitment operations dashboard that the talent acquisition team can update weekly.
- Import Data: Load the SQL output into sheets: "Funnel Summary", "Source Data", "Time to Hire", and "Bottlenecks". Format each as an Excel Table with descriptive names (tblFunnel, tblSources, etc.).
- Funnel Bar Chart: Create a horizontal bar chart with stages on the Y-axis and candidate count on the X-axis. Sort from Applied (top) to Hired (bottom). Add data labels showing both the count and the stage-to-stage conversion rate as a percentage.
- Source Comparison Table: Build a formatted table with columns: Source, Applications, Hires, Conversion %, Cost/Hire, Avg Days. Apply Conditional Formatting: green-to-red Color Scale on Cost/Hire (green = low cost). Add Icon Sets (arrows) to Conversion % column.
- Time-to-Hire Tracker: Create a combo chart with roles on the X-axis. Use clustered bars for average days-to-hire and a line overlay for the median. Add a horizontal target line at 30 days using a constant series. Color bars red if they exceed the target.
- Bottleneck Heatmap: Create a matrix with Departments as rows and Stage Transitions as columns. Fill cells with average days. Apply Conditional Formatting (Color Scales) so longer delays appear darker red. This immediately shows which department-stage combinations need attention.
- Weekly Tracker: Add a "Pipeline" sheet with columns for each week. Use
=COUNTIFS(tblApplications[applied_date],">="&week_start, tblApplications[status],"Screened")to count candidates at each stage per week. Create a stacked area chart to visualize pipeline flow over time. - Dashboard Summary: Create a top-level tab with KPI cards: Open Requisitions, Avg Time-to-Hire, Offer Acceptance Rate, Top Source, and Pipeline Volume. Add slicers for Department and Date Range connected to all PivotTables.
- ROI Calculator: Add input cells for source budget adjustments. Use a formula:
=new_budget / current_cost_per_hireto project expected hires from budget reallocation. Include a scenario comparison table showing current vs. proposed budget allocation.
Key Insights
Screening Bottleneck
65% of candidates are rejected at the screening stage, with an average 8-day delay. Implementing automated resume parsing could reduce screening time by 60% and surface qualified candidates faster.
Offer Acceptance Gap
The offer acceptance rate is 65%, below the 80% industry benchmark. Engineering roles have the lowest acceptance (52%), suggesting salary offers are not competitive for technical positions.
Source ROI Winner
Employee referrals produce a 12% application-to-hire rate at $1,200 cost-per-hire, compared to LinkedIn's 2.1% rate at $4,800. Investing in a referral bonus program would yield 3x the ROI.
Round 2-3 Scheduling
The average gap between interview rounds 2 and 3 is 14 days -- double the round 1-2 gap. Coordinating panel interviews and offering flexible scheduling could cut 5-7 days from the process.
Knowledge Check
In the funnel query, why is COUNT(DISTINCT application_id) used instead of COUNT(*)?
Cost-per-hire is calculated as:
Why does the time-to-hire query include HAVING COUNT(DISTINCT h.hire_id) >= 3?
In the Source ROI scatter plot, a source appearing in the top-left quadrant indicates:
The self-join in the bottleneck query (i1 JOIN i2 ON ... i2.interview_round = i1.interview_round + 1) is used to: