Project 23

Recruitment Funnel Analytics

Optimize the hiring pipeline by analyzing conversion rates at each funnel stage, evaluating source effectiveness, and identifying bottlenecks that slow time-to-hire.

Project Overview

Scenario: You are a data analyst embedded in the talent acquisition team at a fast-growing e-commerce company. The team processed 12,000 applications last year but filled only 340 positions. The VP of People wants to know where candidates are dropping off, which recruiting sources deliver the best return on investment, and why certain roles take 3x longer to fill than others.

Objective: Map the end-to-end recruitment funnel from application to hire, quantify conversion rates at each stage, compare source effectiveness, and pinpoint time-to-hire bottlenecks by role and department.

Tools: SQL (PostgreSQL), Python (pandas, matplotlib, plotly), Microsoft Excel

Dataset Description

Table: applications

ColumnTypeDescription
application_idINT (PK)Unique application identifier
candidate_nameVARCHAR(100)Candidate full name
job_posting_idINT (FK)Reference to job_postings
source_idINT (FK)Reference to sources
applied_dateDATEDate application submitted
statusVARCHAR(30)Current status: Applied, Screened, Interviewed, Offered, Hired, Rejected

Table: interviews

ColumnTypeDescription
interview_idINT (PK)Unique interview identifier
application_idINT (FK)Reference to applications
interview_dateDATEDate of interview
interview_roundINTRound number (1, 2, 3...)
interviewer_idINTEmployee who conducted interview
scoreDECIMAL(3,1)Interview score (1.0-10.0)
outcomeVARCHAR(20)Pass, Fail, or Pending

Table: offers

ColumnTypeDescription
offer_idINT (PK)Unique offer identifier
application_idINT (FK)Reference to applications
offer_dateDATEDate offer extended
salary_offeredDECIMAL(10,2)Annual salary offered
acceptedBOOLEANWhether candidate accepted
response_dateDATEDate candidate responded

Table: hires

ColumnTypeDescription
hire_idINT (PK)Unique hire record
application_idINT (FK)Reference to applications
start_dateDATEEmployee start date

Table: job_postings

ColumnTypeDescription
job_posting_idINT (PK)Unique posting identifier
role_titleVARCHAR(100)Job title
departmentVARCHAR(50)Hiring department
posted_dateDATEDate posting went live
closed_dateDATEDate posting was closed
positions_openINTNumber of positions to fill

Table: sources

ColumnTypeDescription
source_idINT (PK)Unique source identifier
source_nameVARCHAR(50)Channel name (LinkedIn, Indeed, Referral, etc.)
annual_costDECIMAL(10,2)Annual spend on this channel

SQL Analysis

Query 1: Overall Funnel Conversion Rates

Calculate the number of candidates at each funnel stage and the conversion rate from one stage to the next.

WITH funnel AS (
    SELECT
        COUNT(DISTINCT a.application_id) AS applied,
        COUNT(DISTINCT CASE WHEN a.status IN ('Screened','Interviewed','Offered','Hired')
              THEN a.application_id END) AS screened,
        COUNT(DISTINCT i.application_id) AS interviewed,
        COUNT(DISTINCT o.application_id) AS offered,
        COUNT(DISTINCT h.application_id) AS hired
    FROM applications a
    LEFT JOIN interviews i ON a.application_id = i.application_id
    LEFT JOIN offers o ON a.application_id = o.application_id
    LEFT JOIN hires h ON a.application_id = h.application_id
)
SELECT
    applied,
    screened,
    ROUND(screened * 100.0 / NULLIF(applied, 0), 1) AS applied_to_screen_pct,
    interviewed,
    ROUND(interviewed * 100.0 / NULLIF(screened, 0), 1) AS screen_to_interview_pct,
    offered,
    ROUND(offered * 100.0 / NULLIF(interviewed, 0), 1) AS interview_to_offer_pct,
    hired,
    ROUND(hired * 100.0 / NULLIF(offered, 0), 1) AS offer_acceptance_pct,
    ROUND(hired * 100.0 / NULLIF(applied, 0), 1) AS overall_conversion_pct
FROM funnel;

This gives a single-row summary of the entire funnel. The overall conversion from application to hire is typically 2-4% for competitive roles.

Query 2: Source Effectiveness and Cost-Per-Hire

Compare recruiting sources by volume, conversion rate, and cost efficiency.

SELECT
    s.source_name,
    COUNT(DISTINCT a.application_id) AS applications,
    COUNT(DISTINCT h.application_id) AS hires,
    ROUND(COUNT(DISTINCT h.application_id) * 100.0
          / NULLIF(COUNT(DISTINCT a.application_id), 0), 1) AS conversion_pct,
    s.annual_cost,
    ROUND(s.annual_cost / NULLIF(COUNT(DISTINCT h.application_id), 0), 0) AS cost_per_hire,
    ROUND(AVG(CASE WHEN h.application_id IS NOT NULL
              THEN h.start_date - a.applied_date END), 0) AS avg_days_to_hire
FROM applications a
JOIN sources s ON a.source_id = s.source_id
LEFT JOIN hires h ON a.application_id = h.application_id
GROUP BY s.source_name, s.annual_cost
ORDER BY cost_per_hire ASC NULLS LAST;

Query 3: Time-to-Hire by Role and Department

Identify which roles and departments experience the longest hiring cycles.

SELECT
    jp.department,
    jp.role_title,
    COUNT(DISTINCT h.hire_id) AS total_hires,
    ROUND(AVG(h.start_date - a.applied_date), 0) AS avg_days_to_hire,
    ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY h.start_date - a.applied_date), 0) AS median_days,
    MIN(h.start_date - a.applied_date) AS fastest_hire_days,
    MAX(h.start_date - a.applied_date) AS slowest_hire_days
FROM hires h
JOIN applications a ON h.application_id = a.application_id
JOIN job_postings jp ON a.job_posting_id = jp.job_posting_id
GROUP BY jp.department, jp.role_title
HAVING COUNT(DISTINCT h.hire_id) >= 3
ORDER BY avg_days_to_hire DESC;

Query 4: Interview Stage Bottleneck Analysis

Measure the average delay between interview rounds to find where the process stalls.

WITH round_gaps AS (
    SELECT
        i1.application_id,
        i1.interview_round AS from_round,
        i2.interview_round AS to_round,
        i2.interview_date - i1.interview_date AS days_between_rounds,
        jp.department
    FROM interviews i1
    JOIN interviews i2
        ON i1.application_id = i2.application_id
        AND i2.interview_round = i1.interview_round + 1
    JOIN applications a ON i1.application_id = a.application_id
    JOIN job_postings jp ON a.job_posting_id = jp.job_posting_id
)
SELECT
    department,
    from_round || ' to ' || to_round AS transition,
    COUNT(*) AS candidates,
    ROUND(AVG(days_between_rounds), 1) AS avg_days_gap,
    MAX(days_between_rounds) AS max_days_gap
FROM round_gaps
GROUP BY department, from_round, to_round
ORDER BY department, from_round;

Python Analysis

Recruitment Funnel Visualization

Build a funnel chart showing candidate volume and drop-off at each stage.

import pandas as pd
import plotly.graph_objects as go

# Funnel data from SQL query
stages = ['Applied', 'Screened', 'Interviewed', 'Offered', 'Hired']
counts = [12000, 4200, 1680, 520, 340]

fig = go.Figure(go.Funnel(
    y=stages,
    x=counts,
    textinfo='value+percent initial+percent previous',
    textposition='inside',
    marker=dict(color=['#3498db', '#2ecc71', '#f39c12', '#e74c3c', '#9b59b6']),
    connector=dict(line=dict(color='#bdc3c7', width=2))
))

fig.update_layout(
    title='Recruitment Funnel (Annual)',
    font=dict(size=14),
    width=700, height=500
)
fig.write_image('recruitment_funnel.png', scale=2)
fig.show()

Source ROI Analysis

Scatter plot comparing each source's cost-per-hire against hire quality (measured by average interview score of hired candidates).

import pandas as pd
import matplotlib.pyplot as plt

source_data = pd.read_csv('source_effectiveness.csv')
# Columns: source_name, applications, hires, conversion_pct,
#           annual_cost, cost_per_hire, avg_interview_score

fig, ax = plt.subplots(figsize=(10, 7))
scatter = ax.scatter(
    source_data['cost_per_hire'],
    source_data['avg_interview_score'],
    s=source_data['hires'] * 5,  # bubble size = number of hires
    alpha=0.7,
    edgecolors='black',
    linewidths=0.5
)

for _, row in source_data.iterrows():
    ax.annotate(row['source_name'],
                (row['cost_per_hire'], row['avg_interview_score']),
                textcoords='offset points', xytext=(8, 5), fontsize=10)

ax.set_xlabel('Cost per Hire ($)', fontsize=12)
ax.set_ylabel('Avg Interview Score (Hired Candidates)', fontsize=12)
ax.set_title('Source ROI: Cost vs. Quality', fontsize=14)
ax.axhline(source_data['avg_interview_score'].mean(), color='gray',
           linestyle='--', alpha=0.5, label='Avg Score')
ax.axvline(source_data['cost_per_hire'].mean(), color='gray',
           linestyle=':', alpha=0.5, label='Avg Cost')
ax.legend(fontsize=10)

# Quadrant labels
ax.text(0.02, 0.98, 'High Quality / Low Cost\n(Best)', transform=ax.transAxes,
        fontsize=9, va='top', color='green', style='italic')
ax.text(0.75, 0.02, 'Low Quality / High Cost\n(Worst)', transform=ax.transAxes,
        fontsize=9, va='bottom', color='red', style='italic')

plt.tight_layout()
plt.savefig('source_roi.png', dpi=150)
plt.show()

Bottleneck Identification: Time Between Stages

Box plot showing the distribution of days spent at each funnel stage to identify where delays cluster.

import pandas as pd
import matplotlib.pyplot as plt

# Load stage transition times
transitions = pd.read_csv('stage_transitions.csv')
# Columns: application_id, stage, days_in_stage

stage_order = ['Application to Screen', 'Screen to Interview',
               'Interview to Offer', 'Offer to Hire']

fig, ax = plt.subplots(figsize=(10, 6))
data_to_plot = [transitions[transitions['stage'] == s]['days_in_stage']
                for s in stage_order]

bp = ax.boxplot(data_to_plot, labels=stage_order, patch_artist=True,
                medianprops=dict(color='black', linewidth=2))

colors = ['#3498db', '#2ecc71', '#f39c12', '#9b59b6']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)

ax.set_ylabel('Days', fontsize=12)
ax.set_title('Time Distribution at Each Funnel Stage', fontsize=14)
ax.grid(axis='y', alpha=0.3)

# Add median annotations
medians = [d.median() for d in data_to_plot]
for i, med in enumerate(medians):
    ax.annotate(f'{med:.0f}d', (i + 1, med),
                textcoords='offset points', xytext=(25, 5),
                fontsize=11, fontweight='bold', color=colors[i])

plt.tight_layout()
plt.savefig('stage_bottlenecks.png', dpi=150)
plt.show()

Excel Dashboard

Build a recruitment operations dashboard that the talent acquisition team can update weekly.

  1. Import Data: Load the SQL output into sheets: "Funnel Summary", "Source Data", "Time to Hire", and "Bottlenecks". Format each as an Excel Table with descriptive names (tblFunnel, tblSources, etc.).
  2. Funnel Bar Chart: Create a horizontal bar chart with stages on the Y-axis and candidate count on the X-axis. Sort from Applied (top) to Hired (bottom). Add data labels showing both the count and the stage-to-stage conversion rate as a percentage.
  3. Source Comparison Table: Build a formatted table with columns: Source, Applications, Hires, Conversion %, Cost/Hire, Avg Days. Apply Conditional Formatting: green-to-red Color Scale on Cost/Hire (green = low cost). Add Icon Sets (arrows) to Conversion % column.
  4. Time-to-Hire Tracker: Create a combo chart with roles on the X-axis. Use clustered bars for average days-to-hire and a line overlay for the median. Add a horizontal target line at 30 days using a constant series. Color bars red if they exceed the target.
  5. Bottleneck Heatmap: Create a matrix with Departments as rows and Stage Transitions as columns. Fill cells with average days. Apply Conditional Formatting (Color Scales) so longer delays appear darker red. This immediately shows which department-stage combinations need attention.
  6. Weekly Tracker: Add a "Pipeline" sheet with columns for each week. Use =COUNTIFS(tblApplications[applied_date],">="&week_start, tblApplications[status],"Screened") to count candidates at each stage per week. Create a stacked area chart to visualize pipeline flow over time.
  7. Dashboard Summary: Create a top-level tab with KPI cards: Open Requisitions, Avg Time-to-Hire, Offer Acceptance Rate, Top Source, and Pipeline Volume. Add slicers for Department and Date Range connected to all PivotTables.
  8. ROI Calculator: Add input cells for source budget adjustments. Use a formula: =new_budget / current_cost_per_hire to project expected hires from budget reallocation. Include a scenario comparison table showing current vs. proposed budget allocation.

Key Insights

Screening Bottleneck

65% of candidates are rejected at the screening stage, with an average 8-day delay. Implementing automated resume parsing could reduce screening time by 60% and surface qualified candidates faster.

Offer Acceptance Gap

The offer acceptance rate is 65%, below the 80% industry benchmark. Engineering roles have the lowest acceptance (52%), suggesting salary offers are not competitive for technical positions.

Source ROI Winner

Employee referrals produce a 12% application-to-hire rate at $1,200 cost-per-hire, compared to LinkedIn's 2.1% rate at $4,800. Investing in a referral bonus program would yield 3x the ROI.

Round 2-3 Scheduling

The average gap between interview rounds 2 and 3 is 14 days -- double the round 1-2 gap. Coordinating panel interviews and offering flexible scheduling could cut 5-7 days from the process.

Knowledge Check

Question 1

In the funnel query, why is COUNT(DISTINCT application_id) used instead of COUNT(*)?

Question 2

Cost-per-hire is calculated as:

Question 3

Why does the time-to-hire query include HAVING COUNT(DISTINCT h.hire_id) >= 3?

Question 4

In the Source ROI scatter plot, a source appearing in the top-left quadrant indicates:

Question 5

The self-join in the bottleneck query (i1 JOIN i2 ON ... i2.interview_round = i1.interview_round + 1) is used to:

← Previous Project Next Project →