Recruitment Funnel Analytics

Project Overview

Scenario: You are a data analyst embedded in the talent acquisition team at a fast-growing e-commerce company. The team processed 12,000 applications last year but filled only 340 positions. The VP of People wants to know where candidates are dropping off, which recruiting sources deliver the best return on investment, and why certain roles take 3x longer to fill than others.

Objective: Map the end-to-end recruitment funnel from application to hire, quantify conversion rates at each stage, compare source effectiveness, and pinpoint time-to-hire bottlenecks by role and department.

Tools: SQL (PostgreSQL), Python (pandas, matplotlib, plotly), Microsoft Excel

Dataset Description

Table: applications

Column	Type	Description
application_id	INT (PK)	Unique application identifier
candidate_name	VARCHAR(100)	Candidate full name
job_posting_id	INT (FK)	Reference to job_postings
source_id	INT (FK)	Reference to sources
applied_date	DATE	Date application submitted
status	VARCHAR(30)	Current status: Applied, Screened, Interviewed, Offered, Hired, Rejected

Table: interviews

Column	Type	Description
interview_id	INT (PK)	Unique interview identifier
application_id	INT (FK)	Reference to applications
interview_date	DATE	Date of interview
interview_round	INT	Round number (1, 2, 3...)
interviewer_id	INT	Employee who conducted interview
score	DECIMAL(3,1)	Interview score (1.0-10.0)
outcome	VARCHAR(20)	Pass, Fail, or Pending

Table: offers

Column	Type	Description
offer_id	INT (PK)	Unique offer identifier
application_id	INT (FK)	Reference to applications
offer_date	DATE	Date offer extended
salary_offered	DECIMAL(10,2)	Annual salary offered
accepted	BOOLEAN	Whether candidate accepted
response_date	DATE	Date candidate responded

Table: hires

Column	Type	Description
hire_id	INT (PK)	Unique hire record
application_id	INT (FK)	Reference to applications
start_date	DATE	Employee start date

Table: job_postings

Column	Type	Description
job_posting_id	INT (PK)	Unique posting identifier
role_title	VARCHAR(100)	Job title
department	VARCHAR(50)	Hiring department
posted_date	DATE	Date posting went live
closed_date	DATE	Date posting was closed
positions_open	INT	Number of positions to fill

Table: sources

Column	Type	Description
source_id	INT (PK)	Unique source identifier
source_name	VARCHAR(50)	Channel name (LinkedIn, Indeed, Referral, etc.)
annual_cost	DECIMAL(10,2)	Annual spend on this channel

SQL Analysis

Query 1: Overall Funnel Conversion Rates

Calculate the number of candidates at each funnel stage and the conversion rate from one stage to the next.

WITH funnel AS (
    SELECT
        COUNT(DISTINCT a.application_id) AS applied,
        COUNT(DISTINCT CASE WHEN a.status IN ('Screened','Interviewed','Offered','Hired')
              THEN a.application_id END) AS screened,
        COUNT(DISTINCT i.application_id) AS interviewed,
        COUNT(DISTINCT o.application_id) AS offered,
        COUNT(DISTINCT h.application_id) AS hired
    FROM applications a
    LEFT JOIN interviews i ON a.application_id = i.application_id
    LEFT JOIN offers o ON a.application_id = o.application_id
    LEFT JOIN hires h ON a.application_id = h.application_id
)
SELECT
    applied,
    screened,
    ROUND(screened * 100.0 / NULLIF(applied, 0), 1) AS applied_to_screen_pct,
    interviewed,
    ROUND(interviewed * 100.0 / NULLIF(screened, 0), 1) AS screen_to_interview_pct,
    offered,
    ROUND(offered * 100.0 / NULLIF(interviewed, 0), 1) AS interview_to_offer_pct,
    hired,
    ROUND(hired * 100.0 / NULLIF(offered, 0), 1) AS offer_acceptance_pct,
    ROUND(hired * 100.0 / NULLIF(applied, 0), 1) AS overall_conversion_pct
FROM funnel;

This gives a single-row summary of the entire funnel. The overall conversion from application to hire is typically 2-4% for competitive roles.

Query 2: Source Effectiveness and Cost-Per-Hire

Compare recruiting sources by volume, conversion rate, and cost efficiency.

SELECT
    s.source_name,
    COUNT(DISTINCT a.application_id) AS applications,
    COUNT(DISTINCT h.application_id) AS hires,
    ROUND(COUNT(DISTINCT h.application_id) * 100.0
          / NULLIF(COUNT(DISTINCT a.application_id), 0), 1) AS conversion_pct,
    s.annual_cost,
    ROUND(s.annual_cost / NULLIF(COUNT(DISTINCT h.application_id), 0), 0) AS cost_per_hire,
    ROUND(AVG(CASE WHEN h.application_id IS NOT NULL
              THEN h.start_date - a.applied_date END), 0) AS avg_days_to_hire
FROM applications a
JOIN sources s ON a.source_id = s.source_id
LEFT JOIN hires h ON a.application_id = h.application_id
GROUP BY s.source_name, s.annual_cost
ORDER BY cost_per_hire ASC NULLS LAST;

Query 3: Time-to-Hire by Role and Department

Identify which roles and departments experience the longest hiring cycles.

SELECT
    jp.department,
    jp.role_title,
    COUNT(DISTINCT h.hire_id) AS total_hires,
    ROUND(AVG(h.start_date - a.applied_date), 0) AS avg_days_to_hire,
    ROUND(PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY h.start_date - a.applied_date), 0) AS median_days,
    MIN(h.start_date - a.applied_date) AS fastest_hire_days,
    MAX(h.start_date - a.applied_date) AS slowest_hire_days
FROM hires h
JOIN applications a ON h.application_id = a.application_id
JOIN job_postings jp ON a.job_posting_id = jp.job_posting_id
GROUP BY jp.department, jp.role_title
HAVING COUNT(DISTINCT h.hire_id) >= 3
ORDER BY avg_days_to_hire DESC;

Query 4: Interview Stage Bottleneck Analysis

Measure the average delay between interview rounds to find where the process stalls.

WITH round_gaps AS (
    SELECT
        i1.application_id,
        i1.interview_round AS from_round,
        i2.interview_round AS to_round,
        i2.interview_date - i1.interview_date AS days_between_rounds,
        jp.department
    FROM interviews i1
    JOIN interviews i2
        ON i1.application_id = i2.application_id
        AND i2.interview_round = i1.interview_round + 1
    JOIN applications a ON i1.application_id = a.application_id
    JOIN job_postings jp ON a.job_posting_id = jp.job_posting_id
)
SELECT
    department,
    from_round || ' to ' || to_round AS transition,
    COUNT(*) AS candidates,
    ROUND(AVG(days_between_rounds), 1) AS avg_days_gap,
    MAX(days_between_rounds) AS max_days_gap
FROM round_gaps
GROUP BY department, from_round, to_round
ORDER BY department, from_round;

Python Analysis

Recruitment Funnel Visualization

Build a funnel chart showing candidate volume and drop-off at each stage.

import pandas as pd
import plotly.graph_objects as go

# Funnel data from SQL query
stages = ['Applied', 'Screened', 'Interviewed', 'Offered', 'Hired']
counts = [12000, 4200, 1680, 520, 340]

fig = go.Figure(go.Funnel(
    y=stages,
    x=counts,
    textinfo='value+percent initial+percent previous',
    textposition='inside',
    marker=dict(color=['#3498db', '#2ecc71', '#f39c12', '#e74c3c', '#9b59b6']),
    connector=dict(line=dict(color='#bdc3c7', width=2))
))

fig.update_layout(
    title='Recruitment Funnel (Annual)',
    font=dict(size=14),
    width=700, height=500
)
fig.write_image('recruitment_funnel.png', scale=2)
fig.show()

Source ROI Analysis

Scatter plot comparing each source's cost-per-hire against hire quality (measured by average interview score of hired candidates).

import pandas as pd
import matplotlib.pyplot as plt

source_data = pd.read_csv('source_effectiveness.csv')
# Columns: source_name, applications, hires, conversion_pct,
#           annual_cost, cost_per_hire, avg_interview_score

fig, ax = plt.subplots(figsize=(10, 7))
scatter = ax.scatter(
    source_data['cost_per_hire'],
    source_data['avg_interview_score'],
    s=source_data['hires'] * 5,  # bubble size = number of hires
    alpha=0.7,
    edgecolors='black',
    linewidths=0.5
)

for _, row in source_data.iterrows():
    ax.annotate(row['source_name'],
                (row['cost_per_hire'], row['avg_interview_score']),
                textcoords='offset points', xytext=(8, 5), fontsize=10)

ax.set_xlabel('Cost per Hire ($)', fontsize=12)
ax.set_ylabel('Avg Interview Score (Hired Candidates)', fontsize=12)
ax.set_title('Source ROI: Cost vs. Quality', fontsize=14)
ax.axhline(source_data['avg_interview_score'].mean(), color='gray',
           linestyle='--', alpha=0.5, label='Avg Score')
ax.axvline(source_data['cost_per_hire'].mean(), color='gray',
           linestyle=':', alpha=0.5, label='Avg Cost')
ax.legend(fontsize=10)

# Quadrant labels
ax.text(0.02, 0.98, 'High Quality / Low Cost\n(Best)', transform=ax.transAxes,
        fontsize=9, va='top', color='green', style='italic')
ax.text(0.75, 0.02, 'Low Quality / High Cost\n(Worst)', transform=ax.transAxes,
        fontsize=9, va='bottom', color='red', style='italic')

plt.tight_layout()
plt.savefig('source_roi.png', dpi=150)
plt.show()

Bottleneck Identification: Time Between Stages

Box plot showing the distribution of days spent at each funnel stage to identify where delays cluster.

import pandas as pd
import matplotlib.pyplot as plt

# Load stage transition times
transitions = pd.read_csv('stage_transitions.csv')
# Columns: application_id, stage, days_in_stage

stage_order = ['Application to Screen', 'Screen to Interview',
               'Interview to Offer', 'Offer to Hire']

fig, ax = plt.subplots(figsize=(10, 6))
data_to_plot = [transitions[transitions['stage'] == s]['days_in_stage']
                for s in stage_order]

bp = ax.boxplot(data_to_plot, labels=stage_order, patch_artist=True,
                medianprops=dict(color='black', linewidth=2))

colors = ['#3498db', '#2ecc71', '#f39c12', '#9b59b6']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)

ax.set_ylabel('Days', fontsize=12)
ax.set_title('Time Distribution at Each Funnel Stage', fontsize=14)
ax.grid(axis='y', alpha=0.3)

# Add median annotations
medians = [d.median() for d in data_to_plot]
for i, med in enumerate(medians):
    ax.annotate(f'{med:.0f}d', (i + 1, med),
                textcoords='offset points', xytext=(25, 5),
                fontsize=11, fontweight='bold', color=colors[i])

plt.tight_layout()
plt.savefig('stage_bottlenecks.png', dpi=150)
plt.show()

Excel Dashboard

Build a recruitment operations dashboard that the talent acquisition team can update weekly.

Import Data: Load the SQL output into sheets: "Funnel Summary", "Source Data", "Time to Hire", and "Bottlenecks". Format each as an Excel Table with descriptive names (tblFunnel, tblSources, etc.).
Funnel Bar Chart: Create a horizontal bar chart with stages on the Y-axis and candidate count on the X-axis. Sort from Applied (top) to Hired (bottom). Add data labels showing both the count and the stage-to-stage conversion rate as a percentage.
Source Comparison Table: Build a formatted table with columns: Source, Applications, Hires, Conversion %, Cost/Hire, Avg Days. Apply Conditional Formatting: green-to-red Color Scale on Cost/Hire (green = low cost). Add Icon Sets (arrows) to Conversion % column.
Time-to-Hire Tracker: Create a combo chart with roles on the X-axis. Use clustered bars for average days-to-hire and a line overlay for the median. Add a horizontal target line at 30 days using a constant series. Color bars red if they exceed the target.
Bottleneck Heatmap: Create a matrix with Departments as rows and Stage Transitions as columns. Fill cells with average days. Apply Conditional Formatting (Color Scales) so longer delays appear darker red. This immediately shows which department-stage combinations need attention.
Weekly Tracker: Add a "Pipeline" sheet with columns for each week. Use =COUNTIFS(tblApplications[applied_date],">="&week_start, tblApplications[status],"Screened") to count candidates at each stage per week. Create a stacked area chart to visualize pipeline flow over time.
Dashboard Summary: Create a top-level tab with KPI cards: Open Requisitions, Avg Time-to-Hire, Offer Acceptance Rate, Top Source, and Pipeline Volume. Add slicers for Department and Date Range connected to all PivotTables.
ROI Calculator: Add input cells for source budget adjustments. Use a formula: =new_budget / current_cost_per_hire to project expected hires from budget reallocation. Include a scenario comparison table showing current vs. proposed budget allocation.

Key Insights

Screening Bottleneck

65% of candidates are rejected at the screening stage, with an average 8-day delay. Implementing automated resume parsing could reduce screening time by 60% and surface qualified candidates faster.

Offer Acceptance Gap

The offer acceptance rate is 65%, below the 80% industry benchmark. Engineering roles have the lowest acceptance (52%), suggesting salary offers are not competitive for technical positions.

Source ROI Winner

Employee referrals produce a 12% application-to-hire rate at $1,200 cost-per-hire, compared to LinkedIn's 2.1% rate at $4,800. Investing in a referral bonus program would yield 3x the ROI.

Round 2-3 Scheduling

The average gap between interview rounds 2 and 3 is 14 days -- double the round 1-2 gap. Coordinating panel interviews and offering flexible scheduling could cut 5-7 days from the process.