Warehouse Efficiency Analysis
Optimize pick, pack, and ship operations in a fulfillment warehouse by analyzing zone productivity, worker performance, shift patterns, and order processing bottlenecks.
Project Overview
Scenario: You are a warehouse operations analyst at FulFast Distribution, a 250,000 sq ft fulfillment center processing 8,000+ orders per day. The warehouse manager has noticed that order processing times have increased by 20% over the last quarter despite no increase in volume. Error rates on shipments have also climbed to 3.2%, above the 2% target. Your task is to identify the root causes and recommend operational improvements.
Objective: Analyze pick/pack/ship data across warehouse zones, workers, and shifts to pinpoint productivity bottlenecks, high-error zones, and staffing imbalances. Build visualizations and a KPI dashboard to support data-driven management decisions.
Tools: SQL (PostgreSQL), Python (pandas, matplotlib, seaborn, numpy), Excel
Dataset Description
Six tables modeling the warehouse fulfillment process from order receipt through picking, packing, and shipping.
orders
| Column | Type | Description |
|---|---|---|
| order_id | INT (PK) | Unique order identifier |
| order_received | TIMESTAMP | When order entered the system |
| order_shipped | TIMESTAMP | When order left the dock |
| total_items | INT | Number of line items |
| priority | VARCHAR(10) | Standard, Express, Same-Day |
| has_error | BOOLEAN | Whether a packing/shipping error occurred |
picks
| Column | Type | Description |
|---|---|---|
| pick_id | INT (PK) | Unique pick event identifier |
| order_id | INT (FK) | Associated order |
| worker_id | INT (FK) | Picker assigned |
| zone_id | INT (FK) | Warehouse zone |
| pick_start | TIMESTAMP | When pick task began |
| pick_end | TIMESTAMP | When pick task completed |
| items_picked | INT | Number of items picked |
| pick_error | BOOLEAN | Wrong item or quantity picked |
packing
| Column | Type | Description |
|---|---|---|
| pack_id | INT (PK) | Unique packing event |
| order_id | INT (FK) | Associated order |
| worker_id | INT (FK) | Packer assigned |
| pack_start | TIMESTAMP | When packing began |
| pack_end | TIMESTAMP | When packing completed |
| pack_error | BOOLEAN | Packing error occurred |
zones
| Column | Type | Description |
|---|---|---|
| zone_id | INT (PK) | Zone identifier |
| zone_name | VARCHAR(20) | Zone label (A1, A2, B1, etc.) |
| zone_type | VARCHAR(30) | Small Items, Bulk, Refrigerated, Fragile |
| aisle_count | INT | Number of aisles in zone |
| sku_count | INT | Number of unique SKUs stored |
workers
| Column | Type | Description |
|---|---|---|
| worker_id | INT (PK) | Unique worker identifier |
| worker_name | VARCHAR(50) | Employee name |
| role | VARCHAR(20) | Picker, Packer, Loader |
| hire_date | DATE | Date of hire |
| shift_id | INT (FK) | Assigned shift |
shifts
| Column | Type | Description |
|---|---|---|
| shift_id | INT (PK) | Shift identifier |
| shift_name | VARCHAR(20) | Day, Swing, Night |
| start_time | TIME | Shift start (e.g., 06:00) |
| end_time | TIME | Shift end (e.g., 14:00) |
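Before writing queries, it can help to sanity-check the relationships between these tables. The sketch below assumes the six tables have been exported to CSV files with the same names used in the Python section; the assertions are illustrative, not a project deliverable.
import pandas as pd
# Load the six tables (file names assumed; they match the CSVs used in the Python section)
orders = pd.read_csv('orders.csv', parse_dates=['order_received', 'order_shipped'])
picks = pd.read_csv('picks.csv', parse_dates=['pick_start', 'pick_end'])
packing = pd.read_csv('packing.csv', parse_dates=['pack_start', 'pack_end'])
zones = pd.read_csv('zones.csv')
workers = pd.read_csv('workers.csv', parse_dates=['hire_date'])
shifts = pd.read_csv('shifts.csv')
# Every foreign key should resolve to a row in its parent table
assert picks['order_id'].isin(orders['order_id']).all(), 'pick references a missing order'
assert picks['zone_id'].isin(zones['zone_id']).all(), 'pick references a missing zone'
assert picks['worker_id'].isin(workers['worker_id']).all(), 'pick references a missing worker'
assert packing['order_id'].isin(orders['order_id']).all(), 'pack references a missing order'
assert workers['shift_id'].isin(shifts['shift_id']).all(), 'worker references a missing shift'
print(f"{len(orders):,} orders, {len(picks):,} picks, {len(packing):,} packs, {len(workers)} workers")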
SQL Analysis
Query 1: Pick Rates by Zone
Calculate the average number of items picked per hour in each warehouse zone. Low pick rates may indicate poor layout, congested aisles, or difficult-to-locate SKUs.
SELECT
z.zone_name,
z.zone_type,
z.sku_count,
COUNT(p.pick_id) AS total_picks,
SUM(p.items_picked) AS total_items_picked,
ROUND(
SUM(p.items_picked)::NUMERIC
/ NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0), 1
) AS items_per_hour,
ROUND(
100.0 * SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) / COUNT(p.pick_id), 2
) AS error_rate_pct
FROM picks p
JOIN zones z ON p.zone_id = z.zone_id
GROUP BY z.zone_name, z.zone_type, z.sku_count
ORDER BY items_per_hour DESC;
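To chart this result alongside the Python analysis below, the query can be pulled straight into pandas. A minimal sketch using SQLAlchemy; the connection string is a placeholder, not a value defined anywhere in this project.
import pandas as pd
from sqlalchemy import create_engine
# Placeholder credentials -- replace with the real PostgreSQL host and database
engine = create_engine('postgresql://analyst:secret@localhost:5432/warehouse')
zone_rates = pd.read_sql("""
    SELECT z.zone_name,
           -- NULLIF guards against division by zero if a zone's recorded pick time sums to zero
           SUM(p.items_picked)::NUMERIC
             / NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0) AS items_per_hour
    FROM picks p
    JOIN zones z ON p.zone_id = z.zone_id
    GROUP BY z.zone_name
    ORDER BY items_per_hour DESC
""", engine)
print(zone_rates.head())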
Query 2: Order Processing Time Breakdown
Break down the total order processing time into pick, pack, and wait stages. This reveals which stage is the primary bottleneck contributing to the 20% increase in processing time.
WITH order_stages AS (
SELECT
o.order_id,
o.priority,
MIN(p.pick_start) AS first_pick_start,
MAX(p.pick_end) AS last_pick_end,
MIN(pk.pack_start) AS pack_start,
MAX(pk.pack_end) AS pack_end,
o.order_shipped
FROM orders o
LEFT JOIN picks p ON o.order_id = p.order_id
LEFT JOIN packing pk ON o.order_id = pk.order_id
WHERE o.order_shipped IS NOT NULL
GROUP BY o.order_id, o.priority, o.order_shipped
)
SELECT
priority,
COUNT(order_id) AS orders_processed,
ROUND(AVG(EXTRACT(EPOCH FROM (last_pick_end - first_pick_start)) / 60), 1) AS avg_pick_min,
ROUND(AVG(EXTRACT(EPOCH FROM (pack_start - last_pick_end)) / 60), 1) AS avg_wait_min,
ROUND(AVG(EXTRACT(EPOCH FROM (pack_end - pack_start)) / 60), 1) AS avg_pack_min,
ROUND(AVG(EXTRACT(EPOCH FROM (order_shipped - pack_end)) / 60), 1) AS avg_ship_min,
ROUND(AVG(EXTRACT(EPOCH FROM (order_shipped - first_pick_start)) / 60), 1) AS avg_total_min
FROM order_stages
GROUP BY priority
ORDER BY avg_total_min DESC;
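To make the stage arithmetic concrete, here is the same breakdown applied to one order with hypothetical timestamps (the values are made up for illustration): pick, pack, and ship are active stages, while wait is the queue time between the last pick finishing and packing starting.
import pandas as pd
# Hypothetical timestamps for a single order
first_pick_start = pd.Timestamp('2024-05-01 09:00')
last_pick_end = pd.Timestamp('2024-05-01 09:14')
pack_start = pd.Timestamp('2024-05-01 09:36')
pack_end = pd.Timestamp('2024-05-01 09:44')
order_shipped = pd.Timestamp('2024-05-01 09:55')
pick_min = (last_pick_end - first_pick_start).total_seconds() / 60   # 14.0 min of picking
wait_min = (pack_start - last_pick_end).total_seconds() / 60         # 22.0 min sitting in queue
pack_min = (pack_end - pack_start).total_seconds() / 60              # 8.0 min of packing
ship_min = (order_shipped - pack_end).total_seconds() / 60           # 11.0 min to the dock
total_min = (order_shipped - first_pick_start).total_seconds() / 60  # 55.0 min end to end
print(pick_min, wait_min, pack_min, ship_min, total_min)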
Query 3: Worker Productivity with Tenure
Compare worker productivity against their tenure. Newer employees may need additional training, while experienced workers who underperform may need different interventions.
SELECT
w.worker_id,
w.worker_name,
w.role,
s.shift_name,
(CURRENT_DATE - w.hire_date) AS tenure_days,
COUNT(p.pick_id) AS total_tasks,
SUM(p.items_picked) AS total_items,
ROUND(
SUM(p.items_picked)::NUMERIC
/ NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0), 1
) AS items_per_hour,
ROUND(100.0 * SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) / COUNT(p.pick_id), 2) AS error_rate,
NTILE(4) OVER (ORDER BY
SUM(p.items_picked)::NUMERIC
/ NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0)
DESC
) AS productivity_quartile
FROM workers w
JOIN picks p ON w.worker_id = p.worker_id
JOIN shifts s ON w.shift_id = s.shift_id
WHERE w.role = 'Picker'
GROUP BY w.worker_id, w.worker_name, w.role, s.shift_name, w.hire_date
HAVING COUNT(p.pick_id) >= 50
ORDER BY items_per_hour DESC;
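NTILE(4) splits the pickers into four equal-sized buckets by the ordering given; because the ORDER BY is descending on items per hour, quartile 1 holds the fastest pickers. A small pandas equivalent with made-up rates, using qcut with reversed labels to reproduce that orientation:
import pandas as pd
# Hypothetical items-per-hour figures for eight pickers
rates = pd.Series({'W01': 142, 'W02': 131, 'W03': 118, 'W04': 104,
                   'W05': 97, 'W06': 88, 'W07': 76, 'W08': 63}, name='items_per_hour')
# qcut bins ascend, so labels run 4..1 to mirror NTILE(4) ... ORDER BY rate DESC
quartile = pd.qcut(rates, 4, labels=[4, 3, 2, 1]).astype(int).rename('productivity_quartile')
print(pd.concat([rates, quartile], axis=1))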
Query 4: Error Rates by Zone, Shift, and Worker Experience
Cross-reference error rates across three dimensions to identify specific hotspots. A zone with high errors only on the night shift suggests lighting or staffing issues rather than a zone layout problem.
SELECT
z.zone_name,
sh.shift_name,
CASE
WHEN (CURRENT_DATE - w.hire_date) < 90 THEN 'New (<90 days)'
WHEN (CURRENT_DATE - w.hire_date) BETWEEN 90 AND 365 THEN 'Mid (90-365 days)'
ELSE 'Veteran (365+ days)'
END AS experience_level,
COUNT(p.pick_id) AS total_picks,
SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) AS errors,
ROUND(100.0 * SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) / COUNT(p.pick_id), 2) AS error_pct
FROM picks p
JOIN zones z ON p.zone_id = z.zone_id
JOIN workers w ON p.worker_id = w.worker_id
JOIN shifts sh ON w.shift_id = sh.shift_id
GROUP BY z.zone_name, sh.shift_name,
CASE
WHEN (CURRENT_DATE - w.hire_date) < 90 THEN 'New (<90 days)'
WHEN (CURRENT_DATE - w.hire_date) BETWEEN 90 AND 365 THEN 'Mid (90-365 days)'
ELSE 'Veteran (365+ days)'
END
HAVING COUNT(p.pick_id) >= 20
ORDER BY error_pct DESC;
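Once this result is in pandas (via pd.read_sql, or from a CSV export; the file name below is only an assumption), pivoting it into a zone-by-shift grid makes the hotspots easier to scan than the long sorted list:
import pandas as pd
# Assumed export of the query above; substitute pd.read_sql(query, engine) if querying directly
hotspots = pd.read_csv('error_hotspots.csv')
# Zone rows, shift/experience columns, error percentage in the cells
grid = hotspots.pivot_table(index='zone_name',
                            columns=['shift_name', 'experience_level'],
                            values='error_pct')
print(grid.round(2))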
Python Analysis
Zone Performance Heatmap
Create a heatmap showing pick rates across zones and shifts. Hot zones (high activity) can be compared against cold zones to balance workload and identify layout optimization candidates.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load data
picks = pd.read_csv('picks.csv', parse_dates=['pick_start', 'pick_end'])
zones = pd.read_csv('zones.csv')
workers = pd.read_csv('workers.csv')
shifts = pd.read_csv('shifts.csv')
# Calculate pick duration in hours
picks['duration_hours'] = (picks['pick_end'] - picks['pick_start']).dt.total_seconds() / 3600
# Merge with zones and shifts
picks = picks.merge(zones[['zone_id', 'zone_name']], on='zone_id')
picks = picks.merge(workers[['worker_id', 'shift_id']], on='worker_id')
picks = picks.merge(shifts[['shift_id', 'shift_name']], on='shift_id')
# Calculate items per hour by zone and shift
zone_shift = picks.groupby(['zone_name', 'shift_name']).agg(
total_items=('items_picked', 'sum'),
total_hours=('duration_hours', 'sum')
).reset_index()
zone_shift['items_per_hour'] = (zone_shift['total_items'] / zone_shift['total_hours']).round(1)
# Pivot for heatmap
pivot = zone_shift.pivot(index='zone_name', columns='shift_name', values='items_per_hour')
fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(pivot, annot=True, fmt='.0f', cmap='YlOrRd', linewidths=0.5,
cbar_kws={'label': 'Items/Hour'}, ax=ax)
ax.set_title('Pick Rate by Zone and Shift (Items/Hour)', fontsize=14, fontweight='bold')
ax.set_ylabel('Warehouse Zone', fontsize=12)
ax.set_xlabel('Shift', fontsize=12)
plt.tight_layout()
plt.savefig('zone_heatmap.png', dpi=150)
plt.show()
Worker Productivity Distribution
Visualize the distribution of worker productivity (items per hour) segmented by experience level. This reveals whether the productivity gap is driven by tenure or other factors.
# Calculate per-worker metrics
worker_stats = picks.groupby('worker_id').agg(
total_items=('items_picked', 'sum'),
total_hours=('duration_hours', 'sum'),
total_errors=('pick_error', 'sum'),
total_picks=('pick_id', 'count')
).reset_index()
worker_stats['items_per_hour'] = worker_stats['total_items'] / worker_stats['total_hours']
worker_stats['error_rate'] = worker_stats['total_errors'] / worker_stats['total_picks'] * 100
# Merge with worker info for experience
worker_info = pd.read_csv('workers.csv', parse_dates=['hire_date'])
worker_stats = worker_stats.merge(worker_info[['worker_id', 'hire_date', 'role']], on='worker_id')
worker_stats['tenure_days'] = (pd.Timestamp.now() - worker_stats['hire_date']).dt.days
worker_stats['experience'] = pd.cut(worker_stats['tenure_days'],
bins=[0, 90, 365, 9999],
labels=['New (<90d)', 'Mid (90-365d)', 'Veteran (365+d)'])
# Filter to pickers with meaningful volume
pickers = worker_stats[(worker_stats['role'] == 'Picker') & (worker_stats['total_picks'] >= 50)]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
# Productivity distribution by experience
colors = {'New (<90d)': '#ef4444', 'Mid (90-365d)': '#f59e0b', 'Veteran (365+d)': '#10b981'}
for exp, group in pickers.groupby('experience'):
ax1.hist(group['items_per_hour'], bins=15, alpha=0.6, label=exp, color=colors.get(exp, '#999'))
ax1.set_xlabel('Items per Hour', fontsize=12)
ax1.set_ylabel('Number of Workers', fontsize=12)
ax1.set_title('Productivity Distribution by Experience', fontsize=13, fontweight='bold')
ax1.legend()
# Error rate vs productivity scatter
scatter = ax2.scatter(pickers['items_per_hour'], pickers['error_rate'],
c=pickers['tenure_days'], cmap='RdYlGn', s=50, alpha=0.7)
ax2.set_xlabel('Items per Hour', fontsize=12)
ax2.set_ylabel('Error Rate (%)', fontsize=12)
ax2.set_title('Speed vs. Accuracy by Worker', fontsize=13, fontweight='bold')
plt.colorbar(scatter, ax=ax2, label='Tenure (days)')
ax2.axhline(y=2, color='red', linestyle='--', alpha=0.5, label='2% Target')
ax2.legend()
plt.tight_layout()
plt.savefig('worker_productivity.png', dpi=150)
plt.show()
Shift Analysis and Bottleneck Visualization
Stacked bar chart showing the time breakdown (pick, wait, pack, ship) for each shift. This clearly illustrates where time is being lost in the process.
# Load order stage data
orders = pd.read_csv('orders.csv', parse_dates=['order_received', 'order_shipped'])
packing = pd.read_csv('packing.csv', parse_dates=['pack_start', 'pack_end'])
# Calculate stage durations per order
order_picks = picks.groupby('order_id').agg(
pick_start=('pick_start', 'min'),
pick_end=('pick_end', 'max')
).reset_index()
order_packs = packing.groupby('order_id').agg(
pack_start=('pack_start', 'min'),
pack_end=('pack_end', 'max')
).reset_index()
stages = orders.merge(order_picks, on='order_id').merge(order_packs, on='order_id')
stages['pick_min'] = (stages['pick_end'] - stages['pick_start']).dt.total_seconds() / 60
stages['wait_min'] = (stages['pack_start'] - stages['pick_end']).dt.total_seconds() / 60
stages['pack_min'] = (stages['pack_end'] - stages['pack_start']).dt.total_seconds() / 60
stages['ship_min'] = (stages['order_shipped'] - stages['pack_end']).dt.total_seconds() / 60
# Determine shift based on pick_start hour
stages['hour'] = stages['pick_start'].dt.hour
stages['shift'] = stages['hour'].apply(
lambda h: 'Day' if 6 <= h < 14 else ('Swing' if 14 <= h < 22 else 'Night'))
# Average stage times by shift
shift_stages = stages.groupby('shift')[['pick_min', 'wait_min', 'pack_min', 'ship_min']].mean()
fig, ax = plt.subplots(figsize=(10, 6))
shift_stages.plot(kind='bar', stacked=True, ax=ax,
color=['#3b82f6', '#f59e0b', '#10b981', '#8b5cf6'])
ax.set_ylabel('Average Minutes', fontsize=12)
ax.set_xlabel('Shift', fontsize=12)
ax.set_title('Order Processing Time Breakdown by Shift', fontsize=14, fontweight='bold')
ax.legend(['Pick', 'Wait', 'Pack', 'Ship'], loc='upper right')
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)
plt.tight_layout()
plt.savefig('shift_bottleneck.png', dpi=150)
plt.show()
Excel Instructions
Build a warehouse KPI dashboard, worker scorecard, and zone performance tracker in Excel.
- Import Data: Import the SQL query results for zone performance, order processing breakdown, and worker productivity into separate Excel sheets. Name them "Zone KPIs", "Processing Times", and "Worker Scores" respectively (a Python export sketch for this step follows this list).
- Warehouse KPI Dashboard: Create a "Dashboard" sheet. In large formatted cells, display: Total Orders Processed, Average Processing Time (min), Error Rate (%), and Items Picked per Hour. Use formulas such as =AVERAGE('Processing Times'!G:G) linked to the data sheets. Apply conditional formatting: green if meeting target, red if not.
- Zone Performance Chart: From the Zone KPIs sheet, create a clustered bar chart with zones on the x-axis and two series: Items/Hour (left axis) and Error Rate % (right axis, secondary). Add a horizontal reference line at the 2% error target. Use data bars in the table to visually rank zones.
- Worker Scorecard: In the Worker Scores sheet, add a calculated column: =IF(AND(items_per_hour >= 120, error_rate <= 2), "Star", IF(OR(items_per_hour < 80, error_rate > 4), "Needs Coaching", "Meets Standard")). Use conditional formatting with icon sets (green star, yellow circle, red flag) to visually flag each worker.
- Shift Comparison: Create a stacked bar chart from the Processing Times data showing pick, wait, pack, and ship minutes for each shift. Add data labels showing the total. This makes it immediately clear which shift has the longest total time and which stage dominates.
- Trend Tracker: Create a weekly trend chart showing error rate and processing time over the last 12 weeks. Use sparklines in the KPI cells to show directional trends. Add a target line for each metric. This lets the warehouse manager track whether improvements are taking effect.
- Interactive Filters: Add slicers for Zone, Shift, and Worker Experience Level connected to PivotTables on each sheet. This allows managers to drill into specific segments and identify localized issues versus warehouse-wide problems.
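One way to handle the import step is to write the three query results to the expected sheet names directly from Python. A minimal sketch; the CSV file names are assumptions standing in for however you export the query results.
import pandas as pd
# Assumed file names for the three query exports -- adjust to your own
zone_kpis = pd.read_csv('zone_kpis.csv')
processing_times = pd.read_csv('processing_times.csv')
worker_scores = pd.read_csv('worker_scores.csv')
# Write each result set to the sheet name the dashboard formulas reference
with pd.ExcelWriter('warehouse_dashboard_data.xlsx') as writer:
    zone_kpis.to_excel(writer, sheet_name='Zone KPIs', index=False)
    processing_times.to_excel(writer, sheet_name='Processing Times', index=False)
    worker_scores.to_excel(writer, sheet_name='Worker Scores', index=False)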
Key Insights & Recommendations
Wait Time Bottleneck
The pick-to-pack wait time accounts for 35% of total processing time, averaging 22 minutes. This is the primary driver of the 20% increase. Implementing a conveyor handoff system or dedicated runners between zones could cut wait time by 50%.
Zone B3 Underperformance
Zone B3 (Bulk items) has the lowest pick rate at 68 items/hour, 40% below the warehouse average. The zone has the highest SKU density per aisle. Recommend reorganizing the zone layout using ABC analysis to place high-velocity SKUs in accessible locations.
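ABC analysis ranks SKUs by pick volume and slots the fastest movers into the most accessible locations. The sketch below shows one way to compute the classification; it assumes the pick records also carry a sku_id column, which is not part of the schema above.
import pandas as pd
# Assumes a sku_id column on the pick records (not in the schema shown earlier)
picks = pd.read_csv('picks.csv')
sku_velocity = (picks.groupby('sku_id')['items_picked'].sum()
                     .sort_values(ascending=False))
cum_share = sku_velocity.cumsum() / sku_velocity.sum()
# Common cutoffs: A covers the top ~80% of volume, B the next ~15%, C the tail
abc = pd.cut(cum_share, bins=[0, 0.80, 0.95, 1.0],
             labels=['A', 'B', 'C'], include_lowest=True)
print(abc.value_counts())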
Night Shift Quality Issue
Night shift error rates are 4.8%, more than double the day shift (2.1%). New employees (<90 days) on night shifts have a 7.3% error rate. Pair new hires with veteran mentors and improve zone lighting for the night shift.
Staffing Imbalance
Day shift has 45% of the workforce but handles only 38% of orders. Night shift is understaffed relative to volume by approximately 8 workers. Rebalancing shifts would reduce overtime costs and improve night shift throughput.
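The imbalance can be quantified by comparing each shift's share of headcount with its share of picked orders. A rough sketch, attributing an order to the shift of the workers who picked it:
import pandas as pd
picks = pd.read_csv('picks.csv')
workers = pd.read_csv('workers.csv')
shifts = pd.read_csv('shifts.csv')
# Headcount per shift
headcount = (workers.merge(shifts, on='shift_id')
                    .groupby('shift_name')['worker_id'].nunique().rename('headcount'))
# Distinct orders touched by each shift's pickers
orders_touched = (picks.merge(workers[['worker_id', 'shift_id']], on='worker_id')
                       .merge(shifts, on='shift_id')
                       .groupby('shift_name')['order_id'].nunique().rename('orders_touched'))
balance = pd.concat([headcount, orders_touched], axis=1)
balance['staff_share'] = (balance['headcount'] / balance['headcount'].sum()).round(2)
balance['order_share'] = (balance['orders_touched'] / balance['orders_touched'].sum()).round(2)
print(balance)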
Training ROI
Workers who completed advanced training pick 28% faster with 60% fewer errors than untrained peers at the same tenure level. Expanding the training program to all workers within their first 60 days would yield an estimated ROI of 340% annually.
Express Order Priority
Same-Day orders average 48 minutes processing vs. 35 for Standard, despite the urgency. Implementing a dedicated express lane with pre-staged high-velocity items could reduce express processing to under 25 minutes.
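A quick way to check the gap by priority is to measure end-to-end minutes from the orders table alone (receipt to ship). Note this starts the clock at order receipt rather than at the first pick, so the averages will not match Query 2 exactly:
import pandas as pd
orders = pd.read_csv('orders.csv', parse_dates=['order_received', 'order_shipped'])
orders = orders.dropna(subset=['order_shipped'])
orders['total_min'] = (orders['order_shipped'] - orders['order_received']).dt.total_seconds() / 60
# Count, mean, and median end-to-end minutes for each priority tier
print(orders.groupby('priority')['total_min'].agg(['count', 'mean', 'median']).round(1))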
Knowledge Check
In the order processing breakdown query, what does the expression MIN(pk.pack_start) - MAX(p.pick_end) represent?
What does NTILE(4) do in the worker productivity query?
In the Python heatmap, what does sns.heatmap with annot=True accomplish?
Why does the error analysis query use a CASE expression to bucket workers into experience levels instead of using the raw tenure_days value?
In the Python scatter plot of speed vs. accuracy, what does a worker positioned in the upper-left quadrant (low items/hour, high error rate) indicate?