Project 29

Warehouse Efficiency Analysis

Optimize pick, pack, and ship operations in a fulfillment warehouse by analyzing zone productivity, worker performance, shift patterns, and order processing bottlenecks.

Project Overview

Scenario: You are a warehouse operations analyst at FulFast Distribution, a 250,000 sq ft fulfillment center processing 8,000+ orders per day. The warehouse manager has noticed that order processing times have increased by 20% over the last quarter despite no increase in volume. Error rates on shipments have also climbed to 3.2%, above the 2% target. Your task is to identify the root causes and recommend operational improvements.

Objective: Analyze pick/pack/ship data across warehouse zones, workers, and shifts to pinpoint productivity bottlenecks, high-error zones, and staffing imbalances. Build visualizations and a KPI dashboard to support data-driven management decisions.

Tools: SQL (PostgreSQL), Python (pandas, matplotlib, seaborn, numpy), Excel

Dataset Description

Six tables modeling the warehouse fulfillment process from order receipt through picking, packing, and shipping.

orders

| Column | Type | Description |
|--------|------|-------------|
| order_id | INT (PK) | Unique order identifier |
| order_received | TIMESTAMP | When order entered the system |
| order_shipped | TIMESTAMP | When order left the dock |
| total_items | INT | Number of line items |
| priority | VARCHAR(10) | Standard, Express, Same-Day |
| has_error | BOOLEAN | Whether a packing/shipping error occurred |

picks

| Column | Type | Description |
|--------|------|-------------|
| pick_id | INT (PK) | Unique pick event identifier |
| order_id | INT (FK) | Associated order |
| worker_id | INT (FK) | Picker assigned |
| zone_id | INT (FK) | Warehouse zone |
| pick_start | TIMESTAMP | When pick task began |
| pick_end | TIMESTAMP | When pick task completed |
| items_picked | INT | Number of items picked |
| pick_error | BOOLEAN | Wrong item or quantity picked |

packing

| Column | Type | Description |
|--------|------|-------------|
| pack_id | INT (PK) | Unique packing event |
| order_id | INT (FK) | Associated order |
| worker_id | INT (FK) | Packer assigned |
| pack_start | TIMESTAMP | When packing began |
| pack_end | TIMESTAMP | When packing completed |
| pack_error | BOOLEAN | Packing error occurred |

zones

| Column | Type | Description |
|--------|------|-------------|
| zone_id | INT (PK) | Zone identifier |
| zone_name | VARCHAR(20) | Zone label (A1, A2, B1, etc.) |
| zone_type | VARCHAR(30) | Small Items, Bulk, Refrigerated, Fragile |
| aisle_count | INT | Number of aisles in zone |
| sku_count | INT | Number of unique SKUs stored |

workers

| Column | Type | Description |
|--------|------|-------------|
| worker_id | INT (PK) | Unique worker identifier |
| worker_name | VARCHAR(50) | Employee name |
| role | VARCHAR(20) | Picker, Packer, Loader |
| hire_date | DATE | Date of hire |
| shift_id | INT (FK) | Assigned shift |

shifts

| Column | Type | Description |
|--------|------|-------------|
| shift_id | INT (PK) | Shift identifier |
| shift_name | VARCHAR(20) | Day, Swing, Night |
| start_time | TIME | Shift start (e.g., 06:00) |
| end_time | TIME | Shift end (e.g., 14:00) |
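Before running any queries, it helps to confirm the foreign-key relationships between the six tables hold. A minimal pandas sketch with tiny synthetic rows (all values invented for illustration; they mirror the schema above but are not the project data):

```python
import pandas as pd

# Miniature synthetic versions of four of the six tables (values invented)
zones = pd.DataFrame({'zone_id': [1, 2], 'zone_name': ['A1', 'B1'],
                      'zone_type': ['Small Items', 'Bulk'],
                      'aisle_count': [6, 4], 'sku_count': [1200, 300]})
shifts = pd.DataFrame({'shift_id': [1], 'shift_name': ['Day'],
                       'start_time': ['06:00'], 'end_time': ['14:00']})
workers = pd.DataFrame({'worker_id': [10], 'worker_name': ['A. Lee'], 'role': ['Picker'],
                        'hire_date': pd.to_datetime(['2023-01-15']), 'shift_id': [1]})
picks = pd.DataFrame({'pick_id': [100], 'order_id': [500], 'worker_id': [10], 'zone_id': [1],
                      'pick_start': pd.to_datetime(['2024-05-01 08:00']),
                      'pick_end': pd.to_datetime(['2024-05-01 08:12']),
                      'items_picked': [5], 'pick_error': [False]})

# Every foreign key should resolve to a primary key in its parent table
assert picks['worker_id'].isin(workers['worker_id']).all()
assert picks['zone_id'].isin(zones['zone_id']).all()
assert workers['shift_id'].isin(shifts['shift_id']).all()
print('referential integrity OK')
```

The same `isin(...).all()` checks run unchanged against the real CSV extracts, and catch orphaned rows before they silently disappear in inner joins.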

SQL Analysis

Query 1: Pick Rates by Zone

Calculate the average number of items picked per hour in each warehouse zone. Low pick rates may indicate poor layout, congested aisles, or difficult-to-locate SKUs.

SELECT
    z.zone_name,
    z.zone_type,
    z.sku_count,
    COUNT(p.pick_id) AS total_picks,
    SUM(p.items_picked) AS total_items_picked,
    ROUND(
        SUM(p.items_picked)::NUMERIC
        / NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0), 1
    ) AS items_per_hour,
    ROUND(
        100.0 * SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) / COUNT(p.pick_id), 2
    ) AS error_rate_pct
FROM picks p
JOIN zones z ON p.zone_id = z.zone_id
GROUP BY z.zone_name, z.zone_type, z.sku_count
ORDER BY items_per_hour DESC;
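The same zone-level metric can be cross-checked in pandas. A sketch with a toy frame (column names follow the picks schema; the three rows are invented), including the equivalent of the `NULLIF` divide-by-zero guard:

```python
import pandas as pd

# Toy pick events (invented data) using the picks-table column names
picks = pd.DataFrame({
    'zone_id': [1, 1, 2],
    'items_picked': [10, 20, 6],
    'pick_start': pd.to_datetime(['2024-05-01 08:00', '2024-05-01 09:00', '2024-05-01 08:00']),
    'pick_end':   pd.to_datetime(['2024-05-01 08:30', '2024-05-01 09:30', '2024-05-01 08:20']),
    'pick_error': [False, True, False],
})

picks['hours'] = (picks['pick_end'] - picks['pick_start']).dt.total_seconds() / 3600
zone = picks.groupby('zone_id').agg(
    total_items=('items_picked', 'sum'),
    total_hours=('hours', 'sum'),
    error_rate_pct=('pick_error', 'mean'),
)
# Mirror the SQL NULLIF guard: zero total hours becomes NaN, not a division error
zone['items_per_hour'] = zone['total_items'] / zone['total_hours'].replace(0, float('nan'))
zone['error_rate_pct'] *= 100
print(zone)  # zone 1: 30 items over 1.0 h; zone 2: 6 items over 20 min
```

Zone 1 here works out to 30.0 items/hour with a 50% error rate, which is the kind of spot check worth doing before trusting the full query output.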

Query 2: Order Processing Time Breakdown

Break down the total order processing time into pick, pack, and wait stages. This reveals which stage is the primary bottleneck contributing to the 20% increase in processing time.

WITH order_stages AS (
    SELECT
        o.order_id,
        o.priority,
        MIN(p.pick_start) AS first_pick_start,
        MAX(p.pick_end) AS last_pick_end,
        MIN(pk.pack_start) AS pack_start,
        MAX(pk.pack_end) AS pack_end,
        o.order_shipped
    FROM orders o
    LEFT JOIN picks p ON o.order_id = p.order_id
    LEFT JOIN packing pk ON o.order_id = pk.order_id
    WHERE o.order_shipped IS NOT NULL
    GROUP BY o.order_id, o.priority, o.order_shipped
)
SELECT
    priority,
    COUNT(order_id) AS orders_processed,
    ROUND(AVG(EXTRACT(EPOCH FROM (last_pick_end - first_pick_start)) / 60), 1) AS avg_pick_min,
    ROUND(AVG(EXTRACT(EPOCH FROM (pack_start - last_pick_end)) / 60), 1) AS avg_wait_min,
    ROUND(AVG(EXTRACT(EPOCH FROM (pack_end - pack_start)) / 60), 1) AS avg_pack_min,
    ROUND(AVG(EXTRACT(EPOCH FROM (order_shipped - pack_end)) / 60), 1) AS avg_ship_min,
    ROUND(AVG(EXTRACT(EPOCH FROM (order_shipped - first_pick_start)) / 60), 1) AS avg_total_min
FROM order_stages
GROUP BY priority
ORDER BY avg_total_min DESC;

Query 3: Worker Productivity with Tenure

Compare worker productivity against their tenure. Newer employees may need additional training, while experienced workers who underperform may need different interventions.

SELECT
    w.worker_id,
    w.worker_name,
    w.role,
    s.shift_name,
    (CURRENT_DATE - w.hire_date) AS tenure_days,
    COUNT(p.pick_id) AS total_tasks,
    SUM(p.items_picked) AS total_items,
    ROUND(
        SUM(p.items_picked)::NUMERIC
        / NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0), 1
    ) AS items_per_hour,
    ROUND(100.0 * SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) / COUNT(p.pick_id), 2) AS error_rate,
    NTILE(4) OVER (ORDER BY
        SUM(p.items_picked)::NUMERIC
        / NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0)
        DESC
    ) AS productivity_quartile
FROM workers w
JOIN picks p ON w.worker_id = p.worker_id
JOIN shifts s ON w.shift_id = s.shift_id
WHERE w.role = 'Picker'
GROUP BY w.worker_id, w.worker_name, w.role, s.shift_name, w.hire_date
HAVING COUNT(p.pick_id) >= 50
ORDER BY items_per_hour DESC;
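The `NTILE(4)` quartile assignment has a close pandas analogue in `pd.qcut`. A minimal sketch with invented per-picker rates (ranking descending first so quartile 1 is the fastest group, matching `ORDER BY ... DESC`):

```python
import pandas as pd

# Invented items-per-hour rates for eight pickers
rates = pd.Series([150, 130, 120, 110, 95, 90, 80, 60], name='items_per_hour')

# Rank descending so quartile 1 = fastest, like NTILE(4) OVER (ORDER BY ... DESC)
quartile = pd.qcut(rates.rank(ascending=False, method='first'), 4, labels=[1, 2, 3, 4])
print(list(quartile))  # [1, 1, 2, 2, 3, 3, 4, 4]
```

One caveat: when the row count is not divisible by four, `NTILE` puts the extra rows in the earlier buckets, while `qcut` splits on quantile boundaries, so bucket sizes can differ by one row between the two approaches.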

Query 4: Error Rates by Zone, Shift, and Worker Experience

Cross-reference error rates across three dimensions to identify specific hotspots. A zone with high errors only on the night shift suggests lighting or staffing issues rather than a zone layout problem.

SELECT
    z.zone_name,
    sh.shift_name,
    CASE
        WHEN (CURRENT_DATE - w.hire_date) < 90 THEN 'New (<90 days)'
        WHEN (CURRENT_DATE - w.hire_date) BETWEEN 90 AND 365 THEN 'Mid (90-365 days)'
        ELSE 'Veteran (365+ days)'
    END AS experience_level,
    COUNT(p.pick_id) AS total_picks,
    SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) AS errors,
    ROUND(100.0 * SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) / COUNT(p.pick_id), 2) AS error_pct
FROM picks p
JOIN zones z ON p.zone_id = z.zone_id
JOIN workers w ON p.worker_id = w.worker_id
JOIN shifts sh ON w.shift_id = sh.shift_id
GROUP BY z.zone_name, sh.shift_name,
    CASE
        WHEN (CURRENT_DATE - w.hire_date) < 90 THEN 'New (<90 days)'
        WHEN (CURRENT_DATE - w.hire_date) BETWEEN 90 AND 365 THEN 'Mid (90-365 days)'
        ELSE 'Veteran (365+ days)'
    END
HAVING COUNT(p.pick_id) >= 20
ORDER BY error_pct DESC;
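The same three-way error breakdown can be cross-tabulated in pandas with `pivot_table`, which makes hotspot combinations easy to scan. A sketch with toy pick-level rows (invented; dimension names follow the query):

```python
import pandas as pd

# Toy pick-level rows (invented) with the same dimensions as the query output
df = pd.DataFrame({
    'zone_name':        ['B3', 'B3', 'B3', 'A1'],
    'shift_name':       ['Night', 'Night', 'Day', 'Day'],
    'experience_level': ['New', 'New', 'Veteran', 'Veteran'],
    'pick_error':       [1, 0, 0, 0],   # 1 = error, 0 = clean pick
})

# Mean of the 0/1 flag is the error rate; multiply by 100 for percent
error_pct = pd.pivot_table(df, values='pick_error',
                           index=['zone_name', 'shift_name'],
                           columns='experience_level', aggfunc='mean') * 100
print(error_pct)
```

Combinations with no picks show as NaN rather than zero, which is the honest answer: no data, not no errors.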

Python Analysis

Zone Performance Heatmap

Create a heatmap showing pick rates across zones and shifts. Hot zones (high activity) can be compared against cold zones to balance workload and identify layout optimization candidates.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
picks = pd.read_csv('picks.csv', parse_dates=['pick_start', 'pick_end'])
zones = pd.read_csv('zones.csv')
workers = pd.read_csv('workers.csv')
shifts = pd.read_csv('shifts.csv')

# Calculate pick duration in hours
picks['duration_hours'] = (picks['pick_end'] - picks['pick_start']).dt.total_seconds() / 3600

# Merge with zones and shifts
picks = picks.merge(zones[['zone_id', 'zone_name']], on='zone_id')
picks = picks.merge(workers[['worker_id', 'shift_id']], on='worker_id')
picks = picks.merge(shifts[['shift_id', 'shift_name']], on='shift_id')

# Calculate items per hour by zone and shift
zone_shift = picks.groupby(['zone_name', 'shift_name']).agg(
    total_items=('items_picked', 'sum'),
    total_hours=('duration_hours', 'sum')
).reset_index()
zone_shift['items_per_hour'] = (zone_shift['total_items'] / zone_shift['total_hours']).round(1)

# Pivot for heatmap
pivot = zone_shift.pivot(index='zone_name', columns='shift_name', values='items_per_hour')

fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(pivot, annot=True, fmt='.0f', cmap='YlOrRd', linewidths=0.5,
            cbar_kws={'label': 'Items/Hour'}, ax=ax)
ax.set_title('Pick Rate by Zone and Shift (Items/Hour)', fontsize=14, fontweight='bold')
ax.set_ylabel('Warehouse Zone', fontsize=12)
ax.set_xlabel('Shift', fontsize=12)
plt.tight_layout()
plt.savefig('zone_heatmap.png', dpi=150)
plt.show()

Worker Productivity Distribution

Visualize the distribution of worker productivity (items per hour) segmented by experience level. This reveals whether the productivity gap is driven by tenure or other factors.

# Calculate per-worker metrics
worker_stats = picks.groupby('worker_id').agg(
    total_items=('items_picked', 'sum'),
    total_hours=('duration_hours', 'sum'),
    total_errors=('pick_error', 'sum'),
    total_picks=('pick_id', 'count')
).reset_index()
worker_stats['items_per_hour'] = worker_stats['total_items'] / worker_stats['total_hours']
worker_stats['error_rate'] = worker_stats['total_errors'] / worker_stats['total_picks'] * 100

# Merge with worker info for experience
worker_info = pd.read_csv('workers.csv', parse_dates=['hire_date'])
worker_stats = worker_stats.merge(worker_info[['worker_id', 'hire_date', 'role']], on='worker_id')
worker_stats['tenure_days'] = (pd.Timestamp.now() - worker_stats['hire_date']).dt.days
worker_stats['experience'] = pd.cut(worker_stats['tenure_days'],
    bins=[0, 90, 365, 9999],
    labels=['New (<90d)', 'Mid (90-365d)', 'Veteran (365+d)'])

# Filter to pickers with meaningful volume
pickers = worker_stats[(worker_stats['role'] == 'Picker') & (worker_stats['total_picks'] >= 50)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Productivity distribution by experience
colors = {'New (<90d)': '#ef4444', 'Mid (90-365d)': '#f59e0b', 'Veteran (365+d)': '#10b981'}
for exp, group in pickers.groupby('experience', observed=True):
    ax1.hist(group['items_per_hour'], bins=15, alpha=0.6, label=exp, color=colors.get(exp, '#999'))
ax1.set_xlabel('Items per Hour', fontsize=12)
ax1.set_ylabel('Number of Workers', fontsize=12)
ax1.set_title('Productivity Distribution by Experience', fontsize=13, fontweight='bold')
ax1.legend()

# Error rate vs productivity scatter
scatter = ax2.scatter(pickers['items_per_hour'], pickers['error_rate'],
                      c=pickers['tenure_days'], cmap='RdYlGn', s=50, alpha=0.7)
ax2.set_xlabel('Items per Hour', fontsize=12)
ax2.set_ylabel('Error Rate (%)', fontsize=12)
ax2.set_title('Speed vs. Accuracy by Worker', fontsize=13, fontweight='bold')
plt.colorbar(scatter, ax=ax2, label='Tenure (days)')
ax2.axhline(y=2, color='red', linestyle='--', alpha=0.5, label='2% Target')
ax2.legend()

plt.tight_layout()
plt.savefig('worker_productivity.png', dpi=150)
plt.show()

Shift Analysis and Bottleneck Visualization

Build a stacked bar chart showing the time breakdown (pick, wait, pack, ship) for each shift. This clearly illustrates where time is being lost in the process.

# Load order stage data
orders = pd.read_csv('orders.csv', parse_dates=['order_received', 'order_shipped'])
packing = pd.read_csv('packing.csv', parse_dates=['pack_start', 'pack_end'])

# Calculate stage durations per order
order_picks = picks.groupby('order_id').agg(
    pick_start=('pick_start', 'min'),
    pick_end=('pick_end', 'max')
).reset_index()

order_packs = packing.groupby('order_id').agg(
    pack_start=('pack_start', 'min'),
    pack_end=('pack_end', 'max')
).reset_index()

stages = orders.merge(order_picks, on='order_id').merge(order_packs, on='order_id')
stages['pick_min'] = (stages['pick_end'] - stages['pick_start']).dt.total_seconds() / 60
stages['wait_min'] = (stages['pack_start'] - stages['pick_end']).dt.total_seconds() / 60
stages['pack_min'] = (stages['pack_end'] - stages['pack_start']).dt.total_seconds() / 60
stages['ship_min'] = (stages['order_shipped'] - stages['pack_end']).dt.total_seconds() / 60

# Determine shift based on pick_start hour
stages['hour'] = stages['pick_start'].dt.hour
stages['shift'] = stages['hour'].apply(
    lambda h: 'Day' if 6 <= h < 14 else ('Swing' if 14 <= h < 22 else 'Night'))

# Average stage times by shift
shift_stages = stages.groupby('shift')[['pick_min', 'wait_min', 'pack_min', 'ship_min']].mean()

fig, ax = plt.subplots(figsize=(10, 6))
shift_stages.plot(kind='bar', stacked=True, ax=ax,
    color=['#3b82f6', '#f59e0b', '#10b981', '#8b5cf6'])
ax.set_ylabel('Average Minutes', fontsize=12)
ax.set_xlabel('Shift', fontsize=12)
ax.set_title('Order Processing Time Breakdown by Shift', fontsize=14, fontweight='bold')
ax.legend(['Pick', 'Wait', 'Pack', 'Ship'], loc='upper right')
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)
plt.tight_layout()
plt.savefig('shift_bottleneck.png', dpi=150)
plt.show()

Excel Instructions

Build a warehouse KPI dashboard, worker scorecard, and zone performance tracker in Excel.

  1. Import Data: Import the SQL query results for zone performance, order processing breakdown, and worker productivity into separate Excel sheets. Name them "Zone KPIs", "Processing Times", and "Worker Scores" respectively.
  2. Warehouse KPI Dashboard: Create a "Dashboard" sheet. In large formatted cells, display: Total Orders Processed, Average Processing Time (min), Error Rate (%), and Items Picked per Hour. Link each cell to the data sheets with formulas such as =AVERAGE('Processing Times'!G:G). Apply conditional formatting: green if the metric meets its target, red if not.
  3. Zone Performance Chart: From the Zone KPIs sheet, create a clustered bar chart with zones on the x-axis and two series: Items/Hour (left axis) and Error Rate % (right axis, secondary). Add a horizontal reference line at the 2% error target. Use data bars in the table to visually rank zones.
  4. Worker Scorecard: In the Worker Scores sheet, convert the data range to an Excel Table so column names can be used as structured references, then add a calculated column: =IF(AND([@[items_per_hour]] >= 120, [@[error_rate]] <= 2), "Star", IF(OR([@[items_per_hour]] < 80, [@[error_rate]] > 4), "Needs Coaching", "Meets Standard")). Use conditional formatting with icon sets (green star, yellow circle, red flag) to visually flag each worker.
  5. Shift Comparison: Create a stacked bar chart from the Processing Times data showing pick, wait, pack, and ship minutes for each shift. Add data labels showing the total. This makes it immediately clear which shift has the longest total time and which stage dominates.
  6. Trend Tracker: Create a weekly trend chart showing error rate and processing time over the last 12 weeks. Use sparklines in the KPI cells to show directional trends. Add a target line for each metric. This lets the warehouse manager track whether improvements are taking effect.
  7. Interactive Filters: Add slicers for Zone, Shift, and Worker Experience Level connected to PivotTables on each sheet. This allows managers to drill into specific segments and identify localized issues versus warehouse-wide problems.

Key Insights & Recommendations

Wait Time Bottleneck

The pick-to-pack wait time accounts for 35% of total processing time, averaging 22 minutes. This is the primary driver of the 20% increase. Implementing a conveyor handoff system or dedicated runners between zones could cut wait time by 50%.

Zone B3 Underperformance

Zone B3 (Bulk items) has the lowest pick rate at 68 items/hour, 40% below the warehouse average. The zone has the highest SKU density per aisle. Recommend reorganizing the zone layout using ABC analysis to place high-velocity SKUs in accessible locations.

Night Shift Quality Issue

Night shift error rates are 4.8%, more than double the day shift (2.1%). New employees (<90 days) on night shifts have a 7.3% error rate. Pair new hires with veteran mentors and improve zone lighting for the night shift.

Staffing Imbalance

Day shift has 45% of the workforce but handles only 38% of orders. Night shift is understaffed relative to volume by approximately 8 workers. Rebalancing shifts would reduce overtime costs and improve night shift throughput.

Training ROI

Workers who completed advanced training pick 28% faster with 60% fewer errors than untrained peers at the same tenure level. Expanding the training program to all workers within their first 60 days would yield an estimated ROI of 340% annually.

Express Order Priority

Same-Day orders average 48 minutes processing vs. 35 for Standard, despite the urgency. Implementing a dedicated express lane with pre-staged high-velocity items could reduce express processing to under 25 minutes.

Knowledge Check

Question 1

In the order processing breakdown query, what does the expression MIN(pk.pack_start) - MAX(p.pick_end) represent?

Question 2

What does NTILE(4) do in the worker productivity query?

Question 3

In the Python heatmap, what does sns.heatmap with annot=True accomplish?

Question 4

Why does the error analysis query use a CASE expression to bucket workers into experience levels instead of using the raw tenure_days value?

Question 5

In the Python scatter plot of speed vs. accuracy, what does a worker positioned in the upper-left quadrant (low items/hour, high error rate) indicate?
