Warehouse Efficiency Analysis
Optimize pick, pack, and ship operations in a fulfillment warehouse by analyzing zone productivity, worker performance, shift patterns, and order processing bottlenecks.
Project Overview
Scenario: You are a warehouse operations analyst at FulFast Distribution, a 250,000 sq ft fulfillment center processing 8,000+ orders per day. The warehouse manager has noticed that order processing times have increased by 20% over the last quarter despite no increase in volume. Error rates on shipments have also climbed to 3.2%, above the 2% target. Your task is to identify the root causes and recommend operational improvements.
Objective: Analyze pick/pack/ship data across warehouse zones, workers, and shifts to pinpoint productivity bottlenecks, high-error zones, and staffing imbalances. Build visualizations and a KPI dashboard to support data-driven management decisions.
Tools: SQL (PostgreSQL), Python (pandas, matplotlib, seaborn, numpy), Excel
Dataset Description
Six tables modeling the warehouse fulfillment process from order receipt through picking, packing, and shipping.
orders
| Column | Type | Description |
|---|---|---|
| order_id | INT (PK) | Unique order identifier |
| order_received | TIMESTAMP | When order entered the system |
| order_shipped | TIMESTAMP | When order left the dock |
| total_items | INT | Number of line items |
| priority | VARCHAR(10) | Standard, Express, Same-Day |
| has_error | BOOLEAN | Whether a packing/shipping error occurred |
picks
| Column | Type | Description |
|---|---|---|
| pick_id | INT (PK) | Unique pick event identifier |
| order_id | INT (FK) | Associated order |
| worker_id | INT (FK) | Picker assigned |
| zone_id | INT (FK) | Warehouse zone |
| pick_start | TIMESTAMP | When pick task began |
| pick_end | TIMESTAMP | When pick task completed |
| items_picked | INT | Number of items picked |
| pick_error | BOOLEAN | Wrong item or quantity picked |
packing
| Column | Type | Description |
|---|---|---|
| pack_id | INT (PK) | Unique packing event |
| order_id | INT (FK) | Associated order |
| worker_id | INT (FK) | Packer assigned |
| pack_start | TIMESTAMP | When packing began |
| pack_end | TIMESTAMP | When packing completed |
| pack_error | BOOLEAN | Packing error occurred |
zones
| Column | Type | Description |
|---|---|---|
| zone_id | INT (PK) | Zone identifier |
| zone_name | VARCHAR(20) | Zone label (A1, A2, B1, etc.) |
| zone_type | VARCHAR(30) | Small Items, Bulk, Refrigerated, Fragile |
| aisle_count | INT | Number of aisles in zone |
| sku_count | INT | Number of unique SKUs stored |
workers
| Column | Type | Description |
|---|---|---|
| worker_id | INT (PK) | Unique worker identifier |
| worker_name | VARCHAR(50) | Employee name |
| role | VARCHAR(20) | Picker, Packer, Loader |
| hire_date | DATE | Date of hire |
| shift_id | INT (FK) | Assigned shift |
shifts
| Column | Type | Description |
|---|---|---|
| shift_id | INT (PK) | Shift identifier |
| shift_name | VARCHAR(20) | Day, Swing, Night |
| start_time | TIME | Shift start (e.g., 06:00) |
| end_time | TIME | Shift end (e.g., 14:00) |
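Before writing queries, it can help to sanity-check the relationships between these tables. The sketch below assumes the six tables have been exported to CSV files with the same names used in the Python section; the assertions are illustrative, not a project deliverable.
import pandas as pd
# Load the six tables (file names assumed; they match the CSVs used in the Python section)
orders = pd.read_csv('orders.csv', parse_dates=['order_received', 'order_shipped'])
picks = pd.read_csv('picks.csv', parse_dates=['pick_start', 'pick_end'])
packing = pd.read_csv('packing.csv', parse_dates=['pack_start', 'pack_end'])
zones = pd.read_csv('zones.csv')
workers = pd.read_csv('workers.csv', parse_dates=['hire_date'])
shifts = pd.read_csv('shifts.csv')
# Every foreign key should resolve to a row in its parent table
assert picks['order_id'].isin(orders['order_id']).all(), 'pick references a missing order'
assert picks['zone_id'].isin(zones['zone_id']).all(), 'pick references a missing zone'
assert picks['worker_id'].isin(workers['worker_id']).all(), 'pick references a missing worker'
assert packing['order_id'].isin(orders['order_id']).all(), 'pack references a missing order'
assert workers['shift_id'].isin(shifts['shift_id']).all(), 'worker references a missing shift'
print(f"{len(orders):,} orders, {len(picks):,} picks, {len(packing):,} packs, {len(workers)} workers")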
SQL Analysis
Query 1: Pick Rates by Zone
Calculate the average number of items picked per hour in each warehouse zone. Low pick rates may indicate poor layout, congested aisles, or difficult-to-locate SKUs.
SELECT
z.zone_name,
z.zone_type,
z.sku_count,
COUNT(p.pick_id) AS total_picks,
SUM(p.items_picked) AS total_items_picked,
ROUND(
SUM(p.items_picked)::NUMERIC
/ NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0), 1
) AS items_per_hour,
ROUND(
100.0 * SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) / COUNT(p.pick_id), 2
) AS error_rate_pct
FROM picks p
JOIN zones z ON p.zone_id = z.zone_id
GROUP BY z.zone_name, z.zone_type, z.sku_count
ORDER BY items_per_hour DESC;
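To chart this result alongside the Python analysis below, the query can be pulled straight into pandas. A minimal sketch using SQLAlchemy; the connection string is a placeholder, not a value defined anywhere in this project.
import pandas as pd
from sqlalchemy import create_engine
# Placeholder credentials -- replace with the real PostgreSQL host and database
engine = create_engine('postgresql://analyst:secret@localhost:5432/warehouse')
zone_rates = pd.read_sql("""
    SELECT z.zone_name,
           -- NULLIF guards against division by zero if a zone's recorded pick time sums to zero
           SUM(p.items_picked)::NUMERIC
             / NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0) AS items_per_hour
    FROM picks p
    JOIN zones z ON p.zone_id = z.zone_id
    GROUP BY z.zone_name
    ORDER BY items_per_hour DESC
""", engine)
print(zone_rates.head())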
Query 2: Order Processing Time Breakdown
Break down the total order processing time into pick, pack, and wait stages. This reveals which stage is the primary bottleneck contributing to the 20% increase in processing time.
WITH order_stages AS (
SELECT
o.order_id,
o.priority,
MIN(p.pick_start) AS first_pick_start,
MAX(p.pick_end) AS last_pick_end,
MIN(pk.pack_start) AS pack_start,
MAX(pk.pack_end) AS pack_end,
o.order_shipped
FROM orders o
LEFT JOIN picks p ON o.order_id = p.order_id
LEFT JOIN packing pk ON o.order_id = pk.order_id
WHERE o.order_shipped IS NOT NULL
GROUP BY o.order_id, o.priority, o.order_shipped
)
SELECT
priority,
COUNT(order_id) AS orders_processed,
ROUND(AVG(EXTRACT(EPOCH FROM (last_pick_end - first_pick_start)) / 60), 1) AS avg_pick_min,
ROUND(AVG(EXTRACT(EPOCH FROM (pack_start - last_pick_end)) / 60), 1) AS avg_wait_min,
ROUND(AVG(EXTRACT(EPOCH FROM (pack_end - pack_start)) / 60), 1) AS avg_pack_min,
ROUND(AVG(EXTRACT(EPOCH FROM (order_shipped - pack_end)) / 60), 1) AS avg_ship_min,
ROUND(AVG(EXTRACT(EPOCH FROM (order_shipped - first_pick_start)) / 60), 1) AS avg_total_min
FROM order_stages
GROUP BY priority
ORDER BY avg_total_min DESC;
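To make the stage arithmetic concrete, here is the same breakdown applied to one order with hypothetical timestamps (the values are made up for illustration): pick, pack, and ship are active stages, while wait is the queue time between the last pick finishing and packing starting.
import pandas as pd
# Hypothetical timestamps for a single order
first_pick_start = pd.Timestamp('2024-05-01 09:00')
last_pick_end = pd.Timestamp('2024-05-01 09:14')
pack_start = pd.Timestamp('2024-05-01 09:36')
pack_end = pd.Timestamp('2024-05-01 09:44')
order_shipped = pd.Timestamp('2024-05-01 09:55')
pick_min = (last_pick_end - first_pick_start).total_seconds() / 60   # 14.0 min of picking
wait_min = (pack_start - last_pick_end).total_seconds() / 60         # 22.0 min sitting in queue
pack_min = (pack_end - pack_start).total_seconds() / 60              # 8.0 min of packing
ship_min = (order_shipped - pack_end).total_seconds() / 60           # 11.0 min to the dock
total_min = (order_shipped - first_pick_start).total_seconds() / 60  # 55.0 min end to end
print(pick_min, wait_min, pack_min, ship_min, total_min)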
Query 3: Worker Productivity with Tenure
Compare worker productivity against their tenure. Newer employees may need additional training, while experienced workers who underperform may need different interventions.
SELECT
w.worker_id,
w.worker_name,
w.role,
s.shift_name,
(CURRENT_DATE - w.hire_date) AS tenure_days,
COUNT(p.pick_id) AS total_tasks,
SUM(p.items_picked) AS total_items,
ROUND(
SUM(p.items_picked)::NUMERIC
/ NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0), 1
) AS items_per_hour,
ROUND(100.0 * SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) / COUNT(p.pick_id), 2) AS error_rate,
NTILE(4) OVER (ORDER BY
SUM(p.items_picked)::NUMERIC
/ NULLIF(SUM(EXTRACT(EPOCH FROM (p.pick_end - p.pick_start)) / 3600), 0)
DESC
) AS productivity_quartile
FROM workers w
JOIN picks p ON w.worker_id = p.worker_id
JOIN shifts s ON w.shift_id = s.shift_id
WHERE w.role = 'Picker'
GROUP BY w.worker_id, w.worker_name, w.role, s.shift_name, w.hire_date
HAVING COUNT(p.pick_id) >= 50
ORDER BY items_per_hour DESC;
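NTILE(4) splits the pickers into four equal-sized buckets by the ordering given; because the ORDER BY is descending on items per hour, quartile 1 holds the fastest pickers. A small pandas equivalent with made-up rates, using qcut with reversed labels to reproduce that orientation:
import pandas as pd
# Hypothetical items-per-hour figures for eight pickers
rates = pd.Series({'W01': 142, 'W02': 131, 'W03': 118, 'W04': 104,
                   'W05': 97, 'W06': 88, 'W07': 76, 'W08': 63}, name='items_per_hour')
# qcut bins ascend, so labels run 4..1 to mirror NTILE(4) ... ORDER BY rate DESC
quartile = pd.qcut(rates, 4, labels=[4, 3, 2, 1]).astype(int).rename('productivity_quartile')
print(pd.concat([rates, quartile], axis=1))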
Query 4: Error Rates by Zone, Shift, and Worker Experience
Cross-reference error rates across three dimensions to identify specific hotspots. A zone with high errors only on the night shift suggests lighting or staffing issues rather than a zone layout problem.
SELECT
z.zone_name,
sh.shift_name,
CASE
WHEN (CURRENT_DATE - w.hire_date) < 90 THEN 'New (<90 days)'
WHEN (CURRENT_DATE - w.hire_date) BETWEEN 90 AND 365 THEN 'Mid (90-365 days)'
ELSE 'Veteran (365+ days)'
END AS experience_level,
COUNT(p.pick_id) AS total_picks,
SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) AS errors,
ROUND(100.0 * SUM(CASE WHEN p.pick_error THEN 1 ELSE 0 END) / COUNT(p.pick_id), 2) AS error_pct
FROM picks p
JOIN zones z ON p.zone_id = z.zone_id
JOIN workers w ON p.worker_id = w.worker_id
JOIN shifts sh ON w.shift_id = sh.shift_id
GROUP BY z.zone_name, sh.shift_name,
CASE
WHEN (CURRENT_DATE - w.hire_date) < 90 THEN 'New (<90 days)'
WHEN (CURRENT_DATE - w.hire_date) BETWEEN 90 AND 365 THEN 'Mid (90-365 days)'
ELSE 'Veteran (365+ days)'
END
HAVING COUNT(p.pick_id) >= 20
ORDER BY error_pct DESC;
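Once this result is in pandas (via pd.read_sql, or from a CSV export; the file name below is only an assumption), pivoting it into a zone-by-shift grid makes the hotspots easier to scan than the long sorted list:
import pandas as pd
# Assumed export of the query above; substitute pd.read_sql(query, engine) if querying directly
hotspots = pd.read_csv('error_hotspots.csv')
# Zone rows, shift/experience columns, error percentage in the cells
grid = hotspots.pivot_table(index='zone_name',
                            columns=['shift_name', 'experience_level'],
                            values='error_pct')
print(grid.round(2))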
Python Analysis
Zone Performance Heatmap
Create a heatmap showing pick rates across zones and shifts. Hot zones (high activity) can be compared against cold zones to balance workload and identify layout optimization candidates.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load data
picks = pd.read_csv('picks.csv', parse_dates=['pick_start', 'pick_end'])
zones = pd.read_csv('zones.csv')
workers = pd.read_csv('workers.csv')
shifts = pd.read_csv('shifts.csv')
# Calculate pick duration in hours
picks['duration_hours'] = (picks['pick_end'] - picks['pick_start']).dt.total_seconds() / 3600
# Merge with zones and shifts
picks = picks.merge(zones[['zone_id', 'zone_name']], on='zone_id')
picks = picks.merge(workers[['worker_id', 'shift_id']], on='worker_id')
picks = picks.merge(shifts[['shift_id', 'shift_name']], on='shift_id')
# Calculate items per hour by zone and shift
zone_shift = picks.groupby(['zone_name', 'shift_name']).agg(
total_items=('items_picked', 'sum'),
total_hours=('duration_hours', 'sum')
).reset_index()
zone_shift['items_per_hour'] = (zone_shift['total_items'] / zone_shift['total_hours']).round(1)
# Pivot for heatmap
pivot = zone_shift.pivot(index='zone_name', columns='shift_name', values='items_per_hour')
fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(pivot, annot=True, fmt='.0f', cmap='YlOrRd', linewidths=0.5,
cbar_kws={'label': 'Items/Hour'}, ax=ax)
ax.set_title('Pick Rate by Zone and Shift (Items/Hour)', fontsize=14, fontweight='bold')
ax.set_ylabel('Warehouse Zone', fontsize=12)
ax.set_xlabel('Shift', fontsize=12)
plt.tight_layout()
plt.savefig('zone_heatmap.png', dpi=150)
plt.show()
Worker Productivity Distribution
Visualize the distribution of worker productivity (items per hour) segmented by experience level. This reveals whether the productivity gap is driven by tenure or other factors.
# Calculate per-worker metrics
worker_stats = picks.groupby('worker_id').agg(
total_items=('items_picked', 'sum'),
total_hours=('duration_hours', 'sum'),
total_errors=('pick_error', 'sum'),
total_picks=('pick_id', 'count')
).reset_index()
worker_stats['items_per_hour'] = worker_stats['total_items'] / worker_stats['total_hours']
worker_stats['error_rate'] = worker_stats['total_errors'] / worker_stats['total_picks'] * 100
# Merge with worker info for experience
worker_info = pd.read_csv('workers.csv', parse_dates=['hire_date'])
worker_stats = worker_stats.merge(worker_info[['worker_id', 'hire_date', 'role']], on='worker_id')
worker_stats['tenure_days'] = (pd.Timestamp.now() - worker_stats['hire_date']).dt.days
worker_stats['experience'] = pd.cut(worker_stats['tenure_days'],
bins=[0, 90, 365, 9999],
labels=['New (<90d)', 'Mid (90-365d)', 'Veteran (365+d)'])
# Filter to pickers with meaningful volume
pickers = worker_stats[(worker_stats['role'] == 'Picker') & (worker_stats['total_picks'] >= 50)]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))
# Productivity distribution by experience
colors = {'New (<90d)': '#ef4444', 'Mid (90-365d)': '#f59e0b', 'Veteran (365+d)': '#10b981'}
for exp, group in pickers.groupby('experience'):
ax1.hist(group['items_per_hour'], bins=15, alpha=0.6, label=exp, color=colors.get(exp, '#999'))
ax1.set_xlabel('Items per Hour', fontsize=12)
ax1.set_ylabel('Number of Workers', fontsize=12)
ax1.set_title('Productivity Distribution by Experience', fontsize=13, fontweight='bold')
ax1.legend()
# Error rate vs productivity scatter
scatter = ax2.scatter(pickers['items_per_hour'], pickers['error_rate'],
c=pickers['tenure_days'], cmap='RdYlGn', s=50, alpha=0.7)
ax2.set_xlabel('Items per Hour', fontsize=12)
ax2.set_ylabel('Error Rate (%)', fontsize=12)
ax2.set_title('Speed vs. Accuracy by Worker', fontsize=13, fontweight='bold')
plt.colorbar(scatter, ax=ax2, label='Tenure (days)')
ax2.axhline(y=2, color='red', linestyle='--', alpha=0.5, label='2% Target')
ax2.legend()
plt.tight_layout()
plt.savefig('worker_productivity.png', dpi=150)
plt.show()
Shift Analysis and Bottleneck Visualization
Stacked bar chart showing the time breakdown (pick, wait, pack, ship) for each shift. This clearly illustrates where time is being lost in the process.
# Load order stage data
orders = pd.read_csv('orders.csv', parse_dates=['order_received', 'order_shipped'])
packing = pd.read_csv('packing.csv', parse_dates=['pack_start', 'pack_end'])
# Calculate stage durations per order
order_picks = picks.groupby('order_id').agg(
pick_start=('pick_start', 'min'),
pick_end=('pick_end', 'max')
).reset_index()
order_packs = packing.groupby('order_id').agg(
pack_start=('pack_start', 'min'),
pack_end=('pack_end', 'max')
).reset_index()
stages = orders.merge(order_picks, on='order_id').merge(order_packs, on='order_id')
stages['pick_min'] = (stages['pick_end'] - stages['pick_start']).dt.total_seconds() / 60
stages['wait_min'] = (stages['pack_start'] - stages['pick_end']).dt.total_seconds() / 60
stages['pack_min'] = (stages['pack_end'] - stages['pack_start']).dt.total_seconds() / 60
stages['ship_min'] = (stages['order_shipped'] - stages['pack_end']).dt.total_seconds() / 60
# Determine shift based on pick_start hour
stages['hour'] = stages['pick_start'].dt.hour
stages['shift'] = stages['hour'].apply(
lambda h: 'Day' if 6 <= h < 14 else ('Swing' if 14 <= h < 22 else 'Night'))
# Average stage times by shift
shift_stages = stages.groupby('shift')[['pick_min', 'wait_min', 'pack_min', 'ship_min']].mean()
fig, ax = plt.subplots(figsize=(10, 6))
shift_stages.plot(kind='bar', stacked=True, ax=ax,
color=['#3b82f6', '#f59e0b', '#10b981', '#8b5cf6'])
ax.set_ylabel('Average Minutes', fontsize=12)
ax.set_xlabel('Shift', fontsize=12)
ax.set_title('Order Processing Time Breakdown by Shift', fontsize=14, fontweight='bold')
ax.legend(['Pick', 'Wait', 'Pack', 'Ship'], loc='upper right')
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)
plt.tight_layout()
plt.savefig('shift_bottleneck.png', dpi=150)
plt.show()
Excel Instructions
Build a warehouse KPI dashboard, worker scorecard, and zone performance tracker in Excel.
- Import Data: Import the SQL query results for zone performance, order processing breakdown, and worker productivity into separate Excel sheets. Name them "Zone KPIs", "Processing Times", and "Worker Scores" respectively (a Python export sketch for this step follows this list).
- Warehouse KPI Dashboard: Create a "Dashboard" sheet. In large formatted cells, display: Total Orders Processed, Average Processing Time (min), Error Rate (%), and Items Picked per Hour. Use formulas such as =AVERAGE('Processing Times'!G:G) linked to the data sheets. Apply conditional formatting: green if meeting target, red if not.
- Zone Performance Chart: From the Zone KPIs sheet, create a clustered bar chart with zones on the x-axis and two series: Items/Hour (left axis) and Error Rate % (right axis, secondary). Add a horizontal reference line at the 2% error target. Use data bars in the table to visually rank zones.
- Worker Scorecard: In the Worker Scores sheet, add a calculated column: =IF(AND(items_per_hour >= 120, error_rate <= 2), "Star", IF(OR(items_per_hour < 80, error_rate > 4), "Needs Coaching", "Meets Standard")). Use conditional formatting with icon sets (green star, yellow circle, red flag) to visually flag each worker.
- Shift Comparison: Create a stacked bar chart from the Processing Times data showing pick, wait, pack, and ship minutes for each shift. Add data labels showing the total. This makes it immediately clear which shift has the longest total time and which stage dominates.
- Trend Tracker: Create a weekly trend chart showing error rate and processing time over the last 12 weeks. Use sparklines in the KPI cells to show directional trends. Add a target line for each metric. This lets the warehouse manager track whether improvements are taking effect.
- Interactive Filters: Add slicers for Zone, Shift, and Worker Experience Level connected to PivotTables on each sheet. This allows managers to drill into specific segments and identify localized issues versus warehouse-wide problems.
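One way to handle the import step is to write the three query results to the expected sheet names directly from Python. A minimal sketch; the CSV file names are assumptions standing in for however you export the query results.
import pandas as pd
# Assumed file names for the three query exports -- adjust to your own
zone_kpis = pd.read_csv('zone_kpis.csv')
processing_times = pd.read_csv('processing_times.csv')
worker_scores = pd.read_csv('worker_scores.csv')
# Write each result set to the sheet name the dashboard formulas reference
with pd.ExcelWriter('warehouse_dashboard_data.xlsx') as writer:
    zone_kpis.to_excel(writer, sheet_name='Zone KPIs', index=False)
    processing_times.to_excel(writer, sheet_name='Processing Times', index=False)
    worker_scores.to_excel(writer, sheet_name='Worker Scores', index=False)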
Key Insights & Recommendations
Wait Time Bottleneck
The pick-to-pack wait time accounts for 35% of total processing time, averaging 22 minutes. This is the primary driver of the 20% increase. Implementing a conveyor handoff system or dedicated runners between zones could cut wait time by 50%.
Zone B3 Underperformance
Zone B3 (Bulk items) has the lowest pick rate at 68 items/hour, 40% below the warehouse average. The zone has the highest SKU density per aisle. Recommend reorganizing the zone layout using ABC analysis to place high-velocity SKUs in accessible locations.
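ABC analysis ranks SKUs by pick volume and slots the fastest movers into the most accessible locations. The sketch below shows one way to compute the classification; it assumes the pick records also carry a sku_id column, which is not part of the schema above.
import pandas as pd
# Assumes a sku_id column on the pick records (not in the schema shown earlier)
picks = pd.read_csv('picks.csv')
sku_velocity = (picks.groupby('sku_id')['items_picked'].sum()
                     .sort_values(ascending=False))
cum_share = sku_velocity.cumsum() / sku_velocity.sum()
# Common cutoffs: A covers the top ~80% of volume, B the next ~15%, C the tail
abc = pd.cut(cum_share, bins=[0, 0.80, 0.95, 1.0],
             labels=['A', 'B', 'C'], include_lowest=True)
print(abc.value_counts())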
Night Shift Quality Issue
Night shift error rates are 4.8%, more than double the day shift (2.1%). New employees (<90 days) on night shifts have a 7.3% error rate. Pair new hires with veteran mentors and improve zone lighting for the night shift.
Staffing Imbalance
Day shift has 45% of the workforce but handles only 38% of orders. Night shift is understaffed relative to volume by approximately 8 workers. Rebalancing shifts would reduce overtime costs and improve night shift throughput.
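The imbalance can be quantified by comparing each shift's share of headcount with its share of picked orders. A rough sketch, attributing an order to the shift of the workers who picked it:
import pandas as pd
picks = pd.read_csv('picks.csv')
workers = pd.read_csv('workers.csv')
shifts = pd.read_csv('shifts.csv')
# Headcount per shift
headcount = (workers.merge(shifts, on='shift_id')
                    .groupby('shift_name')['worker_id'].nunique().rename('headcount'))
# Distinct orders touched by each shift's pickers
orders_touched = (picks.merge(workers[['worker_id', 'shift_id']], on='worker_id')
                       .merge(shifts, on='shift_id')
                       .groupby('shift_name')['order_id'].nunique().rename('orders_touched'))
balance = pd.concat([headcount, orders_touched], axis=1)
balance['staff_share'] = (balance['headcount'] / balance['headcount'].sum()).round(2)
balance['order_share'] = (balance['orders_touched'] / balance['orders_touched'].sum()).round(2)
print(balance)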
Training ROI
Workers who completed advanced training pick 28% faster with 60% fewer errors than untrained peers at the same tenure level. Expanding the training program to all workers within their first 60 days would yield an estimated ROI of 340% annually.
Express Order Priority
Same-Day orders average 48 minutes processing vs. 35 for Standard, despite the urgency. Implementing a dedicated express lane with pre-staged high-velocity items could reduce express processing to under 25 minutes.
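A quick way to check the gap by priority is to measure end-to-end minutes from the orders table alone (receipt to ship). Note this starts the clock at order receipt rather than at the first pick, so the averages will not match Query 2 exactly:
import pandas as pd
orders = pd.read_csv('orders.csv', parse_dates=['order_received', 'order_shipped'])
orders = orders.dropna(subset=['order_shipped'])
orders['total_min'] = (orders['order_shipped'] - orders['order_received']).dt.total_seconds() / 60
# Count, mean, and median end-to-end minutes for each priority tier
print(orders.groupby('priority')['total_min'].agg(['count', 'mean', 'median']).round(1))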
Knowledge Check
In the order processing breakdown query, what does the expression MIN(pk.pack_start) - MAX(p.pick_end) represent?
What does NTILE(4) do in the worker productivity query?
In the Python heatmap, what does sns.heatmap with annot=True accomplish?
Why does the error analysis query use a CASE expression to bucket workers into experience levels instead of using the raw tenure_days value?
In the Python scatter plot of speed vs. accuracy, what does a worker positioned in the upper-left quadrant (low items/hour, high error rate) indicate?