Project 26

Supply Chain Dashboard

Build an end-to-end supply chain visibility dashboard that tracks order fulfillment, supplier performance, and logistics bottlenecks for a mid-size consumer goods company.

Project Overview

Scenario: You are a supply chain analyst at NovaPack Industries, a consumer goods manufacturer that ships to 500+ retail locations nationwide. The VP of Operations has requested a comprehensive dashboard to gain end-to-end visibility into order fulfillment, identify bottlenecks in the supply chain, and improve on-time delivery rates which have fallen to 78% over the past quarter.

Objective: Analyze order fulfillment data across suppliers, warehouses, and shipment routes to build a KPI-driven dashboard that highlights fulfillment rates, lead times, on-time delivery performance, and supplier reliability.

Tools: SQL (PostgreSQL), Python (pandas, matplotlib, seaborn), Excel

Dataset Description

The project uses five interrelated tables reflecting a typical supply chain data model.

orders

ColumnTypeDescription
order_idINT (PK)Unique order identifier
customer_idINT (FK)Reference to customer
order_dateDATEDate order was placed
required_dateDATECustomer-requested delivery date
warehouse_idINT (FK)Fulfilling warehouse
statusVARCHAR(20)Order status (Pending, Shipped, Delivered, Cancelled)
total_amountDECIMAL(10,2)Total order value in USD

shipments

ColumnTypeDescription
shipment_idINT (PK)Unique shipment identifier
order_idINT (FK)Associated order
ship_dateDATEDate shipment left warehouse
delivery_dateDATEActual delivery date
carrierVARCHAR(50)Shipping carrier name
shipping_costDECIMAL(8,2)Cost of shipment

suppliers

ColumnTypeDescription
supplier_idINT (PK)Unique supplier identifier
supplier_nameVARCHAR(100)Company name
regionVARCHAR(50)Geographic region
lead_time_daysINTAverage supplier lead time
reliability_scoreDECIMAL(3,2)Score from 0.00 to 1.00

warehouses

ColumnTypeDescription
warehouse_idINT (PK)Unique warehouse identifier
warehouse_nameVARCHAR(100)Warehouse name
cityVARCHAR(50)Location city
capacityINTMax storage units

inventory

ColumnTypeDescription
inventory_idINT (PK)Unique record identifier
warehouse_idINT (FK)Warehouse location
supplier_idINT (FK)Source supplier
product_nameVARCHAR(100)Product name
quantity_on_handINTCurrent stock level
reorder_pointINTMinimum before reorder trigger

SQL Analysis

Query 1: Order Fulfillment Rates by Warehouse

Calculate the percentage of orders that were successfully delivered versus cancelled for each warehouse, along with total revenue. This helps identify which fulfillment centers perform best.

SELECT
    w.warehouse_name,
    w.city,
    COUNT(o.order_id) AS total_orders,
    SUM(CASE WHEN o.status = 'Delivered' THEN 1 ELSE 0 END) AS delivered,
    SUM(CASE WHEN o.status = 'Cancelled' THEN 1 ELSE 0 END) AS cancelled,
    ROUND(
        100.0 * SUM(CASE WHEN o.status = 'Delivered' THEN 1 ELSE 0 END)
        / NULLIF(COUNT(o.order_id), 0), 1
    ) AS fulfillment_rate_pct,
    SUM(o.total_amount) AS total_revenue
FROM orders o
JOIN warehouses w ON o.warehouse_id = w.warehouse_id
GROUP BY w.warehouse_name, w.city
ORDER BY fulfillment_rate_pct DESC;

Query 2: Average Lead Time by Supplier

Measure the average number of days from order placement to shipment departure for each supplier. Suppliers with consistently high lead times may need contract renegotiation or replacement.

SELECT
    s.supplier_name,
    s.region,
    COUNT(DISTINCT o.order_id) AS orders_supplied,
    ROUND(AVG(sh.ship_date - o.order_date), 1) AS avg_processing_days,
    ROUND(AVG(sh.delivery_date - sh.ship_date), 1) AS avg_transit_days,
    ROUND(AVG(sh.delivery_date - o.order_date), 1) AS avg_total_lead_time,
    s.reliability_score
FROM suppliers s
JOIN inventory i ON s.supplier_id = i.supplier_id
JOIN orders o ON o.warehouse_id = i.warehouse_id
JOIN shipments sh ON o.order_id = sh.order_id
WHERE o.status = 'Delivered'
GROUP BY s.supplier_name, s.region, s.reliability_score
ORDER BY avg_total_lead_time ASC;

Query 3: On-Time Delivery Percentage (Monthly Trend)

Track on-time delivery rates over time to spot seasonal patterns or deteriorating performance. An order is on-time if the actual delivery date is on or before the customer's required date.

SELECT
    DATE_TRUNC('month', o.order_date) AS order_month,
    COUNT(o.order_id) AS total_delivered,
    SUM(CASE
        WHEN sh.delivery_date <= o.required_date THEN 1
        ELSE 0
    END) AS on_time_count,
    ROUND(
        100.0 * SUM(CASE WHEN sh.delivery_date <= o.required_date THEN 1 ELSE 0 END)
        / NULLIF(COUNT(o.order_id), 0), 1
    ) AS on_time_pct,
    ROUND(AVG(sh.delivery_date - o.required_date), 1) AS avg_days_variance
FROM orders o
JOIN shipments sh ON o.order_id = sh.order_id
WHERE o.status = 'Delivered'
GROUP BY DATE_TRUNC('month', o.order_date)
ORDER BY order_month;

Query 4: Bottleneck Identification - Warehouses with Low Stock

Identify products that have fallen below their reorder point, flagging potential stockout risks. Window functions rank the urgency by how far below the reorder threshold each item has fallen.

SELECT
    w.warehouse_name,
    i.product_name,
    i.quantity_on_hand,
    i.reorder_point,
    (i.reorder_point - i.quantity_on_hand) AS shortage_units,
    s.supplier_name,
    s.lead_time_days,
    RANK() OVER (
        PARTITION BY w.warehouse_id
        ORDER BY (i.reorder_point - i.quantity_on_hand) DESC
    ) AS urgency_rank
FROM inventory i
JOIN warehouses w ON i.warehouse_id = w.warehouse_id
JOIN suppliers s ON i.supplier_id = s.supplier_id
WHERE i.quantity_on_hand < i.reorder_point
ORDER BY shortage_units DESC;

Python Analysis

Lead Time Distribution by Supplier

Visualize the spread of lead times across suppliers to identify consistency issues. A supplier with a tight distribution is more predictable than one with a wide spread, even if their average is similar.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load data
orders = pd.read_csv('orders.csv', parse_dates=['order_date', 'required_date'])
shipments = pd.read_csv('shipments.csv', parse_dates=['ship_date', 'delivery_date'])
suppliers = pd.read_csv('suppliers.csv')
inventory = pd.read_csv('inventory.csv')

# Merge orders with shipments
df = orders.merge(shipments, on='order_id')
df['lead_time'] = (df['delivery_date'] - df['order_date']).dt.days

# Merge with inventory and suppliers
inv_supp = inventory.merge(suppliers, on='supplier_id')
df = df.merge(inv_supp[['warehouse_id', 'supplier_name']].drop_duplicates(),
              on='warehouse_id')

# Box plot of lead times by supplier
fig, ax = plt.subplots(figsize=(12, 6))
supplier_groups = df.groupby('supplier_name')['lead_time']
labels = []
data = []
for name, group in supplier_groups:
    labels.append(name)
    data.append(group.values)

bp = ax.boxplot(data, labels=labels, patch_artist=True,
                boxprops=dict(facecolor='#3b82f6', alpha=0.7),
                medianprops=dict(color='#ef4444', linewidth=2))
ax.set_xlabel('Supplier', fontsize=12)
ax.set_ylabel('Lead Time (Days)', fontsize=12)
ax.set_title('Lead Time Distribution by Supplier', fontsize=14, fontweight='bold')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.savefig('lead_time_distribution.png', dpi=150)
plt.show()

Fulfillment Timeline (Gantt-Style Visualization)

Create a horizontal bar chart showing order-to-delivery timelines for a sample of orders. Each bar spans from the order date to the delivery date, colored by whether the order was on-time or late.

# Fulfillment timeline for recent orders
sample = df.sort_values('order_date', ascending=False).head(25).copy()
sample['is_on_time'] = sample['delivery_date'] <= sample['required_date']
sample = sample.sort_values('order_date')

fig, ax = plt.subplots(figsize=(14, 8))
for idx, (_, row) in enumerate(sample.iterrows()):
    start = row['order_date']
    duration = (row['delivery_date'] - row['order_date']).days
    color = '#10b981' if row['is_on_time'] else '#ef4444'
    ax.barh(idx, duration, left=start.toordinal(), height=0.6,
            color=color, alpha=0.8, edgecolor='white')

ax.set_yticks(range(len(sample)))
ax.set_yticklabels([f"Order {oid}" for oid in sample['order_id']], fontsize=9)
ax.set_xlabel('Timeline', fontsize=12)
ax.set_title('Order Fulfillment Timeline (Green = On-Time, Red = Late)',
             fontsize=14, fontweight='bold')

# Custom x-axis with readable dates
import matplotlib.dates as mdates
ax.xaxis_date()
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
plt.tight_layout()
plt.savefig('fulfillment_gantt.png', dpi=150)
plt.show()

Supplier Performance Comparison

Compare suppliers across multiple dimensions: average lead time, fulfillment rate, reliability score, and cost efficiency. A grouped bar chart makes it easy to spot which suppliers excel in different areas.

# Supplier performance metrics
perf = df.groupby('supplier_name').agg(
    avg_lead_time=('lead_time', 'mean'),
    on_time_rate=('is_on_time', 'mean'),
    total_orders=('order_id', 'nunique'),
    avg_shipping_cost=('shipping_cost', 'mean')
).reset_index()
perf = perf.merge(suppliers[['supplier_name', 'reliability_score']], on='supplier_name')

# Normalize metrics to 0-100 scale for comparison
perf['lead_time_score'] = 100 - (perf['avg_lead_time'] / perf['avg_lead_time'].max() * 100)
perf['on_time_score'] = perf['on_time_rate'] * 100
perf['reliability'] = perf['reliability_score'] * 100
perf['cost_score'] = 100 - (perf['avg_shipping_cost'] / perf['avg_shipping_cost'].max() * 100)

# Grouped bar chart
metrics = ['lead_time_score', 'on_time_score', 'reliability', 'cost_score']
metric_labels = ['Lead Time', 'On-Time %', 'Reliability', 'Cost Efficiency']
x = np.arange(len(perf))
width = 0.2

fig, ax = plt.subplots(figsize=(14, 7))
colors = ['#3b82f6', '#10b981', '#f59e0b', '#8b5cf6']
for i, (metric, label) in enumerate(zip(metrics, metric_labels)):
    ax.bar(x + i * width, perf[metric], width, label=label, color=colors[i], alpha=0.85)

ax.set_xticks(x + width * 1.5)
ax.set_xticklabels(perf['supplier_name'], rotation=45, ha='right')
ax.set_ylabel('Score (0-100)', fontsize=12)
ax.set_title('Supplier Performance Comparison', fontsize=14, fontweight='bold')
ax.legend(loc='upper right')
ax.set_ylim(0, 110)
plt.tight_layout()
plt.savefig('supplier_performance.png', dpi=150)
plt.show()

Excel Instructions

Build a supply chain KPI dashboard, supplier scorecard, and order tracking workbook in Excel.

  1. Import Data: Open Excel and create a new workbook. Import the orders, shipments, suppliers, warehouses, and inventory CSV files into separate sheets. Use Data > Get Data > From Text/CSV for each file. Name the sheets accordingly.
  2. Create Relationships: Go to Data > Relationships. Link orders to shipments on order_id, orders to warehouses on warehouse_id, and inventory to suppliers on supplier_id. This enables cross-table PivotTables.
  3. KPI Summary Sheet: Create a new sheet called "KPI Dashboard." In cell B2, calculate total orders with =COUNTA(orders[order_id]). In B3, use =COUNTIF(orders[status],"Delivered")/COUNTA(orders[order_id]) for fulfillment rate. In B4, compute average lead time with =AVERAGE(shipments[delivery_date]-orders[order_date]) using a helper column if needed.
  4. On-Time Delivery Tracker: Add a helper column in the orders sheet: =IF(shipments[delivery_date]<=orders[required_date],"On-Time","Late"). Create a PivotTable grouped by month showing the count of On-Time vs Late deliveries. Insert a line chart to visualize the monthly trend.
  5. Supplier Scorecard: Create a PivotTable with supplier_name as rows. Add calculated fields for average lead time, on-time percentage, and order count. Use conditional formatting (Data Bars or Color Scales) to highlight best and worst performers. Add a slicer for region to enable interactive filtering.
  6. Inventory Alerts: On the inventory sheet, add a column: =IF(quantity_on_hand < reorder_point, "REORDER", "OK"). Apply conditional formatting to highlight REORDER rows in red. Create a PivotTable counting alerts by warehouse.
  7. Dashboard Layout: Arrange the KPI cards, on-time trend chart, supplier scorecard, and inventory alerts on a single dashboard sheet. Use cell references and charts linked to the PivotTables. Add slicers for warehouse, carrier, and date range to make the dashboard interactive.

Key Insights & Recommendations

On-Time Delivery Decline

On-time delivery dropped from 91% to 78% over the past quarter, primarily driven by two warehouses in the Southeast region. Recommend auditing carrier contracts and warehouse staffing levels in those locations.

Carrier Performance Gap

Express carriers maintain 94% on-time rates versus 71% for economy carriers. For high-priority orders, route through express carriers despite the 15% cost premium to protect customer satisfaction.

Inventory Stockout Risk

17 products across 3 warehouses are below reorder points, with an average supplier lead time of 12 days. Trigger immediate reorders and consider safety stock adjustments for high-velocity items.

Supplier Consolidation

Top 3 suppliers (by reliability score) handle 60% of volume with a 92% on-time rate. Bottom 4 suppliers average 68% on-time. Consider consolidating volume with top performers to improve overall supply chain reliability.

Shipping Cost Optimization

Average shipping cost per order is $47, but varies from $28 to $89 by route. Consolidating shipments from the Dallas warehouse (lowest cost) for Southwest orders could save an estimated $120K annually.

Seasonal Bottleneck

Lead times increase 35% during Q4 holiday season. Pre-positioning inventory in September and establishing backup carrier agreements would mitigate seasonal delays and prevent the fulfillment rate dip seen last year.

Knowledge Check

Question 1

In the SQL query for order fulfillment rates, what does the NULLIF(COUNT(o.order_id), 0) function prevent?

Question 2

What is the purpose of the RANK() window function in the bottleneck identification query?

Question 3

In the Python lead time distribution analysis, why is a box plot preferred over a simple bar chart of averages?

Question 4

What does the DATE_TRUNC('month', o.order_date) function do in the on-time delivery trend query?

Question 5

In the supplier performance comparison chart, why are metrics normalized to a 0-100 scale?

← Previous Project Next Project →