Project 28

Logistics Route Optimization

Analyze delivery routes for a logistics company to optimize costs, reduce transit times, and improve driver utilization using SQL analytics and Python visualization.

Project Overview

Scenario: You are a logistics analyst at SwiftHaul Logistics, a regional delivery company operating a fleet of 85 vehicles across the Midwest. The operations director wants to reduce delivery costs by 15% while maintaining current service levels. Current average cost per delivery is $34.50, and the on-time rate is 82%. Fuel costs have risen 18% year-over-year, making route efficiency critical.

Objective: Analyze delivery performance across routes, drivers, and vehicle types to identify inefficiencies. Build visualizations that highlight optimal route assignments and cost-saving opportunities. This is a SQL + Python focused project.

Tools: SQL (PostgreSQL), Python (pandas, matplotlib, numpy, folium)

Dataset Description

Five tables modeling the logistics operation including delivery records, route definitions, fleet details, customer locations, and a distance matrix.

deliveries

ColumnTypeDescription
delivery_idINT (PK)Unique delivery identifier
route_idINT (FK)Assigned route
vehicle_idINT (FK)Vehicle used
driver_idINTDriver identifier
customer_idINT (FK)Destination customer
dispatch_timeTIMESTAMPWhen vehicle left depot
delivery_timeTIMESTAMPWhen delivery was completed
promised_timeTIMESTAMPPromised delivery window end
fuel_costDECIMAL(8,2)Fuel consumed in USD
total_costDECIMAL(8,2)Total delivery cost (fuel + labor + tolls)
packagesINTNumber of packages delivered

routes

ColumnTypeDescription
route_idINT (PK)Unique route identifier
route_nameVARCHAR(50)Route label (e.g., "CHI-DET-01")
origin_cityVARCHAR(50)Departure hub city
destination_cityVARCHAR(50)Destination city
distance_milesDECIMAL(7,1)Total route distance
route_typeVARCHAR(20)Urban, Suburban, Highway, Mixed

vehicles

ColumnTypeDescription
vehicle_idINT (PK)Unique vehicle identifier
vehicle_typeVARCHAR(30)Van, Box Truck, Semi, Sprinter
fuel_efficiency_mpgDECIMAL(4,1)Miles per gallon rating
max_capacity_lbsINTMaximum payload in pounds
year_manufacturedINTVehicle model year

customers

ColumnTypeDescription
customer_idINT (PK)Unique customer identifier
customer_nameVARCHAR(100)Business name
cityVARCHAR(50)Customer city
latitudeDECIMAL(9,6)GPS latitude
longitudeDECIMAL(9,6)GPS longitude

distance_matrix

ColumnTypeDescription
origin_idINT (FK)Origin customer/depot ID
destination_idINT (FK)Destination customer/depot ID
distance_milesDECIMAL(7,1)Driving distance between points
est_drive_minutesINTEstimated driving time

SQL Analysis

Query 1: Average Delivery Time and Cost by Route

Rank routes by average delivery duration and cost to find the most and least efficient corridors. Routes with high costs relative to their distance suggest inefficiencies in vehicle assignment or traffic delays.

SELECT
    r.route_name,
    r.origin_city,
    r.destination_city,
    r.distance_miles,
    r.route_type,
    COUNT(d.delivery_id) AS total_deliveries,
    ROUND(AVG(EXTRACT(EPOCH FROM (d.delivery_time - d.dispatch_time)) / 60), 1) AS avg_delivery_min,
    ROUND(AVG(d.total_cost), 2) AS avg_cost,
    ROUND(AVG(d.total_cost / r.distance_miles), 2) AS cost_per_mile,
    ROUND(AVG(d.fuel_cost), 2) AS avg_fuel_cost
FROM deliveries d
JOIN routes r ON d.route_id = r.route_id
GROUP BY r.route_name, r.origin_city, r.destination_city, r.distance_miles, r.route_type
ORDER BY cost_per_mile DESC;

Query 2: Cost Per Delivery and On-Time Rate by Vehicle Type

Determine which vehicle types deliver the best cost efficiency and reliability. This guides fleet procurement decisions and optimal vehicle-to-route matching.

SELECT
    v.vehicle_type,
    v.fuel_efficiency_mpg,
    COUNT(d.delivery_id) AS deliveries,
    ROUND(AVG(d.total_cost), 2) AS avg_cost_per_delivery,
    ROUND(AVG(d.fuel_cost), 2) AS avg_fuel_per_delivery,
    ROUND(
        100.0 * SUM(CASE WHEN d.delivery_time <= d.promised_time THEN 1 ELSE 0 END)
        / COUNT(d.delivery_id), 1
    ) AS on_time_pct,
    ROUND(AVG(d.packages), 1) AS avg_packages,
    ROUND(AVG(d.total_cost / NULLIF(d.packages, 0)), 2) AS cost_per_package
FROM deliveries d
JOIN vehicles v ON d.vehicle_id = v.vehicle_id
GROUP BY v.vehicle_type, v.fuel_efficiency_mpg
ORDER BY cost_per_package ASC;

Query 3: Driver Performance Ranking

Rank drivers by a composite score combining on-time rate, cost efficiency, and delivery volume. Window functions compute percentile rankings within each metric.

WITH driver_metrics AS (
    SELECT
        d.driver_id,
        COUNT(d.delivery_id) AS total_deliveries,
        ROUND(AVG(d.total_cost), 2) AS avg_cost,
        ROUND(
            100.0 * SUM(CASE WHEN d.delivery_time <= d.promised_time THEN 1 ELSE 0 END)
            / COUNT(d.delivery_id), 1
        ) AS on_time_pct,
        ROUND(AVG(EXTRACT(EPOCH FROM (d.delivery_time - d.dispatch_time)) / 60), 1) AS avg_time_min
    FROM deliveries d
    GROUP BY d.driver_id
    HAVING COUNT(d.delivery_id) >= 20
)
SELECT
    driver_id,
    total_deliveries,
    avg_cost,
    on_time_pct,
    avg_time_min,
    PERCENT_RANK() OVER (ORDER BY on_time_pct DESC) AS on_time_percentile,
    PERCENT_RANK() OVER (ORDER BY avg_cost ASC) AS cost_percentile,
    ROUND(
        0.4 * PERCENT_RANK() OVER (ORDER BY on_time_pct DESC)
        + 0.4 * PERCENT_RANK() OVER (ORDER BY avg_cost ASC)
        + 0.2 * PERCENT_RANK() OVER (ORDER BY total_deliveries DESC)
    , 3) AS composite_score
FROM driver_metrics
ORDER BY composite_score ASC
LIMIT 20;

Query 4: Route Efficiency by Day of Week and Time Slot

Identify temporal patterns in delivery performance. Some routes may be significantly faster on certain days or time slots due to traffic patterns, enabling smarter scheduling.

SELECT
    r.route_name,
    TO_CHAR(d.dispatch_time, 'Day') AS day_of_week,
    CASE
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 6 AND 9 THEN 'Early Morning'
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 10 AND 13 THEN 'Midday'
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 14 AND 17 THEN 'Afternoon'
        ELSE 'Evening'
    END AS time_slot,
    COUNT(d.delivery_id) AS deliveries,
    ROUND(AVG(EXTRACT(EPOCH FROM (d.delivery_time - d.dispatch_time)) / 60), 1) AS avg_min,
    ROUND(AVG(d.total_cost), 2) AS avg_cost,
    ROUND(
        100.0 * SUM(CASE WHEN d.delivery_time <= d.promised_time THEN 1 ELSE 0 END)
        / COUNT(d.delivery_id), 1
    ) AS on_time_pct
FROM deliveries d
JOIN routes r ON d.route_id = r.route_id
GROUP BY r.route_name, TO_CHAR(d.dispatch_time, 'Day'),
    CASE
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 6 AND 9 THEN 'Early Morning'
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 10 AND 13 THEN 'Midday'
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 14 AND 17 THEN 'Afternoon'
        ELSE 'Evening'
    END
HAVING COUNT(d.delivery_id) >= 10
ORDER BY r.route_name, avg_min ASC;

Python Analysis

Delivery Route Visualization on Map

Plot delivery locations on an interactive map using folium. Color-code markers by on-time status and draw route lines from the depot to each customer. This reveals geographic clusters and potential route consolidation opportunities.

import pandas as pd
import numpy as np
import folium
from folium.plugins import MarkerCluster

# Load data
deliveries = pd.read_csv('deliveries.csv', parse_dates=['dispatch_time', 'delivery_time', 'promised_time'])
customers = pd.read_csv('customers.csv')
routes = pd.read_csv('routes.csv')

# Merge delivery data with customer locations
df = deliveries.merge(customers, on='customer_id')
df['is_on_time'] = df['delivery_time'] <= df['promised_time']

# Create base map centered on Midwest
m = folium.Map(location=[41.8781, -87.6298], zoom_start=7, tiles='cartodbpositron')
marker_cluster = MarkerCluster().add_to(m)

# Add delivery markers
for _, row in df.iterrows():
    color = 'green' if row['is_on_time'] else 'red'
    folium.CircleMarker(
        location=[row['latitude'], row['longitude']],
        radius=4,
        color=color,
        fill=True,
        fill_opacity=0.7,
        popup=f"Customer: {row['customer_name']}<br>"
              f"Cost: ${row['total_cost']:.2f}<br>"
              f"On-Time: {'Yes' if row['is_on_time'] else 'No'}"
    ).add_to(marker_cluster)

# Add depot marker
folium.Marker(
    [41.8781, -87.6298],
    popup='Chicago Depot (HQ)',
    icon=folium.Icon(color='blue', icon='warehouse', prefix='fa')
).add_to(m)

m.save('delivery_map.html')
print("Map saved to delivery_map.html")

Distance vs. Cost Analysis with Optimal Frontier

Scatter plot of distance versus total cost for each delivery, with a linear regression line showing the expected cost. Deliveries far above the line are candidates for investigation -- they are more expensive than their distance warrants.

import matplotlib.pyplot as plt
from numpy.polynomial import polynomial as P

# Merge with route distances
df = df.merge(routes[['route_id', 'distance_miles']], on='route_id')

fig, ax = plt.subplots(figsize=(12, 7))

# Scatter by on-time status
on_time = df[df['is_on_time']]
late = df[~df['is_on_time']]
ax.scatter(on_time['distance_miles'], on_time['total_cost'],
           alpha=0.4, s=20, color='#10b981', label='On-Time')
ax.scatter(late['distance_miles'], late['total_cost'],
           alpha=0.4, s=20, color='#ef4444', label='Late')

# Regression line (expected cost frontier)
coeffs = np.polyfit(df['distance_miles'], df['total_cost'], 1)
x_line = np.linspace(df['distance_miles'].min(), df['distance_miles'].max(), 100)
y_line = np.polyval(coeffs, x_line)
ax.plot(x_line, y_line, '--', color='#3b82f6', linewidth=2,
        label=f'Expected (${coeffs[0]:.2f}/mile + ${coeffs[1]:.2f})')

# Highlight outliers (cost > 1.5x expected)
df['expected_cost'] = np.polyval(coeffs, df['distance_miles'])
outliers = df[df['total_cost'] > 1.5 * df['expected_cost']]
ax.scatter(outliers['distance_miles'], outliers['total_cost'],
           s=60, facecolors='none', edgecolors='#f59e0b', linewidths=2,
           label=f'Cost Outliers ({len(outliers)})')

ax.set_xlabel('Route Distance (miles)', fontsize=12)
ax.set_ylabel('Total Delivery Cost ($)', fontsize=12)
ax.set_title('Delivery Cost vs. Distance Analysis', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('cost_vs_distance.png', dpi=150)
plt.show()

Cost-Benefit Analysis: Vehicle Type by Route Type

A heatmap showing the average cost per delivery for each vehicle type on each route type. This reveals which vehicle-route combinations are optimal and which are wasteful (e.g., using a semi truck on an urban route).

# Merge with vehicles and routes
df = df.merge(
    pd.read_csv('vehicles.csv')[['vehicle_id', 'vehicle_type']],
    on='vehicle_id'
)

# Pivot: average cost by vehicle type x route type
pivot = df.pivot_table(
    values='total_cost',
    index='vehicle_type',
    columns='route_type',
    aggfunc='mean'
).round(2)

fig, ax = plt.subplots(figsize=(10, 6))
im = ax.imshow(pivot.values, cmap='RdYlGn_r', aspect='auto')

# Labels
ax.set_xticks(range(len(pivot.columns)))
ax.set_xticklabels(pivot.columns, fontsize=11)
ax.set_yticks(range(len(pivot.index)))
ax.set_yticklabels(pivot.index, fontsize=11)

# Annotate cells with values
for i in range(len(pivot.index)):
    for j in range(len(pivot.columns)):
        val = pivot.values[i, j]
        if not np.isnan(val):
            ax.text(j, i, f'${val:.0f}', ha='center', va='center',
                    fontsize=12, fontweight='bold',
                    color='white' if val > pivot.values[~np.isnan(pivot.values)].mean() else 'black')

ax.set_title('Avg. Delivery Cost by Vehicle Type and Route Type',
             fontsize=14, fontweight='bold', pad=15)
plt.colorbar(im, ax=ax, label='Cost ($)')
plt.tight_layout()
plt.savefig('vehicle_route_heatmap.png', dpi=150)
plt.show()

Excel Instructions

This project emphasizes SQL and Python analysis. However, for executive stakeholders who prefer Excel summaries, use the following approach.

  1. Export SQL Results: Run each SQL query above and export the results to CSV. Import these into a single Excel workbook with separate sheets: "Route Performance", "Vehicle Efficiency", "Driver Rankings", and "Schedule Analysis".
  2. Summary Dashboard: Create a "Dashboard" sheet with key KPIs linked to the data sheets: Total Deliveries, Average Cost/Delivery, On-Time %, and Cost/Mile. Use large-font cells with conditional formatting (green for targets met, red for missed).
  3. Route Comparison Chart: Create a combo chart from the Route Performance sheet with bars for average cost and a line for on-time percentage. Add a secondary axis for the percentage values. Sort routes by cost-per-mile descending.
  4. Vehicle Assignment Matrix: Manually create a cross-reference table mapping each route type to the recommended vehicle type based on the Python heatmap results. Highlight the optimal assignment in green and current mismatches in red.
  5. Savings Estimator: Build a simple calculator: enter the number of deliveries to reassign and the cost differential per delivery. Formula: =Reassigned_Count * (Current_Avg_Cost - Optimal_Avg_Cost). This gives stakeholders a quick estimate of savings from route optimization.

Key Insights & Recommendations

Route Consolidation

12 routes with overlapping destinations account for 23% of total deliveries but 31% of fuel costs. Consolidating into 8 optimized routes could save an estimated $185K annually in fuel and labor.

Vehicle Mismatch

Semi trucks are being deployed on 18% of urban routes where sprinter vans would reduce costs by 40%. Reassigning vehicles to match route types is the single largest cost-saving opportunity identified.

Optimal Dispatch Windows

Early morning dispatches (6-9 AM) have 91% on-time rates and 15% lower costs than afternoon dispatches. Shifting 30% of midday routes to early morning would improve both KPIs significantly.

Driver Training Impact

Top-quartile drivers deliver at $2.80/mile versus $4.10/mile for bottom-quartile drivers on the same routes. A targeted training program for the bottom 20 drivers could close 50% of this gap.

Fuel Cost Reduction

Vehicles older than 2018 consume 22% more fuel per mile than newer models. Prioritizing replacement of the 15 oldest vehicles would pay for itself within 18 months through fuel savings alone.

Geographic Clustering

The map analysis revealed 4 customer clusters that are currently split across different routes. Assigning each cluster to a single dedicated route would reduce deadhead miles by an estimated 340 miles per week.

Knowledge Check

Question 1

In the driver performance query, what does PERCENT_RANK() compute?

Question 2

Why is cost-per-mile a better metric than total cost for comparing route efficiency?

Question 3

In the Python cost-vs-distance scatter plot, what do deliveries plotted far above the regression line represent?

Question 4

What SQL function is used to calculate delivery duration in minutes from two TIMESTAMP columns?

Question 5

In the vehicle-route heatmap, a "RdYlGn_r" colormap is used. What does the "_r" suffix indicate?

← Previous Project Next Project →