Logistics Route Optimization

Project Overview

Scenario: You are a logistics analyst at SwiftHaul Logistics, a regional delivery company operating a fleet of 85 vehicles across the Midwest. The operations director wants to reduce delivery costs by 15% while maintaining current service levels. Current average cost per delivery is $34.50, and the on-time rate is 82%. Fuel costs have risen 18% year-over-year, making route efficiency critical.

Objective: Analyze delivery performance across routes, drivers, and vehicle types to identify inefficiencies. Build visualizations that highlight optimal route assignments and cost-saving opportunities. This is a SQL + Python focused project.

Tools: SQL (PostgreSQL), Python (pandas, matplotlib, numpy, folium)

Dataset Description

Five tables modeling the logistics operation including delivery records, route definitions, fleet details, customer locations, and a distance matrix.

deliveries

Column	Type	Description
delivery_id	INT (PK)	Unique delivery identifier
route_id	INT (FK)	Assigned route
vehicle_id	INT (FK)	Vehicle used
driver_id	INT	Driver identifier
customer_id	INT (FK)	Destination customer
dispatch_time	TIMESTAMP	When vehicle left depot
delivery_time	TIMESTAMP	When delivery was completed
promised_time	TIMESTAMP	Promised delivery window end
fuel_cost	DECIMAL(8,2)	Fuel consumed in USD
total_cost	DECIMAL(8,2)	Total delivery cost (fuel + labor + tolls)
packages	INT	Number of packages delivered

routes

Column	Type	Description
route_id	INT (PK)	Unique route identifier
route_name	VARCHAR(50)	Route label (e.g., "CHI-DET-01")
origin_city	VARCHAR(50)	Departure hub city
destination_city	VARCHAR(50)	Destination city
distance_miles	DECIMAL(7,1)	Total route distance
route_type	VARCHAR(20)	Urban, Suburban, Highway, Mixed

vehicles

Column	Type	Description
vehicle_id	INT (PK)	Unique vehicle identifier
vehicle_type	VARCHAR(30)	Van, Box Truck, Semi, Sprinter
fuel_efficiency_mpg	DECIMAL(4,1)	Miles per gallon rating
max_capacity_lbs	INT	Maximum payload in pounds
year_manufactured	INT	Vehicle model year

customers

Column	Type	Description
customer_id	INT (PK)	Unique customer identifier
customer_name	VARCHAR(100)	Business name
city	VARCHAR(50)	Customer city
latitude	DECIMAL(9,6)	GPS latitude
longitude	DECIMAL(9,6)	GPS longitude

distance_matrix

Column	Type	Description
origin_id	INT (FK)	Origin customer/depot ID
destination_id	INT (FK)	Destination customer/depot ID
distance_miles	DECIMAL(7,1)	Driving distance between points
est_drive_minutes	INT	Estimated driving time

SQL Analysis

Query 1: Average Delivery Time and Cost by Route

Rank routes by average delivery duration and cost to find the most and least efficient corridors. Routes with high costs relative to their distance suggest inefficiencies in vehicle assignment or traffic delays.

SELECT
    r.route_name,
    r.origin_city,
    r.destination_city,
    r.distance_miles,
    r.route_type,
    COUNT(d.delivery_id) AS total_deliveries,
    ROUND(AVG(EXTRACT(EPOCH FROM (d.delivery_time - d.dispatch_time)) / 60), 1) AS avg_delivery_min,
    ROUND(AVG(d.total_cost), 2) AS avg_cost,
    ROUND(AVG(d.total_cost / r.distance_miles), 2) AS cost_per_mile,
    ROUND(AVG(d.fuel_cost), 2) AS avg_fuel_cost
FROM deliveries d
JOIN routes r ON d.route_id = r.route_id
GROUP BY r.route_name, r.origin_city, r.destination_city, r.distance_miles, r.route_type
ORDER BY cost_per_mile DESC;

Query 2: Cost Per Delivery and On-Time Rate by Vehicle Type

Determine which vehicle types deliver the best cost efficiency and reliability. This guides fleet procurement decisions and optimal vehicle-to-route matching.

SELECT
    v.vehicle_type,
    v.fuel_efficiency_mpg,
    COUNT(d.delivery_id) AS deliveries,
    ROUND(AVG(d.total_cost), 2) AS avg_cost_per_delivery,
    ROUND(AVG(d.fuel_cost), 2) AS avg_fuel_per_delivery,
    ROUND(
        100.0 * SUM(CASE WHEN d.delivery_time <= d.promised_time THEN 1 ELSE 0 END)
        / COUNT(d.delivery_id), 1
    ) AS on_time_pct,
    ROUND(AVG(d.packages), 1) AS avg_packages,
    ROUND(AVG(d.total_cost / NULLIF(d.packages, 0)), 2) AS cost_per_package
FROM deliveries d
JOIN vehicles v ON d.vehicle_id = v.vehicle_id
GROUP BY v.vehicle_type, v.fuel_efficiency_mpg
ORDER BY cost_per_package ASC;

Query 3: Driver Performance Ranking

Rank drivers by a composite score combining on-time rate, cost efficiency, and delivery volume. Window functions compute percentile rankings within each metric.

WITH driver_metrics AS (
    SELECT
        d.driver_id,
        COUNT(d.delivery_id) AS total_deliveries,
        ROUND(AVG(d.total_cost), 2) AS avg_cost,
        ROUND(
            100.0 * SUM(CASE WHEN d.delivery_time <= d.promised_time THEN 1 ELSE 0 END)
            / COUNT(d.delivery_id), 1
        ) AS on_time_pct,
        ROUND(AVG(EXTRACT(EPOCH FROM (d.delivery_time - d.dispatch_time)) / 60), 1) AS avg_time_min
    FROM deliveries d
    GROUP BY d.driver_id
    HAVING COUNT(d.delivery_id) >= 20
)
SELECT
    driver_id,
    total_deliveries,
    avg_cost,
    on_time_pct,
    avg_time_min,
    PERCENT_RANK() OVER (ORDER BY on_time_pct DESC) AS on_time_percentile,
    PERCENT_RANK() OVER (ORDER BY avg_cost ASC) AS cost_percentile,
    ROUND(
        0.4 * PERCENT_RANK() OVER (ORDER BY on_time_pct DESC)
        + 0.4 * PERCENT_RANK() OVER (ORDER BY avg_cost ASC)
        + 0.2 * PERCENT_RANK() OVER (ORDER BY total_deliveries DESC)
    , 3) AS composite_score
FROM driver_metrics
ORDER BY composite_score ASC
LIMIT 20;

Query 4: Route Efficiency by Day of Week and Time Slot

Identify temporal patterns in delivery performance. Some routes may be significantly faster on certain days or time slots due to traffic patterns, enabling smarter scheduling.

SELECT
    r.route_name,
    TO_CHAR(d.dispatch_time, 'Day') AS day_of_week,
    CASE
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 6 AND 9 THEN 'Early Morning'
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 10 AND 13 THEN 'Midday'
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 14 AND 17 THEN 'Afternoon'
        ELSE 'Evening'
    END AS time_slot,
    COUNT(d.delivery_id) AS deliveries,
    ROUND(AVG(EXTRACT(EPOCH FROM (d.delivery_time - d.dispatch_time)) / 60), 1) AS avg_min,
    ROUND(AVG(d.total_cost), 2) AS avg_cost,
    ROUND(
        100.0 * SUM(CASE WHEN d.delivery_time <= d.promised_time THEN 1 ELSE 0 END)
        / COUNT(d.delivery_id), 1
    ) AS on_time_pct
FROM deliveries d
JOIN routes r ON d.route_id = r.route_id
GROUP BY r.route_name, TO_CHAR(d.dispatch_time, 'Day'),
    CASE
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 6 AND 9 THEN 'Early Morning'
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 10 AND 13 THEN 'Midday'
        WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 14 AND 17 THEN 'Afternoon'
        ELSE 'Evening'
    END
HAVING COUNT(d.delivery_id) >= 10
ORDER BY r.route_name, avg_min ASC;

Python Analysis

Delivery Route Visualization on Map

Plot delivery locations on an interactive map using folium. Color-code markers by on-time status and draw route lines from the depot to each customer. This reveals geographic clusters and potential route consolidation opportunities.

import pandas as pd
import numpy as np
import folium
from folium.plugins import MarkerCluster

# Load data
deliveries = pd.read_csv('deliveries.csv', parse_dates=['dispatch_time', 'delivery_time', 'promised_time'])
customers = pd.read_csv('customers.csv')
routes = pd.read_csv('routes.csv')

# Merge delivery data with customer locations
df = deliveries.merge(customers, on='customer_id')
df['is_on_time'] = df['delivery_time'] <= df['promised_time']

# Create base map centered on Midwest
m = folium.Map(location=[41.8781, -87.6298], zoom_start=7, tiles='cartodbpositron')
marker_cluster = MarkerCluster().add_to(m)

# Add delivery markers
for _, row in df.iterrows():
    color = 'green' if row['is_on_time'] else 'red'
    folium.CircleMarker(
        location=[row['latitude'], row['longitude']],
        radius=4,
        color=color,
        fill=True,
        fill_opacity=0.7,
        popup=f"Customer: {row['customer_name']}<br>"
              f"Cost: ${row['total_cost']:.2f}<br>"
              f"On-Time: {'Yes' if row['is_on_time'] else 'No'}"
    ).add_to(marker_cluster)

# Add depot marker
folium.Marker(
    [41.8781, -87.6298],
    popup='Chicago Depot (HQ)',
    icon=folium.Icon(color='blue', icon='warehouse', prefix='fa')
).add_to(m)

m.save('delivery_map.html')
print("Map saved to delivery_map.html")

Distance vs. Cost Analysis with Optimal Frontier

Scatter plot of distance versus total cost for each delivery, with a linear regression line showing the expected cost. Deliveries far above the line are candidates for investigation -- they are more expensive than their distance warrants.

import matplotlib.pyplot as plt
from numpy.polynomial import polynomial as P

# Merge with route distances
df = df.merge(routes[['route_id', 'distance_miles']], on='route_id')

fig, ax = plt.subplots(figsize=(12, 7))

# Scatter by on-time status
on_time = df[df['is_on_time']]
late = df[~df['is_on_time']]
ax.scatter(on_time['distance_miles'], on_time['total_cost'],
           alpha=0.4, s=20, color='#10b981', label='On-Time')
ax.scatter(late['distance_miles'], late['total_cost'],
           alpha=0.4, s=20, color='#ef4444', label='Late')

# Regression line (expected cost frontier)
coeffs = np.polyfit(df['distance_miles'], df['total_cost'], 1)
x_line = np.linspace(df['distance_miles'].min(), df['distance_miles'].max(), 100)
y_line = np.polyval(coeffs, x_line)
ax.plot(x_line, y_line, '--', color='#3b82f6', linewidth=2,
        label=f'Expected (${coeffs[0]:.2f}/mile + ${coeffs[1]:.2f})')

# Highlight outliers (cost > 1.5x expected)
df['expected_cost'] = np.polyval(coeffs, df['distance_miles'])
outliers = df[df['total_cost'] > 1.5 * df['expected_cost']]
ax.scatter(outliers['distance_miles'], outliers['total_cost'],
           s=60, facecolors='none', edgecolors='#f59e0b', linewidths=2,
           label=f'Cost Outliers ({len(outliers)})')

ax.set_xlabel('Route Distance (miles)', fontsize=12)
ax.set_ylabel('Total Delivery Cost ($)', fontsize=12)
ax.set_title('Delivery Cost vs. Distance Analysis', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('cost_vs_distance.png', dpi=150)
plt.show()

Cost-Benefit Analysis: Vehicle Type by Route Type

A heatmap showing the average cost per delivery for each vehicle type on each route type. This reveals which vehicle-route combinations are optimal and which are wasteful (e.g., using a semi truck on an urban route).

# Merge with vehicles and routes
df = df.merge(
    pd.read_csv('vehicles.csv')[['vehicle_id', 'vehicle_type']],
    on='vehicle_id'
)

# Pivot: average cost by vehicle type x route type
pivot = df.pivot_table(
    values='total_cost',
    index='vehicle_type',
    columns='route_type',
    aggfunc='mean'
).round(2)

fig, ax = plt.subplots(figsize=(10, 6))
im = ax.imshow(pivot.values, cmap='RdYlGn_r', aspect='auto')

# Labels
ax.set_xticks(range(len(pivot.columns)))
ax.set_xticklabels(pivot.columns, fontsize=11)
ax.set_yticks(range(len(pivot.index)))
ax.set_yticklabels(pivot.index, fontsize=11)

# Annotate cells with values
for i in range(len(pivot.index)):
    for j in range(len(pivot.columns)):
        val = pivot.values[i, j]
        if not np.isnan(val):
            ax.text(j, i, f'${val:.0f}', ha='center', va='center',
                    fontsize=12, fontweight='bold',
                    color='white' if val > pivot.values[~np.isnan(pivot.values)].mean() else 'black')

ax.set_title('Avg. Delivery Cost by Vehicle Type and Route Type',
             fontsize=14, fontweight='bold', pad=15)
plt.colorbar(im, ax=ax, label='Cost ($)')
plt.tight_layout()
plt.savefig('vehicle_route_heatmap.png', dpi=150)
plt.show()

Excel Instructions

This project emphasizes SQL and Python analysis. However, for executive stakeholders who prefer Excel summaries, use the following approach.

Export SQL Results: Run each SQL query above and export the results to CSV. Import these into a single Excel workbook with separate sheets: "Route Performance", "Vehicle Efficiency", "Driver Rankings", and "Schedule Analysis".
Summary Dashboard: Create a "Dashboard" sheet with key KPIs linked to the data sheets: Total Deliveries, Average Cost/Delivery, On-Time %, and Cost/Mile. Use large-font cells with conditional formatting (green for targets met, red for missed).
Route Comparison Chart: Create a combo chart from the Route Performance sheet with bars for average cost and a line for on-time percentage. Add a secondary axis for the percentage values. Sort routes by cost-per-mile descending.
Vehicle Assignment Matrix: Manually create a cross-reference table mapping each route type to the recommended vehicle type based on the Python heatmap results. Highlight the optimal assignment in green and current mismatches in red.
Savings Estimator: Build a simple calculator: enter the number of deliveries to reassign and the cost differential per delivery. Formula: =Reassigned_Count * (Current_Avg_Cost - Optimal_Avg_Cost). This gives stakeholders a quick estimate of savings from route optimization.

Key Insights & Recommendations

Route Consolidation

12 routes with overlapping destinations account for 23% of total deliveries but 31% of fuel costs. Consolidating into 8 optimized routes could save an estimated $185K annually in fuel and labor.

Vehicle Mismatch

Semi trucks are being deployed on 18% of urban routes where sprinter vans would reduce costs by 40%. Reassigning vehicles to match route types is the single largest cost-saving opportunity identified.

Optimal Dispatch Windows

Early morning dispatches (6-9 AM) have 91% on-time rates and 15% lower costs than afternoon dispatches. Shifting 30% of midday routes to early morning would improve both KPIs significantly.

Driver Training Impact

Top-quartile drivers deliver at $2.80/mile versus $4.10/mile for bottom-quartile drivers on the same routes. A targeted training program for the bottom 20 drivers could close 50% of this gap.

Fuel Cost Reduction

Vehicles older than 2018 consume 22% more fuel per mile than newer models. Prioritizing replacement of the 15 oldest vehicles would pay for itself within 18 months through fuel savings alone.

Geographic Clustering

The map analysis revealed 4 customer clusters that are currently split across different routes. Assigning each cluster to a single dedicated route would reduce deadhead miles by an estimated 340 miles per week.