Logistics Route Optimization
Analyze delivery routes for a logistics company to optimize costs, reduce transit times, and improve driver utilization using SQL analytics and Python visualization.
Project Overview
Scenario: You are a logistics analyst at SwiftHaul Logistics, a regional delivery company operating a fleet of 85 vehicles across the Midwest. The operations director wants to reduce delivery costs by 15% while maintaining current service levels. Current average cost per delivery is $34.50, and the on-time rate is 82%. Fuel costs have risen 18% year-over-year, making route efficiency critical.
Objective: Analyze delivery performance across routes, drivers, and vehicle types to identify inefficiencies. Build visualizations that highlight optimal route assignments and cost-saving opportunities. This is a SQL + Python focused project.
Tools: SQL (PostgreSQL), Python (pandas, matplotlib, numpy, folium)
Dataset Description
Five tables modeling the logistics operation including delivery records, route definitions, fleet details, customer locations, and a distance matrix.
deliveries
| Column | Type | Description |
|---|---|---|
| delivery_id | INT (PK) | Unique delivery identifier |
| route_id | INT (FK) | Assigned route |
| vehicle_id | INT (FK) | Vehicle used |
| driver_id | INT | Driver identifier |
| customer_id | INT (FK) | Destination customer |
| dispatch_time | TIMESTAMP | When vehicle left depot |
| delivery_time | TIMESTAMP | When delivery was completed |
| promised_time | TIMESTAMP | Promised delivery window end |
| fuel_cost | DECIMAL(8,2) | Fuel consumed in USD |
| total_cost | DECIMAL(8,2) | Total delivery cost (fuel + labor + tolls) |
| packages | INT | Number of packages delivered |
routes
| Column | Type | Description |
|---|---|---|
| route_id | INT (PK) | Unique route identifier |
| route_name | VARCHAR(50) | Route label (e.g., "CHI-DET-01") |
| origin_city | VARCHAR(50) | Departure hub city |
| destination_city | VARCHAR(50) | Destination city |
| distance_miles | DECIMAL(7,1) | Total route distance |
| route_type | VARCHAR(20) | Urban, Suburban, Highway, Mixed |
vehicles
| Column | Type | Description |
|---|---|---|
| vehicle_id | INT (PK) | Unique vehicle identifier |
| vehicle_type | VARCHAR(30) | Van, Box Truck, Semi, Sprinter |
| fuel_efficiency_mpg | DECIMAL(4,1) | Miles per gallon rating |
| max_capacity_lbs | INT | Maximum payload in pounds |
| year_manufactured | INT | Vehicle model year |
customers
| Column | Type | Description |
|---|---|---|
| customer_id | INT (PK) | Unique customer identifier |
| customer_name | VARCHAR(100) | Business name |
| city | VARCHAR(50) | Customer city |
| latitude | DECIMAL(9,6) | GPS latitude |
| longitude | DECIMAL(9,6) | GPS longitude |
distance_matrix
| Column | Type | Description |
|---|---|---|
| origin_id | INT (FK) | Origin customer/depot ID |
| destination_id | INT (FK) | Destination customer/depot ID |
| distance_miles | DECIMAL(7,1) | Driving distance between points |
| est_drive_minutes | INT | Estimated driving time |
SQL Analysis
Query 1: Average Delivery Time and Cost by Route
Rank routes by average delivery duration and cost to find the most and least efficient corridors. Routes with high costs relative to their distance suggest inefficiencies in vehicle assignment or traffic delays.
SELECT
r.route_name,
r.origin_city,
r.destination_city,
r.distance_miles,
r.route_type,
COUNT(d.delivery_id) AS total_deliveries,
ROUND(AVG(EXTRACT(EPOCH FROM (d.delivery_time - d.dispatch_time)) / 60), 1) AS avg_delivery_min,
ROUND(AVG(d.total_cost), 2) AS avg_cost,
ROUND(AVG(d.total_cost / r.distance_miles), 2) AS cost_per_mile,
ROUND(AVG(d.fuel_cost), 2) AS avg_fuel_cost
FROM deliveries d
JOIN routes r ON d.route_id = r.route_id
GROUP BY r.route_name, r.origin_city, r.destination_city, r.distance_miles, r.route_type
ORDER BY cost_per_mile DESC;
Query 2: Cost Per Delivery and On-Time Rate by Vehicle Type
Determine which vehicle types deliver the best cost efficiency and reliability. This guides fleet procurement decisions and optimal vehicle-to-route matching.
SELECT
v.vehicle_type,
v.fuel_efficiency_mpg,
COUNT(d.delivery_id) AS deliveries,
ROUND(AVG(d.total_cost), 2) AS avg_cost_per_delivery,
ROUND(AVG(d.fuel_cost), 2) AS avg_fuel_per_delivery,
ROUND(
100.0 * SUM(CASE WHEN d.delivery_time <= d.promised_time THEN 1 ELSE 0 END)
/ COUNT(d.delivery_id), 1
) AS on_time_pct,
ROUND(AVG(d.packages), 1) AS avg_packages,
ROUND(AVG(d.total_cost / NULLIF(d.packages, 0)), 2) AS cost_per_package
FROM deliveries d
JOIN vehicles v ON d.vehicle_id = v.vehicle_id
GROUP BY v.vehicle_type, v.fuel_efficiency_mpg
ORDER BY cost_per_package ASC;
Query 3: Driver Performance Ranking
Rank drivers by a composite score combining on-time rate, cost efficiency, and delivery volume. Window functions compute percentile rankings within each metric.
WITH driver_metrics AS (
SELECT
d.driver_id,
COUNT(d.delivery_id) AS total_deliveries,
ROUND(AVG(d.total_cost), 2) AS avg_cost,
ROUND(
100.0 * SUM(CASE WHEN d.delivery_time <= d.promised_time THEN 1 ELSE 0 END)
/ COUNT(d.delivery_id), 1
) AS on_time_pct,
ROUND(AVG(EXTRACT(EPOCH FROM (d.delivery_time - d.dispatch_time)) / 60), 1) AS avg_time_min
FROM deliveries d
GROUP BY d.driver_id
HAVING COUNT(d.delivery_id) >= 20
)
SELECT
driver_id,
total_deliveries,
avg_cost,
on_time_pct,
avg_time_min,
PERCENT_RANK() OVER (ORDER BY on_time_pct DESC) AS on_time_percentile,
PERCENT_RANK() OVER (ORDER BY avg_cost ASC) AS cost_percentile,
ROUND(
0.4 * PERCENT_RANK() OVER (ORDER BY on_time_pct DESC)
+ 0.4 * PERCENT_RANK() OVER (ORDER BY avg_cost ASC)
+ 0.2 * PERCENT_RANK() OVER (ORDER BY total_deliveries DESC)
, 3) AS composite_score
FROM driver_metrics
ORDER BY composite_score ASC
LIMIT 20;
Query 4: Route Efficiency by Day of Week and Time Slot
Identify temporal patterns in delivery performance. Some routes may be significantly faster on certain days or time slots due to traffic patterns, enabling smarter scheduling.
SELECT
r.route_name,
TO_CHAR(d.dispatch_time, 'Day') AS day_of_week,
CASE
WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 6 AND 9 THEN 'Early Morning'
WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 10 AND 13 THEN 'Midday'
WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 14 AND 17 THEN 'Afternoon'
ELSE 'Evening'
END AS time_slot,
COUNT(d.delivery_id) AS deliveries,
ROUND(AVG(EXTRACT(EPOCH FROM (d.delivery_time - d.dispatch_time)) / 60), 1) AS avg_min,
ROUND(AVG(d.total_cost), 2) AS avg_cost,
ROUND(
100.0 * SUM(CASE WHEN d.delivery_time <= d.promised_time THEN 1 ELSE 0 END)
/ COUNT(d.delivery_id), 1
) AS on_time_pct
FROM deliveries d
JOIN routes r ON d.route_id = r.route_id
GROUP BY r.route_name, TO_CHAR(d.dispatch_time, 'Day'),
CASE
WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 6 AND 9 THEN 'Early Morning'
WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 10 AND 13 THEN 'Midday'
WHEN EXTRACT(HOUR FROM d.dispatch_time) BETWEEN 14 AND 17 THEN 'Afternoon'
ELSE 'Evening'
END
HAVING COUNT(d.delivery_id) >= 10
ORDER BY r.route_name, avg_min ASC;
Python Analysis
Delivery Route Visualization on Map
Plot delivery locations on an interactive map using folium. Color-code markers by on-time status and draw route lines from the depot to each customer. This reveals geographic clusters and potential route consolidation opportunities.
import pandas as pd
import numpy as np
import folium
from folium.plugins import MarkerCluster
# Load data
deliveries = pd.read_csv('deliveries.csv', parse_dates=['dispatch_time', 'delivery_time', 'promised_time'])
customers = pd.read_csv('customers.csv')
routes = pd.read_csv('routes.csv')
# Merge delivery data with customer locations
df = deliveries.merge(customers, on='customer_id')
df['is_on_time'] = df['delivery_time'] <= df['promised_time']
# Create base map centered on Midwest
m = folium.Map(location=[41.8781, -87.6298], zoom_start=7, tiles='cartodbpositron')
marker_cluster = MarkerCluster().add_to(m)
# Add delivery markers
for _, row in df.iterrows():
color = 'green' if row['is_on_time'] else 'red'
folium.CircleMarker(
location=[row['latitude'], row['longitude']],
radius=4,
color=color,
fill=True,
fill_opacity=0.7,
popup=f"Customer: {row['customer_name']}<br>"
f"Cost: ${row['total_cost']:.2f}<br>"
f"On-Time: {'Yes' if row['is_on_time'] else 'No'}"
).add_to(marker_cluster)
# Add depot marker
folium.Marker(
[41.8781, -87.6298],
popup='Chicago Depot (HQ)',
icon=folium.Icon(color='blue', icon='warehouse', prefix='fa')
).add_to(m)
m.save('delivery_map.html')
print("Map saved to delivery_map.html")
Distance vs. Cost Analysis with Optimal Frontier
Scatter plot of distance versus total cost for each delivery, with a linear regression line showing the expected cost. Deliveries far above the line are candidates for investigation -- they are more expensive than their distance warrants.
import matplotlib.pyplot as plt
from numpy.polynomial import polynomial as P
# Merge with route distances
df = df.merge(routes[['route_id', 'distance_miles']], on='route_id')
fig, ax = plt.subplots(figsize=(12, 7))
# Scatter by on-time status
on_time = df[df['is_on_time']]
late = df[~df['is_on_time']]
ax.scatter(on_time['distance_miles'], on_time['total_cost'],
alpha=0.4, s=20, color='#10b981', label='On-Time')
ax.scatter(late['distance_miles'], late['total_cost'],
alpha=0.4, s=20, color='#ef4444', label='Late')
# Regression line (expected cost frontier)
coeffs = np.polyfit(df['distance_miles'], df['total_cost'], 1)
x_line = np.linspace(df['distance_miles'].min(), df['distance_miles'].max(), 100)
y_line = np.polyval(coeffs, x_line)
ax.plot(x_line, y_line, '--', color='#3b82f6', linewidth=2,
label=f'Expected (${coeffs[0]:.2f}/mile + ${coeffs[1]:.2f})')
# Highlight outliers (cost > 1.5x expected)
df['expected_cost'] = np.polyval(coeffs, df['distance_miles'])
outliers = df[df['total_cost'] > 1.5 * df['expected_cost']]
ax.scatter(outliers['distance_miles'], outliers['total_cost'],
s=60, facecolors='none', edgecolors='#f59e0b', linewidths=2,
label=f'Cost Outliers ({len(outliers)})')
ax.set_xlabel('Route Distance (miles)', fontsize=12)
ax.set_ylabel('Total Delivery Cost ($)', fontsize=12)
ax.set_title('Delivery Cost vs. Distance Analysis', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('cost_vs_distance.png', dpi=150)
plt.show()
Cost-Benefit Analysis: Vehicle Type by Route Type
A heatmap showing the average cost per delivery for each vehicle type on each route type. This reveals which vehicle-route combinations are optimal and which are wasteful (e.g., using a semi truck on an urban route).
# Merge with vehicles and routes
df = df.merge(
pd.read_csv('vehicles.csv')[['vehicle_id', 'vehicle_type']],
on='vehicle_id'
)
# Pivot: average cost by vehicle type x route type
pivot = df.pivot_table(
values='total_cost',
index='vehicle_type',
columns='route_type',
aggfunc='mean'
).round(2)
fig, ax = plt.subplots(figsize=(10, 6))
im = ax.imshow(pivot.values, cmap='RdYlGn_r', aspect='auto')
# Labels
ax.set_xticks(range(len(pivot.columns)))
ax.set_xticklabels(pivot.columns, fontsize=11)
ax.set_yticks(range(len(pivot.index)))
ax.set_yticklabels(pivot.index, fontsize=11)
# Annotate cells with values
for i in range(len(pivot.index)):
for j in range(len(pivot.columns)):
val = pivot.values[i, j]
if not np.isnan(val):
ax.text(j, i, f'${val:.0f}', ha='center', va='center',
fontsize=12, fontweight='bold',
color='white' if val > pivot.values[~np.isnan(pivot.values)].mean() else 'black')
ax.set_title('Avg. Delivery Cost by Vehicle Type and Route Type',
fontsize=14, fontweight='bold', pad=15)
plt.colorbar(im, ax=ax, label='Cost ($)')
plt.tight_layout()
plt.savefig('vehicle_route_heatmap.png', dpi=150)
plt.show()
Excel Instructions
This project emphasizes SQL and Python analysis. However, for executive stakeholders who prefer Excel summaries, use the following approach.
- Export SQL Results: Run each SQL query above and export the results to CSV. Import these into a single Excel workbook with separate sheets: "Route Performance", "Vehicle Efficiency", "Driver Rankings", and "Schedule Analysis".
- Summary Dashboard: Create a "Dashboard" sheet with key KPIs linked to the data sheets: Total Deliveries, Average Cost/Delivery, On-Time %, and Cost/Mile. Use large-font cells with conditional formatting (green for targets met, red for missed).
- Route Comparison Chart: Create a combo chart from the Route Performance sheet with bars for average cost and a line for on-time percentage. Add a secondary axis for the percentage values. Sort routes by cost-per-mile descending.
- Vehicle Assignment Matrix: Manually create a cross-reference table mapping each route type to the recommended vehicle type based on the Python heatmap results. Highlight the optimal assignment in green and current mismatches in red.
- Savings Estimator: Build a simple calculator: enter the number of deliveries to reassign and the cost differential per delivery. Formula:
=Reassigned_Count * (Current_Avg_Cost - Optimal_Avg_Cost). This gives stakeholders a quick estimate of savings from route optimization.
Key Insights & Recommendations
Route Consolidation
12 routes with overlapping destinations account for 23% of total deliveries but 31% of fuel costs. Consolidating into 8 optimized routes could save an estimated $185K annually in fuel and labor.
Vehicle Mismatch
Semi trucks are being deployed on 18% of urban routes where sprinter vans would reduce costs by 40%. Reassigning vehicles to match route types is the single largest cost-saving opportunity identified.
Optimal Dispatch Windows
Early morning dispatches (6-9 AM) have 91% on-time rates and 15% lower costs than afternoon dispatches. Shifting 30% of midday routes to early morning would improve both KPIs significantly.
Driver Training Impact
Top-quartile drivers deliver at $2.80/mile versus $4.10/mile for bottom-quartile drivers on the same routes. A targeted training program for the bottom 20 drivers could close 50% of this gap.
Fuel Cost Reduction
Vehicles older than 2018 consume 22% more fuel per mile than newer models. Prioritizing replacement of the 15 oldest vehicles would pay for itself within 18 months through fuel savings alone.
Geographic Clustering
The map analysis revealed 4 customer clusters that are currently split across different routes. Assigning each cluster to a single dedicated route would reduce deadhead miles by an estimated 340 miles per week.
Knowledge Check
In the driver performance query, what does PERCENT_RANK() compute?
Why is cost-per-mile a better metric than total cost for comparing route efficiency?
In the Python cost-vs-distance scatter plot, what do deliveries plotted far above the regression line represent?
What SQL function is used to calculate delivery duration in minutes from two TIMESTAMP columns?
In the vehicle-route heatmap, a "RdYlGn_r" colormap is used. What does the "_r" suffix indicate?