Email Marketing Analytics
Analyze email campaign performance across open rates, click-through rates, and conversions to optimize send times, subject lines, and subscriber segmentation for maximum engagement.
Project Overview
You are the analytics lead for an email marketing team at a subscription-based SaaS company. The team sends weekly newsletters, promotional offers, onboarding sequences, and re-engagement campaigns to a subscriber base of 150,000. Leadership wants to understand which campaigns drive the most engagement, what the optimal send times are, and how to segment subscribers for personalized content. You will use SQL to extract and aggregate engagement metrics, Python for deeper analysis and clustering, and Excel to build a campaign dashboard and send calendar.
Dataset Description
The dataset represents a full email marketing data pipeline with event-level tracking from send through conversion.
| Table | Key Columns | Rows |
|---|---|---|
| campaigns | campaign_id, campaign_name, campaign_type, subject_line, send_date, send_time, list_segment | 200 |
| sends | send_id, campaign_id, subscriber_id, sent_at, delivered, bounced | 2,400,000 |
| opens | open_id, send_id, subscriber_id, opened_at, device_type, email_client | 580,000 |
| clicks | click_id, send_id, subscriber_id, clicked_at, link_url, link_position | 145,000 |
| unsubscribes | unsub_id, send_id, subscriber_id, unsub_date, reason | 18,000 |
| conversions | conversion_id, send_id, subscriber_id, conversion_date, revenue, product_id | 42,000 |
SQL Analysis
1. Open and Click Rates by Campaign
Calculate core engagement metrics for each campaign to identify top performers.
SELECT
c.campaign_id,
c.campaign_name,
c.campaign_type,
COUNT(DISTINCT s.send_id) AS total_sent,
COUNT(DISTINCT s.send_id) FILTER (WHERE s.delivered) AS delivered,
COUNT(DISTINCT o.subscriber_id) AS unique_opens,
COUNT(DISTINCT cl.subscriber_id) AS unique_clicks,
ROUND(
COUNT(DISTINCT o.subscriber_id) * 100.0
/ NULLIF(COUNT(DISTINCT s.send_id) FILTER (WHERE s.delivered), 0), 2
) AS open_rate_pct,
ROUND(
COUNT(DISTINCT cl.subscriber_id) * 100.0
/ NULLIF(COUNT(DISTINCT o.subscriber_id), 0), 2
) AS click_to_open_pct,
ROUND(
COUNT(DISTINCT cl.subscriber_id) * 100.0
/ NULLIF(COUNT(DISTINCT s.send_id) FILTER (WHERE s.delivered), 0), 2
) AS ctr_pct
FROM campaigns c
JOIN sends s ON s.campaign_id = c.campaign_id
LEFT JOIN opens o ON o.send_id = s.send_id
LEFT JOIN clicks cl ON cl.send_id = s.send_id
GROUP BY c.campaign_id, c.campaign_name, c.campaign_type
ORDER BY open_rate_pct DESC;
2. Best Send Times
Analyze which day-of-week and hour-of-day combinations yield the highest open rates.
SELECT
EXTRACT(DOW FROM c.send_date) AS day_of_week,
TO_CHAR(c.send_date, 'Day') AS day_name,
EXTRACT(HOUR FROM c.send_time) AS send_hour,
COUNT(DISTINCT s.send_id) AS emails_sent,
COUNT(DISTINCT o.subscriber_id) AS unique_opens,
ROUND(
COUNT(DISTINCT o.subscriber_id) * 100.0
/ NULLIF(COUNT(DISTINCT s.send_id) FILTER (WHERE s.delivered), 0), 2
) AS open_rate_pct
FROM campaigns c
JOIN sends s ON s.campaign_id = c.campaign_id
LEFT JOIN opens o ON o.send_id = s.send_id
GROUP BY EXTRACT(DOW FROM c.send_date),
TO_CHAR(c.send_date, 'Day'),
EXTRACT(HOUR FROM c.send_time)
ORDER BY open_rate_pct DESC
LIMIT 10;
3. Subscriber Segment Performance
Compare engagement and conversion metrics across subscriber segments.
SELECT
c.list_segment,
COUNT(DISTINCT s.subscriber_id) AS subscribers,
ROUND(AVG(
CASE WHEN o.open_id IS NOT NULL THEN 1.0 ELSE 0.0 END
) * 100, 2) AS avg_open_rate_pct,
ROUND(AVG(
CASE WHEN cl.click_id IS NOT NULL THEN 1.0 ELSE 0.0 END
) * 100, 2) AS avg_ctr_pct,
COUNT(DISTINCT cv.conversion_id) AS total_conversions,
COALESCE(SUM(cv.revenue), 0) AS total_revenue,
COUNT(DISTINCT u.subscriber_id) AS unsubscribes,
ROUND(
COUNT(DISTINCT u.subscriber_id) * 100.0
/ NULLIF(COUNT(DISTINCT s.subscriber_id), 0), 2
) AS unsub_rate_pct
FROM campaigns c
JOIN sends s ON s.campaign_id = c.campaign_id
LEFT JOIN opens o ON o.send_id = s.send_id
LEFT JOIN clicks cl ON cl.send_id = s.send_id
LEFT JOIN conversions cv ON cv.send_id = s.send_id
LEFT JOIN unsubscribes u ON u.send_id = s.send_id
GROUP BY c.list_segment
ORDER BY total_revenue DESC;
4. Unsubscribe Analysis
Identify which campaigns and reasons drive the most list churn.
SELECT
c.campaign_type,
u.reason,
COUNT(*) AS unsub_count,
ROUND(
COUNT(*) * 100.0
/ SUM(COUNT(*)) OVER (PARTITION BY c.campaign_type), 1
) AS pct_of_type
FROM unsubscribes u
JOIN sends s ON s.send_id = u.send_id
JOIN campaigns c ON c.campaign_id = s.campaign_id
GROUP BY c.campaign_type, u.reason
ORDER BY c.campaign_type, unsub_count DESC;
Python Analysis
1. Engagement Funnel Visualization
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
campaigns_df = pd.read_csv('campaigns.csv')
sends_df = pd.read_csv('sends.csv')
opens_df = pd.read_csv('opens.csv')
clicks_df = pd.read_csv('clicks.csv')
conv_df = pd.read_csv('conversions.csv')
# Overall funnel metrics
total_sent = sends_df[sends_df['delivered'] == True].shape[0]
total_opens = opens_df['subscriber_id'].nunique()
total_clicks = clicks_df['subscriber_id'].nunique()
total_conv = conv_df['subscriber_id'].nunique()
stages = ['Delivered', 'Opened', 'Clicked', 'Converted']
values = [total_sent, total_opens, total_clicks, total_conv]
colors = ['#3498db', '#2ecc71', '#f39c12', '#e74c3c']
fig, ax = plt.subplots(figsize=(10, 6))
bars = ax.barh(stages[::-1], values[::-1], color=colors[::-1], height=0.6)
for bar, val in zip(bars, values[::-1]):
ax.text(bar.get_width() + total_sent * 0.02, bar.get_y() + bar.get_height() / 2,
f'{val:,} ({val/total_sent*100:.1f}%)', va='center', fontsize=11)
ax.set_xlabel('Unique Subscribers')
ax.set_title('Email Engagement Funnel')
ax.set_xlim(0, total_sent * 1.3)
plt.tight_layout()
plt.savefig('email_funnel.png', dpi=150)
plt.show()
2. Send Time Optimization Heatmap
import numpy as np
import seaborn as sns
# Merge sends with opens and campaigns
sends_with_camp = sends_df.merge(campaigns_df[['campaign_id', 'send_date', 'send_time']], on='campaign_id')
sends_with_camp['send_date'] = pd.to_datetime(sends_with_camp['send_date'])
sends_with_camp['day_of_week'] = sends_with_camp['send_date'].dt.day_name()
sends_with_camp['hour'] = pd.to_datetime(sends_with_camp['send_time']).dt.hour
# Calculate open rate per day/hour
sends_with_camp['opened'] = sends_with_camp['send_id'].isin(opens_df['send_id']).astype(int)
heatmap_data = sends_with_camp.groupby(['day_of_week', 'hour'])['opened'].mean().unstack()
# Reorder days
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
heatmap_data = heatmap_data.reindex(day_order)
fig, ax = plt.subplots(figsize=(14, 6))
sns.heatmap(heatmap_data * 100, annot=True, fmt='.1f', cmap='YlOrRd',
cbar_kws={'label': 'Open Rate %'}, ax=ax)
ax.set_title('Open Rate by Day of Week and Send Hour')
ax.set_xlabel('Hour of Day')
ax.set_ylabel('')
plt.tight_layout()
plt.savefig('send_time_heatmap.png', dpi=150)
plt.show()
3. Subscriber Clustering (RFM-Style)
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
# Build subscriber-level engagement features
sub_opens = opens_df.groupby('subscriber_id').size().reset_index(name='open_count')
sub_clicks = clicks_df.groupby('subscriber_id').size().reset_index(name='click_count')
sub_conv = conv_df.groupby('subscriber_id').agg(
conv_count=('conversion_id', 'count'),
total_revenue=('revenue', 'sum')
).reset_index()
# Merge into single subscriber profile
subs = sends_df[['subscriber_id']].drop_duplicates()
subs = subs.merge(sub_opens, on='subscriber_id', how='left')
subs = subs.merge(sub_clicks, on='subscriber_id', how='left')
subs = subs.merge(sub_conv, on='subscriber_id', how='left')
subs = subs.fillna(0)
# Scale features and cluster
features = ['open_count', 'click_count', 'conv_count', 'total_revenue']
scaler = StandardScaler()
X = scaler.fit_transform(subs[features])
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
subs['cluster'] = kmeans.fit_predict(X)
# Profile each cluster
cluster_profile = subs.groupby('cluster')[features].mean().round(2)
cluster_profile['size'] = subs.groupby('cluster').size().values
print(cluster_profile)
# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Cluster sizes
axes[0].pie(cluster_profile['size'], labels=[f'Cluster {i}' for i in cluster_profile.index],
autopct='%1.1f%%', colors=['#3498db', '#2ecc71', '#f39c12', '#e74c3c'])
axes[0].set_title('Subscriber Segment Distribution')
# Average revenue by cluster
axes[1].bar(cluster_profile.index, cluster_profile['total_revenue'],
color=['#3498db', '#2ecc71', '#f39c12', '#e74c3c'])
axes[1].set_xlabel('Cluster')
axes[1].set_ylabel('Avg Revenue ($)')
axes[1].set_title('Average Revenue per Subscriber Segment')
plt.tight_layout()
plt.savefig('subscriber_clusters.png', dpi=150)
plt.show()
Excel Dashboard
1. Campaign Performance Tab — Create a table with columns: Campaign Name, Type, Send Date, Sent, Delivered, Opens, Clicks, Conversions, Revenue, Open Rate, CTR, Conversion Rate, Unsubscribes. Use formulas like =Opens/Delivered for rate columns. Apply conditional formatting (green for above-average rates, red for below).
2. KPI Summary Cards — At the top of the Dashboard tab, display four large KPI cells: Average Open Rate, Average CTR, Total Revenue, and List Growth Rate. Pull values using AVERAGEIF and SUMIFS referencing the campaign table. Add month-over-month arrows using IF with up/down arrow characters.
3. Segment Comparison Chart — Build a stacked bar chart showing Open Rate, Click Rate, and Conversion Rate side by side for each subscriber segment. Add a data table below the chart with exact numbers. Include a slicer for campaign type to allow filtering.
4. Send Calendar — Create a monthly calendar view using a grid layout (7 columns for days, 5 rows for weeks). Use conditional formatting to shade cells based on scheduled sends. Link each cell to the campaign detail via VLOOKUP so clicking a date shows the planned campaign name, segment, and expected send volume.
5. Unsubscribe Tracker — Build a line chart showing weekly unsubscribe counts overlaid with send volume. Add a secondary axis for unsubscribe rate. Include a data table below listing the top unsubscribe reasons with their counts and percentages.
6. Revenue Attribution — Add a waterfall chart showing how each campaign type contributes to total email revenue. Use custom bar formatting (green for positive, gray for subtotals) to make the revenue flow visually clear. Include a PivotTable for drill-down by individual campaign.
Key Insights
Send Time Impact
Tuesday and Thursday mornings (9-11 AM) typically yield the highest open rates for B2B audiences, while B2C emails perform well on weekends. Testing your specific audience is essential since benchmarks vary by industry.
Segmentation Power
Sending targeted campaigns to behavioral segments (active, lapsed, new) can improve open rates by 30-50% compared to batch-and-blast approaches. RFM-style clustering provides a data-driven way to define these segments.
List Hygiene
High unsubscribe rates often indicate over-sending or poor targeting rather than content quality. Monitor unsub rate by campaign type and frequency to find the right cadence for each segment.
Click-to-Open Ratio
Click-to-open rate (CTOR) isolates content effectiveness from subject line performance. A high open rate but low CTOR suggests compelling subject lines but mismatched or weak email body content.
Knowledge Check
What is the difference between open rate and click-to-open rate (CTOR)?
In the subscriber clustering analysis, why is StandardScaler applied before KMeans?
What does the SQL FILTER (WHERE s.delivered) clause do?
A campaign has a 45% open rate but only 2% CTOR. What does this suggest?
Why does the clustering use n_init=10 in KMeans?