SEO & Web Traffic Analytics
Analyze website traffic patterns, search query performance, and conversion funnels to optimize organic search rankings, improve landing page effectiveness, and forecast future traffic trends.
Project Overview
You are the SEO and web analytics lead for a content-driven e-commerce site. The site receives traffic from organic search, paid ads, social media, direct visits, and email referrals. Organic search accounts for roughly 55% of all sessions, making it the most valuable channel. The marketing director needs to understand which landing pages perform best, where users drop off in the conversion funnel, which search queries drive the most valuable traffic, and what traffic growth looks like over the next quarter. You will use SQL for traffic extraction and funnel analysis, Python for keyword clustering and forecasting, and Excel for an interactive SEO dashboard.
Dataset Description
The dataset models a web analytics warehouse combining Google Analytics-style session data with Search Console query performance and on-site conversion tracking.
| Table | Key Columns | Rows |
|---|---|---|
| page_views | pageview_id, session_id, page_url, page_title, view_timestamp, time_on_page, scroll_depth | 850,000 |
| sessions | session_id, user_id, source, medium, landing_page, session_date, device_type, country, pages_per_session, session_duration | 320,000 |
| search_queries | query_id, query_text, impressions, clicks, avg_position, ctr, query_date, landing_page | 95,000 |
| landing_pages | page_id, page_url, page_title, content_type, word_count, publish_date, last_updated | 1,200 |
| conversions | conversion_id, session_id, user_id, conversion_type, revenue, conversion_date, funnel_step | 28,000 |
SQL Analysis
1. Traffic by Source and Medium
Break down sessions, bounce rate, and conversion rate by traffic source to understand channel contribution.
SELECT
s.source,
s.medium,
COUNT(DISTINCT s.session_id) AS sessions,
ROUND(AVG(s.pages_per_session), 2) AS avg_pages,
ROUND(AVG(s.session_duration), 1) AS avg_duration_sec,
ROUND(
COUNT(DISTINCT s.session_id) FILTER (WHERE s.pages_per_session = 1) * 100.0
/ NULLIF(COUNT(DISTINCT s.session_id), 0), 1
) AS bounce_rate_pct,
COUNT(DISTINCT c.conversion_id) AS conversions,
ROUND(
COUNT(DISTINCT c.conversion_id) * 100.0
/ NULLIF(COUNT(DISTINCT s.session_id), 0), 2
) AS conv_rate_pct
FROM sessions s
LEFT JOIN conversions c ON c.session_id = s.session_id
GROUP BY s.source, s.medium
ORDER BY sessions DESC;
2. Top Landing Pages by Organic Traffic
Identify the highest-performing landing pages from organic search based on sessions, engagement, and conversions.
SELECT
s.landing_page,
lp.page_title,
lp.content_type,
COUNT(DISTINCT s.session_id) AS organic_sessions,
ROUND(AVG(s.pages_per_session), 2) AS avg_pages,
ROUND(AVG(s.session_duration), 1) AS avg_duration,
COUNT(DISTINCT c.conversion_id) AS conversions,
COALESCE(SUM(c.revenue), 0) AS revenue,
ROUND(
COUNT(DISTINCT c.conversion_id) * 100.0
/ NULLIF(COUNT(DISTINCT s.session_id), 0), 2
) AS conv_rate_pct
FROM sessions s
JOIN landing_pages lp ON lp.page_url = s.landing_page
LEFT JOIN conversions c ON c.session_id = s.session_id
WHERE s.medium = 'organic'
GROUP BY s.landing_page, lp.page_title, lp.content_type
ORDER BY organic_sessions DESC
LIMIT 20;
3. Conversion Funnel Drop-Off
Track how many users progress through each funnel step from landing to purchase.
WITH funnel_steps AS (
SELECT
s.source,
c.funnel_step,
COUNT(DISTINCT c.user_id) AS users_at_step
FROM sessions s
JOIN conversions c ON c.session_id = s.session_id
WHERE s.medium = 'organic'
GROUP BY s.source, c.funnel_step
)
SELECT
funnel_step,
users_at_step,
LAG(users_at_step) OVER (ORDER BY
CASE funnel_step
WHEN 'page_view' THEN 1
WHEN 'product_view' THEN 2
WHEN 'add_to_cart' THEN 3
WHEN 'checkout' THEN 4
WHEN 'purchase' THEN 5
END
) AS prev_step_users,
ROUND(
users_at_step * 100.0
/ NULLIF(FIRST_VALUE(users_at_step) OVER (ORDER BY
CASE funnel_step
WHEN 'page_view' THEN 1
WHEN 'product_view' THEN 2
WHEN 'add_to_cart' THEN 3
WHEN 'checkout' THEN 4
WHEN 'purchase' THEN 5
END
), 0), 1
) AS pct_of_top
FROM funnel_steps
WHERE source = 'google'
ORDER BY
CASE funnel_step
WHEN 'page_view' THEN 1
WHEN 'product_view' THEN 2
WHEN 'add_to_cart' THEN 3
WHEN 'checkout' THEN 4
WHEN 'purchase' THEN 5
END;
4. Search Query Performance
Find high-opportunity queries: high impressions but low CTR or position outside top 3.
SELECT
sq.query_text,
SUM(sq.impressions) AS total_impressions,
SUM(sq.clicks) AS total_clicks,
ROUND(AVG(sq.avg_position), 1) AS avg_position,
ROUND(SUM(sq.clicks) * 100.0 / NULLIF(SUM(sq.impressions), 0), 2)
AS ctr_pct,
sq.landing_page
FROM search_queries sq
GROUP BY sq.query_text, sq.landing_page
HAVING SUM(sq.impressions) > 100
AND ROUND(AVG(sq.avg_position), 1) BETWEEN 4 AND 20
ORDER BY total_impressions DESC
LIMIT 30;
Python Analysis
1. Conversion Funnel Visualization
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
conv_df = pd.read_csv('conversions.csv')
sessions_df = pd.read_csv('sessions.csv')
# Organic sessions funnel
organic = sessions_df[sessions_df['medium'] == 'organic']
organic_conv = conv_df[conv_df['session_id'].isin(organic['session_id'])]
steps = ['page_view', 'product_view', 'add_to_cart', 'checkout', 'purchase']
step_labels = ['Page View', 'Product View', 'Add to Cart', 'Checkout', 'Purchase']
step_counts = [organic_conv[organic_conv['funnel_step'] == s]['user_id'].nunique() for s in steps]
# Funnel chart
fig, ax = plt.subplots(figsize=(10, 7))
colors = ['#3498db', '#2ecc71', '#f39c12', '#e67e22', '#e74c3c']
max_width = 0.8
for i, (label, count, color) in enumerate(zip(step_labels, step_counts, colors)):
width = max_width * (count / step_counts[0])
left = (max_width - width) / 2
ax.barh(len(steps) - i - 1, width, left=left, height=0.7, color=color, alpha=0.85)
pct = count / step_counts[0] * 100
drop = '' if i == 0 else f' ({(1 - count/step_counts[i-1])*100:.0f}% drop)'
ax.text(0.5, len(steps) - i - 1, f'{label}\n{count:,} ({pct:.1f}%){drop}',
ha='center', va='center', fontsize=10, fontweight='bold')
ax.set_xlim(0, 1)
ax.set_yticks([])
ax.set_xticks([])
ax.set_title('Organic Search Conversion Funnel', fontsize=14, pad=20)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.tight_layout()
plt.savefig('seo_funnel.png', dpi=150)
plt.show()
2. Keyword Clustering with TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from collections import Counter
queries_df = pd.read_csv('search_queries.csv')
# Aggregate query-level metrics
query_agg = queries_df.groupby('query_text').agg(
impressions=('impressions', 'sum'),
clicks=('clicks', 'sum'),
avg_position=('avg_position', 'mean')
).reset_index()
query_agg['ctr'] = query_agg['clicks'] / query_agg['impressions'].replace(0, np.nan)  # guard against zero impressions
# Vectorize query text
vectorizer = TfidfVectorizer(max_features=500, stop_words='english', ngram_range=(1, 2))
tfidf_matrix = vectorizer.fit_transform(query_agg['query_text'])
# Cluster queries into topic groups
n_clusters = 8
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
query_agg['cluster'] = kmeans.fit_predict(tfidf_matrix)
# Profile each cluster
for cluster_id in range(n_clusters):
cluster_queries = query_agg[query_agg['cluster'] == cluster_id]
top_queries = cluster_queries.nlargest(5, 'impressions')['query_text'].tolist()
avg_pos = cluster_queries['avg_position'].mean()
total_clicks = cluster_queries['clicks'].sum()
print(f"\nCluster {cluster_id} ({len(cluster_queries)} queries, {total_clicks:,} clicks, avg pos {avg_pos:.1f}):")
for q in top_queries:
print(f" - {q}")
3. Traffic Forecasting with Prophet
from prophet import Prophet
# Prepare daily organic session counts
organic_daily = sessions_df[sessions_df['medium'] == 'organic'].copy()
organic_daily['session_date'] = pd.to_datetime(organic_daily['session_date'])
daily_traffic = organic_daily.groupby('session_date').size().reset_index()
daily_traffic.columns = ['ds', 'y']
# Fit Prophet model
model = Prophet(
yearly_seasonality=True,
weekly_seasonality=True,
daily_seasonality=False,
changepoint_prior_scale=0.05
)
model.fit(daily_traffic)
# Forecast next 90 days
future = model.make_future_dataframe(periods=90)
forecast = model.predict(future)
# Plot forecast
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(daily_traffic['ds'], daily_traffic['y'], color='#2c3e50', alpha=0.6, label='Actual')
ax.plot(forecast['ds'], forecast['yhat'], color='#3498db', linewidth=2, label='Forecast')
ax.fill_between(forecast['ds'], forecast['yhat_lower'], forecast['yhat_upper'],
                alpha=0.15, color='#3498db', label='80% interval')  # Prophet's default interval_width is 0.80
ax.axvline(x=daily_traffic['ds'].max(), color='red', linestyle='--', alpha=0.7, label='Forecast Start')
ax.set_xlabel('Date')
ax.set_ylabel('Daily Organic Sessions')
ax.set_title('Organic Traffic Forecast (90 Days)')
ax.legend()
plt.tight_layout()
plt.savefig('traffic_forecast.png', dpi=150)
plt.show()
# Print projected growth
last_30_actual = daily_traffic.tail(30)['y'].mean()
next_30_forecast = forecast.tail(90).head(30)['yhat'].mean()
growth_pct = (next_30_forecast - last_30_actual) / last_30_actual * 100
print(f"\nLast 30 days avg: {last_30_actual:,.0f} sessions/day")
print(f"Next 30 days forecast avg: {next_30_forecast:,.0f} sessions/day")
print(f"Projected growth: {growth_pct:+.1f}%")
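Before trusting the projected growth number, it helps to sanity-check Prophet against a seasonal-naive baseline: predict each day with the value from the same weekday one week earlier. If Prophet cannot beat this baseline on held-out days, the model adds little. A self-contained sketch on synthetic data (the traffic series below is simulated, assuming a weekly cycle):

```python
import pandas as pd
import numpy as np

# Simulated daily organic sessions with a weekly cycle plus noise
rng = pd.date_range("2024-01-01", periods=120, freq="D")
y = (1000
     + 200 * np.sin(2 * np.pi * rng.dayofweek / 7)
     + np.random.default_rng(0).normal(0, 30, 120))
traffic = pd.Series(y, index=rng)

# Seasonal-naive baseline: each day predicted by the value 7 days earlier
baseline = traffic.shift(7)
mae = (traffic - baseline).abs().mean()  # NaNs from the first week are skipped
print(f"Seasonal-naive MAE: {mae:.1f} sessions/day")
```

Compare this MAE against Prophet's error on the same held-out window (Prophet ships `prophet.diagnostics.cross_validation` for that) before presenting the 90-day forecast.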
4. Opportunity Score Matrix
# Score queries by opportunity: high impressions + position 4-20 = easy wins
query_agg['opportunity_score'] = (
query_agg['impressions'] / query_agg['impressions'].max() * 0.4 +
(1 - (query_agg['avg_position'] - 1) / 100).clip(0, 1) * 0.3 +
(1 - query_agg['ctr']).clip(0, 1) * 0.3
)
# Filter to actionable queries (position 4-20)
opportunities = query_agg[
(query_agg['avg_position'] >= 4) & (query_agg['avg_position'] <= 20)
].nlargest(20, 'opportunity_score')
fig, ax = plt.subplots(figsize=(12, 6))
scatter = ax.scatter(
opportunities['avg_position'],
opportunities['impressions'],
s=opportunities['opportunity_score'] * 500,
c=opportunities['ctr'],
cmap='RdYlGn',
alpha=0.7,
edgecolors='#333'
)
ax.set_xlabel('Average Position')
ax.set_ylabel('Total Impressions')
ax.set_title('SEO Opportunity Matrix (size = opportunity score, color = CTR)')
ax.invert_xaxis()
plt.colorbar(scatter, label='CTR')
# Annotate top 5
for _, row in opportunities.head(5).iterrows():
ax.annotate(row['query_text'][:25], (row['avg_position'], row['impressions']),
textcoords='offset points', xytext=(10, 5), fontsize=8)
plt.tight_layout()
plt.savefig('seo_opportunity_matrix.png', dpi=150)
plt.show()
Excel Dashboard
1. Traffic Overview Tab — Create a summary table with rows for each traffic source (Organic, Paid, Social, Direct, Email, Referral) and columns for Sessions, Bounce Rate, Avg Pages/Session, Avg Duration, Conversions, Revenue, and Conv Rate. Use SUMIFS and AVERAGEIFS from the raw session data. Add a pie chart showing session share by source.
2. Keyword Tracker — Build a keyword tracking sheet with columns: Query, Current Position, Previous Position, Change, Impressions, Clicks, CTR, and Landing Page. Use conditional formatting with arrows to highlight position improvements (green) and declines (red). Add a sparkline column showing weekly position trend.
3. Landing Page Scorecard — Create a scorecard for the top 20 landing pages showing Organic Sessions, Bounce Rate, Avg Time on Page, Conversions, Revenue, and a calculated "Page Score" = weighted sum of normalized metrics. Rank pages by score and apply heatmap formatting across all metric columns.
4. Funnel Visualization — Build a funnel chart using stacked bar segments. Each row represents a funnel step (Page View through Purchase). Calculate step-to-step drop-off rates and display them alongside. Use data bars conditional formatting as an alternative if funnel charts are not available in your Excel version.
5. Traffic Trend Charts — Create a line chart showing weekly organic sessions over the past 12 months with a trendline and R-squared value displayed. Add a secondary line for non-organic traffic as a comparison. Include a forecast trendline extending 13 weeks. Below the chart, add a data table with month-over-month growth percentages.
6. Opportunity Prioritization — Build a prioritized action list from the keyword opportunity analysis. Columns: Query, Current Position, Target Position, Estimated Click Gain (formula: =Impressions * (Target_CTR - Current_CTR)), Effort Level (dropdown: Low/Medium/High), and Priority Score. Sort by Priority Score descending. Use slicers for Effort Level to help the team focus on quick wins.
Key Insights
Position 4-10 Opportunity
Queries ranking in positions 4-10 represent the highest-ROI optimization targets. Moving from position 8 to position 3 can increase CTR roughly five- to ten-fold, and these pages already have topical authority, so they need far less effort than building new content.
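The click upside can be estimated with a CTR-by-position curve. The curve below is illustrative only (assumed values, not measured); calibrate it from your own Search Console data before using it for planning:

```python
# Illustrative CTR-by-position curve (assumed values, not measured)
ctr_by_position = {1: 0.28, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05,
                   6: 0.04, 7: 0.03, 8: 0.02, 9: 0.018, 10: 0.015}

def estimated_click_gain(impressions, current_pos, target_pos):
    """Extra clicks expected if a query moves from current_pos to target_pos."""
    cur = ctr_by_position.get(round(current_pos), 0.01)
    tgt = ctr_by_position.get(round(target_pos), 0.01)
    return impressions * (tgt - cur)

# A 50,000-impression query moving from position 8 to position 3
print(estimated_click_gain(50_000, 8, 3))  # ~4,000 extra clicks under these assumptions
```

This is the same arithmetic as the Excel "Estimated Click Gain" formula in the dashboard section, just expressed as a reusable function.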
Funnel Bottlenecks
The largest drop-off typically occurs between product view and add-to-cart. Optimizing product pages with better descriptions, images, reviews, and clearer CTAs can dramatically improve this step's conversion rate.
Content Type Performance
Blog posts may drive high traffic but low conversion, while product and comparison pages convert at higher rates. A balanced content strategy targets both informational queries (for top-of-funnel) and transactional queries (for bottom-of-funnel).
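A quick way to verify this on your own data is to roll conversion rate up by content_type. A minimal pandas sketch with hypothetical numbers (replace the DataFrame with your joined sessions/conversions rollup):

```python
import pandas as pd

# Hypothetical per-page rollup standing in for the real joined data
pages = pd.DataFrame({
    "content_type": ["blog", "blog", "product", "product", "comparison"],
    "organic_sessions": [12000, 9500, 3000, 2600, 1800],
    "conversions": [60, 40, 120, 95, 85],
})

# Aggregate to content type, then compute conversion rate
by_type = pages.groupby("content_type").sum()
by_type["conv_rate_pct"] = by_type["conversions"] / by_type["organic_sessions"] * 100
print(by_type.sort_values("conv_rate_pct", ascending=False))
```

In this toy example blog posts dominate sessions while product and comparison pages convert at several times the rate, which is the pattern the paragraph above describes.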
Seasonality in Forecasts
Prophet captures weekly and yearly seasonality patterns. Ignoring seasonality can lead to misinterpreting normal cyclical dips as traffic problems. Always account for holidays, weekends, and industry-specific seasonal patterns.
Knowledge Check
A query has 50,000 impressions, 500 clicks, and an average position of 8.5. What is the CTR and what does the position suggest?
What is bounce rate and why does it matter for SEO?
In the keyword clustering code, what does TF-IDF do?
Why does the Prophet model set changepoint_prior_scale=0.05?
What does the FIRST_VALUE window function accomplish in the funnel drop-off SQL query?