Project 20

SEO & Web Traffic Analytics

Analyze website traffic patterns, search query performance, and conversion funnels to optimize organic search rankings, improve landing page effectiveness, and forecast future traffic trends.

Project Overview

You are the SEO and web analytics lead for a content-driven e-commerce site. The site receives traffic from organic search, paid ads, social media, direct visits, and email referrals. Organic search accounts for roughly 55% of all sessions, making it the most valuable channel. The marketing director needs to understand which landing pages perform best, where users drop off in the conversion funnel, which search queries drive the most valuable traffic, and what traffic growth looks like over the next quarter. You will use SQL for traffic extraction and funnel analysis, Python for keyword clustering and forecasting, and Excel for an interactive SEO dashboard.

Dataset Description

The dataset models a web analytics warehouse combining Google Analytics-style session data with Search Console query performance and on-site conversion tracking.

TableKey ColumnsRows
page_viewspageview_id, session_id, page_url, page_title, view_timestamp, time_on_page, scroll_depth850,000
sessionssession_id, user_id, source, medium, landing_page, session_date, device_type, country, pages_per_session, session_duration320,000
search_queriesquery_id, query_text, impressions, clicks, avg_position, ctr, query_date, landing_page95,000
landing_pagespage_id, page_url, page_title, content_type, word_count, publish_date, last_updated1,200
conversionsconversion_id, session_id, user_id, conversion_type, revenue, conversion_date, funnel_step28,000

SQL Analysis

1. Traffic by Source and Medium

Break down sessions, bounce rate, and conversion rate by traffic source to understand channel contribution.

SELECT
    s.source,
    s.medium,
    COUNT(DISTINCT s.session_id)                   AS sessions,
    ROUND(AVG(s.pages_per_session), 2)             AS avg_pages,
    ROUND(AVG(s.session_duration), 1)              AS avg_duration_sec,
    ROUND(
        COUNT(DISTINCT s.session_id) FILTER (WHERE s.pages_per_session = 1) * 100.0
        / NULLIF(COUNT(DISTINCT s.session_id), 0), 1
    )                                              AS bounce_rate_pct,
    COUNT(DISTINCT c.conversion_id)                AS conversions,
    ROUND(
        COUNT(DISTINCT c.conversion_id) * 100.0
        / NULLIF(COUNT(DISTINCT s.session_id), 0), 2
    )                                              AS conv_rate_pct
FROM sessions s
LEFT JOIN conversions c ON c.session_id = s.session_id
GROUP BY s.source, s.medium
ORDER BY sessions DESC;

2. Top Landing Pages by Organic Traffic

Identify the highest-performing landing pages from organic search based on sessions, engagement, and conversions.

SELECT
    s.landing_page,
    lp.page_title,
    lp.content_type,
    COUNT(DISTINCT s.session_id)                   AS organic_sessions,
    ROUND(AVG(s.pages_per_session), 2)             AS avg_pages,
    ROUND(AVG(s.session_duration), 1)              AS avg_duration,
    COUNT(DISTINCT c.conversion_id)                AS conversions,
    COALESCE(SUM(c.revenue), 0)                    AS revenue,
    ROUND(
        COUNT(DISTINCT c.conversion_id) * 100.0
        / NULLIF(COUNT(DISTINCT s.session_id), 0), 2
    )                                              AS conv_rate_pct
FROM sessions s
JOIN landing_pages lp ON lp.page_url = s.landing_page
LEFT JOIN conversions c ON c.session_id = s.session_id
WHERE s.medium = 'organic'
GROUP BY s.landing_page, lp.page_title, lp.content_type
ORDER BY organic_sessions DESC
LIMIT 20;

3. Conversion Funnel Drop-Off

Track how many users progress through each funnel step from landing to purchase.

WITH funnel_steps AS (
    SELECT
        s.source,
        c.funnel_step,
        COUNT(DISTINCT c.user_id) AS users_at_step
    FROM sessions s
    JOIN conversions c ON c.session_id = s.session_id
    WHERE s.medium = 'organic'
    GROUP BY s.source, c.funnel_step
)
SELECT
    funnel_step,
    users_at_step,
    LAG(users_at_step) OVER (ORDER BY
        CASE funnel_step
            WHEN 'page_view'    THEN 1
            WHEN 'product_view' THEN 2
            WHEN 'add_to_cart'  THEN 3
            WHEN 'checkout'     THEN 4
            WHEN 'purchase'     THEN 5
        END
    )                                              AS prev_step,
    ROUND(
        users_at_step * 100.0
        / NULLIF(FIRST_VALUE(users_at_step) OVER (ORDER BY
            CASE funnel_step
                WHEN 'page_view'    THEN 1
                WHEN 'product_view' THEN 2
                WHEN 'add_to_cart'  THEN 3
                WHEN 'checkout'     THEN 4
                WHEN 'purchase'     THEN 5
            END
        ), 0), 1
    )                                              AS pct_of_top
FROM funnel_steps
WHERE source = 'google'
ORDER BY
    CASE funnel_step
        WHEN 'page_view'    THEN 1
        WHEN 'product_view' THEN 2
        WHEN 'add_to_cart'  THEN 3
        WHEN 'checkout'     THEN 4
        WHEN 'purchase'     THEN 5
    END;

4. Search Query Performance

Find high-opportunity queries: high impressions but low CTR or position outside top 3.

SELECT
    sq.query_text,
    SUM(sq.impressions)                            AS total_impressions,
    SUM(sq.clicks)                                 AS total_clicks,
    ROUND(AVG(sq.avg_position), 1)                 AS avg_position,
    ROUND(SUM(sq.clicks) * 100.0 / NULLIF(SUM(sq.impressions), 0), 2)
                                                   AS ctr_pct,
    sq.landing_page
FROM search_queries sq
GROUP BY sq.query_text, sq.landing_page
HAVING SUM(sq.impressions) > 100
   AND ROUND(AVG(sq.avg_position), 1) BETWEEN 4 AND 20
ORDER BY total_impressions DESC
LIMIT 30;

Python Analysis

1. Conversion Funnel Visualization

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as FancyBboxPatch
import numpy as np

conv_df = pd.read_csv('conversions.csv')
sessions_df = pd.read_csv('sessions.csv')

# Organic sessions funnel
organic = sessions_df[sessions_df['medium'] == 'organic']
organic_conv = conv_df[conv_df['session_id'].isin(organic['session_id'])]

steps = ['page_view', 'product_view', 'add_to_cart', 'checkout', 'purchase']
step_labels = ['Page View', 'Product View', 'Add to Cart', 'Checkout', 'Purchase']
step_counts = [organic_conv[organic_conv['funnel_step'] == s]['user_id'].nunique() for s in steps]

# Funnel chart
fig, ax = plt.subplots(figsize=(10, 7))
colors = ['#3498db', '#2ecc71', '#f39c12', '#e67e22', '#e74c3c']
max_width = 0.8

for i, (label, count, color) in enumerate(zip(step_labels, step_counts, colors)):
    width = max_width * (count / step_counts[0])
    left = (max_width - width) / 2
    ax.barh(len(steps) - i - 1, width, left=left, height=0.7, color=color, alpha=0.85)
    pct = count / step_counts[0] * 100
    drop = '' if i == 0 else f' ({(1 - count/step_counts[i-1])*100:.0f}% drop)'
    ax.text(0.5, len(steps) - i - 1, f'{label}\n{count:,} ({pct:.1f}%){drop}',
            ha='center', va='center', fontsize=10, fontweight='bold')

ax.set_xlim(0, 1)
ax.set_yticks([])
ax.set_xticks([])
ax.set_title('Organic Search Conversion Funnel', fontsize=14, pad=20)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)
plt.tight_layout()
plt.savefig('seo_funnel.png', dpi=150)
plt.show()

2. Keyword Clustering with TF-IDF

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from collections import Counter

queries_df = pd.read_csv('search_queries.csv')

# Aggregate query-level metrics
query_agg = queries_df.groupby('query_text').agg(
    impressions=('impressions', 'sum'),
    clicks=('clicks', 'sum'),
    avg_position=('avg_position', 'mean')
).reset_index()
query_agg['ctr'] = query_agg['clicks'] / query_agg['impressions']

# Vectorize query text
vectorizer = TfidfVectorizer(max_features=500, stop_words='english', ngram_range=(1, 2))
tfidf_matrix = vectorizer.fit_transform(query_agg['query_text'])

# Cluster queries into topic groups
n_clusters = 8
kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
query_agg['cluster'] = kmeans.fit_predict(tfidf_matrix)

# Profile each cluster
for cluster_id in range(n_clusters):
    cluster_queries = query_agg[query_agg['cluster'] == cluster_id]
    top_queries = cluster_queries.nlargest(5, 'impressions')['query_text'].tolist()
    avg_pos = cluster_queries['avg_position'].mean()
    total_clicks = cluster_queries['clicks'].sum()
    print(f"\nCluster {cluster_id} ({len(cluster_queries)} queries, {total_clicks:,} clicks, avg pos {avg_pos:.1f}):")
    for q in top_queries:
        print(f"  - {q}")

3. Traffic Forecasting with Prophet

from prophet import Prophet

# Prepare daily organic session counts
organic_daily = sessions_df[sessions_df['medium'] == 'organic'].copy()
organic_daily['session_date'] = pd.to_datetime(organic_daily['session_date'])
daily_traffic = organic_daily.groupby('session_date').size().reset_index()
daily_traffic.columns = ['ds', 'y']

# Fit Prophet model
model = Prophet(
    yearly_seasonality=True,
    weekly_seasonality=True,
    daily_seasonality=False,
    changepoint_prior_scale=0.05
)
model.fit(daily_traffic)

# Forecast next 90 days
future = model.make_future_dataframe(periods=90)
forecast = model.predict(future)

# Plot forecast
fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(daily_traffic['ds'], daily_traffic['y'], color='#2c3e50', alpha=0.6, label='Actual')
ax.plot(forecast['ds'], forecast['yhat'], color='#3498db', linewidth=2, label='Forecast')
ax.fill_between(forecast['ds'], forecast['yhat_lower'], forecast['yhat_upper'],
                alpha=0.15, color='#3498db', label='95% CI')
ax.axvline(x=daily_traffic['ds'].max(), color='red', linestyle='--', alpha=0.7, label='Forecast Start')
ax.set_xlabel('Date')
ax.set_ylabel('Daily Organic Sessions')
ax.set_title('Organic Traffic Forecast (90 Days)')
ax.legend()
plt.tight_layout()
plt.savefig('traffic_forecast.png', dpi=150)
plt.show()

# Print projected growth
last_30_actual = daily_traffic.tail(30)['y'].mean()
next_30_forecast = forecast.tail(90).head(30)['yhat'].mean()
growth_pct = (next_30_forecast - last_30_actual) / last_30_actual * 100
print(f"\nLast 30 days avg: {last_30_actual:,.0f} sessions/day")
print(f"Next 30 days forecast avg: {next_30_forecast:,.0f} sessions/day")
print(f"Projected growth: {growth_pct:+.1f}%")

4. Opportunity Score Matrix

# Score queries by opportunity: high impressions + position 4-20 = easy wins
query_agg['opportunity_score'] = (
    query_agg['impressions'] / query_agg['impressions'].max() * 0.4 +
    (1 - (query_agg['avg_position'] - 1) / 100).clip(0, 1) * 0.3 +
    (1 - query_agg['ctr']).clip(0, 1) * 0.3
)

# Filter to actionable queries (position 4-20)
opportunities = query_agg[
    (query_agg['avg_position'] >= 4) & (query_agg['avg_position'] <= 20)
].nlargest(20, 'opportunity_score')

fig, ax = plt.subplots(figsize=(12, 6))
scatter = ax.scatter(
    opportunities['avg_position'],
    opportunities['impressions'],
    s=opportunities['opportunity_score'] * 500,
    c=opportunities['ctr'],
    cmap='RdYlGn',
    alpha=0.7,
    edgecolors='#333'
)
ax.set_xlabel('Average Position')
ax.set_ylabel('Total Impressions')
ax.set_title('SEO Opportunity Matrix (size = opportunity score, color = CTR)')
ax.invert_xaxis()
plt.colorbar(scatter, label='CTR')

# Annotate top 5
for _, row in opportunities.head(5).iterrows():
    ax.annotate(row['query_text'][:25], (row['avg_position'], row['impressions']),
                textcoords='offset points', xytext=(10, 5), fontsize=8)

plt.tight_layout()
plt.savefig('seo_opportunity_matrix.png', dpi=150)
plt.show()

Excel Dashboard

1. Traffic Overview Tab — Create a summary table with rows for each traffic source (Organic, Paid, Social, Direct, Email, Referral) and columns for Sessions, Bounce Rate, Avg Pages/Session, Avg Duration, Conversions, Revenue, and Conv Rate. Use SUMIFS and AVERAGEIFS from the raw session data. Add a pie chart showing session share by source.

2. Keyword Tracker — Build a keyword tracking sheet with columns: Query, Current Position, Previous Position, Change, Impressions, Clicks, CTR, and Landing Page. Use conditional formatting with arrows to highlight position improvements (green) and declines (red). Add a sparkline column showing weekly position trend.

3. Landing Page Scorecard — Create a scorecard for the top 20 landing pages showing Organic Sessions, Bounce Rate, Avg Time on Page, Conversions, Revenue, and a calculated "Page Score" = weighted sum of normalized metrics. Rank pages by score and apply heatmap formatting across all metric columns.

4. Funnel Visualization — Build a funnel chart using stacked bar segments. Each row represents a funnel step (Page View through Purchase). Calculate step-to-step drop-off rates and display them alongside. Use data bars conditional formatting as an alternative if funnel charts are not available in your Excel version.

5. Traffic Trend Charts — Create a line chart showing weekly organic sessions over the past 12 months with a trendline and R-squared value displayed. Add a secondary line for non-organic traffic as a comparison. Include a forecast trendline extending 13 weeks. Below the chart, add a data table with month-over-month growth percentages.

6. Opportunity Prioritization — Build a prioritized action list from the keyword opportunity analysis. Columns: Query, Current Position, Target Position, Estimated Click Gain (formula: =Impressions * (Target_CTR - Current_CTR)), Effort Level (dropdown: Low/Medium/High), and Priority Score. Sort by Priority Score descending. Use slicers for Effort Level to help the team focus on quick wins.

Key Insights

Position 4-10 Opportunity

Queries ranking on positions 4-10 represent the highest ROI optimization targets. Moving from position 8 to position 3 can increase CTR by 5-10x, and these pages already have topical authority, requiring less effort than building new content.

Funnel Bottlenecks

The largest drop-off typically occurs between product view and add-to-cart. Optimizing product pages with better descriptions, images, reviews, and clearer CTAs can dramatically improve this step's conversion rate.

Content Type Performance

Blog posts may drive high traffic but low conversion, while product and comparison pages convert at higher rates. A balanced content strategy targets both informational queries (for top-of-funnel) and transactional queries (for bottom-of-funnel).

Seasonality in Forecasts

Prophet captures weekly and yearly seasonality patterns. Ignoring seasonality can lead to misinterpreting normal cyclical dips as traffic problems. Always account for holidays, weekends, and industry-specific seasonal patterns.

Knowledge Check

Question 1

A query has 50,000 impressions, 500 clicks, and an average position of 8.5. What is the CTR and what does the position suggest?

Question 2

What is bounce rate and why does it matter for SEO?

Question 3

In the keyword clustering code, what does TF-IDF do?

Question 4

Why does the Prophet model set changepoint_prior_scale=0.05?

Question 5

What does the FIRST_VALUE window function accomplish in the funnel drop-off SQL query?

← Previous Project Next Project →