Project 1

Online Store Sales Dashboard

Build a comprehensive sales dashboard for an online retailer to analyze revenue trends, top-selling products, and regional performance using SQL, Python, and Excel.

Project Overview

Difficulty: Beginner

Tools: SQL, Python (pandas, matplotlib, seaborn), Excel

Estimated Time: 4-6 hours

You are a data analyst at ShopStream, an online retailer that sells electronics, clothing, and home goods across the United States. The VP of Sales has asked you to build a dashboard that answers the following business questions:

Your analysis will inform inventory decisions, marketing spend allocation, and regional expansion planning for the next fiscal year.

Dataset Description

The project uses three related tables in a relational database. Below are the schemas you will work with.

Table: orders

Column Data Type Description
order_idINT (PK)Unique order identifier
customer_idINT (FK)References customers table
order_dateDATEDate the order was placed
total_amountDECIMAL(10,2)Total order value in USD
statusVARCHAR(20)completed, pending, returned
regionVARCHAR(50)US region (Northeast, Southeast, etc.)

Table: order_items

Column Data Type Description
item_idINT (PK)Unique line item identifier
order_idINT (FK)References orders table
product_idINT (FK)References products table
quantityINTNumber of units purchased
unit_priceDECIMAL(10,2)Price per unit at time of sale

Table: products

Column Data Type Description
product_idINT (PK)Unique product identifier
product_nameVARCHAR(100)Name of the product
categoryVARCHAR(50)Electronics, Clothing, Home Goods
cost_priceDECIMAL(10,2)Wholesale cost per unit

SQL Analysis

Use the following queries to extract insights from the database. Run each query and examine the results before moving to the Python visualization step.

Query 1: Monthly Revenue Trend

This query calculates total revenue by month for completed orders, giving you the data for a trend line chart.

SELECT
    DATE_FORMAT(o.order_date, '%Y-%m') AS order_month,
    COUNT(DISTINCT o.order_id)         AS total_orders,
    SUM(oi.quantity * oi.unit_price)   AS monthly_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.status = 'completed'
GROUP BY DATE_FORMAT(o.order_date, '%Y-%m')
ORDER BY order_month;

Query 2: Top 10 Products by Revenue

Identifies the highest-grossing products so the merchandising team knows which items to promote and keep in stock.

SELECT
    p.product_name,
    p.category,
    SUM(oi.quantity)                   AS units_sold,
    SUM(oi.quantity * oi.unit_price)   AS total_revenue,
    SUM(oi.quantity * (oi.unit_price - p.cost_price)) AS gross_profit
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
JOIN orders o  ON oi.order_id   = o.order_id
WHERE o.status = 'completed'
GROUP BY p.product_id, p.product_name, p.category
ORDER BY total_revenue DESC
LIMIT 10;

Query 3: Sales by Region

Breaks down revenue and order count by US region to highlight geographic performance differences.

SELECT
    o.region,
    COUNT(DISTINCT o.order_id)         AS order_count,
    SUM(oi.quantity * oi.unit_price)   AS region_revenue,
    ROUND(AVG(o.total_amount), 2)      AS avg_order_value
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.status = 'completed'
GROUP BY o.region
ORDER BY region_revenue DESC;

Query 4: Customer Lifetime Value (CLV)

Calculates total spend per customer over their entire purchase history, then summarizes the distribution.

WITH customer_spend AS (
    SELECT
        o.customer_id,
        COUNT(DISTINCT o.order_id) AS total_orders,
        SUM(o.total_amount)        AS lifetime_value,
        MIN(o.order_date)          AS first_order,
        MAX(o.order_date)          AS last_order
    FROM orders o
    WHERE o.status = 'completed'
    GROUP BY o.customer_id
)
SELECT
    ROUND(AVG(lifetime_value), 2)  AS avg_clv,
    ROUND(MIN(lifetime_value), 2)  AS min_clv,
    ROUND(MAX(lifetime_value), 2)  AS max_clv,
    ROUND(AVG(total_orders), 1)    AS avg_orders_per_customer,
    COUNT(*)                       AS total_customers
FROM customer_spend;

Python Analysis

After extracting data with SQL, use Python to clean, transform, and visualize the results. The following code assumes you have exported your SQL results to CSV files or are connecting directly via a library like sqlalchemy.

Setup and Data Loading

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load data (adjust paths or use sqlalchemy for direct DB connection)
orders = pd.read_csv('orders.csv', parse_dates=['order_date'])
order_items = pd.read_csv('order_items.csv')
products = pd.read_csv('products.csv')

# Merge into a single analysis dataframe
df = order_items.merge(orders, on='order_id').merge(products, on='product_id')
df = df[df['status'] == 'completed']
df['line_total'] = df['quantity'] * df['unit_price']
df['profit'] = df['quantity'] * (df['unit_price'] - df['cost_price'])

print(f"Dataset: {len(df):,} line items across {df['order_id'].nunique():,} orders")
print(f"Date range: {df['order_date'].min().date()} to {df['order_date'].max().date()}")

Visualization 1: Monthly Revenue Trend (Line Chart)

monthly = df.groupby(df['order_date'].dt.to_period('M')).agg(
    revenue=('line_total', 'sum'),
    orders=('order_id', 'nunique')
).reset_index()
monthly['order_date'] = monthly['order_date'].dt.to_timestamp()

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(monthly['order_date'], monthly['revenue'], marker='o', color='#e07a2f', linewidth=2)
ax.fill_between(monthly['order_date'], monthly['revenue'], alpha=0.15, color='#e07a2f')
ax.set_title('Monthly Revenue Trend', fontsize=16, fontweight='bold')
ax.set_xlabel('Month')
ax.set_ylabel('Revenue ($)')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x:,.0f}'))
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('monthly_revenue_trend.png', dpi=150)
plt.show()

Visualization 2: Top 10 Products (Horizontal Bar Chart)

top_products = (
    df.groupby('product_name')['line_total']
    .sum()
    .nlargest(10)
    .sort_values()
)

fig, ax = plt.subplots(figsize=(10, 6))
top_products.plot(kind='barh', color='#e07a2f', ax=ax)
ax.set_title('Top 10 Products by Revenue', fontsize=16, fontweight='bold')
ax.set_xlabel('Revenue ($)')
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x:,.0f}'))
plt.tight_layout()
plt.savefig('top_products.png', dpi=150)
plt.show()

Visualization 3: Revenue by Region (Heatmap)

region_category = df.pivot_table(
    values='line_total',
    index='region',
    columns='category',
    aggfunc='sum'
).fillna(0)

fig, ax = plt.subplots(figsize=(8, 5))
sns.heatmap(
    region_category,
    annot=True,
    fmt=',.0f',
    cmap='YlOrRd',
    linewidths=0.5,
    ax=ax
)
ax.set_title('Revenue by Region and Category', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.savefig('region_heatmap.png', dpi=150)
plt.show()

Excel Dashboard Instructions

Follow these steps to build an interactive sales dashboard in Excel using the exported data.

Step-by-Step Guide

  • Import Data: Open Excel and go to Data > Get Data > From Text/CSV. Import the orders, order_items, and products CSV files into separate sheets. Use Power Query to merge the three tables on their key columns (order_id and product_id).
  • Create a Date Table: Add a helper column with =TEXT(order_date,"YYYY-MM") to group orders by month. Create another column with =TEXT(order_date,"MMMM") for month names.
  • Build Pivot Table - Monthly Revenue: Select the merged data, Insert > PivotTable. Drag order_month to Rows and line_total (quantity * unit_price) to Values (Sum). This gives you the monthly revenue breakdown.
  • Build Pivot Table - Top Products: Create a second PivotTable. Drag product_name to Rows, line_total to Values (Sum). Right-click the Values field > Sort Largest to Smallest. Use Value Filters to show Top 10.
  • Build Pivot Table - Regional Analysis: Create a third PivotTable with region in Rows, category in Columns, and line_total in Values. This creates a cross-tabulation of region vs. product category.
  • Add Conditional Formatting: Select the regional pivot table, go to Home > Conditional Formatting > Color Scales. Choose a Red-Yellow-Green scale so high-revenue cells stand out immediately.
  • Create Charts: From the monthly pivot, insert a Line Chart for revenue trend. From the top products pivot, insert a Bar Chart. From the regional pivot, insert a Stacked Bar Chart showing category breakdown per region.
  • Build the Dashboard Sheet: Create a new sheet called "Dashboard". Copy your charts here and arrange them in a 2x2 grid. Add a title row with the company name and date range. Add slicers for region and category to make the dashboard interactive.
  • Add KPI Cards: At the top of the dashboard, create cells showing Total Revenue (=SUM), Total Orders (=COUNTA), Average Order Value (=AVERAGE), and Total Customers (=COUNTUNIQUE or approximate with a helper column). Format these as large bold numbers with labels above.
  • Final Formatting: Remove gridlines (View > uncheck Gridlines). Set a consistent color scheme. Group the dashboard elements and protect the sheet to prevent accidental edits.
  • Key Insights & Recommendations

    After completing your analysis, you should be able to present findings like these to stakeholders:

    Seasonal Revenue Peaks

    Revenue spikes during November-December (holiday season) and dips in January-February. Recommend increasing ad spend 6 weeks before peak periods and pre-stocking high-demand items.

    Top Product Concentration

    The top 10 products account for roughly 35% of total revenue. This concentration risk means stockouts on these items have outsized impact. Set up automated reorder alerts for these SKUs.

    Regional Performance Gap

    The West and Northeast regions outperform the Southeast and Midwest by 40%. Consider targeted promotions, free shipping thresholds, or regional pricing adjustments for underperforming areas.

    Customer Lifetime Value

    The average CLV is $247, but the top 10% of customers average $1,200+. A loyalty program targeting mid-tier customers ($100-$300 CLV) could shift them into the high-value segment.

    Category Mix Opportunity

    Electronics drive the most revenue but Home Goods have the highest profit margin (42% vs. 28%). Cross-selling home goods to electronics buyers could improve overall profitability.

    Average Order Value Trends

    AOV has been declining 3% month-over-month. Implement bundle discounts and "frequently bought together" suggestions to encourage larger cart sizes.

    Knowledge Check Quiz

    Test your understanding of the concepts covered in this project.

    Question 1

    Which SQL clause would you use to calculate monthly revenue from daily order data?

    Question 2

    In the Python analysis, what does df.pivot_table() do when creating the region heatmap?

    Question 3

    What is Customer Lifetime Value (CLV)?

    Question 4

    In Excel, which feature lets users filter a PivotTable interactively by clicking buttons on the dashboard?

    Question 5

    Why does the SQL query use a CTE (WITH clause) for the Customer Lifetime Value calculation?

    ← Previous Project Next Project →