Online Store Sales Dashboard
Build a comprehensive sales dashboard for an online retailer to analyze revenue trends, top-selling products, and regional performance using SQL, Python, and Excel.
Project Overview
Difficulty: Beginner
Tools: SQL, Python (pandas, matplotlib, seaborn), Excel
Estimated Time: 4-6 hours
You are a data analyst at ShopStream, an online retailer that sells electronics, clothing, and home goods across the United States. The VP of Sales has asked you to build a dashboard that answers the following business questions:
- How has monthly revenue trended over the past year?
- Which products generate the most revenue?
- Which regions are performing best and worst?
- What is the average customer lifetime value?
Your analysis will inform inventory decisions, marketing spend allocation, and regional expansion planning for the next fiscal year.
Dataset Description
The project uses three related tables in a relational database. Below are the schemas you will work with.
Table: orders
| Column | Data Type | Description |
|---|---|---|
| order_id | INT (PK) | Unique order identifier |
| customer_id | INT (FK) | References customers table |
| order_date | DATE | Date the order was placed |
| total_amount | DECIMAL(10,2) | Total order value in USD |
| status | VARCHAR(20) | completed, pending, returned |
| region | VARCHAR(50) | US region (Northeast, Southeast, etc.) |
Table: order_items
| Column | Data Type | Description |
|---|---|---|
| item_id | INT (PK) | Unique line item identifier |
| order_id | INT (FK) | References orders table |
| product_id | INT (FK) | References products table |
| quantity | INT | Number of units purchased |
| unit_price | DECIMAL(10,2) | Price per unit at time of sale |
Table: products
| Column | Data Type | Description |
|---|---|---|
| product_id | INT (PK) | Unique product identifier |
| product_name | VARCHAR(100) | Name of the product |
| category | VARCHAR(50) | Electronics, Clothing, Home Goods |
| cost_price | DECIMAL(10,2) | Wholesale cost per unit |
SQL Analysis
Use the following queries to extract insights from the database. Run each query and examine the results before moving to the Python visualization step.
Query 1: Monthly Revenue Trend
This query calculates total revenue by month for completed orders, giving you the data for a trend line chart.
SELECT
DATE_FORMAT(o.order_date, '%Y-%m') AS order_month,
COUNT(DISTINCT o.order_id) AS total_orders,
SUM(oi.quantity * oi.unit_price) AS monthly_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.status = 'completed'
GROUP BY DATE_FORMAT(o.order_date, '%Y-%m')
ORDER BY order_month;
Query 2: Top 10 Products by Revenue
Identifies the highest-grossing products so the merchandising team knows which items to promote and keep in stock.
SELECT
p.product_name,
p.category,
SUM(oi.quantity) AS units_sold,
SUM(oi.quantity * oi.unit_price) AS total_revenue,
SUM(oi.quantity * (oi.unit_price - p.cost_price)) AS gross_profit
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
JOIN orders o ON oi.order_id = o.order_id
WHERE o.status = 'completed'
GROUP BY p.product_id, p.product_name, p.category
ORDER BY total_revenue DESC
LIMIT 10;
Query 3: Sales by Region
Breaks down revenue and order count by US region to highlight geographic performance differences.
SELECT
o.region,
COUNT(DISTINCT o.order_id) AS order_count,
SUM(oi.quantity * oi.unit_price) AS region_revenue,
ROUND(AVG(o.total_amount), 2) AS avg_order_value
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.status = 'completed'
GROUP BY o.region
ORDER BY region_revenue DESC;
Query 4: Customer Lifetime Value (CLV)
Calculates total spend per customer over their entire purchase history, then summarizes the distribution.
WITH customer_spend AS (
SELECT
o.customer_id,
COUNT(DISTINCT o.order_id) AS total_orders,
SUM(o.total_amount) AS lifetime_value,
MIN(o.order_date) AS first_order,
MAX(o.order_date) AS last_order
FROM orders o
WHERE o.status = 'completed'
GROUP BY o.customer_id
)
SELECT
ROUND(AVG(lifetime_value), 2) AS avg_clv,
ROUND(MIN(lifetime_value), 2) AS min_clv,
ROUND(MAX(lifetime_value), 2) AS max_clv,
ROUND(AVG(total_orders), 1) AS avg_orders_per_customer,
COUNT(*) AS total_customers
FROM customer_spend;
Python Analysis
After extracting data with SQL, use Python to clean, transform, and visualize the results. The following code assumes you have exported your SQL results to CSV files or are connecting directly via a library like sqlalchemy.
Setup and Data Loading
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load data (adjust paths or use sqlalchemy for direct DB connection)
orders = pd.read_csv('orders.csv', parse_dates=['order_date'])
order_items = pd.read_csv('order_items.csv')
products = pd.read_csv('products.csv')
# Merge into a single analysis dataframe
df = order_items.merge(orders, on='order_id').merge(products, on='product_id')
df = df[df['status'] == 'completed']
df['line_total'] = df['quantity'] * df['unit_price']
df['profit'] = df['quantity'] * (df['unit_price'] - df['cost_price'])
print(f"Dataset: {len(df):,} line items across {df['order_id'].nunique():,} orders")
print(f"Date range: {df['order_date'].min().date()} to {df['order_date'].max().date()}")
Visualization 1: Monthly Revenue Trend (Line Chart)
monthly = df.groupby(df['order_date'].dt.to_period('M')).agg(
revenue=('line_total', 'sum'),
orders=('order_id', 'nunique')
).reset_index()
monthly['order_date'] = monthly['order_date'].dt.to_timestamp()
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(monthly['order_date'], monthly['revenue'], marker='o', color='#e07a2f', linewidth=2)
ax.fill_between(monthly['order_date'], monthly['revenue'], alpha=0.15, color='#e07a2f')
ax.set_title('Monthly Revenue Trend', fontsize=16, fontweight='bold')
ax.set_xlabel('Month')
ax.set_ylabel('Revenue ($)')
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x:,.0f}'))
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('monthly_revenue_trend.png', dpi=150)
plt.show()
Visualization 2: Top 10 Products (Horizontal Bar Chart)
top_products = (
df.groupby('product_name')['line_total']
.sum()
.nlargest(10)
.sort_values()
)
fig, ax = plt.subplots(figsize=(10, 6))
top_products.plot(kind='barh', color='#e07a2f', ax=ax)
ax.set_title('Top 10 Products by Revenue', fontsize=16, fontweight='bold')
ax.set_xlabel('Revenue ($)')
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'${x:,.0f}'))
plt.tight_layout()
plt.savefig('top_products.png', dpi=150)
plt.show()
Visualization 3: Revenue by Region (Heatmap)
region_category = df.pivot_table(
values='line_total',
index='region',
columns='category',
aggfunc='sum'
).fillna(0)
fig, ax = plt.subplots(figsize=(8, 5))
sns.heatmap(
region_category,
annot=True,
fmt=',.0f',
cmap='YlOrRd',
linewidths=0.5,
ax=ax
)
ax.set_title('Revenue by Region and Category', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.savefig('region_heatmap.png', dpi=150)
plt.show()
Excel Dashboard Instructions
Follow these steps to build an interactive sales dashboard in Excel using the exported data.
Step-by-Step Guide
Key Insights & Recommendations
After completing your analysis, you should be able to present findings like these to stakeholders:
Seasonal Revenue Peaks
Revenue spikes during November-December (holiday season) and dips in January-February. Recommend increasing ad spend 6 weeks before peak periods and pre-stocking high-demand items.
Top Product Concentration
The top 10 products account for roughly 35% of total revenue. This concentration risk means stockouts on these items have outsized impact. Set up automated reorder alerts for these SKUs.
Regional Performance Gap
The West and Northeast regions outperform the Southeast and Midwest by 40%. Consider targeted promotions, free shipping thresholds, or regional pricing adjustments for underperforming areas.
Customer Lifetime Value
The average CLV is $247, but the top 10% of customers average $1,200+. A loyalty program targeting mid-tier customers ($100-$300 CLV) could shift them into the high-value segment.
Category Mix Opportunity
Electronics drive the most revenue but Home Goods have the highest profit margin (42% vs. 28%). Cross-selling home goods to electronics buyers could improve overall profitability.
Average Order Value Trends
AOV has been declining 3% month-over-month. Implement bundle discounts and "frequently bought together" suggestions to encourage larger cart sizes.
Knowledge Check Quiz
Test your understanding of the concepts covered in this project.
Which SQL clause would you use to calculate monthly revenue from daily order data?
In the Python analysis, what does df.pivot_table() do when creating the region heatmap?
What is Customer Lifetime Value (CLV)?
In Excel, which feature lets users filter a PivotTable interactively by clicking buttons on the dashboard?
Why does the SQL query use a CTE (WITH clause) for the Customer Lifetime Value calculation?