Price Optimization Analysis
Analyze price elasticity of demand, build regression models, and determine optimal pricing strategies that maximize revenue or profit for an e-commerce retailer.
Project Overview
Difficulty: Advanced
Tools: SQL, Python (pandas, numpy, sklearn, scipy, matplotlib), Excel
Estimated Time: 8-10 hours
You are a pricing analyst at TechDirect, an online electronics retailer competing with Amazon, Best Buy, and Newegg. The CEO wants a data-driven pricing strategy to replace the current cost-plus approach. Your analysis will answer:
- How sensitive is demand to price changes for each product category? (Price Elasticity)
- What is the relationship between price and units sold? (Demand Curve Modeling)
- At what price point is revenue maximized? Is profit maximized at the same price?
- How do competitor prices affect our sales volume?
The key concept is Price Elasticity of Demand (PED): the percentage change in quantity demanded resulting from a 1% change in price. If PED = -1.5, a 10% price increase causes a 15% drop in demand.
Dataset Description
The project uses pricing history, sales data, and competitor price tracking.
Table: pricing_history
| Column | Data Type | Description |
|---|---|---|
| price_record_id | INT (PK) | Unique record identifier |
| product_id | INT (FK) | References products table |
| effective_date | DATE | Date the price took effect |
| our_price | DECIMAL(10,2) | Our selling price |
| cost_price | DECIMAL(10,2) | Our cost to purchase/produce |
| units_sold | INT | Units sold during this price period |
| days_at_price | INT | Number of days at this price point |
Table: products
| Column | Data Type | Description |
|---|---|---|
| product_id | INT (PK) | Unique product identifier |
| product_name | VARCHAR(100) | Product display name |
| category | VARCHAR(50) | Laptops, Headphones, Monitors, etc. |
| brand | VARCHAR(50) | Manufacturer brand |
| min_price | DECIMAL(10,2) | Floor price (MAP or margin floor) |
Table: competitor_prices
| Column | Data Type | Description |
|---|---|---|
| competitor_id | INT (PK) | Unique competitor price record |
| product_id | INT (FK) | References products table |
| competitor_name | VARCHAR(50) | Amazon, BestBuy, Newegg |
| competitor_price | DECIMAL(10,2) | Competitor's price on that date |
| scraped_date | DATE | Date the price was recorded |
SQL Analysis
Extract pricing and sales data at different price points to feed into the elasticity and regression models.
Query 1: Sales Performance at Different Price Points
For each product, shows how units sold per day vary at different price levels. This is the raw data for calculating price elasticity.
SELECT
p.product_name,
p.category,
ph.our_price,
ph.cost_price,
ph.units_sold,
ph.days_at_price,
ROUND(ph.units_sold * 1.0 / ph.days_at_price, 2) AS daily_units,
ROUND(ph.our_price - ph.cost_price, 2) AS margin_per_unit,
ROUND((ph.our_price - ph.cost_price) * ph.units_sold, 2) AS total_profit,
ROUND(ph.our_price * ph.units_sold, 2) AS total_revenue
FROM pricing_history ph
JOIN products p ON ph.product_id = p.product_id
ORDER BY p.product_name, ph.our_price;
Query 2: Price Elasticity Calculation (Point Elasticity)
Uses LAG window function to compare consecutive price periods and calculate the point elasticity between each price change.
WITH price_changes AS (
SELECT
p.product_name,
p.category,
ph.our_price,
ph.units_sold * 1.0 / ph.days_at_price AS daily_demand,
LAG(ph.our_price) OVER (
PARTITION BY ph.product_id ORDER BY ph.effective_date
) AS prev_price,
LAG(ph.units_sold * 1.0 / ph.days_at_price) OVER (
PARTITION BY ph.product_id ORDER BY ph.effective_date
) AS prev_demand
FROM pricing_history ph
JOIN products p ON ph.product_id = p.product_id
)
SELECT
product_name,
category,
prev_price,
our_price AS new_price,
ROUND((our_price - prev_price) / prev_price * 100, 2) AS price_change_pct,
ROUND(prev_demand, 2) AS prev_daily_demand,
ROUND(daily_demand, 2) AS new_daily_demand,
ROUND((daily_demand - prev_demand) / prev_demand * 100, 2) AS demand_change_pct,
ROUND(
((daily_demand - prev_demand) / prev_demand) /
((our_price - prev_price) / prev_price),
2
) AS point_elasticity
FROM price_changes
WHERE prev_price IS NOT NULL
AND prev_price != our_price
ORDER BY product_name, new_price;
Query 3: Competitive Price Positioning
Compares our prices with competitor prices to understand the price gap and its impact on our sales.
SELECT
p.product_name,
p.category,
ph.our_price,
cp.competitor_name,
cp.competitor_price,
ROUND(ph.our_price - cp.competitor_price, 2) AS price_diff,
ROUND((ph.our_price - cp.competitor_price) / cp.competitor_price * 100, 1) AS pct_diff,
ROUND(ph.units_sold * 1.0 / ph.days_at_price, 2) AS our_daily_units
FROM pricing_history ph
JOIN products p ON ph.product_id = p.product_id
JOIN competitor_prices cp ON ph.product_id = cp.product_id
AND cp.scraped_date BETWEEN ph.effective_date
AND DATE_ADD(ph.effective_date, INTERVAL ph.days_at_price DAY)
ORDER BY p.product_name, cp.competitor_name, ph.effective_date;
Query 4: Revenue-Maximizing Price Point by Category
For each category, identifies the price band that generates the highest daily revenue.
WITH daily_metrics AS (
SELECT
p.category,
CASE
WHEN ph.our_price < 50 THEN 'Under $50'
WHEN ph.our_price < 100 THEN '$50-$99'
WHEN ph.our_price < 200 THEN '$100-$199'
WHEN ph.our_price < 500 THEN '$200-$499'
ELSE '$500+'
END AS price_band,
ph.our_price * (ph.units_sold * 1.0 / ph.days_at_price) AS daily_revenue,
(ph.our_price - ph.cost_price) * (ph.units_sold * 1.0 / ph.days_at_price) AS daily_profit
FROM pricing_history ph
JOIN products p ON ph.product_id = p.product_id
)
SELECT
category,
price_band,
ROUND(AVG(daily_revenue), 2) AS avg_daily_revenue,
ROUND(AVG(daily_profit), 2) AS avg_daily_profit,
COUNT(*) AS data_points
FROM daily_metrics
GROUP BY category, price_band
ORDER BY category, avg_daily_revenue DESC;
Python Analysis
Use Python to build demand curves, calculate elasticity, and find optimal prices using regression analysis.
Data Loading and Exploration
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from scipy.optimize import minimize_scalar
# Load data
pricing = pd.read_csv('pricing_history.csv', parse_dates=['effective_date'])
products = pd.read_csv('products.csv')
competitors = pd.read_csv('competitor_prices.csv', parse_dates=['scraped_date'])
# Merge and calculate daily demand
df = pricing.merge(products, on='product_id')
df['daily_demand'] = df['units_sold'] / df['days_at_price']
df['daily_revenue'] = df['our_price'] * df['daily_demand']
df['margin'] = df['our_price'] - df['cost_price']
df['daily_profit'] = df['margin'] * df['daily_demand']
print(f"Products analyzed: {df['product_id'].nunique()}")
print(f"Price points per product: {df.groupby('product_id').size().mean():.1f} avg")
print(f"\nPrice range: ${df['our_price'].min():.2f} - ${df['our_price'].max():.2f}")
print(f"Daily demand range: {df['daily_demand'].min():.1f} - {df['daily_demand'].max():.1f}")
Price Elasticity Calculation
def calculate_elasticity(group):
"""Calculate arc elasticity between consecutive price points."""
group = group.sort_values('effective_date')
results = []
for i in range(1, len(group)):
p1, p2 = group.iloc[i-1]['our_price'], group.iloc[i]['our_price']
q1, q2 = group.iloc[i-1]['daily_demand'], group.iloc[i]['daily_demand']
if p1 == p2 or q1 == 0:
continue
# Arc (midpoint) elasticity: more stable than point elasticity
pct_change_q = (q2 - q1) / ((q1 + q2) / 2)
pct_change_p = (p2 - p1) / ((p1 + p2) / 2)
elasticity = pct_change_q / pct_change_p
results.append({
'product_id': group.iloc[i]['product_id'],
'product_name': group.iloc[i]['product_name'],
'category': group.iloc[i]['category'],
'price_from': p1,
'price_to': p2,
'demand_from': round(q1, 2),
'demand_to': round(q2, 2),
'elasticity': round(elasticity, 3)
})
return pd.DataFrame(results)
elasticity_df = df.groupby('product_id').apply(calculate_elasticity).reset_index(drop=True)
# Summary by category
cat_elasticity = elasticity_df.groupby('category')['elasticity'].agg(['mean', 'median', 'std']).round(3)
cat_elasticity.columns = ['mean_elasticity', 'median_elasticity', 'std_elasticity']
print("Price Elasticity by Category:")
print(cat_elasticity.to_string())
print("\nInterpretation: Elasticity < -1 = elastic (demand sensitive to price)")
print(" Elasticity > -1 = inelastic (demand less sensitive)")
Demand Curve Regression and Optimal Price
def find_optimal_price(product_data, product_name):
"""
Fit a demand curve and find the price that maximizes revenue or profit.
Uses polynomial regression: demand = a*price^2 + b*price + c
"""
X = product_data['our_price'].values.reshape(-1, 1)
y = product_data['daily_demand'].values
# Fit quadratic demand curve
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
model = LinearRegression()
model.fit(X_poly, y)
r_squared = model.score(X_poly, y)
coefs = model.coef_
intercept = model.intercept_
# Revenue = price * demand(price)
# Profit = (price - cost) * demand(price)
avg_cost = product_data['cost_price'].mean()
def neg_revenue(price):
pred_demand = model.predict(poly.transform([[price]]))[0]
return -(price * max(pred_demand, 0))
def neg_profit(price):
pred_demand = model.predict(poly.transform([[price]]))[0]
return -((price - avg_cost) * max(pred_demand, 0))
price_min = product_data['our_price'].min() * 0.8
price_max = product_data['our_price'].max() * 1.2
opt_revenue = minimize_scalar(neg_revenue, bounds=(price_min, price_max), method='bounded')
opt_profit = minimize_scalar(neg_profit, bounds=(price_min, price_max), method='bounded')
return {
'product_name': product_name,
'r_squared': round(r_squared, 4),
'cost_price': round(avg_cost, 2),
'revenue_optimal_price': round(opt_revenue.x, 2),
'max_daily_revenue': round(-opt_revenue.fun, 2),
'profit_optimal_price': round(opt_profit.x, 2),
'max_daily_profit': round(-opt_profit.fun, 2),
'model': model,
'poly': poly
}
# Run for each product with enough data points
results = []
for pid, group in df.groupby('product_id'):
if len(group) >= 4: # Need at least 4 data points for quadratic fit
name = group.iloc[0]['product_name']
result = find_optimal_price(group, name)
results.append(result)
results_df = pd.DataFrame(results)
print("\nOptimal Pricing Results:")
print(results_df[['product_name', 'cost_price', 'revenue_optimal_price',
'profit_optimal_price', 'r_squared']].to_string(index=False))
Visualizations
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Select a sample product for detailed visualization
sample = df[df['product_id'] == df['product_id'].value_counts().index[0]].copy()
sample_name = sample.iloc[0]['product_name']
sample_result = [r for r in results if r['product_name'] == sample_name][0]
model = sample_result['model']
poly = sample_result['poly']
# Plot 1: Demand Curve with Regression
ax1 = axes[0, 0]
price_range = np.linspace(sample['our_price'].min() * 0.8,
sample['our_price'].max() * 1.2, 100)
pred_demand = model.predict(poly.transform(price_range.reshape(-1, 1)))
ax1.scatter(sample['our_price'], sample['daily_demand'],
color='#e07a2f', s=80, zorder=5, label='Observed')
ax1.plot(price_range, pred_demand, color='#4a90d9', linewidth=2, label='Demand Curve')
ax1.axvline(x=sample_result['revenue_optimal_price'], color='green',
linestyle='--', label=f"Rev-Optimal: ${sample_result['revenue_optimal_price']}")
ax1.axvline(x=sample_result['profit_optimal_price'], color='red',
linestyle='--', label=f"Profit-Optimal: ${sample_result['profit_optimal_price']}")
ax1.set_xlabel('Price ($)')
ax1.set_ylabel('Daily Units Sold')
ax1.set_title(f'Demand Curve: {sample_name[:30]}', fontweight='bold')
ax1.legend(fontsize=8)
# Plot 2: Revenue and Profit Curves
ax2 = axes[0, 1]
pred_demand_clipped = np.maximum(pred_demand, 0)
revenue_curve = price_range * pred_demand_clipped
avg_cost = sample['cost_price'].mean()
profit_curve = (price_range - avg_cost) * pred_demand_clipped
ax2.plot(price_range, revenue_curve, color='#4a90d9', linewidth=2, label='Revenue')
ax2.plot(price_range, profit_curve, color='#e07a2f', linewidth=2, label='Profit')
ax2.axvline(x=sample_result['revenue_optimal_price'], color='#4a90d9',
linestyle='--', alpha=0.5)
ax2.axvline(x=sample_result['profit_optimal_price'], color='#e07a2f',
linestyle='--', alpha=0.5)
ax2.set_xlabel('Price ($)')
ax2.set_ylabel('Daily $ Amount')
ax2.set_title('Revenue vs Profit by Price Point', fontweight='bold')
ax2.legend()
# Plot 3: Elasticity Distribution by Category
ax3 = axes[1, 0]
categories = elasticity_df['category'].unique()
elast_data = [elasticity_df[elasticity_df['category'] == c]['elasticity'].dropna()
for c in categories]
bp = ax3.boxplot(elast_data, labels=categories, patch_artist=True)
colors = ['#e07a2f', '#4a90d9', '#50c878', '#e05050', '#9b59b6']
for patch, color in zip(bp['boxes'], colors[:len(categories)]):
patch.set_facecolor(color)
patch.set_alpha(0.6)
ax3.axhline(y=-1, color='red', linestyle='--', alpha=0.5, label='Unit Elastic (-1)')
ax3.set_ylabel('Price Elasticity')
ax3.set_title('Elasticity Distribution by Category', fontweight='bold')
ax3.legend(fontsize=9)
plt.setp(ax3.xaxis.get_majorticklabels(), rotation=30, ha='right')
# Plot 4: Our Price vs Competitor Prices
ax4 = axes[1, 1]
comp_avg = competitors.groupby(['product_id', 'competitor_name'])['competitor_price'].mean().reset_index()
comp_wide = comp_avg.pivot(index='product_id', columns='competitor_name', values='competitor_price')
our_avg = df.groupby('product_id')['our_price'].mean()
comp_wide['TechDirect'] = our_avg
comp_wide = comp_wide.dropna().head(10)
comp_wide.plot(kind='bar', ax=ax4, width=0.8)
ax4.set_ylabel('Average Price ($)')
ax4.set_title('Price Comparison vs Competitors', fontweight='bold')
ax4.set_xlabel('Product ID')
ax4.legend(fontsize=8)
plt.setp(ax4.xaxis.get_majorticklabels(), rotation=0)
plt.suptitle('Price Optimization Analysis', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.savefig('price_optimization.png', dpi=150)
plt.show()
Pricing Recommendations Report
# Generate final pricing recommendations
recommendations = results_df.copy()
recommendations['current_avg_price'] = [
df[df['product_name'] == name]['our_price'].mean()
for name in recommendations['product_name']
]
recommendations['revenue_change_pct'] = (
(recommendations['revenue_optimal_price'] - recommendations['current_avg_price'])
/ recommendations['current_avg_price'] * 100
).round(1)
recommendations['action'] = np.where(
abs(recommendations['revenue_change_pct']) < 3, 'Keep Current',
np.where(recommendations['revenue_change_pct'] > 0, 'Increase Price', 'Decrease Price')
)
print("Pricing Recommendations:")
print(recommendations[['product_name', 'current_avg_price', 'profit_optimal_price',
'revenue_change_pct', 'action']].to_string(index=False))
recommendations.to_csv('pricing_recommendations.csv', index=False)
Excel Dashboard Instructions
Build a pricing analysis workbook that enables the pricing team to explore scenarios and make decisions.
Step-by-Step Guide
Key Insights & Recommendations
Present these findings to the executive team with clear revenue impact estimates.
Elastic vs. Inelastic Categories
Commodity products (cables, adapters) tend to be highly elastic (PED < -2), meaning price cuts boost volume significantly. Premium/branded products (laptops, headphones) are less elastic, allowing for higher margins without losing much volume.
Revenue vs. Profit Tradeoff
The revenue-maximizing price is almost always lower than the profit-maximizing price. For market-share goals, price closer to revenue-optimal. For margin goals, price closer to profit-optimal. The gap between these prices is your strategic decision space.
Competitive Positioning
Being 5-10% above the lowest competitor reduces volume by only 8-12% for most products, but maintains healthy margins. Being the cheapest is rarely the most profitable strategy unless you have a cost advantage.
Dynamic Pricing Opportunity
Products with high elasticity are ideal for dynamic pricing: lower prices during slow periods to stimulate demand, raise prices during peak demand (holiday season) when customers are less price-sensitive.
Bundle Pricing
For products with inelastic demand, bundling with elastic products creates value perception. Example: pair a premium laptop (inelastic) with a discounted accessory kit (elastic). The bundle increases total revenue per customer.
A/B Testing Next Steps
The regression model provides directional guidance, but real-world validation requires A/B price testing. Start with the 3 products showing the largest gap between current and optimal prices. Run each test for at least 2 weeks with a 50/50 traffic split.
Knowledge Check Quiz
Test your understanding of price optimization and demand elasticity.
A product has a price elasticity of demand (PED) of -2.0. If you increase the price by 10%, what happens to quantity demanded?
Why does the analysis use a quadratic (polynomial degree 2) regression for the demand curve instead of a simple linear model?
What does the SQL LAG() window function do in the price elasticity query?
A product is "inelastic" (PED between -1 and 0). What pricing strategy maximizes revenue?
Why might the revenue-maximizing price and profit-maximizing price be different for the same product?