Stock Portfolio Tracker
An individual investor wants to track and analyze their stock portfolio performance across multiple holdings and time periods. You will build a complete analytics pipeline using SQL for data extraction, Python for advanced financial calculations, and Excel for dashboard visualization.
Project Overview
Business Scenario
Sarah is a retail investor who has built a diversified stock portfolio over the past three years. She holds positions across technology, healthcare, finance, and energy sectors. She needs a systematic way to track her portfolio's total value over time, understand her sector allocation, and measure performance against the S&P 500 benchmark. Currently she uses scattered spreadsheets and brokerage statements, making it difficult to get a unified view.
- Goal: Build a portfolio analytics system that tracks value, measures risk-adjusted returns, and optimizes allocation
- Tools: SQL (data extraction), Python (financial analysis), Excel (dashboard)
- Stakeholder: Individual investor wanting to make data-driven investment decisions
- Deliverables: Portfolio value timeline, sector breakdown, Sharpe ratio report, optimization recommendations
Dataset
Database Schema
The portfolio database consists of four related tables capturing stock prices, holdings, transactions, and benchmark data.
stock_prices
| Column | Type | Description |
|---|---|---|
| price_id | INT (PK) | Unique price record identifier |
| ticker | VARCHAR(10) | Stock ticker symbol (e.g., AAPL, MSFT) |
| trade_date | DATE | Trading date |
| open_price | DECIMAL(10,2) | Opening price |
| close_price | DECIMAL(10,2) | Closing price |
| volume | BIGINT | Daily trading volume |
portfolio_holdings
| Column | Type | Description |
|---|---|---|
| holding_id | INT (PK) | Unique holding identifier |
| ticker | VARCHAR(10) | Stock ticker symbol |
| shares | DECIMAL(10,4) | Number of shares held |
| avg_cost_basis | DECIMAL(10,2) | Average purchase price per share |
| sector | VARCHAR(50) | Industry sector (Technology, Healthcare, etc.) |
transactions
| Column | Type | Description |
|---|---|---|
| txn_id | INT (PK) | Transaction identifier |
| ticker | VARCHAR(10) | Stock ticker symbol |
| txn_type | VARCHAR(4) | BUY or SELL |
| txn_date | DATE | Transaction date |
| shares | DECIMAL(10,4) | Number of shares traded |
| price_per_share | DECIMAL(10,2) | Execution price per share |
benchmarks
| Column | Type | Description |
|---|---|---|
| bench_id | INT (PK) | Benchmark record identifier |
| bench_name | VARCHAR(20) | Benchmark name (e.g., SP500) |
| trade_date | DATE | Date |
| close_value | DECIMAL(12,2) | Benchmark closing value |
SQL Analysis
Query 1: Portfolio Value Over Time
Calculate the total portfolio market value for each trading day by joining holdings with daily closing prices.
SELECT
sp.trade_date,
SUM(ph.shares * sp.close_price) AS portfolio_value
FROM portfolio_holdings ph
JOIN stock_prices sp
ON ph.ticker = sp.ticker
WHERE sp.trade_date >= '2023-01-01'
GROUP BY sp.trade_date
ORDER BY sp.trade_date;
Query 2: Sector Allocation
Break down the portfolio by sector to see how capital is distributed. This uses the most recent closing price for each stock.
WITH latest_prices AS (
SELECT ticker,
close_price,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY trade_date DESC) AS rn
FROM stock_prices
)
SELECT
ph.sector,
COUNT(DISTINCT ph.ticker) AS num_stocks,
SUM(ph.shares * lp.close_price) AS sector_value,
ROUND(SUM(ph.shares * lp.close_price) * 100.0
/ SUM(SUM(ph.shares * lp.close_price)) OVER (), 2) AS pct_of_portfolio
FROM portfolio_holdings ph
JOIN latest_prices lp
ON ph.ticker = lp.ticker AND lp.rn = 1
GROUP BY ph.sector
ORDER BY sector_value DESC;
Query 3: Realized vs Unrealized Gains
Separate closed-position profits (realized) from open-position paper gains (unrealized).
-- Realized gains from completed SELL transactions
SELECT
t.ticker,
SUM(t.shares * (t.price_per_share - ph.avg_cost_basis)) AS realized_gain
FROM transactions t
JOIN portfolio_holdings ph ON t.ticker = ph.ticker
WHERE t.txn_type = 'SELL'
GROUP BY t.ticker
UNION ALL
-- Unrealized gains on current holdings
SELECT
ph.ticker,
ph.shares * (lp.close_price - ph.avg_cost_basis) AS unrealized_gain
FROM portfolio_holdings ph
JOIN (
SELECT ticker, close_price,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY trade_date DESC) AS rn
FROM stock_prices
) lp ON ph.ticker = lp.ticker AND lp.rn = 1
ORDER BY ticker;
Python Analysis
Sharpe Ratio Calculation
The Sharpe ratio measures risk-adjusted return. A ratio above 1.0 is generally considered good; above 2.0 is very good.
import pandas as pd
import numpy as np
# Load portfolio daily values from SQL output
portfolio = pd.read_csv('portfolio_daily_values.csv', parse_dates=['trade_date'])
portfolio = portfolio.sort_values('trade_date')
# Calculate daily returns
portfolio['daily_return'] = portfolio['portfolio_value'].pct_change()
# Annualized Sharpe Ratio (assuming 252 trading days, risk-free rate = 4.5%)
risk_free_rate = 0.045
daily_rf = risk_free_rate / 252
excess_returns = portfolio['daily_return'] - daily_rf
sharpe_ratio = (excess_returns.mean() / excess_returns.std()) * np.sqrt(252)
print(f"Annualized Sharpe Ratio: {sharpe_ratio:.2f}")
Cumulative Returns Chart
Visualize portfolio performance against the S&P 500 benchmark over the analysis period.
import matplotlib.pyplot as plt
benchmark = pd.read_csv('benchmark_daily.csv', parse_dates=['trade_date'])
benchmark = benchmark.sort_values('trade_date')
# Normalize both to start at 100
portfolio['cum_return'] = (1 + portfolio['daily_return']).cumprod()
benchmark['daily_return'] = benchmark['close_value'].pct_change()
benchmark['cum_return'] = (1 + benchmark['daily_return']).cumprod()
plt.figure(figsize=(12, 6))
plt.plot(portfolio['trade_date'], portfolio['cum_return'], label='My Portfolio', linewidth=2)
plt.plot(benchmark['trade_date'], benchmark['cum_return'], label='S&P 500', linewidth=2, linestyle='--')
plt.title('Portfolio vs S&P 500 - Cumulative Returns')
plt.xlabel('Date')
plt.ylabel('Growth of $1')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('cumulative_returns.png', dpi=150)
plt.show()
Correlation Matrix
Understand how your holdings move relative to each other. Low correlation between stocks means better diversification.
import seaborn as sns
# Pivot stock prices into a wide table of daily returns
prices = pd.read_csv('stock_prices.csv', parse_dates=['trade_date'])
pivot = prices.pivot(index='trade_date', columns='ticker', values='close_price')
returns = pivot.pct_change().dropna()
corr_matrix = returns.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, fmt='.2f', cmap='RdYlGn_r',
center=0, vmin=-1, vmax=1, square=True)
plt.title('Stock Correlation Matrix')
plt.tight_layout()
plt.savefig('correlation_matrix.png', dpi=150)
plt.show()
Mean-Variance Portfolio Optimization
Use Modern Portfolio Theory to find the allocation that maximizes the Sharpe ratio for a given set of stocks.
from scipy.optimize import minimize
mean_returns = returns.mean() * 252
cov_matrix = returns.cov() * 252
num_assets = len(returns.columns)
def neg_sharpe(weights):
port_return = np.dot(weights, mean_returns)
port_vol = np.sqrt(np.dot(weights.T, np.dot(cov_matrix, weights)))
return -(port_return - risk_free_rate) / port_vol
constraints = {'type': 'eq', 'fun': lambda w: np.sum(w) - 1}
bounds = tuple((0.0, 0.4) for _ in range(num_assets))
initial = np.array([1 / num_assets] * num_assets)
result = minimize(neg_sharpe, initial, method='SLSQP',
bounds=bounds, constraints=constraints)
optimal_weights = result.x
for ticker, weight in zip(returns.columns, optimal_weights):
print(f"{ticker}: {weight:.1%}")
Excel Analysis
=SUMPRODUCT(shares, latest_price) for current value and =(current - cost) / cost for return.=(B3-B2)/B2. Then add columns for cumulative return using =PRODUCT(1+C$2:C3)-1. Calculate annualized return with =(1+total_return)^(252/trading_days)-1.Key Insights
Diversification Matters
The correlation matrix reveals which holdings move together. A well-diversified portfolio should have low average pairwise correlation (below 0.5), reducing overall volatility without sacrificing returns.
Risk-Adjusted Returns
Raw returns can be misleading. The Sharpe ratio normalizes returns by volatility so you can compare a high-flying tech portfolio to a steady dividend portfolio on equal footing.
Sector Concentration Risk
If one sector exceeds 40% of portfolio value, a sector-wide downturn could have outsized impact. The sector allocation query helps identify this imbalance before it becomes a problem.
Tax-Aware Decisions
Separating realized from unrealized gains helps with tax planning. You can harvest losses to offset gains, and timing sales to hit long-term capital gains thresholds saves real money.
Quiz
What does the Sharpe ratio measure?
In the sector allocation SQL query, what does the window function SUM() OVER() calculate?
Why is low correlation between portfolio holdings desirable?
What is the difference between realized and unrealized gains?
In the portfolio optimization code, what does the constraint np.sum(w) - 1 == 0 enforce?