Hospital Patient Flow Analysis
Optimize patient flow, reduce wait times, and improve bed utilization for a busy metropolitan hospital.
Project Overview
Scenario
You have been hired as a data analyst at Metro General Hospital, a 500-bed facility serving over 80,000 patients annually. The hospital administrator has raised concerns about increasing patient wait times, suboptimal bed utilization, and rising readmission rates. Emergency department overcrowding during peak hours has led to patient diversions to nearby hospitals.
Your task is to analyze patient flow data across departments, identify bottlenecks, and provide actionable recommendations to improve operational efficiency.
Objectives
- Calculate average length of stay by department and identify outliers
- Determine bed occupancy rates and predict capacity shortfalls
- Identify peak admission hours and seasonal trends
- Analyze readmission rates and their root causes
- Build dashboards to monitor key performance indicators in real time
Dataset Description
The hospital database contains four primary tables that capture the complete patient journey from admission to discharge.
patients
| Column | Type | Description |
|---|---|---|
| patient_id | INT (PK) | Unique patient identifier |
| first_name | VARCHAR(50) | Patient first name |
| last_name | VARCHAR(50) | Patient last name |
| date_of_birth | DATE | Patient date of birth |
| gender | VARCHAR(10) | M, F, or Other |
| insurance_type | VARCHAR(30) | Private, Medicare, Medicaid, Self-pay |
admissions
| Column | Type | Description |
|---|---|---|
| admission_id | INT (PK) | Unique admission identifier |
| patient_id | INT (FK) | References patients table |
| department_id | INT (FK) | References departments table |
| admission_date | DATETIME | Date and time of admission |
| discharge_date | DATETIME | Date and time of discharge (NULL if still admitted) |
| admission_type | VARCHAR(20) | Emergency, Elective, Transfer, Urgent |
| primary_diagnosis | VARCHAR(100) | Primary diagnosis at admission |
| bed_id | INT (FK) | Assigned bed identifier |
departments
| Column | Type | Description |
|---|---|---|
| department_id | INT (PK) | Unique department identifier |
| department_name | VARCHAR(50) | Name of department |
| floor_number | INT | Hospital floor location |
| total_beds | INT | Total bed capacity for department |
beds
| Column | Type | Description |
|---|---|---|
| bed_id | INT (PK) | Unique bed identifier |
| department_id | INT (FK) | Department the bed belongs to |
| bed_type | VARCHAR(20) | Standard, ICU, Pediatric, Maternity |
| status | VARCHAR(15) | Available, Occupied, Maintenance |
SQL Analysis
1. Average Length of Stay by Department
Calculate the average number of days patients spend in each department to identify departments with unusually long stays.
SELECT
d.department_name,
COUNT(a.admission_id) AS total_admissions,
ROUND(AVG(
DATEDIFF(a.discharge_date, a.admission_date)
), 1) AS avg_length_of_stay_days,
MIN(DATEDIFF(a.discharge_date, a.admission_date)) AS min_stay,
MAX(DATEDIFF(a.discharge_date, a.admission_date)) AS max_stay
FROM admissions a
JOIN departments d ON a.department_id = d.department_id
WHERE a.discharge_date IS NOT NULL
GROUP BY d.department_name
ORDER BY avg_length_of_stay_days DESC;
2. Bed Occupancy Rate by Department
Measure how efficiently each department uses its available beds. An occupancy rate above 85% often signals capacity strain.
SELECT
d.department_name,
d.total_beds,
COUNT(CASE WHEN b.status = 'Occupied' THEN 1 END) AS occupied_beds,
ROUND(
COUNT(CASE WHEN b.status = 'Occupied' THEN 1 END) * 100.0
/ d.total_beds, 1
) AS occupancy_rate_pct
FROM departments d
JOIN beds b ON d.department_id = b.department_id
GROUP BY d.department_name, d.total_beds
ORDER BY occupancy_rate_pct DESC;
3. Peak Admission Hours
Identify the busiest hours for admissions so staffing can be aligned with demand.
SELECT
HOUR(admission_date) AS admission_hour,
COUNT(*) AS total_admissions,
ROUND(COUNT(*) * 100.0 / (
SELECT COUNT(*) FROM admissions
), 1) AS pct_of_total
FROM admissions
GROUP BY HOUR(admission_date)
ORDER BY total_admissions DESC;
4. Readmission Rates (within 30 days)
Find patients who were readmitted within 30 days of discharge, grouped by department, to identify quality-of-care concerns.
WITH discharge_events AS (
SELECT
a1.patient_id,
a1.admission_id AS first_admission,
a1.discharge_date,
a1.department_id,
a2.admission_id AS readmission_id,
a2.admission_date AS readmission_date,
DATEDIFF(a2.admission_date, a1.discharge_date) AS days_to_readmit
FROM admissions a1
JOIN admissions a2
ON a1.patient_id = a2.patient_id
AND a2.admission_date > a1.discharge_date
AND DATEDIFF(a2.admission_date, a1.discharge_date) <= 30
WHERE a1.discharge_date IS NOT NULL
)
SELECT
d.department_name,
COUNT(DISTINCT de.first_admission) AS discharges_with_readmit,
COUNT(DISTINCT a_total.admission_id) AS total_discharges,
ROUND(
COUNT(DISTINCT de.first_admission) * 100.0
/ COUNT(DISTINCT a_total.admission_id), 1
) AS readmission_rate_pct
FROM departments d
JOIN admissions a_total
ON d.department_id = a_total.department_id
AND a_total.discharge_date IS NOT NULL
LEFT JOIN discharge_events de
ON d.department_id = de.department_id
GROUP BY d.department_name
ORDER BY readmission_rate_pct DESC;
Python Analysis
1. Time Series of Daily Admissions
Visualize admission trends over time to spot seasonal patterns and anomalies.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Load admissions data
admissions = pd.read_csv('admissions.csv', parse_dates=['admission_date', 'discharge_date'])
# Daily admission counts
daily_admissions = (
admissions
.set_index('admission_date')
.resample('D')
.size()
.rename('admissions')
)
# Plot with 7-day rolling average
fig, ax = plt.subplots(figsize=(14, 5))
ax.bar(daily_admissions.index, daily_admissions.values,
alpha=0.3, color='steelblue', label='Daily admissions')
ax.plot(daily_admissions.rolling(7).mean(),
color='darkblue', linewidth=2, label='7-day rolling avg')
ax.set_title('Daily Hospital Admissions', fontsize=16, fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Number of Admissions')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax.legend()
plt.tight_layout()
plt.savefig('daily_admissions_trend.png', dpi=150)
plt.show()
2. Heatmap of Admissions by Hour and Day of Week
Create a heatmap showing the busiest times to help with shift scheduling.
import seaborn as sns
import numpy as np
# Extract hour and day of week
admissions['hour'] = admissions['admission_date'].dt.hour
admissions['day_of_week'] = admissions['admission_date'].dt.day_name()
# Pivot table for heatmap
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
'Friday', 'Saturday', 'Sunday']
heatmap_data = (
admissions
.groupby(['day_of_week', 'hour'])
.size()
.unstack(fill_value=0)
.reindex(day_order)
)
fig, ax = plt.subplots(figsize=(16, 6))
sns.heatmap(heatmap_data, cmap='YlOrRd', annot=True, fmt='d',
linewidths=0.5, ax=ax, cbar_kws={'label': 'Admissions'})
ax.set_title('Admissions Heatmap: Day of Week vs Hour',
fontsize=16, fontweight='bold')
ax.set_xlabel('Hour of Day')
ax.set_ylabel('')
plt.tight_layout()
plt.savefig('admissions_heatmap.png', dpi=150)
plt.show()
3. Sankey Diagram of Patient Flow
Visualize how patients move through the hospital from admission type to department to discharge outcome.
import plotly.graph_objects as go
# Prepare flow data: Admission Type -> Department -> Outcome
departments = pd.read_csv('departments.csv')
admissions_dept = admissions.merge(departments, on='department_id')
# Define discharge outcome based on readmission flag
admissions_dept['outcome'] = np.where(
admissions_dept['discharge_date'].isna(), 'Still Admitted',
'Discharged'
)
# Build Sankey node and link lists
admission_types = admissions_dept['admission_type'].unique().tolist()
dept_names = admissions_dept['department_name'].unique().tolist()
outcomes = ['Discharged', 'Still Admitted']
labels = admission_types + dept_names + outcomes
label_idx = {name: i for i, name in enumerate(labels)}
sources, targets, values = [], [], []
# Admission Type -> Department
for _, row in (admissions_dept
.groupby(['admission_type', 'department_name'])
.size().reset_index(name='count')).iterrows():
sources.append(label_idx[row['admission_type']])
targets.append(label_idx[row['department_name']])
values.append(row['count'])
# Department -> Outcome
for _, row in (admissions_dept
.groupby(['department_name', 'outcome'])
.size().reset_index(name='count')).iterrows():
sources.append(label_idx[row['department_name']])
targets.append(label_idx[row['outcome']])
values.append(row['count'])
fig = go.Figure(go.Sankey(
node=dict(label=labels, pad=20, thickness=20),
link=dict(source=sources, target=targets, value=values)
))
fig.update_layout(title_text='Patient Flow: Admission Type → Department → Outcome',
font_size=12, width=1000, height=600)
fig.write_html('patient_flow_sankey.html')
fig.show()
Excel Dashboard
Build an operational dashboard that hospital administrators can monitor daily.
Step-by-Step Instructions
- Import Data: Open Excel and go to Data > Get Data > From Text/CSV. Import admissions.csv, departments.csv, and beds.csv into separate worksheets.
- Create Relationships: In the Data Model, create relationships: admissions[department_id] to departments[department_id], and admissions[bed_id] to beds[bed_id].
- Build KPI Cards: In a new "Dashboard" sheet, create summary cells:
- Total Admissions:
=COUNTA(admissions[admission_id]) - Avg Length of Stay:
=AVERAGE(admissions[discharge_date]-admissions[admission_date]) - Current Occupancy:
=COUNTIF(beds[status],"Occupied")/COUNTA(beds[bed_id])(format as %) - Beds Available:
=COUNTIF(beds[status],"Available")
- Total Admissions:
- Department Comparison Chart: Create a PivotTable grouping admissions by department_name. Add a Clustered Bar Chart showing total admissions and average length of stay side by side.
- Hourly Admissions Chart: Extract the hour from admission_date using
=HOUR([@admission_date])in a helper column. Create a PivotChart (Line Chart) showing admission count by hour. - Bed Utilization Gauge: Insert a Doughnut Chart where one slice represents occupied beds and the other represents available beds. Format it to look like a gauge by hiding the bottom half.
- Add Slicers: Insert slicers for Department, Admission Type, and Date Range. Connect all slicers to every PivotTable on the dashboard for interactive filtering.
- Conditional Formatting: Apply color scales to the occupancy rate column: green for below 75%, yellow for 75-85%, red for above 85%.
Key Insights
Peak Hours
Admissions spike between 10 AM and 2 PM on weekdays. The ED sees a second peak from 6 PM to 10 PM. Staff scheduling should reflect these dual peaks.
Bed Bottleneck
The ICU consistently runs above 90% occupancy, causing ED patients to wait 4+ hours for a bed. Adding 8-10 step-down beds could relieve pressure.
Length of Stay
Cardiology has the longest average stay at 6.8 days, driven by post-surgical recovery. Implementing a fast-track discharge protocol could reduce this by 1-2 days.
Readmission Drivers
Heart failure and COPD patients account for 42% of 30-day readmissions. A post-discharge follow-up program targeting these diagnoses could significantly reduce rates.
Weekend Discharges
Discharge volume drops 60% on weekends, creating a Monday morning backlog. Staffing discharge coordinators on Saturdays could smooth the flow.
Seasonal Patterns
Winter months (Dec-Feb) show a 25% increase in admissions due to respiratory illnesses. Temporary surge capacity should be planned accordingly.
Quiz
What bed occupancy rate is generally considered the threshold for capacity strain in hospitals?
In the SQL readmission query, what does the DATEDIFF function calculate?
What type of visualization is most effective for showing patient movement through hospital stages?
A readmission is typically defined as a patient returning to the hospital within how many days of discharge?
In the Python heatmap code, what does the .resample('D') method do?