Project 6

Hospital Patient Flow Analysis

Optimize patient flow, reduce wait times, and improve bed utilization for a busy metropolitan hospital.

Project Overview

Scenario

You have been hired as a data analyst at Metro General Hospital, a 500-bed facility serving over 80,000 patients annually. The hospital administrator has raised concerns about increasing patient wait times, suboptimal bed utilization, and rising readmission rates. Emergency department overcrowding during peak hours has led to patient diversions to nearby hospitals.

Your task is to analyze patient flow data across departments, identify bottlenecks, and provide actionable recommendations to improve operational efficiency.

Objectives

  • Calculate average length of stay by department and identify outliers
  • Determine bed occupancy rates and predict capacity shortfalls
  • Identify peak admission hours and seasonal trends
  • Analyze readmission rates and their root causes
  • Build dashboards to monitor key performance indicators in real time

Dataset Description

The hospital database contains four primary tables that capture the complete patient journey from admission to discharge.

patients

Column Type Description
patient_idINT (PK)Unique patient identifier
first_nameVARCHAR(50)Patient first name
last_nameVARCHAR(50)Patient last name
date_of_birthDATEPatient date of birth
genderVARCHAR(10)M, F, or Other
insurance_typeVARCHAR(30)Private, Medicare, Medicaid, Self-pay

admissions

Column Type Description
admission_idINT (PK)Unique admission identifier
patient_idINT (FK)References patients table
department_idINT (FK)References departments table
admission_dateDATETIMEDate and time of admission
discharge_dateDATETIMEDate and time of discharge (NULL if still admitted)
admission_typeVARCHAR(20)Emergency, Elective, Transfer, Urgent
primary_diagnosisVARCHAR(100)Primary diagnosis at admission
bed_idINT (FK)Assigned bed identifier

departments

Column Type Description
department_idINT (PK)Unique department identifier
department_nameVARCHAR(50)Name of department
floor_numberINTHospital floor location
total_bedsINTTotal bed capacity for department

beds

Column Type Description
bed_idINT (PK)Unique bed identifier
department_idINT (FK)Department the bed belongs to
bed_typeVARCHAR(20)Standard, ICU, Pediatric, Maternity
statusVARCHAR(15)Available, Occupied, Maintenance

SQL Analysis

1. Average Length of Stay by Department

Calculate the average number of days patients spend in each department to identify departments with unusually long stays.

SELECT
    d.department_name,
    COUNT(a.admission_id) AS total_admissions,
    ROUND(AVG(
        DATEDIFF(a.discharge_date, a.admission_date)
    ), 1) AS avg_length_of_stay_days,
    MIN(DATEDIFF(a.discharge_date, a.admission_date)) AS min_stay,
    MAX(DATEDIFF(a.discharge_date, a.admission_date)) AS max_stay
FROM admissions a
JOIN departments d ON a.department_id = d.department_id
WHERE a.discharge_date IS NOT NULL
GROUP BY d.department_name
ORDER BY avg_length_of_stay_days DESC;

2. Bed Occupancy Rate by Department

Measure how efficiently each department uses its available beds. An occupancy rate above 85% often signals capacity strain.

SELECT
    d.department_name,
    d.total_beds,
    COUNT(CASE WHEN b.status = 'Occupied' THEN 1 END) AS occupied_beds,
    ROUND(
        COUNT(CASE WHEN b.status = 'Occupied' THEN 1 END) * 100.0
        / d.total_beds, 1
    ) AS occupancy_rate_pct
FROM departments d
JOIN beds b ON d.department_id = b.department_id
GROUP BY d.department_name, d.total_beds
ORDER BY occupancy_rate_pct DESC;

3. Peak Admission Hours

Identify the busiest hours for admissions so staffing can be aligned with demand.

SELECT
    HOUR(admission_date) AS admission_hour,
    COUNT(*) AS total_admissions,
    ROUND(COUNT(*) * 100.0 / (
        SELECT COUNT(*) FROM admissions
    ), 1) AS pct_of_total
FROM admissions
GROUP BY HOUR(admission_date)
ORDER BY total_admissions DESC;

4. Readmission Rates (within 30 days)

Find patients who were readmitted within 30 days of discharge, grouped by department, to identify quality-of-care concerns.

WITH discharge_events AS (
    SELECT
        a1.patient_id,
        a1.admission_id AS first_admission,
        a1.discharge_date,
        a1.department_id,
        a2.admission_id AS readmission_id,
        a2.admission_date AS readmission_date,
        DATEDIFF(a2.admission_date, a1.discharge_date) AS days_to_readmit
    FROM admissions a1
    JOIN admissions a2
        ON a1.patient_id = a2.patient_id
        AND a2.admission_date > a1.discharge_date
        AND DATEDIFF(a2.admission_date, a1.discharge_date) <= 30
    WHERE a1.discharge_date IS NOT NULL
)
SELECT
    d.department_name,
    COUNT(DISTINCT de.first_admission) AS discharges_with_readmit,
    COUNT(DISTINCT a_total.admission_id) AS total_discharges,
    ROUND(
        COUNT(DISTINCT de.first_admission) * 100.0
        / COUNT(DISTINCT a_total.admission_id), 1
    ) AS readmission_rate_pct
FROM departments d
JOIN admissions a_total
    ON d.department_id = a_total.department_id
    AND a_total.discharge_date IS NOT NULL
LEFT JOIN discharge_events de
    ON d.department_id = de.department_id
GROUP BY d.department_name
ORDER BY readmission_rate_pct DESC;

Python Analysis

1. Time Series of Daily Admissions

Visualize admission trends over time to spot seasonal patterns and anomalies.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Load admissions data
admissions = pd.read_csv('admissions.csv', parse_dates=['admission_date', 'discharge_date'])

# Daily admission counts
daily_admissions = (
    admissions
    .set_index('admission_date')
    .resample('D')
    .size()
    .rename('admissions')
)

# Plot with 7-day rolling average
fig, ax = plt.subplots(figsize=(14, 5))
ax.bar(daily_admissions.index, daily_admissions.values,
       alpha=0.3, color='steelblue', label='Daily admissions')
ax.plot(daily_admissions.rolling(7).mean(),
        color='darkblue', linewidth=2, label='7-day rolling avg')

ax.set_title('Daily Hospital Admissions', fontsize=16, fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Number of Admissions')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax.legend()
plt.tight_layout()
plt.savefig('daily_admissions_trend.png', dpi=150)
plt.show()

2. Heatmap of Admissions by Hour and Day of Week

Create a heatmap showing the busiest times to help with shift scheduling.

import seaborn as sns
import numpy as np

# Extract hour and day of week
admissions['hour'] = admissions['admission_date'].dt.hour
admissions['day_of_week'] = admissions['admission_date'].dt.day_name()

# Pivot table for heatmap
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
heatmap_data = (
    admissions
    .groupby(['day_of_week', 'hour'])
    .size()
    .unstack(fill_value=0)
    .reindex(day_order)
)

fig, ax = plt.subplots(figsize=(16, 6))
sns.heatmap(heatmap_data, cmap='YlOrRd', annot=True, fmt='d',
            linewidths=0.5, ax=ax, cbar_kws={'label': 'Admissions'})
ax.set_title('Admissions Heatmap: Day of Week vs Hour',
             fontsize=16, fontweight='bold')
ax.set_xlabel('Hour of Day')
ax.set_ylabel('')
plt.tight_layout()
plt.savefig('admissions_heatmap.png', dpi=150)
plt.show()

3. Sankey Diagram of Patient Flow

Visualize how patients move through the hospital from admission type to department to discharge outcome.

import plotly.graph_objects as go

# Prepare flow data: Admission Type -> Department -> Outcome
departments = pd.read_csv('departments.csv')
admissions_dept = admissions.merge(departments, on='department_id')

# Define discharge outcome based on readmission flag
admissions_dept['outcome'] = np.where(
    admissions_dept['discharge_date'].isna(), 'Still Admitted',
    'Discharged'
)

# Build Sankey node and link lists
admission_types = admissions_dept['admission_type'].unique().tolist()
dept_names = admissions_dept['department_name'].unique().tolist()
outcomes = ['Discharged', 'Still Admitted']

labels = admission_types + dept_names + outcomes
label_idx = {name: i for i, name in enumerate(labels)}

sources, targets, values = [], [], []

# Admission Type -> Department
for _, row in (admissions_dept
    .groupby(['admission_type', 'department_name'])
    .size().reset_index(name='count')).iterrows():
    sources.append(label_idx[row['admission_type']])
    targets.append(label_idx[row['department_name']])
    values.append(row['count'])

# Department -> Outcome
for _, row in (admissions_dept
    .groupby(['department_name', 'outcome'])
    .size().reset_index(name='count')).iterrows():
    sources.append(label_idx[row['department_name']])
    targets.append(label_idx[row['outcome']])
    values.append(row['count'])

fig = go.Figure(go.Sankey(
    node=dict(label=labels, pad=20, thickness=20),
    link=dict(source=sources, target=targets, value=values)
))
fig.update_layout(title_text='Patient Flow: Admission Type → Department → Outcome',
                  font_size=12, width=1000, height=600)
fig.write_html('patient_flow_sankey.html')
fig.show()

Excel Dashboard

Build an operational dashboard that hospital administrators can monitor daily.

Step-by-Step Instructions

  1. Import Data: Open Excel and go to Data > Get Data > From Text/CSV. Import admissions.csv, departments.csv, and beds.csv into separate worksheets.
  2. Create Relationships: In the Data Model, create relationships: admissions[department_id] to departments[department_id], and admissions[bed_id] to beds[bed_id].
  3. Build KPI Cards: In a new "Dashboard" sheet, create summary cells:
    • Total Admissions: =COUNTA(admissions[admission_id])
    • Avg Length of Stay: =AVERAGE(admissions[discharge_date]-admissions[admission_date])
    • Current Occupancy: =COUNTIF(beds[status],"Occupied")/COUNTA(beds[bed_id]) (format as %)
    • Beds Available: =COUNTIF(beds[status],"Available")
  4. Department Comparison Chart: Create a PivotTable grouping admissions by department_name. Add a Clustered Bar Chart showing total admissions and average length of stay side by side.
  5. Hourly Admissions Chart: Extract the hour from admission_date using =HOUR([@admission_date]) in a helper column. Create a PivotChart (Line Chart) showing admission count by hour.
  6. Bed Utilization Gauge: Insert a Doughnut Chart where one slice represents occupied beds and the other represents available beds. Format it to look like a gauge by hiding the bottom half.
  7. Add Slicers: Insert slicers for Department, Admission Type, and Date Range. Connect all slicers to every PivotTable on the dashboard for interactive filtering.
  8. Conditional Formatting: Apply color scales to the occupancy rate column: green for below 75%, yellow for 75-85%, red for above 85%.

Key Insights

Peak Hours

Admissions spike between 10 AM and 2 PM on weekdays. The ED sees a second peak from 6 PM to 10 PM. Staff scheduling should reflect these dual peaks.

Bed Bottleneck

The ICU consistently runs above 90% occupancy, causing ED patients to wait 4+ hours for a bed. Adding 8-10 step-down beds could relieve pressure.

Length of Stay

Cardiology has the longest average stay at 6.8 days, driven by post-surgical recovery. Implementing a fast-track discharge protocol could reduce this by 1-2 days.

Readmission Drivers

Heart failure and COPD patients account for 42% of 30-day readmissions. A post-discharge follow-up program targeting these diagnoses could significantly reduce rates.

Weekend Discharges

Discharge volume drops 60% on weekends, creating a Monday morning backlog. Staffing discharge coordinators on Saturdays could smooth the flow.

Seasonal Patterns

Winter months (Dec-Feb) show a 25% increase in admissions due to respiratory illnesses. Temporary surge capacity should be planned accordingly.

Quiz

Question 1

What bed occupancy rate is generally considered the threshold for capacity strain in hospitals?

Question 2

In the SQL readmission query, what does the DATEDIFF function calculate?

Question 3

What type of visualization is most effective for showing patient movement through hospital stages?

Question 4

A readmission is typically defined as a patient returning to the hospital within how many days of discharge?

Question 5

In the Python heatmap code, what does the .resample('D') method do?

← Previous Project Next Project →