Project 10

Clinical Trial Dashboard

Monitor a multi-site Phase III clinical trial for a pharmaceutical company, tracking enrollment, visit completion, adverse events, and participant outcomes.

Project Overview

Scenario

You are a data analyst at NovaPharma Inc., a mid-size pharmaceutical company conducting a Phase III randomized controlled trial (RCT) for a new cardiovascular drug called CardioVex. The trial spans 8 clinical sites across 4 states, with a target enrollment of 1,200 participants over 18 months. Participants are randomized into treatment and placebo groups and followed for 12 months with scheduled visits at weeks 0, 4, 12, 24, 36, and 52.

The clinical operations team needs real-time visibility into enrollment progress, protocol adherence, safety signals, and dropout rates to ensure the trial stays on track and meets FDA submission timelines.

Objectives

  • Track cumulative enrollment by site against monthly targets
  • Calculate visit completion rates to measure protocol adherence
  • Monitor adverse events by type, severity, and treatment group
  • Analyze dropout rates and reasons for discontinuation
  • Visualize Kaplan-Meier survival curves for primary endpoint analysis

Dataset Description

The clinical trial management system (CTMS) stores data across five tables covering participant demographics, site information, visit records, safety data, and efficacy outcomes.

participants

Column Type Description
participant_idINT (PK)Unique participant identifier
site_idINT (FK)Enrolling clinical site
enrollment_dateDATEDate participant was enrolled
ageINTParticipant age at enrollment
genderVARCHAR(10)M or F
treatment_groupVARCHAR(15)Treatment or Placebo
statusVARCHAR(20)Active, Completed, Withdrawn, Lost to Follow-up
withdrawal_dateDATEDate of withdrawal (NULL if still active/completed)
withdrawal_reasonVARCHAR(50)Reason for withdrawal, if applicable

sites

Column Type Description
site_idINT (PK)Unique site identifier
site_nameVARCHAR(100)Clinical site name
cityVARCHAR(50)Site city
stateVARCHAR(2)Site state abbreviation
principal_investigatorVARCHAR(100)Lead investigator name
enrollment_targetINTTarget number of participants for this site
activation_dateDATEDate site was activated for enrollment

visits

Column Type Description
visit_idINT (PK)Unique visit record
participant_idINT (FK)References participants table
visit_nameVARCHAR(30)Baseline, Week 4, Week 12, Week 24, Week 36, Week 52
scheduled_dateDATEPlanned visit date
actual_dateDATEActual visit date (NULL if missed)
visit_statusVARCHAR(15)Completed, Missed, Scheduled, Cancelled

adverse_events

Column Type Description
ae_idINT (PK)Unique adverse event identifier
participant_idINT (FK)References participants table
ae_termVARCHAR(100)Adverse event description (MedDRA preferred term)
severityVARCHAR(10)Mild, Moderate, Severe
seriousBOOLEANWhether this is a Serious Adverse Event (SAE)
onset_dateDATEDate the event started
resolution_dateDATEDate the event resolved (NULL if ongoing)
relationshipVARCHAR(20)Related, Possibly Related, Unrelated

outcomes

Column Type Description
outcome_idINT (PK)Unique outcome record
participant_idINT (FK)References participants table
event_typeVARCHAR(30)Primary endpoint event (MI, Stroke, CV Death, None)
event_dateDATEDate the event occurred (NULL if no event)
days_to_eventINTDays from enrollment to event or last follow-up
censoredBOOLEANTRUE if no event occurred (censored observation)

SQL Analysis

1. Enrollment by Site vs. Target

Track how each site is performing against its enrollment target to identify underperforming sites that need support.

SELECT
    s.site_name,
    s.city,
    s.state,
    s.principal_investigator,
    s.enrollment_target,
    COUNT(p.participant_id) AS enrolled,
    ROUND(
        COUNT(p.participant_id) * 100.0 / s.enrollment_target, 1
    ) AS pct_of_target,
    s.enrollment_target - COUNT(p.participant_id) AS remaining_to_enroll,
    ROUND(
        COUNT(p.participant_id) * 1.0
        / GREATEST(DATEDIFF(CURDATE(), s.activation_date) / 30.0, 1), 1
    ) AS enrollments_per_month
FROM sites s
LEFT JOIN participants p ON s.site_id = p.site_id
GROUP BY s.site_id, s.site_name, s.city, s.state,
         s.principal_investigator, s.enrollment_target, s.activation_date
ORDER BY pct_of_target DESC;

2. Visit Completion Rates

Measure protocol adherence by calculating the completion rate for each scheduled visit across all sites.

SELECT
    v.visit_name,
    COUNT(*) AS total_scheduled,
    SUM(CASE WHEN v.visit_status = 'Completed' THEN 1 ELSE 0 END) AS completed,
    SUM(CASE WHEN v.visit_status = 'Missed' THEN 1 ELSE 0 END) AS missed,
    SUM(CASE WHEN v.visit_status = 'Cancelled' THEN 1 ELSE 0 END) AS cancelled,
    ROUND(
        SUM(CASE WHEN v.visit_status = 'Completed' THEN 1 ELSE 0 END) * 100.0
        / COUNT(*), 1
    ) AS completion_rate_pct
FROM visits v
WHERE v.scheduled_date <= CURDATE()
GROUP BY v.visit_name
ORDER BY FIELD(v.visit_name, 'Baseline', 'Week 4', 'Week 12',
              'Week 24', 'Week 36', 'Week 52');

3. Adverse Events by Treatment Group

Compare adverse event rates between treatment and placebo groups to detect potential safety signals.

SELECT
    p.treatment_group,
    COUNT(DISTINCT p.participant_id) AS group_size,
    COUNT(ae.ae_id) AS total_aes,
    ROUND(COUNT(ae.ae_id) * 1.0 / COUNT(DISTINCT p.participant_id), 2) AS aes_per_participant,
    SUM(CASE WHEN ae.serious = TRUE THEN 1 ELSE 0 END) AS serious_aes,
    SUM(CASE WHEN ae.relationship IN ('Related', 'Possibly Related') THEN 1 ELSE 0 END) AS related_aes,
    COUNT(DISTINCT ae.ae_term) AS unique_ae_types
FROM participants p
LEFT JOIN adverse_events ae ON p.participant_id = ae.participant_id
GROUP BY p.treatment_group;

4. Dropout Analysis

Analyze withdrawal rates and reasons to understand trial retention challenges.

SELECT
    p.treatment_group,
    p.withdrawal_reason,
    COUNT(*) AS withdrawals,
    ROUND(
        COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY p.treatment_group), 1
    ) AS pct_of_group_withdrawals,
    ROUND(AVG(DATEDIFF(p.withdrawal_date, p.enrollment_date)), 0) AS avg_days_on_study
FROM participants p
WHERE p.status = 'Withdrawn'
GROUP BY p.treatment_group, p.withdrawal_reason
ORDER BY p.treatment_group, withdrawals DESC;

Python Analysis

1. Enrollment Timeline with Target Overlay

Plot cumulative enrollment over time against the target trajectory to visualize enrollment pace.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

# Load data
participants = pd.read_csv('participants.csv', parse_dates=['enrollment_date', 'withdrawal_date'])
sites = pd.read_csv('sites.csv', parse_dates=['activation_date'])

# Cumulative enrollment over time
enrollment_timeline = (
    participants
    .groupby('enrollment_date')
    .size()
    .cumsum()
    .reset_index(name='cumulative_enrolled')
)

# Target line: linear ramp to 1200 over 18 months
target_start = participants['enrollment_date'].min()
target_end = target_start + pd.DateOffset(months=18)
target_dates = pd.date_range(target_start, target_end, freq='D')
target_values = np.linspace(0, 1200, len(target_dates))

fig, ax = plt.subplots(figsize=(14, 6))

# Actual enrollment
ax.plot(enrollment_timeline['enrollment_date'],
        enrollment_timeline['cumulative_enrolled'],
        color='#2196F3', linewidth=2.5, label='Actual Enrollment')

# Target line
ax.plot(target_dates, target_values, color='gray',
        linestyle='--', linewidth=1.5, label='Target (1,200 in 18 months)')

# Milestone markers
milestones = [300, 600, 900, 1200]
for m in milestones:
    ax.axhline(y=m, color='lightgray', linestyle=':', alpha=0.5)
    ax.text(enrollment_timeline['enrollment_date'].min(), m + 15,
            f'{m} participants', fontsize=9, color='gray')

ax.set_title('Cumulative Trial Enrollment vs. Target',
             fontsize=16, fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Participants Enrolled')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax.legend(fontsize=12)
ax.grid(True, alpha=0.2)
plt.tight_layout()
plt.savefig('enrollment_timeline.png', dpi=150)
plt.show()

2. Kaplan-Meier Survival Curves

Estimate event-free survival probability for treatment vs. placebo groups using the Kaplan-Meier method.

from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

# Load outcomes
outcomes = pd.read_csv('outcomes.csv')
data = participants.merge(outcomes, on='participant_id')

# Prepare survival data
data['event_observed'] = ~data['censored']  # 1 = event, 0 = censored

# Fit KM curves by treatment group
kmf_treatment = KaplanMeierFitter()
kmf_placebo = KaplanMeierFitter()

treatment_mask = data['treatment_group'] == 'Treatment'
placebo_mask = data['treatment_group'] == 'Placebo'

kmf_treatment.fit(
    data.loc[treatment_mask, 'days_to_event'],
    data.loc[treatment_mask, 'event_observed'],
    label='CardioVex (Treatment)'
)

kmf_placebo.fit(
    data.loc[placebo_mask, 'days_to_event'],
    data.loc[placebo_mask, 'event_observed'],
    label='Placebo'
)

# Log-rank test
lr_result = logrank_test(
    data.loc[treatment_mask, 'days_to_event'],
    data.loc[placebo_mask, 'days_to_event'],
    data.loc[treatment_mask, 'event_observed'],
    data.loc[placebo_mask, 'event_observed']
)

fig, ax = plt.subplots(figsize=(10, 7))
kmf_treatment.plot_survival_function(ax=ax, color='#2196F3', linewidth=2)
kmf_placebo.plot_survival_function(ax=ax, color='#F44336', linewidth=2)

ax.set_title('Kaplan-Meier: Event-Free Survival by Treatment Group',
             fontsize=15, fontweight='bold')
ax.set_xlabel('Days Since Enrollment', fontsize=12)
ax.set_ylabel('Event-Free Survival Probability', fontsize=12)
ax.set_ylim(0, 1.05)

# Add log-rank p-value
ax.text(0.05, 0.05, f'Log-rank p-value: {lr_result.p_value:.4f}',
        transform=ax.transAxes, fontsize=12,
        bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))

ax.legend(fontsize=12, loc='lower left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('kaplan_meier.png', dpi=150)
plt.show()

3. Adverse Event Visualization

Create a grouped bar chart comparing the most common adverse events between treatment groups.

import seaborn as sns

# Load adverse events
adverse_events = pd.read_csv('adverse_events.csv', parse_dates=['onset_date'])
ae_data = adverse_events.merge(
    participants[['participant_id', 'treatment_group']],
    on='participant_id'
)

# Top 10 most common AEs
top_aes = ae_data['ae_term'].value_counts().head(10).index

ae_comparison = (
    ae_data[ae_data['ae_term'].isin(top_aes)]
    .groupby(['ae_term', 'treatment_group'])
    .size()
    .reset_index(name='count')
)

# Normalize by group size
group_sizes = participants['treatment_group'].value_counts()
ae_comparison['rate_per_100'] = ae_comparison.apply(
    lambda row: row['count'] * 100 / group_sizes[row['treatment_group']],
    axis=1
)

fig, ax = plt.subplots(figsize=(12, 7))
pivot_data = ae_comparison.pivot(
    index='ae_term', columns='treatment_group', values='rate_per_100'
).reindex(top_aes)

pivot_data.plot(kind='barh', ax=ax, color=['#2196F3', '#F44336'],
                edgecolor='none', width=0.7)

ax.set_title('Top 10 Adverse Events: Treatment vs. Placebo',
             fontsize=15, fontweight='bold')
ax.set_xlabel('Rate per 100 Participants', fontsize=12)
ax.set_ylabel('')
ax.legend(title='Group', fontsize=11)
ax.invert_yaxis()
ax.grid(True, axis='x', alpha=0.3)
plt.tight_layout()
plt.savefig('adverse_events_comparison.png', dpi=150)
plt.show()

Excel Dashboard

Build a trial progress dashboard for weekly sponsor review meetings.

Step-by-Step Instructions

  1. Import Data: Go to Data > Get Data > From Text/CSV. Import participants.csv, sites.csv, visits.csv, and adverse_events.csv into separate worksheets.
  2. Site Enrollment Scorecard: Create a PivotTable from participants data with site_id in Rows and COUNT of participant_id in Values. Add a helper column with enrollment_target from the sites table using =VLOOKUP([@site_id], sites, 6, FALSE). Create a Clustered Bar Chart comparing enrolled vs. target for each site.
  3. Enrollment Timeline Chart: Add a helper column for cumulative enrollment using =COUNTIFS(participants[enrollment_date], "<="&A2) sorted by date. Insert a Line Chart showing the cumulative curve. Add a straight diagonal line from 0 to 1200 as the target reference.
  4. Visit Completion Matrix: Create a PivotTable with visit_name in Rows and visit_status in Columns. Show COUNT of visit_id in Values. Add a calculated field for completion rate: =Completed/(Completed+Missed+Cancelled). Apply conditional formatting (green > 90%, yellow 80-90%, red < 80%).
  5. Adverse Event Summary: Create a PivotTable filtering by treatment_group. Show top AE terms with counts. Add a second PivotTable for SAEs only (filter serious = TRUE). Highlight any SAEs in the treatment group using red conditional formatting.
  6. Milestone Tracker: Create a table with key milestones (25% enrolled, 50% enrolled, first patient Week 52, etc.). Add columns for Target Date, Actual Date, and Status. Use conditional formatting with icons (green check, yellow clock, red X).
  7. KPI Summary Row: At the top of the dashboard, display:
    • Total Enrolled: =COUNTA(participants[participant_id])
    • Active Participants: =COUNTIF(participants[status],"Active")
    • Dropout Rate: =COUNTIF(participants[status],"Withdrawn")/COUNTA(participants[participant_id])
    • Total SAEs: =COUNTIF(adverse_events[serious],TRUE)
  8. Add Slicers: Insert slicers for Site, Treatment Group, and Status. Connect all slicers to every PivotTable. Add a Timeline slicer connected to enrollment_date for time-based filtering.

Key Insights

Enrollment Pace

Overall enrollment is at 78% of target after 14 months. Two sites are below 50% of their target and need recruitment strategy adjustments or additional site activation.

Visit Adherence

Baseline and Week 4 visits have 97% completion, but Week 36 and Week 52 drop to 82%. Implementing reminder calls 72 hours before visits could improve retention.

Safety Profile

The treatment group shows a 12% higher rate of mild headaches but no significant difference in serious adverse events (3.2% vs. 2.8%). The safety profile appears acceptable.

Dropout Patterns

The overall dropout rate is 14.2%. "Lost to follow-up" is the most common reason (38%), followed by "adverse event" (22%) and "withdrew consent" (18%).

Efficacy Signal

The Kaplan-Meier analysis shows a separation between treatment and placebo curves after Day 120, with the treatment group showing 23% fewer primary endpoint events (p=0.03).

Site Performance

The top-performing site (Boston Medical) enrolls 12 participants/month vs. the average of 7. Best practices from this site should be shared across the network.

Quiz

Question 1

In a Kaplan-Meier analysis, what does "censoring" mean?

Question 2

What does the log-rank test assess in the context of this clinical trial?

Question 3

What is a Serious Adverse Event (SAE) in a clinical trial?

Question 4

In the enrollment SQL query, what does the GREATEST() function do?

Question 5

Why is it important to normalize adverse event counts by group size when comparing treatment vs. placebo?

← Previous Project Next Project →