Clinical Trial Dashboard
Monitor a multi-site Phase III clinical trial for a pharmaceutical company, tracking enrollment, visit completion, adverse events, and participant outcomes.
Project Overview
Scenario
You are a data analyst at NovaPharma Inc., a mid-size pharmaceutical company conducting a Phase III randomized controlled trial (RCT) for a new cardiovascular drug called CardioVex. The trial spans 8 clinical sites across 4 states, with a target enrollment of 1,200 participants over 18 months. Participants are randomized into treatment and placebo groups and followed for 12 months with scheduled visits at weeks 0, 4, 12, 24, 36, and 52.
The clinical operations team needs real-time visibility into enrollment progress, protocol adherence, safety signals, and dropout rates to ensure the trial stays on track and meets FDA submission timelines.
Objectives
- Track cumulative enrollment by site against monthly targets
- Calculate visit completion rates to measure protocol adherence
- Monitor adverse events by type, severity, and treatment group
- Analyze dropout rates and reasons for discontinuation
- Visualize Kaplan-Meier survival curves for primary endpoint analysis
Dataset Description
The clinical trial management system (CTMS) stores data across five tables covering participant demographics, site information, visit records, safety data, and efficacy outcomes.
participants
| Column | Type | Description |
|---|---|---|
| participant_id | INT (PK) | Unique participant identifier |
| site_id | INT (FK) | Enrolling clinical site |
| enrollment_date | DATE | Date participant was enrolled |
| age | INT | Participant age at enrollment |
| gender | VARCHAR(10) | M or F |
| treatment_group | VARCHAR(15) | Treatment or Placebo |
| status | VARCHAR(20) | Active, Completed, Withdrawn, Lost to Follow-up |
| withdrawal_date | DATE | Date of withdrawal (NULL if still active/completed) |
| withdrawal_reason | VARCHAR(50) | Reason for withdrawal, if applicable |
sites
| Column | Type | Description |
|---|---|---|
| site_id | INT (PK) | Unique site identifier |
| site_name | VARCHAR(100) | Clinical site name |
| city | VARCHAR(50) | Site city |
| state | VARCHAR(2) | Site state abbreviation |
| principal_investigator | VARCHAR(100) | Lead investigator name |
| enrollment_target | INT | Target number of participants for this site |
| activation_date | DATE | Date site was activated for enrollment |
visits
| Column | Type | Description |
|---|---|---|
| visit_id | INT (PK) | Unique visit record |
| participant_id | INT (FK) | References participants table |
| visit_name | VARCHAR(30) | Baseline, Week 4, Week 12, Week 24, Week 36, Week 52 |
| scheduled_date | DATE | Planned visit date |
| actual_date | DATE | Actual visit date (NULL if missed) |
| visit_status | VARCHAR(15) | Completed, Missed, Scheduled, Cancelled |
adverse_events
| Column | Type | Description |
|---|---|---|
| ae_id | INT (PK) | Unique adverse event identifier |
| participant_id | INT (FK) | References participants table |
| ae_term | VARCHAR(100) | Adverse event description (MedDRA preferred term) |
| severity | VARCHAR(10) | Mild, Moderate, Severe |
| serious | BOOLEAN | Whether this is a Serious Adverse Event (SAE) |
| onset_date | DATE | Date the event started |
| resolution_date | DATE | Date the event resolved (NULL if ongoing) |
| relationship | VARCHAR(20) | Related, Possibly Related, Unrelated |
outcomes
| Column | Type | Description |
|---|---|---|
| outcome_id | INT (PK) | Unique outcome record |
| participant_id | INT (FK) | References participants table |
| event_type | VARCHAR(30) | Primary endpoint event (MI, Stroke, CV Death, None) |
| event_date | DATE | Date the event occurred (NULL if no event) |
| days_to_event | INT | Days from enrollment to event or last follow-up |
| censored | BOOLEAN | TRUE if no event occurred (censored observation) |
SQL Analysis
1. Enrollment by Site vs. Target
Track how each site is performing against its enrollment target to identify underperforming sites that need support.
SELECT
s.site_name,
s.city,
s.state,
s.principal_investigator,
s.enrollment_target,
COUNT(p.participant_id) AS enrolled,
ROUND(
COUNT(p.participant_id) * 100.0 / s.enrollment_target, 1
) AS pct_of_target,
s.enrollment_target - COUNT(p.participant_id) AS remaining_to_enroll,
ROUND(
COUNT(p.participant_id) * 1.0
/ GREATEST(DATEDIFF(CURDATE(), s.activation_date) / 30.0, 1), 1
) AS enrollments_per_month
FROM sites s
LEFT JOIN participants p ON s.site_id = p.site_id
GROUP BY s.site_id, s.site_name, s.city, s.state,
s.principal_investigator, s.enrollment_target, s.activation_date
ORDER BY pct_of_target DESC;
2. Visit Completion Rates
Measure protocol adherence by calculating the completion rate for each scheduled visit across all sites.
SELECT
v.visit_name,
COUNT(*) AS total_scheduled,
SUM(CASE WHEN v.visit_status = 'Completed' THEN 1 ELSE 0 END) AS completed,
SUM(CASE WHEN v.visit_status = 'Missed' THEN 1 ELSE 0 END) AS missed,
SUM(CASE WHEN v.visit_status = 'Cancelled' THEN 1 ELSE 0 END) AS cancelled,
ROUND(
SUM(CASE WHEN v.visit_status = 'Completed' THEN 1 ELSE 0 END) * 100.0
/ COUNT(*), 1
) AS completion_rate_pct
FROM visits v
WHERE v.scheduled_date <= CURDATE()
GROUP BY v.visit_name
ORDER BY FIELD(v.visit_name, 'Baseline', 'Week 4', 'Week 12',
'Week 24', 'Week 36', 'Week 52');
3. Adverse Events by Treatment Group
Compare adverse event rates between treatment and placebo groups to detect potential safety signals.
SELECT
p.treatment_group,
COUNT(DISTINCT p.participant_id) AS group_size,
COUNT(ae.ae_id) AS total_aes,
ROUND(COUNT(ae.ae_id) * 1.0 / COUNT(DISTINCT p.participant_id), 2) AS aes_per_participant,
SUM(CASE WHEN ae.serious = TRUE THEN 1 ELSE 0 END) AS serious_aes,
SUM(CASE WHEN ae.relationship IN ('Related', 'Possibly Related') THEN 1 ELSE 0 END) AS related_aes,
COUNT(DISTINCT ae.ae_term) AS unique_ae_types
FROM participants p
LEFT JOIN adverse_events ae ON p.participant_id = ae.participant_id
GROUP BY p.treatment_group;
4. Dropout Analysis
Analyze withdrawal rates and reasons to understand trial retention challenges.
SELECT
p.treatment_group,
p.withdrawal_reason,
COUNT(*) AS withdrawals,
ROUND(
COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (PARTITION BY p.treatment_group), 1
) AS pct_of_group_withdrawals,
ROUND(AVG(DATEDIFF(p.withdrawal_date, p.enrollment_date)), 0) AS avg_days_on_study
FROM participants p
WHERE p.status = 'Withdrawn'
GROUP BY p.treatment_group, p.withdrawal_reason
ORDER BY p.treatment_group, withdrawals DESC;
Python Analysis
1. Enrollment Timeline with Target Overlay
Plot cumulative enrollment over time against the target trajectory to visualize enrollment pace.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
# Load data
participants = pd.read_csv('participants.csv', parse_dates=['enrollment_date', 'withdrawal_date'])
sites = pd.read_csv('sites.csv', parse_dates=['activation_date'])
# Cumulative enrollment over time
enrollment_timeline = (
participants
.groupby('enrollment_date')
.size()
.cumsum()
.reset_index(name='cumulative_enrolled')
)
# Target line: linear ramp to 1200 over 18 months
target_start = participants['enrollment_date'].min()
target_end = target_start + pd.DateOffset(months=18)
target_dates = pd.date_range(target_start, target_end, freq='D')
target_values = np.linspace(0, 1200, len(target_dates))
fig, ax = plt.subplots(figsize=(14, 6))
# Actual enrollment
ax.plot(enrollment_timeline['enrollment_date'],
enrollment_timeline['cumulative_enrolled'],
color='#2196F3', linewidth=2.5, label='Actual Enrollment')
# Target line
ax.plot(target_dates, target_values, color='gray',
linestyle='--', linewidth=1.5, label='Target (1,200 in 18 months)')
# Milestone markers
milestones = [300, 600, 900, 1200]
for m in milestones:
ax.axhline(y=m, color='lightgray', linestyle=':', alpha=0.5)
ax.text(enrollment_timeline['enrollment_date'].min(), m + 15,
f'{m} participants', fontsize=9, color='gray')
ax.set_title('Cumulative Trial Enrollment vs. Target',
fontsize=16, fontweight='bold')
ax.set_xlabel('Date')
ax.set_ylabel('Participants Enrolled')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))
ax.legend(fontsize=12)
ax.grid(True, alpha=0.2)
plt.tight_layout()
plt.savefig('enrollment_timeline.png', dpi=150)
plt.show()
2. Kaplan-Meier Survival Curves
Estimate event-free survival probability for treatment vs. placebo groups using the Kaplan-Meier method.
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test
# Load outcomes
outcomes = pd.read_csv('outcomes.csv')
data = participants.merge(outcomes, on='participant_id')
# Prepare survival data
data['event_observed'] = ~data['censored'] # 1 = event, 0 = censored
# Fit KM curves by treatment group
kmf_treatment = KaplanMeierFitter()
kmf_placebo = KaplanMeierFitter()
treatment_mask = data['treatment_group'] == 'Treatment'
placebo_mask = data['treatment_group'] == 'Placebo'
kmf_treatment.fit(
data.loc[treatment_mask, 'days_to_event'],
data.loc[treatment_mask, 'event_observed'],
label='CardioVex (Treatment)'
)
kmf_placebo.fit(
data.loc[placebo_mask, 'days_to_event'],
data.loc[placebo_mask, 'event_observed'],
label='Placebo'
)
# Log-rank test
lr_result = logrank_test(
data.loc[treatment_mask, 'days_to_event'],
data.loc[placebo_mask, 'days_to_event'],
data.loc[treatment_mask, 'event_observed'],
data.loc[placebo_mask, 'event_observed']
)
fig, ax = plt.subplots(figsize=(10, 7))
kmf_treatment.plot_survival_function(ax=ax, color='#2196F3', linewidth=2)
kmf_placebo.plot_survival_function(ax=ax, color='#F44336', linewidth=2)
ax.set_title('Kaplan-Meier: Event-Free Survival by Treatment Group',
fontsize=15, fontweight='bold')
ax.set_xlabel('Days Since Enrollment', fontsize=12)
ax.set_ylabel('Event-Free Survival Probability', fontsize=12)
ax.set_ylim(0, 1.05)
# Add log-rank p-value
ax.text(0.05, 0.05, f'Log-rank p-value: {lr_result.p_value:.4f}',
transform=ax.transAxes, fontsize=12,
bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.8))
ax.legend(fontsize=12, loc='lower left')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('kaplan_meier.png', dpi=150)
plt.show()
3. Adverse Event Visualization
Create a grouped bar chart comparing the most common adverse events between treatment groups.
import seaborn as sns
# Load adverse events
adverse_events = pd.read_csv('adverse_events.csv', parse_dates=['onset_date'])
ae_data = adverse_events.merge(
participants[['participant_id', 'treatment_group']],
on='participant_id'
)
# Top 10 most common AEs
top_aes = ae_data['ae_term'].value_counts().head(10).index
ae_comparison = (
ae_data[ae_data['ae_term'].isin(top_aes)]
.groupby(['ae_term', 'treatment_group'])
.size()
.reset_index(name='count')
)
# Normalize by group size
group_sizes = participants['treatment_group'].value_counts()
ae_comparison['rate_per_100'] = ae_comparison.apply(
lambda row: row['count'] * 100 / group_sizes[row['treatment_group']],
axis=1
)
fig, ax = plt.subplots(figsize=(12, 7))
pivot_data = ae_comparison.pivot(
index='ae_term', columns='treatment_group', values='rate_per_100'
).reindex(top_aes)
pivot_data.plot(kind='barh', ax=ax, color=['#2196F3', '#F44336'],
edgecolor='none', width=0.7)
ax.set_title('Top 10 Adverse Events: Treatment vs. Placebo',
fontsize=15, fontweight='bold')
ax.set_xlabel('Rate per 100 Participants', fontsize=12)
ax.set_ylabel('')
ax.legend(title='Group', fontsize=11)
ax.invert_yaxis()
ax.grid(True, axis='x', alpha=0.3)
plt.tight_layout()
plt.savefig('adverse_events_comparison.png', dpi=150)
plt.show()
Excel Dashboard
Build a trial progress dashboard for weekly sponsor review meetings.
Step-by-Step Instructions
- Import Data: Go to Data > Get Data > From Text/CSV. Import participants.csv, sites.csv, visits.csv, and adverse_events.csv into separate worksheets.
- Site Enrollment Scorecard: Create a PivotTable from participants data with site_id in Rows and COUNT of participant_id in Values. Add a helper column with enrollment_target from the sites table using
=VLOOKUP([@site_id], sites, 6, FALSE). Create a Clustered Bar Chart comparing enrolled vs. target for each site. - Enrollment Timeline Chart: Add a helper column for cumulative enrollment using
=COUNTIFS(participants[enrollment_date], "<="&A2)sorted by date. Insert a Line Chart showing the cumulative curve. Add a straight diagonal line from 0 to 1200 as the target reference. - Visit Completion Matrix: Create a PivotTable with visit_name in Rows and visit_status in Columns. Show COUNT of visit_id in Values. Add a calculated field for completion rate:
=Completed/(Completed+Missed+Cancelled). Apply conditional formatting (green > 90%, yellow 80-90%, red < 80%). - Adverse Event Summary: Create a PivotTable filtering by treatment_group. Show top AE terms with counts. Add a second PivotTable for SAEs only (filter serious = TRUE). Highlight any SAEs in the treatment group using red conditional formatting.
- Milestone Tracker: Create a table with key milestones (25% enrolled, 50% enrolled, first patient Week 52, etc.). Add columns for Target Date, Actual Date, and Status. Use conditional formatting with icons (green check, yellow clock, red X).
- KPI Summary Row: At the top of the dashboard, display:
- Total Enrolled:
=COUNTA(participants[participant_id]) - Active Participants:
=COUNTIF(participants[status],"Active") - Dropout Rate:
=COUNTIF(participants[status],"Withdrawn")/COUNTA(participants[participant_id]) - Total SAEs:
=COUNTIF(adverse_events[serious],TRUE)
- Total Enrolled:
- Add Slicers: Insert slicers for Site, Treatment Group, and Status. Connect all slicers to every PivotTable. Add a Timeline slicer connected to enrollment_date for time-based filtering.
Key Insights
Enrollment Pace
Overall enrollment is at 78% of target after 14 months. Two sites are below 50% of their target and need recruitment strategy adjustments or additional site activation.
Visit Adherence
Baseline and Week 4 visits have 97% completion, but Week 36 and Week 52 drop to 82%. Implementing reminder calls 72 hours before visits could improve retention.
Safety Profile
The treatment group shows a 12% higher rate of mild headaches but no significant difference in serious adverse events (3.2% vs. 2.8%). The safety profile appears acceptable.
Dropout Patterns
The overall dropout rate is 14.2%. "Lost to follow-up" is the most common reason (38%), followed by "adverse event" (22%) and "withdrew consent" (18%).
Efficacy Signal
The Kaplan-Meier analysis shows a separation between treatment and placebo curves after Day 120, with the treatment group showing 23% fewer primary endpoint events (p=0.03).
Site Performance
The top-performing site (Boston Medical) enrolls 12 participants/month vs. the average of 7. Best practices from this site should be shared across the network.
Quiz
In a Kaplan-Meier analysis, what does "censoring" mean?
What does the log-rank test assess in the context of this clinical trial?
What is a Serious Adverse Event (SAE) in a clinical trial?
In the enrollment SQL query, what does the GREATEST() function do?
Why is it important to normalize adverse event counts by group size when comparing treatment vs. placebo?