Code
library(tidyverse)
library(survival)
library(reactable)
Alex Zajichek
October 2, 2023
I was recently introduced to overlap weighting, which is part of a general family of methods for balancing covariates when estimating treatment effects with observational data. Specifically, it focuses on the clinical equipoise; that is, the patients in which the treatment decision is most uncertain. I find it more elegant than matching (which I’ve used in the past), and figured a useful way to better understand the approach is to deconstruct, interpret, and translate a SAS simulation in the context of survival analysis implemented by the original authors. All code is written in (and translated to) R
. We’ll start by loading some packages.
The theoretical underpinnings of overlap weighting live in the context of the potential outcomes paradigm of causal inference. Basically, we wonder what would have happened had we been able to observe each patient under both treatments (known as the counterfactual). If we knew the outcomes from the two worlds, the causal treatment effect would simply be the average difference across all patients. The problem of course is that in reality the outcome can only be observed for the treatment in which the patient was assigned (or chose), and so our goal is to work with the “incomplete” information that we do have to try to estimate what the outcome difference would have been had the counterfactual been observed as well.
The first thing we need to do is generate some (fake) patients to facilitate the simulation. Here we’ll focus on age, sex, and income as patient-identifying characteristics.
# A tibble: 10,000 × 5
age zage male income zincome
<dbl> <dbl> <int> <dbl> <dbl>
1 44.4 -0.560 0 36.5 -1.35
2 47.7 -0.230 1 44.2 -0.579
3 65.6 1.56 1 41.4 -0.861
4 50.7 0.0705 1 59.7 0.973
5 51.3 0.129 0 56.2 0.619
6 67.2 1.72 1 63.9 1.39
7 54.6 0.461 1 35.1 -1.49
8 37.3 -1.27 1 56.4 0.639
9 43.1 -0.687 1 53.7 0.375
10 45.5 -0.446 1 53.7 0.370
# ℹ 9,990 more rows
We’ve generated a sample of 10000 patients. We’ll assume age
is measured in years, income
in thousands of dollars ($), and male
is 1 for Male and 0 for Female. The columns zage
and zincome
are just standardized versions of the originals to avoid scaling nuances (we’ll refer to these as \(age_z\) and \(income_z\)).
Here’s where the important theory starts to creep in (already). We assume that there is a true propensity score, say \(p_i^A\), that is the true probability that patient i receives treatment A given their specific characteristics (and let’s assume there are possible treatments A & B). Further, we assume (some transformation of) this probability is a linear combination of all the characteristics that confound the crude treatment effect on the outcome. In this simulation, we’ll assume the logit model:
\[log(\frac{p_i^A}{1 - p_i^A}) = 0.41 \times age_z - 0.22 \times male - 0.69 \times income_z - 0.40\]
So let’s add this true propensity score to the population
data set:
# A tibble: 10,000 × 7
age zage male income zincome log_odds_pA pA
<dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 44.4 -0.560 0 36.5 -1.35 0.311 0.577
2 47.7 -0.230 1 44.2 -0.579 -0.315 0.422
3 65.6 1.56 1 41.4 -0.861 0.606 0.647
4 50.7 0.0705 1 59.7 0.973 -1.27 0.219
5 51.3 0.129 0 56.2 0.619 -0.777 0.315
6 67.2 1.72 1 63.9 1.39 -0.888 0.292
7 54.6 0.461 1 35.1 -1.49 0.595 0.644
8 37.3 -1.27 1 56.4 0.639 -1.58 0.171
9 43.1 -0.687 1 53.7 0.375 -1.16 0.238
10 45.5 -0.446 1 53.7 0.370 -1.06 0.257
# ℹ 9,990 more rows
Obviously in practice we don’t know what \(p_i^A\) is, so it must be estimated from the treatment assignments we observe in the data. Additionally, we aren’t strictly required to assume the logit model, but when we use it to estimate the propensity score for overlap weighting, it has the advantageous property of perfect balance. That is, the weighted-mean differences for all covariates included in the logistic regression model will be zero across the treatment groups.
The treatment assignment is one of the known quantities we would observe in a real life sample, and that is what would be used as the dependent variable for estimating propensity scores. However, since we are running a simulation, we need to generate the treatment assignments from the governing distribution. We assume that the observed treatment for patient i is the result of an (unfair) coin flip that has a probability equal to the true propensity score. In statistical notation,
\[A_i \sim Bernoulli(p_i^A)\]
where \(A_i\) is the indicator of treatment A. Let’s add a realized treatment assignment to the population
:
# A tibble: 10,000 × 8
age zage male income zincome log_odds_pA pA A
<dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int>
1 44.4 -0.560 0 36.5 -1.35 0.311 0.577 1
2 47.7 -0.230 1 44.2 -0.579 -0.315 0.422 0
3 65.6 1.56 1 41.4 -0.861 0.606 0.647 1
4 50.7 0.0705 1 59.7 0.973 -1.27 0.219 0
5 51.3 0.129 0 56.2 0.619 -0.777 0.315 1
6 67.2 1.72 1 63.9 1.39 -0.888 0.292 1
7 54.6 0.461 1 35.1 -1.49 0.595 0.644 0
8 37.3 -1.27 1 56.4 0.639 -1.58 0.171 0
9 43.1 -0.687 1 53.7 0.375 -1.16 0.238 0
10 45.5 -0.446 1 53.7 0.370 -1.06 0.257 0
# ℹ 9,990 more rows
In this case, A=1
represents treatment A and A=0
represents treatment B. Again, this is the actual treatment we would observe the patient receiving in a sample.
We need to define what the overlap weight is, and it’s actually quite simple: just assign the patient the probability they did not receive their observed treatment.
\[ \begin{equation} OW_i= \begin{cases} 1-p_i^A & \text{if } A_i=1\\ p_i^A & \text{if } A_i=0\\ \end{cases} \end{equation} \]
Notice the notation. Since it depends on the true propensity score, the overlap weight is another unknown quantity that is estimated from the data. It also depends on the realized treatment assignment, so the collection of weights across patients differ depending on the observed treatment distribution. We can add these weights to the population
data set:
# A tibble: 10,000 × 9
age zage male income zincome log_odds_pA pA A OW
<dbl> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <dbl>
1 44.4 -0.560 0 36.5 -1.35 0.311 0.577 1 0.423
2 47.7 -0.230 1 44.2 -0.579 -0.315 0.422 0 0.422
3 65.6 1.56 1 41.4 -0.861 0.606 0.647 1 0.353
4 50.7 0.0705 1 59.7 0.973 -1.27 0.219 0 0.219
5 51.3 0.129 0 56.2 0.619 -0.777 0.315 1 0.685
6 67.2 1.72 1 63.9 1.39 -0.888 0.292 1 0.708
7 54.6 0.461 1 35.1 -1.49 0.595 0.644 0 0.644
8 37.3 -1.27 1 56.4 0.639 -1.58 0.171 0 0.171
9 43.1 -0.687 1 53.7 0.375 -1.16 0.238 0 0.238
10 45.5 -0.446 1 53.7 0.370 -1.06 0.257 0 0.257
# ℹ 9,990 more rows
In a real analysis, these weights are what we are ultimately after in order to estimate the average causal treatment effect on the outcome of interest while balancing differences in patient characteristics across the groups (i.e., removing the confounding factors).
Now we do end up normalizing the weights so they sum to one within each treatment group. But the intuition about what’s happening is that the focus of the treatment effect estimation is shifted to patients that are most likely in either treatment group. This is known as the clinical equipoise, and the resulting treatment effect is interpreted as the average treatment effect in the overlap population.
This makes a lot of sense. When we’re comparing treatments, we should focus most heavily on patients that could be candidates for either, and less so on patients that were bound for one. Thus, we up-weight patients who have the most overlap in characteristics with the opposing treatment group. The beautiful thing here is that we do this without throwing away any information; smoothly and proportionately weighting each subject just the amount that we should.
We’ve generated the treatments and established how they are related to patient characteristics, but haven’t talked about the outcome in which we’re ultimately interested in estimating the treatment effect for. Since we’re focusing on time-to-event outcomes, we’ll stay in that context, but the general idea is that we assume there exists a realization of what a patient’s outcome would have been under each treatment scenario. Then if we compare the difference of those outcomes across all patients, the average difference must be caused by the treatment.
In this simulation, we’ll generate event times from a Weibull distribution. Starting at treatment initiation, this might be the time until cancer recurrence, hospitalization, or something else; we’re just looking to see how long it takes for some event to occur. The PDF for this distribution looks like this:
\[f(t) = \frac{\alpha}{\sigma}\left(\frac{t}{\sigma}\right)^{\alpha-1}e^{-\left(\frac{t}{\sigma}\right)^\alpha}\]
where \(t\) is the event time, \(\alpha\) is the shape parameter, and \(\sigma\) is the scale parameter. In our example, we will say that:
\[T_i^A \sim Weibull(\alpha, \sigma_i^A)\] \[T_i^B \sim Weibull(\alpha, \sigma_i^B)\] where \(T_i^A\) and \(T_i^B\) are the event times under treatments A and B, respectively. That is, the event times for each patient under each treatment follow a Weibull distribution with a common shape parameter (in this case, across both treatments), but a scale parameter that depends on the patient’s specific characteristics (in log-linear form) and differs by treatment, which captures the treatment effect. Specifically,
\[\alpha = 1\] \[\sigma_i^A = \lambda \times e^{-(\phi_i - 0.36)}\] \[\sigma_i^B = \lambda \times e^{-\phi_i}\] \[\phi_i = 1.10 \times age_z - 0.22 \times male - 0.36 \times income_z\] \[\lambda = 4055.56\]
Basically, the \(\phi_i\) term does the baseline adjustment on the outcome risk for the patient’s specific characteristics, and the treatment effect is simply a fixed, additive deviation from that for all patients. The \(\lambda\) term is the baseline hazard function, providing the scale parameter when all covariate values are zero (for treatment B), which in this case is constant over time.
When setting \(\alpha=1\), the mean of a Weibull distribution is equal to its scale parameter. That is,
\[E[T_i^A] = \sigma_i^A\] \[E[T_i^B] = \sigma_i^B\] This gives us a very nice interpretation of the treatment effect. If we compare them relatively, we get:
\[ \begin{equation} \begin{split} \frac{E[T_i^A]}{E[T_i^B]} & = \frac{\sigma_i^A}{\sigma_i^B} \\ & = \frac{\lambda \times e^{-(\phi_i - 0.36)}}{\lambda \times e^{-\phi_i}} \\ & = \frac{e^{-(\phi_i - 0.36)}}{e^{-\phi_i}} \\ & = \frac{e^{0.36}e^{-\phi_i}}{e^{-\phi_i}} \\ & = e^{0.36} \\ & \approx 1.43 \end{split} \end{equation} \]
So we can say that the time to event for patients on treatment A is 1.43 times, or 43%, longer on average than those on treatment B, given a fixed age, sex, and income. That is, treatment A is “better” than treatment B.
Next, we need to actually sample the times for our population
(note that it was confirmed that SAS and R have the same parameterizations):
population <-
population |>
# Sample outcome event times under each treatment (counterfactual)
mutate(
# Baseline hazard (constant over time)
lambda = 1*365/0.09,
# Define TRUE Weibull linear predictor
phi = log(3) * zage + log(0.8) * male + log(.70) * zincome,
sigma_A = lambda * exp(-(phi + log(0.70))),
sigma_B = lambda * exp(-phi),
# Generate REALIZED survival times under each treatment scenario (same parameterizations in SAS)
t_A = rweibull(n, shape = 1, scale = sigma_A),
t_B = rweibull(n, shape = 1, scale = sigma_B),
)
population |> select(lambda, phi, sigma_A, sigma_B, t_A, t_B)
# A tibble: 10,000 × 6
lambda phi sigma_A sigma_B t_A t_B
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4056. -0.133 6617. 4632. 5689. 7760.
2 4056. -0.269 7585. 5309. 1831. 9482.
3 4056. 1.80 961. 673. 488. 300.
4 4056. -0.493 9482. 6637. 23722. 255.
5 4056. -0.0788 6269. 4388. 7649. 1800.
6 4056. 1.17 1804. 1263. 1953. 1493.
7 4056. 0.814 2568. 1797. 5462. 2652.
8 4056. -1.84 36514. 25560. 38262. 2385.
9 4056. -1.11 17604. 12323. 3577. 6894.
10 4056. -0.845 13485. 9439. 38541. 3743.
# ℹ 9,990 more rows
Now we can make some density plots comparing the event time distributions across treatments (we’ll use log scaling for visual appeal):
population |>
# Send down the rows
pivot_longer(
cols = starts_with("t_"),
names_to = "Treatment",
values_to = "EventTime",
names_prefix = "t_"
) |>
# Make a plot
ggplot() +
geom_density(
aes(
x = log(EventTime),
fill = Treatment
),
alpha = .75
) +
scale_x_continuous(labels = function(x) round(exp(x))) +
theme(
panel.background = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
legend.position = "top"
) +
xlab("Event Time")
Notice the shift to the right in the distribution for treatment A. This is the treatment effect. In the hypothetical world where all patients received treatment A, the event times happened later than in the world where all patients received treatment B (and all event times were observed), indicating, on average, a “benefit” to treatment A, which was expected from our prior calculation.
The final step for simulation setup is to assign the event time as we might observe in a real data set. The event times generated in the previous section capture both treatment scenarios for all patients, but in reality we would only (potentially) observe the event time for the treatment the patient received. Additionally, we likely (or definitely) won’t be following all patients long enough to observe all events occur, meaning some patients will be censored. Let’s add the observed outcomes to the population
:
population <-
population |>
# Add the observed outcome
mutate(
# The ACTUAL event time outcome depends on the treatment ACTUALLY observed for the patient
actual_event_time = t_B * (1 - A) + t_A * A,
# Generate a REALIZED censoring time (completely random)
censor_time = 500 + 500 * runif(n),
# Calculate the OBSERVED time in the data set (known quantity)
time = pmin(actual_event_time, censor_time), # They either had the event, or were censored first
# Calculate the event status (TRUE if event observed, FALSE if censored)
status = as.numeric(actual_event_time < censor_time)
)
population |> select(actual_event_time, censor_time, time, status)
# A tibble: 10,000 × 4
actual_event_time censor_time time status
<dbl> <dbl> <dbl> <dbl>
1 5689. 843. 843. 0
2 9482. 814. 814. 0
3 488. 882. 488. 1
4 255. 936. 255. 1
5 7649. 653. 653. 0
6 1953. 890. 890. 0
7 2652. 737. 737. 0
8 2385. 703. 703. 0
9 6894. 952. 952. 0
10 3743. 526. 526. 0
# ℹ 9,990 more rows
Our final simulated data set has the following observed outcome summaries:
population |>
# Compute summary metrics
summarize(
Patients = n(),
Time = mean(time),
.by = c(A, status)
) |>
# Add shares within treatment
mutate(
Percent = Patients / sum(Patients),
.by = A
) |>
# Clean/rearrange
transmute(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Status =
case_when(
status == 1 ~ "Event",
TRUE ~ "Censored"
),
Patients,
Percent,
Time
) |>
arrange(Treatment, desc(Status)) |>
# Make a table
reactable(
groupBy = "Treatment",
columns =
list(
Patients = colDef(aggregate = "sum", align = "center"),
Percent = colDef(name = "Percent of patients in treatment group (%)", aggregate = "sum", align = "center", format = colFormat(digits = 1, percent = TRUE)),
Time = colDef(name = "Avg. time to outcome", aggregate = zildge::rectbl_agg_wtd("Patients"), align = "center", format = colFormat(digits = 1))
),
striped = TRUE,
highlight = TRUE,
bordered = TRUE,
resizable = TRUE,
theme = reactablefmtr::sandstone()
) |>
reactablefmtr::add_source("Use arrows to expand table", font_size = 12, font_style = "italic")
Use arrows to expand table
We can also look at the survival curves.
population |>
# Nest by treatment
group_by(A) |>
nest() |>
# Esimtate survival curves for each treatment
mutate(
Surv =
data |>
map(
function(.trt) {
# Fit the model
mod <- survfit(Surv(time, status) ~ 1, data = .trt)
# Extract the elements
tibble(
Time = mod$time,
Survival = mod$surv,
Lower = mod$lower,
Upper = mod$upper
)
}
)
) |>
# Unnest the curves
select(-data) |>
unnest(cols = Surv) |>
ungroup() |>
# Clean labels
mutate(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
)
) |>
# Make a plot
ggplot(
aes(
x = Time,
y = Survival
)
) +
geom_line(aes(color = Treatment)) +
geom_ribbon(
aes(
ymin = Lower,
ymax = Upper,
fill = Treatment
),
alpha = .25
) +
scale_y_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
panel.grid.major.y = element_line(color = "gray"),
legend.position = "top"
) +
xlab("Time since treatment") +
ylab("Survival Probability")
If we weren’t doing a simulation, we wouldn’t know how much of these differences are explained by the treatment. And actually, these crude survival curves suggest longer survival from treatment B, even though we know the opposite is true. The goal is to use this known information (along with patient characteristics) to estimate the causal treatment effect that is baked into all of the unknown quantities we have here that would be unavailable to us in a real-life analysis.
Now that our simulation is set, we can start using the observed data (i.e., known quantities) to estimate the case-weights needed for treatment effect estimation. This is the process we’d go through in a real-life analysis, but here we have the benefit of knowing the truth, so we can see how close our estimations are to reality.
The intention of these weights is to adjust the observed sample to create a pseudo-population such that the patient characteristics that we believe are confounding the treatment effect are corrected for, or balanced, across the treatment groups. The term “pseudo” here is a little misleading. It’s simply referring to the fact that each patient will not contribute the same weight in estimating the treatment effect. In fact, the patients with the most overlap in characteristics across the treatment groups will contribute the most, with patients being proportionately down-weighted the less overlap they have (I’d argue this is the same thing that is done in matching, it’s just that the weights for patients are either exactly 1 or 0). This is done to tease out the portion of the outcome differences that is explained by the treatment, namely, the causal treatment effect.
We’ve established that the overlap weights are quantities that need to be estimated from the data. In order to calculate those weights, we first need to estimate the propensity scores. We know what the true model is, but let’s assume we are only working with observable data:
# A tibble: 10,000 × 6
zage male zincome A time status
<dbl> <int> <dbl> <int> <dbl> <dbl>
1 -0.560 0 -1.35 1 843. 0
2 -0.230 1 -0.579 0 814. 0
3 1.56 1 -0.861 1 488. 1
4 0.0705 1 0.973 0 255. 1
5 0.129 0 0.619 1 653. 0
6 1.72 1 1.39 1 890. 0
7 0.461 1 -1.49 0 737. 0
8 -1.27 1 0.639 0 703. 0
9 -0.687 1 0.375 0 952. 0
10 -0.446 1 0.370 0 526. 0
# ℹ 9,990 more rows
Our goal is to obtain patient-specific probabilities of receiving treatment A. We’ll estimate this with a logistic regression model, but we’ll allow for some flexibility in the shape of the relationships between the continuous variables (age and income) and the outcome with the use of restricted cubic splines (see my other post for another explanation):
Call:
glm(formula = A ~ rms::rcs(zage, 3) + rms::rcs(zincome, 3) +
factor(male), family = "binomial", data = population)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.30027 0.06527 -4.600 4.22e-06 ***
rms::rcs(zage, 3)zage 0.40206 0.05099 7.885 3.16e-15 ***
rms::rcs(zage, 3)zage' -0.07337 0.05813 -1.262 0.207
rms::rcs(zincome, 3)zincome -0.63792 0.04950 -12.888 < 2e-16 ***
rms::rcs(zincome, 3)zincome' -0.04020 0.06434 -0.625 0.532
factor(male)1 -0.25432 0.04415 -5.760 8.41e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 13359 on 9999 degrees of freedom
Residual deviance: 12193 on 9994 degrees of freedom
AIC: 12205
Number of Fisher Scoring iterations: 4
The model seems to indicate that the non-linear terms for age and income don’t particularly matter (which is expected), but we’ll leave it as-is. Next, let’s attach the estimated propensity scores to the population
:
We can take a look at the estimated propensity score distribution across the treatment groups (we’ll also overlay the true distributions for comparison):
population |>
# Send PS scores down the rows
pivot_longer(
cols = c(pA, pA_hat),
names_to = "Type",
values_to = "Score"
) |>
# Clean labels
mutate(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Type =
case_when(
Type == "pA" ~ "True",
TRUE ~ "Estimated"
)
) |>
# Make a plot
ggplot() +
geom_density(
aes(
x = Score,
fill = Treatment,
linetype = Type
),
alpha = .40
) +
scale_x_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
legend.position = "top"
) +
xlab("P(A=1|X)")
The model does a great job at estimating the true propensity scores. In a real analysis we probably wouldn’t be this close since we likely wouldn’t have only and all true confounders accounted for.
In the same vein, we can explore the modeled relationships between each patient characteristic and the propensity scores.
plots <-
population |>
# Send PS scores down the rows
pivot_longer(
cols = c(pA, pA_hat),
names_to = "Type",
values_to = "Score"
) |>
# Clean labels
mutate(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Type =
case_when(
Type == "pA" ~ "True",
TRUE ~ "Estimated"
)
) |>
# Send covariates down the rows
pivot_longer(
cols = c(zage, zincome, male),
names_prefix = "^z"
) |>
# Make groups
mutate(
Group =
case_when(
name == "male" ~ "categorical",
TRUE ~ "continuous"
)
) |>
# Make a nested frame
group_by(Group) |>
nest() |>
# Make plot depending on type
mutate(
plot =
data |>
map(
function(.group) {
# Check condition
if(n_distinct(.group$value) == 2) {
.group |>
summarize(
Rate = mean(Score),
.by = c(Treatment, Type, name, value)
) |>
mutate(
Sex =
case_when(
value == 1 ~ "Male",
TRUE ~ "Female"
),
name = "Sex"
) |>
ggplot() +
geom_point(
aes(
x = Sex,
y = Rate,
color = Treatment,
shape = Type,
group = Type
)
) +
geom_line(
aes(
x = Sex,
y = Rate,
color = Treatment,
linetype = Type,
group = interaction(Type, Treatment)
)
) +
facet_wrap(~name) +
coord_flip() +
scale_y_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
panel.grid.major.x = element_line(color = "gray"),
legend.position = "top",
axis.title.y = element_blank()
) +
ylab("P(A=1|X)")
} else {
.group |>
ggplot() +
geom_smooth(
aes(
x = value,
y = Score,
color = Treatment,
fill = Treatment,
linetype = Type
),
alpha = .25
) +
facet_wrap(~name) +
scale_y_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
panel.grid.major.y = element_line(color = "gray"),
legend.position = "top"
) +
xlab("Z-Score") +
ylab("P(A=1|X)")
}
}
)
)
gridExtra::grid.arrange(plots$plot[[1]], plots$plot[[2]])
As known from the true model, patients who received treatment A tend to be older, female patients with lower income. We can also see slight miscalibration in the estimates for patients who are older in that the model tends to underestimate the true propensity score in these patients. We see similar miscalibration for sex.
We’ve already defined how to calculate the overlap weights. The only difference here is that we’ll do it from the estimated propensity scores instead of the true ones. First, we’ll apply the formula to add the estimated weights to the population
:
As we’ve mentioned earlier, we will then normalize the weights within each treatment group so they have the same cumulative contribution for estimating the treatment effect in the outcome model.
# A tibble: 10,000 × 7
A pA pA_hat OW OW_norm OW_hat OW_hat_norm
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0.577 0.583 0.423 0.000202 0.417 0.000198
2 0 0.422 0.427 0.422 0.000201 0.427 0.000202
3 1 0.647 0.610 0.353 0.000168 0.390 0.000185
4 0 0.219 0.226 0.219 0.000105 0.226 0.000107
5 1 0.315 0.329 0.685 0.000327 0.671 0.000318
6 1 0.292 0.265 0.708 0.000338 0.735 0.000348
7 0 0.644 0.628 0.644 0.000307 0.628 0.000297
8 0 0.171 0.181 0.171 0.0000814 0.181 0.0000856
9 0 0.238 0.250 0.238 0.000114 0.250 0.000118
10 0 0.257 0.268 0.257 0.000123 0.268 0.000127
# ℹ 9,990 more rows
To get a sense of the impact of these weights, let’s first look at their distributions (again, adding the true weights for comparison):
population |>
# Send overlap weight down the rows
pivot_longer(
cols = c(OW_norm, OW_hat_norm),
names_to = "Type",
values_to = "Weight"
) |>
# Clean labels
mutate(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Type =
case_when(
Type == "OW_norm" ~ "True",
TRUE ~ "Estimated"
)
) |>
# Make a plot
ggplot() +
geom_density(
aes(
x = Weight,
fill = Treatment,
linetype = Type
),
alpha = .40
) +
theme(
panel.background = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
legend.position = "top"
) +
xlab("Normalized Overlap Weight")
In aggregate, the groups will have equal weight. The distribution shift has to do with the sampled treatment distribution. Only 38.8% of patients have treatment A so at an individual level each patient will contribute more on average.
We can also look at the accumulation of patients (i.e., when each patient contributes equally) as a function of the cumulative overlap weight in each group. This allows us to understand how much (or little) the treatment effect estimation will be dominated by smaller concentrations of patients.
population |>
# Send overlap weight down the rows
pivot_longer(
cols = c(OW_norm, OW_hat_norm),
names_to = "Type",
values_to = "Weight"
) |>
# Clean labels
mutate(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Type =
case_when(
Type == "OW_norm" ~ "True",
TRUE ~ "Estimated"
),
NullWeight = 1
) |>
# Rearrange
arrange(Treatment, Type, Weight) |>
# Compute the cumulative weight
mutate(
Weight = cumsum(Weight) / sum(Weight),
NullWeight = cumsum(NullWeight) / sum(NullWeight),
.by = c(Treatment, Type)
) |>
# Make a plot
ggplot() +
geom_line(
aes(
x = Weight,
y = NullWeight,
color = Treatment,
linetype = Type
),
linewidth = 1
) +
scale_x_continuous(labels = scales::percent) +
scale_y_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
panel.grid.major.x = element_line(color = "gray"),
panel.grid.major.y = element_line(color = "gray"),
legend.position = "top"
) +
xlab("Cumulative percent of overlap weight (%)") +
ylab("Cumulative percent of patients (%)")
We can see that for treatment B, about 40% of the outcome model weight will be accounted for by only 25% of patients. The weights for treatment A are slightly more evenly spread across the patients.
It was previously mentioned that the overlap weight methodology leads to perfect balance among the covariates used in the propensity score model. We can verify by looking at the group means for each model factor before and after weighting.
population |>
# Add uniform weights
add_column(NullWeight = 1) |>
# Send weights down the rows
pivot_longer(
cols = c(NullWeight, OW_hat_norm),
names_to = "Type",
values_to = "Weight"
) |>
# Send factors down the rows
pivot_longer(
cols = c(age, income, male),
names_to = "Factor",
values_to = "Value"
) |>
# For each
summarize(
Patients = n(),
Mean = sum(Weight * Value) / sum(Weight),
.by = c(Factor, Type, A)
) |>
# Clean up
transmute(
Factor =
case_when(
Factor == "age" ~ "Age (years)",
Factor == "income" ~ "Income ($)",
Factor == "male" ~ "Male (%)"
),
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Type,
Patients,
Mean =
case_when(
Factor == "Male (%)" ~ 100 * Mean,
TRUE ~ Mean
)
) |>
# Send over the columns
pivot_wider(
names_from = Type,
values_from = c(Patients, Mean)
) |>
# Rearrange
arrange(Factor, Treatment) |>
select(Factor, Treatment, ends_with("NullWeight"), Mean_OW_hat_norm) |>
# Make a table
reactable(
groupBy = "Factor",
columns =
list(
Patients_NullWeight = colDef(name = "Patients", aggregate = "sum", align = "center"),
Mean_NullWeight = colDef(name = "Before Weighting", aggregate = zildge::rectbl_agg_wtd("Patients_NullWeight"), align = "center", format = colFormat(digits = 1)),
Mean_OW_hat_norm = colDef(name = "After Weighting", aggregate = zildge::rectbl_agg_wtd("Patients_NullWeight"), align = "center", format = colFormat(digits = 1))
),
columnGroups =
list(
colGroup(
name = "Mean Value",
columns = c("Mean_NullWeight", "Mean_OW_hat_norm")
)
),
striped = TRUE,
highlight = TRUE,
bordered = TRUE,
resizable = TRUE,
theme = reactablefmtr::sandstone()
) |>
reactablefmtr::add_source("Use arrows to expand table", font_size = 12, font_style = "italic")
Use arrows to expand table
We can see that, overall, the weighted sample is slightly older and female with lower income.
Similar to looking at weight attributions as a whole, we can explore weight shifts within subgroups of patient characteristics. This helps build intuition about which patients the subsequent treatment effect estimates will focus on most (and least).
population |>
# Add uniform weights
add_column(NullWeight = 1) |>
# Convert to quntiles
mutate(
across(
c(age, income),
\(x) Hmisc::cut2(x, g = 10)
),
Age = age,
Income = income,
Sex =
case_when(
male == 1 ~ "Male",
TRUE ~ "Female"
),
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
)
) |>
# Send weights down the rows
pivot_longer(
cols = c(NullWeight, OW_hat_norm),
names_to = "Type",
values_to = "Weight"
) |>
# Send factors down the rows
pivot_longer(
cols = c(Age, Income, Sex),
names_to = "Factor",
values_to = "Level"
) |>
# Compute total weights
summarize(
Weight = sum(Weight),
.by = c(Factor, Level, Type, Treatment)
) |>
# Find percent of weight
mutate(
Weight = Weight / sum(Weight),
.by = c(Factor, Type, Treatment)
) |>
# Send over columns
pivot_wider(
names_from = Type,
values_from = Weight
) |>
# Clean up
mutate(
Level = factor(Level) |> fct_rev(),
Factor =
case_when(
Factor == "Age" ~ "Age (years)",
Factor == "Income" ~ "Income ($)",
TRUE ~ Factor
)
) |>
# Make a plot
ggplot() +
geom_col(
aes(
x = Level,
y = NullWeight,
fill = Treatment
),
color = "black",
linewidth = .75,
alpha = .40,
position = "dodge"
) +
geom_point(
aes(
x = Level,
y = OW_hat_norm,
color = Treatment,
shape = Treatment
),
position = position_dodge(width = 1),
size = 3
) +
geom_line(
aes(
x = Level,
y = OW_hat_norm,
color = Treatment,
group = Treatment
),
position = position_dodge(width = 1),
linewidth = .5
) +
facet_wrap(~Factor, scales = "free") +
coord_flip() +
scale_y_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
panel.grid.major.x = element_line(color = "gray"),
legend.position = "top"
) +
xlab("Level") +
ylab("Share of weight within group (%)") +
labs(
fill = "Before Weighting",
shape = "After Weighting",
color = "After Weighting"
)
We can see the balancing take place. After overlap weighting, the contribution to treatment effect estimation is reduced in older patients with lower income for treatment A, and vice-versa for treatment B, with some levels in the middle having increased weight for both treatments. This is where the most overlap is between the groups.
It is also useful to look at these plots for factors that were not included in the propensity score model. We may find drastic weight shifts which might indicate significant co-variation among factors already accounted for.
We’re now ready to estimate the causal treatment effect. Although we generated our data from a Weibull distribution, we will use a Cox proportional-hazards model for estimation, which is a semi-parametric method, and tends to be the default choice for modeling survival data, especially in healthcare.
The treatment effect will be quantified by a hazard ratio, which is an estimate of the ratio of instantaneous event rates between the treatment groups (this is what is done in the original simulation). This measure is different than the effect we interpreted from the true data-generating process, so, although we won’t expect them to provide the same numerical result, we should still get a similar conclusion, in that treatment A is superior to treatment B.
To start, we can compute the actual hazard ratio had we been able to observe each potential outcome (i.e., the counterfactual). To do this, we need to transform our data such that we have two rows of data per patient–one for each treatment. Then, we fit the Cox model, using the true overlap weights as case weights.
# Fit the model
true_hr_mod <-
coxph(
formula = Surv(time, status) ~ factor(A),
data =
population |>
# Keep columns needed
select(
OW_norm, # True (normalized) overlap weight (could use un-normalized version here)
t_A, # True event time under treatment A
t_B, # True event time under treatment B
censor_time # When the patient was censored
) |>
# Send true event times down the rows
pivot_longer(
cols = c(t_A, t_B),
names_to = "A",
values_to = "time"
) |>
# Check if censored first
mutate(
A = case_when(A == "t_A" ~ 1, TRUE ~ 0), # Replace with our notation
time = pmin(time, censor_time),
status = as.numeric(time < censor_time)
),
weights = OW_norm
)
# Extract the result
true_hr <- exp(true_hr_mod$coefficients)[[1]]
true_hr
[1] 0.7284168
If we could compare the worlds in which we observe all patients under each treatment, the hazard ratio is 0.73, suggesting that the instantaneous event rate is 27% lower with treatment A than with treatment B. This aligns with what we’ve established as the true effect. Our hope is that the estimate from the observable data (in the next section) will contain this value within some margin of sampling error.
Finally, we can obtain our estimate of the treatment effect using only the observable quantities we have in our data set, which is what we would have in a real-life analysis. Similar to the previous section, we are just fitting a Cox model with the treatment as a covariate. Except this time we only observe one outcome per patient as a result of the treatment they actually received, and use the estimated overlap weight to balance the confounding factors (age, sex, and income). We will also use robust standard error estimates.
[1] 0.7423806
2.5 % 97.5 %
factor(A)1 0.6756889 0.8156548
Our estimate for the hazard ratio is 0.74, with a 95% confidence interval ranging from 0.68 to 0.82, suggesting that the instantaneous event rate for treatment A is between 18% and 32% lower compared to treatment B (note that this captures the true hazard ratio). Under the assumption that we’ve correctly specified all confounding factors (which we did), we can interpret this as the causal treatment effect, and, assuming the magnitude of improvement is of practical importance (when considering things like cost, resources, clinical outcomes, etc.), we can reliably conclude that there is a benefit to treatment A over treatment B, on average.
Some additional resources I’ve come across while learning the methodology:
---
title: "The overlap weight in survival analysis"
description: "A simulation review"
author: "Alex Zajichek"
date: "10/2/2023"
image: "feature.png"
categories:
- Causal Inference
- Survival Analysis
- Propensity Scores
- Weighting
format:
html:
code-fold: true
code-tools: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```
I was recently introduced to [overlap weighting](https://jamanetwork.com/journals/jama/article-abstract/2765748), which is part of a general family of methods for balancing covariates when estimating treatment effects with observational data. Specifically, it focuses on the [_clinical equipoise_](https://www.sciencedirect.com/topics/medicine-and-dentistry/clinical-equipoise#:~:text=Clinical%20equipoise%20is%20defined%20as,Seminars%20in%20Vascular%20Surgery%2C%202022); that is, the patients in which the treatment decision is most uncertain. I find it more elegant than [matching](https://en.wikipedia.org/wiki/Propensity_score_matching) (which I've [used in the past](https://jamanetwork.com/journals/jama/fullarticle/2749478)), and figured a useful way to better understand the approach is to deconstruct, interpret, and translate a [SAS simulation](http://www2.stat.duke.edu/~fl35/OW/OW_survival_Demo.sas) in the context of survival analysis implemented by the [original authors](https://pubmed.ncbi.nlm.nih.gov/30189042/). All code is written in (and translated to) [`R`](https://www.r-project.org/). We'll start by loading some packages.
```{r}
library(tidyverse)
library(survival)
library(reactable)
```
# Table of Contents
* [Setting up the potential outcomes](#potentialoutcomes)
+ [Simulate some patients](#simulatepatients)
+ [Define the treatment propensity](#treatmentpropensity)
+ [Generate the treatment assignment](#treatmentassignment)
+ [Compute the (true) overlap weight](#trueoverlapweight)
+ [Set the treatment effect](#treatmenteffect)
+ [Assign the observed outcome](#observedoutcome)
* [Estimating the weights](#estimatingweights)
+ [Model the propensity scores](#modelpropensityscores)
+ [Calculate the (estimated) overlap weight](#estimatedoverlapweight)
* [Estimating the treatment effect](#estimatingtreatmenteffect)
+ [A look at the true hazard ratio](#truehazardratio)
+ [The final estimate](#estimatedhazardratio)
# Setting up the potential outcomes {#potentialoutcomes}
The theoretical underpinnings of [overlap weighting](https://pubmed.ncbi.nlm.nih.gov/30189042/) live in the context of the [potential outcomes](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5841618/#:~:text=The%20potential%20outcomes%20framework%20provides%20a%20way%20to%20quantify%20causal,exposure%20or%20intervention%20under%20consideration.) paradigm of [causal inference](https://www.sciencedirect.com/topics/social-sciences/causal-inference). Basically, we wonder what _would have_ happened had we been able to observe each patient under both treatments (known as the [counterfactual](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/1471-2288-5-28)). If we knew the outcomes from the two worlds, the causal treatment effect would simply be the average difference across all patients. The problem of course is that in reality the outcome can only be observed for the treatment in which the patient was assigned (or chose), and so our goal is to work with the "incomplete" information that we do have to try to estimate what the outcome difference would have been had the counterfactual been observed as well.
## Simulate some patients {#simulatepatients}
The first thing we need to do is generate some (fake) patients to facilitate the simulation. Here we'll focus on age, sex, and income as patient-identifying characteristics.
```{r}
# Set the sample size
n <- 10000
# Set the seed
set.seed(123)
# Build the population
population <-
tibble(
age = rnorm(n, mean = 50, sd = 10),
zage = (age - 50) / 10,
male = rbinom(n, size = 1, prob = 0.6),
income = rnorm(n, mean = 50, sd = 10),
zincome = (income - 50) / 10
)
population
```
We've generated a sample of `r paste(n)` patients. We'll assume `age` is measured in years, `income` in thousands of dollars (\$), and `male` is 1 for _Male_ and 0 for _Female_. The columns `zage` and `zincome` are just [standardized](https://statisticsbyjim.com/glossary/standardization/#:~:text=In%20statistics%2C%20standardization%20is%20the,standard%20deviation%20for%20a%20variable.) versions of the originals to avoid scaling nuances (we'll refer to these as $age_z$ and $income_z$).
## Define the treatment propensity {#treatmentpropensity}
Here's where the important theory starts to creep in (already). We assume that there is a _true_ propensity score, say $p_i^A$, that is the _true_ probability that patient _i_ receives treatment _A_ given their specific characteristics (and let's assume there are possible treatments _A_ & _B_). Further, we assume (some transformation of) this probability is a linear combination of all the characteristics that confound the crude treatment effect on the outcome. In this simulation, we'll assume the [logit](https://en.wikipedia.org/wiki/Logistic_regression) model:
$$log(\frac{p_i^A}{1 - p_i^A}) = 0.41 \times age_z - 0.22 \times male - 0.69 \times income_z - 0.40$$
So let's add this _true_ propensity score to the `population` data set:
```{r}
population <-
population |>
# Add the true propensity score (unknown quantity)
mutate(
log_odds_pA = -0.4 + log(1.5)*zage + log(0.8)*male + log(.50)*zincome,
pA = 1 / (1 + exp(-log_odds_pA))
)
population
```
Obviously in practice we don't know what $p_i^A$ is, so it must be _estimated_ from the treatment assignments we observe in the data. Additionally, we aren't strictly required to assume the [logit](https://en.wikipedia.org/wiki/Logistic_regression) model, but when we use it to estimate the propensity score for [overlap weighting](https://pubmed.ncbi.nlm.nih.gov/30189042/), it has the advantageous property of perfect balance. That is, the weighted-mean differences for all covariates included in the logistic regression model will be zero across the treatment groups.
## Generate the treatment assignment {#treatmentassignment}
The treatment assignment is one of the _known_ quantities we would observe in a real life sample, and that is what would be used as the dependent variable for _estimating_ propensity scores. However, since we are running a simulation, we need to generate the treatment assignments from the governing distribution. We assume that the observed treatment for patient _i_ is the result of an (unfair) coin flip that has a probability equal to the [true propensity score](#treatmentpropensity). In statistical notation,
$$A_i \sim Bernoulli(p_i^A)$$
where $A_i$ is the indicator of treatment A. Let's add a _realized_ treatment assignment to the `population`:
```{r}
population <-
population |>
# Add the realized treatment assignment (known quantity)
mutate(
A = rbinom(n, size = 1, prob = pA),
)
population
```
In this case, `A=1` represents treatment _A_ and `A=0` represents treatment _B_. Again, this is the actual treatment we would observe the patient receiving in a sample.
## Compute the (true) overlap weight {#trueoverlapweight}
We need to define what the overlap weight is, and it's actually quite simple: just assign the patient the probability they _did not_ receive their observed treatment.
$$
\begin{equation}
OW_i=
\begin{cases}
1-p_i^A & \text{if } A_i=1\\
p_i^A & \text{if } A_i=0\\
\end{cases}
\end{equation}
$$
Notice the notation. Since it depends on the [true propensity score](#treatmentpropensity), the overlap weight is another _unknown_ quantity that is estimated from the data. It also depends on the [realized treatment assignment](#treatmentassignment), so the collection of weights across patients differ depending on the observed treatment distribution. We can add these weights to the `population` data set:
```{r}
population <-
population |>
# Add the true overlap weight for the realized treatment (unknown quantity)
mutate(
OW = A * (1-pA) + (1-A) * pA
)
population
```
In a real analysis, these weights are what we are ultimately after in order to estimate the average _causal_ treatment effect on the outcome of interest while balancing differences in patient characteristics across the groups (i.e., removing the confounding factors).
### Target population {#targetpopulation}
Now we do end up normalizing the weights so they sum to one within each treatment group. But the intuition about what's happening is that the focus of the treatment effect estimation is shifted to patients that are most likely in _either_ treatment group. This is known as the [_clinical equipoise_](https://www.sciencedirect.com/topics/medicine-and-dentistry/clinical-equipoise#:~:text=Clinical%20equipoise%20is%20defined%20as,Seminars%20in%20Vascular%20Surgery%2C%202022), and the resulting treatment effect is interpreted as the _average treatment effect in the overlap population_.
This makes a lot of sense. When we're comparing treatments, we should focus most heavily on patients that could be candidates for either, and less so on patients that were bound for one. Thus, we up-weight patients who have the most overlap in characteristics with the opposing treatment group. The beautiful thing here is that we do this without throwing away any information; smoothly and proportionately weighting each subject just the amount that we should.
## Set the treatment effect {#treatmenteffect}
We've [generated the treatments](#treatmentassignment) and established how they are [related to patient characteristics](#treatmentpropensity), but haven't talked about the outcome in which we're ultimately interested in estimating the treatment effect for. Since we're focusing on [time-to-event](https://www.publichealth.columbia.edu/research/population-health-methods/time-event-data-analysis) outcomes, we'll stay in that context, but the general idea is that we assume there _exists_ a realization of what a patient's outcome would have been under each treatment scenario. Then if we compare the difference of those outcomes across all patients, the average difference must be caused by the treatment.
### Defining the event times
In this simulation, we'll generate event times from a [Weibull](https://en.wikipedia.org/wiki/Weibull_distribution) distribution. Starting at treatment initiation, this might be the time until cancer recurrence, hospitalization, or something else; we're just looking to see how long it takes for some event to occur. The [PDF](https://en.wikipedia.org/wiki/Probability_density_function) for this distribution looks [like this](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/Weibull.html):
$$f(t) = \frac{\alpha}{\sigma}\left(\frac{t}{\sigma}\right)^{\alpha-1}e^{-\left(\frac{t}{\sigma}\right)^\alpha}$$
where $t$ is the event time, $\alpha$ is the _shape_ parameter, and $\sigma$ is the _scale_ parameter. In our example, we will say that:
$$T_i^A \sim Weibull(\alpha, \sigma_i^A)$$
$$T_i^B \sim Weibull(\alpha, \sigma_i^B)$$
where $T_i^A$ and $T_i^B$ are the event times under treatments _A_ and _B_, respectively. That is, the event times for each patient under each treatment follow a [Weibull](https://en.wikipedia.org/wiki/Weibull_distribution) distribution with a common _shape_ parameter (in this case, across both treatments), but a _scale_ parameter that depends on the patient's specific characteristics (in _log-linear_ form) and differs by treatment, which captures the treatment effect. Specifically,
$$\alpha = 1$$
$$\sigma_i^A = \lambda \times e^{-(\phi_i - 0.36)}$$
$$\sigma_i^B = \lambda \times e^{-\phi_i}$$
$$\phi_i = 1.10 \times age_z - 0.22 \times male - 0.36 \times income_z$$
$$\lambda = 4055.56$$
Basically, the $\phi_i$ term does the baseline adjustment on the outcome risk for the patient's specific characteristics, and the treatment effect is simply a fixed, additive deviation from that for all patients. The $\lambda$ term is the [baseline hazard function](https://www.linkedin.com/advice/0/how-do-you-interpret-hazard-ratio-baseline-function), providing the _scale_ parameter when all covariate values are zero (for treatment _B_), which in this case is constant over time.
#### Interpreting the treatment effect {#interpretingtreatmenteffect}
When setting $\alpha=1$, the mean of a [Weibull](https://en.wikipedia.org/wiki/Weibull_distribution) distribution is equal to its scale parameter. That is,
$$E[T_i^A] = \sigma_i^A$$
$$E[T_i^B] = \sigma_i^B$$
This gives us a very nice interpretation of the treatment effect. If we compare them _relatively_, we get:
$$
\begin{equation}
\begin{split}
\frac{E[T_i^A]}{E[T_i^B]} & = \frac{\sigma_i^A}{\sigma_i^B} \\
& = \frac{\lambda \times e^{-(\phi_i - 0.36)}}{\lambda \times e^{-\phi_i}} \\
& = \frac{e^{-(\phi_i - 0.36)}}{e^{-\phi_i}} \\
& = \frac{e^{0.36}e^{-\phi_i}}{e^{-\phi_i}} \\
& = e^{0.36} \\
& \approx 1.43
\end{split}
\end{equation}
$$
So we can say that the _time to event for patients on treatment A is 1.43 times, or 43%, longer on average than those on treatment B, given a fixed age, sex, and income_. That is, treatment A is "better" than treatment B.
### Sampling the event times {#samplingeventtimes}
Next, we need to actually sample the times for our `population` (_note that it was confirmed that SAS and R have the same parameterizations_):
```{r}
population <-
population |>
# Sample outcome event times under each treatment (counterfactual)
mutate(
# Baseline hazard (constant over time)
lambda = 1*365/0.09,
# Define TRUE Weibull linear predictor
phi = log(3) * zage + log(0.8) * male + log(.70) * zincome,
sigma_A = lambda * exp(-(phi + log(0.70))),
sigma_B = lambda * exp(-phi),
# Generate REALIZED survival times under each treatment scenario (same parameterizations in SAS)
t_A = rweibull(n, shape = 1, scale = sigma_A),
t_B = rweibull(n, shape = 1, scale = sigma_B),
)
population |> select(lambda, phi, sigma_A, sigma_B, t_A, t_B)
```
Now we can make some density plots comparing the event time distributions across treatments (we'll use _log_ scaling for visual appeal):
```{r}
population |>
# Send down the rows
pivot_longer(
cols = starts_with("t_"),
names_to = "Treatment",
values_to = "EventTime",
names_prefix = "t_"
) |>
# Make a plot
ggplot() +
geom_density(
aes(
x = log(EventTime),
fill = Treatment
),
alpha = .75
) +
scale_x_continuous(labels = function(x) round(exp(x))) +
theme(
panel.background = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
legend.position = "top"
) +
xlab("Event Time")
```
Notice the shift to the right in the distribution for treatment _A_. This is the treatment effect. In the hypothetical world where all patients received treatment _A_, the event times happened _later_ than in the world where all patients received treatment _B_ (and all event times were observed), indicating, on average, a "benefit" to treatment _A_, which was expected from our [prior calculation](#interpretingtreatmenteffect).
## Assign the observed outcome {#observedoutcome}
The final step for simulation setup is to assign the event time as we might observe in a real data set. The event times generated in the [previous section](#samplingeventtimes) capture both treatment scenarios for all patients, but in reality we would only (potentially) observe the event time for the [treatment the patient received](#treatmentassignment). Additionally, we likely (or definitely) won't be following all patients long enough to observe all events occur, meaning some patients will be [censored](https://en.wikipedia.org/wiki/Survival_analysis#:~:text=Censoring%20%2F%20Censored%20observation%3A%20Censoring%20occurs,after%20the%20time%20of%20censoring.). Let's add the observed outcomes to the `population`:
```{r}
population <-
population |>
# Add the observed outcome
mutate(
# The ACTUAL event time outcome depends on the treatment ACTUALLY observed for the patient
actual_event_time = t_B * (1 - A) + t_A * A,
# Generate a REALIZED censoring time (completely random)
censor_time = 500 + 500 * runif(n),
# Calculate the OBSERVED time in the data set (known quantity)
time = pmin(actual_event_time, censor_time), # They either had the event, or were censored first
# Calculate the event status (TRUE if event observed, FALSE if censored)
status = as.numeric(actual_event_time < censor_time)
)
population |> select(actual_event_time, censor_time, time, status)
```
Our final simulated data set has the following _observed_ outcome summaries:
```{r}
population |>
# Compute summary metrics
summarize(
Patients = n(),
Time = mean(time),
.by = c(A, status)
) |>
# Add shares within treatment
mutate(
Percent = Patients / sum(Patients),
.by = A
) |>
# Clean/rearrange
transmute(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Status =
case_when(
status == 1 ~ "Event",
TRUE ~ "Censored"
),
Patients,
Percent,
Time
) |>
arrange(Treatment, desc(Status)) |>
# Make a table
reactable(
groupBy = "Treatment",
columns =
list(
Patients = colDef(aggregate = "sum", align = "center"),
Percent = colDef(name = "Percent of patients in treatment group (%)", aggregate = "sum", align = "center", format = colFormat(digits = 1, percent = TRUE)),
Time = colDef(name = "Avg. time to outcome", aggregate = zildge::rectbl_agg_wtd("Patients"), align = "center", format = colFormat(digits = 1))
),
striped = TRUE,
highlight = TRUE,
bordered = TRUE,
resizable = TRUE,
theme = reactablefmtr::sandstone()
) |>
reactablefmtr::add_source("Use arrows to expand table", font_size = 12, font_style = "italic")
```
<br>
We can also look at the survival curves.
```{r}
population |>
# Nest by treatment
group_by(A) |>
nest() |>
# Esimtate survival curves for each treatment
mutate(
Surv =
data |>
map(
function(.trt) {
# Fit the model
mod <- survfit(Surv(time, status) ~ 1, data = .trt)
# Extract the elements
tibble(
Time = mod$time,
Survival = mod$surv,
Lower = mod$lower,
Upper = mod$upper
)
}
)
) |>
# Unnest the curves
select(-data) |>
unnest(cols = Surv) |>
ungroup() |>
# Clean labels
mutate(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
)
) |>
# Make a plot
ggplot(
aes(
x = Time,
y = Survival
)
) +
geom_line(aes(color = Treatment)) +
geom_ribbon(
aes(
ymin = Lower,
ymax = Upper,
fill = Treatment
),
alpha = .25
) +
scale_y_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
panel.grid.major.y = element_line(color = "gray"),
legend.position = "top"
) +
xlab("Time since treatment") +
ylab("Survival Probability")
```
If we weren't doing a simulation, we wouldn't know how much of these differences are explained by the treatment. And actually, these crude survival curves suggest longer survival from treatment _B_, even though we know the [opposite is true](#treatmenteffect). The goal is to use this _known_ information (along with [patient characteristics](#simulatepatients)) to estimate the causal treatment effect that is baked into all of the _unknown_ quantities we have here that would be unavailable to us in a real-life analysis.
# Estimating the weights {#estimatingweights}
Now that our [simulation is set](#potentialoutcomes), we can start using the observed data (i.e., known quantities) to estimate the case-weights needed for treatment effect estimation. This is the process we'd go through in a real-life analysis, but here we have the benefit of knowing the truth, so we can see how close our estimations are to reality.
The intention of these weights is to _adjust_ the observed sample to create a _pseudo-population_ such that the patient characteristics that we believe are confounding the treatment effect are corrected for, or _balanced_, across the treatment groups. The term "pseudo" here is a little misleading. It's simply referring to the fact that each patient will not contribute the same weight in estimating the treatment effect. In fact, the patients with the most _overlap_ in characteristics across the treatment groups will contribute the most, with patients being proportionately down-weighted the less overlap they have (I'd argue this is the same thing that is done in [matching](https://en.wikipedia.org/wiki/Propensity_score_matching), it's just that the weights for patients are either exactly _1_ or _0_). This is done to tease out the portion of the outcome differences that is explained by the treatment, namely, the causal treatment effect.
## Model the propensity scores {#modelpropensityscores}
We've [established](#trueoverlapweight) that the overlap weights are quantities that need to be estimated from the data. In order to calculate those weights, we first need to estimate the propensity scores. We know what the [true model](#treatmentpropensity) is, but let's assume we are only working with observable data:
```{r}
population |> select(zage, male, zincome, A, time, status)
```
Our goal is to obtain patient-specific probabilities of receiving treatment _A_. We'll estimate this with a [logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) model, but we'll allow for some flexibility in the shape of the relationships between the continuous variables (age and income) and the outcome with the use of [restricted cubic splines](https://www.nature.com/articles/s41409-019-0679-x) (see my [other post](https://www.zajichekstats.com/post/the-evasive-spline/) for another explanation):
```{r}
# Fit the propensity score model
ps_mod <-
glm(
formula = A ~ rms::rcs(zage, 3) + rms::rcs(zincome, 3) + factor(male),
data = population,
family = "binomial"
)
summary(ps_mod)
```
The model seems to indicate that the non-linear terms for age and income don't particularly matter (which [is expected](#treatmentpropensity)), but we'll leave it as-is. Next, let's attach the _estimated_ propensity scores to the `population`:
```{r}
population <-
population |>
# Add the estimated propensity score
mutate(
pA_hat = predict(ps_mod, type = "response") # P(A = 1 | X)
)
```
We can take a look at the estimated propensity score distribution across the treatment groups (we'll also overlay the _true_ distributions for comparison):
```{r}
population |>
# Send PS scores down the rows
pivot_longer(
cols = c(pA, pA_hat),
names_to = "Type",
values_to = "Score"
) |>
# Clean labels
mutate(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Type =
case_when(
Type == "pA" ~ "True",
TRUE ~ "Estimated"
)
) |>
# Make a plot
ggplot() +
geom_density(
aes(
x = Score,
fill = Treatment,
linetype = Type
),
alpha = .40
) +
scale_x_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
legend.position = "top"
) +
xlab("P(A=1|X)")
```
The model does a great job at estimating the true propensity scores. In a real analysis we probably wouldn't be this close since we likely wouldn't have only and all true confounders accounted for.
### Visualizing propensity score effects
In the same vein, we can explore the modeled relationships between each patient characteristic and the propensity scores.
```{r}
plots <-
population |>
# Send PS scores down the rows
pivot_longer(
cols = c(pA, pA_hat),
names_to = "Type",
values_to = "Score"
) |>
# Clean labels
mutate(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Type =
case_when(
Type == "pA" ~ "True",
TRUE ~ "Estimated"
)
) |>
# Send covariates down the rows
pivot_longer(
cols = c(zage, zincome, male),
names_prefix = "^z"
) |>
# Make groups
mutate(
Group =
case_when(
name == "male" ~ "categorical",
TRUE ~ "continuous"
)
) |>
# Make a nested frame
group_by(Group) |>
nest() |>
# Make plot depending on type
mutate(
plot =
data |>
map(
function(.group) {
# Check condition
if(n_distinct(.group$value) == 2) {
.group |>
summarize(
Rate = mean(Score),
.by = c(Treatment, Type, name, value)
) |>
mutate(
Sex =
case_when(
value == 1 ~ "Male",
TRUE ~ "Female"
),
name = "Sex"
) |>
ggplot() +
geom_point(
aes(
x = Sex,
y = Rate,
color = Treatment,
shape = Type,
group = Type
)
) +
geom_line(
aes(
x = Sex,
y = Rate,
color = Treatment,
linetype = Type,
group = interaction(Type, Treatment)
)
) +
facet_wrap(~name) +
coord_flip() +
scale_y_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
panel.grid.major.x = element_line(color = "gray"),
legend.position = "top",
axis.title.y = element_blank()
) +
ylab("P(A=1|X)")
} else {
.group |>
ggplot() +
geom_smooth(
aes(
x = value,
y = Score,
color = Treatment,
fill = Treatment,
linetype = Type
),
alpha = .25
) +
facet_wrap(~name) +
scale_y_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
panel.grid.major.y = element_line(color = "gray"),
legend.position = "top"
) +
xlab("Z-Score") +
ylab("P(A=1|X)")
}
}
)
)
gridExtra::grid.arrange(plots$plot[[1]], plots$plot[[2]])
```
As known from the [true model](#treatmentpropensity), patients who received treatment _A_ tend to be older, female patients with lower income. We can also see slight miscalibration in the estimates for patients who are older in that the model tends to _underestimate_ the true propensity score in these patients. We see similar miscalibration for sex.
## Calculate the (estimated) overlap weight {#estimatedoverlapweight}
We've already [defined](#trueoverlapweight) how to calculate the overlap weights. The only difference here is that we'll do it from the [_estimated_](#modelpropensityscores) propensity scores instead of the [true](#treatmentpropensity) ones. First, we'll apply the formula to add the estimated weights to the `population`:
```{r}
population <-
population |>
# Add the estimated overlap weight for the realized treatment (known quantity)
mutate(
OW_hat = A * (1-pA_hat) + (1-A) * pA_hat
)
```
As we've [mentioned earlier](#targetpopulation), we will then _normalize_ the weights _within_ each treatment group so they have the same cumulative contribution for estimating the treatment effect in the outcome model.
```{r}
population <-
population |>
# Add the normalized weights (adding true and estimated)
mutate(
OW_norm = OW / sum(OW),
OW_hat_norm = OW_hat / sum(OW_hat),
.by = A
)
population |> select(A, pA, pA_hat, OW, OW_norm, OW_hat, OW_hat_norm)
```
To get a sense of the impact of these weights, let's first look at their distributions (again, adding the [true](#trueoverlapweight) weights for comparison):
```{r}
population |>
# Send overlap weight down the rows
pivot_longer(
cols = c(OW_norm, OW_hat_norm),
names_to = "Type",
values_to = "Weight"
) |>
# Clean labels
mutate(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Type =
case_when(
Type == "OW_norm" ~ "True",
TRUE ~ "Estimated"
)
) |>
# Make a plot
ggplot() +
geom_density(
aes(
x = Weight,
fill = Treatment,
linetype = Type
),
alpha = .40
) +
theme(
panel.background = element_blank(),
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
axis.title.y = element_blank(),
legend.position = "top"
) +
xlab("Normalized Overlap Weight")
```
In aggregate, the groups will have equal weight. The distribution shift has to do with the sampled treatment distribution. Only `r paste0(round(mean(population$A)*100, 1), "%")` of patients have treatment _A_ so at an individual level each patient will contribute more on average.
### Cumulative patient contribution
We can also look at the accumulation of _patients_ (i.e., when each patient contributes equally) as a function of the cumulative _overlap weight_ in each group. This allows us to understand how much (or little) the treatment effect estimation will be dominated by smaller concentrations of patients.
```{r}
population |>
# Send overlap weight down the rows
pivot_longer(
cols = c(OW_norm, OW_hat_norm),
names_to = "Type",
values_to = "Weight"
) |>
# Clean labels
mutate(
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Type =
case_when(
Type == "OW_norm" ~ "True",
TRUE ~ "Estimated"
),
NullWeight = 1
) |>
# Rearrange
arrange(Treatment, Type, Weight) |>
# Compute the cumulative weight
mutate(
Weight = cumsum(Weight) / sum(Weight),
NullWeight = cumsum(NullWeight) / sum(NullWeight),
.by = c(Treatment, Type)
) |>
# Make a plot
ggplot() +
geom_line(
aes(
x = Weight,
y = NullWeight,
color = Treatment,
linetype = Type
),
linewidth = 1
) +
scale_x_continuous(labels = scales::percent) +
scale_y_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
panel.grid.major.x = element_line(color = "gray"),
panel.grid.major.y = element_line(color = "gray"),
legend.position = "top"
) +
xlab("Cumulative percent of overlap weight (%)") +
ylab("Cumulative percent of patients (%)")
```
We can see that for treatment _B_, about 40% of the outcome model weight will be accounted for by only 25% of patients. The weights for treatment _A_ are slightly more evenly spread across the patients.
### Weighted-mean differences
It was [previously mentioned](#treatmentpropensity) that the overlap weight methodology leads to perfect balance among the covariates used in the propensity score model. We can verify by looking at the group means for [each model factor](#modelpropensityscores) before and after weighting.
```{r}
population |>
# Add uniform weights
add_column(NullWeight = 1) |>
# Send weights down the rows
pivot_longer(
cols = c(NullWeight, OW_hat_norm),
names_to = "Type",
values_to = "Weight"
) |>
# Send factors down the rows
pivot_longer(
cols = c(age, income, male),
names_to = "Factor",
values_to = "Value"
) |>
# For each
summarize(
Patients = n(),
Mean = sum(Weight * Value) / sum(Weight),
.by = c(Factor, Type, A)
) |>
# Clean up
transmute(
Factor =
case_when(
Factor == "age" ~ "Age (years)",
Factor == "income" ~ "Income ($)",
Factor == "male" ~ "Male (%)"
),
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
),
Type,
Patients,
Mean =
case_when(
Factor == "Male (%)" ~ 100 * Mean,
TRUE ~ Mean
)
) |>
# Send over the columns
pivot_wider(
names_from = Type,
values_from = c(Patients, Mean)
) |>
# Rearrange
arrange(Factor, Treatment) |>
select(Factor, Treatment, ends_with("NullWeight"), Mean_OW_hat_norm) |>
# Make a table
reactable(
groupBy = "Factor",
columns =
list(
Patients_NullWeight = colDef(name = "Patients", aggregate = "sum", align = "center"),
Mean_NullWeight = colDef(name = "Before Weighting", aggregate = zildge::rectbl_agg_wtd("Patients_NullWeight"), align = "center", format = colFormat(digits = 1)),
Mean_OW_hat_norm = colDef(name = "After Weighting", aggregate = zildge::rectbl_agg_wtd("Patients_NullWeight"), align = "center", format = colFormat(digits = 1))
),
columnGroups =
list(
colGroup(
name = "Mean Value",
columns = c("Mean_NullWeight", "Mean_OW_hat_norm")
)
),
striped = TRUE,
highlight = TRUE,
bordered = TRUE,
resizable = TRUE,
theme = reactablefmtr::sandstone()
) |>
reactablefmtr::add_source("Use arrows to expand table", font_size = 12, font_style = "italic")
```
We can see that, overall, the weighted sample is _slightly_ older and female with lower income.
### Confounder weight shifts
Similar to looking at weight attributions [as a whole](#estimatedoverlapweight), we can explore weight shifts within subgroups of patient characteristics. This helps build intuition about which patients the subsequent treatment effect estimates will focus on most (and least).
```{r}
population |>
# Add uniform weights
add_column(NullWeight = 1) |>
# Convert to quntiles
mutate(
across(
c(age, income),
\(x) Hmisc::cut2(x, g = 10)
),
Age = age,
Income = income,
Sex =
case_when(
male == 1 ~ "Male",
TRUE ~ "Female"
),
Treatment =
case_when(
A == 1 ~ "A",
TRUE ~ "B"
)
) |>
# Send weights down the rows
pivot_longer(
cols = c(NullWeight, OW_hat_norm),
names_to = "Type",
values_to = "Weight"
) |>
# Send factors down the rows
pivot_longer(
cols = c(Age, Income, Sex),
names_to = "Factor",
values_to = "Level"
) |>
# Compute total weights
summarize(
Weight = sum(Weight),
.by = c(Factor, Level, Type, Treatment)
) |>
# Find percent of weight
mutate(
Weight = Weight / sum(Weight),
.by = c(Factor, Type, Treatment)
) |>
# Send over columns
pivot_wider(
names_from = Type,
values_from = Weight
) |>
# Clean up
mutate(
Level = factor(Level) |> fct_rev(),
Factor =
case_when(
Factor == "Age" ~ "Age (years)",
Factor == "Income" ~ "Income ($)",
TRUE ~ Factor
)
) |>
# Make a plot
ggplot() +
geom_col(
aes(
x = Level,
y = NullWeight,
fill = Treatment
),
color = "black",
linewidth = .75,
alpha = .40,
position = "dodge"
) +
geom_point(
aes(
x = Level,
y = OW_hat_norm,
color = Treatment,
shape = Treatment
),
position = position_dodge(width = 1),
size = 3
) +
geom_line(
aes(
x = Level,
y = OW_hat_norm,
color = Treatment,
group = Treatment
),
position = position_dodge(width = 1),
linewidth = .5
) +
facet_wrap(~Factor, scales = "free") +
coord_flip() +
scale_y_continuous(labels = scales::percent) +
theme(
panel.background = element_blank(),
panel.grid.major.x = element_line(color = "gray"),
legend.position = "top"
) +
xlab("Level") +
ylab("Share of weight within group (%)") +
labs(
fill = "Before Weighting",
shape = "After Weighting",
color = "After Weighting"
)
```
We can see the balancing take place. After overlap weighting, the contribution to treatment effect estimation is reduced in older patients with lower income for treatment _A_, and vice-versa for treatment _B_, with some levels in the middle having increased weight for _both_ treatments. This is where the most overlap is between the groups.
It is also useful to look at these plots for factors that were _not_ included in the propensity score model. We may find drastic weight shifts which might indicate significant co-variation among factors already accounted for.
# Estimating the treatment effect {#estimatingtreatmenteffect}
We're now ready to estimate the causal treatment effect. Although we [generated our data](#treatmenteffect) from a [Weibull](https://en.wikipedia.org/wiki/Weibull_distribution) distribution, we will use a [Cox proportional-hazards model](https://en.wikipedia.org/wiki/Proportional_hazards_model) for estimation, which is a _semi-parametric_ method, and tends to be the default choice for modeling survival data, especially in healthcare.
## A look at the true hazard ratio {#truehazardratio}
The treatment effect will be quantified by a [_hazard ratio_](https://en.wikipedia.org/wiki/Hazard_ratio), which is an estimate of the ratio of instantaneous event rates between the treatment groups (this is what is done in [the original simulation](https://www2.stat.duke.edu/~fl35/OW/OW_survival_Demo.sas)). This measure is different than the effect we [interpreted](#interpretingtreatmenteffect) from the true data-generating process, so, although we won't expect them to provide the same numerical result, we should still get a similar conclusion, in that treatment _A_ is superior to treatment _B_.
To start, we can compute the actual hazard ratio had we been able to observe each potential outcome (i.e., the counterfactual). To do this, we need to transform our data such that we have two rows of data per patient--one for each treatment. Then, we fit the Cox model, using the _true_ overlap weights as case weights.
```{r}
# Fit the model
true_hr_mod <-
coxph(
formula = Surv(time, status) ~ factor(A),
data =
population |>
# Keep columns needed
select(
OW_norm, # True (normalized) overlap weight (could use un-normalized version here)
t_A, # True event time under treatment A
t_B, # True event time under treatment B
censor_time # When the patient was censored
) |>
# Send true event times down the rows
pivot_longer(
cols = c(t_A, t_B),
names_to = "A",
values_to = "time"
) |>
# Check if censored first
mutate(
A = case_when(A == "t_A" ~ 1, TRUE ~ 0), # Replace with our notation
time = pmin(time, censor_time),
status = as.numeric(time < censor_time)
),
weights = OW_norm
)
# Extract the result
true_hr <- exp(true_hr_mod$coefficients)[[1]]
true_hr
```
If we could compare the worlds in which we observe all patients under each treatment, the hazard ratio is `r round(true_hr, 2)`, suggesting that the instantaneous event rate is `r round(100*(1-true_hr))`% lower with treatment _A_ than with treatment _B_. This aligns with what we've established as the [true effect](#interpretingtreatmenteffect). Our hope is that the estimate from the observable data (in the [next section](#estimatedhazardratio)) will contain this value within some margin of sampling error.
## Our estimate of the treatment effect {#estimatedhazardratio}
Finally, we can obtain our estimate of the treatment effect using only the observable quantities we have in our data set, which is what we would have in a real-life analysis. Similar to the [previous section](##truehazardratio), we are just fitting a Cox model with the treatment as a covariate. Except this time we only observe one outcome per patient as a result of the treatment they [actually received](#treatmentassignment), and use the [_estimated_ overlap weight](#estimatedoverlapweight) to balance the confounding factors (age, sex, and income). We will also use [robust](https://rdrr.io/cran/survival/man/coxph.html) standard error estimates.
```{r}
# Fit the model
estimated_hr_mod <-
coxph(
formula = Surv(time, status) ~ factor(A),
data = population,
weights = OW_hat_norm,
robust = TRUE
)
# Extract the result
estimated_hr <- exp(estimated_hr_mod$coefficients)[[1]]
estimated_hr
# Extract the confidence interval
estimated_hr_ci <- exp(confint(estimated_hr_mod))
estimated_hr_ci
```
Our estimate for the hazard ratio is `r round(estimated_hr, 2)`, with a 95% confidence interval ranging from `r round(estimated_hr_ci[[1]], 2)` to `r round(estimated_hr_ci[[2]], 2)`, suggesting that the instantaneous event rate for treatment _A_ is between `r round(100*(1-estimated_hr_ci[[2]]))`% and `r round(100*(1-estimated_hr_ci[[1]]))`% lower compared to treatment _B_ (note that this captures the [true hazard ratio](#truehazardratio)). Under the assumption that we've correctly specified all confounding factors (which [we did](#treatmentpropensity)), we can interpret this as the _causal_ treatment effect, and, assuming the magnitude of improvement is of practical importance (when considering things like cost, resources, clinical outcomes, etc.), we can reliably conclude that there is a benefit to treatment _A_ over treatment _B_, on average.
# Resources {#resources}
Some additional resources I've come across while learning the methodology:
* Original paper for overlap weighting ([link](https://pubmed.ncbi.nlm.nih.gov/30189042/))
* Paper extending overlap weighting to survival analysis ([link](https://arxiv.org/abs/2108.04394))
* Example simulation of overlap weighting in the survival setting in SAS ([link](http://www2.stat.duke.edu/~fl35/OW/OW_survival_Demo.sas))
* R package for implementing overlap weight methods ([link](https://github.com/thuizhou/PSweight))
* R functions for overlap weighting in the survival setting ([link](https://github.com/chaochengstat/OW_Survival))
* Author's web page dedicated to overlap weighting ([link](https://www2.stat.duke.edu/~fl35/OW.html))