The main purpose of the `adjustedCurves` R-Package is to calculate and
plot confounder-adjusted survival curves and cause-specific
confounder-adjusted cumulative incidence functions (CIF) using a variety
of methods. Group- or treatment-specific survival curves and CIFs are
often used to graphically display the treatment (or group) effect on the
survival probability. When the data at hand comes from a randomized
controlled trial with balanced subgroups, simple stratified estimators
of the survival curves or CIFs ignoring other covariates are unbiased.
When randomization fails or was not done at all, however, confounding
can lead to biased depictions (see Rubin 1974; Pearl 2009).

Luckily, a lot of different methods to adjust survival curves and
CIFs for confounding have been proposed. With the `adjustedCurves`
package, these methods can be used with little effort. A review and
simulation study comparing these methods can be found in Denz et al.
(2023).

The R-Package is currently not available on CRAN, but can be
installed easily using the `devtools` package:
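A call along the following lines should work. The repository name `RobinDenz1/adjustedCurves` is an assumption here, since the text above does not state where the development version is hosted:

```r
# assumed GitHub location of the package; adjust if it is hosted elsewhere
devtools::install_github("RobinDenz1/adjustedCurves")
```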

or the `remotes` package:
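The equivalent call with `remotes` would look like this, again assuming that the package is hosted in the GitHub repository `RobinDenz1/adjustedCurves`:

```r
# assumed GitHub location of the package; adjust if it is hosted elsewhere
remotes::install_github("RobinDenz1/adjustedCurves")
```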

Further packages might have to be installed, depending on the specified method.

Let’s start with the standard survival setting. Using the
`sim_confounded_surv` function, we simulate some survival data (Chatton
et al. 2020):

```
library(survival)
library(ggplot2)
library(riskRegression)
library(pammtools)
library(adjustedCurves)
# set random number generator seed to make this reproducible
set.seed(44)
# simulate standard survival data with 300 rows
data_1 <- sim_confounded_surv(n=300, max_t=1.1, group_beta=-0.6)
# code the grouping variable as a factor
data_1$group <- as.factor(data_1$group)
# take a look at the first few rows
head(data_1)
```

```
## x1 x2 x3 x4 x5 x6 group time event
## 1 1 0 0 -0.52520986 0.22776621 -0.05319665 1 0.37181297 1
## 2 0 1 1 0.02441988 -2.16818274 0.58097459 0 0.65649863 1
## 3 1 1 1 0.05159071 0.07329185 1.18737651 1 0.08156474 0
## 4 1 0 0 0.01956631 -1.27205270 1.16594437 0 0.34161603 1
## 5 0 1 0 0.33793112 1.32447922 -1.28991227 0 0.52472847 1
## 6 0 1 0 -2.96505672 2.19434272 0.00528396 1 0.87274894 1
```

Using the default arguments, this function outputs a data.frame with
6 independently drawn covariates (`x1` to `x6`), a binary group variable
(`group`), the observed event time (`time`) and the event indicator
(`event`). In this setting, there is only one type of event (`1`). When
an observation is right-censored, this indicator is set to `0`. This is
the standard data format used in time-to-event analysis.

To calculate confounder-adjusted survival curves for each group of
this dataset using *Direct Standardization* (also known as
*G-Computation* or *Corrected Group Prognosis method*, see
Makuch (1982) or Chang et al. (1982)), we first have to fit a `coxph`
model:

```
# it is important to use x=TRUE in the coxph function call
outcome_model <- survival::coxph(Surv(time, event) ~ x1 + x2 + x3 + x4 + x5 +
                                   x6 + group, data=data_1, x=TRUE)
```

This model can then be used in a call to the `adjustedsurv` function,
as shown below:

```
adjsurv <- adjustedsurv(data=data_1,
                        variable="group",
                        ev_time="time",
                        event="event",
                        method="direct",
                        outcome_model=outcome_model,
                        conf_int=TRUE)
```

The argument `data` simply refers to our data.frame, the argument
`variable` specifies our grouping variable of interest, and the
`ev_time` and `event` arguments name the time-to-event variables in our
data.frame. Setting `method` to `"direct"` will result in G-Computation
estimates, based on the previously fit Cox regression model supplied
using the `outcome_model` argument.
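To make the idea behind Direct Standardization more concrete, here is a minimal hand-rolled sketch. It is not the package's implementation. It uses hypothetical toy data (`toy`, with a single confounder `x`) and only the `survival` package: every individual is set to each group level in turn, individual survival probabilities are predicted from the Cox model, and these predictions are averaged.

```r
library(survival)

set.seed(1)
n <- 200
# hypothetical toy data: one confounder, binary group, no censoring
x <- rnorm(n)
group <- rbinom(n, 1, plogis(x))
toy <- data.frame(x = x,
                  group = factor(group),
                  time = rexp(n, rate = exp(0.5 * x - 0.6 * group)),
                  event = 1)

fit <- coxph(Surv(time, event) ~ x + group, data = toy)

times <- c(0.25, 0.5, 1)

# set everyone to one group level, predict individual curves, then average
g_comp <- function(level) {
  newdat <- toy
  newdat$group <- factor(rep(level, nrow(toy)), levels = levels(toy$group))
  sf <- survfit(fit, newdata = newdat)
  # matrix with one row per time point and one column per individual
  surv_mat <- summary(sf, times = times, extend = TRUE)$surv
  rowMeans(surv_mat)
}

adj_surv <- data.frame(time = times,
                       surv_0 = g_comp("0"),
                       surv_1 = g_comp("1"))
adj_surv
```

The `adjustedsurv` call above performs this kind of averaging over all observed event times and additionally provides confidence intervals.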

Doing this returns a list containing several output objects. The most
important one is the `adjsurv` data.frame, which contains the adjusted
survival curves and corresponding confidence intervals (because we used
`conf_int=TRUE` in the original function call). We can take a look at
this object using the following code:
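The call itself is not shown in the text. Assuming the curves are stored in the `adjsurv` element of the returned list, as described above, it would look like this:

```r
# first rows of the adjusted survival curve estimates
head(adjsurv$adjsurv)
```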

```
## time surv group se ci_lower ci_upper
## 1 0.00000000 1.0000000 0 0.000000000 1.0000000 1.0000000
## 2 0.01457659 0.9959356 0 0.004039858 0.9880177 1.0000000
## 3 0.04062032 0.9917639 0 0.005767859 0.9804591 1.0000000
## 4 0.04232678 0.9876128 0 0.007034862 0.9738247 1.0000000
## 5 0.06129069 0.9834495 0 0.008123567 0.9675276 0.9993714
## 6 0.06271999 0.9792709 0 0.009066121 0.9615017 0.9970402
```

More importantly however, we can plot the survival curves directly
using the `plot` method:
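Assuming the `adjsurv` object created above, the corresponding call would simply be:

```r
# default plot of the adjusted survival curves, one curve per group
plot(adjsurv)
```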

This plot function comes with many options which are listed in the
documentation. To plot the point-wise confidence intervals, we can set
the argument `conf_int` to `TRUE`:
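With the same `adjsurv` object, this presumably amounts to:

```r
# adjusted survival curves with point-wise confidence intervals
plot(adjsurv, conf_int=TRUE)
```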

Instead of using colors to differentiate the curves we can also use
different linetypes, by setting the `linetype` argument to `TRUE` and
the `color` argument to `FALSE`:
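A sketch of such a call, using the two arguments named above on the `adjsurv` object from before:

```r
# distinguish the groups by linetype instead of color
plot(adjsurv, linetype=TRUE, color=FALSE)
```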

We can also add small indicator lines for censored observations using
`censoring_ind="lines"`:
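Applied to the `adjsurv` object from above, this would read:

```r
# mark censored observations with small vertical lines on the curves
plot(adjsurv, censoring_ind="lines")
```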

Indicator lines for the median survival time can be added using
`median_surv_lines=TRUE`:
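Again assuming the `adjsurv` object from above, a matching call is:

```r
# add indicator lines for the adjusted median survival time of each group
plot(adjsurv, median_surv_lines=TRUE)
```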

Many more custom settings are available. For more details and
examples see `?plot.adjustedsurv`.

The `adjustedsurv` function essentially works the same with every
available method. Since the methods differ vastly in nature, however,
some additional arguments have to be supplied by the user. For example,
using the Inverse Probability of Treatment Weighting method (IPTW), we
need to model the *treatment assignment mechanism* instead of the
outcome mechanism (see Xie and Liu (2005)). This can be done using a
logistic regression model as follows:

```
treatment_model <- glm(group ~ x1 + x2 + x3 + x4 + x5 + x6,
                       data=data_1, family=binomial(link="logit"))
adjsurv <- adjustedsurv(data=data_1,
                        variable="group",
                        ev_time="time",
                        event="event",
                        method="iptw_km",
                        treatment_model=treatment_model,
                        conf_int=TRUE)
```
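To illustrate what the weighting does, the following is a minimal sketch on hypothetical toy data (`toy`, with a single confounder `x`), using only base R and the `survival` package. It is not the package's implementation: each individual is weighted by the inverse of the estimated probability of the group actually observed, and a weighted Kaplan-Meier curve is computed per group.

```r
library(survival)

set.seed(2)
n <- 300
# hypothetical toy data: one confounder, binary group, no censoring
x <- rnorm(n)
group <- rbinom(n, 1, plogis(0.8 * x))
toy <- data.frame(x = x,
                  group = group,
                  time = rexp(n, rate = exp(0.5 * x - 0.6 * group)),
                  event = 1)

# model the treatment assignment and derive propensity scores
ps_model <- glm(group ~ x, data = toy, family = binomial(link = "logit"))
ps <- predict(ps_model, type = "response")

# weight = 1 / probability of the group that was actually observed
toy$w <- ifelse(toy$group == 1, 1 / ps, 1 / (1 - ps))

# weighted Kaplan-Meier estimates, one curve per group
km_w <- survfit(Surv(time, event) ~ group, data = toy, weights = w)
summary(km_w, times = c(0.25, 0.5, 1), extend = TRUE)
```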

The resulting curves can be plotted as before:
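Assuming the IPTW-based `adjsurv` object from the previous call, this would be:

```r
# IPTW-adjusted survival curves with confidence intervals
plot(adjsurv, conf_int=TRUE)
```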

Since both methods were used correctly here, there are only slight differences in the results. Big differences between the two methods usually indicate that either the Cox regression model or the logistic regression model is incorrectly specified.

Doubly-Robust methods can be helpful in such cases (see Robins and Rotnitzky (1992) or Ozenne et al. (2020)). The standard Augmented Inverse Probability of Treatment Weighting estimator utilizes both kinds of models at the same time. If either of the models is correctly specified, the resulting estimates will be unbiased. It can be used in the same way as the other methods, only that this time both models have to be supplied:
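A call of this kind might look as follows. This is a sketch which assumes that the estimator is selected with `method="aiptw"` and that the `outcome_model` (Cox model) and `treatment_model` (logistic regression) fitted earlier are still available:

```r
# doubly-robust estimate using both the outcome and the treatment model
adjsurv <- adjustedsurv(data=data_1,
                        variable="group",
                        ev_time="time",
                        event="event",
                        method="aiptw",
                        outcome_model=outcome_model,
                        treatment_model=treatment_model,
                        conf_int=TRUE)
plot(adjsurv)
```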

In many situations there are multiple mutually exclusive types of events instead of just one event. This is formally known as a competing risks situation. In these situations, survival curves cannot be estimated anymore. However, the cumulative incidence function can be used instead. Without randomization, these CIFs face the same problems due to confounding as the survival curves do. Many of the methods to adjust survival curves for confounders can be used to adjust CIFs in a very similar fashion.

While the computational details and the underlying theory are slightly
different (see Ozenne et al. (2020)), the syntax for the R-Package stays
pretty much exactly the same. The only major difference is that instead
of using the `adjustedsurv` function, the `adjustedcif` function should
be utilized. Additionally, the user now also has to specify which
event-type is of interest (argument `cause`).

First we need new example data, mirroring the competing risks
situation. We are going to use the `sim_confounded_crisk` function,
which does the same thing as the `sim_confounded_surv` function, but
with competing risks data:

```
# simulate the data
data_2 <- sim_confounded_crisk(n=300)
data_2$group <- as.factor(data_2$group)
head(data_2)
```

```
## x1 x2 x3 x4 x5 x6 group time event
## 1 1 1 0 0.96216812 -0.66375234 0.65527293 1 0.02085406 0
## 2 1 1 0 -0.58477811 1.73696566 -0.18714061 1 0.18924205 0
## 3 1 0 0 -0.33549773 0.52008044 0.08290031 0 0.26288996 2
## 4 0 1 1 -0.04008865 0.19841348 0.38858944 1 0.08210559 2
## 5 0 0 1 0.78936972 -0.27545174 0.49322272 1 1.67993934 2
## 6 0 0 0 0.26001387 -0.03590098 0.86231143 1 1.70000000 0
```

Let’s again start with the *Direct Standardization* method.
Instead of using a simple `coxph` model, we need to use a model for the
time-to-event process which takes the multiple event-types into account.
One such method is the *Cause-Specific Cox-Regression* model. A simple
implementation of this model is contained in the `riskRegression`
R-Package.

```
outcome_model <- riskRegression::CSC(Hist(time, event) ~ x1 + x2 + x3 + x4 +
                                       x5 + x6 + group, data=data_2)
```

This model can then be used in a call to the `adjustedcif` function,
as shown below:

```
adjcif <- adjustedcif(data=data_2,
                      variable="group",
                      ev_time="time",
                      event="event",
                      method="direct",
                      outcome_model=outcome_model,
                      cause=1,
                      conf_int=TRUE)
plot(adjcif, conf_int=TRUE)
```

This shows the confounder-adjusted CIFs for `cause = 1`. By setting
`cause` to `2` we get the confounder-adjusted CIFs for the other cause:
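A sketch of this call, assuming the cause-specific Cox model `outcome_model` fitted above is reused unchanged:

```r
# same call as before, now targeting the second event type
adjcif <- adjustedcif(data=data_2,
                      variable="group",
                      ev_time="time",
                      event="event",
                      method="direct",
                      outcome_model=outcome_model,
                      cause=2,
                      conf_int=TRUE)
plot(adjcif, conf_int=TRUE)
```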

The IPTW estimator can be used in exactly the same way as we did with
the `adjustedsurv` function:

```
treatment_model <- glm(group ~ x1 + x2 + x3 + x4 + x5 + x6,
                       data=data_2, family=binomial(link="logit"))
adjcif <- adjustedcif(data=data_2,
                      variable="group",
                      ev_time="time",
                      event="event",
                      method="iptw",
                      treatment_model=treatment_model,
                      cause=1,
                      conf_int=TRUE)
plot(adjcif, conf_int=TRUE)
```

The other estimators can be used with essentially the same syntax.
Adding `bootstrap=TRUE` to such a call additionally requests
bootstrapped estimates, which some of the functionality for comparing
groups shown later relies on.

In many applications there are more than two treatments. Some of the methods included in this R-Package allow calculations for an arbitrary number of treatments. In these cases the code does not change at all and can be used exactly in the same way as before. The only difference occurs when using IPTW methods. Here the user has to use a multinomial logistic regression model instead of a regular logistic regression to model the treatment assignment. This is illustrated below.

First we again create a simulated data set (for single event survival data). The function is only able to create binary treatment variables, but we can simply resample all occurrences of 1 into 1 and 2, creating 3 treatments, where 1 and 2 have an identical treatment effect and selection process:

```
# add another group
# NOTE: this is done only to showcase the method and does not
# reflect what should be done in real situations
data_1$group <- factor(data_1$group, levels=c("0", "1", "2"))
data_1$group[data_1$group=="1"] <- sample(c("1", "2"), replace=TRUE,
                                          size=nrow(data_1[data_1$group=="1",]))
```

The Direct Adjusted survival curves can be calculated exactly as before:

```
outcome_model <- survival::coxph(Surv(time, event) ~ x1 + x2 + x3 + x4 +
                                   x5 + x6 + group, data=data_1, x=TRUE)
adjsurv <- adjustedsurv(data=data_1,
                        variable="group",
                        ev_time="time",
                        event="event",
                        method="direct",
                        outcome_model=outcome_model,
                        conf_int=TRUE)
plot(adjsurv)
```

For the IPTW based estimates we first fit a multinomial logistic
regression model using the `multinom` function from the `nnet` R-Package
and use it with the same code as before:
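The model fit that produced the output below is not shown. A sketch of it, assuming the same six covariates as in the binary case, would be:

```r
# multinomial logistic regression for the three-level treatment variable
treatment_model <- nnet::multinom(group ~ x1 + x2 + x3 + x4 + x5 + x6,
                                  data=data_1)
```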

```
## # weights: 24 (14 variable)
## initial value 329.583687
## iter 10 value 286.543657
## iter 20 value 283.765460
## final value 283.765313
## converged
```

```
adjsurv <- adjustedsurv(data=data_1,
                        variable="group",
                        ev_time="time",
                        event="event",
                        method="iptw_km",
                        treatment_model=treatment_model,
                        conf_int=TRUE,
                        bootstrap=TRUE,
                        n_boot=50)
plot(adjsurv, conf_int=TRUE)
```

We use `bootstrap=TRUE` here because we will need it in the next
section. In practice, however, 50 bootstrap samples would definitely not
be enough to guarantee convergence. We choose this low number here only
due to speed considerations enforced by CRAN.

The estimated adjusted curves can subsequently be used to compare the treatment groups using additional plots and statistics.

One simple way to compare the groups is to pick a point in time and
simply calculate the differences between the survival or failure
probability at this point. The `adjusted_curve_diff` function automates
this process. In the following example we calculate the difference
between the adjusted survival probability of group 0 and 1 at
time = 0.4:
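The call is not shown in the text. Assuming the function takes the time point via `times` and the two groups via `group_1` and `group_2` (the latter two are named further below), it would look roughly like this:

```r
# difference in adjusted survival probability between group 0 and group 1
adjusted_curve_diff(adjsurv, times=0.4, group_1="0", group_2="1",
                    conf_int=TRUE)
```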

```
## time diff se ci_lower ci_upper p_value
## 1 0.4 -0.008257592 0.09049062 -0.1856159 0.1691008 0.9272911
```

The confidence interval does include 0, suggesting that this difference is not statistically significant. This can also be seen from the associated p-value.

Instead of focusing on a single point in time, we can also plot the
entire difference curve using the `plot_curve_diff` function:
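For groups 0 and 1 of the `adjsurv` object from above, such a call could be:

```r
# difference between the adjusted survival curves of group 0 and group 1
plot_curve_diff(adjsurv, group_1="0", group_2="1")
```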

This kind of curve is a great way to visualize the group effect over
time. When more than two groups are present, this function can be called
multiple times with changing `group_1` and `group_2` values.

Similarly, we can calculate the adjusted survival time quantiles
using the `adjusted_surv_quantile` function. For example, the adjusted
median survival time is given by:
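A sketch of the call, assuming the quantile is requested through an argument `p` (matching the `p` column in the output below):

```r
# adjusted median survival time (p = 0.5) per group, with confidence limits
adjusted_surv_quantile(adjsurv, p=0.5, conf_int=TRUE)
```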

```
## p group q_surv ci_lower ci_upper
## 1 0.5 0 0.5067310 0.4487885 0.5657139
## 2 0.5 1 0.8014008 0.4068963 NA
## 3 0.5 2 0.4835157 0.4311141 0.6283029
```

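In our case, the estimates for groups 0 and 2 are very similar, while the estimate for group 1 is larger but much less precise (its upper confidence limit could not be estimated).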

An alternative measure is the restricted mean survival time (RMST),
which denotes the integral of the survival curve from time 0 up to a
specified point in time (see Royston and Parmar 2013). A
confounder-adjusted version of this statistic can be calculated by
simply integrating the confounder-adjusted survival curves instead of a
normal Kaplan-Meier curve (Conner et al. 2019). The `adjustedCurves`
package allows the user to do this by simply calling the `adjusted_rmst`
function on a previously created `adjustedsurv` object:
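Assuming the upper integration limit is passed via an argument `to` (matching the `to` column in the output below), the call would be:

```r
# adjusted restricted mean survival time up to t = 0.4 for every group
adjusted_rmst(adjsurv, to=0.4)
```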

```
## to group rmst
## 1 0.4 0 0.3485056
## 2 0.4 1 0.3415740
## 3 0.4 2 0.3547187
```

The function returns the adjusted RMST for every level of the
`"group"` variable. If bootstrapping was performed when calling the
`adjustedsurv` function, standard errors and confidence intervals can
also be calculated by setting the `conf_int` argument to `TRUE`:
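With the bootstrapped `adjsurv` object from above, this would presumably be:

```r
# adjusted RMST with bootstrap standard errors and confidence intervals
adjusted_rmst(adjsurv, to=0.4, conf_int=TRUE)
```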

```
## to group rmst se ci_lower ci_upper n_boot
## 1 0.4 0 0.3485056 0.009156084 0.3305600 0.3664512 50
## 2 0.4 1 0.3415740 0.019812278 0.3027427 0.3804054 50
## 3 0.4 2 0.3547187 0.009117535 0.3368487 0.3725888 50
```

Similar to the `adjusted_curve_diff` function, we can perform a test
of the difference between two RMST values:
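How the difference is requested is not shown in the text. The sketch below assumes a `contrast` argument together with `group_1` and `group_2`; this argument name is an assumption and may differ between package versions, so it should be checked against `?adjusted_rmst`:

```r
# difference in adjusted RMST between group 0 and group 1
# NOTE: the name of the `contrast` argument is an assumption
adjusted_rmst(adjsurv, to=0.4, conf_int=TRUE, contrast="diff",
              group_1="0", group_2="1")
```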

```
## to diff se ci_lower ci_upper p_value
## 1 0.4 0.006931522 0.02182568 -0.03584602 0.04970907 0.7507993
```

There seems to be no statistically significant difference between the two RMSTs. However, due to the very low number of bootstrap replications used here these results should not be trusted. In practice one should use a much higher number, such as 500 or 1000 replications.

Instead of focusing on a single point in time, it is also possible to
plot the RMST as it evolves over time using the `plot_rmst_curve`
function:
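Applied to the `adjsurv` object from above, a minimal call would be:

```r
# adjusted RMST as a function of the upper integration limit
plot_rmst_curve(adjsurv)
```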

Bootstrap confidence intervals could also be added to this plot using
`conf_int=TRUE`.

A similar statistic is the adjusted *Restricted Mean Time Lost*
(RMTL), which is defined as the area under the cause-specific
confounder-adjusted CIF. It can be calculated in the same way as the
RMST, but using the `adjusted_rmtl` function:
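The output below contains only groups 0 and 1, which suggests that it was computed from the competing risks example. Assuming the `adjcif` object from the IPTW example above, the call would be:

```r
# adjusted restricted mean time lost up to t = 0.4 for every group
adjusted_rmtl(adjcif, to=0.4)
```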

```
## to group rmtl
## 1 0.4 0 0.05621035
## 2 0.4 1 0.04968113
```

In contrast to the `adjusted_rmst` function, this works for both
`adjustedsurv` and `adjustedcif` objects because the survival curve can
easily be transformed to a CIF. It also allows difference tests, just
like the `adjusted_rmst` function.

We can also plot the RMTL over time, this time using the
`plot_rmtl_curve` function:
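A minimal call, here assuming the `adjcif` object from the competing risks example (an `adjustedsurv` object should work as well, as noted above):

```r
# adjusted RMTL as a function of the upper integration limit
plot_rmtl_curve(adjcif)
```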

Robin Denz, Renate Klaaßen-Mielke, and Nina Timmesfeld (2023). “A Comparison of Different Methods to Adjust Survival Curves for Confounders”. In: Statistics in Medicine 42.10, pp. 1461-1479

Donald B. Rubin (1974). “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies”. In: Journal of Educational Psychology 66.5, pp. 688-701

Judea Pearl (2009). Causality: Models, Reasoning and Inference. 2nd ed. Cambridge: Cambridge University Press

Arthur Chatton, Florent Le Borgne, Clemence Leyrat, and Yohann Foucher (2020). G-Computation and Inverse Probability Weighting for Time-To-Event Outcomes: A Comparative Study. arXiv:2006.16859v1

Robert W. Makuch (1982). “Adjusted Survival Curve Estimation Using Covariates”. In: Journal of Chronic Diseases 35.6, pp. 437-443

I-Ming Chang, Rebecca Gelman, and Marcello Pagano (1982). “Corrected Group Prognostic Curves and Summary Statistics”. In: Journal of Chronic Diseases 35, pp. 669-674

Jun Xie and Chaofeng Liu (2005). “Adjusted Kaplan-Meier Estimator and Log-Rank Test with Inverse Probability of Treatment Weighting for Survival Data”. In: Statistics in Medicine 24, pp. 3089-3110

James M. Robins and Andrea Rotnitzky (1992). “Recovery of Information and Adjustment for Dependent Censoring Using Surrogate Markers”. In: AIDS Epidemiology: Methodological Issues. Ed. by Nicholas P. Jewell, Klaus Dietz, and Vernon T. Farewell. New York: Springer Science + Business Media, pp. 297-331

Brice Maxime Hugues Ozenne, Thomas Harder Scheike, and Laila Staerk (2020). “On the Estimation of Average Treatment Effects with Right-Censored Time to Event Outcome and Competing Risks”. In: Biometrical Journal 62, pp. 751-763

Weixin Cai and Mark J. van der Laan (2020). “One-Step Targeted Maximum Likelihood Estimation for Time-To-Event Outcomes”. In: Biometrics 76, pp. 722-733

Patrick Royston and Mahesh K. B. Parmar (2013). “Restricted Mean Survival Time: An Alternative to the Hazard Ratio for the Design and Analysis of Randomized Trials with a Time-To-Event Outcome”. In: BMC Medical Research Methodology 13.152

Sarah C. Conner, Lisa M. Sullivan, Emelia J. Benjamin, Michael P. LaValley, Sandro Galea, and Ludovic Trinquart (2019). “Adjusted Restricted Mean Survival Times in Observational Studies”. In: Statistics in Medicine 38, pp. 3832-3860