A primary purpose of the **predRupdate** package is to
externally validate an existing (previously published) clinical
prediction model (CPM) on a new dataset. Sometimes the existing CPMs
will include spline terms as predictor variables for non-linear
associations. The aim of this vignette is to overview how to use
**predRupdate** to validate such a model, and is intended
to supplement other vignettes in the package
(`vignette("predRupdate")`

).

The data, called *SYNPM*, used throughout this vignette are
available within the **predRupdate** package. See “?SYNPM”
for details of these data. In short, the data and models included in
*SYNPM* are synthetic, but for the purposes of this vignette, we
imagine that one is interested in predicting someone’s risk of mortality
after surgery. Data are available on 20000 people, which records each
individuals age, gender, smoking status, diabetes status, and Creatinine
value at the time of surgery. The data includes a binary outcome, Y,
indicating if the patient died within 1 month.

For this vignette, we imagine a situation where a CPM has previously been developed (in another dataset) to predict the risk of mortality within 1 month of surgery, and we wish to validate this model in our dataset to test the predictive performance (e.g., an external validation study). The existing model was a logistic regression model, with the following predictor variables and coefficients (log-odds ratios) reported:

Coefficient | |
---|---|

(Intercept) | -3.99500 |

Age spline1 | 0.72918 |

Age spline2 | 0.06249 |

Age spline3 | 1.67003 |

Age spline4 | 0.75348 |

SexM | 0.47859 |

The existing model included Age as a B-spline (of degree 3) with 4 degrees of freedom. The coefficient for each basis function is given in the table above. The existing model reported that the internal knot location was 50.09 years, with boundary knot locations of 36 and 64 years.

We now show how one can use this reported information to externally
validate the model in new data using **predRupdate**
package.

The first step in using **predRupdate** to validate this
model is to input the model information. We start by creating a
data.frame of the model coefficients, with the columns being the
predictor variable names. This information is then passed into the
`pred_input_info()`

function to input the information about
the existing model. See `pred_input_info()`

for details.

```
# create a data.frame of the model coefficients, with columns being variables
coefs_table <- data.frame("Intercept" = -3.995, #the intercept needs to be named exactly as given here
"Age_spline1" = 0.72918,
"Age_spline2" = 0.06249,
"Age_spline3" = 1.67003,
"Age_spline4" = 0.75348,
"SexM" = 0.47859)
#pass this into pred_input_info()
Existing_Logistic_Model <- pred_input_info(model_type = "logistic",
model_info = coefs_table)
summary(Existing_Logistic_Model)
#> Information about 1 existing model(s) of type 'logistic'
#>
#> Model Coefficients
#> =================================
#> Intercept Age_spline1 Age_spline2 Age_spline3 Age_spline4 SexM
#> 1 -3.995 0.72918 0.06249 1.67003 0.75348 0.47859
#>
#> Model Functional Form
#> =================================
#> Age_spline1 + Age_spline2 + Age_spline3 + Age_spline4 + SexM
```

Next, we need to apply the B-spline function (exactly as originally
published for the existing model under validation) to our dataset. We
can do this in R using the **splines** package, and passing
in the knot locations as reported by the existing model development:

```
Age_spline <- splines::bs(SYNPM$ValidationData$Age,
knots = c(50.09),
Boundary.knots = c(36, 64))
head(Age_spline)
#> 1 2 3 4
#> [1,] 0.3274396 0.4868261 0.18474109 0.0000000000
#> [2,] 0.1739705 0.4823196 0.34237812 0.0013318088
#> [3,] 0.1847176 0.4873202 0.32718198 0.0007802354
#> [4,] 0.2026126 0.4936276 0.30350207 0.0002577483
#> [5,] 0.1404779 0.4598862 0.39461373 0.0050222206
#> [6,] 0.5795287 0.2856465 0.04120013 0.0000000000
```

We can see that this creates the basis functions, which we add to out validation dataset (taking care to name these columns the same as how the corresponding coefficients for each basis function is specified in above):

```
ValidationData <- SYNPM$ValidationData
ValidationData$Age_spline1 <- Age_spline[,1]
ValidationData$Age_spline2 <- Age_spline[,2]
ValidationData$Age_spline3 <- Age_spline[,3]
ValidationData$Age_spline4 <- Age_spline[,4]
```

Now, we can validate this model in our validation dataset by using
`pred_validate()`

as normal (see
`vignette("predRupdate")`

):

```
validation_results <- pred_validate(x = Existing_Logistic_Model,
new_data = ValidationData,
binary_outcome = "Y")
summary(validation_results) #use summary() to obtain a tidy output summary of the model performance
#> Calibration Measures
#> ---------------------------------
#> Estimate Lower 95% Confidence Interval
#> Observed:Expected Ratio 2.9952 2.8947
#> Calibration Intercept 1.2097 1.1697
#> Calibration Slope 1.0555 0.9197
#> Upper 95% Confidence Interval
#> Observed:Expected Ratio 3.0992
#> Calibration Intercept 1.2496
#> Calibration Slope 1.1913
#>
#> Also examine the calibration plot, if produced.
#>
#> Discrimination Measures
#> ---------------------------------
#> Estimate Lower 95% Confidence Interval Upper 95% Confidence Interval
#> AUC 0.5854 0.574 0.5968
#>
#>
#> Overall Performance Measures
#> ---------------------------------
#> Cox-Snell R-squared: -0.1282
#> Nagelkerke R-squared: -0.23
#> Brier Score (CI): 0.1295 (0.1268, 0.1321)
#>
#> Also examine the distribution plot of predicted risks.
```