Bootstrapping and hazard ratio thresholds

Dominic Pearce, The Institute of Genetics and Molecular Medicine, The University of Edinburgh
2018-04-25

 


Libraries

library(survivALL)
library(Biobase)
library(knitr)

 

To establish that a prognostic association is reliable, and to provide a measure of its significance, survivALL can perform a non-parametric bootstrapping procedure. In short, for each point-of-separation we calculate a distribution of expected hazard ratios (HRs), against which we can compare our observed HRs.

To achieve this, we randomly sample our survival data with replacement and then calculate survival statistics for all points-of-separation, exactly as we would for a biomarker under investigation. Repeating this procedure thousands or tens of thousands of times produces our distribution of expected hazard ratios.

 

data(nki_subset)

#bootstrapping data should be in the format of 1 repeat per column;
#only 20 repeats are run here to keep the example quick
bs_mtx <- matrix(nrow = ncol(nki_subset), ncol = 20)

system.time(
    for(i in seq_len(ncol(bs_mtx))){
        #each repeat uses a random measure: sample indices with replacement
        bs_mtx[, i] <- allHR(measure = sample(1:ncol(nki_subset),
                                              replace = TRUE),
                             srv = pData(nki_subset),
                             time = "t.dmfs",
                             event = "e.dmfs")
    }
)

   user  system elapsed
 24.313   0.296  24.632


kable(bs_mtx[1:20, 1:5])
NA NA NA NA NA
-0.4014227 NA NA -0.4576410 NA
0.2984494 -0.4491361 NA 0.0792404 -0.8907620
-0.5303545 0.0990314 0.4857405 0.7285488 NA
-0.1702408 0.6413651 0.8452752 1.0275861 -0.7785881
0.1814099 0.9289172 1.1349544 0.0946139 -0.3816460
-0.1740861 -0.0305109 0.1757499 0.3099642 -0.9078913
0.0197442 0.3146530 0.4616417 -0.2501668 -0.4518784
-0.3744622 0.5120723 -0.0896996 -0.4760801 -0.3932269
-0.1658159 0.6869374 0.0243721 -0.3161967 -0.1424456
-0.0318268 0.8520492 0.0243721 -0.1669160 0.1284438
0.1789310 1.0547973 0.1929272 -0.3623228 0.3106466
0.2946136 1.1722670 0.4310035 -0.2210120 0.4851842
0.3954521 NA 0.6013607 -0.0617779 0.6461764
0.5589154 NA 0.7511898 -0.0617779 0.8070691
0.6965117 NA 0.8487477 -0.3131932 0.4366925
0.3651488 NA 0.9855458 -0.1528381 0.5390877
0.2145058 1.1179661 1.0938748 -0.0143247 0.6437619
0.0621976 1.2047041 0.7007348 -0.1846323 0.6883963
0.1542357 1.2857949 0.7851356 -0.1051644 0.8163849

 

Having calculated our bootstrapped data, we simply hand the matrix to either the survivALL() or plotALL() function (using the bs_dfr = argument), which handles the subsequent significance calculations. Note that bootstrapping is slow: at the rate timed above (about 25 seconds for 20 repeats), 10,000 repeats would take roughly three and a half hours on comparable hardware.
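To give a feel for how a bootstrapped matrix can serve as a set of thresholds, the following is a minimal base R sketch. It is not survivALL's internal implementation. It assumes a matrix laid out like bs_mtx (one row per point-of-separation, one column per repeat) and uses empirical 2.5% and 97.5% quantiles as the expected range. The names toy_bs, thresholds, observed and outside are hypothetical, and the data are synthetic random numbers rather than real survival statistics.

```r
# Illustrative only: a synthetic stand-in for bs_mtx
# (rows = points-of-separation, columns = bootstrap repeats).
set.seed(1)
n_points  <- 10
n_repeats <- 200
toy_bs <- matrix(rnorm(n_points * n_repeats),
                 nrow = n_points, ncol = n_repeats)
toy_bs[1, ] <- NA  # mimic a point-of-separation with no computed value

# Per-row empirical quantiles of the bootstrapped values, ignoring NAs.
# The 2.5%/97.5% cut-offs are an assumption chosen for this sketch.
thresholds <- t(apply(toy_bs, 1, quantile,
                      probs = c(0.025, 0.975), na.rm = TRUE))
colnames(thresholds) <- c("lower", "upper")

# Hypothetical observed values, one per point-of-separation.
observed <- c(NA, 0.1, 4.0, -0.3, 0.0, -4.0, 0.5, 1.0, -0.2, 0.4)

# TRUE where the observed value falls outside the bootstrapped range;
# NA where no comparison is possible.
outside <- observed < thresholds[, "lower"] |
           observed > thresholds[, "upper"]
```

Rows that are entirely NA yield NA thresholds, so those points-of-separation are simply left unclassified rather than counted as significant or not.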