The `CRTanalysis()`

function is a wrapper for different statistical analysis packages that
can be used to analyse either simulated or real trial datasets. It is
designed for use in simulation studies of different analytical methods
for spatial CRTs by automating the data processing and selecting some
appropriate analysis options. It does not replace conventional use of
these packages. Real field trials very often entail complications that
are not catered for any of the analysis options in
`CRTanalysis()`

and it does not aspire to carry out the full
analytical workflow for a trial. It can be used as part of a wider
workflow. In particular the usual object output by the statistical
analysis package constitutes the `model_object`

element
within the `CRTanalysis`

object generated by
`CRTanalysis()`

. This can be accessed by the usual methods
(e.g `predict()`

, `summary()`

,
`plot()`

) which may be needed for diagnosing errors,
assessing goodness of fit, and for identifying needs for additional
analyses.

The options that can be specified using the `method`

parameter in the function call are:

`method = "T"`

summarises the outcome at the level of the cluster, and uses 2-sample t-tests to carry out statistical significance tests of the effect, and to compute confidence intervals for the effect size. The t.test function in the`stats`

package is used.`method = "GEE"`

uses Generalised Estimating Equations to estimate the efficacy in a model with iid random effects for the clusters. An estimate of the intracluster correlation (ICC) is also provided. This uses calls to the geepack package.`method = "LME4"`

fits linear (for continuous data) or generalized linear (for counts and proportions) mixed models with iid random effects for clusters in lme4.`method = "MCMC"`

uses Markov chain Monte Carlo simulation in package jagsUI, which calls r-JAGS.`method = "INLA"`

uses approximate Bayesian inference via the R-INLA package. This provides functionality for geostatistical analysis, which can be used for geographical mapping of model outputs (as illustrated in . INLA spatial analysis requires a prediction mesh. This can be generated using`CRTspat::new_mesh()`

. This can be computationally expensive, so it is recommended to compute the mesh just once for each dataset.

All these analysis methods can be used to carry out a simple
comparision of outcomes between trial arms. Each offers different
additional functionality, and has its own limitations (see Table 5.1).
Some of these limitations are specific to the options offered within
`CRTanalysis()`

, which does not embrace the full range of
options of the packages that are ‘wrapped’. These are specified using
the `method`

argument of the function.

Table 5.1. Available statistical methods

`method` |
Package | What the `CRTanalysis()` implementation offers |
Limitations (as implemented) |
---|---|---|---|

`T` |
t.test | P-values and confidence intervals for efficacy based on comparison of cluster means | No analysis of spillover or degree of clustering |

`GEE` |
geepack | Interval estimates for efficacy and Intra-cluster correlations | No analysis of spillover or degree of clustering |

`LME4` |
lme4 | Analysis of spillover | No geostatistical analysis |

`INLA` |
INLA | Analysis of spillover, geostatistical analysis and spatially structured outputs | Computationally intensive |

`MCMC` |
jagsUI | Interval estimates for spillover parameters | Identifiability issues and slow convergence are possible |

For the analysis of proportions, the outcome in the control arm is estimated as: \(\hat{p}_{C} = \frac{1}{1 + exp(-\beta_1)}\), in the intervention arm as \(\hat{p}_{I} = \frac{1}{1 + exp(-\beta_1-\beta_2)}\), and the efficacy is estimated as \(\tilde{E}_{s} = 1- \frac{\tilde{p}_{I}}{\tilde{p}_{C}}\) where \(\beta_1\) is the intercept term and \(\beta_2\) the incremental effect associated with the intervention.

`summary("<analysis>"")`

is used to view the key
results of the trial. To display the output from the statistical
procedure that is called, try `<analysis>$model_object`

or `summary("<analysis>$model_object")`

.

```
library(CRTspat)
example <- readdata("exampleCRT.txt")
analysisT <- CRTanalysis(example, method = "T")
summary(analysisT)
```

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: T
## Link function: logit
## Model formula: arm + (1 | cluster)
## No modelling of spillover
## Estimates: Control: 0.364 (95% CL: 0.286 0.451 )
## Intervention: 0.21 (95% CL: 0.147 0.292 )
## Efficacy: 0.423 (95% CL: 0.208 0.727 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
##
## P-value (2-sided): 0.006879064
```

```
##
## Two Sample t-test
##
## data: lp by arm
## t = 2.9818, df = 22, p-value = 0.006879
## alternative hypothesis: true difference in means between group control and group intervention is not equal to 0
## 95 percent confidence interval:
## 0.2332638 1.2989425
## sample estimates:
## mean in group control mean in group intervention
## -0.5561662 -1.3222694
```

The `model = "LME4"`

option outputs the deviance of the
model and the Akaike information criterion (AIC), which can be used to
select the best fitting model. The deviance information criterion (DIC)
and Bayesian information criterion (BIC) perform the same role for the
Bayesian methods (`"INLA"`

, and `"MCMC"`

). The
comparison of results with `cfunc = "X"`

and
`cfunc = "Z"`

is used to assess whether the intervention
effect is likely to be due to chance. With `method = "T"`

,
`cfunc = "X"`

provides a significance test of the
intervention effect directly. The models with spillover (see below) can
be compared by that with `cfunc = "X"`

to evaluate whether
spillover has led to an important bias.

`CRTanalysis()`

provides options for analysing spillover
effects either as function of a Euclidean distance or as a function of a
surround measure:

Models that do not consider spillover can be fitted using options
`Z`

and `X`

. These are included both to allow
conventional analyses (see above), and also to enable model selection
using and likelihood ratio tests, the Akaike information criterion
(AIC), deviance information criterion (DIC) or Bayesian information
criterion (BIC) .

These methods require a measure of distance from the boundary between
the trial arms, with locations in the control arm assigned negative
values, and those in the intervention arm assigned positive values. The
functional forms for this relationship is specified by the value of
`cfunc`

(Table 5.2).

Table 5.2. Available spillover functions

`cfunc` |
Description | Formula for \(P\left( d \right)\) | Compatible `method` (s) |
---|---|---|---|

`Z` |
No intervention effect | \(P\left( d \right) = \ 0\ \) | `GEE` `LME4` `INLA`
`MCMC` |

`X` |
Simple intervention effect | \(\begin{matrix} P\left( d \right) = \ 0\ for\ d\ < \ 0 \\ P\left( d \right) = \ 1\ for\ d\ > \ 0 \\ \end{matrix}\ \) | `T` `GEE` `LME4` `INLA`
`MCMC` |

`L` |
inverse logistic (sigmoid) | \(P\left( d \right) = \ \frac{1}{\left( 1\ + \ exp\left( - d/S \right) \right)}\) | `LME4` `INLA` `MCMC` |

`P` |
inverse probit (error function) | \(P\left( d \right) = 1\ +\ erf\left(\frac{d}{S\sqrt2}\right)\) | `LME4` `INLA` `MCMC` |

`S` |
piecewise linear | \(\begin{matrix} P\left( d \right) = \ 0\ for\ d\ < \ - S/2\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \\ P\left( d \right) = \ \left(S/2\ + \ d \right)/S\ for\ - S/2 < d\ < \ S/2\\ P\left( d \right) = \ 1\ for\ d\ > \ S/2\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \\ \end{matrix}\ \) | `LME4` `INLA` `MCMC` |

`R` |
rescaled linear | \(P\left( d \right) =\frac{d\ -\ min(d)}{max(d)\ -\ min(d)}\) | `LME4` `INLA` `MCMC` |

`cfunc`

options `P`

, `L`

and
`S`

lead to non-linear models in which the spillover scale
parameter (`S`

) must be estimated. This is done by selecting
`scale_par`

using a one-dimensional optimisation of the
goodness of fit of the model in `stats::optimize()`

.

The different values for `cfunc`

lead to the fitted curves
shown in Figure 5.1. The light blue shaded part of the plot corresponds
to the spillover interval in those cases where this is estimated.

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Model formula: (1 | cluster)
## No comparison of arms
## Estimates: Control: 0.285 (95% CL: NA )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1387.609
## AIC : 1391.609
```

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Model formula: arm + (1 | cluster)
## No modelling of spillover
## Estimates: Control: 0.366 (95% CL: 0.292 0.449 )
## Intervention: 0.216 (95% CL: 0.162 0.281 )
## Efficacy: 0.41 (95% CL: 0.165 0.584 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1379.898
## AIC : 1385.898
```

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 0.45
## Model formula: pvar + (1 | cluster)
## Error function model for spillover
## Estimates: Control: 0.418 (95% CL: 0.331 0.509 )
## Intervention: 0.186 (95% CL: 0.136 0.25 )
## Efficacy: 0.553 (95% CL: 0.327 0.703 )
## Spillover interval(km): 4.22 (95% CL: 4.2 4.23 )
## % locations contaminated: 91.6 (95% CL: 90.6 92 %)
## Total effect : 0.23 (95% CL: 0.114 0.344 )
## Ipsilateral Spillover : 0.0233 (95% CL: 0.0127 0.0323 )
## Contralateral Spillover : 0.0417 (95% CL: 0.0192 0.0651 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1374.215
## AIC : 1382.215 including penalty for the spillover scale parameter
```

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 0.249
## Model formula: pvar + (1 | cluster)
## Sigmoid (logistic) function for spillover
## Estimates: Control: 0.417 (95% CL: 0.332 0.51 )
## Intervention: 0.186 (95% CL: 0.136 0.249 )
## Efficacy: 0.552 (95% CL: 0.329 0.7 )
## Spillover interval(km): 4.26 (95% CL: 4.24 4.28 )
## % locations contaminated: 92.7 (95% CL: 92.2 93.1 %)
## Total effect : 0.229 (95% CL: 0.115 0.342 )
## Ipsilateral Spillover : 0.0219 (95% CL: 0.0121 0.0304 )
## Contralateral Spillover : 0.0388 (95% CL: 0.0183 0.0604 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1374.201
## AIC : 1382.201 including penalty for the spillover scale parameter
```

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## Estimated scale parameter: 1.674
## Model formula: pvar + (1 | cluster)
## Piecewise linear function for spillover
## Estimates: Control: 0.423 (95% CL: 0.334 0.516 )
## Intervention: 0.185 (95% CL: 0.135 0.247 )
## Efficacy: 0.561 (95% CL: 0.341 0.711 )
## Spillover interval(km): 4.1 (95% CL: 4.1 4.11 )
## % locations contaminated: 86.6 (95% CL: 86.6 87.1 %)
## Total effect : 0.237 (95% CL: 0.12 0.356 )
## Ipsilateral Spillover : 0.029 (95% CL: 0.016 0.0403 )
## Contralateral Spillover : 0.0522 (95% CL: 0.0248 0.0818 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1374.094
## AIC : 1382.094 including penalty for the spillover scale parameter
```

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Signed distance to other arm (km)
## No non-linear parameter. 1
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for spillover
## Estimates: Control: 0.584 (95% CL: 0.381 0.758 )
## Intervention: 0.116 (95% CL: 0.0587 0.216 )
## Efficacy: 0.801 (95% CL: 0.465 0.92 )
## Spillover interval(km): 6.64 (95% CL: 6.61 6.65 )
## % locations contaminated: 99.8 (95% CL: 99.8 99.8 %)
## Total effect : 0.468 (95% CL: 0.181 0.694 )
## Ipsilateral Spillover : 0.117 (95% CL: 0.0564 0.157 )
## Contralateral Spillover : 0.238 (95% CL: 0.0831 0.368 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1378.711
## AIC : 1384.711
```

```
p0 <- plotCRT(analysisLME4_Z, map = FALSE)
p1 <- plotCRT(analysisLME4_X, map = FALSE)
p2 <- plotCRT(analysisLME4_P, map = FALSE)
p3 <- plotCRT(analysisLME4_L, map = FALSE)
p4 <- plotCRT(analysisLME4_S, map = FALSE)
p5 <- plotCRT(analysisLME4_R, map = FALSE)
library(cowplot)
plot_grid(p0, p1, p2, p3, p4, p5, labels = c('Z', 'X', 'P', 'L', 'S', 'R'), label_size = 10, ncol = 2)
```

*Fig 5.1 Fitted curves for the
example dataset with different options for cfunc*

The piecewise linear spillover function, `cfunc = "S"`

, is
only linear on the scale of the linear predictor. When used in a
logistic model, as here, the transformation via the inverse of the link
function leads to a slightly curved plot (Figure 5.1S). The rescaled
linear function, `cfunc = "R"`

, is provided as a comparator
and for use with `distance`

values other than
`distance = "nearestDiscord"`

see below (it should not be
used to estimate the spillover interval).

The full set of different `cfunc`

options are available
for each of model options `"LME4"`

, `"INLA"`

, and
`"MCMC"`

. The performance of all these different models has
not yet been thoroughly investigated. The analyses of Multerer
*et al.* (2021b) found that that a model equivalent to
`method = "MCMC"`

, `cfunc = "L"`

gave estimates of
efficacy with low bias, even in simulations with considerable
spillover.

Spillover can also be analysed by assuming the effect size to be a
function of the number of intervened locations in the surroundings of
the location Anaya-Izquierdo
& Alexander(2021). Several different surround functions are
available. These are specified by the `distance`

parameter
(Table 5.3).

Table 5.3. Available surround functions

`distance` |
Description | Details |
---|---|---|

`nearestDiscord` |
Distance to nearest discordant location | The default. This is used for analyses by distance (see above) |

`hdep` |
Tukey half-depth | Algorithm of Rousseeuw & Ruts(1996) |

`sdep` |
Simplicial depth | Algorithm of Rousseeuw & Ruts(1996) |

`disc` |
disc | The number of intervened locations within the specified radius (excluding the location itself) as described by Anaya-Izquierdo & Alexander(2021) |

`kern` |
Sum of kernels | The sum of normal kernels |

The `compute_distance()`

function is provided to compute these quantities, so that they can be
described, compared, and analysed independently of
`CRTanalysis()`

. Note that the values of the surround
calculated by `compute_distance()`

are scaled to avoid
correlation with the spatial density of the points (see documentation) and so are
not equivalent to the quantities reported in the original
publications.

Users can also devise other measures of surround or distance, add
them to a `trial`

data frame and specify them using
`distance`

. `CRTanalysis()`

computes the minimum
value for the specified field

```
examples <- compute_distance(example, distance = "hdep")
ps1 <- plotCRT(examples, distance = "hdep", legend.position = c(0.6, 0.8))
ps2 <- plotCRT(examples, distance = "sdep")
examples <- compute_distance(examples, distance = "disc", scale_par = 0.5)
ps3 <- plotCRT(examples, distance = "disc")
examples <- compute_distance(examples, distance = "kern", scale_par = 0.5)
ps4 <- plotCRT(examples, distance = "kern")
plot_grid(ps1, ps2, ps3, ps4, labels = c('hdep', 'sdep', 'disc', 'kern'), label_size = 10, ncol = 2)
```

*Fig 5.2 Stacked bar plots for
different surrounds*

If `distance`

is assigned a value of either
`hdep`

, `sdep`

, then `cfunc = "R"`

is
used by default and the overall effect size is computed by comparing the
fitted values of the model for a surround value of zero with that of the
maximum of the surround in the data. If `distance = "disc"`

or `distance = "kern"`

and `scale_par`

is assigned
a value, then `cfunc = "R"`

is also used. If
`cfunc = "E"`

is specified then an escape function is fitted
with the scale parameter estimated in the same way as in the scale
parameter in other models (see above Table 5.2).

```
examples_hdep <- CRTanalysis(examples, method = "LME4", distance = "hdep", cfunc = 'R')
summary(examples_hdep)
```

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Tukey half-depth
## No non-linear parameter. 1
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for spillover
## Estimates: Control: 0.381 (95% CL: 0.292 0.478 )
## Intervention: 0.209 (95% CL: 0.15 0.282 )
## Efficacy: 0.452 (95% CL: 0.167 0.639 )
## Spillover interval(km): 0.978 (95% CL: 0.976 0.98 )
## % locations contaminated: 55 (95% CL: 55 55 %)
## Total effect : 0.172 (95% CL: 0.0524 0.292 )
## Ipsilateral Spillover : 0.0313 (95% CL: 0.01 0.0512 )
## Contralateral Spillover : 0.0444 (95% CL: 0.0128 0.0785 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1379.89
## AIC : 1385.89
```

```
ps4 <- plotCRT(examples_hdep,legend.position = c(0.8, 0.8))
examples_sdep <- CRTanalysis(examples, method = "LME4", distance = "sdep", cfunc = 'R')
summary(examples_sdep)
```

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: Simplicial depth
## No non-linear parameter. 1
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for spillover
## Estimates: Control: 0.393 (95% CL: 0.307 0.485 )
## Intervention: 0.199 (95% CL: 0.145 0.268 )
## Efficacy: 0.493 (95% CL: 0.243 0.66 )
## Spillover interval(km): 0.978 (95% CL: 0.976 0.98 )
## % locations contaminated: 52.4 (95% CL: 52.2 52.4 %)
## Total effect : 0.193 (95% CL: 0.0802 0.306 )
## Ipsilateral Spillover : 0.0299 (95% CL: 0.013 0.0456 )
## Contralateral Spillover : 0.0431 (95% CL: 0.0169 0.0704 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1376.417
## AIC : 1382.417
```

```
ps5 <- plotCRT(examples_sdep)
examples_disc <- CRTanalysis(examples, method = "LME4", distance = "disc", cfunc = 'R', scale_par = 0.15)
summary(examples_disc)
```

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: disc of radius 0.15 km
## Precalculated scale parameter: 0.15
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for spillover
## Estimates: Control: 0.387 (95% CL: 0.312 0.467 )
## Intervention: 0.2 (95% CL: 0.149 0.26 )
## Efficacy: 0.482 (95% CL: 0.273 0.634 )
## Spillover interval(km): 0.978 (95% CL: 0.976 0.98 )
## % locations contaminated: 8.89 (95% CL: 8.89 8.89 %)
## Total effect : 0.186 (95% CL: 0.0912 0.282 )
## Ipsilateral Spillover : 0.00458 (95% CL: 0.00239 0.00656 )
## Contralateral Spillover : 0.00576 (95% CL: 0.00271 0.00905 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1374.274
## AIC : 1380.274
```

```
ps6 <- plotCRT(examples_disc)
examples_kern <- CRTanalysis(examples, method = "LME4", distance = "kern", cfunc = 'R', scale_par = 0.15)
summary(examples_kern)
```

```
##
## =====================CLUSTER RANDOMISED TRIAL ANALYSIS =================
## Analysis method: LME4
## Link function: logit
## Measure of distance or surround: kern with kernel s.d. 0.15 km
## Precalculated scale parameter: 0.15
## Model formula: pvar + (1 | cluster)
## Rescaled linear function for spillover
## Estimates: Control: 0.406 (95% CL: 0.327 0.491 )
## Intervention: 0.185 (95% CL: 0.136 0.245 )
## Efficacy: 0.542 (95% CL: 0.349 0.684 )
## Spillover interval(km): 0.979 (95% CL: 0.977 0.98 )
## % locations contaminated: 50.8 (95% CL: 50.6 50.9 %)
## Total effect : 0.22 (95% CL: 0.122 0.32 )
## Ipsilateral Spillover : 0.011 (95% CL: 0.00661 0.0152 )
## Contralateral Spillover : 0.0134 (95% CL: 0.00707 0.0203 )
## Coefficient of variation: 41.6 % (95% CL: 31.2 63.1 )
## deviance: 1369.677
## AIC : 1375.677
```

```
ps7 <- plotCRT(examples_kern)
plot_grid(ps4, ps5, ps6, ps7, labels = c('hdep', 'sdep', 'disc', 'kern'), label_size = 10, ncol = 2)
```

*Fig 5.3 Fitted curves for the
example dataset with different surrounds *

To carry out a geostatistical analysis with
`method = "INLA"`

a prediction mesh is needed. By default a
very low resolution mesh is created (creating a high resolution mesh is
computationally expensive). To create a 100m INLA mesh for
`<MyTrial>`

, use:

`mesh <- new_mesh(trial = <MyTrial> , pixel = 0.1)`