This is an `R`

package to use the model-impled
simulation-based power estimation (MSPE) method to find the minimum
required sample size for a given power within a nonlinear Structural
Equation Model (NLSEM) for several parameters of interest (POI). The
package was created as a supplement to the publication **Irmer et
al. (2024b)** and its theory is based on Irmer et al. (2024a).
Here, a probit regression model with \(\sqrt{n}\) as a predictor is fit to
significance decisions for single parameters (using the \(z\)-test) within simulated data to describe
the relationship between power and sample size \(n\) (Irmer et al., 2024b).

This requires the package `devtools`

.

Use `build_vignettes = TRUE`

to be able to see the
documentation linked in “Getting Started”.

Load the package:

The `powerNLSEM`

packages uses `lavaan`

syntax
(Rosseel, 2012) to describe the model including population values:

```
model <- "
# measurement models
X =~ 1*x1 + 0.8*x2 + 0.7*x3
Y =~ 1*y1 + 0.85*y2 + 0.78*y3
Z =~ 1*z1 + 0.9*z2 + 0.6*z3
# structural models
Y ~ 0.3*X + .2*Z + .2*X:Z
# residual variances
Y~~.7975*Y
X~~1*X
Z~~1*Z
# covariances
X~~0.5*Z
# measurement error variances
x1~~.1*x1
x2~~.2*x2
x3~~.3*x3
z1~~.2*z1
z2~~.3*z2
z3~~.4*z3
y1~~.5*y1
y2~~.4*y2
y3~~.3*y3
"
```

All parameters in the model are given by the user, otherwise
`lavaan`

’s defaults are used. These are `1`

for
variances, `0.5`

for covariance and `0`

for all
other coefficients. Hence, not stating a coefficient will result in
zero-effects for which the power is just the level of significance
(\(\alpha\) or the type-I error).

Interactions among latent variables have not yet been included into
`lavaan`

(version 0.6.13), which is why this is handeled by
the `powerNLSEM`

package by translating the syntax into
syntax for which nonlinear models can be estimated. For now these are
`LMS`

(latent moderated structural equations, Klein &
Moosbrugger, 2000), which needs an installation of `Mplus`

(Muthén & Muthén, 1998-2017), `UPI`

(unconstrained
product indicator approach, Marsh et al., 2004, Kelava & Brandt,
2009), which further makes use of the `semTools`

package
(Jorgensen et al., 2022) to compute the product indicators in a matched
or unmatched way (including different ways of centering the indicators),
a factor score approach using the SL-method (named after Skrondal &
Laake, 2001, as studied in Ng & Chang, 2020) and a scale mean
regression based path analysis where the latent variables are collapsed
to means of the indicators per latent variables and path analysis is
used to fit the NLSEM.

```
#> # structural models
#> Y ~ 0.3*X + .2*Z + .2*X:Z
#>
#> # residual variances
#> Y~~.7975*Y
```

States the structural model of the NLSEM as \(Y=.3X + .2Z + .2XZ + \varepsilon_Y\), where \(\varepsilon_Y\sim\mathcal{N}(0,.7975)\), i.e., the variance \(\mathbb{V}ar[\varepsilon_Y]=.7975\). We are interested how large the sample size needs to be for a power of 80% for the three regression coefficients. We will use the adaptive search algorithm to find the necessary sample size.

After stating the model, we can use the `powerNLSEM`

function to use the adaptive search algorithm to find the optimal sample
size for our desired power for a given Type I error rate for our latent
moderation model using the product indicator approach (UPI, with matched
products). For computational reasons a very small number of replications
is used and only two steps are utilized (for further information see
also Irmer et al., 2024b):

```
Result_Power <- powerNLSEM(model = model,
POI = c("Y~X", "Y~Z", "Y~X:Z"),
method = "UPI",
search_method = "adaptive",
steps = 2, # for computational reasons, better >= 10
power_modeling_method = "probit",
R = 200, # for computational reasons, better >= 2000
power_aim = .8,
alpha = .05,
alpha_power_modeling = .05,
CORES = 1,
seed = 2024)
#> Initiating smart search to find simulation based N for power of 0.8 within 2 steps
#> and in total 200 replications. Ns are drawn randomly...
#> Step 1 of 2. Fitting 67 models with Ns in [140, 420].
#> Step 2 of 2. Fitting 133 models with Ns in [140, 270].
```

The argument `model`

is given our previously stated model
in `lavaan`

-syntax, `POI`

describe the
**P**arameters **O**f
**I**nterest (here we are interested in the power of the
linear effect of `X`

and `Z`

on `Y`

and
the interaction between `X:Z`

on `Y`

, namely all
the structural effects within the model, therefore
`POI = c("Y~X", "Y~Z", "Y~X:Z")`

), the `method`

is
choosen to be `"UPI"`

, which indicates that the unconstrained
product indicator approach should be used (Marsh et al., 2004, Kelava
& Brandt, 2009), `search_method`

is choosen to be
`"adaptive"`

, an alternative would be
`"bruteforce"`

(see documentation for more details), 2
adaptive search `steps`

are chosen with a
`power_modeling_method`

of `"probit"`

, which means
the significance decisions per parameter are modeled via a probit
regression model. `R`

is the total number of replications
fitted (here 200 is small, this number should be increased for higher
precision, values much smaller may create unwanted behaviour of the
search algorithm as the power-model might be to plane, a good suggestion
is \(R\ge 2000\) and for precise
results better \(R\ge 10^5\)).
`power_aim`

is the desired power level (here 0.8) for which
the adaptive algorithm is optimized (and for which the \(N\) is found), `alpha`

is the
corresponding Type I error rate for the significance decision per
replication (level of significance, here 0.05),
`alpha_power_modeling`

is the Type I error rate used within
the power modeling process, i.e., the confidence band used to derive the
lower bound of power that is then used to solve for \(\hat{N}\) which further enables that \(\hat{N}\) will ensure the desired power
rate with Type I error `alpha_power_modeling`

divided by 2,
`CORES`

are the number of computer cpu cores used to estimate
the models (here 1 is chosen, this number should be increased to reduce
runtime). As this is a random search algorithm, we need to set a seed
for comparison: `seed`

.

The output object `Result_Power`

is a list with the
following objects:

```
names(Result_Power)
#> [1] "N" "N_trials" "est"
#> [4] "se" "Ns" "fitOK"
#> [7] "truth" "power" "beta"
#> [10] "alpha" "alpha_power_modeling" "method"
#> [13] "search_method" "power_modeling_method" "test"
#> [16] "convergenceRate" "Performance" "AveragePerformance"
#> [19] "seed" "model" "runtime"
#> [22] "call" "args"
```

When we apply the `summary`

function to the output of the
`powerNLSEM`

package, we get an overview of the most
important information of the estimation with some visual highlights:

```
summary(Result_Power)
#> -----------------------------------------------------------------------------
#> Model-Implied Simulation-Based Power Estimation: powerNLSEM 0.1.2
#>
#> Parameters of Interest (POI):
#> Y~X, Y~Z, Y~X:Z
#>
#> True Values for POI:
#> Y~X Y~Z Y~X:Z
#> 0.3 0.2 0.2
#>
#>
#> Method:
#> UPI
#>
#> Test:
#> onesided z-Test
#>
#> Power (optimized for):
#> 0.8
#>
#> Type I error/Alpha (for significance decision/z-Tests):
#> 0.05
#>
#> Power Modeling:
#> probit
#>
#> Type I error/Alpha (for power modeling):
#> 0.05
#>
#> R (number of replications):
#> 200
#>
#> Convergence Rate:
#> 0.965 (converged samples: 193)
#>
#> Seed:
#> 2024
#>
#> -------------------------------Results---------------------------------------
#> Desired Sample Size:
#> 328
#>
#> Estimation Performance:
#> Y~X Y~Z Y~X:Z
#> Bias 0.01267340 -0.0009403001 0.001975726
#> absolute Bias 0.07162958 0.0673287802 0.057565311
#> relative Bias 0.04224466 -0.0047015005 0.009878631
#> RWMSE 0.08962312 0.0841871999 0.073139976
#> *weighted bias, absolute bias, relative bias,
#> and root weighted mean squared error
#> -----------------------------------------------------------------------------
```

The `Result_Power`

contains far more information than that
of the `summary`

.

is the necessary sample size to ensure that the power is \(\ge .8\).

```
head(Result_Power$est) # first 6 rows
#> Y~X Y~Z Y~X:Z
#> 1 0.4259735 0.1642598 0.1517858
#> 2 0.3364663 0.2582934 0.1800099
#> 3 0.3177444 0.1617880 0.1866130
#> 4 0.2699603 0.2074662 0.1787506
#> 5 0.1727635 0.2363284 0.2591777
#> 6 0.3324757 0.1781149 0.1983582
```

is the data.frame including all parameter estimates from which the significance decision are computed using the corresponding standard errors in

```
head(Result_Power$se) # first 6 rows
#> Y~X Y~Z Y~X:Z
#> 1 0.06618769 0.06789268 0.04367723
#> 2 0.07393442 0.06911093 0.04735818
#> 3 0.05915466 0.06298318 0.04493189
#> 4 0.05719477 0.05868278 0.03609598
#> 5 0.06024498 0.05513426 0.05130208
#> 6 0.06596830 0.06104587 0.04105743
```

is a vector of logicals indicating whether the models converged and the results are trustworthy to be used in power modeling with

being the convergence rate.

are the calculated necessary sample sizes within every step of the adaptive search algorithm. This can be used for diagnistics and to check whether the algorithm has converged.

are the desired power level, the corresponding beta-error level (type II-error level: \(\beta=\mathbb{P}(H_0|H_1)\), since Power = \(1-\beta = \mathbb{P}(H_1 | H_1)\)) and the desired alpha-error level (Type I error level: \(\alpha=\mathbb{P}(H_1|H_0)\)).

```
head(Result_Power$args$seeds) # seeds within each simulation
#> [1] 373379621 774681276 889107345 711740832 543694458 652342271
```

include information on the search algorithm (here “adaptive” search, could also be “bruteforce”), the chosen method to model the power (here “probit”, i.e., probit regression model), the runtime, the general seed and the seeds used within each simulation (for replicability of e.g., non-convergences, etc.) used for replicability.

The `powerNLSEM`

package offers several plots, which
visualize the power:

plots the model implied power for the `POI`

vs. sample
size `N`

. The vertical line indicates the necessary sample
size found be the adaptive search algorithm. The horizontal line
indicates the desired power level.

Within this plot the standard errors of the
`power_modeling_method`

are included into the plot.

plots the empirical power per sample size and fits a LOESS fit to the resulting data. All plots indicate that the linear effect of Z has the smallest power.

One can also find other sample sizes for power values other than that
the process has been optimized for by using the
`reanalyze.powerNlSEM`

function.

```
reanalyse.powerNLSEM(Result_Power,
powerLevels = c(.5, .6, .7, .8, .9, .95))
#> $Npower
#> [1] 166 191 227 328 601 906
#>
#> $power
#> [1] 0.50 0.60 0.70 0.80 0.90 0.95
#>
#> $beta
#> [1] 0.50 0.40 0.30 0.20 0.10 0.05
#>
#> $alpha
#> [1] 0.05
#>
#> $alpha_power_modeling
#> [1] 0.05
#>
#> $method
#> [1] "UPI"
#>
#> $test
#> [1] "onesided"
#>
#> $search_method
#> [1] "adaptive"
#>
#> $power_modeling_method
#> [1] "probit"
#>
#> attr(,"class")
#> [1] "powerNLSEM.reanalyzed" "list"
```

These new values can also be plotted into the plot

were we see that some of the power values actually fall out of the support for which sample sizes had been drawn indicating that these values might be less precise.

Further, if we want more precision in the power modeling process we
can alter `alpha_power_modeling`

to a lower value.

```
reanalyse.powerNLSEM(Result_Power,
powerLevels = c(.5, .6, .7, .8, .9),
alpha_power_modeling = .001)
#> $Npower
#> [1] 182 207 300 -Inf -Inf
#>
#> $power
#> [1] 0.5 0.6 0.7 0.8 0.9
#>
#> $beta
#> [1] 0.5 0.4 0.3 0.2 0.1
#>
#> $alpha
#> [1] 0.05
#>
#> $alpha_power_modeling
#> [1] 0.001
#>
#> $method
#> [1] "UPI"
#>
#> $test
#> [1] "onesided"
#>
#> $search_method
#> [1] "adaptive"
#>
#> $power_modeling_method
#> [1] "probit"
#>
#> attr(,"class")
#> [1] "powerNLSEM.reanalyzed" "list"
```

If we wish to not use confidence bands in the power modeling process
we can use `alpha_power_modeling = 1`

.

```
reanalyse.powerNLSEM(Result_Power,
powerLevels = c(.5, .6, .7, .8, .9),
alpha_power_modeling = 1)
#> $Npower
#> [1] 127 158 196 246 323
#>
#> $power
#> [1] 0.5 0.6 0.7 0.8 0.9
#>
#> $beta
#> [1] 0.5 0.4 0.3 0.2 0.1
#>
#> $alpha
#> [1] 0.05
#>
#> $alpha_power_modeling
#> [1] 1
#>
#> $method
#> [1] "UPI"
#>
#> $test
#> [1] "onesided"
#>
#> $search_method
#> [1] "adaptive"
#>
#> $power_modeling_method
#> [1] "probit"
#>
#> attr(,"class")
#> [1] "powerNLSEM.reanalyzed" "list"
```

If we choose `alpha_power_modeling = 1`

within the
adaptive search algorithm using `powerNLSEM`

, then the sample
sizes get optimized for that value. However, this is not adviced since
in approx. half of the replications (retrials of the adaptive algorithm
or brute algorithm) the sample size will actually be smaller than that
resulting in the desired power rate.

The required sample size can be double checked by running a
simulation using just this sample size. This is presented for a linear
SEM next, as other functions exist which can independently check the
results. First we need to formulate the population model and the
analysis model. We use a simplified version of the `model`

used above.

```
populationModel <- "
# measurement models
X =~ 1*x1 + 0.8*x2 + 0.7*x3
Y =~ 1*y1 + 0.85*y2 + 0.78*y3
# structural models
Y ~ 0.3*X
# residual variances
Y~~.91*Y
X~~1*X
# measurement error variances
x1~~.1*x1
x2~~.2*x2
x3~~.3*x3
y1~~.5*y1
y2~~.4*y2
y3~~.3*y3
"
```

Now we fit the MSPE to estimate power for the effect
`"Y~X"`

for this linear SEM. We use
`method = "UPI"`

as the product indicator approach without
any nonlinear effects simplifies to standard SEM.

```
Simple <- powerNLSEM(model = populationModel, POI = c("Y~X"), method = "UPI",
search_method = "adaptive", steps = 2,
power_modeling_method = "probit",
R = 200, power_aim = .8, alpha = .05,
seed = 2024, CORES = 1)
#> Initiating smart search to find simulation based N for power of 0.8 within 2 steps
#> and in total 200 replications. Ns are drawn randomly...
#> Step 1 of 2. Fitting 67 models with Ns in [75, 225].
#> Step 2 of 2. Fitting 133 models with Ns in [75, 112].
```

The required sample size is

Now, we simulate data for this particular sample size and compute the resulting power rate.

```
VerifyRes <- powerNLSEM(model = populationModel, POI = c("Y~X"), method = "UPI",
search_method = "bruteforce", Ns = Simple$N,
R = 200, seed = 2024, CORES = 1)
#> Initiating brute force search to find simulation based N for power of 0.8 within 200 replications...
#> Fitting 200 models with Ns in [99, 99].
```

The argument `$powersPerN`

gives the power rate per
selected sample size which is what we need here:

This power rate is close to the desired power rate of \(.8\). In fact the confidence interval [0.8,
0.901] just excludes \(.8\). This is
desired, as the computed required sample size is derived from the lower
bound of the confidence band around the predicted power rate. Hence,
with `alpha_power_modeling = .05`

, we have that in 2.5% of
cases the computed sample size will result in a power rate smaller than
the desired power rate of .8. Hence, in most cases, the computed power
rate will be larger. With increasing `R`

this slight
overestimation becomes smaller.

Irmer, J. P., Klein, A. G., & Schermelleh-Engel, K. (2024a). A
General Model-Implied Simulation-Based Power Estimation Method for
Correctly and Misspecfied Models: Applications to Nonlinear and Linear
Structural Equation Models. *Behavior Research Methods.* https://doi.org/10.31219/osf.io/pe5bj

Irmer, J. P., Klein, A. G., & Schermelleh-Engel, K. (2024b).
Estimating Power in Complex Nonlinear Structural Equation Modeling
Including Moderation Effects: The `powerNLSEM R`

-Package.
*Behavior Research Methods.* https://doi.org/10.3758/s13428-024-02476-3

Jorgensen, T. D., Pornprasertmanit, S., Schoemann, A. M., &
Rosseel, Y. (2022). * semTools: Useful tools for
structural equation modeling*. R package version 0.5-6. Retrieved
from https://CRAN.R-project.org/package=semTools

Kelava, A., & Brandt, H. (2009). Estimation of nonlinear latent
structural equation models using the extended unconstrained approach.
*Review of Psychology, 16*(2), 123–131.

Klein, A. G., & Moosbrugger, H. (2000). Maximum likelihood
estimation of latent interaction effects with the LMS method.
*Psychometrika, 65*(4), 457–474. https://doi.org/10.1007/BF02296338

Marsh, H. W., Wen, Z. & Hau, K. T. (2004). Structural equation
models of latent interactions: Evaluation of alternative estimation
strategies and indicator construction. *Psychological Methods,
9*(3), 275–300. https://doi.org/10.1037/1082-989X.9.3.275

Muthén, L., & Muthén, B. (1998-2017). *Mplus user’s guide
(Eighth ed.)*. Los Angeles, CA: Muthén & Muthén.

Rosseel, Y. (2012). lavaan: An R package for structural equation
modeling. *Journal of Statistical Software, 48*(2), 1–36. https://doi.org/10.18637/jss.v048.i02

Skrondal, A., & Laake, P. (2001). Regression among factor scores.
*Psychometrika, 66*(4), 563-575. https://doi.org/10.1007/BF02296196