Traditional bioequivalence (BE) study design and statistical methods are well established (1,2) and are based on non compartmental analysis (NCA). There are, however, a number of clinical scenarios where a traditional BE study is impractical (3,4). Examples include:

Very long half life drugs.

Clinical scenarios were rich sampling is not feasible, for example, pediatrics or critically ill patients.

Clinical scenarios were a multi-period cross over study is not feasible, e.g., cytotoxic drugs that cannot be given to healthy volunteers.

There is value in developing method to meet the BE regulatory requirements for such drugs. Model Based Bio Equivalence (MBBE) is one method to achieve this.

The key parameter for implementing MBBE is the power of simulated studies. The power is key, rather than the outcome of any single simulated study, as the outcome of any single study will depend on the random seed chosen. A more general and robust answer regarding the likelihood BE can be obtained by simulating many studies, and examining the distribution of outcomes, and therefore, the predicted distribution of outcomes of actual studies, were such studies to be performed. The use of nonlinear mixed effect models and Monte Carlo simulation is well established for describing such distributions. Typically the data used for development of a population PK model do not come from a BE study. They may come from other, e.g., parallel design studies and/or sparse sampling. If data from a well designed BE study were available, Virtual BE studies would not be needed.

“All models are wrong, some models are useful” [5]. We interpret this well known aphorism to mean that if a single model (structure and parameters) is used to describe the data, the prediction will be wrong (or at least not exactly correct). But, if “some” models (e.g., multiple models averaged together) are used, it may be useful, or perhaps at least less wrong. Aoki[8] showed that for simulated studies, model averaging in most cases increases the accuracy of predictions. In Monte Carlo simulation we typically include parameter uncertainty for a single model. With model averaging, we extend model uncertainty from a single model with uncertain parameters to uncertainty about the model structure as well. Rather than a single (incorrect) model, a number of “adequate” models are examined. Adequate models are models that meet some set of minimal requirements in describing the data. The method described by Aoki [8] is used for model averaging. A set of bootstrap samples from the original data set are generated. NONMEM is then used to estimated parameters for each adequate-model/bootstrap-data set combination is performed. For each sample, the model with the best (lowest) Bayesian Information Criteria (BIC [9]) is selected. The Monte Carlo simulation is then done with the model structure and parameter estimates selected for each bootstrap data set, thus capturing uncertainty in both the model structure and parameters.

The simulations can be performed using a data set different from the source data, as, typically, the reason for doing MBBE is that there are no formal BE studies available. In this way, non-BE study data can be used for model definition and then a formal BE study (e.g., 4 period, single dose cross over study) can be simulated. The NCA parameters for each (simulated) subject in each study are then calculated using and algorithm derived. The workflow is:

Develop model(s) of the data. Model development and criteria for an “adequate” model are not discussed, but typically would include separate estimation by formulation for all absorption parameters (e.g, F, Ka, Lag time, zero order infusion duration, transit compartment rate constants etc.). Different structural models for the absorption of refernce vs test formulation may also be considered. An adequate model may also include between occasion variability on Volume and elimination terms, if multiple periods are available in the observed data.

For each bootstrap sample, select the “best” model. In the current implementation, the best model is selected based on Bayesian information criteria (BIC). That is, parameters for each adequate model are estimated for each bootstrap sample. For each bootstrap sample, the model and associated parameter estimates with the best (lowest) BIC is selected. In this way, the uncertainty of both the model structure and the model parameters is described.

From the selected models, perform Monte Carlo simulation, with a reasonable study design (e.g., 4 period, cross over, rich sampling, 50 subjects).

Determine power of the analysis by calculating two one side t test statistics [6] on each simulated study.

Two options exist for how to combine those different models/parameter estimate. First, the data from each of the simulations can be used as simulated, with a single model per simulated study (the “model_averaging_by”: “study” option). This represents the possibility that any given model may be “correct” for all subjects, we’re just not certain which one it is. Alternatively, the study data can be recombined such that each study has a population of individuals based on the representation of overall models. (the “model_averaging_by”: “subject” option). For this option, any given study will be comprised of the same set of subjects (demographics, sample time etc.), but have different structure models. The interpretation here is that there is a distribution of models within the population (similar to the distribution of parameters). So, for the population we have a set prior probabilities for the structural model, but uncertainty about which model any given subject follows. This is analogous to a prior distribution (THETA and OMEGA in NONMEM) for parameters, but uncertainty about the “true value” parameters.

An overall diagram (from Aoki [8]) for model averaging is shown below:

Nyberg et. al [10] described using a SADDLE_RESET as a check for local non-identifiability. MBBE included an option

`"use_check_identifiable": true,`

to instruct the algorithm to determine whether any model resulting from the bootstrap is identifiable, based on the largest absolute fractional difference between pre and post saddle reset parameters of delta_parms. delta_parms can be set in the json argument file e.g:

`"delta_parms": 0.2,`

Traditional bioequivalence is assessed in a single study population (N=1, drawn from a universe of possible study populations). Thus there is some degree of “luck” associated with whether or not the formulae are deemed bioequivalent. MBBE permits a more robust approach, not dependent on a single draw from a distribution, that is, the power of the given study design to find that the formulae are bioequivalent. The success of each study is then calculated based on the method described by Schütz (11) in the replicateBE package in R. The resulting power then is simply the fraction of simulated studies that successfully demonstrate bioequivalence for each NCA endpoint.

MBBE is an R package available on CRAN. Details of implementation are provide in the Step-by-Step Vignette. Several files are required, include observed and simulation data sets, control file and an options file. Sample files with explanation is given below.

```
{ "run_dir": "c:/fda/mbbe",
"model_source": [
"u:/mbbe/inst/examples/NM_05D01_11.mod",
"u:/mbbe/inst/examples/NM_05D01_05.mod",
"u:/mbbe/inst/examples/NM_04_085.mod",
"u:/mbbe/inst/examples/NM_04_090.mod",
"u:/mbbe/inst/examples/NM_05D01_12.MOD"
],
"num_parallel": 32,
"crash_value": 999999,
"nmfe_path": "c:/nm74g64/util/nmfe74.bat",
"delta_parms": 0.2,
"use_check_identifiable": true,
"NCA_end_time": 72,
"rndseed": 1,
"simulation_data_path": "U:/mbbe/inst/examples/data_sim.csv",
"ngroups": 4, "samp_size": 100,
"reference_groups": [ 1, 2 ],
"test_groups": [3, 4 ],
"plan": "multisession",
"alpha_error": 0.05,
"NTID": false,
"model_averaging_by": "study",
"user_R\_code": true,
"R_code_path": "u:/mbbe/R/RPenaltyCode.r" }
```

Descriptions of these options are below:

run_dir - run directory where bootstrap samples will be run

model_source - a json formatted list of NONMEM control files to be used for model averaging

num_parallel - number of parallel NONMEM models to run for bootstrap and Monte Carlo Simulation

crash_value - Value assigned for model “goodness” (BIC plus any other penalty) if the model fails to generate output. Should be larger than any “goodness” value for a model that completes

nmfe_path - path to nmfe??.bat file

delta_parms - fractional difference in any parameter before and after a SADDLE_RESET that will be deemed not identifiable

use_check_identifiable - logical (true|false) for whether to test for identifability

NCA_end_time - end time for NCA parameter calculation. Note that all NCA intervals will start at 0 (likely requiring a reset event in the simulation data set for each period)

rndseed - random seed for Bootstrap and Monte Carlo Simulation

simulation_data_path - path to data file to be used for simulation

ngroups - number of groups in simulation data set

samp_size - sample size to be used for Bootstrap and Monte Carlo Simulation

reference_groups - GROUP value indicating reference formulation in the simulation data set

test_groups - GROUP value indicating test formulation in the simulation data set. Note that the total number of reference_groups and test_groups must equal ngroups

plan - parallel plan to be used for bootstrap and Monte Carlo Simulations. Options are

sequential

multisession

multicore

alpha_error” - type 1 error for the two one sides tests, default = 0.05

NTID - Is this a Narrow Therapeutic Index Drug? (12)

model_averaging_by - should model averaging be by study or by subject (experimental)

user_R_code - will R code be used to calculate a penalty to be added to the Bayesian Information Criteria?

R_code_path - if user_R_code, path to the R source file

The bootstrap analysis data set can have any form required for the NONMEM model. However, several data items are required. These are:

PERIOD

GROUP

SEQ (Sequence)

TRT (treatment)

The analysis data set need not have multiple periods or sequences (in which case all values can be 0, or missing). However the $INPUT record is copied from the analysis control file into the simulation control file, and the simulation control file MUST include GROUP, PERIOD, SEQ and TRT data items for statistical testing.

Typically, the simulation data set will be comprised of:

A least two periods, cross over, single dose study, for a randomized sequence of test and reference. A four period study, with random sequence may be preferred.

Rich sampling out to at least 3 half-lives in the terminal phase for nearly all subjects.

The data set must include:

PERIOD - sequential number of periods, must be at least 2

GROUP - group assigned, e.g., Groups 1 and 2 are reference, Groups 3 and 4 are test

TRT (treatment) - usually 2, test and reference

SEQ (sequence) sequence of treatment, usually at least 2 sequence, e.g., AB or BA.

As these will be tested in the two one side T tests. These data items need not be used in the bootstrap analysis control files.

The bootstrap analysis NONMEM control file will be the template for the simulation control file. Therefore, the $INPUT must be the same between the bootstrap analysis data set and the Monte Carlo simulation data set. Basically, the “best” model control file for each bootstrap sample is edited by replacing any $TABLE records with those needed for the two one side T test, the $EST replaced with $SIM. The final parameters copied from the selected model .xml output file into the simulation control file.

Bioavailability-and-Bioequivalence-Studies-Submitted-in-NDAs-or-INDs—–General-Considerations .pdf

Gabrielsson, J. and Weiner, D. (2001). Pharmacokinetic and pharmacodynamic data analysis: concepts and applications, volume 2. CRC Press

Hu, C., Moore, K. H., Kim, Y. H., and Sale, M. E. (2004). Statistical issues in a modeling approach to assessing bioequivalence or pk similarity with presence of sparsely sampled subjects. Journal of pharmacokinetics and pharmacodynamics, 31(4):321–3

Seng Yue C, Ozdin D, Selber-Hnatiw S, Ducharme MP. Opportunities and Challenges Related to the Implementation of Model-Based Bioequivalence Criteria. Clin Pharmacol Ther. 2019 Feb;105(2):350-362. doi: 10.1002/cpt.1270. Epub 2019 Jan 8.

Box, George E. P. (1976), “Science and statistics” (PDF),

*Journal of the American Statistical Association*,**71**(356): 791–799,Schuirmann, D.J. A comparison of the Two One-Sided Tests Procedure and the Power Approach for assessing the equivalence of average bioavailability.

*Journal of Pharmacokinetics and Biopharmaceutics***15**, 657–680 (1987). https://doi.org/10.1007/BF01068419Lunn, D.J. Automated covariate selection and Bayesian model averaging in population PK/PD models.

*J Pharmacokinet Pharmacodyn***35**, 85–100 (2008). https://doi.org/10.1007/s10928-007-9077-xAoki, Y., Röshammar, D., Hamrén, B.

*et al.*Model selection and averaging of nonlinear mixed-effect models for robust phase III dose selection.*J Pharmacokinet Pharmacodyn***44**, 581–597 (2017). https://doi.org/10.1007/s10928-017-9550-0Neath, A.A. and Cavanaugh, J.E. (2012), The Bayesian information criterion: background, derivation, and applications. WIREs Comp Stat, 4: 199-203. https://doi.org/10.1002/wics.199

Nyberg HM, Hooker AC, Bauer RJ, Aoki Y. SADDLE_RESET: more robust parameter estimation with a check for local practical identifiability. https://www.page-meeting.org/pdf_assets/1345-PAGE_2017_SADDLE_RESET_Final.pdf