Consider allowing JavaScript. Otherwise, you have to be proficient in reading since formulas will not be rendered. Furthermore, the table of contents in the left column for navigation will not be available and code-folding not supported. Sorry for the inconvenience.

Examples in this article were generated with 4.0.5 by the packages TeachingDemos,1 magicaxis,2 and PowerTOST.3

For applications see also a collection of other articles.

• The right-hand badges give the respective section’s ‘level’.

1. Basics about sample size methodology – requiring no or only limited statistical expertise.

1. These sections are the most important ones. They are – hopefully – easily comprehensible even for novices.

1. A somewhat higher knowledge of statistics and/or R is required. May be skipped or reserved for a later reading.

1. An advanced knowledge of statistics and/or R is required. Not recommended for beginners in particular.

1. If you are not a neRd or statistics afficionado, skipping is recommended. Suggested for experts but might be confusing for others.
• Click to show / hide R code.
Abbreviation Meaning
ABE Average Bioequivalence
ABEL Average Bioequivalence with Expanding Limits
CV (CVwT, CVwR) Within-subject Coefficient of Variation
(of the Test and Reference treatment)
GMR Geometric Mean Ratio
HVD(P) Highly Variable Drug (Product)
RSABE Reference-scaled Average Bio­equivalence
TOST Two One-Sided Tests

# Introduction

What are simulations any why may we need them?

In statistics (or better, in science in general) we prefer to obtain results by an analytical calculation. Remember your classes in algebra and calculus? That’s the stuff.

However, sometimes the desire for an ‘exact’ solution of a problem reminds on the hunt for un­obtainium.

If you know the movie Hidden Figures maybe you recall the scene where Katherine Johnson solved the transition from the elliptic orbit of the Mercury capsule to the parabolic re-entry trajectory by Euler’s approximations. Given, not a simulation but a numerical method since an algebraic solution simply did not – and will not – exist.

Least-squares and maximum-likelihood optimizations are part of our bread-and-butter jobs. Nobody complains that the result is not ‘exact’.

Statistics. A sort of elementary form of mathematics which consists of adding things together and occasionally squaring them.
Stephen Senn

However, sometimes numerical methods are not feasible if the problem gets too large. Large in this context means many parameters, a high number of data points to fit, or both.
Two examples: Fig.1 Three body problem.

Approximate trajectories of three identical bodies located at the vertices of a scalene triangle and having zero initial velocities. It is seen that the center of mass, in accordance with the law of conservation of momentum, remains in place.
© Dnttllthmmnm @ wikimedia commons

• How many objects are in the solar system? You name it. Nevertheless, after a 9½ years and 5 billion km journey the New Horizons space probe had a 12,472 km close encounter with Pluto.
THX to numerics & simulations.

When it comes to (bio)equivalence we need simulations in various areas. One is the power calculation / sample size estimation for Reference-scaled Average Bioequivalence and the other for adaptive sequential Two-Stage Designs.

Whenever a theory appears to you as the only possible one, take this as a sign that you have neither understood the theory nor the problem which it was intended to solve.

As we will see in the examples, in the Monte Carlo method values approach the exact one asymptotically.

top of section ↩︎

# Examples

# attach libraries to run the examples
library(TeachingDemos)
library(magicaxis)
library(PowerTOST)

## Rolling two dice

Here it’s easy to get the average sum of pips. If you remember probability and combinatorics, you will get the answer 7. If you are a gambler, you simply know it. If you have two handy, roll them. Of course, you will get anything between the possible extremes (2 and 12) in a single roll. But you will need many rolls to come close to the theoretical average 7. Since this is a boring excercise (unless you find an idiot betting against 7), let R do the job (10,000 rolls):

set.seed(123)
nsims <- ceiling(10^(seq(0, 4, 0.01)))
df    <- data.frame(nsims = nsims, sum.pips = NA)
for (i in seq_along(nsims)) {
df[i, 2] <- sum(dice(rolls = nsims[i],
ndice = 2, sides = 6))/nsims[i]
}
ylim <- c(2, 12)
j    <- 10^(seq(0, 4, 1))
mean.j <- numeric()
for (i in seq_along(j)) {
mean.j[i] <- mean(df[df$nsims == j[i], 2]) } dev.new(width = 4.5, height = 4.5) op <- par(no.readonly = TRUE) par(mar = c(3, 3, 3, 0.1), cex.axis = 0.9, mgp = c(2, 0.5, 0)) plot(df$nsims, df[, 1], type = "n",
ylim = ylim, axes = FALSE, log = "x",
xlab = "Number of simulated rolls",
ylab = "Average sum of pips")
mtext("Average at 1, 10, \u2026", 3, line = 2)
grid(); box()
abline(h = 7)
magaxis(1)
magaxis(2, minorn = 2, las = 1)
axis(3, labels = sprintf("%.2f", mean.j), at = j)
df$pi.estimate[i] <- 2 * l / mean(HHH) setTxtProgressBar(pb, i/nrow(df)) } close(pb) tail(df, 1) # show last dev.new(width = 4.5, height = 4.5) op <- par(no.readonly = TRUE) par(mar = c(3, 3, 0, 0), cex.axis = 0.9, mgp = c(2, 0.5, 0)) plot(df$nsims, df$pi.estimate, type = "n", log = "x", axes = FALSE, xlab = "Number of simulations", ylab = expression(italic(pi)*"-estimate")) grid(); box() abline(h = pi, col = "red") magaxis(1:2, las = 1) points(df$nsims, df$pi.estimate, pch = 21, col = "blue", bg = "#9090FF80", cex = 1.15) par(op) Fig.4 Estimating π (Buffon’s needle drops). Of course, there are extremely efficient methods like the Chudnovsky algorithm (with January 2020 the record holder with 50 trillion digits). q <- 0:1L pi.rep <- 0 for (i in seq_along(q)) { pi.rep <- pi.rep - 12/640320^(3/2) * (factorial(6 * q[i]) * (13591409 + 545140134 *q[i])) / (factorial(3 * q[i]) * factorial(q[i])^3 * -640320^(3 * q[i])) } pi.approx <- 1/pi.rep comp <- data.frame(q = as.integer(max(q)), pi = sprintf("%.15f", pi), pi.approx = sprintf("%.15f", pi.approx), RE = sprintf("%-.15f%%", 100 * (pi.approx - pi) / pi)) print(comp, row.names = FALSE) R> q pi pi.approx RE R> 1 3.141592653589793 3.141592653589675 -0.000000000003746% Note that increasing the number of iterations q does not improve the result. Since this is an extremely efficient algorithm, we are already at the numeric resolution of a 64 bit numeric, which is ~2.220446·10-16. The wonderland of representing numbers on digital computers is elaborated in another article. ## Power of TOST Over the top since exact (analytical) solutions are available. However, if you don’t trust them, simulations are possible with the function power.TOST.sim(), which employs the distributional properties ($$\small{\sigma^2}$$ follows a $$\small{\chi^2}$$-distribution with $$\small{n-2}$$ degrees of freedom and $$\small{\log_{e}(\theta_0)}$$ follows a normal distribution).5 Convergence takes a while. Empiric power, its standard error, its relative error compared to the exact power, and execution times on a Xeon E3-1245v3 3.4 GHz, 8 MB cache, 16 GB RAM, 64 bit R 4.0.5 on Windows 7: CV <- 0.25 n <- sampleN.TOST(CV = CV, print = FALSE)[["Sample size"]] nsims <- c(1e5, 5e5, 1e6, 5e6, 1e7, 5e7, 1e8) exact <- power.TOST(CV = CV, n = n) df <- data.frame(simulations = nsims, exact = rep(exact, length(nsims)), simulated = NA, SE = NA, RE = NA) for (i in 1:nrow(df)) { start.time <- proc.time()[] df$simulated[i] <- power.TOST.sim(CV = CV, n = n,
nsims = nsims[i])
df$secs[i] <- proc.time()[] - start.time df$SE[i]        <- sqrt(0.025 / i)
df$RE <- 100 * (df$simulated - exact) / exact
}
df$exact <- signif(df$exact, 5)
df$SE <- signif(df$RE, 4)
df$RE <- sprintf("%+.4f%%", df$RE)
df$simulated <- signif(df$simulated, 5)
df$simulations <- formatC(df$simulations, format = "f",
digits = 0, big.mark=",")
names(df)[c(1, 3)] <- c("sim\u2019s", "sim\u2019d")
print(df, row.names = FALSE)
R>        sim’s   exact   sim’d         SE       RE   secs
R>      100,000 0.80744 0.80814  0.0867600 +0.0868%   0.18
R>      500,000 0.80744 0.80699 -0.0551700 -0.0552%   0.79
R>    1,000,000 0.80744 0.80766  0.0279300 +0.0279%   1.70
R>    5,000,000 0.80744 0.80751  0.0090080 +0.0090%   8.27
R>   10,000,000 0.80744 0.80733 -0.0136200 -0.0136%  16.32
R>   50,000,000 0.80744 0.80743 -0.0009096 -0.0009%  81.16
R>  100,000,000 0.80744 0.80744  0.0001147 +0.0001% 161.95 Fig.5 Empiric power of TOST (2×2×2 crossover, T/R ratio 0.95, n 28).

## Type I Error of TOST

We know that $$\small{\alpha}$$ (the probability of the Type I Error) never exceeds the nominal level of the test (generally 0.05). In e.g., a 2×2×2 crossover design with CV 25% and 28 subjects it is 0.04999963… We can simulate studies with a T/R-ratio at one of the borders of the acceptance range and assess them for ABE as well. We expect ≤5% to pass. The number of passing studies divided by the number of simulations gives the empiric Type I Error.

# Cave: Long run-time!
runs  <- 5
nsims <- as.integer(10^(seq(3, 7, 0.1)))
df    <- data.frame(run = 1:runs,
nsims = rep(nsims, each = runs),
binomial.lower = NA,
binomial.upper = NA,
TIE = NA)
CV    <- 0.25
n     <- sampleN.TOST(CV = CV,
print = FALSE)[["Sample size"]]
exact <- power.TOST(theta0 = 1.25, CV = CV, n = n)
pb    <- txtProgressBar(0, 1, 0, char = "\u2588",
width = NA, style = 3)
for (i in 1:nrow(df)) {
x   <- ceiling(0.05 * df$nsims[i]) b.t <- binom.test(x, df$nsims[i], alternative =
"two.sided")[["conf.int"]]
df[i, 3:4] <- as.numeric(b.t)
ifelse (df$run[i] == 1, setseed <- TRUE, setseed <- FALSE) df$TIE[i] <- power.TOST.sim(theta0 = 1.25, CV = CV,
n = n, nsims = df$nsims[i], setseed = setseed) setTxtProgressBar(pb, i/nrow(df)) } close(pb) ylim <- range(df[, 3:5]) dev.new(width = 4.5, height = 4.5) op <- par(no.readonly = TRUE) par(mar = c(3, 3.5, 0, 0), cex.axis = 0.9, mgp = c(2, 0.5, 0)) plot(df$nsims, df[, 2], type = "n", log = "x",
ylim = ylim, axes = FALSE,
xlab = "Number of simulations", ylab = "")
mtext("Empiric Type I Error", 2, line = 2.5)
grid(); box()
abline(h = exact, col = "red")
lines(df$nsims, df[, 3]) lines(df$nsims, df[, 4])
magaxis(1:2, las = 1)
for (i in 1:nrow(df)) {
ifelse (df$run[i] == 1, pch <- 19, pch <- 1) points(df$nsims[i], df$TIE[i], pch = pch, col = "blue", cex = 1.15) } par(op) Fig.6 Empiric Type I Error of TOST (2×2×2 crossover, n 28). Five runs. The filled circles are runs with a fixed seed of 123456 and open circles ones with random seeds. The lines show the upper and lower significance limits of the binomial test. We see that convergence of this problem is bad. Therefore, e.g., in Two-Stage Designs and in assessing a potentially inflated Type I Error in reference-scaling 106 simulations are common. ## Sample size for ABEL Table A2 gives for a 4-period full replicate study (EMA method), GMR 0.85, CVwR 50%, and ≥80% power a sample size of 54 (10,000 simulations),6 whilst the function sampleN.scABEL() of PowerTOST gives only 52 (with its default of 100,000 simulations). runs <- 5 nsims <- as.integer(10^(seq(3, 6, 0.1))) df <- data.frame(run = 1:runs, nsims = rep(nsims, each = runs), n = NA) pb <- txtProgressBar(0, 1, 0, char = "\u2588", width = NA, style = 3) for (i in 1:nrow(df)) { ifelse (df$run[i] == 1,
setseed <- TRUE,
setseed <- FALSE)
df$n[i] <- sampleN.scABEL(theta0 = 0.85, CV = 0.5, nsims = df$nsims[i],
setseed = setseed,
design = "2x2x4",
targetpower = 0.8,
print = FALSE,
details = FALSE)[["Sample size"]]
setTxtProgressBar(pb, i/nrow(df))
}
close(pb)
ylim <- range(df$n) dev.new(width = 4.5, height = 4.5) op <- par(no.readonly = TRUE) par(mar = c(3, 3, 0, 0), cex.axis = 0.9, mgp = c(2, 0.5, 0)) plot(df$nsims, df$n, type = "n", log = "x", ylim = ylim, axes = FALSE, xlab = "Number of simulations", ylab = "Sample size") grid(); box() magaxis(1) magaxis(2, minorn = 2, las = 1) for (i in 1:nrow(df)) { ifelse (df$run[i] == 1, pch <- 19, pch <- 1)
points(df$nsims[i], df$n[i], pch = pch,
col = "blue", cex = 1.15)
}
points(1e5, 52, pch = 16, col = "#00AA00", cex = 1.15)
points(1e4, 54, pch = 16, col = "red", cex = 1.15)
par(op) Fig.7 Sample size for ABEL (2×2×4 replicate, CV 0.50, GMR 0.85, power ≥0.80).

Five runs. As before filled circles are with a fixed seed of 123456 and open circles with random seeds. With only 10,000 simulations I got7 four times 50, once 52, and once 54 (like ‘The Two Lászlós’). If you have more runs, you will get in some of them more extreme ones (in 1,000 runs of 10,000 I got a median of 52 and a range of 48–56). However, this wider range demonstrates that the result is not stable. Hence, we need at least 50,000 simulations to end up with 52 as a stable result. Although you are free to opt for more than the default of 100,000 it’s a waste of time.

## Sample size for RSABE

Table A4 of the same reference gives for a 4-period full replicate study (FDA method), GMR 0.90, CVwR 70%, and ≥90% power a sample size of only 46 (10,000 simulations), whilst the function sampleN.RSABE() of PowerTOST gives 48 (with its default of 100,000 simulations).

runs  <- 5
nsims <- as.integer(10^(seq(3, 6, 0.1)))
df    <- data.frame(run = 1:runs,
nsims = rep(nsims, each = runs),
n = NA)
pb    <- txtProgressBar(0, 1, 0, char = "\u2588",
width = NA, style = 3)
for (i in 1:nrow(df)) {
ifelse (df$run[i] == 1, setseed <- TRUE, setseed <- FALSE) df$n[i] <- sampleN.RSABE(theta0 = 0.9, CV = 0.7,
nsims = df$nsims[i], setseed = setseed, design = "2x2x4", targetpower = 0.9, print = FALSE, details = FALSE)[["Sample size"]] setTxtProgressBar(pb, i/nrow(df)) } close(pb) ylim <- range(df$n)
dev.new(width = 4.5, height = 4.5)
op <- par(no.readonly = TRUE)
par(mar = c(3, 3, 0, 0), cex.axis = 0.9,
mgp = c(2, 0.5, 0))
plot(df$nsims, df$n, type = "n", log = "x", ylim = ylim,
axes = FALSE, xlab = "Number of simulations",
ylab = "Sample size")
grid(); box()
magaxis(1)
magaxis(2, minorn = 2, las = 1)
for (i in 1:nrow(df)) {
ifelse (df$run[i] == 1, pch <- 19, pch <- 1) points(df$nsims[i], df$n[i], pch = pch, col = "blue", cex = 1.15) } points(1e5, 48, pch = 16, col = "#00AA00", cex = 1.15) points(1e4, 46, pch = 16, col = "red", cex = 1.15) par(op) Fig.8 Sample size for RSABE (2×2×4 replicate, CV 0.70, GMR 0.90, power ≥0.90). Five runs. As before filled circles are with a fixed seed of 123456 and open circles with random seeds. With only 10,000 simulations I did’n get 46 (like ‘The Two Lászlós’) but once 48, twice 50, once 52, and once 54. In 1,000 runs of 10,000 I got a median of 48 and a range of 44–52). With more simulations we end up with 48 as a stable result. # Caveats, Hints One regulator once told me »I don’t believe in simulations«. Strong view. I doubt that he avoids to use any kind of modern transportation – a lot of simulations involved in the design and testing. However, there’s a grain of truth in his statement. Garbage in, garbage out. It’s very very simple. If an analytical solution exists ot more efficient methods are available, it’s easy to check whether the result of a simulation is reliable. ## General If an analytical solution exists and is reasonably fast, use it. In such a case performing simulations is a waste of time. If an exact method does not exist, it get’s tricky, i.e., rigorous testing is of utmost importance. • In statistics understanding of the data generating process helps us to decide which test we should use. • Concentrations – and hence, an integrated pharmacokinetic metric as the AUC as well – likely follow a lognormal distribution. Hence, in order to apply methods requiring a normal distributed data, we log-transform them first. The variance follows a χ2-distribution. • Time points of observations (tmax, tlag) depend on the sampling schedule and therefore, are discrete (although the distribution of the true ones is continuous). • Given that, in simulations we have take the distributional properties into account. It would be a serious flaw to simply fire up rnorm(n, mean = 0, sd = 1) in R which will give values of the standard normal. As said above, in the simulation-functions of PowerTOST these ‘key’ statistics are implemented: $$\small{\sigma^2}$$ follows a $$\small{\chi^2}$$-distribution with $$\small{n-2}$$ degrees of freedom and $$\small{\log_{e}(\theta_0)}$$ follows a normal distribution. This approach is less computational intensive than simulating subjects of a study. Let’s compare the sample size, power, and empiric Type I Error for a 4-period full replicate design, homoscedasticity (CVwT = CVwR = 0.45), T/R-ratio 0.90, target power ≥0.80, intended for the EMA’s ABEL. CV <- 0.45 design <- "2x2x4" U <- scABEL(CV = CV)[["upper"]] comp <- data.frame(method = c("stats", "subj"), n = NA, power = NA, TIE = NA, secs = NA) st <- proc.time()[] comp[1, 2:3] <- sampleN.scABEL(CV = CV, design = design, details = FALSE, print = FALSE)[8:9] comp[1, 4] <- power.scABEL(CV = CV, design = design, n = comp$n, theta0 = U,
nsims = 1e6)
comp[1, 5]   <- proc.time()[] - st
st           <- proc.time()[]
comp[2, 2:3] <- sampleN.scABEL.sdsims(CV = CV,
design = design,
details = FALSE,
print = FALSE)[8:9]
comp[2, 4]   <- power.scABEL.sdsims(CV = CV,
design = design,
n = comp$n, theta0 = U, nsims = 1e6, progress = FALSE) comp[2, 5] <- proc.time()[] - st comp <- c("\u2018key\u2019 stat\u2019s", "subject sim\u2019s") print(comp, row.names = FALSE) R> method n power TIE secs R> ‘key’ stat’s 28 0.81116 0.048889 0.66 R> subject sim’s 28 0.81196 0.048334 27.86 That’s a sufficiently close match and in general simulating via the ‘key’ statistics is justified due to speed reasons. However, there is one case where subject simulations are recommended: A partial replicate design with heteroscedasticity (only if CVwT > CVwR). CV <- 0.45 # split based on the variance ratio, # first element CVwT, second element CVwR CV <- CVp2CV(CV, ratio = 1.5) design <- "2x3x3" U <- scABEL(CV = CV)[["upper"]] comp <- data.frame(method = c("stats", "subj"), CVwT = CV, CVwR = CV, n = NA, power = NA, TIE = NA, secs = NA) st <- proc.time()[] comp[1, 4:5] <- sampleN.scABEL(CV = CV, design = design, details = FALSE, print = FALSE)[8:9] comp[1, 6] <- power.scABEL(CV = CV, design = design, n = comp$n, theta0 = U,
nsims = 1e6)
R> Warning: Heteroscedasticity in the 2x3x3 design may lead to poor accuracy of power;
R> consider the function power.scABEL.sdsims() instead.
comp[1, 7]   <- proc.time()[] - st
st           <- proc.time()[]
comp[2, 4:5] <- sampleN.scABEL.sdsims(CV = CV,
design = design,
details = FALSE,
print = FALSE)[8:9]
comp[2, 6]   <- power.scABEL.sdsims(CV = CV,
design = design,
n = comp$n, theta0 = U, nsims = 1e6, progress = FALSE) comp[2, 7] <- proc.time()[] - st comp[, 2:3] <- signif(comp[, 2:3], 4) comp <- c("\u2018key\u2019 stat\u2019s", "subject sim\u2019s") print(comp, row.names = FALSE) R> method CVwT CVwR n power TIE secs R> ‘key’ stat’s 0.4977 0.3987 51 0.81558 0.054635 0.61 R> subject sim’s 0.4977 0.3987 54 0.81723 0.067935 57.94 THX for throwing a warning! Yes, there is an inflated Type I Error. It will be discussed in another article. Compare that with homoscedasticity. CV <- c(0.45, 0.45) design <- "2x3x3" U <- scABEL(CV = CV)[["upper"]] comp <- data.frame(method = c("stats", "subj"), CVwT = CV, CVwR = CV, n = NA, power = NA, TIE = NA, secs = NA) st <- proc.time()[] comp[1, 4:5] <- sampleN.scABEL(CV = CV, design = design, details = FALSE, print = FALSE)[8:9] comp[1, 6] <- power.scABEL(CV = CV, design = design, n = comp$n, theta0 = U,
nsims = 1e6)
comp[1, 7]   <- proc.time()[] - st
st           <- proc.time()[]
comp[2, 4:5] <- sampleN.scABEL.sdsims(CV = CV,
design = design,
details = FALSE,
print = FALSE)[8:9]
comp[2, 6]   <- power.scABEL.sdsims(CV = CV,
design = design,
n = comp$n, theta0 = U, nsims = 1e6, progress = FALSE) comp[2, 7] <- proc.time()[] - st comp[, 2:3] <- signif(comp[, 2:3], 4) comp <- c("\u2018key\u2019 stat\u2019s", "subject sim\u2019s") print(comp, row.names = FALSE) R> method CVwT CVwR n power TIE secs R> ‘key’ stat’s 0.45 0.45 39 0.80588 0.046774 0.74 R> subject sim’s 0.45 0.45 39 0.80256 0.048737 37.16 No problems even in a partial replicate design. Why do we need less subjects? Although with heteroscedasticity we had the same pooled CVw of 0.45, here we can expand the limits more (CVwR 0.45 → 72.15–138.59% than with CVwR 0.3987 → 74.68–133.90%) and the lower CVwT (0.45 instead of 0.4977) will result in a narrower 90% confidence interval. ## Comparing Simulations Welcome to the nightmare. What you should do: PRNG: Pseudorandom number generator. • Use a fixed seed of the PRNG and give it in reports and publications. Only then your results are reproducible. From an infamous example: »A different randomly selected seed was used for each scenario.« Of course, I like the paper8 though it should have been published in the Journal of Irreproducible Results in parallel. Yes, you can almost reproduce the result but only if you round them to three significant digits. In a manuscript we recently submitted one reviewer remarked: »The authors report the statistical precision of their simulation results, generators, random seeds, etc. which is frequently forgotten in other simulation studies.« THX for the flowers! LCG: Linear congruential generator. • State which PRNG you used. Nowadays in many software packages the. Mersenne Twister9 is the default. It has a period of ≈4.3×106,001 and hence, one can be sure to get no repetitions in extremely large simulations. It is implemented even im M$ Excel since v2010.
However, in VBA still a LCG is implemented, which is useless in Monte Carlo simulations due to its short period.

• Find a compromise between accuracy and speed. As we have seen above, 100,000 simulations are sufficient in sample size estimations for reference-scaling. Nevertheless, in simulating the empiric Type I Error 106 (e.g., for adaptive Two-Stage Designs, reference-scaling) are highly recommended and the standard nowadays.

License

Footnotes and References

1. Snow G. TeachingDemos: Demonstrations for Teaching and Learning. 2020-04-07. CRAN.↩︎

2. Robotham A. magicaxis: Pretty Scientific Plotting with Minor-Tick and Log Minor-Tick Support. 2020-12-04. CRAN.↩︎

3. Labes D, Schütz H, Lang B. PowerTOST: Power and Sample Size for (Bio)Equivalence Studies. 2021-01-18. CRAN.↩︎

4. Jensen R. Simulating Buffon’s Needle in R. Answer at StackExchange. 2019-12-18.↩︎

5. Zheng C, Wang J, Zhao L. Testing bioequivalence for multiple formulations with power and sample size calculations. Pharm Stat. 2012; 11(4): 334–41. doi:10.1002/pst.1522.↩︎

6. Tóthfalusi L, Endrényi L. Sample Sizes for Designing Bioequivalence Studies for Highly Variable Drugs. J Pharm Pharmaceut Sci. 2011; 15(1): 73–84. doi:10.18433/j3z88f.
Open access.↩︎

7. If you perform five runs with 10,000 simulations, you will get different numbers. That’s the effect of random seeds and a low number of simulations.↩︎

8. Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs Pharm Stat. 2007; 7(4): 245–262. doi:10.1002/pst.294.↩︎

9. Matsumoto M, Nishimura T. Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans Model Comput Simul. 1998; 8: 3–30. doi:10.1145/272991.272995.↩︎

10. Schütz H. Reference-Scaled Average Bioequivalence. Heterogenicity. 2020-12-23. CRAN.↩︎

11. FDA. Office of Generic Drugs. Draft Guidance on Progesterone. Step 2.. Recommended Apr 2010; Revised Feb 2011.↩︎