
Examples in this article were generated with R 4.0.5 by the packages TeachingDemos,1 magicaxis,2 and PowerTOST.3

For applications see also a collection of other articles.

  • The right-hand badges give the respective section’s ‘level’.
    
  1. Basics about sample size methodology – requiring no or only limited statistical expertise.
    
  2. These sections are the most important ones. They are – hopefully – easily comprehensible even for novices.
    
  3. A somewhat higher knowledge of statistics and/or R is required. May be skipped or reserved for later reading.
    
  4. An advanced knowledge of statistics and/or R is required. In particular not recommended for beginners.
    
  5. If you are not a neRd or statistics aficionado, skipping is recommended. Suggested for experts but might be confusing for others.
  • Click to show / hide R code.
Abbreviation     Meaning
ABE              Average Bioequivalence
ABEL             Average Bioequivalence with Expanding Limits
CV (CVwT, CVwR)  Within-subject Coefficient of Variation (of the Test and Reference treatment)
GMR              Geometric Mean Ratio
HVD(P)           Highly Variable Drug (Product)
RSABE            Reference-scaled Average Bioequivalence
TOST             Two One-Sided Tests

Introduction

    

What are simulations and why may we need them?

In statistics (or better, in science in general) we prefer to obtain results by an analytical calculation. Remember your classes in algebra and calculus? That’s the stuff.

However, sometimes the desire for an ‘exact’ solution of a problem reminds one of the hunt for unobtainium.

If you know the movie Hidden Figures, maybe you recall the scene where Katherine Johnson solved the transition from the elliptic orbit of the Mercury capsule to the parabolic re-entry trajectory by Euler’s approximations. Granted, that was not a simulation but a numerical method, since an algebraic solution simply did not – and will not – exist.

Least-squares and maximum-likelihood optimizations are part of our bread-and-butter jobs. Nobody complains that the result is not ‘exact’.

Statistics. A sort of elementary form of mathematics which consists of adding things together and occasionally squaring them.
Stephen Senn
    

However, sometimes numerical methods are not feasible if the problem gets too large. Large in this context means many parameters, a high number of data points to fit, or both.
Two examples:

    
    

Fig.1 Three body problem.

Approximate trajectories of three identical bodies located at the vertices of a scalene triangle and having zero initial velocities. It is seen that the center of mass, in accordance with the law of conservation of momentum, remains in place.
© Dnttllthmmnm @ wikimedia commons

  • How many objects are in the solar system? You name it. Nevertheless, after a 9½-year, 5 billion km journey the New Horizons space probe had a 12,472 km close encounter with Pluto.
    THX to numerics & simulations.
    

When it comes to (bio)equivalence we need simulations in various areas. One is the power calculation / sample size estimation for Reference-scaled Average Bioequivalence and the other for adaptive sequential Two-Stage Designs.

Whenever a theory appears to you as the only possible one, take this as a sign that you have neither understood the theory nor the problem which it was intended to solve.
Karl Popper

As we will see in the examples, with the Monte Carlo method values approach the exact one only asymptotically.

top of section ↩︎

Examples

# attach libraries to run the examples
library(TeachingDemos)
library(magicaxis)
library(PowerTOST)

Rolling two dice

    

Here it’s easy to get the average sum of pips. If you remember probability and combinatorics, you will get the answer 7. If you are a gambler, you simply know it. If you have two handy, roll them. Of course, you will get anything between the possible extremes (2 and 12) in a single roll. But you will need many rolls to come close to the theoretical average 7. Since this is a boring exercise (unless you find an idiot betting against 7), let R do the job (10,000 rolls):
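Before simulating, the exact value can be verified by brute-force enumeration in base R – a sketch of the combinatorial argument:

```r
# all 36 equally likely outcomes of rolling two fair dice
outcomes <- expand.grid(die1 = 1:6, die2 = 1:6)
# expected sum of pips: mean over the full sample space
mean(rowSums(outcomes))  # 7
```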

set.seed(123)
nsims <- ceiling(10^(seq(0, 4, 0.01)))
df    <- data.frame(nsims = nsims, sum.pips = NA)
for (i in seq_along(nsims)) {
  df[i, 2] <- sum(dice(rolls = nsims[i],
                       ndice = 2, sides = 6))/nsims[i]
}
ylim <- c(2, 12)
j    <- 10^(seq(0, 4, 1))
mean.j <- numeric()
for (i in seq_along(j)) {
  mean.j[i] <- mean(df[df$nsims == j[i], 2])
}
dev.new(width = 4.5, height = 4.5)
op <- par(no.readonly = TRUE)
par(mar = c(3, 3, 3, 0.1), cex.axis = 0.9,
    mgp = c(2, 0.5, 0))
plot(df$nsims, df[, 2], type = "n",
     ylim = ylim, axes = FALSE, log = "x",
     xlab = "Number of simulated rolls",
     ylab = "Average sum of pips")
mtext("Average at 1, 10, \u2026", 3, line = 2)
grid(); box()
abline(h = 7)
magaxis(1)
magaxis(2, minorn = 2, las = 1)
axis(3, labels = sprintf("%.2f", mean.j), at = j)
points(df$nsims, df[, 2], pch = 21,
       col = "blue", bg = "#9090FF80", cex = 1.15)
par(op)

Fig.2 Rolling two dice.

top of section ↩︎ previous section ↩︎

Estimating π

    

We generate a set of uniformly distributed coordinates \(\small{\left\{x|y\right\}}\) in the range \(\small{\left\{-1|+1\right\}}\). According to Mr Pythagoras all \(\small{x^2+y^2\leq1}\) are at or within the unit circle and coded in blue. Since the circle’s area is \(\small{\pi\,r^2}\) with \(\small{r=1}\) and the enclosing square’s is \(\small{2\times2=4}\), the fraction of coded points estimates \(\small{\pi/4}\): we get an estimate of \(\small{\pi}\) by 4 times the number of coded points (the ‘area’) divided by the number of simulations. Here for 50,000:

set.seed(12345678)
nsims <- 5e4
x <- runif(nsims, -1, 1)
y <- runif(nsims, -1, 1)
is.inside   <- (x^2 + y^2) <= 1
pi.estimate <- 4 * sum(is.inside) / nsims
dev.new(width = 5.7, height = 5.7)
op <- par(no.readonly = TRUE)
par(pty = "s", cex.axis = 0.9, mgp = c(2, 0.5, 0))
main <- bquote(italic(pi)*"-estimate" == .(pi.estimate))
plot(c(-1, 1), c(-1, 1), type = "n", axes = FALSE,
     xlim = c(-1, 1), ylim = c(-1, 1), font.main = 1,
     xlab = expression(italic(x)),
     ylab = expression(italic(y)), cex.main = 1.1,
     main = main)
points(x[is.inside], y[is.inside], pch = '.', 
       col = "blue")
points(x[!is.inside], y[!is.inside], pch = '.',
       col = "lightgrey")
magaxis(1:2, las = 1)
abline(h = 0, v = 0)
box()
par(op)

Fig.3 Estimating π (unit circle inclusion method).

This was a lucky punch since the convergence is lousy. To get a stable estimate with just four correct significant digits, we need at least 100 million (‼) simulations.
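That claim can be checked without running any simulations: the estimate is 4 times a binomial proportion with p = π/4, so its standard error is 4·√(p(1−p)/n). A sketch in base R (the criterion – three standard errors within half a unit of the fourth significant digit – is my assumption of what ‘stable’ means):

```r
p  <- pi / 4                        # probability of a point falling inside
n  <- 10^seq(4, 9, 0.5)             # candidate numbers of simulations
SE <- 4 * sqrt(p * (1 - p) / n)     # standard error of the pi-estimate
# smallest n where +/-3 SE stays within half a unit of the 4th significant digit
min(n[3 * SE <= 5e-4])              # 1e+08
```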

    

Another slowly converging approach is simulating Buffon’s needle.4

# Cave: Extremely long runtime
set.seed(123456)
buffon_needles <- function(l, n = 1) {
  H <- rep(0, n)
  for (i in 1:n) {
    M  <- runif(2)
    D  <- c(1, 1)
    while (norm(D, type = '2') > 1) {
      D <- runif(2)
    }
    DD   <- D/norm(D, type = '2')
    E1   <- M + l / 2 * DD
    E2   <- M - l / 2 * DD
    H[i] <- (E1[1] < 0) |
            (E1[1] > 1) |
            (E2[1] < 0) |
            (E2[1] > 1)
  }
  return(H)
}
l     <- 0.4
nsims <- as.integer(10^(seq(3, 7, 0.05)))
df    <- data.frame(nsims = nsims, pi.estimate = NA) # results container
pb    <- txtProgressBar(0, 1, 0, char = "\u2588",
                        width = NA, style = 3)
for (i in 1:nrow(df)) {
  HHH <- buffon_needles(l, n = df$nsims[i])
  df$pi.estimate[i] <- 2 * l / mean(HHH)
  setTxtProgressBar(pb, i/nrow(df))
}
close(pb)
tail(df, 1) # show last
dev.new(width = 4.5, height = 4.5)
op <- par(no.readonly = TRUE)
par(mar = c(3, 3, 0, 0), cex.axis = 0.9,
    mgp = c(2, 0.5, 0))
plot(df$nsims, df$pi.estimate, type = "n",
     log = "x", axes = FALSE,
     xlab = "Number of simulations",
     ylab = expression(italic(pi)*"-estimate"))
grid(); box()
abline(h = pi, col = "red")
magaxis(1:2, las = 1)
points(df$nsims, df$pi.estimate, pch = 21,
       col = "blue", bg = "#9090FF80", cex = 1.15)
par(op)

Fig.4 Estimating π (Buffon’s needle drops).

    

Of course, there are extremely efficient methods like the Chudnovsky algorithm (as of January 2020 the record holder with 50 trillion digits).

q      <- 0:1L
pi.rep <- 0
for (i in seq_along(q)) {
  pi.rep <- pi.rep - 
            12/640320^(3/2) *
            (factorial(6 * q[i]) * (13591409 + 545140134 *q[i])) /
            (factorial(3 * q[i]) * factorial(q[i])^3 *
             -640320^(3 * q[i]))
}
pi.approx <- 1/pi.rep
comp <- data.frame(q = as.integer(max(q)),
                   pi = sprintf("%.15f", pi),
                   pi.approx = sprintf("%.15f", pi.approx),
                   RE = sprintf("%-.15f%%",
                               100 * (pi.approx - pi) / pi))
print(comp, row.names = FALSE)
R>  q                pi         pi.approx                  RE
R>  1 3.141592653589793 3.141592653589675 -0.000000000003746%

Note that increasing the number of iterations q does not improve the result. Since this is an extremely efficient algorithm, we are already at the numeric resolution of a 64 bit numeric, which is ≈2.220446×10⁻¹⁶.
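In R the resolution of a 64 bit double can be queried directly:

```r
.Machine$double.eps               # 2.220446e-16
.Machine$double.eps == 2^-52      # TRUE: eps of IEEE 754 double precision
1 + .Machine$double.eps / 2 == 1  # TRUE: differences below eps vanish
```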

The wonderland of representing numbers on digital computers is elaborated in another article.

top of section ↩︎ previous example ↩︎ previous section ↩︎

Power of TOST

    

Over the top since exact (analytical) solutions are available. However, if you don’t trust them, simulations are possible with the function power.TOST.sim(), which employs the distributional properties (\(\small{\sigma^2}\) follows a \(\small{\chi^2}\)-distribution with \(\small{n-2}\) degrees of freedom and \(\small{\log_{e}(\theta_0)}\) follows a normal distribution).5 Convergence takes a while. Empiric power, its standard error, its relative error compared to the exact power, and execution times on a Xeon E3-1245v3 3.4 GHz, 8 MB cache, 16 GB RAM, 64 bit R 4.0.5 on Windows 7:

CV    <- 0.25
n     <- sampleN.TOST(CV = CV,
                      print = FALSE)[["Sample size"]]
nsims <- c(1e5, 5e5, 1e6, 5e6, 1e7, 5e7, 1e8)
exact <- power.TOST(CV = CV, n = n)
df    <- data.frame(simulations = nsims,
                    exact = rep(exact, length(nsims)),
                    simulated = NA, SE = NA, RE = NA)
for (i in 1:nrow(df)) {
  start.time      <- proc.time()[[3]]
  df$simulated[i] <- power.TOST.sim(CV = CV, n = n,
                                    nsims = nsims[i])
  df$secs[i]      <- proc.time()[[3]] - start.time
  df$SE[i]        <- sqrt(0.025 / nsims[i])
  df$RE           <- 100 * (df$simulated - exact) / exact
}
df$exact       <- signif(df$exact, 5)
df$SE          <- signif(df$RE, 4)
df$RE          <- sprintf("%+.4f%%", df$RE)
df$simulated   <- signif(df$simulated, 5)
df$simulations <- formatC(df$simulations, format = "f",
                          digits = 0, big.mark=",")
names(df)[c(1, 3)] <- c("sim\u2019s", "sim\u2019d")
print(df, row.names = FALSE)
R>        sim’s   exact   sim’d         SE       RE   secs
R>      100,000 0.80744 0.80814  0.0867600 +0.0868%   0.18
R>      500,000 0.80744 0.80699 -0.0551700 -0.0552%   0.79
R>    1,000,000 0.80744 0.80766  0.0279300 +0.0279%   1.70
R>    5,000,000 0.80744 0.80751  0.0090080 +0.0090%   8.27
R>   10,000,000 0.80744 0.80733 -0.0136200 -0.0136%  16.32
R>   50,000,000 0.80744 0.80743 -0.0009096 -0.0009%  81.16
R>  100,000,000 0.80744 0.80744  0.0001147 +0.0001% 161.95

Fig.5 Empiric power of TOST (2×2×2 crossover, T/R ratio 0.95, n 28).

top of section ↩︎ previous example ↩︎ previous section ↩︎

Type I Error of TOST

    

We know that \(\small{\alpha}\) (the probability of the Type I Error) never exceeds the nominal level of the test (generally 0.05). In, e.g., a 2×2×2 crossover design with CV 25% and 28 subjects it is 0.04999963… We can simulate studies with a T/R-ratio at one of the borders of the acceptance range and assess them for ABE as well. We expect ≤5% of them to pass. The number of passing studies divided by the number of simulations gives the empiric Type I Error.

# Cave: Long run-time!
runs  <- 5
nsims <- as.integer(10^(seq(3, 7, 0.1)))
df    <- data.frame(run = 1:runs,
                    nsims = rep(nsims, each = runs),
                    binomial.lower = NA,
                    binomial.upper = NA,
                    TIE = NA)
CV    <- 0.25
n     <- sampleN.TOST(CV = CV,
                      print = FALSE)[["Sample size"]]
exact <- power.TOST(theta0 = 1.25, CV = CV, n = n)
pb    <- txtProgressBar(0, 1, 0, char = "\u2588",
                        width = NA, style = 3)
for (i in 1:nrow(df)) {
  x   <- ceiling(0.05 * df$nsims[i])
  b.t <- binom.test(x, df$nsims[i], alternative =
                    "two.sided")[["conf.int"]]
  df[i, 3:4] <- as.numeric(b.t)
  ifelse (df$run[i] == 1,
    setseed <- TRUE,
    setseed <- FALSE)
  df$TIE[i] <- power.TOST.sim(theta0 = 1.25, CV = CV,
                              n = n, nsims = df$nsims[i],
                              setseed = setseed)
  setTxtProgressBar(pb, i/nrow(df))
}
close(pb)
ylim <- range(df[, 3:5])
dev.new(width = 4.5, height = 4.5)
op <- par(no.readonly = TRUE)
par(mar = c(3, 3.5, 0, 0), cex.axis = 0.9,
    mgp = c(2, 0.5, 0))
plot(df$nsims, df$TIE, type = "n", log = "x",
     ylim = ylim, axes = FALSE,
     xlab = "Number of simulations", ylab = "")
mtext("Empiric Type I Error", 2, line = 2.5)
grid(); box()
abline(h = exact, col = "red")
lines(df$nsims, df[, 3])
lines(df$nsims, df[, 4])
magaxis(1:2, las = 1)
for (i in 1:nrow(df)) {
  ifelse (df$run[i] == 1, pch <- 19, pch <- 1)
  points(df$nsims[i], df$TIE[i], pch = pch,
         col = "blue", cex = 1.15)
}
par(op)

Fig.6 Empiric Type I Error of TOST (2×2×2 crossover, n 28).

Five runs. The filled circles are runs with a fixed seed of 123456 and open circles ones with random seeds. The lines show the upper and lower significance limits of the binomial test. We see that convergence of this problem is bad.
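To see how wide these binomial significance limits actually are, here is a sketch in base R of the 95% Clopper–Pearson interval around the nominal 0.05:

```r
# 95% Clopper-Pearson limits around the nominal level for a given
# number of simulated studies
limits <- function(nsims) {
  binom.test(ceiling(0.05 * nsims), nsims)[["conf.int"]]
}
signif(limits(1e4), 4)  # roughly 0.046-0.054, i.e., about +/-8% relative
signif(limits(1e6), 4)  # roughly ten times narrower
```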

Therefore, e.g., in Two-Stage Designs and in assessing a potentially inflated Type I Error in reference-scaling, 10⁶ simulations are common.

top of section ↩︎ previous example ↩︎ previous section ↩︎

Sample size for ABEL

    

Table A2 gives for a 4-period full replicate study (EMA method), GMR 0.85, CVwR 50%, and ≥80% power a sample size of 54 (10,000 simulations),6 whilst the function sampleN.scABEL() of PowerTOST gives only 52 (with its default of 100,000 simulations).

runs  <- 5
nsims <- as.integer(10^(seq(3, 6, 0.1)))
df    <- data.frame(run = 1:runs,
                    nsims = rep(nsims, each = runs),
                    n = NA)
pb    <- txtProgressBar(0, 1, 0, char = "\u2588",
                        width = NA, style = 3)
for (i in 1:nrow(df)) {
    ifelse (df$run[i] == 1,
      setseed <- TRUE,
      setseed <- FALSE)
    df$n[i] <- sampleN.scABEL(theta0 = 0.85, CV = 0.5,
                              nsims = df$nsims[i],
                              setseed = setseed,
                              design = "2x2x4",
                              targetpower = 0.8,
                              print = FALSE,
                              details = FALSE)[["Sample size"]]
  setTxtProgressBar(pb, i/nrow(df))
}
close(pb)
ylim <- range(df$n)
dev.new(width = 4.5, height = 4.5)
op <- par(no.readonly = TRUE)
par(mar = c(3, 3, 0, 0), cex.axis = 0.9,
    mgp = c(2, 0.5, 0))
plot(df$nsims, df$n, type = "n", log = "x", ylim = ylim,
     axes = FALSE, xlab = "Number of simulations",
     ylab = "Sample size")
grid(); box()
magaxis(1)
magaxis(2, minorn = 2, las = 1)
for (i in 1:nrow(df)) {
  ifelse (df$run[i] == 1, pch <- 19, pch <- 1)
  points(df$nsims[i], df$n[i], pch = pch,
         col = "blue", cex = 1.15)
}
points(1e5, 52, pch = 16, col = "#00AA00", cex = 1.15)
points(1e4, 54, pch = 16, col = "red", cex = 1.15)
par(op)

Fig.7 Sample size for ABEL (2×2×4 replicate, CV 0.50, GMR 0.85, power ≥0.80).

Five runs. As before, filled circles are runs with a fixed seed of 123456 and open circles ones with random seeds. With only 10,000 simulations I got7 four times 50, once 52, and once 54 (like ‘The Two Lászlós’). With more runs you will get even more extreme results in some of them (in 1,000 runs of 10,000 I got a median of 52 and a range of 48–56). This wide range demonstrates that the result is not stable. Hence, we need at least 50,000 simulations to end up with 52 as a stable result. Although you are free to opt for more than the default of 100,000, it’s a waste of time.

top of section ↩︎ previous example ↩︎ previous section ↩︎

Sample size for RSABE

    

Table A4 of the same reference gives for a 4-period full replicate study (FDA method), GMR 0.90, CVwR 70%, and ≥90% power a sample size of only 46 (10,000 simulations), whilst the function sampleN.RSABE() of PowerTOST gives 48 (with its default of 100,000 simulations).

runs  <- 5
nsims <- as.integer(10^(seq(3, 6, 0.1)))
df    <- data.frame(run = 1:runs,
                    nsims = rep(nsims, each = runs),
                    n = NA)
pb    <- txtProgressBar(0, 1, 0, char = "\u2588",
                        width = NA, style = 3)
for (i in 1:nrow(df)) {
    ifelse (df$run[i] == 1,
      setseed <- TRUE,
      setseed <- FALSE)
    df$n[i] <- sampleN.RSABE(theta0 = 0.9, CV = 0.7,
                             nsims = df$nsims[i],
                             setseed = setseed,
                             design = "2x2x4",
                             targetpower = 0.9,
                             print = FALSE,
                             details = FALSE)[["Sample size"]]
  setTxtProgressBar(pb, i/nrow(df))
}
close(pb)
ylim <- range(df$n)
dev.new(width = 4.5, height = 4.5)
op <- par(no.readonly = TRUE)
par(mar = c(3, 3, 0, 0), cex.axis = 0.9,
    mgp = c(2, 0.5, 0))
plot(df$nsims, df$n, type = "n", log = "x", ylim = ylim,
     axes = FALSE, xlab = "Number of simulations",
     ylab = "Sample size")
grid(); box()
magaxis(1)
magaxis(2, minorn = 2, las = 1)
for (i in 1:nrow(df)) {
  ifelse (df$run[i] == 1, pch <- 19, pch <- 1)
  points(df$nsims[i], df$n[i], pch = pch,
         col = "blue", cex = 1.15)
}
points(1e5, 48, pch = 16, col = "#00AA00", cex = 1.15)
points(1e4, 46, pch = 16, col = "red", cex = 1.15)
par(op)

Fig.8 Sample size for RSABE (2×2×4 replicate, CV 0.70, GMR 0.90, power ≥0.90).

Five runs. As before, filled circles are runs with a fixed seed of 123456 and open circles ones with random seeds. With only 10,000 simulations I didn’t get 46 (like ‘The Two Lászlós’) even once but once 48, twice 50, once 52, and once 54. In 1,000 runs of 10,000 I got a median of 48 and a range of 44–52. With more simulations we end up with 48 as a stable result.

top of section ↩︎ previous example ↩︎ previous section ↩︎

Caveats, Hints

    

One regulator once told me »I don’t believe in simulations«. Strong view. I doubt that he avoids using any kind of modern transportation – a lot of simulations are involved in its design and testing. However, there’s a grain of truth in his statement.

Garbage in, garbage out. It’s very, very simple.

If an analytical solution exists or more efficient methods are available, it’s easy to check whether the result of a simulation is reliable.

General

    

If an analytical solution exists and is reasonably fast, use it. In such a case performing simulations is a waste of time. If an exact method does not exist, it gets tricky, i.e., rigorous testing is of utmost importance.

    
  • In statistics an understanding of the data generating process helps us decide which test we should use.
    • Concentrations – and hence, an integrated pharmacokinetic metric such as the AUC as well – likely follow a lognormal distribution. Hence, in order to apply methods requiring normally distributed data, we log-transform them first.
      The variance follows a χ2-distribution.
    • Time points of observations (tmax, tlag) depend on the sampling schedule and are therefore discrete (although the distribution of the true ones is continuous).
  • Given that, in simulations we have to take the distributional properties into account. It would be a serious flaw to simply fire up rnorm(n, mean = 0, sd = 1) in R, which gives values of the standard normal.
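As a sketch of what taking the distributional properties into account means: instead of rnorm() we draw the within-subject variance from its χ2-distribution (CV 0.25 and n 28 as in the power example above; the number of draws is arbitrary):

```r
set.seed(123456)
CV     <- 0.25                      # within-subject CV
n      <- 28                        # sample size (2x2x2 design)
dfree  <- n - 2                     # degrees of freedom
sigma2 <- log(CV^2 + 1)             # variance on the log-scale
# simulated variances follow sigma2 * chi^2(df) / df -- not a normal
s2     <- sigma2 * rchisq(1e5, dfree) / dfree
CV.sim <- sqrt(exp(s2) - 1)         # back-transformed simulated CVs
c(mean = mean(CV.sim), median = median(CV.sim))
```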

As said above, in the simulation-functions of PowerTOST these ‘key’ statistics are implemented: \(\small{\sigma^2}\) follows a \(\small{\chi^2}\)-distribution with \(\small{n-2}\) degrees of freedom and \(\small{\log_{e}(\theta_0)}\) follows a normal distribution. This approach is less computational intensive than simulating subjects of a study.
Let’s compare the sample size, power, and empiric Type I Error for a 4-period full replicate design, homoscedasticity (CVwT = CVwR = 0.45), T/R-ratio 0.90, target power ≥0.80, intended for the EMA’s ABEL.

CV           <- 0.45
design       <- "2x2x4"
U            <- scABEL(CV = CV)[["upper"]]
comp         <- data.frame(method = c("stats", "subj"),
                           n = NA, power = NA, TIE = NA,
                           secs = NA)
st           <- proc.time()[[3]]
comp[1, 2:3] <- sampleN.scABEL(CV = CV, design = design,
                               details = FALSE,
                               print = FALSE)[8:9]
comp[1, 4]   <- power.scABEL(CV = CV, design = design,
                             n = comp$n[1], theta0 = U,
                             nsims = 1e6)
comp[1, 5]   <- proc.time()[[3]] - st
st           <- proc.time()[[3]]
comp[2, 2:3] <- sampleN.scABEL.sdsims(CV = CV,
                                      design = design,
                                      details = FALSE,
                                      print = FALSE)[8:9]
comp[2, 4]   <- power.scABEL.sdsims(CV = CV,
                                    design = design,
                                    n = comp$n[1],
                                    theta0 = U,
                                    nsims = 1e6,
                                    progress = FALSE)
comp[2, 5]   <- proc.time()[[3]] - st
comp[1]      <- c("\u2018key\u2019 stat\u2019s",
                  "subject sim\u2019s")
print(comp, row.names = FALSE)
R>         method  n   power      TIE  secs
R>   ‘key’ stat’s 28 0.81116 0.048889  0.66
R>  subject sim’s 28 0.81196 0.048334 27.86

That’s a sufficiently close match, and in general simulating via the ‘key’ statistics is justified for reasons of speed.

    

However, there is one case where subject simulations are recommended: a partial replicate design with heteroscedasticity (only if CVwT > CVwR).

CV           <- 0.45
# split based on the variance ratio,
# first element CVwT, second element CVwR
CV           <- CVp2CV(CV, ratio = 1.5)
design       <- "2x3x3"
U            <- scABEL(CV = CV[2])[["upper"]]
comp         <- data.frame(method = c("stats", "subj"),
                           CVwT = CV[1], CVwR = CV[2],
                           n = NA, power = NA, TIE = NA,
                           secs = NA)
st           <- proc.time()[[3]]
comp[1, 4:5] <- sampleN.scABEL(CV = CV, design = design,
                               details = FALSE,
                               print = FALSE)[8:9]
comp[1, 6]   <- power.scABEL(CV = CV, design = design,
                             n = comp$n[1], theta0 = U,
                             nsims = 1e6)
R> Warning: Heteroscedasticity in the 2x3x3 design may lead to poor accuracy of power;
R> consider the function power.scABEL.sdsims() instead.
comp[1, 7]   <- proc.time()[[3]] - st
st           <- proc.time()[[3]]
comp[2, 4:5] <- sampleN.scABEL.sdsims(CV = CV,
                                      design = design,
                                      details = FALSE,
                                      print = FALSE)[8:9]
comp[2, 6]   <- power.scABEL.sdsims(CV = CV,
                                    design = design,
                                    n = comp$n[1],
                                    theta0 = U,
                                    nsims = 1e6,
                                    progress = FALSE)
comp[2, 7]   <- proc.time()[[3]] - st
comp[, 2:3]  <- signif(comp[, 2:3], 4)
comp[1]      <- c("\u2018key\u2019 stat\u2019s",
                  "subject sim\u2019s")
print(comp, row.names = FALSE)
R>         method   CVwT   CVwR  n   power      TIE  secs
R>   ‘key’ stat’s 0.4977 0.3987 51 0.81558 0.054635  0.61
R>  subject sim’s 0.4977 0.3987 54 0.81723 0.067935 57.94

THX for throwing a warning!
Yes, there is an inflated Type I Error. It will be discussed in another article.

Compare that with homoscedasticity.

CV           <- c(0.45, 0.45)
design       <- "2x3x3"
U            <- scABEL(CV = CV[2])[["upper"]]
comp         <- data.frame(method = c("stats", "subj"),
                           CVwT = CV[1], CVwR = CV[2],
                           n = NA, power = NA, TIE = NA,
                           secs = NA)
st           <- proc.time()[[3]]
comp[1, 4:5] <- sampleN.scABEL(CV = CV, design = design,
                               details = FALSE,
                               print = FALSE)[8:9]
comp[1, 6]   <- power.scABEL(CV = CV, design = design,
                             n = comp$n[1], theta0 = U,
                             nsims = 1e6)
comp[1, 7]   <- proc.time()[[3]] - st
st           <- proc.time()[[3]]
comp[2, 4:5] <- sampleN.scABEL.sdsims(CV = CV,
                                      design = design,
                                      details = FALSE,
                                      print = FALSE)[8:9]
comp[2, 6]   <- power.scABEL.sdsims(CV = CV,
                                    design = design,
                                    n = comp$n[1],
                                    theta0 = U,
                                    nsims = 1e6,
                                    progress = FALSE)
comp[2, 7]   <- proc.time()[[3]] - st
comp[, 2:3]  <- signif(comp[, 2:3], 4)
comp[1]      <- c("\u2018key\u2019 stat\u2019s",
                  "subject sim\u2019s")
print(comp, row.names = FALSE)
R>         method CVwT CVwR  n   power      TIE  secs
R>   ‘key’ stat’s 0.45 0.45 39 0.80588 0.046774  0.74
R>  subject sim’s 0.45 0.45 39 0.80256 0.048737 37.16

No problems even in a partial replicate design.

Why do we need fewer subjects? Although with heteroscedasticity we had the same pooled CVw of 0.45, here we can expand the limits more (CVwR 0.45 → 72.15–138.59% instead of CVwR 0.3987 → 74.68–133.90%), and the lower CVwT (0.45 instead of 0.4977) results in a narrower 90% confidence interval.
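The quoted limits follow from the EMA’s scaling formula exp(±0.76·swR), where swR = √(log(CVwR²+1)) – a quick check in base R (applicable for 30% < CVwR ≤ 50%; above 50% the expansion is capped):

```r
# EMA expanding limits: exp(+/- 0.76 * swR) in percent
expand.limits <- function(CVwR) {
  swR <- sqrt(log(CVwR^2 + 1))        # within-subject SD of R (log-scale)
  sort(100 * exp(c(-1, +1) * 0.76 * swR))
}
round(expand.limits(0.45), 2)         # 72.15 138.59
round(expand.limits(0.3987), 2)
```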

top of section ↩︎ previous section ↩︎

Comparing Simulations

    

Welcome to the nightmare. What you should do:

PRNG: Pseudorandom number generator.
  • Use a fixed seed of the PRNG and give it in reports and publications. Only then are your results reproducible.
    From an infamous example: »A different randomly selected seed was used for each scenario.«
    Of course, I like the paper8 though it should have been published in the Journal of Irreproducible Results in parallel. Yes, you can almost reproduce the results but only if you round them to three significant digits.
    In a manuscript we recently submitted one reviewer remarked: »The authors report the statistical precision of their simulation results, generators, random seeds, etc. which is frequently forgotten in other simulation studies.« THX for the flowers!
LCG: Linear congruential generator.
  • State which PRNG you used. Nowadays the Mersenne Twister9 is the default in many software packages. It has a period of ≈4.3×10⁶⁰⁰¹ and hence, one can be sure to get no repetitions even in extremely large simulations. It has been implemented even in M$ Excel since v2010.
    However, in VBA still a LCG is implemented, which is useless in Monte Carlo simulations due to its short period.

  • Find a compromise between accuracy and speed. As we have seen above, 100,000 simulations are sufficient in sample size estimations for reference-scaling. Nevertheless, in simulating the empiric Type I Error (e.g., for adaptive Two-Stage Designs, reference-scaling) 10⁶ simulations are highly recommended and nowadays the standard.
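The first two points above can be demonstrated in a few lines of base R: the default PRNG is the Mersenne Twister, and with a fixed seed the stream – and hence any simulation based on it – is exactly reproducible:

```r
RNGkind()[1]                       # "Mersenne-Twister", R's default PRNG
set.seed(123456); x <- runif(5)    # first run
set.seed(123456); y <- runif(5)    # second run with the same seed
identical(x, y)                    # TRUE: fully reproducible
```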

    

top of section ↩︎ previous section ↩︎

License

CC BY 4.0 Helmut Schütz 2021
1st version March 13, 2021.
Rendered 2021-04-11 12:48:34 CEST by rmarkdown in 0.86 seconds.

Footnotes and References


  1. Snow G. TeachingDemos: Demonstrations for Teaching and Learning. 2020-04-07. CRAN.↩︎

  2. Robotham A. magicaxis: Pretty Scientific Plotting with Minor-Tick and Log Minor-Tick Support. 2020-12-04. CRAN.↩︎

  3. Labes D, Schütz H, Lang B. PowerTOST: Power and Sample Size for (Bio)Equivalence Studies. 2021-01-18. CRAN.↩︎

  4. Jensen R. Simulating Buffon’s Needle in R. Answer at StackExchange. 2019-12-18.↩︎

  5. Zheng C, Wang J, Zhao L. Testing bioequivalence for multiple formulations with power and sample size calculations. Pharm Stat. 2012; 11(4): 334–41. doi:10.1002/pst.1522.↩︎

  6. Tóthfalusi L, Endrényi L. Sample Sizes for Designing Bioequivalence Studies for Highly Variable Drugs. J Pharm Pharmaceut Sci. 2011; 15(1): 73–84. doi:10.18433/j3z88f.
    Open access.↩︎

  7. If you perform five runs with 10,000 simulations, you will get different numbers. That’s the effect of random seeds and a low number of simulations.↩︎

  8. Potvin D, DiLiberti CE, Hauck WW, Parr AF, Schuirmann DJ, Smith RA. Sequential design approaches for bioequivalence studies with crossover designs. Pharm Stat. 2007; 7(4): 245–262. doi:10.1002/pst.294.↩︎

  9. Matsumoto M, Nishimura T. Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans Model Comput Simul. 1998; 8: 3–30. doi:10.1145/272991.272995.↩︎

  10. Schütz H. Reference-Scaled Average Bioequivalence. Heterogenicity. 2020-12-23. CRAN.↩︎

  11. FDA. Office of Generic Drugs. Draft Guidance on Progesterone. Step 2.. Recommended Apr 2010; Revised Feb 2011.↩︎