
Examples in this article were generated with R 4.1.0 and the packages PowerTOST1 and TeachingDemos.2

More examples are given in the respective vignettes.3 4 See also the README on GitHub for an overview and the online manual5 for details and a collection of other articles.

• The right-hand badges give the respective section’s ‘level’.

1. Basics about power and sample size methodology – requiring no or only limited statistical expertise.

2. These sections are the most important ones. They are – hopefully – easily comprehensible even for novices.

3. A somewhat higher knowledge of statistics and/or R is required. May be skipped or reserved for a later reading.

4. An advanced knowledge of statistics and/or R is required. Not recommended for beginners in particular.
• Click to show / hide R code.
| Abbreviation | Meaning |
|--------------|---------|
| ABE | Average Bioequivalence |
| ABEL | Average Bioequivalence with Expanding Limits |
| CVw | Within-subject Coefficient of Variation |
| CVwT, CVwR | Within-subject Coefficient of Variation of the Test and Reference treatment |
| HVD(P) | Highly Variable Drug (Product) |
| RSABE | Reference-Scaled Average Bioequivalence |
| SABE | Scaled Average Bioequivalence |

# Introduction

What are the differences between ABE, RSABE, and ABEL in terms of power and sample sizes?

For background about power and sample size estimations see the respective articles (ABE, RSABE, and ABEL). See also the article about power analysis.

Definitions:

• A Highly Variable Drug (HVD) shows a within-subject Coefficient of Variation (CVw) of ≥30% if administered as a solution in a replicate design. The high variability is an intrinsic property of the drug (absorption/permeation, clearance).
Agencies are generally only interested in CVwR.
• A Highly Variable Drug Product (HVDP) shows a CVw of ≥30% in a replicate design.6

The concept of Scaled Average Bioequivalence (SABE) for HVD(P)s is based on the following considerations:

• HVD(P)s are safe and efficacious despite their high variability because:
• They have a wide therapeutic index (i.e., a flat dose-response curve). Consequently, even substantial changes in concentrations have only a limited impact on the effect.
Had they a narrow therapeutic index, adverse effects (due to high concentrations) and lack of efficacy (due to low concentrations) would have been observed in phase III and, consequently, the originator’s product would not have been approved in the first place.
• Once approved, the product has a documented safety / efficacy record in phase IV and in clinical practice – despite its high variability.
If problems were evident, the product would have been taken off the market.
• Given that, the conventional ‘clinically relevant difference’ Δ of 20% (leading to the limits of 80.00 – 125.00% in Average Bioequivalence) is overly conservative and thus requires large sample sizes.
• Hence, a more relaxed Δ of > 20% was proposed. A natural approach is to scale (expand / widen) the limits based on the within-subject variability of the reference product σwR.
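The scaling idea can be sketched in a few lines of base R: the limits are expanded by exp(±k·swR). This is only an illustration using the EMA’s regulatory constant k = 0.760; scaled.limits() is a hypothetical helper (not part of PowerTOST) and ignores the upper cap and the PE-constraint discussed below.

```r
# Sketch: expanded limits from the within-subject variability of R
# (regulatory constant k = 0.760; caps and constraints ignored)
scaled.limits <- function(CVwR, k = 0.760) {
  swR <- sqrt(log(CVwR^2 + 1))  # CV to log-scale standard deviation
  c(lower = exp(-k * swR), upper = exp(+k * swR))
}
round(100 * scaled.limits(CVwR = 0.45), 2) # lower 72.15, upper 138.59 (%)
```

For CVwR = 45% these are exactly the ABEL limits appearing in the tables further down.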

Reference-Scaled Average Bioequivalence (RSABE) is preferred by the FDA and by China’s CDE. Average Bioequivalence with Expanding Limits (ABEL) is another variant of SABE (preferred in all other jurisdictions).

In order to apply these methods, the following conditions have to be fulfilled:

• The study has to be performed in a replicate design (i.e., at least the reference product has to be administered twice).
• The observed within-subject variability of the reference has to be high (in RSABE swR ≥0.294 and in ABEL CVwR >30%).
• ABEL only:
• A clinical justification has to be provided that the expanded limits will not impact safety / efficacy.
• Except in Chile and Brazil, it has to be demonstrated that the high variability of the reference is not caused by ‘outliers’.
• There is an ‘upper cap’ of scaling (uc = 50%, except for Health Canada, where uc ≈ 57.382%).
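The two cut-offs (swR ≥0.294 for RSABE, CVwR >30% for ABEL) express almost the same variability, since swR and CVwR are linked by $$\small{s_\textrm{wR}=\sqrt{\log(CV_\textrm{wR}^2+1)}}$$. A base-R sketch of the conversions (PowerTOST provides them as CV2se() and se2CV()):

```r
# Relation between the log-scale standard deviation and the CV
# (log-normal model)
CV2se <- function(CV) sqrt(log(CV^2 + 1))
se2CV <- function(se) sqrt(exp(se^2) - 1)
CV2se(0.30)  # ~0.29356: ABEL's cutoff CVwR > 30% expressed as swR
se2CV(0.294) # ~0.30047: RSABE's cutoff swR >= 0.294 expressed as CVwR
```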

In all methods a point estimate constraint is employed: even if the confidence interval lies within the expanded limits, the PE has to lie within 80.00 – 125.00% for the study to pass.


# Sample size

The idea behind reference-scaling is to avoid the extreme sample sizes required for ABE and to preserve power independently of the CV.

Let’s explore some examples. I assumed a CV of 0.45, a T/R-ratio of 0.90, and targeted ≥80% power for this combination in some common replicate designs.

library(PowerTOST)     # attach the packages
library(TeachingDemos) # to run the examples

Note that sample sizes are integers and follow a step function because in software packages balanced sequences are returned.
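The rounding-up to balanced sequences can be sketched as follows (balance() is a hypothetical helper for illustration; the sampleN.*() functions return sample sizes rounded up in this fashion):

```r
# Round a raw sample size up to the next multiple of the number of
# sequences, giving balanced (equally sized) sequence groups
balance <- function(n, n.seq) n.seq * ceiling(n / n.seq)
balance(25, 2) # 26 in a 2-sequence design
balance(25, 3) # 27 in a 3-sequence design
```

This is why, in the plots below, the sample size of 3-sequence designs increases in steps of three, whereas 2-sequence designs increase in steps of two.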

CV      <- 0.45
theta0  <- 0.90
target  <- 0.80
designs <- c("2x2x4", "2x2x3", "2x3x3")
method  <- c("ABE", "ABEL", "RSABE")
res     <- data.frame(design = rep(designs, each = length(method)),
                      method = method, n = NA)
for (i in 1:nrow(res)) {
  if (res$method[i] == "ABE") {
    res[i, 3] <- sampleN.TOST(CV = CV, theta0 = theta0,
                              design = res$design[i],
                              targetpower = target,
                              print = FALSE)[7]
  }
  if (res$method[i] == "ABEL") {
    res[i, 3] <- sampleN.scABEL(CV = CV, theta0 = theta0,
                                design = res$design[i],
                                targetpower = target,
                                print = FALSE, details = FALSE)[8]
  }
  if (res$method[i] == "RSABE") {
    res[i, 3] <- sampleN.RSABE(CV = CV, theta0 = theta0,
                               design = res$design[i],
                               targetpower = target,
                               print = FALSE, details = FALSE)[8]
  }
}
print(res, row.names = FALSE)
R>  design method   n
R>   2x2x4    ABE  84
R>   2x2x4   ABEL  28
R>   2x2x4  RSABE  24
R>   2x2x3    ABE 124
R>   2x2x3   ABEL  42
R>   2x2x3  RSABE  36
R>   2x3x3    ABE 126
R>   2x3x3   ABEL  39
R>   2x3x3  RSABE  33

CV      <- 0.45
theta0  <- seq(0.95, 0.85, -0.001)
methods <- c("ABE", "ABEL", "RSABE")
clr     <- c("red", "magenta", "blue")
ylab    <- paste0("sample size (CV = ", 100*CV, "%)")
#################
design <- "2x2x4"
res1   <- data.frame(theta0 = theta0,
                     method = rep(methods, each = length(theta0)),
                     n = NA)
for (i in 1:nrow(res1)) {
  if (res1$method[i] == "ABE") {
    res1$n[i] <- sampleN.TOST(CV = CV, theta0 = res1$theta0[i],
                              design = design,
                              print = FALSE)[["Sample size"]]
  }
  if (res1$method[i] == "ABEL") {
    res1$n[i] <- sampleN.scABEL(CV = CV, theta0 = res1$theta0[i],
                                design = design, print = FALSE,
                                details = FALSE)[["Sample size"]]
  }
  if (res1$method[i] == "RSABE") {
    res1$n[i] <- sampleN.RSABE(CV = CV, theta0 = res1$theta0[i],
                               design = design, print = FALSE,
                               details = FALSE)[["Sample size"]]
  }
}
dev.new(width = 4.5, height = 4.5, record = TRUE)
op <- par(no.readonly = TRUE)
par(lend = 2, ljoin = 1, mar = c(4, 3.3, 0.1, 0.2), cex.axis = 0.9)
plot(theta0, res1$n[res1$method == "ABE"], type = "n", axes = FALSE,
     ylim = c(12, max(res1$n)), xlab = expression(theta[0]),
     log = "xy", ylab = "")
abline(v = seq(0.85, 0.95, 0.025), lty = 3, col = "lightgrey")
abline(v = 0.90, lty = 2)
abline(h = axTicks(2, log = TRUE), lty = 3, col = "lightgrey")
axis(1, at = seq(0.85, 0.95, 0.025))
axis(2, las = 1)
mtext(ylab, 2, line = 2.4)
legend("bottomleft", legend = methods, inset = 0.02, lwd = 2, cex = 0.9,
col = clr, box.lty = 0, bg = "white", title = "\u226580% power")
lines(theta0, res1$n[res1$method == "ABE"],
type = "S", lwd = 2, col = clr[1])
lines(theta0, res1$n[res1$method == "ABEL"],
type = "S", lwd = 2, col = clr[2])
lines(theta0, res1$n[res1$method == "RSABE"],
type = "S", lwd = 2, col = clr[3])
box()
#################
design <- "2x2x3"
res2   <- data.frame(theta0 = theta0,
                     method = rep(methods, each = length(theta0)),
                     n = NA)
for (i in 1:nrow(res2)) {
  if (res2$method[i] == "ABE") {
    res2$n[i] <- sampleN.TOST(CV = CV, theta0 = res2$theta0[i],
                              design = design,
                              print = FALSE)[["Sample size"]]
  }
  if (res2$method[i] == "ABEL") {
    res2$n[i] <- sampleN.scABEL(CV = CV, theta0 = res2$theta0[i],
                                design = design, print = FALSE,
                                details = FALSE)[["Sample size"]]
  }
  if (res2$method[i] == "RSABE") {
    res2$n[i] <- sampleN.RSABE(CV = CV, theta0 = res2$theta0[i],
                               design = design, print = FALSE,
                               details = FALSE)[["Sample size"]]
  }
}
plot(theta0, res2$n[res2$method == "ABE"], type = "n", axes = FALSE,
     ylim = c(12, max(res2$n)), xlab = expression(theta[0]),
     log = "xy", ylab = "")
abline(v = seq(0.85, 0.95, 0.025), lty = 3, col = "lightgrey")
abline(v = 0.90, lty = 2)
abline(h = axTicks(2, log = TRUE), lty = 3, col = "lightgrey")
axis(1, at = seq(0.85, 0.95, 0.025))
axis(2, las = 1)
mtext(ylab, 2, line = 2.4)
legend("bottomleft", legend = methods, inset = 0.02, lwd = 2, cex = 0.9,
col = clr, box.lty = 0, bg = "white", title = "\u226580% power")
lines(theta0, res2$n[res2$method == "ABE"],
type = "S", lwd = 2, col = clr[1])
lines(theta0, res2$n[res2$method == "ABEL"],
type = "S", lwd = 2, col = clr[2])
lines(theta0, res2$n[res2$method == "RSABE"],
type = "S", lwd = 2, col = clr[3])
box()
#################
design <- "2x3x3"
res3   <- data.frame(theta0 = theta0,
                     method = rep(methods, each = length(theta0)),
                     n = NA)
for (i in 1:nrow(res3)) {
  if (res3$method[i] == "ABE") {
    res3$n[i] <- sampleN.TOST(CV = CV, theta0 = res3$theta0[i],
                              design = design,
                              print = FALSE)[["Sample size"]]
  }
  if (res3$method[i] == "ABEL") {
    res3$n[i] <- sampleN.scABEL(CV = CV, theta0 = res3$theta0[i],
                                design = design, print = FALSE,
                                details = FALSE)[["Sample size"]]
  }
  if (res3$method[i] == "RSABE") {
    res3$n[i] <- sampleN.RSABE(CV = CV, theta0 = res3$theta0[i],
                               design = design, print = FALSE,
                               details = FALSE)[["Sample size"]]
  }
}
plot(theta0, res3$n[res3$method == "ABE"], type = "n", axes = FALSE,
     ylim = c(12, max(res3$n)), xlab = expression(theta[0]),
     log = "xy", ylab = "")
abline(v = seq(0.85, 0.95, 0.025), lty = 3, col = "lightgrey")
abline(v = 0.90, lty = 2)
abline(h = axTicks(2, log = TRUE), lty = 3, col = "lightgrey")
axis(1, at = seq(0.85, 0.95, 0.025))
axis(2, las = 1)
mtext(ylab, 2, line = 2.4)
legend("bottomleft", legend = methods, inset = 0.02, lwd = 2, cex = 0.9,
col = clr, box.lty = 0, bg = "white", title = "\u226580% power")
lines(theta0, res3$n[res3$method == "ABE"],
type = "S", lwd = 2, col = clr[1])
lines(theta0, res3$n[res3$method == "ABEL"],
type = "S", lwd = 2, col = clr[2])
lines(theta0, res3$n[res3$method == "RSABE"],
type = "S", lwd = 2, col = clr[3])
box()
par(op)

It’s obvious that the methods for reference-scaling need substantially smaller sample sizes than ABE. The sample size functions of the scaling methods are also less steep, which means that even if our assumption about the T/R-ratio is wrong, power (and hence, the sample size) is affected to a lesser degree.

Nevertheless, one should not be overly optimistic about the T/R-ratio. For HVD(P)s a T/R-ratio ‘better’ than 0.90 should be avoided.7
NB, that’s the reason why the default in sampleN.scABEL() and sampleN.RSABE() is theta0 = 0.90. If scaling is not acceptable (e.g., AUC for the EMA), I strongly recommend specifying theta0 = 0.90 in sampleN.TOST() as well, because its default is 0.95.

In the 2-sequence 3-period full replicate design power depends on the number of treatments: roughly 50% more subjects are required than in the 4-period full replicate design.

The partial replicate design requires sample sizes similar to the 3-period full replicate design because both have the same degrees of freedom. However, the step size is wider (three sequences instead of two).

# Power

Let’s change the point of view. As above, I assumed a CV of 0.45, a T/R-ratio of 0.90, and targeted ≥80% power for this combination. This time I explored how a CV different from my assumption affects power with the estimated sample size.

Additionally I assessed ‘pure’ SABE, i.e., without an upper cap of scaling and without the PE-constraint, for the EMA’s conditions (switching at CVwR 30%, regulatory constant $$\small{k=0.760}$$).

CV      <- 0.45
theta0  <- 0.90
target  <- 0.80
designs <- c("2x2x4", "2x2x3", "2x3x3")
method  <- c("ABE", "ABEL", "RSABE", "SABE")
# Pure SABE (only for comparison):
# no upper cap of scaling, no PE constraint
pure    <- reg_const("USER", r_const = 0.760,
                     CVswitch = 0.30, CVcap = Inf)
pure$pe_constr <- FALSE
res     <- data.frame(design = rep(designs, each = length(method)),
                      method = method, n = NA, power = NA,
                      CV0.40 = NA, CV0.50 = NA)
for (i in 1:nrow(res)) {
  if (res$method[i] == "ABE") {
    res[i, 3:4] <- sampleN.TOST(CV = CV, theta0 = theta0,
                                design = res$design[i],
                                targetpower = target,
                                print = FALSE)[7:8]
    res[i, 5]   <- power.TOST(CV = 0.4, theta0 = theta0,
                              n = res[i, 3], design = res$design[i])
    res[i, 6]   <- power.TOST(CV = 0.5, theta0 = theta0,
                              n = res[i, 3], design = res$design[i])
  }
  if (res$method[i] == "ABEL") {
    res[i, 3:4] <- sampleN.scABEL(CV = CV, theta0 = theta0,
                                  design = res$design[i],
                                  targetpower = target, print = FALSE,
                                  details = FALSE)[8:9]
    res[i, 5]   <- power.scABEL(CV = 0.4, theta0 = theta0,
                                n = res[i, 3], design = res$design[i])
    res[i, 6]   <- power.scABEL(CV = 0.5, theta0 = theta0,
                                n = res[i, 3], design = res$design[i])
  }
  if (res$method[i] == "RSABE") {
    res[i, 3:4] <- sampleN.RSABE(CV = CV, theta0 = theta0,
                                 design = res$design[i],
                                 targetpower = target, print = FALSE,
                                 details = FALSE)[8:9]
    res[i, 5]   <- power.RSABE(CV = 0.4, theta0 = theta0,
                               n = res[i, 3], design = res$design[i])
    res[i, 6]   <- power.RSABE(CV = 0.5, theta0 = theta0,
                               n = res[i, 3], design = res$design[i])
  }
  if (res$method[i] == "SABE") {
    res[i, 3:4] <- sampleN.scABEL(CV = CV, theta0 = theta0,
                                  design = res$design[i],
                                  targetpower = target,
                                  regulator = pure, print = FALSE,
                                  details = FALSE)[8:9]
    res[i, 5]   <- power.scABEL(CV = 0.4, theta0 = theta0,
                                n = res[i, 3], design = res$design[i],
                                regulator = pure)
    res[i, 6]   <- power.scABEL(CV = 0.5, theta0 = theta0,
                                n = res[i, 3], design = res$design[i],
                                regulator = pure)
  }
}
res[, 4:6] <- signif(res[, 4:6], 5)
print(res, row.names = FALSE)
R>  design method   n   power  CV0.40  CV0.50
R>   2x2x4    ABE  84 0.80569 0.87483 0.73700
R>   2x2x4   ABEL  28 0.81116 0.78286 0.81428
R>   2x2x4  RSABE  24 0.82450 0.80516 0.83001
R>   2x2x4   SABE  28 0.81884 0.78415 0.84388
R>   2x2x3    ABE 124 0.80012 0.87017 0.73102
R>   2x2x3   ABEL  42 0.80017 0.77676 0.80347
R>   2x2x3  RSABE  36 0.81147 0.79195 0.81888
R>   2x2x3   SABE  42 0.80961 0.77868 0.83463
R>   2x3x3    ABE 126 0.80570 0.87484 0.73701
R>   2x3x3   ABEL  39 0.80588 0.77587 0.80763
R>   2x3x3  RSABE  33 0.82802 0.80845 0.83171
R>   2x3x3   SABE  39 0.81386 0.77650 0.84100

# Cave: very long runtime
CV.fix  <- 0.45
CV      <- seq(0.35, 0.55, length.out = 201)
theta0  <- 0.90
methods <- c("ABE", "ABEL", "RSABE", "SABE")
clr     <- c("red", "magenta", "blue", "#00800080")
# Pure SABE (only for comparison):
# no upper cap of scaling, no PE constraint
pure    <- reg_const("USER", r_const = 0.760,
                     CVswitch = 0.30, CVcap = Inf)
pure$pe_constr <- FALSE
#################
design  <- "2x2x4"
res1    <- data.frame(CV = CV,
method = rep(methods, each =length(CV)),
power = NA)
n.ABE   <- sampleN.TOST(CV = CV.fix, theta0 = theta0,
design = design,
print = FALSE)[["Sample size"]]
n.RSABE <- sampleN.RSABE(CV = CV.fix, theta0 = theta0,
design = design, print = FALSE,
details = FALSE)[["Sample size"]]
n.ABEL  <- sampleN.scABEL(CV = CV.fix, theta0 = theta0,
design = design, print = FALSE,
details = FALSE)[["Sample size"]]
n.SABE  <- sampleN.scABEL(CV = CV.fix, theta0 = theta0,
design = design, print = FALSE,
regulator = pure,
details = FALSE)[["Sample size"]]
for (i in 1:nrow(res1)) {
  if (res1$method[i] == "ABE") {
    res1$power[i] <- power.TOST(CV = res1$CV[i], theta0 = theta0,
                                n = n.ABE, design = design)
  }
  if (res1$method[i] == "ABEL") {
    res1$power[i] <- power.scABEL(CV = res1$CV[i], theta0 = theta0,
                                  n = n.ABEL, design = design,
                                  nsims = 1e6)
  }
  if (res1$method[i] == "RSABE") {
    res1$power[i] <- power.RSABE(CV = res1$CV[i], theta0 = theta0,
                                 n = n.RSABE, design = design,
                                 nsims = 1e6)
  }
  if (res1$method[i] == "SABE") {
    res1$power[i] <- power.scABEL(CV = res1$CV[i], theta0 = theta0,
                                  n = n.ABEL, design = design,
                                  regulator = pure, nsims = 1e6)
  }
}
dev.new(width = 4.5, height = 4.5, record = TRUE)
par(mar = c(4, 3.3, 0.1, 0.1), cex.axis = 0.9)
plot(CV, res1$power[res1$method == "ABE"], type = "n", axes = FALSE,
ylim = c(0.65, 1), xlab = "CV", ylab = "")
abline(v = seq(0.35, 0.55, 0.05), lty = 3, col = "lightgrey")
abline(v = 0.45, lty = 2)
abline(h = axTicks(2, log = FALSE), lty = 3, col = "lightgrey")
axis(1, at = seq(0.35, 0.55, 0.05))
axis(2, las = 1)
mtext("power", 2, line = 2.6)
legend("topright", legend = methods, inset = 0.02, lwd = 2, cex = 0.9,
col = clr, box.lty = 0, bg = "white", title = "n for CV = 45%")
lines(CV, res1$power[res1$method == "ABE"], lwd = 2, col = clr[1])
lines(CV, res1$power[res1$method == "ABEL"], lwd = 2, col = clr[2])
lines(CV, res1$power[res1$method == "RSABE"], lwd = 2, col = clr[3])
lines(CV, res1$power[res1$method == "SABE"], lwd = 2, col = clr[4])
box()
#################
design  <- "2x2x3"
res2    <- data.frame(CV = CV,
method = rep(methods, each =length(CV)),
power = NA)
n.ABE   <- sampleN.TOST(CV = CV.fix, theta0 = theta0,
design = design,
print = FALSE)[["Sample size"]]
n.RSABE <- sampleN.RSABE(CV = CV.fix, theta0 = theta0,
design = design, print = FALSE,
details = FALSE)[["Sample size"]]
n.ABEL  <- sampleN.scABEL(CV = CV.fix, theta0 = theta0,
design = design, print = FALSE,
details = FALSE)[["Sample size"]]
n.SABE  <- sampleN.scABEL(CV = CV.fix, theta0 = theta0,
design = design, print = FALSE,
regulator = pure,
details = FALSE)[["Sample size"]]
for (i in 1:nrow(res2)) {
  if (res2$method[i] == "ABE") {
    res2$power[i] <- power.TOST(CV = res2$CV[i], theta0 = theta0,
                                n = n.ABE, design = design)
  }
  if (res2$method[i] == "ABEL") {
    res2$power[i] <- power.scABEL(CV = res2$CV[i], theta0 = theta0,
                                  n = n.ABEL, design = design,
                                  nsims = 1e6)
  }
  if (res2$method[i] == "RSABE") {
    res2$power[i] <- power.RSABE(CV = res2$CV[i], theta0 = theta0,
                                 n = n.RSABE, design = design,
                                 nsims = 1e6)
  }
  if (res2$method[i] == "SABE") {
    res2$power[i] <- power.scABEL(CV = res2$CV[i], theta0 = theta0,
                                  n = n.ABEL, design = design,
                                  regulator = pure, nsims = 1e6)
  }
}
plot(CV, res2$power[res2$method == "ABE"], type = "n", axes = FALSE,
ylim = c(0.65, 1), xlab = "CV", ylab = "")
abline(v = seq(0.35, 0.55, 0.05), lty = 3, col = "lightgrey")
abline(v = 0.45, lty = 2)
abline(h = axTicks(2, log = FALSE), lty = 3, col = "lightgrey")
axis(1, at = seq(0.35, 0.55, 0.05))
axis(2, las = 1)
mtext("power", 2, line = 2.6)
legend("topright", legend = methods, inset = 0.02, lwd = 2, cex = 0.9,
col = clr, box.lty = 0, bg = "white", title = "n for CV = 45%")
lines(CV, res2$power[res2$method == "ABE"], lwd = 2, col = clr[1])
lines(CV, res2$power[res2$method == "ABEL"], lwd = 2, col = clr[2])
lines(CV, res2$power[res2$method == "RSABE"], lwd = 2, col = clr[3])
lines(CV, res2$power[res2$method == "SABE"], lwd = 2, col = clr[4])
box()
#################
design  <- "2x3x3"
res3    <- data.frame(CV = CV,
method = rep(methods, each =length(CV)),
power = NA)
n.ABE   <- sampleN.TOST(CV = CV.fix, theta0 = theta0,
design = design,
print = FALSE)[["Sample size"]]
n.RSABE <- sampleN.RSABE(CV = CV.fix, theta0 = theta0,
design = design, print = FALSE,
details = FALSE)[["Sample size"]]
n.ABEL  <- sampleN.scABEL(CV = CV.fix, theta0 = theta0,
design = design, print = FALSE,
details = FALSE)[["Sample size"]]
n.SABE  <- sampleN.scABEL(CV = CV.fix, theta0 = theta0,
design = design, print = FALSE,
regulator = pure,
details = FALSE)[["Sample size"]]
for (i in 1:nrow(res3)) {
  if (res3$method[i] == "ABE") {
    res3$power[i] <- power.TOST(CV = res3$CV[i], theta0 = theta0,
                                n = n.ABE, design = design)
  }
  if (res3$method[i] == "ABEL") {
    res3$power[i] <- power.scABEL(CV = res3$CV[i], theta0 = theta0,
                                  n = n.ABEL, design = design,
                                  nsims = 1e6)
  }
  if (res3$method[i] == "RSABE") {
    res3$power[i] <- power.RSABE(CV = res3$CV[i], theta0 = theta0,
                                 n = n.RSABE, design = design,
                                 nsims = 1e6)
  }
  if (res3$method[i] == "SABE") {
    res3$power[i] <- power.scABEL(CV = res3$CV[i], theta0 = theta0,
                                  n = n.ABEL, design = design,
                                  regulator = pure, nsims = 1e6)
  }
}
plot(CV, res3$power[res3$method == "ABE"], type = "n", axes = FALSE,
ylim = c(0.65, 1), xlab = "CV", ylab = "")
abline(v = seq(0.35, 0.55, 0.05), lty = 3, col = "lightgrey")
abline(v = 0.45, lty = 2)
abline(h = axTicks(2, log = FALSE), lty = 3, col = "lightgrey")
axis(1, at = seq(0.35, 0.55, 0.05))
axis(2, las = 1)
mtext("power", 2, line = 2.6)
legend("topright", legend = methods, inset = 0.02, lwd = 2, cex = 0.9,
col = clr, box.lty = 0, bg = "white", title = "n for CV = 45%")
lines(CV, res3$power[res3$method == "ABE"], lwd = 2, col = clr[1])
lines(CV, res3$power[res3$method == "ABEL"], lwd = 2, col = clr[2])
lines(CV, res3$power[res3$method == "RSABE"], lwd = 2, col = clr[3])
lines(CV, res3$power[res3$method == "SABE"], lwd = 2, col = clr[4])
box()
par(op)

As expected, power of ABE is extremely dependent on the CV. Not surprising, because the acceptance limits are fixed at 80.00 – 125.00%.

As stated above, ideally reference-scaling should preserve power independently of the CV. If that were the case, power would be a horizontal line parallel to the x-axis. However, the methods implemented by the authorities are frameworks in which certain conditions have to be observed. Therefore, beyond a maximum at a CV of around 50%, power starts to decrease because the PE-constraint becomes increasingly important and – for ABEL – the upper cap of scaling sets in.

On the other hand, ‘pure’ SABE shows the unconstrained behavior of ABEL.

Let’s go deeper into the matter. As above but a wider range of CV values (0.3 – 1).

Here we see a clear difference between RSABE and ABEL. Although in both the PE-constraint has to be observed, in the former no upper cap of scaling is imposed and hence, power is affected only to a minor degree.
On the contrary, due to the upper cap of scaling in the latter, it behaves similarly to ABE with fixed limits of 69.84 – 143.19%.
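These fixed limits follow directly from the upper cap: for CVwR above 50% the expansion is frozen at the value obtained for CVwR = 50%. A base-R check (EMA’s regulatory constant k = 0.760 assumed):

```r
# ABEL's limits at the upper cap of scaling (CVwR = 0.50, k = 0.760)
swR.cap <- sqrt(log(0.50^2 + 1))  # log-scale SD at the cap
round(100 * exp(c(-1, +1) * 0.760 * swR.cap), 2) # 69.84 143.19 (%)
```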

Consequently, if the CV turns out to be substantially larger than assumed, power may be compromised in ABEL.

Note also the huge gap between ABEL and ‘pure’ SABE. Whilst the PE-constraint is statistically not justified, it was introduced in all jurisdictions ‘for political reasons’.

1. There is no scientific basis or rationale for the point estimate recommendations
2. There is no belief that addition of the point estimate criteria will improve the safety of approved generic drugs
3. The point estimate recommendations are only “political” to give greater assurance to clinicians and patients who are not familiar (don’t understand) the statistics of highly variable drugs
Leslie Benet. 2006.8

# Pros and Cons

From a statistical perspective, replicate designs are preferable over the 2×2×2 crossover design. If we observe discordant9 outliers in the latter, we cannot distinguish between lack of compliance (the subject didn’t take the drug), a product failure, and a subject-by-formulation interaction (the subject belongs to a subpopulation).10

A member of the EMA’s PKWP once told me that he would like to see all studies performed in a replicate design – regardless whether the drug / drug product is highly variable or not. One of the rare cases where we were of the same opinion…11

We always design studies for the worst-case combination, i.e., based on the PK metric requiring the largest sample size. In jurisdictions accepting reference-scaling only for Cmax (e.g., by ABEL) the sample size is driven by AUC.

metrics <- c("Cmax", "AUCt", "AUCinf")
alpha   <- 0.05
CV      <- c(0.45, 0.34, 0.36)
theta0  <- rep(0.90, 3)
theta1  <- 0.80
theta2  <- 1 / theta1
target  <- 0.80
design  <- "2x2x4"
plan    <- data.frame(metric = metrics,
                      method = c("ABEL", "ABE", "ABE"),
                      CV = CV, theta0 = theta0,
                      L = 100 * theta1, U = 100 * theta2,
                      n = NA, power = NA)
for (i in 1:nrow(plan)) {
  if (plan$method[i] == "ABEL") {
    plan[i, 5:6] <- round(100 * scABEL(CV = CV[i]), 2)
    plan[i, 7:8] <- signif(
                      sampleN.scABEL(alpha = alpha, CV = CV[i],
                                     theta0 = theta0[i],
                                     theta1 = theta1, theta2 = theta2,
                                     targetpower = target,
                                     design = design, details = FALSE,
                                     print = FALSE)[8:9], 4)
  } else {
    plan[i, 7:8] <- signif(
                      sampleN.TOST(alpha = alpha, CV = CV[i],
                                   theta0 = theta0[i],
                                   theta1 = theta1, theta2 = theta2,
                                   targetpower = target,
                                   design = design,
                                   print = FALSE)[7:8], 4)
  }
}
txt <- paste0("Sample size based on ",
              plan$metric[plan$n == max(plan$n)], ".\n")
print(plan, row.names = FALSE); cat(txt)
R>  metric method   CV theta0     L      U  n  power
R>    Cmax   ABEL 0.45    0.9 72.15 138.59 28 0.8112
R>    AUCt    ABE 0.34    0.9 80.00 125.00 50 0.8055
R>  AUCinf    ABE 0.36    0.9 80.00 125.00 56 0.8077
R> Sample size based on AUCinf.
If the study is performed with 56 subjects and all assumed values are realized, the post hoc power for Cmax will be 0.9666. I have seen deficiency letters by regulatory assessors asking for a »justification of too high power for Cmax«.

As shown in the article about ABEL, we get a smaller sample size as an incentive if $$\small{CV_\textrm{wT}<CV_\textrm{wR}}$$. However, this does not help if reference-scaling is not acceptable (say, for AUC), because the conventional model for ABE assumes equal variances.

theta0        <- 0.90
design        <- "2x2x4"
CVw           <- 0.36 # AUC - no reference-scaling
# variance-ratio 0.80: T lower than R
CV            <- signif(CVp2CV(CV = CVw, ratio = 0.80), 5)
# 'switch off' all scaling conditions of ABEL
reg           <- reg_const("USER", r_const = 0.76,
CVswitch = Inf, CVcap = Inf)
reg$pe_constr <- FALSE
res <- data.frame(variance = c("homoscedastic", "heteroscedastic"),
                  CVwT = c(CVw, CV[1]), CVwR = c(CVw, CV[2]),
                  CVw = rep(CVw, 2), n = NA)
res$n[1] <- sampleN.TOST(CV = CVw, theta0 = theta0,
                         design = design,
                         print = FALSE)[["Sample size"]]
res$n[2] <- sampleN.scABEL(CV = CV, theta0 = theta0,
                           design = design, regulator = reg,
                           details = FALSE,
                           print = FALSE)[["Sample size"]]
print(res, row.names = FALSE)
R>        variance    CVwT    CVwR  CVw  n
R>   homoscedastic 0.36000 0.36000 0.36 56
R> heteroscedastic 0.33824 0.38079 0.36 56

Although we know that the test has a lower CV than the reference, this information is ignored and the (pooled) within-subject CV is used.

For ABE the costs of a replicate design are similar to those of a 2×2×2 crossover design. Power depends on the number of treatments – more administrations are compensated by the lower sample size. If the sample size of a 2×2×2 crossover design is $$\small{n}$$, then the sample size of a 4-period replicate design is $$\small{^1/_2\,n}$$ and of a 3-period replicate design $$\small{^3/_4\,n}$$. We have the same number of samples to analyze, and study costs are driven to a good part by bioanalytics.12 We save costs due to fewer pre-/post-study exams but have to pay a higher subject remuneration (more hospitalizations and blood samples). If applicable (depending on the drug): increased costs for in-study safety and/or PD measurements.

## Pros

• Statistically sound. Estimation of CVwR (and in full replicate studies also of CVwT) is possible. More information never hurts.
• Mandatory for RSABE and ABEL. Smaller sample sizes for RSABE and ABEL than for ABE.
• ‘Outliers’ can be better assessed than in the 2×2×2 crossover design.
• In ABE for the EMA this will be rather difficult (exclusion of subjects based on statistical and/or PK grounds alone is not acceptable).
• For ABEL the assessment of outliers (of the reference treatment only) is part of the recommended procedure.13 14
• Since extreme values are a natural property of HVD(P)s, assessment of outliers is not recommended by the FDA and China’s CDE for RSABE.
## Cons

• For ABE a higher sample size adjustment according to the anticipated dropout-rate is required than in a 2×2×2 crossover design due to three or four periods instead of two.15
• The elephant in the room: potential inflation of the Type I Error (patient’s risk) in RSABE (if CVwR < 30%) and in ABEL (if ~25% < CVwR < ~42%). This issue will be covered in another article.

# Uncertain CVwR

An intriguing statement of the EMA’s Pharmacokinetics Working Party:

Suitability of a 3-period replicate design scheme for the demonstration of within-subject variability for Cmax
The question raised asks if it is possible to use a design where subjects are randomised to receive treatments in the order of TRT or RTR. This design is not considered optimal […]. However, it would provide an estimate of the within subject variability for both test and reference products. As this estimate is only based on half of the subjects in the study the uncertainty associated with it is higher than if a RRT/RTR/TRR design is used and therefore there is a greater chance of incorrectly concluding a reference product is highly variable if such a design is used. The CHMP bioequivalence guideline requires that at least 12 patients are needed to provide data for a bioequivalence study to be considered valid, and to estimate all the key parameters. Therefore, if a 3-period replicate design, where treatments are given in the order TRT or RTR, is to be used to justify widening of a confidence interval for Cmax then it is considered that at least 12 patients would need to provide data from the RTR arm. This implies a study with at least 24 patients in total would be required if equal number of subjects are allocated to the 2 treatment sequences.
Q&A document16

I fail to find a statement in the guideline17 that CVwR is a ‘key parameter’ – only that »The number of evaluable subjects in a bioequivalence study should not be less than 12.« However, in sufficiently powered studies such a situation is extremely unlikely (dropout-rate ≥42%).18

Let us explore the uncertainty of $$\small{CV_\textrm{wR}=30\%}$$ based on its 95% confidence interval in two scenarios:

1. No dropouts. In the partial replicate design all subjects provide data for the estimation of CVwR. In full replicate designs only half of the subjects provide this information.
2. Extreme dropout-rates. Only twelve subjects remain in the R-replicated sequence(s).

# CI of the CV for sample sizes of replicate designs
# (theta0 0.90, target power 0.80)
CV   <- 0.30
des  <- c("2x3x3", # 3-sequence 3-period (partial) replicate design
          "2x2x3", # 2-sequence 3-period full replicate design
          "2x2x4") # 2-sequence 4-period full replicate design
type <- c("partial", rep("full", 2))
seqs <- c("TRR|RTR|RTR", "TRT|RTR    ", "TRTR|RTRT  ")
res  <- data.frame(design = rep(des, 2), type = rep(type, 2),
                   sequences = rep(seqs, 2),
                   n = c(rep(NA, 3), rep(0, 3)),
                   RR = c(rep(NA, 3), rep(0, 3)),
                   df = NA, lower = NA, upper = NA, width = NA)
for (i in 1:nrow(res)) {
  if (is.na(res$n[i])) {
    res$n[i] <- sampleN.scABEL(CV = CV, design = res$design[i],
                               details = FALSE,
                               print = FALSE)[["Sample size"]]
    if (res$design[i] == "2x2x3") {
      res$RR[i] <- res$n[i] / 2
    } else {
      res$RR[i] <- res$n[i]
    }
  }
  if (i > 3) {
    if (res$design[i] == "2x3x3") {
      res$n[i]  <- res$n[i-3] - 12
      res$RR[i] <- 12 # only 12 eligible subjects in sequence RTR
    } else {
      res$n[i]  <- 12       # min. sample size
      res$RR[i] <- res$n[i] # CVwR can be estimated
    }
  }
  res$df[i]   <- res$RR[i] - 2
  res[i, 7:8] <- CVCL(CV = CV, df = res$df[i],
                      side = "2-sided", alpha = 0.05)
  res[i, 9]   <- res[i, 8] - res[i, 7]
}
res[, 7] <- sprintf("%.1f%%", 100 * res[, 7])
res[, 8] <- sprintf("%.1f%%", 100 * res[, 8])
res[, 9] <- sprintf("%.1f%%", 100 * res[, 9])
# Rows 1-3: sample sizes for target power
# Rows 4-6: only 12 eligible subjects to estimate CVwR
print(res, row.names = FALSE)
R> design    type   sequences  n RR df lower upper width
R>  2x3x3 partial TRR|RTR|RTR 54 54 52 25.0% 37.6% 12.5%
R>  2x2x3    full TRT|RTR     50 25 23 23.1% 43.0% 19.9%
R>  2x2x4    full TRTR|RTRT   34 34 32 23.9% 40.3% 16.4%
R>  2x3x3 partial TRR|RTR|RTR 42 12 10 20.7% 55.1% 34.4%
R>  2x2x3    full TRT|RTR     12 12 10 20.7% 55.1% 34.4%
R>  2x2x4    full TRTR|RTRT   12 12 10 20.7% 55.1% 34.4%

Given, the CI of the $$\small{CV_\textrm{wR}}$$ in the partial replicate design is narrower than in a 3-period full replicate design. Is that really relevant, especially since only 12 eligible subjects in the RTR-sequence are acceptable to provide a ‘valid’ estimate?

Obviously the EMA’s PKWP is aware of the uncertainty of the estimated $$\small{CV_\textrm{wR}}$$, which may lead to a misclassification (the study is assessed by ABEL although the drug / drug product is not highly variable) and hence, an inflated Type I Error (TIE, patient’s risk). The partial replicate design has – given studies with the same power – the highest degrees of freedom and hence, leads to the lowest TIE.19 However, it does not magically disappear.

A misclassification may also affect the Type II Error (producer’s risk). If the estimated $$\small{CV_\textrm{wR}}$$ is lower than assumed in the sample size estimation, less expansion can be applied and the study will be underpowered. Of course, that’s not a regulatory concern.

Of note, if there are no / few dropouts, the estimated $$\small{CV_\textrm{wR}}$$ in the 4-period full replicate design carries a larger uncertainty than in the partial replicate design due to its lower sample size and hence, fewer degrees of freedom.
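The confidence limits used above (CVCL() of PowerTOST) follow from the χ²-distribution of the scaled variance; CV.CL() below is a hypothetical base-R reimplementation for illustration, assuming the log-normal relation $$\small{CV=\sqrt{\exp(s^2)-1}}$$:

```r
# 95% CI of a CV estimated with df degrees of freedom, based on the
# chi-squared distribution of df * s^2 / sigma^2
CV.CL <- function(CV, df, alpha = 0.05) {
  s2 <- log(CV^2 + 1)                       # log-scale variance
  lo <- s2 * df / qchisq(1 - alpha / 2, df) # lower variance limit
  hi <- s2 * df / qchisq(alpha / 2, df)     # upper variance limit
  sqrt(exp(c(lower = lo, upper = hi)) - 1)  # back to the CV-scale
}
round(100 * CV.CL(CV = 0.30, df = 10), 1) # 20.7 55.1 (%): cf. the table
```

With df = 10 this reproduces the 20.7 – 55.1% interval of the extreme-dropout scenarios in the table above.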
If the PKWP is concerned about an ‘uncertain’ estimate, why is this design given as an example?20 21 Many studies are performed in this design and are accepted. Since RSABE generally requires smaller sample sizes than ABEL, the estimated $$\small{CV_\textrm{wR}}$$ is more uncertain in the former.

```r
# Cave: very long runtime
theta0 <- 0.90
target <- 0.80
CV     <- seq(0.3, 0.5, 0.00025)
x      <- seq(0.3, 0.5, 0.05)
des    <- c("2x3x3", # 3-sequence 3-period (partial) replicate design
            "2x2x3", # 2-sequence 3-period full replicate design
            "2x2x4") # 2-sequence 4-period full replicate design
RSABE  <- ABEL <- data.frame(design = rep(des, each = length(CV)),
                             n = NA, RR = NA, df = NA, CV = CV,
                             lower = NA, upper = NA)
for (i in 1:nrow(ABEL)) {
  RSABE$n[i] <- sampleN.RSABE(CV = RSABE$CV[i], theta0 = theta0,
                              targetpower = target,
                              design = RSABE$design[i],
                              details = FALSE,
                              print = FALSE)[["Sample size"]]
  if (RSABE$design[i] == "2x2x3") {
    RSABE$RR[i] <- RSABE$n[i] / 2
  } else {
    RSABE$RR[i] <- RSABE$n[i]
  }
  RSABE$df[i]   <- RSABE$RR[i] - 2
  RSABE[i, 6:7] <- CVCL(CV = RSABE$CV[i], df = RSABE$df[i],
                        side = "2-sided", alpha = 0.05)
  ABEL$n[i] <- sampleN.scABEL(CV = ABEL$CV[i], theta0 = theta0,
                              targetpower = target,
                              design = ABEL$design[i],
                              details = FALSE,
                              print = FALSE)[["Sample size"]]
  if (ABEL$design[i] == "2x2x3") {
    ABEL$RR[i] <- ABEL$n[i] / 2
  } else {
    ABEL$RR[i] <- ABEL$n[i]
  }
  ABEL$df[i]   <- ABEL$RR[i] - 2
  ABEL[i, 6:7] <- CVCL(CV = ABEL$CV[i], df = ABEL$df[i],
                       side = "2-sided", alpha = 0.05)
}
ylim <- range(c(RSABE[6:7], ABEL[6:7]))
col  <- c("blue", "red", "magenta")
leg  <- c("2×3×3 (partial)", "2×2×3 (full)", "2×2×4 (full)")
dev.new(width = 4.5, height = 4.5, record = TRUE)
op   <- par(no.readonly = TRUE)
par(mar = c(4, 4.1, 0.2, 0.1), cex.axis = 0.9)
plot(CV, rep(0.3, length(CV)), type = "n", ylim = ylim, log = "xy",
     xlab = expression(italic(CV)[wR]),
     ylab = expression(italic(CV)[wR]*" (95% confidence interval)"),
     axes = FALSE)
grid()
abline(h = 0.3, col = "lightgrey", lty = 3)
axis(1, at = x)
axis(2, las = 1)
axis(2, at = c(0.3, 0.5), las = 1)
lines(CV, CV, col = "darkgrey")
legend("topleft", bg = "white", box.lty = 0,
       title = "replicate designs", legend = leg, col = col,
       lwd = 2, seg.len = 2.5, cex = 0.9, y.intersp = 1.25)
box()
for (i in seq_along(des)) {
  lines(CV, RSABE$lower[RSABE$design == des[i]], col = col[i], lwd = 2)
  lines(CV, RSABE$upper[RSABE$design == des[i]], col = col[i], lwd = 2)
  y <- RSABE$upper[signif(RSABE$CV, 4) %in% x & RSABE$design == des[i]]
  n <- RSABE$n[signif(RSABE$CV, 4) %in% x & RSABE$design == des[i]]
  # sample sizes at CV = x
  shadowtext(x, y, labels = n, bg = "white", col = "black", cex = 0.75)
}
plot(CV, rep(0.3, length(CV)), type = "n", ylim = ylim, log = "xy",
     xlab = expression(italic(CV)[wR]),
     ylab = expression(italic(CV)[wR]*" (95% confidence interval)"),
     axes = FALSE)
grid()
abline(h = 0.3, col = "lightgrey", lty = 3)
axis(1, at = x)
axis(2, las = 1)
axis(2, at = c(0.3, 0.5), las = 1)
lines(CV, CV, col = "darkgrey")
legend("topleft", bg = "white", box.lty = 0,
       title = "replicate designs", legend = leg, col = col,
       lwd = 2, seg.len = 2.5, cex = 0.9, y.intersp = 1.25)
box()
for (i in seq_along(des)) {
  lines(CV, ABEL$lower[ABEL$design == des[i]], col = col[i], lwd = 2)
  lines(CV, ABEL$upper[ABEL$design == des[i]], col = col[i], lwd = 2)
  y <- ABEL$upper[signif(ABEL$CV, 4) %in% x & ABEL$design == des[i]]
  n <- ABEL$n[signif(ABEL$CV, 4) %in% x & ABEL$design == des[i]]
  # sample sizes at CV = x
  shadowtext(x, y, labels = n, bg = "white", col = "black", cex = 0.75)
}
par(op)
cat("RSABE\n"); print(RSABE[signif(RSABE$CV, 4) %in% x, ], row.names = FALSE)
cat("ABEL\n");  print(ABEL[signif(ABEL$CV, 4) %in% x, ], row.names = FALSE)
```

That’s interesting. Say, you assumed $$\small{CV_\textrm{wR}=37\%}$$ and a T/R-ratio of 0.90, targeting 80% power in a 4-period full replicate design intended for ABEL. You perform the study with 32 subjects (30 degrees of freedom). The 95% CI of the $$\small{CV_\textrm{wR}}$$ is 29.2% (no expansion) – 50.8% (above the upper cap). Disturbing, isn’t it?

If you wonder why the confidence intervals are asymmetric ($$\small{upper\;CL-CV_\textrm{wR}>CV_\textrm{wR}-lower\;CL}$$):

The $$\small{100(1-\alpha)}$$ confidence interval of the $$\small{CV_\textrm{wR}}$$ is obtained via the $$\small{\chi^2}$$-distribution of its error variance $$\small{s_\textrm{wR}^2}$$ with $$\small{n-2}$$ degrees of freedom. $\begin{matrix} s_\textrm{wR}^2=\log_{e}(CV_\textrm{wR}^2+1)\\ L=\frac{(n-2)\,s_\textrm{wR}^2}{\chi_{\alpha/2,\,n-2}^{2}}\leq \sigma_\textrm{wR}^2\leq\frac{(n-2)\,s_\textrm{wR}^2}{\chi_{1-\alpha/2,\,n-2}^{2}}=U\\ \left\{lower\;CL,\;upper\;CL\right\}=\left\{\sqrt{\exp(L)-1},\sqrt{\exp(U)-1}\right\} \end{matrix}$
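Translated into base R (a sketch; `PowerTOST`’s `CVCL()` should give the same numbers), the computation for $$\small{CV_\textrm{wR}=37\%}$$ with 30 degrees of freedom shows both the limits and the asymmetry:

```r
# 95% CI of CVwR = 0.37 with 30 degrees of freedom, step by step
CV <- 0.37
df <- 30
s2 <- log(CV^2 + 1)              # error variance on the log-scale
# chi-squared quantiles give the {lower, upper} limits of the variance,
# back-transformed to the CV-scale
ci <- sqrt(exp(df * s2 / qchisq(c(0.975, 0.025), df)) - 1)
round(100 * ci, 1)               # 29.2 and 50.8
c(lower.arm = CV - ci[1],
  upper.arm = ci[2] - CV)        # the upper arm is almost twice as wide
```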

The $$\small{\chi^2}$$-distribution is skewed to the right. Since the width of the confidence interval for a given $$\small{CV_\textrm{wR}}$$ depends on the degrees of freedom, the estimate is more precise in larger studies, which are required at relatively low variabilities (least scaling). In the example above the width of the CI in the partial replicate design for RSABE is 0.139 (n 45) at $$\small{CV_\textrm{wR}=0.30}$$ and 0.322 (n 30) at $$\small{CV_\textrm{wR}=0.50}$$. For ABEL the widths are 0.125 (n 54) and 0.273 (n 39), respectively.

# Post Scriptum

Regularly I’m asked whether it is possible to use an adaptive Two-Stage Design for RSABE or ABEL.

Whereas for ABE it is possible in principle (though nothing has been published so far), for SABE the answer is no. Contrary to ABE, where power and the Type I Error can be calculated by analytical methods, in SABE we have to rely on simulations. We would have to find a suitable adjusted $$\small{\alpha}$$ and demonstrate beforehand that the patient’s risk will be controlled.
With the implemented regulatory frameworks the power / sample-size estimation requires $$\small{10^{5}}$$ simulations to obtain a stable result (see here and there). Since the convergence of the empiric Type I Error is poor, we need $$\small{10^{6}}$$ simulations. Combining that with a reasonably narrow grid of possible stage 1 sample size / CVwR-combinations,22 we end up with $$\small{10^{13}-10^{14}}$$ simulations. I don’t see how that can be done in the near future, unless one has access to a massively parallel supercomputer. I made a quick estimate for my fast workstation: ~60 years running 24/7…

As outlined above, SABE is rather insensitive to the CV. Hence, the main advantage of TSDs over fixed sample designs in ABE (re-estimating the sample size based on the CV observed in the first stage) is simply not relevant. Fully adaptive methods for the 2×2×2 crossover also allow adjusting for the PE observed in the first stage.23 Here that is not possible. If you are concerned about the T/R-ratio, perform a (reasonably large!)24 pilot study and – even if the T/R-ratio looks promising – plan for a ‘worse’ one, since it is not stable between studies.

Helmut Schütz 2021
1st version April 22, 2021.
Rendered 2021-05-22 10:44:02 CEST by rmarkdown in 0.48 seconds.

Footnotes and References

1. Labes D, Schütz H, Lang B. PowerTOST: Power and Sample Size for (Bio)Equivalence Studies. 2021-01-18. CRAN.↩︎

2. Snow G. TeachingDemos: Demonstrations for Teaching and Learning. 2020-04-07. CRAN.↩︎

3. Schütz H. Average Bioequivalence. 2021-01-18. CRAN.↩︎

4. Schütz H. Reference-Scaled Average Bioequivalence. 2020-12-23. CRAN.↩︎

5. Labes D, Schütz H, Lang B. Package ‘PowerTOST’. January 18, 2021. CRAN.↩︎

6. Some gastro-resistant formulations of diclofenac are HVDPs, practically all topical formulations are HVDPs, whereas diclofenac itself is not an HVD (CVw of a solution ~8%).↩︎

7. Tóthfalusi L, Endrényi L. Sample Sizes for Designing Bioequivalence Studies for Highly Variable Drugs. J Pharm Pharm Sci. 2012; 15(1): 73–84. doi:10.18433/J3Z88F. Open Access.↩︎

8. Benet L. Why Highly Variable Drugs are Safer. Presentation at the FDA Advisory Committee for Pharmaceutical Science. Rockville. 06 October, 2006.  Internet Archive.↩︎

9. The T/R-ratio in a particular subject differs from that in other subjects showing a ‘normal’ response. A concordant outlier shows deviant responses for both T and R; that is not relevant in crossover designs.↩︎

10. FDA. Center for Drug Evaluation and Research. Guidance for Industry. Statistical Approaches to Establishing Bioequivalence. Rockville. January 2001. download.↩︎

11. If the study was planned for ABE, fails due to lacking power (CV higher than assumed and CVwR >30%), and reference-scaling would be acceptable (no safety/efficacy issues with the expanded limits), one has already estimates of CVwT and CVwR and is able to design the next study properly.↩︎

12. In case of a “poor” bioanalytical method requiring a large sample volume: Since the total blood sampling volume is generally limited to that of a blood donation, one may opt for a 3-period full replicate design or has to measure the HCT prior to administration in later periods and – for safety reasons – exclude subjects if the HCT is too high.↩︎

13. The CVwR has to be recalculated after exclusion of the outlier(s); consequently, the limits are expanded less. However, the outlying subject(s) has/have to be kept in the data set for calculating the 90% confidence interval.
Yet that contradicts the principle »The data from all treated subjects should be treated equally« stated in the guideline.↩︎

14. Although Brazil’s ANVISA and Chile’s ANAMED apply ABEL, assessment of outliers is not recommended by either agency.↩︎

15. Also two or three washouts instead of one. Once I faced a case where a volunteer was bitten by a dog during a washout. Since he had to visit a hospital to get the wound stitched, it was rated according to the protocol as a – not drug-related – SAE and we had to exclude him from the study. Shit happens.↩︎

16. European Medicines Agency. Questions & Answers: positions on specific questions addressed to the Pharmacokinetics Working Party (PKWP). EMA/618604/2008. London. June 2015 (Rev. 12 and later).↩︎

17. European Medicines Agency. Committee for Medicinal Products for Human Use. Guideline on the Investigation of Bioequivalence. CPMP/EWP/QWP/1401/98 Rev. 1/ Corr **. London. 20 January 2010↩︎

18. Schütz H. The almighty oracle has spoken! BEBA Forum. RSABE /ABEL. 2015-07-23↩︎

19. Labes D, Schütz H. Inflation of Type I Error in the Evaluation of Scaled Average Bioequivalence, and a Method for its Control. Pharm Res. 2016; 33(11): 2805–14. doi:10.1007/s11095-016-2006-1.↩︎

20. European Medicines Agency. EMA/582648/2016. Annex I. 21 September 2016.↩︎

21. European Medicines Agency. EMA/582648/2016. Annex II. 21 September 2016.↩︎

22. Step sizes of n1 2 in full replicate designs and 3 in the partial replicate design. Step size of CVwR 2%.↩︎

23. Maurer W, Jones B, Chen Y. Controlling the type 1 error rate in two-stage sequential designs when testing for average bioequivalence. Stat Med. 2018; 37(10): 1587–1607. doi:10.1002/sim.7614.↩︎

24. I know one large generic player’s rule for pilot studies of HVD(P)s: the minimum sample size is 24 in a 4-period full replicate design. I have seen pilot studies with 80 subjects.↩︎