Examples in this article were generated with R 4.0.5 by the package PowerTOST.^{1}
See also the README on GitHub for an overview and the online manual^{2} for details and a collection of other articles.
Abbreviation | Meaning |
---|---|
BE | Bioequivalence |
CV | (Within-subject) Coefficient of Variation |
H_{0} | Null hypothesis |
H_{1} | Alternative hypothesis (also H_{a}) |
TOST | Two One-Sided Tests |
What is a significant treatment effect and do we have to care about one?
Sometimes regulatory assessors ask for a ‘justification’ of a significant treatment effect in an equivalence trial.
I will try to clarify why such a justification is futile and – a bit provocatively – why asking for one demonstrates a lack of understanding of the underlying statistical concepts.
All examples deal with the 2×2×2 Crossover Design (RT|TR) but are applicable to any other crossover (Higher-Order, Replicate Designs) and, more generally, to any kind of equivalence study.
A basic knowledge of R is required. To run the scripts at least version 1.4.3 (2016-11-01) of PowerTOST is suggested. Any version of R would likely do, though the current release of PowerTOST was only tested with version 3.6.3 (2020-02-29) and later.
In order to get prospective power (and hence, a sample size), we need five values:

1. the level of the test α (in BE commonly 0.05),
2. the BE margins (commonly 80.00 – 125.00%),
3. the desired (or target) power,
4. the T/R-ratio, and
5. the (within-subject) CV.

1 – 2 are fixed by the agency,
3 is set by the sponsor (commonly to 0.80 – 0.90), and
4 – 5 are just (uncertain!) assumptions.
In other words, obtaining a sample size is not an exact calculation but always just an estimation.
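As a sketch of what such an estimation involves, the search can be reproduced in base R with the common noncentral-t approximation of TOST power (PowerTOST’s sampleN.TOST() does this exactly via Owen’s Q; the helper name approx.power() and the start value are mine). With the assumptions of the first example below (CV 0.15, T/R-ratio 0.95, target power 0.90) it walks up the even sample sizes:

```r
# Approximate power of TOST in a 2x2x2 crossover via two
# noncentral-t probabilities (an approximation, not Owen's Q)
approx.power <- function(CV, theta0, n, alpha = 0.05,
                         theta1 = 0.80, theta2 = 1.25) {
  se <- sqrt(2 * log(CV^2 + 1) / n) # SE of the log-difference
  df <- n - 2
  tc <- qt(1 - alpha, df)
  pt(-tc, df, ncp = (log(theta0) - log(theta2)) / se) -
  pt( tc, df, ncp = (log(theta0) - log(theta1)) / se)
}
# smallest even (two sequences!) sample size reaching the target
n <- 4
while (approx.power(CV = 0.15, theta0 = 0.95, n = n) < 0.90) n <- n + 2
n
```

sampleN.TOST(CV = 0.15, theta0 = 0.95, targetpower = 0.90) performs the exact counterpart.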
It is extremely unlikely that all assumptions will be exactly realized in a particular study. If the realized values differ from the assumptions (i.e., T deviating more from R, and/or the CV being lower, and/or fewer dropouts than anticipated), the chance to observe a statistically significant treatment effect increases.
Most agencies (like the EMA) require an ANOVA of \(\small{\log_{e}}\) transformed responses, i.e., a linear model where all effects are fixed. In R:
m <- lm(log(Y) ~ sequence + subject %in% sequence +
                 period + treatment, data = data)
Other agencies (FDA, Health Canada) require a mixed-effects model, where sequence, period, and treatment are fixed effects and subject(sequence) is a random effect.^{3}
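For illustration only, a hedged sketch of such a mixed-effects evaluation with nlme (shipped with every R installation). Note that this is not a translation of the FDA’s SAS code (see the footnote); the simulated data set and all variance assumptions are mine.

```r
library(nlme)
set.seed(42)
n         <- 12                                   # two sequences of six
subject   <- factor(rep(1:n, each = 2))
sequence  <- factor(rep(c("RT", "TR"), each = 12))
period    <- factor(rep(1:2, times = n))
treatment <- factor(substr(as.character(sequence),  # R or T, read off the
                           as.integer(period),      # sequence string by
                           as.integer(period)))     # period
Y   <- exp(log(100) +
           rnorm(n, sd = 0.5)[as.integer(subject)] + # between-subject
           log(0.95) * (treatment == "T") +          # true T/R-ratio
           rnorm(2 * n, sd = 0.15))                  # within-subject
dta <- data.frame(subject, sequence, period, treatment, Y)
m   <- lme(log(Y) ~ sequence + period + treatment,
           random = ~ 1 | subject, data = dta)
# back-transformed point estimate and 90% CI of the T/R-ratio
exp(intervals(m, level = 0.90, which = "fixed")$fixed["treatmentT", ])
```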
Let us first recap the hypotheses in bioequivalence.
The ‘Two One-Sided Tests Procedure’ (TOST)^{4} \[\begin{matrix}\tag{1} H_\textrm{0L}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\leq\theta_1\:vs\:H_\textrm{1L}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}>\theta_1\\ H_\textrm{0U}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\geq\theta_2\:vs\:H_\textrm{1U}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2 \end{matrix}\]The confidence interval inclusion approach \[H_0:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\notin\left[\theta_1,\theta_2\right]\:vs\:H_1:\theta_1<\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2\tag{2}\]
Note that the null-hypotheses imply bioinequivalence where \(\small{[\theta_1,\theta_2]}\) are the lower and upper limits of the bioequivalence range.
TOST provides two \(\small{p}\) values (where \(\small{H_0}\) is not rejected if \(\small{\max}\,[p_\textrm{L},p_\textrm{U}]>\alpha\)) and is of historical interest only because the CI inclusion approach is preferred in regulatory guidelines.
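The two \(\small{p}\) values can be sketched in base R for a balanced 2×2×2 study (a simplified stand-in for PowerTOST’s pvalues.TOST(); the helper name and the example values are mine):

```r
# TOST: one-sided t-tests against both limits on the log-scale
tost.p <- function(pe, CV, n, theta1 = 0.80, theta2 = 1.25) {
  se <- sqrt(2 * log(CV^2 + 1) / n) # SE of the log-difference
  df <- n - 2
  c(p.L = pt((log(pe) - log(theta1)) / se, df, lower.tail = FALSE),
    p.U = pt((log(pe) - log(theta2)) / se, df, lower.tail = TRUE))
}
p <- tost.p(pe = 0.95, CV = 0.15, n = 16)
max(p) < 0.05 # TRUE: both nulls rejected, equivalence demonstrated
```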
The limits \(\small{\left\{\theta_1,\theta_2\right\}}\) are based on the clinically not relevant difference \(\small{\Delta}\), which is commonly set to 0.20. For NTIDs (EMA and other jurisdictions) \(\small{\Delta}\) is 0.10, and for C_{max} (Russian Federation, EEU, GCC) \(\small{\Delta}\) is 0.25. \[\left\{\theta_1=100(1-\Delta),\,\theta_2=100(1-\Delta)^{-1}\right\}\tag{3}\] For the \(\small{\Delta}\)s mentioned above: \[\begin{matrix}\tag{4} \left\{\theta_1=80.00\%,\,\theta_2=125.00\%\right\}\\ \left\{\theta_1=90.00\%,\,\theta_2=111.1\dot{1}\%\right\}\\ \left\{\theta_1=75.00\%,\,\theta_2=133.3\dot{3}\%\right\} \end{matrix}\]
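Evaluating \(\small{(3)}\) for these three \(\small{\Delta}\)s in base R reproduces \(\small{(4)}\):

```r
Delta  <- c(0.20, 0.10, 0.25) # common, NTIDs, Cmax (RU/EEU/GCC)
theta1 <- 100 * (1 - Delta)
theta2 <- 100 / (1 - Delta)
round(data.frame(Delta, theta1, theta2), 2)
```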
As long as the \(\small{100(1-2\,\alpha)}\) confidence interval lies entirely within the relevant pre-specified \(\small{\left\{\theta_1,\theta_2\right\}}\), the null-hypothesis is rejected and the alternative hypothesis of equivalence accepted \(\small{(2)}\). Neither the location of the point estimate \(\small{\theta_0}\) nor the width of the CI play any role in this decision.
Since \(\small{(1)}\) and \(\small{(2)}\) are operationally equivalent, it follows that if one of the \(\small{p}\) values of \(\small{(1)}\) < \(\small{\alpha}\), the \(\small{100(1-2\,\alpha)}\) CI does not include 100%.
That means, the treatments differ statistically significantly although this difference is not clinically relevant.
Asking for a ‘justification’ of a statistically significant treatment difference contradicts the accepted principles laid down in guidelines since the 1980s.
With a sufficiently large sample size any treatment with a \(\small{\theta_0\neq100\%}\) will show a statistically significant difference.^{5}
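A base-R sketch of this point (the CI machinery for a balanced 2×2×2 design; the T/R-ratio of 0.99 and the CV of 0.15 are my assumptions): even a trivial deviation from 100% becomes ‘significant’ once the sample size is large enough.

```r
theta0 <- 0.99 # T/R-ratio just 1% off
CV     <- 0.15
n      <- 4
repeat { # find the smallest n where the 90% CI excludes 100%
  se    <- sqrt(2 * log(CV^2 + 1) / n)
  upper <- exp(log(theta0) + qt(0.95, n - 2) * se)
  if (upper < 1) break
  n <- n + 2
}
n # about 1,200 subjects suffice
```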
Throughout the examples I’m dealing with studies in a 2×2×2 Crossover Design. Of course, the same logic is applicable for any other as well.
library(PowerTOST) # attach it to run the examples
up2even <- function(n) {
  return(as.integer(2 * (n %/% 2 + as.logical(n %% 2))))
}
nadj <- function(n, do) {
  return(as.integer(up2even(n / (1 - do))))
}
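A quick check of the two helpers (repeated here so the chunk is self-contained): up2even() rounds up to the next even number (two sequences!), nadj() inflates the estimated sample size for the anticipated dropout-rate.

```r
up2even <- function(n) {
  return(as.integer(2 * (n %/% 2 + as.logical(n %% 2))))
}
nadj <- function(n, do) {
  return(as.integer(up2even(n / (1 - do))))
}
up2even(19)    # 20: balanced sequences
nadj(20, 0.15) # 24: dose 24 to end up with ~20 eligible subjects
```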
Say, the assumed CV was 0.15, the T/R-ratio 0.95, and we planned the study for power ≥0.90 with an anticipated dropout-rate of 0.15. More subjects than estimated were dosed (quite often the management decides that). In a small survey a whopping 37% of respondents reported that.^{6}
CV     <- 0.15
target <- 0.90
do     <- 0.15
theta0 <- pe <- TR <- 0.95
n      <- sampleN.TOST(CV = CV, targetpower = target,
                       theta0 = theta0,
                       print = FALSE)[["Sample size"]]
n      <- nadj(n, do) # adjust for dropouts
ns     <- n:(n*2.5)
res1   <- data.frame(n = ns, CL.lo = NA, CL.hi = NA,
                     p.lo = NA, p.hi = NA,
                     post.hoc = NA)
# as planned
for (i in seq_along(ns)) {
  res1[i, 2:3] <- 100*CI.BE(CV = CV, pe = pe,
                            n = ns[i])
  res1[i, 4:5] <- suppressMessages(
                    pvalues.TOST(CV = CV, pe = pe,
                                 n = ns[i]))
  res1[i, 6]   <- suppressMessages(
                    power.TOST(CV = CV, n = ns[i],
                               theta0 = theta0))
}
windows(width = 4.5, height = 4.5)
op <- par(no.readonly = TRUE)
par(mar = c(4.1, 4, 0, 0), cex.axis = 0.9)
plot(ns, rep(100, length(ns)), ylim = c(80, 125),
     type = "n", axes = FALSE, log = "y",
     xlab = "sample size",
     ylab = "T/R-ratio (90% CI)")
axis(1, at = seq(n, tail(ns, 1), 6))
axis(2, at = c(80, 90, 100, 110, 120, 125), las = 1)
grid(nx = NA, ny = NULL)
abline(v = seq(n, tail(ns, 1), 6), col = "lightgray", lty = 3)
abline(h = c(80, 100, 125), lty = 2,
       col = c("red", "black", "red"))
abline(v = head(res1$n[res1$CL.hi < 100], 1), lty = 2)
segments(x0 = ns[1], x1 = tail(ns, 1),
         y0 = 100*TR, col = "blue")
lines(res1$n, res1$CL.lo, type = "s", col = "blue", lwd = 2)
lines(res1$n, res1$CL.hi, type = "s", col = "blue", lwd = 2)
box()
par(op)
We face a significant treatment effect with 48 subjects (upper confidence limit 99.98%) or more.
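The claim can be checked by hand in base R (the same 90% CI that CI.BE() returns for a balanced 2×2×2 design; the helper name is mine):

```r
# 90% CI of the T/R-ratio in a balanced 2x2x2 crossover
ci90 <- function(pe, CV, n) {
  se <- sqrt(2 * log(CV^2 + 1) / n) # SE of the log-difference
  exp(log(pe) + c(lower = -1, upper = +1) * qt(0.95, n - 2) * se)
}
round(100 * ci90(pe = 0.95, CV = 0.15, n = 48), 2) # upper CL just below 100%
round(100 * ci90(pe = 0.95, CV = 0.15, n = 46), 2) # upper CL still above 100%
```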
Similar to above but the management was more relaxed (24 subjects dosed).
theta0 <- 0.95
target <- 0.90
do     <- 0.15
TR     <- 0.92
n      <- sampleN.TOST(CV = CV, targetpower = target,
                       theta0 = theta0,
                       print = FALSE)[["Sample size"]]
n.adj  <- nadj(n, do) # adjust for dropouts
n.hi   <- up2even(n.adj * 1.2)
ns     <- n:n.hi
res2   <- data.frame(n = ns, CL.lo = NA, CL.hi = NA,
                     p.lo = NA, p.hi = NA,
                     post.hoc = NA)
for (i in seq_along(ns)) {
  res2[i, 2:3] <- 100*CI.BE(CV = CV, pe = TR,
                            n = ns[i])
  res2[i, 4:5] <- suppressMessages(
                    pvalues.TOST(CV = CV, pe = TR,
                                 n = ns[i]))
  res2[i, 6]   <- suppressMessages(
                    power.TOST(CV = CV, n = ns[i],
                               theta0 = TR))
}
windows(width = 4.5, height = 4.5)
op <- par(no.readonly = TRUE)
par(mar = c(4.1, 4, 0, 0), cex.axis = 0.9)
plot(ns, rep(100, length(ns)), ylim = c(80, 125),
     type = "n", axes = FALSE, log = "y",
     xlab = "sample size",
     ylab = "T/R-ratio (90% CI)")
axis(1, at = seq(n, tail(ns, 1), 2))
axis(2, at = c(80, 90, 100, 110, 120, 125), las = 1)
grid(nx = NA, ny = NULL)
abline(v = seq(n, tail(ns, 1), 2), col = "lightgray", lty = 3)
abline(h = c(80, 100, 125), lty = 2,
       col = c("red", "black", "red"))
abline(v = head(res2$n[res2$CL.hi < 100], 1), lty = 2)
segments(x0 = ns[1], x1 = tail(ns, 1),
         y0 = 100*TR, col = "blue")
lines(res2$n, res2$CL.lo, type = "s", col = "blue", lwd = 2)
lines(res2$n, res2$CL.hi, type = "s", col = "blue", lwd = 2)
box()
par(op)
This time the T/R-ratio turned out to be worse (0.92 instead of the assumed 0.95). We face a significant treatment effect with 20 subjects (upper confidence limit 99.84%) or more.
We had to deal with a drug with low variability. The assumed CV was 0.10, the T/R-ratio 0.95, and we planned the study for power ≥0.80. Theoretically we would need only 8 (eight!) subjects but the minimum sample size according to the guidelines is 12. We increased the sample size for an anticipated dropout-rate of 0.15.
CV     <- 0.10
target <- 0.80
theta0 <- 0.95
TR     <- 0.935
do     <- 0.15
n      <- sampleN.TOST(CV = CV, targetpower = target,
                       theta0 = theta0,
                       print = FALSE)[["Sample size"]]
if (n < 12) n <- 12 # acc. to GL
n.adj  <- nadj(n, do) # adjust for dropouts
ns     <- n:n.adj
res3   <- data.frame(n = ns, CL.lo = NA, CL.hi = NA,
                     p.lo = NA, p.hi = NA,
                     post.hoc = NA)
for (i in seq_along(ns)) {
  res3[i, 2:3] <- 100*CI.BE(CV = CV, pe = TR,
                            n = ns[i])
  res3[i, 4:5] <- suppressMessages(
                    pvalues.TOST(CV = CV, pe = TR,
                                 n = ns[i]))
  res3[i, 6]   <- suppressMessages(
                    power.TOST(CV = CV, n = ns[i],
                               theta0 = TR))
}
windows(width = 4.5, height = 4.5)
op <- par(no.readonly = TRUE)
par(mar = c(4.1, 4, 0, 0), cex.axis = 0.9)
plot(ns, rep(100, length(ns)), ylim = c(80, 125),
     type = "n", axes = FALSE, log = "y",
     xlab = "sample size",
     ylab = "T/R-ratio (90% CI)")
axis(1, at = ns)
axis(2, at = c(80, 90, 100, 110, 120, 125), las = 1)
grid(nx = NA, ny = NULL)
abline(v = ns, col = "lightgray", lty = 3)
abline(h = c(80, 100, 125), lty = 2,
       col = c("red", "black", "red"))
abline(v = head(res3$n[res3$CL.hi < 100], 1), lty = 2)
segments(x0 = ns[1], x1 = tail(ns, 1),
         y0 = 100*TR, col = "blue")
lines(res3$n, res3$CL.lo, type = "s", col = "blue", lwd = 2)
lines(res3$n, res3$CL.hi, type = "s", col = "blue", lwd = 2)
box()
par(op)
The T/R-ratio turned out to be slightly worse (0.935 instead of the assumed 0.95). Already with 14 subjects we face a significant treatment effect (upper confidence limit 99.998%). This ‘nasty’ value will disappear due to rounding but will remain in the output of the ANOVA.
Drugs with a low CV regularly show a significant treatment effect, since following the guidelines leads to ‘overpowered’ studies. Already with 12 subjects we have a post hoc power of 0.972 (though we planned only for 0.80).
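The 0.972 can be reproduced without PowerTOST by simulating the CI inclusion decision in base R (a Monte Carlo stand-in for what power.TOST() computes analytically; the seed and the number of simulations are arbitrary):

```r
set.seed(123)
nsim   <- 1e5L
CV     <- 0.10   # realized CV
theta0 <- 0.935  # realized T/R-ratio
n      <- 12
se     <- sqrt(2 * log(CV^2 + 1) / n)              # SE of the log-difference
d      <- rnorm(nsim, log(theta0), se)             # simulated point estimates
s      <- se * sqrt(rchisq(nsim, n - 2) / (n - 2)) # simulated SEs
tc     <- qt(0.95, n - 2)
# fraction of simulated studies with the 90% CI within 80.00-125.00%
mean(d - tc * s > log(0.80) & d + tc * s < log(1.25)) # ~0.97
```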
It is not unusual that equivalence of more than one endpoint has to be demonstrated. In bioequivalence the pharmacokinetic metrics C_{max} and AUC_{0–t} are mandatory (in some jurisdictions, like the FDA, additionally AUC_{0–∞}).
We don’t have to worry about multiplicity issues (inflated Type I Error) since if all tests must pass at level \(\alpha\), we are protected by the intersection-union principle.^{7} ^{8}
We design the study always for the worst case combination, i.e., based on the PK metric requiring the largest sample size.
Let us explore a simple example. The assumed CV of C_{max} is 0.25 and the one of AUC is lower (say the variance ratio is 0.70). We assume a T/R-ratio of 0.95 for both, aiming at power ≥ 0.80. The anticipated dropout-rate is 0.10.
opt     <- function(x) {
  suppressMessages(
    power.TOST(theta0 = x, CV = res4$CV[2],
               n = n.est)) - target
}
metrics <- c("Cmax", "AUC")
ratio   <- 0.70 # variance ratio (AUC / Cmax)
CV.Cmax <- 0.25 # CV of Cmax
CV.AUC  <- mse2CV(CV2mse(CV.Cmax)*ratio)
CV      <- signif(c(CV.Cmax, CV.AUC))
theta0  <- 0.95 # both metrics
target  <- 0.80 # target (desired) power
do.rate <- 0.10 # anticipated dropout-rate 10%
res4    <- data.frame(metric = metrics, theta0 = theta0,
                      CV = CV, n = NA, power1 = NA,
                      nadj = NA, power2 = NA)
# sample sizes for both metrics
# study sample size based on the one with
# higher CV and adjusted for the dropout-rate
for (i in 1:nrow(res4)) {
  res4[i, 4:5] <- sampleN.TOST(CV = CV[i],
                               theta0 = theta0,
                               targetpower = target,
                               print = FALSE)[7:8]
}
res4$nadj <- nadj(max(res4$n), do.rate)
for (i in 1:nrow(res4)) {
  res4[i, 7] <- power.TOST(CV = CV[i], theta0 = theta0,
                           n = res4$nadj[i])
}
res4[, c(5, 7)]      <- signif(res4[, c(5, 7)], 4)
names(res4)[c(5, 7)] <- c("pwr (n)", "pwr (nadj)")
res5 <- data.frame(n = max(res4$nadj):max(res4$n),
                   PE = theta0, CL.hi1 = NA,
                   PE.lo = NA, CL.hi2 = NA)
for (i in 1:nrow(res5)) {
  n.est <- res5$n[i]
  if (theta0 < 1) {
    res5$PE.lo[i] <- uniroot(opt, tol = 1e-8,
                             interval = c(0.80 + 1e-4,
                                          theta0))$root
  } else {
    res5$PE.lo[i] <- uniroot(opt, tol = 1e-8,
                             interval = c(theta0,
                                          1.25 - 1e-4))$root
  }
  res5[i, 3] <- CI.BE(CV = CV[2], pe = theta0,
                      n = res5$n[i])[["upper"]]
  res5[i, 5] <- CI.BE(CV = CV[2],
                      pe = res5$PE.lo[i],
                      n = res5$n[i])[["upper"]]
}
res5[, 2:5] <- round(100*res5[, 2:5], 2)
names(res5)[c(3, 5)] <- rep("upper CL", 2)
print(res4, row.names = FALSE); cat("AUC:\n"); print(res5, row.names = FALSE)
R> metric theta0 CV n pwr (n) nadj pwr (nadj)
R> Cmax 0.95 0.250000 28 0.8074 32 0.8573
R> AUC 0.95 0.208208 20 0.8057 32 0.9467
R> AUC:
R> n PE upper CL PE.lo upper CL
R> 32 95 103.68 91.20 99.53
R> 31 95 103.83 91.41 99.91
R> 30 95 104.00 91.62 100.29
R> 29 95 104.17 91.85 100.71
R> 28 95 104.35 92.08 101.14
Due to its lower CV we would need only 20 subjects for AUC. However, for C_{max} we need 28. We perform the study in 32 subjects (adjusted for the dropout-rate). Consequently, the study is ‘overpowered’ for AUC (~0.95 instead of ~0.81 with 20 subjects).
The supportive function opt() provides the most extreme point estimates of AUC which still give our target power (only the lower one is shown in the 4^{th} column). If this value is realized in the study, its upper confidence limit will not include 100% unless there are at least two dropouts.
In a particular study the point estimate may be even lower and/or the CV smaller whilst the study still passes. Then we will face a significant treatment effect even with more dropouts.
Such a situation is quite common and the further the CVs of PK metrics are apart, the more often we will face it.
Although the above is straightforward and based on elementary statistics, below is an R script performing simulations.
Say we assume a CV of CV_{max} (0.25), base the sample size on it (taking the dropout rate into account) and the CV of AUC is lower but unknown. How often can we expect a significant treatment effect for a range of CVs (here 0.25 down to 0.15)?
# Cave: Long runtime!
balance <- function(n, sequences) {
  return(as.integer(sequences *
    (n %/% sequences + as.logical(n %% sequences))))
}
adjust.dropouts <- function(n, do.rate) {
  return(as.integer(balance(n / (1 - do.rate), sequences = 2)))
}
set.seed(123456)
nsims   <- 1e4L # number of simulations
target  <- 0.80 # target power
PE      <- 0.95 # assumed PE (both metrics)
CV.Cmax <- 0.25 # assumed CV of Cmax
CV.AUC  <- c(0.25, 0.20, 0.15)
do.rate <- 0.1  # anticipated dropout-rate (10%)
CV.do   <- 0.15 # assumed CV of the dropout-rate (15%)
tmp     <- sampleN.TOST(CV = CV.Cmax, theta0 = PE,
                        targetpower = target,
                        details = FALSE, print = FALSE)
n.des   <- tmp[["Sample size"]]
if (n.des >= 12) {
  power.Cmax <- tmp[["Achieved power"]]
} else { # GL!
  n.des      <- 12
  power.Cmax <- power.TOST(CV = CV.Cmax, theta0 = PE, n = n.des)
}
power.AUC <- numeric()
for (j in seq_along(CV.AUC)) {
  power.AUC[j] <- power.TOST(CV = CV.AUC[j], theta0 = PE, n = n.des)
}
n.adj     <- adjust.dropouts(n = n.des, do.rate = do.rate)
res.Cmax  <- data.frame(CV = rep(NA, nsims), n = NA, PE = NA,
                        lower = NA, upper = NA, BE = FALSE,
                        signif = FALSE)
post.Cmax <- data.frame(sim = 1:nsims)
res.AUC   <- data.frame(CV.ass = rep(CV.AUC, each = nsims),
                        CV = rep(NA, nsims), n = NA, PE = NA,
                        lower = NA, upper = NA, BE = FALSE,
                        signif = FALSE)
post.AUC  <- data.frame(CV.ass = rep(CV.AUC, each = nsims),
                        sim = 1:nsims*length(CV.AUC))
for (j in 1:nsims) {
  do <- rlnorm(1, meanlog = log(do.rate) - 0.5*CV2mse(CV.do),
               sdlog = sqrt(CV2mse(CV.do)))
  res.Cmax$n[j]  <- as.integer(round(n.des * (1 - do)))
  res.Cmax$CV[j] <- mse2CV(CV2mse(CV.Cmax) *
                    rchisq(1, df = res.Cmax$n[j] - 2)/(res.Cmax$n[j] - 2))
  res.Cmax$PE[j] <- exp(rnorm(1, mean = log(PE),
                              sd = sqrt(0.5 / res.Cmax$n[j]) *
                                   sqrt(CV2mse(CV.Cmax))))
  res.Cmax[j, 4:5] <- round(CI.BE(CV = res.Cmax$CV[j],
                                  pe = res.Cmax$PE[j],
                                  n = res.Cmax$n[j]), 4)
  if (res.Cmax$lower[j] >= 0.80 & res.Cmax$upper[j] <= 1.25) {
    res.Cmax$BE[j] <- TRUE
    if (res.Cmax$lower[j] > 1 | res.Cmax$upper[j] < 1)
      res.Cmax$signif[j] <- TRUE
  }
}
i <- 0
for (k in seq_along(CV.AUC)) {
  for (j in 1:nsims) {
    i <- i + 1
    res.AUC$n[i]  <- res.Cmax$n[j]
    res.AUC$CV[i] <- mse2CV(CV2mse(CV.AUC[k]) *
                     rchisq(1, df = res.AUC$n[i] - 2)/(res.AUC$n[i] - 2))
    res.AUC$PE[i] <- exp(rnorm(1, mean = log(PE),
                               sd = sqrt(0.5 / res.AUC$n[i]) *
                                    sqrt(CV2mse(CV.AUC[k]))))
    res.AUC[i, 5:6] <- round(CI.BE(CV = res.AUC$CV[i],
                                   pe = res.AUC$PE[i],
                                   n = res.AUC$n[i]), 4)
    if (res.AUC$lower[i] >= 0.80 & res.AUC$upper[i] <= 1.25) {
      res.AUC$BE[i] <- TRUE
      if (res.AUC$lower[i] > 1 | res.AUC$upper[i] < 1)
        res.AUC$signif[i] <- TRUE
    }
  }
}
passed.Cmax <- sum(res.Cmax$BE)
passed.AUC  <- numeric(length(CV.AUC))
for (j in seq_along(CV.AUC)) {
  passed.AUC[j] <- sum(res.AUC$BE[res.AUC$CV.ass == CV.AUC[j]])
}
txt <- paste("Assumed CV (Cmax) :", sprintf("%.4f", CV.Cmax),
             "\nAssumed CVs (AUC) :", paste(sprintf("%.4f", CV.AUC),
                                            collapse = ", "),
             "\nAssumed PE :", sprintf("%.4f", PE),
             "\nTarget power :", sprintf("%.4f", target),
             "\nSample size :", n.des, "(based on Cmax)",
             "\nAchieved power (Cmax):", sprintf("%.4f", power.Cmax),
             "\nAchieved powers (AUC):", paste(sprintf("%.4f", power.AUC),
                                               collapse = ", "),
             "\nDosed :", n.adj,
             sprintf("(anticip. dropout-rate %g)", do.rate),
             "\n ", formatC(nsims, format = "d", big.mark = ","),
             "simulated 2\u00D72\u00D72 studies",
             "\n n:", min(res.Cmax$n), "\u2013", max(res.Cmax$n),
             sprintf("(median %g)", median(res.Cmax$n)),
             "\n Cmax", sprintf("(%.4f)", CV.Cmax),
             "\n CV :",
             sprintf("%6.4f \u2013 %6.4f", min(res.Cmax$CV), max(res.Cmax$CV)),
             sprintf("(median %7.4f)", exp(median(log(res.Cmax$CV)))),
             "\n PE :",
             sprintf("%6.4f \u2013 %6.4f", min(res.Cmax$PE), max(res.Cmax$PE)),
             sprintf("(g. mean%7.4f)", exp(mean(log(res.Cmax$PE)))),
             "\n 100% not within CI (stat. significant):",
             sprintf("%5.2f%%", 100*sum(res.Cmax$signif)/passed.Cmax), "\n")
for (j in seq_along(CV.AUC)) {
  txt <- paste(txt, " AUC", sprintf("(%.4f)", CV.AUC[j]),
               "\n CV :",
               sprintf("%6.4f \u2013 %6.4f",
                       min(res.AUC$CV[res.AUC$CV.ass == CV.AUC[j]]),
                       max(res.AUC$CV[res.AUC$CV.ass == CV.AUC[j]])),
               sprintf("(median %7.4f)",
                       exp(median(log(res.AUC$CV[res.AUC$CV.ass == CV.AUC[j]])))),
               "\n PE :",
               sprintf("%6.4f \u2013 %6.4f",
                       min(res.AUC$PE[res.AUC$CV.ass == CV.AUC[j]]),
                       max(res.AUC$PE[res.AUC$CV.ass == CV.AUC[j]])),
               sprintf("(g. mean%7.4f)",
                       exp(mean(log(res.AUC$PE[res.AUC$CV.ass == CV.AUC[j]])))),
               "\n 100% not within CI (stat. significant):",
               sprintf("%5.2f%%",
                       100*sum(res.AUC$signif[res.AUC$CV.ass == CV.AUC[j]])/passed.AUC[j]),
               "\n")
}
cat(txt)
R> Assumed CV (Cmax) : 0.2500
R> Assumed CVs (AUC) : 0.2500, 0.2000, 0.1500
R> Assumed PE : 0.9500
R> Target power : 0.8000
R> Sample size : 28 (based on Cmax)
R> Achieved power (Cmax): 0.8074
R> Achieved powers (AUC): 0.8074, 0.9349, 0.9946
R> Dosed : 32 (anticip. dropout-rate 0.1)
R> 10,000 simulated 2×2×2 studies
R> n: 23 – 26 (median 25)
R> Cmax (0.2500)
R> CV : 0.1199 – 0.4127 (median 0.2468)
R> PE : 0.8314 – 1.0736 (g. mean 0.9502)
R> 100% not within CI (stat. significant): 2.30%
R> AUC (0.2500)
R> CV : 0.1248 – 0.4099 (median 0.2462)
R> PE : 0.8373 – 1.0814 (g. mean 0.9491)
R> 100% not within CI (stat. significant): 2.70%
R> AUC (0.2000)
R> CV : 0.1008 – 0.3302 (median 0.1969)
R> PE : 0.8506 – 1.0458 (g. mean 0.9501)
R> 100% not within CI (stat. significant): 7.96%
R> AUC (0.1500)
R> CV : 0.0831 – 0.2663 (median 0.1481)
R> PE : 0.8722 – 1.0216 (g. mean 0.9497)
R> 100% not within CI (stat. significant): 19.96%
What does that mean? You name it.
Coming back to the questions asked in the introduction. To repeat:
What is a significant treatment effect … ?
… and do we have to care about one?
License
Helmut Schütz 2021
1^{st} version March 18, 2021.
Rendered 2021-04-08 11:09:48 CEST by rmarkdown in 0.51 seconds.
Footnotes and References
Labes D, Schütz H, Lang B. PowerTOST: Power and Sample Size for (Bio)Equivalence Studies. 2021-01-18. CRAN.↩︎
Labes D, Schütz H, Lang B. Package ‘PowerTOST’. January 18, 2021. CRAN.↩︎
Unfortunately due to different ‘design philosophies’ the SAS-code given by the FDA cannot be translated to R.↩︎
Schuirmann DJ. A Comparison of the Two One-Sided Tests Procedure and the Power Approach for Assessing the Equivalence of Average Bioavailability. J Pharmacokin Biopharm. 1987; 15(6): 657–80. doi:10.1007/BF01068419.↩︎
Stupid example: CV = 10% (NTID), n = 120, 4-period full replicate design, \(\small{\theta_0=98.5\%}\) → 90% CI 97.03–99.99%, \(\small{p}\) 5·10^{–72}…↩︎
Schütz H. Sample Size Estimation in Bioequivalence. Evaluation. 2020-10-23. BEBA Forum.↩︎
Berger RL, Hsu JC. Bioequivalence Trials, Intersection-Union Tests and Equivalence Confidence Sets. Stat Sci. 1996; 11(4): 283–302. JSTOR:2246021.↩︎
Zeng A. The TOST confidence intervals and the coverage probabilities with R simulation. March 14, 2014.↩︎