
Examples in this article were generated with R 4.2.1 by the package `PowerTOST`.^{1} See also its online manual^{2} for details.


Abbreviation | Meaning |
---|---|
\(\small{\alpha}\) | Nominal level of the test, probability of Type I Error, patient’s risk |
\(\small{\beta}\) | Probability of Type II Error, producer’s risk |
(A)BE | (Average) Bioequivalence |
ABEL | Average Bioequivalence with Expanding Limits |
CI, CL | Confidence Interval, Limit |
\(\small{CV}\) | Coefficient of Variation |
\(\small{CV_\textrm{inter}}\) | Between-subject Coefficient of Variation |
\(\small{CV_\textrm{intra}}\) | Within-subject Coefficient of Variation |
\(\small{CV_\textrm{wR}}\) | Within-subject Coefficient of Variation of the Reference treatment |
\(\small{\delta}\) | Margin of clinical relevance in Non-Inferiority/Superiority and Non-Superiority |
\(\small{H_0}\) | Null hypothesis |
\(\small{H_1}\) | Alternative hypothesis (also \(\small{H_\textrm{a}}\)) |
L, U | Lower and upper limits in ABE(L) |
\(\small{\mu_\text{T},\,\mu_\text{R}}\) | True mean of the Test and Reference treatment, respectively |
\(\small{\pi}\) | Prospective power (\(\small{1-\beta}\)) |
TOST | Two One-Sided Tests |

What are the main statistical issues in planning a confirmatory experiment?

For details about inferential statistics and hypotheses see another article.

An ‘optimal’ study design is one which – taking all assumptions into account – has a reasonably high chance of demonstrating non-inferiority or non-superiority (power) whilst controlling the patient’s risk.

Contrary to Bioequivalence (BE), where a study is assessed with \(\small{\alpha=0.05}\) by TOST (or by a \(\small{100\,(1-2\times0.05)\%}\) Confidence Interval), in Non-Inferiority and Non-Superiority a single one-sided test with \(\small{\alpha=0.025}\) is employed.

Based on a ‘clinically relevant margin’ \(\small{\delta}\) we have different hypotheses.

We assume that __higher__ responses are *better*.^{3} ^{4} If data follow a lognormal distribution the hypotheses are \[H_0:\log_{e}\frac{\mu_\text{T}}{\mu_\text{R}}\leq\log_{e}\delta\;vs\;H_1:\log_{e}\frac{\mu_\text{T}}{\mu_\text{R}}>\log_{e}\delta\tag{1a}\]

If data follow a normal distribution the hypotheses are \[H_0:\mu_\text{T}-\mu_\text{R}\leq \delta\;vs\;H_1:\mu_\text{T}-\mu_\text{R}>\delta\tag{1b}\]

Applications:

- Clinical phase III trials comparing a new treatment with placebo or an established treatment (efficacy).
- Comparing minimum concentrations of a new MR formulation with the ones of an approved IR formulation as a surrogate of efficacy.^{5}

We assume that __lower__ responses are *better*. If data follow a lognormal distribution the hypotheses are \[H_0:\log_{e}\frac{\mu_\text{T}}{\mu_\text{R}}\geq\log_{e}\delta\;vs\;H_1:\log_{e}\frac{\mu_\text{T}}{\mu_\text{R}}<\log_{e}\delta\tag{2a}\]

If data follow a normal distribution the hypotheses are \[H_0:\mu_\text{T}-\mu_\text{R}\geq \delta\;vs\;H_1:\mu_\text{T}-\mu_\text{R}<\delta\tag{2b}\]

Applications:

- Clinical phase III trials comparing Adverse Effects of a new treatment with placebo or an established treatment (safety).
- Comparing maximum concentrations of a new MR formulation with the ones of an approved IR formulation as a surrogate of safety.^{5}

A *basic* knowledge of R is required. To run the scripts at least version 1.4.9 (2019-12-19) of `PowerTOST` is suggested. Any version of R would likely do, though the current release of `PowerTOST` was only tested with version 4.1.3 (2022-03-10) and later.

All scripts were run on a Xeon E3-1245v3 @ 3.40GHz (1/4 cores) 16GB RAM
with R 4.2.1 on Windows 7 build 7601, Service
Pack 1, Universal C Runtime 10.0.10240.16390.

Note that in the functions `sampleN.noninf()` and `power.noninf()` the assumed coefficient of variation `CV` has to be given as a ratio and not in percent. If the analysis is based on lognormal data by \(\small{(1\text{a})}\) or \(\small{(2\text{a})}\), the assumed `theta0` and margin \(\small{\delta}\) (`margin`) have to be given as ratios and not in percent. If the analysis is based on normal data by \(\small{(1\text{b})}\) or \(\small{(2\text{b})}\), `theta0` and `margin` have to be given in the units of the original scale. Data have to be continuous on a ratio scale, either lognormal \(\small{\left(x\in\mathbb{R}^{+}=\{0<x<\infty\}\right)}\) or normal \(\small{\left(x\in\mathbb{R}=\{-\infty<x<+\infty\}\right)}\) distributed.
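To see why the *CV* has to be given as a ratio, it may help to recall how the *CV* of lognormal data maps to the standard deviation of the log-transformed data, \(\small{\sigma=\sqrt{\log_{e}(CV^2+1)}}\). The following is only a minimal base-R sketch; the helper name `cv2sd` is made up for illustration (`PowerTOST` provides `CV2se()` for this conversion).

```r
# Hypothetical helper: convert a CV (given as a ratio) of lognormal data
# to the standard deviation of the log-transformed data.
cv2sd <- function(CV) sqrt(log(CV^2 + 1))

sd.ok    <- cv2sd(0.25) # CV of 25% correctly given as a ratio
sd.wrong <- cv2sd(25)   # CV mistakenly given in percent
round(c(ratio = sd.ok, percent = sd.wrong), 4)
```

A *CV* entered in percent inflates the standard deviation roughly tenfold in this case, so any sample size based on it would be meaningless.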

Count data (*e.g.*, events), rates (0 – 1) and percentages, as
well as ordinal data (*e.g.*, *t*_{max}) are not
supported.

`sampleN.noninf()` gives balanced sequences for crossover designs (*i.e.*, the same number of subjects is allocated to all sequences) or equal group sizes in a parallel design. Furthermore, the estimated sample size is the *total* number of subjects, not subjects per sequence or treatment arm – like in some other software packages. The sample size functions of `PowerTOST` use a modification of Zhang’s method^{6} based on the large sample approximation as the starting value of the iterations.

Most examples deal with studies where the response variables follow a lognormal distribution, *i.e.*, we assume a multiplicative model (ratios instead of differences). We work with \(\small{\log_{e}}\)-transformed data in order to allow analysis by the *t*-test (requiring differences). This is the default in most functions of `PowerTOST` and hence, the argument `logscale = TRUE` does not need to be specified.

It may sound picky but ‘sample size __calculation__’ (as used in
most guidelines and alas, in some publications and textbooks) is sloppy
terminology. In order to get prospective power (and hence, a sample
size), we need five values:

1. The level of the test \(\small{\alpha}\) (in Non-Superiority / Non-Inferiority commonly 0.025),
2. the clinically relevant margin \(\small{\delta}\),
3. the desired (or target) power \(\small{\pi}\),
4. the variance (commonly expressed as a coefficient of variation), and
5. the deviation of the test from the reference treatment.

1 – 2 are __fixed__ by the agency,

3 is __set__ by the sponsor, and

4 – 5 are just (uncertain!) __assumptions__.

In other words, obtaining a sample size is *not* an
*exact* calculation like \(\small{2\times2=4}\) but always just an
__estimation__.

“Power Calculation – A guess masquerading as mathematics.”^{7}

Realization: Observations (in a sample) of a random variable (of the
population).

Of note, it is extremely unlikely that all assumptions will be
exactly realized in a particular study. Hence, calculating retrospective
(a.k.a. *post hoc*, *a
posteriori*) power is not only futile but plain nonsense.^{8}

Crossover studies are popular because the within-subject variability is generally lower than the between-subject variability. The efficiency of a crossover study compared to a parallel study is given by \(\small{\frac{\sigma_\textrm{intra}^2\;+\,\sigma_\textrm{inter}^2}{0.5\,\times\,\sigma_\textrm{intra}^2}}\). If, say, \(\small{\sigma_\textrm{intra}^2=0.5\times\sigma_\textrm{inter}^2}\), in a parallel study we need six times as many subjects as in a crossover to obtain the same power. On the other hand, in a crossover we have two measurements per subject, which makes the parallel study approximately three times more costly.
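The arithmetic behind the factors six and three can be checked in a few lines of base R. This is just a sketch of the efficiency formula given in the text; the variable names are arbitrary and the variances are on an arbitrary scale, since only their ratio matters.

```r
var.inter  <- 1               # between-subject variance (arbitrary scale)
var.intra  <- 0.5 * var.inter # within-subject variance, half of the above
# Efficiency of the crossover relative to the parallel design
eff        <- (var.intra + var.inter) / (0.5 * var.intra)
# A crossover has two measurements per subject
cost.ratio <- eff / 2
c(efficiency = eff, cost.ratio = cost.ratio)
```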

Note that there is *no* relationship between \(\small{CV_\textrm{intra}}\) and \(\small{CV_\textrm{inter}}\). Examples are drugs which are subject to polymorphic metabolism. For these drugs \(\small{CV_\textrm{intra}\ll CV_\textrm{inter}}\). On the other hand, some HVD(P)s show \(\small{CV_\textrm{intra}>CV_\textrm{inter}}\).

Carryover: A residual effect of a previous period.

It is a prerequisite that no carryover from one period to the next exists. Only then will the comparison of treatments be unbiased. For details see another article.^{9} Subjects have to be in the same physiological state^{10} throughout the study – guaranteed by a sufficiently long washout phase. Crossover studies can be performed not only in healthy volunteers but also in patients with a *stable* disease (*e.g.*, asthma). Studies in patients with an *unstable* disease (*e.g.*, in oncology) __must__ be performed in a parallel design.

If crossovers are not feasible (*e.g.*, for drugs with a very
long half life), studies could be performed in a parallel design as
well.

The sample size cannot be *directly* estimated, only power calculated for an already *given* sample size.

The power equations cannot be re-arranged to solve for sample size.
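Since only power can be calculated for a given sample size, the sample size has to be found by a search. The sketch below illustrates the idea in base R for a balanced 2×2×2 crossover with lognormal data, using the noncentral *t*-distribution. The helper `pwr.ni` is hypothetical and the brute-force loop is an assumption made for clarity; it is not the algorithm of `PowerTOST`, which starts from a large sample approximation.

```r
# Hypothetical helper: power of the one-sided Non-Inferiority t-test in a
# balanced 2x2x2 crossover (lognormal data, n = total sample size).
pwr.ni <- function(n, CV, theta0, margin, alpha = 0.025) {
  s   <- sqrt(log(CV^2 + 1))                  # SD on the log scale
  se  <- s * sqrt(2 / n)                      # standard error of the difference
  df  <- n - 2                                # degrees of freedom
  ncp <- (log(theta0) - log(margin)) / se     # noncentrality parameter
  pt(qt(1 - alpha, df), df, ncp = ncp, lower.tail = FALSE)
}
n <- 4 # start small and step by 2 to keep sequences balanced
while (pwr.ni(n, CV = 0.25, theta0 = 0.95, margin = 0.8) < 0.8) n <- n + 2
pwr <- pwr.ni(n, CV = 0.25, theta0 = 0.95, margin = 0.8)
c(n = n, power = pwr)
```

Under these assumptions the loop stops at the first even *n* whose power is at least the target, which is how a ‘sample size’ is obtained from a power function.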

“Power. That which statisticians are always calculating but never have.”^{11}

`library(PowerTOST) # attach it to run the examples`

Note that in Non-Inferiority and Non-Superiority – contrary to other
functions of the package – a __one-sided__ *t*-test (instead
of TOST aiming at equivalence)
is employed.

Throughout the examples I’m referring to studies in a single center –
not multiple groups *within* them or multicenter studies. That’s
another pot of tea.

Most methods of `PowerTOST` are based on pairwise comparisons. It is up to you to adjust the level of the test `alpha` if you want to compare more (say, two test treatments *vs* a reference or each of them against one of the others) in order to avoid inflation of the family-wise error rate due to multiplicity.
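One simple (and conservative) way to adjust the level is the Bonferroni correction. It is shown here only as an illustration; the text does not prescribe a particular method, and the number of comparisons `k` is an assumption.

```r
k         <- 2         # number of comparisons against the reference (assumed)
alpha     <- 0.025     # one-sided level for a single comparison
alpha.adj <- alpha / k # Bonferroni-adjusted level per comparison
# Family-wise error rate if the k comparisons were independent
fwer      <- 1 - (1 - alpha.adj)^k
c(alpha.adj = alpha.adj, FWER = fwer)
```

The adjusted value would then be passed as the argument `alpha` of the sample size function.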

Say, we want to demonstrate Non-Inferiority in a 2×2×2 crossover design, assume a *CV* of 25%, a T/R-ratio of 0.95, \(\small{\delta}\) 0.8, and target a power of at least 0.80.

Since `alpha = 0.025`, `theta0 = 0.95`, `margin = 0.8`, `targetpower = 0.8`, `design = "2x2"`, and `logscale = TRUE` are defaults of the function we don’t have to give them explicitly.

`sampleN.noninf(CV = 0.25)`

```
#
# ++++++++++++ Non-inferiority test +++++++++++++
# Sample size estimation
# -----------------------------------------------
# Study design: 2x2 crossover
# log-transformed data (multiplicative model)
#
# alpha = 0.025, target power = 0.8
# Non-inf. margin = 0.8
# True ratio = 0.95, CV = 0.25
#
# Sample size (total)
# n power
# 36 0.820330
```

If you want to perform the analysis with untransformed data, specify `logscale = FALSE`. Then the defaults are `theta0 = -0.05` and `margin = -0.2`.

`sampleN.noninf(CV = 0.25, logscale = FALSE)`

```
#
# ++++++++++++ Non-inferiority test +++++++++++++
# Sample size estimation
# -----------------------------------------------
# Study design: 2x2 crossover
# untransformed data (additive model)
#
# alpha = 0.025, target power = 0.8
# Non-inf. margin = -0.2
# True diff. = -0.05, CV = 0.25
#
# Sample size (total)
# n power
# 46 0.803507
```

Let’s return to lognormal distributed data because that’s more
common.

Say, you have information from a pilot study that the treatment performs
*really* (*i.e.*, 30%) better than placebo. You are
cautious (good idea!), and assume a *lower* T/R-ratio and a
*higher CV* than the observed 1.30 and 0.25.

`sampleN.noninf(CV = 0.28, theta0 = 1.25, margin = 1)`

```
#
# ++++++++++++ Non-inferiority test +++++++++++++
# Sample size estimation
# -----------------------------------------------
# Study design: 2x2 crossover
# log-transformed data (multiplicative model)
#
# alpha = 0.025, target power = 0.8
# Non-inf. margin = 1
# True ratio = 1.25, CV = 0.28
#
# Sample size (total)
# n power
# 26 0.802234
```

What about a parallel design? Likely the *CV* will be
substantially higher.^{12}

`sampleN.noninf(CV = 0.50, theta0 = 1.25, margin = 1, design = "parallel")`

```
#
# ++++++++++++ Non-inferiority test +++++++++++++
# Sample size estimation
# -----------------------------------------------
# Study design: 2 parallel groups
# log-transformed data (multiplicative model)
#
# alpha = 0.025, target power = 0.8
# Non-inf. margin = 1
# True ratio = 1.25, CV = 0.5
#
# Sample size (total)
# n power
# 144 0.803753
```

I hear the ‘Guy in the Armani suit’^{13} shouting »*C’mon,
72 subjects / arm, who shall pay for that? Hey, we have the wonder-drug!
It works twice as good as snake oil!*«

`sampleN.noninf(CV = 0.50, theta0 = 2, margin = 1, design = "parallel")`

```
#
# ++++++++++++ Non-inferiority test +++++++++++++
# Sample size estimation
# -----------------------------------------------
# Study design: 2 parallel groups
# log-transformed data (multiplicative model)
#
# alpha = 0.025, target power = 0.8
# Non-inf. margin = 1
# True ratio = 2, CV = 0.5
#
# Sample size (total)
# n power
# 18 0.831844
```

Cross fingers that the drug performs *really* that great. If
it is actually just 60% better than snake oil, power with this sample
will be only ≈51%. Master of disaster…
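The drop in power can be reproduced approximately in base R. The helper `pwr.par` below is a hypothetical sketch of the one-sided test in a parallel design (lognormal data, equal variances assumed); within `PowerTOST` one would rather call `power.noninf()` with `n = 18` and `theta0 = 1.6`.

```r
# Hypothetical helper: power of the one-sided Non-Inferiority t-test in a
# parallel design with group sizes n1 and n2.
pwr.par <- function(n1, n2, CV, theta0, margin, alpha = 0.025) {
  s   <- sqrt(log(CV^2 + 1))              # SD on the log scale
  se  <- s * sqrt(1 / n1 + 1 / n2)        # standard error of the difference
  df  <- n1 + n2 - 2                      # degrees of freedom
  ncp <- (log(theta0) - log(margin)) / se # noncentrality parameter
  pt(qt(1 - alpha, df), df, ncp = ncp, lower.tail = FALSE)
}
planned <- pwr.par(9, 9, CV = 0.50, theta0 = 2.0, margin = 1) # as assumed
actual  <- pwr.par(9, 9, CV = 0.50, theta0 = 1.6, margin = 1) # only 60% better
round(c(planned = planned, actual = actual), 4)
```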

Possibly the ‘Guy in the Armani suit’ has read about ‘allocation
ratios’ in the COVID-19 vaccination trials and asks »*Why should we
treat as many patients with snake oil than with our
wonder-drug?*«

Let’s see.

```
round.up <- function(n, alloc) {
  return(as.integer(alloc * (n %/% alloc + as.logical(n %% alloc))))
}
CV      <- 0.50 # Total (pooled) CV
theta0  <- 2    # Assumed T/R-ratio
margin  <- 1    # Non-Inferiority margin
target  <- 0.8  # Target (desired) power
alloc.T <- 3    # Allocation of wonder-drug (T)
alloc.R <- 1    # Allocation of snake oil (R)
# conventional 1:1
tmp     <- sampleN.noninf(CV = CV, theta0 = theta0, margin = margin,
                          design = "parallel", targetpower = target,
                          print = FALSE)
n.0     <- as.integer(tmp[["Sample size"]])
pwr.0   <- tmp[["Achieved power"]]
# 3:1 allocation (naïve)
n.1     <- setNames(c(round.up(n.0 / (alloc.T + alloc.R) * alloc.T, alloc.T),
                      round.up(n.0 / (alloc.T + alloc.R) * alloc.R, alloc.R)),
                    c("Test", "Reference"))
pwr.1   <- power.noninf(CV = CV, theta0 = theta0, margin = margin,
                        n = n.1, design = "parallel")
# 3:1 allocation (preserving power)
n.2     <- n.1
repeat {# increase the sample size if necessary
  pwr.2 <- power.noninf(CV = CV, theta0 = theta0, margin = margin,
                        n = n.2, design = "parallel")
  if (pwr.2 >= target) break
  n.2[["Test"]]      <- as.integer(n.2[["Test"]] + alloc.T)
  n.2[["Reference"]] <- as.integer(n.2[["Reference"]] + alloc.R)
}
fmt     <- paste0("%", nchar(as.character(n.0)), ".0f")
cat("\n++++++++++++ Non-inferiority test +++++++++++++",
    "\n Sample size estimation",
    "\n-----------------------------------------------",
    "\nStudy design: 2 parallel groups",
    "\nlog-transformed data (multiplicative model)",
    "\n\nalpha = 0.025, target power =", target,
    "\nNon-inf. margin =", margin,
    paste0("\nTrue ratio = ", theta0, ", CV = ", CV),
    "\n\nTotal sample size =", n.0, "(1:1 allocation)",
    paste0("\n n (T) = ", sprintf(fmt, n.0/2),
           ", n (R) = ", sprintf(fmt, n.0/2),
           ": power = ", signif(pwr.0, 6)),
    "\nTotal sample size =", sum(n.1),
    "(naïve", paste0(alloc.T, ":", alloc.R, " allocation)"),
    "penalty", sprintf("%.0f%%", 100*(sum(n.1)/n.0-1)),
    paste0("\n n (T) = ", sprintf(fmt, n.1[["Test"]]),
           ", n (R) = ", sprintf(fmt, n.1[["Reference"]]),
           ": power = ", signif(pwr.1, 6)),
    "change", sprintf("%+.2f%%", 100 * (pwr.1 - pwr.0) / pwr.0),
    "\nTotal sample size =", sum(n.2),
    paste0("(", alloc.T, ":", alloc.R, " allocation)"),
    sprintf("%13s %.0f%%", "penalty", 100*(sum(n.2)/n.0-1)),
    paste0("\n n (T) = ", sprintf(fmt, n.2[["Test"]]),
           ", n (R) = ", sprintf(fmt, n.2[["Reference"]]),
           ": power = ", signif(pwr.2, 6)),
    "change", sprintf("%+.2f%%", 100 * (pwr.2 - pwr.0) / pwr.0), "\n")
```

```
#
# ++++++++++++ Non-inferiority test +++++++++++++
# Sample size estimation
# -----------------------------------------------
# Study design: 2 parallel groups
# log-transformed data (multiplicative model)
#
# alpha = 0.025, target power = 0.8
# Non-inf. margin = 1
# True ratio = 2, CV = 0.5
#
# Total sample size = 18 (1:1 allocation)
# n (T) = 9, n (R) = 9: power = 0.831844
# Total sample size = 20 (naïve 3:1 allocation) penalty 11%
# n (T) = 15, n (R) = 5: power = 0.766496 change -7.86%
# Total sample size = 24 (3:1 allocation) penalty 33%
# n (T) = 18, n (R) = 6: power = 0.844798 change +1.56%
```

Already in the naïve 3:1 allocation you have to round the sample size up because the 18 of the 1:1 allocation is not a multiple of 4. Nevertheless, you lose 7.86% power. In order to preserve power, you have to increase the sample size further.

However, it’s still based on a strong *belief* in the performance
of the wonder-drug. If it again turns out to be just 60% better than
snake oil, power with 24 subjects in the 3:1 allocation will be only
≈52%. Hardly better than tossing a coin.
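One way to see where the penalty of unequal allocation comes from: in a parallel design the standard error of the difference is proportional to \(\small{\sqrt{1/n_\text{T}+1/n_\text{R}}}\). A small base-R sketch (the helper name `se.factor` is made up for illustration):

```r
# Hypothetical helper: factor by which the SD is multiplied to obtain the
# standard error of the difference of two group means.
se.factor <- function(n.T, n.R) sqrt(1 / n.T + 1 / n.R)

equal   <- se.factor( 9, 9) # 1:1 allocation, 18 subjects
naive   <- se.factor(15, 5) # naïve 3:1 allocation, 20 subjects
unequal <- se.factor(18, 6) # 3:1 allocation, 24 subjects
round(c(equal = equal, naive = naive, unequal = unequal), 4)
```

With 24 subjects in the 3:1 allocation the standard error is the same as with 18 subjects in the 1:1 allocation. What remains is only the small gain from the additional degrees of freedom, which is consistent with power being hardly higher (≈52% *vs* ≈51%) if the drug is just 60% better.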

A special case: Bracketing approach (EMA)

Compare a new MR
formulation (regimen once a day) with an
IR formulation (twice a day).
*C*_{max} is the surrogate target metric for safety
(Non-Superiority) and *C*_{min} the surrogate for
efficacy (Non-Inferiority):

“[…] therapeutic studies might be waived [if …]:

- there is a well-defined therapeutic window in terms of safety and efficacy, the rate of input is known not to influence the safety and efficacy profile or the risk for tolerance development and
- bioequivalence between the reference and the test product is shown in terms of AUC_{(0-τ),ss} and
- C_{max,ss} for the new MR formulation is below or equivalent to the C_{max,ss} for the approved formulation and C_{min,ss} for the MR formulation is above or equivalent to the C_{min,ss} of the approved formulation.”

Although not explicitly stated in the guideline, AFAIK the EMA expects tests at \(\small{\alpha=0.05}\).

Margins are 1.25 for *C*_{max} and 0.80 for
*C*_{min}. We assume *CV*s of 0.15 for
*AUC*, 0.20 for *C*_{max}, 0.35 for
*C*_{min}, T/R-ratios of 0.95 for *AUC* and
*C*_{min} and 1.05 for *C*_{max}. We plan
the study in a 2-treatment, 2-sequence, 4-period full replicate design due to the high
variability of *C*_{min}.

Which PK metric leads the sample
size in such a Bioequivalence (*AUC*) / Non-Superiority
(*C*_{max}) / Non-Inferiority (*C*_{min})
study?

```
design <- "2x2x4"
x      <- data.frame(design = "2x2x4", metric = c("AUC", "Cmax", "Cmin"),
                     margin = c(NA, 1.25, 0.80), CV = c(0.15, 0.20, 0.35),
                     theta0 = c(0.95, 1.05, 0.95), n = NA_integer_,
                     power = NA_real_, stringsAsFactors = FALSE)
for (i in 1:nrow(x)) {
  if (x$metric[i] == "AUC") {# ABE
    x[i, 6:7] <- sampleN.TOST(design = design,
                              theta0 = x$theta0[i],
                              CV = x$CV[i],
                              details = FALSE,
                              print = FALSE)[7:8]
    if (x$n[i] < 12) {# minimum acc. to GLs
      x$n[i]     <- 12
      x$power[i] <- power.TOST(design = design,
                               theta0 = x$theta0[i],
                               CV = x$CV[i],
                               n = x$n[i])
    }
  } else {                   # Non-Inferiority, Non-Superiority
    x[i, 6:7] <- sampleN.noninf(design = design,
                                alpha = 0.05,
                                margin = x$margin[i],
                                theta0 = x$theta0[i],
                                CV = x$CV[i],
                                details = FALSE,
                                print = FALSE)[6:7]
    if (x$n[i] < 12) {# minimum acc. to GLs
      x$n[i]     <- 12
      x$power[i] <- power.noninf(design = design,
                                 alpha = 0.05,
                                 margin = x$margin[i],
                                 theta0 = x$theta0[i],
                                 CV = x$CV[i],
                                 n = x$n[i])
    }
  }
}
x$power  <- signif(x$power, 4) # cosmetics
x$margin <- sprintf("%.2f", x$margin)
x$margin[x$margin == "NA"] <- "– "
print(x, row.names = FALSE)
cat(paste0("Sample size lead by ", x$metric[x$n == max(x$n)], ".\n"))
```

```
# design metric margin CV theta0 n power
# 2x2x4 AUC – 0.15 0.95 12 0.9881
# 2x2x4 Cmax 1.25 0.20 1.05 12 0.9098
# 2x2x4 Cmin 0.80 0.35 0.95 26 0.8184
# Sample size lead by Cmin.
```

However, with 26 subjects to show Non-Inferiority of
*C*_{min} the study is ‘overpowered’ for
BE of *AUC* and
Non-Superiority of *C*_{max}.

```
cat("Power with", max(x$n), "subjects for",
"\nAUC :",
power.TOST(design = design, CV = x$CV[1],
theta0 = x$theta0[1], n = max(x$n)),
"\nCmax:",
    power.noninf(design = design, alpha = 0.05, margin = 1.25,
                 CV = x$CV[2], theta0 = x$theta0[2], n = max(x$n)), "\n") # row 2 = Cmax
```

```
# Power with 26 subjects for
# AUC : 0.9999851
# Cmax: 0.9974663
```

That gives us some space to navigate for *e.g.*,
*C*_{max} if values turn out to be ‘worse’ (say,
*CV* 0.20 → 0.25, T/R-ratio 1.05 → 1.10):

```
power.noninf(design = design, alpha = 0.05, margin = 1.25, # Non-Superiority margin of Cmax
             CV = 0.25, theta0 = 1.10, n = max(x$n)) # higher CV, worse theta0
```

`# [1] 0.8359967`

The bracketing approach __may__ require a lower sample size than required for demonstrating BE with the common CI-inclusion approach for all PK metrics, which is another option mentioned in the guideline.^{3} Note that reference-scaling by ABEL is acceptable for *C*_{max} and *C*_{min} if their *CV*_{wR} > 30%, expanding the limits can be justified based on clinical grounds, and *CV*_{wR} > 30% is not caused by ‘outliers’.^{3} ^{14} How does that compare?

```
y <- data.frame(design = design, method = "ABE",
                metric = c("AUC", "Cmax", "Cmin"),
                CV = c(0.15, 0.20, 0.35),
                theta0 = c(0.95, 1.05, 0.90),
                L = 0.8, U = 1.25, n = NA_integer_,
                power = NA_real_, stringsAsFactors = FALSE)
for (i in 1:nrow(y)) {
  if (y$metric[i] == "AUC" | y$CV[i] <= 0.3) {
    y[i, 8:9] <- sampleN.TOST(CV = y$CV[i], theta0 = y$theta0[i],
                              design = design, print = FALSE,
                              details = FALSE)[7:8]
    if (y$n[i] < 12) {# minimum acc. to the GL
      y$n[i]     <- 12
      y$power[i] <- power.TOST(CV = y$CV[i], theta0 = y$theta0[i],
                               design = design, n = y$n[i])
    }
  } else {
    y$method[i] <- "ABEL"
    y[i, 6:7]   <- scABEL(CV = y$CV[i])
    y[i, 8:9]   <- sampleN.scABEL(CV = y$CV[i], theta0 = y$theta0[i],
                                  design = design, print = FALSE,
                                  details = FALSE)[8:9]
  }
}
y$L     <- sprintf("%.2f%%", 100 * y$L) # cosmetics
y$U     <- sprintf("%.2f%%", 100 * y$U)
y$power <- signif(y$power, 4)
names(y)[6:7] <- c("L ", "U ")
print(y, row.names = FALSE)
```

```
# design method metric CV theta0 L U n power
# 2x2x4 ABE AUC 0.15 0.95 80.00% 125.00% 12 0.9881
# 2x2x4 ABE Cmax 0.20 1.05 80.00% 125.00% 12 0.9085
# 2x2x4 ABEL Cmin 0.35 0.90 77.23% 129.48% 34 0.8118
```

Which approach is optimal is a case-to-case decision. Although in
this example bracketing is the ‘winner’ (26 subjects instead of 34), it
might be problematic if a *CV* is larger and/or a T/R-ratio worse
than assumed: *CV* of *AUC* 0.15 → 0.20,
*C*_{max} 0.20 → 0.25, *C*_{min} 0.35 →
0.50; T/R-ratio of *AUC* 0.95 → 0.90, *C*_{max}
1.05 → 1.12, *C*_{min} 0.90 → 0.88.

```
n <- max(y$n)
z <- data.frame(approach = c("ABE", "Non-Superiority", "ABE",
                             "Non-Inferiority", "ABE"),
                metric = c("AUC", rep(c("Cmax", "Cmin"), each = 2)),
                CV = c(0.2, rep(c(0.25, 0.50), each = 2)),
                theta0 = c(0.90, rep(c(1.12, 0.88), each = 2)),
                margin = c(NA, 1.25, NA, 0.80, NA),
                L = c(0.80, NA, 0.80, NA, 0.80),
                U = c(1.25, NA, 1.25, NA, 1.25),
                n = n, power = NA_real_,
                stringsAsFactors = FALSE)
for (i in 1:nrow(z)) {
  if (z$approach[i] %in% c("Non-Superiority", "Non-Inferiority")) {
    z$power[i] <- power.noninf(design = design,
                               alpha = 0.05,
                               margin = z$margin[i],
                               theta0 = z$theta0[i],
                               CV = z$CV[i],
                               n = z$n[i])
  } else {
    if (z$CV[i] <= 0.3) {
      z$power[i] <- power.TOST(design = design,
                               theta0 = z$theta0[i],
                               CV = z$CV[i],
                               n = z$n[i])
    } else {
      z$approach[i] <- "ABEL"
      z[i, 6:7]     <- scABEL(CV = z$CV[i])
      z$power[i]    <- power.scABEL(design = design,
                                    theta0 = z$theta0[i],
                                    CV = z$CV[i],
                                    n = z$n[i])
    }
  }
}
z$L      <- sprintf("%.2f%%", 100 * z$L) # cosmetics
z$U      <- sprintf("%.2f%%", 100 * z$U)
z$power  <- signif(z$power, 4)
z$margin <- sprintf("%.2f", z$margin)
z$margin[z$margin == "NA"] <- "– "
z$L[z$L == "NA%"] <- "– "
z$U[z$U == "NA%"] <- "– "
names(z)[6:7] <- c("L ", "U ")
print(z, row.names = FALSE)
```

```
# approach metric CV theta0 margin L U n power
# ABE AUC 0.20 0.90 – 80.00% 125.00% 34 0.9640
# Non-Superiority Cmax 0.25 1.12 1.25 – – 34 0.8258
# ABE Cmax 0.25 1.12 – 80.00% 125.00% 34 0.8258
# Non-Inferiority Cmin 0.50 0.88 0.80 – – 34 0.3169
# ABEL Cmin 0.50 0.88 – 69.84% 143.19% 34 0.8183
```

**Non-Superiority / Non-Inferiority**

We will pass *C*_{max} (note that its power equals the one of ABE) but fail^{15} *C*_{min}.

If your software does not support one-sided tests (*i.e.*, gives only a two-sided CI), use the upper reported CL for Non-Superiority and the lower CL for Non-Inferiority.

**ABEL / ABE**

Although we still have to assess *C*_{max} by ABE (*CV*_{wR} < 30%), it is not ‘overpowered’ any more.

In reference-scaling by ABEL *C*_{min} will still pass due to more expansion of the limits (69.84% – 143.19% for *CV*_{wR} 50% instead of 77.23% – 129.48% for *CV*_{wR} 35%).

Hence, in this case the equivalence approach by ABE(L) is the ‘winner’ because it tolerates more deviations from assumptions.
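The remark on confidence limits above can be sketched as follows. With a one-sided \(\small{\alpha=0.05}\) the relevant limit is the one of a two-sided 90% CI. All numbers below (point estimates, standard errors, degrees of freedom) are hypothetical and serve only to illustrate the decision rule.

```r
# Hypothetical results on the log scale (not taken from a real study)
df    <- 74                           # degrees of freedom (assumed)
Cmax  <- c(pe = log(1.05), se = 0.05) # point estimate and standard error
Cmin  <- c(pe = log(0.95), se = 0.07)
tcrit <- qt(1 - 0.05, df)             # one-sided alpha 0.05 = two-sided 90% CI
# Back-transformed confidence limits
upper.Cmax <- exp(Cmax[["pe"]] + tcrit * Cmax[["se"]])
lower.Cmin <- exp(Cmin[["pe"]] - tcrit * Cmin[["se"]])
# Decision: upper CL below 1.25 (Non-Superiority),
#           lower CL above 0.80 (Non-Inferiority)
c(non.superiority = upper.Cmax < 1.25, non.inferiority = lower.Cmin > 0.80)
```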

What happens if you fail to convince the agency that ABEL is acceptable? The picture changes.

```
a <- data.frame(design = design, method = "ABE",
                metric = c("AUC", "Cmax", "Cmin"),
                CV = c(0.15, 0.20, 0.35),
                theta0 = c(0.95, 1.05, 0.90),
                L = 0.8, U = 1.25, n = NA_integer_,
                power = NA_real_, stringsAsFactors = FALSE)
for (i in 1:nrow(a)) {
  a[i, 8:9] <- sampleN.TOST(CV = a$CV[i], theta0 = a$theta0[i],
                            design = design, print = FALSE,
                            details = FALSE)[7:8]
  if (a$n[i] < 12) {# minimum acc. to the GL
    a$n[i]     <- 12
    a$power[i] <- power.TOST(CV = a$CV[i], theta0 = a$theta0[i],
                             design = design, n = a$n[i])
  }
}
a$L     <- sprintf("%.2f%%", 100 * a$L) # cosmetics
a$U     <- sprintf("%.2f%%", 100 * a$U)
a$power <- signif(a$power, 4)
names(a)[6:7] <- c("L ", "U ")
print(a, row.names = FALSE)
```

```
# design method metric CV theta0 L U n power
# 2x2x4 ABE AUC 0.15 0.95 80.00% 125.00% 12 0.9881
# 2x2x4 ABE Cmax 0.20 1.05 80.00% 125.00% 12 0.9085
# 2x2x4 ABE Cmin 0.35 0.90 80.00% 125.00% 52 0.8003
```

Nasty – we need a ≈53% larger sample size.

If all values turn out to be as bad as above:

```
b <- data.frame(design = design, method = "ABE",
                metric = c("AUC", "Cmax", "Cmin"),
                CV = c(0.20, 0.25, 0.50),
                theta0 = c(0.90, 1.12, 0.88),
                L = 0.8, U = 1.25, n = NA_integer_,
                power = NA_real_, stringsAsFactors = FALSE)
for (i in 1:nrow(b)) {# loop over the rows of b (not of a)
  b[i, 8:9] <- sampleN.TOST(CV = b$CV[i], theta0 = b$theta0[i],
                            design = design, print = FALSE,
                            details = FALSE)[7:8]
  if (b$n[i] < 12) {# minimum acc. to the GL
    b$n[i]     <- 12
    b$power[i] <- power.TOST(CV = b$CV[i], theta0 = b$theta0[i],
                             design = design, n = b$n[i])
  }
}
b$L     <- sprintf("%.2f%%", 100 * b$L) # cosmetics
b$U     <- sprintf("%.2f%%", 100 * b$U)
b$power <- signif(b$power, 4)
names(b)[6:7] <- c("L ", "U ")
print(b, row.names = FALSE)
```

```
# design method metric CV theta0 L U n power
# 2x2x4 ABE AUC 0.20 0.90 80.00% 125.00% 18 0.8007
# 2x2x4 ABE Cmax 0.25 1.12 80.00% 125.00% 32 0.8050
# 2x2x4 ABE Cmin 0.50 0.88 80.00% 125.00% 154 0.8038
```

End of the story. Recall that this is a study in a 2-treatment, 2-sequence, 4-period full replicate design.

```
c <- data.frame(design = "2x2x4",
                approach = c("ABE", "Non-Superiority", "Non-Inferiority"),
                metric = c("AUC", "Cmax", "Cmin"),
                margin = c(NA, 1.25, 0.80), CV = c(0.20, 0.25, 0.50),
                theta0 = c(0.90, 1.12, 0.88), n = NA_integer_,
                power = NA_real_, stringsAsFactors = FALSE)
for (i in 1:nrow(c)) {
  if (c$approach[i] == "ABE") {# ABE
    c[i, 7:8] <- sampleN.TOST(CV = c$CV[i], theta0 = c$theta0[i],
                              design = c$design[i], print = FALSE,
                              details = FALSE)[7:8]
    if (c$n[i] < 12) {# minimum acc. to the GL
      c$n[i]     <- 12
      c$power[i] <- power.TOST(CV = c$CV[i], theta0 = c$theta0[i],
                               design = c$design[i], n = c$n[i])
    }
  } else {                     # Non-Inferiority, Non-Superiority
    c[i, 7:8] <- sampleN.noninf(alpha = 0.05, CV = c$CV[i],
                                margin = c$margin[i], theta0 = c$theta0[i],
                                design = c$design[i], details = FALSE,
                                print = FALSE)[6:7]
    if (c$n[i] < 12) {# minimum acc. to GLs
      c$n[i]     <- 12
      c$power[i] <- power.noninf(alpha = 0.05, CV = c$CV[i],
                                 margin = c$margin[i], theta0 = c$theta0[i],
                                 design = c$design[i], n = c$n[i])
    }
  }
}
c$power  <- signif(c$power, 4) # cosmetics
c$margin <- sprintf("%.2f", c$margin)
c$margin[c$margin == "NA"] <- "– "
print(c, row.names = FALSE)
```

```
# design approach metric margin CV theta0 n power
# 2x2x4 ABE AUC – 0.20 0.90 18 0.8007
# 2x2x4 Non-Superiority Cmax 1.25 0.25 1.12 32 0.8050
# 2x2x4 Non-Inferiority Cmin 0.80 0.50 0.88 154 0.8038
```

As an aside, we would also require 154 subjects to demonstrate Non-Inferiority of *C*_{min} in the bracketing approach. Perhaps it is actually more economical to opt for a clinical trial…


Helmut Schütz 2022

`R`, `PowerTOST`, and `arsenal` GPL 3.0, `klippy` MIT, `pandoc` GPL 2.0.

1^{st} version July 24, 2022. Rendered September 10, 2022 14:18 CEST by rmarkdown via pandoc in 0.73 seconds.

1. Labes D, Schütz H, Lang B. *PowerTOST: Power and Sample Size for (Bio)Equivalence Studies.* Package version 1.5.4. 2022-02-21. CRAN.
2. Labes D, Schütz H, Lang B. *Package ‘PowerTOST’.* February 21, 2022. CRAN.
3. Chow S-C, Shao J, Wang H. *Sample Size Calculations in Clinical Research.* New York: Marcel Dekker. 2003. Chapter 3.
4. Julious SA. *Sample Sizes for Clinical Trials.* Boca Raton: CRC Press; 2010. Chapter 4.
5. EMA, CHMP. *Guideline on the pharmacokinetic and clinical evaluation of modified release dosage forms.* London. 20 November 2014. EMA/CPMP/EWP/280/96 Corr1. Online.
6. Zhang P. *A Simple Formula for Sample Size Calculation in Equivalence Studies.* J Biopharm Stat. 2003; 13(3): 529–38. doi:10.1081/BIP-120022772.
7. Senn S. *Guernsey McPearson’s Drug Development Dictionary.* April 21, 2020. Online.
8. Hoenig JM, Heisey DM. *The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis.* Am Stat. 2001; 55(1): 19–24. doi:10.1198/000313001300339897. Open Access.
9. In short: There is no statistical method to ‘correct’ for unequal carryover. It can only be avoided by design, *i.e.*, a sufficiently long washout between periods. According to the guidelines subjects with pre-dose concentrations > 5% of their *C*_{max} can be excluded from the comparison if stated in the protocol.
10. Especially important for drugs which are auto-inducers or -inhibitors and biologics.
11. Senn S. *Statistical Issues in Drug Development.* Chichester: John Wiley; 2^{nd} ed 2007.
12. It depends on *both* the within- and between-subject variances. In general the latter is larger than the former (see above).
13. ‘The Guy in the Armani suit’ (© ElMaestro, introduced there) is a running gag in the BEBA Forum. He (occasionally she) is only proficient in Powerpoint, copypasting from one document to another, and shouting »*You are Fired!*« if a study fails.
14. EMA, CHMP. *Guideline on the Investigation of Bioequivalence.* CPMP/EWP/QWP/1401/98 Rev. 1/ Corr. London. 20 January 2010. Online.
15. Any power < 50% is a failure by definition.