Consider allowing JavaScript. Otherwise, you have to be proficient in reading since formulas will not be rendered. Furthermore, the table of contents in the left column for navigation will not be available and code-folding not supported. Sorry for the inconvenience.

Examples in this article were generated with
4.2.1 by the packages `survival`

,^{1} `coin`

,^{2}
`truncnorm`

,^{3} `reshape`

,^{4}
`lattice`

,^{5} and `moments`

.^{6} An
*intermediate* knowledge of R is
required. Any version of R would likely be
sufficient to run the examples.

- The right-hand badges give the respective section’s ‘level’.

- Basics about Noncompartmental Analysis (NCA) – requiring no or only limited statistical expertise.

- These sections are the most important ones. They are – hopefully – easily comprehensible even for novices.

- A somewhat higher knowledge of statistics and/or R is required. May be skipped or reserved for a later reading.

- An advanced knowledge of statistics and/or R is required. Not recommended for beginners in particular.

- Click to show / hide R code.

What is an ‘apparent’ difference in
*t*_{max}?

An excellent question! Puzzles me for years…

In early European guidances a nonparametric test was recommeded.

“Statistical evaluation of \(\small{t_\text{max}}\) only makes sense if there is a clinically relevant claim for rapid release or action or signs related to adverse effects. The non-parametric 90 % confidence interval for this measure of relative bioavailability should lie within a clinically determined range.

In the revised guideline the European Medicines Agency abandoned the test and stated for immediate release products:

“A statistical evaluation of \(\small{t_\text{max}}\) is not required. However, if rapid release is claimed to be clinically relevant and of importance for onset of action or is related to adverse events, there should be no apparent difference in median \(\small{t_\text{max}}\) and its variability between test and reference product.

Later the EMA stated for delayed and multiphasic release formulations:

“A formal statistical evaluation of \(\small{t_\text{max}}\) is not required. However, there should be no apparent difference in median \(\small{t_\text{max}}\) and its range between test and reference product.

In recent product-specific draft guidances for ibuprofen, paracetamol (acetaminophen for American readers), and tadalafil we find:

“Comparable median (≤ 20 % difference) and range for \(\small{T_\text{max}}\).

In a footnote of the guidances find:

“This revision concerns defining what is meant by ‘comparable’ \(\small{T_\text{max}}\) as an additional main pharmacokinetic variable in the bioequivalence assessment section of the guideline.

‘Apparent’ is – and for good reasons – not contained in the
statistical toolbox. Not surprising, since according to the guidelines
»*a* [formal] *statistical evaluation of* \(\small{t_\text{max}}\) *is not
required*«.

Now what? Based on gut feeling, reading tea leaves?

These definitions of ‘apparent’ are given in dictionaries:

Macmillan (British English)

- Easy to see or understand.
- An apparent quality, feeling, or situation seems to exist although it may not be real.

Meriam-Webster (American English)

- Open to view, visible.
- Clear or manifest to the understanding.
- Appearing as actual to the eye or mind.
- Manifest to the senses or mind as real or true on the basis of evidence that may or may not be factually valid.

Beauty Lies in the Eye of the Beholder. Paper doesn’t blush…

Hence, the assessment of formulation differences opens the door for
*interpretation*, which is – from a scientific perspective – extremely unsatisfactory.

A plot supporting visual inspection was proposed^{14} but
AFAIK, rarely applied (I used it
only once in 30 years). An example from my archive:

```
# sampling time points
<- c(0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75,
smpl 2, 2.25, 2.5, 3, 4, 6, 8, 10, 12)
# subjects’ tmax of R and T
<- c(1.25, 2.00, 1.00, 1.25, 2.50, 1.25, 1.50, 2.25,
R 1.00, 1.25, 1.50, 1.25, 2.25, 1.75, 2.50, 2.25)
<- c(1.50, 0.50, 0.75, 2.25, 0.75, 0.75, 2.25, 2.25,
T 0.75, 0.75, 1.50, 1.50, 0.75, 1.00, 2.50, 0.50)
<- max(c(length(R), length(T))) # number of subjects
n # vector of observed tmax values (R and T)
<- unique(sort(c(R, T)))
obs # intervals for the pseudo bar charts
<- smpl[(which(smpl == min(obs))-1):(which(smpl == max(obs))+1)]
int # counts
<- n.T <- integer()
n.R for (i in seq_along(int)) {
<- sum(abs(R - int[i]) < 1e-6)
n.R[i] <- sum(abs(T - int[i]) < 1e-6)
n.T[i]
}<- c(-1.04, 1.04) * max(n.R, n.T)
ylim dev.new(width = 4.6, height = 4.6)
<- par(no.readonly = TRUE)
op par(mar = c(4.1, 4, 2, 0), cex.axis = 0.9)
plot(c(0, max(obs) * 1.04), ylim, type = "n", axes = FALSE,
xlab = "Sampling time (h)", ylab = "Number of subjects",
main = expression(italic(t)[max]))
axis(side = 1, at = unique(round(obs, 0)))
axis(side = 1, at = int, labels = FALSE, tcl = -0.25)
axis(side = 2, at = pretty(ylim[1]:ylim[2], 5),
labels = abs(pretty(ylim[1]:ylim[2], 5)), las = 1)
axis(side = 2, at = round((ylim[1]:ylim[2])), labels = FALSE, tcl = -0.25)
abline(h = 0)
mtext("Reference", side = 2, line = 1.75, at = ylim[2] / 2)
mtext("Test", side = 2, line = 1.75, at = ylim[1] / 2)
points(median(R), ylim[2], cex = 1.15, pch = 21, bg = "#BFFFEF", col = "black")
points(median(T), ylim[1], cex = 1.15, pch = 21, bg = "#BFEFFF", col = "black")
legend(median(R), ylim[2], cex = 0.9, box.lty = 0, x.intersp = 0,
yjust = 0.5, legend = bquote(tilde(t)[max] == .(median(R))))
legend(median(T), ylim[1], cex = 0.9, box.lty = 0, x.intersp = 0,
yjust = 0.5, legend = bquote(tilde(t)[max] == .(median(T))))
box()
<- 1
j for (i in seq_along(int)) {
<- j + 1
j <- (int[i] + int[i+1]) / 2 - int[i]
D rect(int[i] + D, 0, int[i+1] + D, n.R[j], col = "#BFFFEF", border = "black")
rect(int[i] + D, 0, int[i+1] + D, -n.T[j], col = "#BFEFFF", border = "black")
}par(op)
```

Although the plot *appears* like two bar charts, they aren’t.
If sampling intervals are not equally spaced like in this case, the
areas of rectangles will give a false impression. Of course, such a plot
leaves space for interpretation.

A comparison of \(\small{t_\text{max}}\) was never – and is not – required by the FDA and Health Canada.

Whereas the *underlying* distribution of \(\small{t_\text{max}}\) is
*continuous*, due to the sampling schedule the *observed*
distribution is *discrete*. This calls for a comparison by a
robust (nonparametric)
statistical method.

In Europe a comparison by a nonparametric method was recommended for
19 years.^{7} ^{8}
That was – and still is! – statistically sound. A nonparametric method
is currently recommended in some jurisdictions (*e.g.*,
Argentina,^{15} Japan,^{16} ^{17} and South Africa^{18}).

For modified release products \(\small{t_\text{max}}\) was not mentioned in
the European Note for Guidance of 1999 at all.^{19}

Alas, the EMA’s vague
recommendation of 2010 was incurred in numerous other jurisdictions
(*e.g.*, Australia,^{20}
ASEAN
states,^{21} Chile,^{22} the
EEU,^{23} Egypt,^{24} members
of the Gulf Cooperation Council,^{25} New Zealand^{26}).

The WHO leaves more scope for interpretation.

“Where \(\small{t_\text{max}}\) is considered clinically relevant, median and range of \(\small{t_\text{max}}\) should be compared between test and comparator to exclude numerical differences with clinical importance. A formal statistical comparison is rarely necessary. Generally the sample size is not calculated to have enough statistical power for \(\small{t_\text{max}}\). However, if \(\small{t_\text{max}}\) is to be subjected to a statistical analysis, this should be based on non-parametric methods and should be applied to untransformed data.

- »
*A*[formal]*statistical evaluation of**t*_{max}*is not required.*«

- Well. Does that imply that if one is performed anyhow, it is not welcomed?

- » […]
*there should be no apparent difference in median**t*_{max}*and its variability*

*between test and reference product.*«

- Leaving the dubious statement about an ‘apparent’ difference aside –
what might the ‘variability’ of the median be? The median is a statistic (one of
many
*estimators*) whose value is a certain number (the*estimate*). To be clear: The median simply does not possess any ‘variability’.

If we agree that*t*_{max}follows a discrete distribution,^{28}we can only make a statement about the variability of the*sample*.^{29}A nonparametric statistic is the Interquartile Range (IQR). Is*that*meant?

- Leaving the dubious statement about an ‘apparent’ difference aside –
what might the ‘variability’ of the median be? The median is a statistic (one of
many
- » […]
*there should be no apparent difference in median**t*_{max}*and its range between test and reference product.*«

- Similarly the median does not have a
*range*, only the sample. At least the range is a statistical term we can work with.

- Similarly the median does not have a
- »
*Comparable median (≤ 20 % difference)*[…]*T*_{max}.«

- This approach is statistically questionable at least.

- »
*Comparable*[…]*range for**T*_{max}.«

- What’s comparable?

Interlude: Scales and Distributions

Temperature in ℃ follows a *continuous* distribution on an
*interval scale*.
Say, we want to compare 10 ℃ with 20 ℃. Is 10 ℃ 50 % of 20 ℃? Of course,
that’s absurd. What about 0 ℃ or –10 ℃? Calculating a ratio is not an
allowed operation for data on an interval scale (*i.e.*, with an
arbitrary zero point). We __must only__ *add* or
*subtract* values. Hence, saying that 10 ℃ is 10 ℃ less than
20 ℃, is trivial but correct.

Compare that to the absolute temperature in Kelvin. It follows a
*continuous* distribution on a *ratio scale*
(*i.e.*, with a true zero). Hence, beside adding/subtracting we
are also allowed to multiply/divide values. Coming back to the example:
Saying that 283 K (10 ℃) is 96.6 % of 293 K (20 ℃) is absolutely
[*sic*] correct.

One might argue that \(\small{t_\text{max}}\) has a true zero as
well and hence, is on a *ratio scale*. That’s valid for the
*true* (but unknown) \(\small{t_\text{max}}\), which follows a
continuous distribution. However, observed values follow a
*discrete* distribution. Furthermore, it’s unlikely that all
sampling intervals are equally spaced and therefore, calculating a ratio
might be questionable.^{30} An option would be to sample in equally
spaced intervals until absorption is essentially complete (in a one
compartment model \(\small{\approx 2\times
t_\textrm{max}}\)), divide the subject’s \(\small{t_\text{max}}\) by the sampling
interval to obtain a positive integer, and analyze the counts as a Poisson
distribution. However, the power of such an approach is poor.^{31}

»\(\small{t_\text{max}}\) *of test
was 83.3 % of the reference*«

»\(\small{t_\text{max}}\) *of test
was observed 20 minutes earlier than the one of the reference*«?

Of course, the ‘Two Lászlós’ explored how \(\small{t_\text{max}}\) behaves.

“The positive bias of \(\small{T_\textrm{max}}\) increase[s] together with the observational error. This result can be attributed to the asymmetry of the observed concentrations around the peak. The concentrations rise more steeply before the peak than they decline following the true maximum response. Consequently, it is more likely that large observed concentrations occur after than before the true peak time (\(\small{T^{\circ}_\textrm{max}}\)). “[…] the usefulness of peak times greatly diminishes when trying to characterize some MR formulations. Care must be taken when \(\small{T_\text{max}}\) (and \(\small{C_\text{max}}\)) is interpreted with concentration profiles having multiple peaks and MR preparations containing multiple components. Furthermore, the uncertainty of the recorded \(\small{T_\text{max}}\) renders it practically useless with the wide, almost flat peaks observed with extended-release formulations.

I could confirm the simulations^{32} ^{33} by data from
my archive (drug ‘X’, single dose, studies powered for the moderate
variability of \(\small{C_\text{max}\textsf{)}}\).

I estimated the sample
skewness by the method
of moments \[g_1=\frac{\tfrac{1}{n} \sum_{i=1}^n
(x_i-\overline{x})^3}{\sqrt{\tfrac{1}{n-1} \sum_{i=1}^n
(x_i-\overline{x})^2}^{\,3}}.\tag{1}\] Obviously the
distributions of \(\small{t_\text{max}}\) are skewed to the
right (IR +0.778,
MR +0.416) with a wide spread in
the studies of MR formulations.
Given that – and for comparing differences with the ±20 % criterion –
wouldn’t it makes sense to use an *asymmetrical* acceptance
range? More about that further down.

top of section ↩︎ previous section ↩︎

Imagine \(\small{x_{i=2}=\ldots=x_{i=n}}\) for \(\small{n\rightarrow \infty}\). If \(\small{x_1\rightarrow \infty}\) then \(\small{\bar{x}\rightarrow \infty}\). Hence,
a *single* (‼) value distorts the mean towards the extreme.
Therefore, the breakdown
point of the arithmetic mean is 0. If any of the values is infinite,
all others are practically ignored.

The median has a breakdown point of 0.5. That means, it would need
50 % of the data contaminated by ‘junk’ until the median of the sample
would change.^{34} Hence, it is a robust
estimator.

The breakdown point of the IQR
is 0.25.

On the other hand, the range has – like the mean – a breakdown point
of 0. Therefore, it is *not* a robust statistic. Without
inspecting the data or performing an EAD
(*e.g.*, box
plots), the range is not informative and for comparisons completely
useless.^{35}

Imagine a study where *all* \(\small{t_\text{max}}\) values for three
treatments (R, T_{1}, T_{2}) were 1.

However, in one case after R it was 2 and in one case after
T_{1} it was 3. Now what?

```
<- function(x) {
sum.np # Nonparametric summary:
# Remove Mean but keep eventual NAs, add IQR and Range
<- summary(x)
y if (length(y) == 6) {
<- y[c(1:3, 5:6)]
y else {
}<- y[c(1:3, 5:7)]
y
}<- c(names(y), "IQR", "Range")
names length(y) + 1] <- y[["3rd Qu."]] - y[["1st Qu."]]
y[length(y) + 1] <- y[["Max."]] - y[["Min."]]
y[<- setNames(as.vector(y), names)
y return(y)
}<- function(x, na.action = na.omit) {
HL # Hodges-Lehmann estimator of x
if (!is.vector(x) || !is.numeric(x))
stop("'x' must be a numeric vector!")
<- na.action(x)
x <- outer(x, x, '+')
y return(median(y[row(y) >= col(y)]) / 2)
}<- 16
n <- 0.5 # sampling every 30 minutes
spl <- c(rep(1, n-1), 2)
R <- sum.np(R)
sumR <- c(rep(1, n-1), 3)
T1 <- sum.np(T1)
sumT1 <- c(rep(1, n-1), 1)
T2 <- sum.np(T2)
sumT2 <- c(HL(R), HL(T1), HL(T2))
hl <- data.frame(tmax = c(R, T1),
comp1 treatment = factor(rep(c("R", "T1"), c(n, n))))
<- data.frame(tmax = c(R, T2),
comp2 treatment = factor(rep(c("R", "T2"), c(n, n))))
<- data.frame(treatment = c("R", "T1", "T2"),
res Median = c(sumR[["Median"]], sumT1[["Median"]], sumT2[["Median"]]),
HL.est = hl,
IQR = c(sumR[["IQR"]], sumT1[["IQR"]], sumT2[["IQR"]]),
Range = c(sumR[["Range"]], sumT1[["Range"]], sumT2[["Range"]]))
# we need suppressWarnings() if data are identical
# (the CI cannot be computed)
<- suppressWarnings(
wt1 wilcox_test(tmax ~ treatment, data = comp1,
distribution = "exact", conf.int = TRUE,
conf.level = 0.90))
<- suppressWarnings(
wt2 wilcox_test(tmax ~ treatment, data = comp2,
distribution = "exact", conf.int = TRUE,
conf.level = 0.90))
<- c(pvalue(wt1), pvalue(wt2))
p <- c(suppressWarnings(confint(wt1)[[2]]),
PE suppressWarnings(confint(wt2)[[2]]))
<- c(suppressWarnings(confint(wt1)$conf.int[1]),
lower suppressWarnings(confint(wt2)$conf.int[1]))
<- c(suppressWarnings(confint(wt1)$conf.int[2]),
upper suppressWarnings(confint(wt2)$conf.int[2]))
<- data.frame(comparison = c("T1 vs R", "T2 vs R"), p.value = p,
wilc PE = PE, lower.CL = lower, upper.CL = upper)
2:5] <- signif(wilc[, 2:5], 4)
wilc[, print(res, row.names = FALSE)
print(wilc, row.names = FALSE)
```

```
# treatment Median HL.est IQR Range
# R 1 1 0 1
# T1 1 1 0 2
# T2 1 1 0 0
# comparison p.value PE lower.CL upper.CL
# T1 vs R 1 0 0 0
# T2 vs R 1 0 NA NA
```

Not only ‘apparently’ the medians and Hodges–Lehmann estimates are identical. The IQRs are funny.

Are the ranges ‘apparently’ different? I would say so. Is
T_{1} *inferior* to R due to its wider range? Is
T_{2} *superior* to R due to its narrower range?
IMHO, such a comparision
doesn’t make sense.

No significant differences are detected with a nonparametric test.
Note that in the second comparison the confidence interval cannot be
computed because all values of T_{2} are identical.

Interested in a more ‘realistic’ example? 48 subjects, sampling every
20 minutes between 40 and 100 minutes post dose. As above R
‘contaminated’ with one \(\small{t_\text{max}}\) value of two hours
and T_{1} with one of three hours.

```
library(survival)
library(coin)
<- function(x) {
sum.np # Nonparametric summary:
# Remove Mean but keep eventual NAs, add IQR and Range
<- summary(x)
y if (length(y) == 6) {
<- y[c(1:3, 5:6)]
y else {
}<- y[c(1:3, 5:7)]
y
}<- c(names(y), "IQR", "Range")
names length(y) + 1] <- y[["3rd Qu."]] - y[["1st Qu."]]
y[length(y) + 1] <- y[["Max."]] - y[["Min."]]
y[<- setNames(as.vector(y), names)
y return(y)
}<- function(x, na.action = na.omit) {
HL # Hodges-Lehmann estimator of x
if (!is.vector(x) || !is.numeric(x))
stop("'x' must be a numeric vector!")
<- na.action(x)
x <- outer(x, x, '+')
y return(median(y[row(y) >= col(y)]) / 2)
}<- function(x, y) {
roundClosest # round x to the closest multiple of y
return(y * round(x / y))
}set.seed(1234567) # for reproducibility
<- 48
n <- 40/60
lo <- 100/60
hi <- 20/60
y <- runif(n - 1, min = lo, max = hi)
x <- roundClosest(x, y)
x <- c(x, 2)
R <- runif(n - 1, min = lo, max = hi)
x <- roundClosest(x, y)
x <- c(x, 3)
T1 <- runif(n - 1, min = lo, max = hi)
x <- roundClosest(x, y)
x <- c(x, 1)
T2 <- sum.np(R)
sumR <- sum.np(T1)
sumT1 <- sum.np(T2)
sumT2 <- c(HL(R), HL(T1), HL(T2))
hl <- data.frame(tmax = c(R, T1),
comp1 treatment = factor(rep(c("R", "T1"), c(n, n))))
<- data.frame(tmax = c(R, T2),
comp2 treatment = factor(rep(c("R", "T2"), c(n, n))))
<- data.frame(treatment = c("R", "T1", "T2"),
res Median = c(sumR[["Median"]], sumT1[["Median"]], sumT2[["Median"]]),
HL.est = hl,
IQR = c(sumR[["IQR"]], sumT1[["IQR"]], sumT2[["IQR"]]),
Min = c(sumR[["Min."]], sumT1[["Min."]], sumT2[["Min."]]),
Max = c(sumR[["Max."]], sumT1[["Max."]], sumT2[["Max."]]),
Range = c(sumR[["Range"]], sumT1[["Range"]], sumT2[["Range"]]))
2:7] <- signif(res[, 2:7], 4)
res[, <- wilcox_test(tmax ~ treatment, data = comp1,
wt1 distribution = "exact", conf.int = TRUE,
conf.level = 0.90)
<- wilcox_test(tmax ~ treatment, data = comp2,
wt2 distribution = "exact", conf.int = TRUE,
conf.level = 0.90)
<- c(pvalue(wt1), pvalue(wt2))
p <- c(confint(wt1)[[2]], confint(wt2)[[2]])
PE <- c(confint(wt1)$conf.int[1], confint(wt2)$conf.int[1])
lower <- c(confint(wt1)$conf.int[2], confint(wt2)$conf.int[2])
upper <- data.frame(comparison = c("T1 vs R", "T2 vs R"), p.value = p,
wilc PE = PE, lower.CL = lower, upper.CL = upper)
2:5] <- signif(wilc[, 2:5], 4)
wilc[, # box plots
<- 1.04 * c(0, max(c(R, T1, T2), na.rm = TRUE))
ylim dev.new(width = 4.6, height = 4.6)
<- par(no.readonly = TRUE)
op par(mar = c(2.1, 4.1, 0, 0), cex.axis = 0.9)
plot(c(0.5, 3.5), c(0, 0), type = "n", axes = FALSE,
ylim = ylim, xlab = "", ylab = expression(italic(t)[max]))
axis(1, at = 1:3, labels = c("R", expression(T[1]), expression(T[2])), tick = FALSE)
axis(2, las = 1)
boxplot(tmax ~ treatment, data = comp1, add = TRUE, at = 1:2, boxwex = 0.4,
ylim = ylim, names = "", whisklty = 1, medcol = "mediumblue", main = "",
axes = FALSE, col = "lightblue1", outcol = "red")
boxplot(tmax ~ treatment, data = comp2, add = TRUE, at = c(1, 3), boxwex = 0.4,
ylim = ylim, names = "", ylab = "", whisklty = 1, medcol = "mediumblue",
main = "", axes = FALSE, col = "lightblue1", outcol = "red")
points(1:3, hl, col = "blue", pch = 19, cex = 1.15)
par(op)
print(res, row.names = FALSE)
print(wilc, row.names = FALSE)
```

```
# treatment Median HL.est IQR Min Max Range
# R 1.333 1.167 0.3333 0.6667 2.000 1.333
# T1 1.000 1.167 0.3333 0.6667 3.000 2.333
# T2 1.333 1.167 0.3333 0.6667 1.667 1.000
# comparison p.value PE lower.CL upper.CL
# T1 vs R 0.1378 0 0 0.3333
# T2 vs R 0.4410 0 0 0.3333
```

Is the median of T_{1} with 60 minutes ‘apparently’ different
from the one of R with 75 minutes? If we would assess the median for a
±20 % difference,^{11} ^{12} ^{13}
T_{1} would fail with its –25 %. Note that the Hodges-Lehmann
estimates of all three treatments are identical.

Since we contamined the data of R and T_{1}, all ranges are
different. To be expected with the breakdown point of 0. The
IQRs are not affected by 2 % of
contaminated data due to the breakdown point of 0.25.

With a nonparametric test no significant differences (*p*
> 0.05) are detected; the confidence intervals include zero.

What might an almighty assessor conclude by looking at the medians or ranges?

Are they ‘apparently’ different?

I selected a simple one. An immediate release formulation, one
compartment model, PK-parameters:
\(\small{D=100}\), fraction absorbed
\(\small{f=0.80}\), volume of
distribution \(\small{V=4}\),
absorption rate constant \(\small{k_{\,01}=2\,\textrm{h}^{-1}}\)
(\(\small{t_{1/2}\approx0.35\,\text{h}}\)),
lag time \(\small{t_\textrm{lag}=5\,\textrm{min}}\),
elimination rate constant \(\small{k_{\,10}=0.25\,\textrm{h}^{-1}}\)
(\(\small{t_{1/2}\approx2.77\,\text{h}}\)).

The sampling schedule was every 15 minutes until 2.5 hours, 3.25, 4.25,
5.5, 7, 9.25, and 12 hours in order to get a reliable estimate of \(\small{t_\text{max}}\) (the theoretical
value based on the model is 1.272 hours).

Error distributions were uniform
for \(\small{f}\) (0.6–1), log-normal
for \(\small{V}\) (*CV* 50 %),
\(\small{k_{\,01}}\) (*CV*
35 %), \(\small{k_{\,10}}\)
(*CV* 40 %), and truncated
normal for \(\small{t_\textrm{lag}}\) (limits 0–15 min).
Analytical error was log-normal with a *CV* of 5 % of the
simulated concentration.

25 simulated studies with 48 subjects each. The LLOQ was set to 5 % of the theoretical \(\small{C_\text{max}}\).

I split the data of the studies in halves (mimicking a parallel
design with 24 subjects / treatment arm) and compared the medians,
interquartile ranges, and ranges. Furthermore, I compared the \(\small{t_\text{max}}\) values by the
Mann–Whitney *U* (aka
Wilcoxon-Mann-Whitney) test.^{36} Cave: 192
LOC.

```
library(truncnorm)
library(reshape)
library(survival)
library(coin)
library(lattice)
<- function(x) {
sum.np # Nonparametric summary:
# Remove Mean but keep eventual NAs, add IQR and Range
<- summary(x)
y if (length(y) == 6) {
<- y[c(1:3, 5:6)]
y else {
}<- y[c(1:3, 5:7)]
y
}<- c(names(y), "IQR", "Range")
names length(y) + 1] <- y[["3rd Qu."]] - y[["1st Qu."]]
y[length(y) + 1] <- y[["Max."]] - y[["Min."]]
y[<- setNames(as.vector(y), names)
y return(y)
}<- function(x, na.action = na.omit) {
HL # Hodges-Lehmann estimator of x
if (!is.vector(x) || !is.numeric(x))
stop("'x' must be a numeric vector!")
<- na.action(x)
x <- outer(x, x, '+')
y return(median(y[row(y) >= col(y)]) / 2)
}<- function(x, D = D, F = F.d, V = V.d,
Ct k01 = k01.d, k10 = k10.d, tlag = tlag.d) {
# one-compartment model with a lag time
<- F.d * D * k01 / (V.d * (k01 - k10.d)) *
C exp(-k10 *(x - tlag)) - exp(-k01 * (x - tlag)))
(< 0] <- 0
C[C return(C)
}set.seed(123456)
<- 20/60 # clinically relevant difference (20 minutes)
delta <- 0.25 # early interval (up to ~2× theoretical tmax)
spl <- cumsum(rep(spl, 10))
early <- c(0, early, 3.25, 4.25, 5.5, 7, 9.25, 12)
t <- 25 # number of studies
studies <- 48 # number of subjects / study (must be even)
n <- 100 # dose
D # Index ".d" denotes theoretical value, ".c" its CV
<- 0.8 # fraction absorbed
F.d <- 0.6 # lower limit
F.l <- 1 # upper limit
F.u <- 4 # volume of distribution
V.d <- 0.50 # CV 50%, lognormal
V.c <- 2 # absorption rate constant
k01.d <- 0.35 # CV 35%, lognormal
k01.c <- 0.25 # elimination rate constant
k10.d <- 0.40 # CV 40%, lognormal
k10.c <- 5/60 # lag time
tlag.d <- 0 # lower truncation 1
tlag.l <- 0.25 # upper truncation 1
tlag.u <- 0.5 # CV 50%, truncated normal
tlag.c <- 0.05 # analytical error CV 5%, lognormal
AErr <- 0.05 # fraction of theoretical Cmax
LLOQ.f # theoretical profile
<- seq(min(t), max(t), length.out=2500)
x <- Ct(x = x, D = D, F = F.d, V = V.d,
C.th k01 = k01.d, k10 = k10.d, tlag = tlag.d)
<- log(k01.d / k10.d) / (k01.d - k10.d) + tlag.d
tmax <- Ct(x = tmax, D = D, F = F.d, V = V.d,
Cmax k01 = k01.d, k10 = k10.d, tlag = tlag.d)
<- LLOQ.f * Cmax
LLOQ <- data.frame(study = rep(1:studies, each = n * length(t)),
data subject = rep(1:n, each = length(t)),
t = t, C = NA)
for (study in 1:studies) {
# individual PK parameters
<- runif(n = n, min = F.l, max = F.u)
F <- rlnorm(n = n, meanlog = log(V.d) - 0.5 * log(V.c^2 + 1),
V sdlog = sqrt(log(V.c^2 + 1)))
<- rlnorm(n = n, meanlog = log(k01.d) - 0.5 * log(k01.c^2 + 1),
k01 sdlog = sqrt(log(k01.c^2 + 1)))
<- rlnorm(n = n, meanlog = log(k10.d) - 0.5 * log(k10.c^2 + 1),
k10 sdlog = sqrt(log(k10.c^2 + 1)))
<- rtruncnorm(n = n, a = tlag.l, b = tlag.u,
tlag mean = tlag.d, sd = tlag.c)
for (subject in 1:n) {
# individual profiles
<- Ct(x = t, D = D, F = F[subject], V = V[subject],
C k01 = k01[subject], k10 = k10[subject],
tlag = tlag[subject])
for (k in 1:length(t)) {# analytical error (multiplicative)
if (k == 1) {
<- rnorm(n = 1, mean = 0, sd = abs(C[k] * AErr))
AErr1 else {
}<- c(AErr1, rnorm(n = 1, mean = 0, sd = abs(C[k] * AErr)))
AErr1
}
}<- C + AErr1 # add analytical error
C < LLOQ] <- NA # assign NAs to Cs below LLOQ
C[C $C[data$study == study & data$subject == subject] <- C
data
}
}# simple NCA
<- data.frame(study = rep(1:studies, each = n),
res subject = rep(1:n, studies),
tlag = NA, tmax = NA, Cmax = NA)
<- 0
i for (j in 1:studies) {
for (k in 1:n) {
<- i + 1
i <- data$t[data$study == j & data$subject == k]
tmp.t <- data$C[data$study == j & data$subject == k]
tmp.C $tlag[i] <- tmp.t[which(tmp.t == t[!is.na(tmp.C)][1])+1]
res$Cmax[i] <- max(tmp.C, na.rm = TRUE)
res$tmax[i] <- tmp.t[which(tmp.C == res$Cmax[i])]
res
}
}<- data.frame(study = 1:studies, med.T = NA, med.R = NA, ratio = NA,
comp IQR.T = NA, IQR.R = NA, rg.T = NA, rg.R = NA,
CL.lo = NA, CL.hi = NA, pass.ratio = FALSE,
pass.CI = FALSE, hl.T = NA, hl.R = NA)
for (i in 1:studies) {
<- res[res$study == i, ]
study <- sum.np(study$tmax[study$subject == 1:(n/2)])
sum.T <- sum.np(study$tmax[study$subject == (n/2+1):n])
sum.R $med.T[i] <- sum.T[["Median"]]
comp$med.R[i] <- sum.R[["Median"]]
comp$ratio[i] <- signif(comp$med.T[i] / comp$med.R[i], 4)
comp$IQR.T[i] <- sum.T[["IQR"]]
comp$IQR.R[i] <- sum.R[["IQR"]]
comp$rg.T[i] <- sum.T[["Range"]]
comp$rg.R[i] <- sum.R[["Range"]]
compif (comp$ratio[i] >= 0.8 & comp$ratio[i] <= 1.2) comp$pass.ratio[i] <- TRUE
<- data.frame(tmax = study$tmax,
stud.tmax part = factor(rep(c("R", "T"), c(n/2, n/2))))
<- wilcox_test(tmax ~ part, data = stud.tmax,
wt distribution = "exact", conf.int = TRUE,
conf.level = 0.90)
$CL.lo[i] <- confint(wt)$conf.int[1]
comp$CL.hi[i] <- confint(wt)$conf.int[2]
compif (abs(comp$CL.lo[i]) <= delta & abs(comp$CL.lo[i]) <= delta)
$pass.CI[i] <- TRUE
comp$hl.T[i] <- HL(study$tmax[study$subject == 1:(n/2)])
comp$hl.R[i] <- HL(study$tmax[study$subject == (n/2+1):n])
comp
}$CL.lo <- sprintf("%+.2f", comp$CL.lo)
comp$CL.hi <- sprintf("%+.2f", comp$CL.hi)
compfor (i in 1:studies) {# cosmetics
if (comp$CL.lo[i] == "+0.00") comp$CL.lo[i] <- "\u00B10.00"
if (comp$CL.hi[i] == "+0.00") comp$CL.hi[i] <- "\u00B10.00"
}<- sprintf("%+.0f min", 60 * range(comp$med.T - comp$med.R, na.rm = TRUE))
diff.med <- sprintf("%.4f", range(comp$ratio))
diff.rat <- sprintf("%+.0f min", 60 * range(comp$IQR.T - comp$IQR.R, na.rm = TRUE))
diff.iqr <- sprintf("%+.0f min", 60 * range(comp$rg.T - comp$rg.R, na.rm = TRUE))
diff.rge <- paste("\nDifferences of medians:", paste(diff.med, collapse = " to "),
txt "\nMedian ratios : ", paste(diff.rat, collapse = " to "),
"\nDifferences of IQRs :", paste(diff.iqr, collapse = " to "),
"\nDifferences of ranges :", paste(diff.rge, collapse = " to "),
"\nPassed CI inclusion :", sprintf("%6.2f%%",
100 * sum(comp$pass.CI) /
studies),"\nPassed \u00B120% difference:", sprintf("%6.2f%%",
100 * sum(comp$pass.ratio) /
studies))print(comp[, 1:10], row.names = FALSE)
cat(txt)
<- as.data.frame(
prep melt(res, id = c("study", "subject"), measure = "tmax"))
for (i in 1:nrow(prep)) {
if (prep$subject[i] %in% 1:(n/2)) {
$group[i] <- "Test"
prepelse {
}$group[i] <- "Reference"
prep
}
}$study <- as.factor(prep$study)
prep$subject <- as.factor(prep$subject)
prep$group <- as.factor(prep$group)
prep# box plots
dev.new(width = 6, height = 6)
<- par(no.readonly = TRUE)
op trellis.par.set(box.umbrella = list(lty = 1))
trellis.par.set(box.dot = list(cex = 0.82, col = "#0000BB"))
bwplot(value ~ study | variable*group, data = prep, group = group,
xlab = "study", ylab = "hours", ylim = 1.04 * c(0, max(res$tmax)),
panel=function(...) {
panel.bwplot(...)
panel.points(x = c(comp$hl.T, comp$hl.R)[1:n],
pch = 21, cex = 0.6,
col = "red", fill = "#BB000080")
}
)par(op)
```

```
# study med.T med.R ratio IQR.T IQR.R rg.T rg.R CL.lo CL.hi
# 1 1.500 1.500 1.0000 0.5000 0.5625 1.50 1.50 -0.25 +0.25
# 2 1.500 1.375 1.0910 0.3125 0.3125 1.00 1.50 -0.25 +0.25
# 3 1.250 1.500 0.8333 0.2500 0.5000 1.50 1.25 -0.25 ±0.00
# 4 1.500 1.500 1.0000 0.3125 0.5000 1.25 1.75 -0.25 +0.25
# 5 1.500 1.250 1.2000 0.5000 0.7500 1.75 1.50 ±0.00 +0.50
# 6 1.375 1.500 0.9167 0.5625 0.5000 1.50 1.50 -0.25 ±0.00
# 7 1.625 1.500 1.0830 1.0000 0.6250 2.50 1.50 -0.25 +0.25
# 8 1.500 1.250 1.2000 0.5000 0.5625 1.25 1.50 -0.25 +0.25
# 9 1.250 1.375 0.9091 0.5625 0.7500 2.00 1.50 -0.25 ±0.00
# 10 1.500 1.500 1.0000 0.5000 0.5000 1.25 1.25 ±0.00 +0.25
# 11 1.250 1.500 0.8333 0.5000 0.5625 1.75 1.25 -0.50 ±0.00
# 12 1.500 1.500 1.0000 0.5625 0.7500 1.50 1.25 ±0.00 +0.50
# 13 1.250 1.750 0.7143 0.3125 0.7500 1.50 1.50 -0.50 ±0.00
# 14 1.250 1.750 0.7143 0.5000 0.8125 1.50 1.25 -0.50 ±0.00
# 15 1.500 1.250 1.2000 0.7500 0.2500 1.50 1.25 ±0.00 +0.50
# 16 1.500 1.500 1.0000 0.5000 0.5625 2.25 1.50 -0.25 +0.25
# 17 1.500 1.375 1.0910 0.5000 0.5000 1.75 1.50 -0.25 +0.25
# 18 1.250 1.500 0.8333 0.5625 0.5000 1.75 1.50 -0.25 ±0.00
# 19 1.750 1.500 1.1670 0.7500 0.5625 1.00 1.50 ±0.00 +0.25
# 20 1.500 1.500 1.0000 0.2500 0.5000 1.75 1.75 -0.25 ±0.00
# 21 1.250 1.250 1.0000 0.5000 0.2500 2.50 1.50 ±0.00 +0.25
# 22 1.500 1.375 1.0910 0.7500 0.5000 1.75 1.50 -0.25 +0.25
# 23 1.375 1.500 0.9167 0.7500 0.5625 1.25 1.75 -0.25 ±0.00
# 24 1.250 1.375 0.9091 0.2500 0.3125 1.50 1.25 -0.25 +0.25
# 25 1.500 1.500 1.0000 0.5625 0.5000 2.50 1.50 -0.25 +0.25
#
# Differences of medians: -30 min to +15 min
# Median ratios : 0.7143 to 1.2000
# Differences of IQRs : -26 min to +30 min
# Differences of ranges : -30 min to +60 min
# Passed CI inclusion : 88.00%
# Passed ±20% difference: 92.00%
```

Note that in three cases \(\small{t_\text{max}}\) was observed at 3.25 hours because the tight sampling ended at 2.5 hours. Try to modify the script with

`early <- cumsum(rep(spl, 12))`

More studies woul pass because we get more accurate estimates.

In a real study – if \(\small{t_\text{max}}\) is a clinically relevant PK metric – one would have to pre-specify the acceptance limits and assess whether the ≈90% CI lies entirely within. For an analgesic for rapid relief of pain it might be as short as ±20 minutes. If this would have been the case in the example above, four studies would fail because the CI is not entirely with the acceptance range. Here the ±20 % criterion is more permissive; only two studies would fail.

However, if we follow the current guidelines we are left in the dark.

- Is a difference in medians of –30 minutes or +15 minutes
‘apparent’ ?

No idea. - Even worse the interquartile ranges. Here we have values from –26 to
+30 minutes.

Are they ‘apparently’ different? - Let’s forget the range, please.

What might –30 minutes to +1 hour tell us?

Let’s evaluate the simulated studies by the nonparametric approach
proposed by Basson *et al.*^{31}
(fitting a generalized
linear model with a Poisson error
distribution).

```
# requires the result of the simulation script
<- 0.05
alpha <- prep[, c(1, 4:5)]
df names(df)[2:3] <- c("tmax", "treatment")
<- data.frame(study = 1:studies, Estim.T = NA, Estim.R = NA,
comp1 delta = NA, p.value = NA, pass = FALSE)
# convert tmax to counts
$count <- as.integer(df$tmax / spl)
dffor (study in 1:studies) {
<- df[df$study == study, ]
tmp <- glm(count ~ treatment, data = tmp, family = "poisson")
m <- summary(m)
smry # estimated counts in log-scale
<- smry$coeff[["(Intercept)", "Estimate"]] +
T $coeff[["treatmentTest", "Estimate"]]
smry<- smry$coeff[["(Intercept)", "Estimate"]] -
R $coeff[["treatmentTest", "Estimate"]]
smry# back-convert estimated counts to tmax
$Estim.T[study] <- exp(T) * spl
comp1$Estim.R[study] <- exp(R) * spl
comp1$delta[study] <- comp1$Estim.T[study] - comp1$Estim.R[study]
comp1$p.value[study] <- smry$coeff[["treatmentTest", "Pr(>|z|)"]]
comp1if (comp1$p.value[study] >= alpha) comp1$pass[study] <- TRUE
}c(1:2, 5)] <- round(comp1[, c(1:2, 5)], 5)
comp1[, 4] <- sprintf("%+.5f", comp1[, 4])
comp1[, for (i in 1:studies) {# cosmetics
if (comp1[i, 4] == "-0.00000" | comp1[i, 4] == "+0.00000")
4] <- "\u00B10.00000"
comp1[i,
}print(comp1, row.names = FALSE)
```

```
# study Estim.T Estim.R delta p.value pass
# 1 1.52083 1.562785 -0.04195 0.90714 TRUE
# 2 1.41667 1.522748 -0.10608 0.76387 TRUE
# 3 1.36458 1.694975 -0.33039 0.36768 TRUE
# 4 1.47917 1.479167 ±0.00000 1.00000 TRUE
# 5 1.59375 1.222495 +0.37126 0.26241 TRUE
# 6 1.41667 1.655101 -0.23843 0.51329 TRUE
# 7 1.62500 1.442909 +0.18209 0.60518 TRUE
# 8 1.41667 1.395910 +0.02076 0.95156 TRUE
# 9 1.37500 1.546717 -0.17172 0.62768 TRUE
# 10 1.51042 1.388003 +0.12241 0.72183 TRUE
# 11 1.34375 1.890262 -0.54651 0.15345 TRUE
# 12 1.59375 1.204316 +0.38943 0.23734 TRUE
# 13 1.33333 2.083333 -0.75000 0.05988 TRUE
# 14 1.35417 1.661538 -0.30737 0.39789 TRUE
# 15 1.60417 1.251082 +0.35308 0.29082 TRUE
# 16 1.56250 1.646944 -0.08444 0.81855 TRUE
# 17 1.48958 1.327579 +0.16200 0.63141 TRUE
# 18 1.38542 1.491541 -0.10612 0.76135 TRUE
# 19 1.58333 1.343202 +0.24013 0.48265 TRUE
# 20 1.44792 1.641487 -0.19357 0.59530 TRUE
# 21 1.44792 1.286046 +0.16187 0.62640 TRUE
# 22 1.53125 1.388889 +0.14236 0.67949 TRUE
# 23 1.43750 1.653382 -0.21588 0.55439 TRUE
# 24 1.43750 1.375679 +0.06182 0.85592 TRUE
# 25 1.60417 1.521916 +0.08225 0.81855 TRUE
```

In none of the studies a significant difference was detected at the
0.05 level and hence, all would pass. Study 13 showed the largest
difference of \(\small{t_\text{max}}\)
values and with 0.05988 the lowest *p*-value. As we have seen above, it failed the ±20 % criterion as well as
the CI inclusion approach.

Interlude: Exploring the ±20 % criterion

Let’s dive deeper into the murky waters.^{11} ^{12} ^{13}

One compartment model,
PK-parameters: \(\small{D=100}\), fraction absorbed \(\small{f=0.80}\), volume of distribution
\(\small{V=4}\), elimination half life
4 hours. Three formulations: A (‘fast’ test) and B (‘slow’ test)
compared to R (reference). The absorption rate constants are optimized
in such a way that \(\small{t_\text{max(A)}=0.8\,\text{h}}\),
\(\small{t_\text{max(B)}=1.2\,\text{h}}\),
and \(\small{t_\text{max(R)}=1\,\text{h}}\). The
sampling schedule was every 5 (five!) minutes until two hours
(*i.e.*, \(\small{2\times
t_\text{max(R)}}\)), 2.25, 2.5, 3, 3.5, 4, 6, 9, 12, and 16 hours
(34 time points). Error distributions were uniform
for \(\small{f}\) (0.6–1), log-normal
for \(\small{V}\) (*CV* 50 %),
\(\small{k_{\,01}}\)
(*CV* 35 %), and \(\small{k_{\,10}}\) (*CV* 40 %).
Distribution of the analytical error was normal with
a *CV* of 5 % of the simulated concentration. The
LLOQ was set to 5 %
of \(\small{C_\text{max(R)}}\).

Simulated parallel designs with 24 subjects in each of the three
treatment arms. Although the \(\small{t_\text{max}\textsf{-}}\)values of
treatments A (fast) and B (slow) are at the borders of the ±20 %
criterion, this is __not__ a statistical test and hence, it is
impossible to explore the empiric Type I Error. At least the number of
passing studies divided by the number of simulations should give us an
impression what might happen. Naïvely we expect that 50 % would pass,
right?

There should be an equal chance that \(\small{\widetilde{t}_\text{max(A)}<0.8\times\widetilde{t}_\text{max(R)}}\)
and \(\small{\widetilde{t}_\text{max(A)}\geq0.8\times\widetilde{t}_\text{max(R)}}\)
as well as \(\small{\widetilde{t}_\text{max(B)}\leq1.2\times\widetilde{t}_\text{max(R)}}\)
and \(\small{\widetilde{t}_\text{max(B)}>1.2\times\widetilde{t}_\text{max(R)}}\).

Contrary to this *hocuspocus* we can perform the Mann–Whitney
*U* (aka
Wilcoxon-Mann-Whitney) test at level \(\small{\alpha}\) and assess the studies
based on the \(\small{100\,(1-2\,\alpha)}\) confidence
inclusion approach \[\begin{array}{c}
\theta_1=-\Delta\:\small{\textsf{and}}\:\theta_2=+\Delta\\
H_0:\mu_\textrm{T}-\mu_\textrm{R}\not\subset\left\{\theta_1,\,\theta_2\right\}\;vs\;H_1:\theta_1<\mu_\textrm{T}-\mu_\textrm{R}<\theta_2
\end{array}\tag{2}\] based on a *pre-specified*
clinically relevant difference \(\small{\Delta}\) (in the example \(\small{\Delta=0.2\,\text{h}}\)).

Since \(\small{\mu_\textrm{T(A)}=\theta_1}\) and
\(\small{\mu_\textrm{T(B)}=\theta_2}\),
we expect for \(\small{\alpha=0.05}\)
that 5 % of both treatments will fail.

```
# Cave: long runtime (1,000 simulations take 30 minutes on my aged machine)
library(moments)
library(survival)
library(coin)
<- function(x, k10.d, tmax) {
opt # absorption rate constant for given elimination at tmax
log(x / k10.d) / (x - k10.d) - tmax
}<- function(D = D, F = F.d, V = V.d, k01 = k01.d, k10 = k10.d, t) {
Ct # one-compartment model, no lag time
if (!isTRUE(all.equal(k01, k10))) {# common: k01 != k10
<- F * D * k01 / (V * (k01 - k10)) * (exp(-k10 * t) - exp(-k01 * t))
C else { # flip-flop PK
}<- k10
k <- F * D / V * k * t * exp(-k * t)
C
}return(C)
}<- FALSE # in your own simulations set to TRUE
progr <- 0.05 # level of the test
alpha <- setNames(c(1, 0.8, 1.2), c("R", "A", "B"))
tmax <- 2500 # number of simulated studies
studies <- 24 # number of subjects in each arm of a study
n <- c(-20, +20) # acceptable differences (%)
delta # limits for the median of test / reference
<- signif((100 + delta) / 100, 9)
limit # limits for the confidence interval inclusion approach
# note: rounding to 9 significant digits because issues with
# numeric precision in the comparison
<- signif(tmax[["A"]] - tmax[["R"]], 9)
theta1 <- signif(tmax[["B"]] - tmax[["R"]], 9)
theta2 <- 100 # dose
D # Index ".d" denotes theoretical value, ".c" its CV
<- 0.8 # fraction absorbed
F.d <- 0.6 # lower limit
F.l <- 1 # upper limit
F.u <- 4 # volume of distribution
V.d <- 0.50 # CV 50% (lognormal)
V.c <- log(2) / 4 # elimination rate constant (equal for T and R)
k10.d <- 0.40 # CV 40% (lognormal)
k10.c # absorption rate constants
<- uniroot(opt, interval = c(0, 24), tol = 1e-8,
k01.dR k10.d = k10.d, tmax = tmax[["R"]])$root # R
<- uniroot(opt, interval = c(0, 24), tol = 1e-8,
k01.dA k10.d = k10.d, tmax = tmax[["A"]])$root # A (fast)
<- uniroot(opt, interval = c(0, 24), tol = 1e-8,
k01.dB k10.d = k10.d, tmax = tmax[["B"]])$root # B (slow)
<- 0.35 # CV 35% (lognormal)
k01.c <- 0.05 # analytical error CV 5% (normal)
AErr <- 0.05 # fraction of theoretical Cmax
LLOQ.f <- Ct(D, F.d, V.d, k01.dR, k10.d, tmax[["R"]])
Cmax.R <- LLOQ.f * Cmax.R
LLOQ <- sort(unique(c(seq(0, 2 * tmax[["R"]], 5 / 60),
t 2.25, 2.5, 3, 3.5, 4, 6, 9, 12, 16)))
<- A <- B <- data.frame(study = rep(1:studies, each = n * length(t)),
R subject = rep(1:n, each = length(t)), t = t, C = NA_real_)
<- res.A <- res.B <- data.frame(study = rep(1:studies, each = n),
res.R subject = rep(1:n, studies),
tmax = NA_real_)
<- data.frame(study = 1:studies, median.R = NA_real_, median.A = NA_real_,
comp median.B = NA_real_, ratio.AR = NA_real_, ratio.BR = NA_real_,
pass.A = FALSE, pass.B = FALSE, pass.CL.A = FALSE,
pass.CL.B = FALSE)
set.seed(123456) # for reproducibility
if (progr) pb <- txtProgressBar(0, 1, 0, width = NA, style = 3)
for (study in 1:studies) {# subject’s PK parameters & profiles
for (subject in 1:n) {
<- runif(n = 1, min = F.l, max = F.u)
F <- rlnorm(n = 1, meanlog = log(V.d) - 0.5 * log(V.c^2 + 1),
V sdlog = sqrt(log(V.c^2 + 1)))
<- rlnorm(n = 1, meanlog = log(k01.dR) - 0.5 * log(k01.c^2 + 1),
k01.Rs sdlog = sqrt(log(k01.c^2 + 1)))
<- rlnorm(n = 1, meanlog = log(k01.dA) - 0.5 * log(k01.c^2 + 1),
k01.As sdlog = sqrt(log(k01.c^2 + 1)))
<- rlnorm(n = 1, meanlog = log(k01.dB) - 0.5 * log(k01.c^2 + 1),
k01.Bs sdlog = sqrt(log(k01.c^2 + 1)))
<- rlnorm(n = 1, meanlog = log(k10.d) - 0.5 * log(k10.c^2 + 1),
k10s sdlog = sqrt(log(k10.c^2 + 1)))
<- Ct(D, F = F, V = V, k01 = k01.Rs, k10 = k10s, t)
C.R <- Ct(D, F = F, V = V, k01 = k01.As, k10 = k10s, t)
C.A <- Ct(D, F = F, V = V, k01 = k01.Bs, k10 = k10s, t)
C.B # add analytical error (normal)
<- C.R + rnorm(n = length(t), mean = 0, sd = abs(C.R * AErr))
C.R <- C.A + rnorm(n = length(t), mean = 0, sd = abs(C.A * AErr))
C.A <- C.B + rnorm(n = length(t), mean = 0, sd = abs(C.B * AErr))
C.B # assign NAs to Cs below LLOQ
< LLOQ] <- NA
C.R[C.R < LLOQ] <- NA
C.A[C.A < LLOQ] <- NA
C.B[C.B # set NAs before tmax to zero
<- t[which(C.R == max(C.R, na.rm = TRUE))]
tmax.R <- t[which(C.A == max(C.A, na.rm = TRUE))]
tmax.A <- t[which(C.B == max(C.B, na.rm = TRUE))]
tmax.B which(t[which(is.na(C.R))] < tmax.R)] <- 0
C.R[which(t[which(is.na(C.A))] < tmax.A)] <- 0
C.A[which(t[which(is.na(C.B))] < tmax.B)] <- 0
C.B[$C[R$study == study & R$subject == subject] <- C.R
R$C[A$study == study & A$subject == subject] <- C.A
A$C[B$study == study & B$subject == subject] <- C.B
B$tmax[res.R$study == study & res.R$subject == subject] <- tmax.R
res.R$tmax[res.A$study == study & res.A$subject == subject] <- tmax.A
res.A$tmax[res.B$study == study & res.B$subject == subject] <- tmax.B
res.B
}$median.R[study] <- median(res.R$tmax[res.R$study == study], na.rm = TRUE)
comp$median.A[study] <- median(res.A$tmax[res.A$study == study], na.rm = TRUE)
comp$median.B[study] <- median(res.B$tmax[res.B$study == study], na.rm = TRUE)
comp$ratio.AR[study] <- comp$median.A[study] / comp$median.R[study]
comp$ratio.BR[study] <- comp$median.B[study] / comp$median.R[study]
compif (comp$ratio.AR[study] >= limit[1] & comp$ratio.AR[study] <= limit[2])
$pass.A[study] <- TRUE
compif (comp$ratio.BR[study] >= limit[1] & comp$ratio.BR[study] <= limit[2])
$pass.B[study] <- TRUE
comp# 90% confidence intervals of the Wilcoxon-Mann-Whitney test
<- res.R[res.R$study == study, "tmax"]
tmax.R.res <- data.frame(tmax = c(res.A[res.A$study == study, "tmax"],
tmax.AR
tmax.R.res),treatment = factor(rep(c("R", "T"), c(n, n))))
<- signif(as.numeric(unlist(
CI.AR confint(wilcox_test(tmax ~ rev(treatment),
data = tmax.AR,
distribution = "exact",
conf.int = TRUE,
conf.level = 1 - 2 * alpha))[1])), 9)
<- data.frame(tmax = c(res.B[res.B$study == study, "tmax"],
tmax.BR
tmax.R.res),treatment = factor(rep(c("T", "R"), c(n, n))))
<- signif(as.numeric(unlist(
CI.BR confint(wilcox_test(tmax ~ rev(treatment),
data = tmax.BR,
distribution = "exact",
conf.int = TRUE,
conf.level = 1 - 2 * alpha))[1])), 9)
if (CI.AR[1] >= theta1 & CI.AR[2] <= theta2) comp$pass.CL.A[study] <- TRUE
if (CI.BR[1] >= theta1 & CI.BR[2] <= theta2) comp$pass.CL.B[study] <- TRUE
if (progr) setTxtProgressBar(pb, study / studies)
# patience please...
} if (progr) close(pb)
cat("Simulation settings",
paste0("\n ", formatC(studies, format = "d", big.mark = ","),
" studies (", n, " subjects / arm)"),
"\n tmax (A = fast) :", sprintf("%.4f h", tmax[["A"]]),
sprintf("(t½,a: %.2f min)", 60 * log(2) / k01.dA),
"\n tmax (B = slow) :", sprintf("%.4f h", tmax[["B"]]),
sprintf("(t½,a: %.2f min)", 60 * log(2) / k01.dB),
"\n tmax (Reference):", sprintf("%.4f h", tmax[["R"]]),
sprintf("(t½,a: %.2f min)", 60 * log(2) / k01.dR),
"\n\nSimulation results",
"\n A = fast",
paste0("\n Range : ",
paste(sprintf("%.4f",
range(res.A$tmax, na.rm = TRUE)), collapse = " h, "), " h"),
paste0("\n Skewness: ", sprintf("%+.4f", skewness(res.A$tmax))),
"\n B = slow",
paste0("\n Range : ",
paste(sprintf("%.4f",
range(res.B$tmax, na.rm = TRUE)), collapse = " h, "), " h"),
paste0("\n Skewness: ", sprintf("%+.4f", skewness(res.B$tmax))),
"\n Reference",
paste0("\n Range : ",
paste(sprintf("%.4f",
range(res.R$tmax, na.rm = TRUE)), collapse = " h, "), " h"),
paste0("\n Skewness: ", sprintf("%+.4f", skewness(res.R$tmax))),
"\n\n A = fast passed ±20% criterion :",
sprintf("%.1f%%", 100 * sum(comp$pass.A) / studies),
"\n B = slow passed ±20% criterion :",
sprintf("%.1f%%", 100 * sum(comp$pass.B) / studies),
"\n A = fast passed nonparametric CI:",
sprintf("%7.4f (empiric Type I Error)", sum(comp$pass.CL.A) / studies),
"\n B = slow passed nonparametric CI:",
sprintf("%7.4f (empiric Type I Error)", sum(comp$pass.CL.B) / studies),
"\n\n Significance limit of the binomial test =",
signif(binom.test(as.integer(alpha * studies), studies,
alternative = "less",
conf.level = 1 - alpha)$conf.int[2], 5), "\n")
```

```
# Simulation settings
# 2,500 studies (24 subjects / arm)
# tmax (A = fast) : 0.8000 h (t½,a: 10.05 min)
# tmax (B = slow) : 1.2000 h (t½,a: 17.74 min)
# tmax (Reference): 1.0000 h (t½,a: 13.69 min)
#
# Simulation results
# A = fast
# Range : 0.1667 h, 3.5000 h
# Skewness: +0.7777
# B = slow
# Range : 0.3333 h, 6.0000 h
# Skewness: +0.7499
# Reference
# Range : 0.2500 h, 4.0000 h
# Skewness: +0.6738
#
# A = fast passed ±20% criterion : 57.9%
# B = slow passed ±20% criterion : 55.0%
# A = fast passed nonparametric CI: 0.0532 (empiric Type I Error)
# B = slow passed nonparametric CI: 0.0368 (empiric Type I Error)
#
# Significance limit of the binomial test = 0.057771
```

Interesting, isn’t it? It confirms that the distributions of \(\small{\widetilde{t}_\text{max}}\) are
skewed to the right.^{32} ^{33} With the ±20 % criterion both formulations
showed a passing-rate significantly larger than the expected 50 % (limit
51.7 %).

The empiric Type I Error of the nonparametric test is not significantly
above the nominal \(\small{\alpha=0.05}\).

If we follow the logic of the guidances,^{11} ^{12} ^{13} the limits based on \(\small{t_\text{max(R)}=1\,\text{h}}\) are
\(\small{\{\theta_1,\theta_2\}=\{0.8\,\text{h},1.2\,\text{h}\}}\).
Such limits would translate into a clincially relevant \(\small{\Delta}\) of only 12 minutes. I
doubt whether that makes sense. What about a painkiller with \(\small{t_\text{max(R)}=30\,\text{min}}\)?
Is a \(\small{\Delta}\) of six minutes
clincially relevant? Which sampling interval would suffice to ‘catch’
\(\small{t_\text{max}}\) – two
minutes?

Do we have a method to estimate the sample size of such a study? Nope. One could only perform simulations, which would require:

- Sufficient information about the PK of the drug and the formulations (absorption rate constant, eventual lag-time) allowing to set up a suitable model.
- Not only the PK parameters themselves but also their variability would be required. A published population PK would come handy.
- An assumed difference in
*t*_{max}. - Exploring different sampling schedules.

Good luck!

In a replicate design in 18 subjects (two administrations of the
German 400 mg IR ibuprofen
reference product three days apart) after the 1^{st}
administration the range of \(\small{t_\text{max}}\) was 0.25 – 4 hours
(*CV* 94.3 %) and after the 2^{nd} 0.5 – 2 hours
(*CV* 62.3 %).^{37} It’s interesting that despite such a
large variability \(\small{\widetilde{t}_\text{max}}\) differed
only by two minutes. In my studies I observed a range of \(\small{t_\text{max}}\) of 0.5 – 4.5 hours
(*CV* 50 %).

Say, we have sufficent information about the
PK as mentioned above, *e.g.*, \(\small{t_{1/2}=1.93\,\text{h}}\), \(\small{t_\text{max(R)}=45\,\text{min}}\).
Let’s assume that the test is ten minutes faster than the reference.
Sampling every five minutes up to 1.5 hours (*i.e.*, \(\small{2\times t_\text{max(R)}}\)), 1.75,
2, 2.25, 2.5, 3, 3.5, 4, 4.5, 5, 6, 8, and 12 hours. Parallel design and
we target power ≥ 80 %. In the
CI inclusion approach we
specify \(\small{\{\theta_1,\theta_2\}=\{\mp20\,\text{min}\}}\).

For \(\small{C_\text{max}}\) a
*CV*_{intra} of 22.7 % and a *CV*_{inter}
of 27.4 % was reported.^{37} With a total
(pooled) *CV* of 25.4 % in a parallel design assuming a T/R-ratio
of 0.95 we would need 56 subjects (80.4 % power) and for a T/R-ratio of
0.90 we would need 114 (80.1 % power) to demonstrate
BE for this metric.

```
# Due to the large sample size wilcox_test() is extremely slow
library(moments)
library(survival)
library(coin)
<- function(x, k10.d, tmax) {
opt # absorption rate constant for given elimination at tmax
log(x / k10.d) / (x - k10.d) - tmax
}<- function(D = D, F = F.d, V = V.d, k01 = k01.d, k10 = k10.d, t) {
Ct # one-compartment model, no lag time
if (!isTRUE(all.equal(k01, k10))) {# common: k01 != k10
if (k10 > k01) {# absorption faster
<- F * D * k01 / (V * (k01 - k10)) *
C exp(-k10 * t) - exp(-k01 * t))
(else { # elimination faster
}<- F * D * k10 / (V * (k10 - k01)) *
C exp(-k01 * t) - exp(-k10 * t))
(
}else { # flip-flop PK
}<- k10
k <- F * D / V * k * t * exp(-k * t)
C
}return(C)
}<- FALSE # in your own simulations set to TRUE
progr <- 0.05 # level of the test
alpha # ibuprofen
<- setNames(c(0.75, 0.75 - 10 / 60), c("R", "T"))
tmax <- 2500 # number of simulated studies
studies <- 28 # number of subjects in each arm of a study
n <- c(-20, +20) # acceptable differences (%)
delta # limits for median (T) - median (R)
<- signif((100 + delta) / 100, 9)
limit # limits for the CI inclusion approach (±20 minutes in hours)
<- signif(-20 / 60, 9)
theta1 <- signif(+20 / 60, 9)
theta2 <- 400e3 # dose (µg)
D # Index ".d" denotes theoretical value, ".c" its CV
<- 0.9 # fraction absorbed
F.d <- 0.8 # lower limit
F.l <- 1 # upper limit
F.u <- 500 # volume of distribution (mL)
V.d <- 0.25 # CV 25% (lognormal)
V.c <- log(2) / 1.93 # elimination rate constant (geom. mean of the paper)
k10.d <- 0.30 # CV 30% (lognormal)
k10.c # absorption rate constant
<- uniroot(opt, interval = c(0, 24), tol = 1e-8,
k01.dR k10.d = k10.d, tmax = tmax[["R"]])$root
<- uniroot(opt, interval = c(0, 24), tol = 1e-8,
k01.dT k10.d = k10.d, tmax = tmax[["T"]])$root
<- 0.35 # CV 35% (lognormal)
k01.c <- 0.05 # analytical error CV 5% (normal)
AErr <- 0.05 # fraction of theoretical Cmax
LLOQ.f <- Ct(D, F.d, V.d, k01.dR, k10.d, tmax[["R"]]) # µg/mL
Cmax.R <- LLOQ.f * Cmax.R
LLOQ <- sort(unique(c(seq(0, 2 * tmax[["R"]], 5 / 60),
t 2.25, 2.5, 3, 3.5, 4, 4.5, 5, 6, 8, 12)))
<- T <- data.frame(study = rep(1:studies, each = n * length(t)),
R subject = rep(1:n, each = length(t)), t = t,
C = NA_real_)
<- res.T <- data.frame(study = rep(1:studies, each = n),
res.R subject = rep(1:n, studies), tmax = NA_real_)
<- data.frame(study = 1:studies, median.R = NA_real_, median.T = NA_real_,
comp ratio = NA_real_, pass = FALSE, pass.CL = FALSE)
set.seed(123456) # for reproducibility
if (progr) pb <- txtProgressBar(0, 1, 0, width = NA, style = 3)
for (study in 1:studies) {# subject’s PK parameters & profiles
<- runif(n = n, min = F.l, max = F.u)
F <- rlnorm(n = n, meanlog = log(V.d) - 0.5 * log(V.c^2 + 1),
V sdlog = sqrt(log(V.c^2 + 1)))
<- rlnorm(n = n, meanlog = log(k01.dR) - 0.5 * log(k01.c^2 + 1),
k01.Rs sdlog = sqrt(log(k01.c^2 + 1)))
<- rlnorm(n = n, meanlog = log(k01.dT) - 0.5 * log(k01.c^2 + 1),
k01.Ts sdlog = sqrt(log(k01.c^2 + 1)))
<- rlnorm(n = n, meanlog = log(k10.d) - 0.5 * log(k10.c^2 + 1),
k10s sdlog = sqrt(log(k10.c^2 + 1)))
for (subject in 1:n) {
<- Ct(D, F = F[subject], V = V[subject],
C.R k01 = k01.Rs[subject], k10 = k10s[subject], t)
<- Ct(D, F = F[subject], V = V[subject],
C.T k01 = k01.Ts[subject], k10 = k10s[subject], t)
# add analytical error (normal)
<- C.R + rnorm(n = length(t), mean = 0, sd = abs(C.R * AErr))
C.R <- C.T + rnorm(n = length(t), mean = 0, sd = abs(C.T * AErr))
C.T # assign NAs to Cs below LLOQ
< LLOQ] <- NA
C.R[C.R < LLOQ] <- NA
C.T[C.T # set NAs before tmax to zero
<- t[which(C.R == max(C.R, na.rm = TRUE))]
tmax.R <- t[which(C.T == max(C.T, na.rm = TRUE))]
tmax.T which(t[which(is.na(C.R))] < tmax.R)] <- 0
C.R[which(t[which(is.na(C.T))] < tmax.T)] <- 0
C.T[$C[R$study == study & R$subject == subject] <- C.R
R$C[T$study == study & T$subject == subject] <- C.T
T$tmax[res.R$study == study & res.R$subject == subject] <- tmax.R
res.R$tmax[res.T$study == study & res.T$subject == subject] <- tmax.T
res.T
}$median.R[study] <- median(res.R$tmax[res.R$study == study], na.rm = TRUE)
comp$median.T[study] <- median(res.T$tmax[res.T$study == study], na.rm = TRUE)
comp$ratio[study] <- comp$median.T[study] / comp$median.R[study]
compif (comp$ratio[study] >= limit[1] & comp$ratio[study] <= limit[2])
$pass[study] <- TRUE
comp# 90% confidence interval of the Wilcoxon-Mann-Whitney test
<- data.frame(tmax = c(res.T$tmax[res.T$study == study],
tmax.TR $tmax[res.R$study == study]),
res.Rtreatment = factor(rep(c("T", "R"),
c(n, n))))
<- signif(as.numeric(unlist(
CI.TR confint(wilcox_test(tmax ~ rev(treatment),
data = tmax.TR,
distribution = "exact",
conf.int = TRUE,
conf.level = 1 - 2 * alpha))[1])), 9)
if (CI.TR[1] >= theta1 & CI.TR[2] <= theta2) comp$pass.CL[study] <- TRUE
if (progr) setTxtProgressBar(pb, study / studies)
# patience please...
} if (progr) close(pb)
cat("Simulation settings",
paste0("\n ", formatC(studies, format = "d", big.mark = ","),
" studies (", n, " subjects / arm)"),
"\n tmax (Test) :", sprintf("%.4f h", tmax[["T"]]),
sprintf("(t½,a: %.2f min)", 60 * log(2) / k01.dT),
"\n tmax (Reference):", sprintf("%.4f h", tmax[["R"]]),
sprintf("(t½,a: %.2f min)", 60 * log(2) / k01.dR),
"\n\nSimulation results",
"\n Test",
paste0("\n Range : ",
paste(sprintf("%.4f",
range(res.T$tmax, na.rm = TRUE)), collapse = " h, "), " h"),
paste0("\n Skewness: ", sprintf("%+.4f", skewness(res.T$tmax))),
"\n Reference",
paste0("\n Range : ",
paste(sprintf("%.4f",
range(res.R$tmax, na.rm = TRUE)), collapse = " h, "), " h"),
paste0("\n Skewness: ", sprintf("%+.4f", skewness(res.R$tmax))),
"\n\n Passed ±20% criterion :",
sprintf("%5.1f%%", 100 * sum(comp$pass) / studies),
"\n Passed nonparametric CI:",
sprintf("%5.1f%% (empiric power)", 100 * sum(comp$pass.CL) / studies), "\n")
```

```
# Simulation settings
# 2,500 studies (28 subjects / arm)
# tmax (Test) : 0.5833 h (t½,a: 8.65 min)
# tmax (Reference): 0.7500 h (t½,a: 12.50 min)
#
# Simulation results
# Test
# Range : 0.1667 h, 2.2500 h
# Skewness: +0.7020
# Reference
# Range : 0.1667 h, 2.5000 h
# Skewness: +0.4922
#
# Passed ±20% criterion : 52.0%
# Passed nonparametric CI: 94.1% (empiric power)
```

»Seemingly the ±20 % criterion is pretty rigid. The ten minutes faster test product (\(\small{t_\text{max}=35\,\text{min}}\)) is already slightly below the lower border (\(\small{0.8\times t_\text{max(R)}=36\,\text{min}}\)). With this sample size the test product must not be more than seven minutes faster in order to achieve ≈80 % power.

On the other hand, we fare extremely well with the CI inclusion approach. However, with a more restrictive \(\small{\Delta=15\,\text{min}}\) we achieve only 60.5 % power. To achieve the desired power we would have to increase the sample size accordingly – regrettably by trial and error.

As expected, the power curve of the
CI inclusion approach is –
almost – symmetrical around zero.

The power curve of the ±20 % criterion is asymmetrical: Its maximum is
slightly shifted to the left and for and given \(\small{\Delta\,t_\text{max}}\) a negative
value has higher power than a negative one. That means, a faster test is
more likely to pass than a slower one. If, say, \(\small{\Delta\,t_\text{max}=-5\,\text{min}}\),
90.5 % of studies will pass but if \(\small{\Delta\,t_\text{max}=+5\,\text{min}}\)
only 74.8 %. This behavior is due to calculating a ratio while keeping
symmetrical limits. That’s flawed as if we would have kept back in the
day the BE limits at 80 – 120 % (see
this article).

Keep in mind that simulations depend on

assumptions about the PK of both formulations. Even if parameters of the reference product’s PK model are in the public domain, their variances are not necessarily.many

If you have data of a previous study, Monte Carlo simulations may
help.

One of my studies: 600 mg IR
ibuprofen, fasting state, 2×2×2 crossover, 16 subjects (powered to
≥ 90 % for \(\small{C_\text{max}}\)),
sampling every 15 minutes up to 2.5 hours. Resampled \(\small{t_\text{max}}\) of the reference in
10^5 simulations.

```
library(moments)
<- function(x, digits = 5) {
sum.simple # Nonparametric summary:
# Remove Mean but keep eventual NAs
<- summary(x, digits = digits)
y if (nrow(y) == 6) {
<- y[c(1:3, 5:6), ]
y else {
}<- y[c(1:3, 5:7), ]
y
}return(y)
}<- FALSE # in your own simulations set to TRUE
progr <- 1e5L # Number of simulations
nsims <- c(1.25, 2.00, 1.00, 1.25, 2.50, 1.25, 1.50, 2.25,
tmax 1.00, 1.25, 1.50, 1.25, 2.25, 1.75, 2.50, 2.25)
<- data.frame(sim = 1L:nsims,
aggr R1 = NA_real_,
R2 = NA_real_,
pct.med.diff = NA_real_,
pass = FALSE)
<- as.integer(length(tmax))
n <- 20 # Acc. to the guidance(s)
pct.delta if (progr) pb <- txtProgressBar(0, 1, 0, width = NA, style = 3)
set.seed(123456) # for reproducibility
for (i in 1L:nsims) {
# Now we resample the distribution, call it R1 and R2
<- sample(tmax, size = n, replace = TRUE)
R1 <- sample(tmax, size = n, replace = TRUE)
R2 2] <- median(R1)
aggr[i, 3] <- median(R2)
aggr[i, # arbitrary: Compare R2 with R1
4] <- 100 * (aggr$R2[i] - aggr$R1[i]) / aggr$R1[i]
aggr[i, # Check whether medians differ by more than 20%
if (abs(aggr[i, 4]) <= pct.delta) aggr$pass[i] <- TRUE
if (progr) setTxtProgressBar(pb, i / nsims)
# patience, please...
} <- sprintf("%+.4f", skewness(aggr$pct.med.diff))
skew <- c(-1, +1) * max(abs(range(c(-pct.delta, pct.delta,
xlim $pct.med.diff))))
aggrif (progr) close(pb)
dev.new(width = 4.6, height = 4.6)
<- par(no.readonly = TRUE)
op par(mar = c(4, 4, 0, 0), cex.axis = 0.9)
hist(aggr[, 4], freq = FALSE, main = "", axes = FALSE, col = "lightblue1",
xlab = expression(italic(t)[max] * ": " *
tilde(R)[2] / tilde(R)[1] * " (%)"),
ylab = "Density")
abline(v = c(-pct.delta, pct.delta), lty = 1, col = "red")
abline(v = quantile(aggr[, 4], probs = c(0.025, 0.5, 0.975)),
col = "black", lty = 2)
axis(1, at = seq(xlim[1], xlim[2], pct.delta))
axis(2, las = 1)
legend("topright", bg = "white", box.lty = 0, cex = 0.9,
legend = bquote(skewness == .(skew)), x.intersp = 0)
par(op)
<- paste0("\nNot equivalent if medians differ by more than 20%",
txt sprintf("\nEmpiric power = %.2f%%",
100 * (1 - sum(aggr$pass == FALSE) / nsims)), "\n\n")
<- sum.simple(aggr[c(2, 3:5)], digits = 4)
res dimnames(res)[[2]][3:4] <- c("diff. med. (%)", " passed")
print(res)
cat(txt)
```

```
# R1 R2 diff. med. (%) passed
# Min. :1.000 Min. :1.000 Min. :-50.000 Mode :logical
# 1st Qu.:1.375 1st Qu.:1.375 1st Qu.:-15.385 FALSE:34887
# Median :1.500 Median :1.500 Median : 0.000 TRUE :65113
# 3rd Qu.:1.750 3rd Qu.:1.750 3rd Qu.: 18.182
# Max. :2.500 Max. :2.500 Max. :100.000
#
# Not equivalent if medians differ by more than 20%
# Empiric power = 65.11%
```

Shocking, isn’t it? With the ±20 % criterion we have only ≈ 65 % power to show that the reference is equivalent to itself.

Good luck if your product differs from the reference. Now I used the same data but shifted a ‘test’ product by eight minutes (–10.7 % of the reference’s \(\small{t_\text{max}}\)). Theoretically that would mean 1.12 hours, but we round it to the closest multiple of the sampling schedule’s 15 minutes.

```
library(moments)
<- function(x, y) {
roundClosest # Round x to the closest multiple of y
return(y * round(x / y))
}<- function(x, digits = 5) {
sum.simple # Nonparametric summary:
# Remove Mean but keep eventual NAs
<- summary(x, digits = digits)
y if (nrow(y) == 6) {
<- y[c(1:3, 5:6), ]
y else {
}<- y[c(1:3, 5:7), ]
y
}return(y)
}<- c(1.25, 2.00, 1.00, 1.25, 2.50, 1.25, 1.50, 2.25,
tmax 1.00, 1.25, 1.50, 1.25, 2.25, 1.75, 2.50, 2.25)
<- 15/60 # Sampling interval
sampling <- FALSE # in your own simulations set to TRUE
progr <- 1e5L # Number of simulations
nsims <- data.frame(sim = 1L:nsims,
aggr R = NA_real_,
T = NA_real_,
pct.med.diff = NA_real_,
pass = FALSE)
<- as.integer(length(tmax))
n <- 20 # Acc. to the guidance(s)
pct.delta if (progr) pb <- txtProgressBar(0, 1, 0, width = NA, style = 3)
set.seed(123456) # for reproducibility
for (i in 1L:nsims) {
# Now we resample the distribution, call it R and T
<- sample(tmax, size = n, replace = TRUE)
R <- sample(tmax, size = n, replace = TRUE)
T # Eight minutes earlier, round to the closest 15 minutes
<- roundClosest(T - 8 / 60, sampling)
T 2] <- median(R)
aggr[i, 3] <- median(T)
aggr[i, # Compare T with R
4] <- 100 * (aggr$T[i] - aggr$R[i]) / aggr$R[i]
aggr[i, # Check whether medians differ by more than 20%
if (abs(aggr[i, 4]) <= pct.delta) aggr$pass[i] <- TRUE
if (progr) setTxtProgressBar(pb, i / nsims)
# patience, please...
} <- sprintf("%+.4f", skewness(aggr$pct.med.diff))
skew <- c(-1, +1) * max(abs(range(c(-pct.delta, pct.delta,
xlim $pct.med.diff))))
aggrif (progr) close(pb)
dev.new(width = 4.6, height = 4.6)
<- par(no.readonly = TRUE)
op par(mar = c(4, 4, 0, 0), cex.axis = 0.9)
hist(aggr[, 4], freq = FALSE, main = "", axes = FALSE, col = "lightblue1",
xlab = expression(italic(t)[max] * ": " *
tilde(T) / tilde(R) * " (%)"),
ylab = "Density")
abline(v = c(-pct.delta, pct.delta), lty = 1, col = "red")
abline(v = quantile(aggr[, 4], probs = c(0.025, 0.5, 0.975)),
col = "black", lty = 2)
axis(1, at = seq(xlim[1], xlim[2], pct.delta))
axis(2, las = 1)
legend("topright", bg = "white", box.lty = 0, cex = 0.9,
legend = bquote(skewness == .(skew)), x.intersp = 0)
par(op)
<- paste0("\nNot equivalent if medians differ by more than 20%",
txt sprintf("\nEmpiric power = %.2f%%",
100 * (1 - sum(aggr$pass == FALSE) / nsims)), "\n\n")
<- sum.simple(aggr[c(2, 3:5)], digits = 4)
res dimnames(res)[[2]][3:4] <- c("diff. med. (%)", " passed")
print(res)
cat(txt)
```

```
# R T diff. med. (%) passed
# Min. :1.000 Min. :0.750 Min. :-61.11 Mode :logical
# 1st Qu.:1.375 1st Qu.:1.125 1st Qu.:-30.77 FALSE:48403
# Median :1.500 Median :1.250 Median :-16.67 TRUE :51597
# 3rd Qu.:1.750 3rd Qu.:1.500 3rd Qu.: 0.00
# Max. :2.500 Max. :2.250 Max. : 80.00
#
# Not equivalent if medians differ by more than 20%
# Empiric power = 51.60%
```

Just not realistic to demonstrate
BE. Hardly better than tossing a
coin…

We would need approximately one hundred subjects for ≈ 80 % power.
Recall than in the original study we achieved 90 % power with 16
subjects.

Next I tried the reference data.^{37}
Since the sampling schedule was insufficient, I fitted a one-compartment
model and performed NCA
with sampling every five minutes. Recall that in this study the
reference was administered twice. I compared \(\small{t_\text{max}}\) of the second period
with the one in first.

```
library(moments)
<- function(x, y) {
roundClosest # Round x to the closest multiple of y
return(y * round(x / y))
}<- function(x, digits = 5) {
sum.simple # Nonparametric summary:
# Remove Mean but keep eventual NAs
<- summary(x, digits = digits)
y if (nrow(y) == 6) {
<- y[c(1:3, 5:6), ]
y else {
}<- y[c(1:3, 5:7), ]
y
}return(y)
}############################################################
# Wagener and Vögtle-Junkert (1996). One-compartment model #
# and fitted concentrations with five minutes sampling #
############################################################
<- c(60, 50, 30, 85, 25, 80, 55, 5, 30,
tmax.p1 65, 165, 50, 130, 30, 30, 45, 55, 50) / 60
<- c(30, 105, 100, 110, 40, 30, 130, 55, 40,
tmax.p2 45, 40, 35, 70, 85, 30, 90, 25, 35) / 60
<- FALSE # in your own simulations set to TRUE
progr <- 1e5L # Number of simulations
nsims <- data.frame(sim = 1L:nsims,
aggr R.p1 = NA_real_,
R.p2 = NA_real_,
pct.med.diff = NA_real_,
pass = FALSE)
<- as.integer(length(tmax.p1))
n <- 20 # Acc. to the guidance(s)
pct.delta if (progr) pb <- txtProgressBar(0, 1, 0, width = NA, style = 3)
set.seed(123456) # for reproducibility
for (i in 1L:nsims) {
# Now we resample the distribution, call it R1 and R2
<- sample(tmax.p1, size = n, replace = TRUE)
R.p1 <- sample(tmax.p2, size = n, replace = TRUE)
R.p2 2] <- median(R.p1)
aggr[i, 3] <- median(R.p2)
aggr[i, # Compare R in period 2 with R in period 1
4] <- 100 * (aggr$R.p2[i] - aggr$R.p1[i]) / aggr$R.p1[i]
aggr[i, # Check whether medians differ by more than 20%
if (abs(aggr[i, 4]) <= pct.delta) aggr$pass[i] <- TRUE
if (progr) setTxtProgressBar(pb, i / nsims)
# patience, please...
} <- sprintf("%+.4f", skewness(aggr$pct.med.diff))
skew <- c(-1, +1) * max(abs(range(c(-pct.delta, pct.delta,
xlim $pct.med.diff))))
aggrif (progr) close(pb)
dev.new(width = 4.6, height = 4.6)
<- par(no.readonly = TRUE)
op par(mar = c(4, 4, 0, 0), cex.axis = 0.9)
hist(aggr[, 4], freq = FALSE, main = "", axes = FALSE, col = "lightblue1",
xlab = expression(italic(t)[max] * ": " *
tilde(R)[p2] / tilde(R)[p1] * " (%)"),
ylab = "Density")
abline(v = c(-pct.delta, pct.delta), lty = 1, col = "red")
abline(v = quantile(aggr[, 4], probs = c(0.025, 0.5, 0.975)),
col = "black", lty = 2)
axis(1, at = seq(xlim[1], xlim[2], pct.delta),
labels = signif(seq(xlim[1], xlim[2], pct.delta), 2))
axis(2, las = 1)
legend("topright", bg = "white", box.lty = 0, cex = 0.9,
legend = bquote(skewness == .(skew)), x.intersp = 0)
par(op)
<- paste0("\nNot equivalent if medians differ by more than 20%",
txt sprintf("\nEmpiric power = %.2f%%",
100 * (1 - sum(aggr$pass == FALSE) / nsims)), "\n\n")
<- sum.simple(aggr[c(2, 3:5)], digits = 4)
res dimnames(res)[[2]][3:4] <- c("diff. med. (%)", " passed")
print(res)
cat(txt)
```

```
# R.p1 R.p2 diff. med. (%) passed
# Min. :0.4583 Min. :0.5000 Min. :-69.231 Mode :logical
# 1st Qu.:0.8333 1st Qu.:0.6667 1st Qu.:-21.053 FALSE:58247
# Median :0.8333 Median :0.7083 Median :-11.111 TRUE :41753
# 3rd Qu.:0.8750 3rd Qu.:0.9167 3rd Qu.: 13.636
# Max. :2.1667 Max. :1.7917 Max. :233.333
#
# Not equivalent if medians differ by more than 20%
# Empiric power = 41.75%
```

\(\small{t_\text{max}}\) and \(\small{C_\text{max}}\) are not sensitive to
even substantial changes in the rate of absorption \(\small{k_{01}}\), since both are
*composite* pharmacokinetic metrics.^{32} In a one compartment model they depending on
\(\small{k_{01}}\) __and__ the
elimination rate constant \(\small{k_{10}}\).^{38} Whereas the former
is a property of the *formulation* – we are interested in – the
latter is a property of the *drug*. \[\eqalign{
t_\textrm{max}&=\frac{\log_{e}(k_{01}/k_{10})}{k_{01}-k_{10}}\\
C_\textrm{max}&=\frac{f\cdot D\cdot k_{01}}{V\cdot
(k_{01}-k_{10})}\large(\small\exp(-k_{10}\cdot
t_\textrm{max})-\exp(-k_{01}\cdot
t_\textrm{max})\large)\tag{3}}\]

Especially \(\small{t_\text{max}}\) is a poor predictor of differences in the rate of absorption.

Example: An IR formulation, one compartment model with complete absorption. \(\small{D=100}\), \(\small{V=3}\), absorption half lives 30 minutes (Reference) and 21 minutes (Test), elimination half life four hours. Sigmoid effect model identical for both formulations: \(\small{E_\textrm{max}=125}\), \(\small{EC_{50}=50}\), \(\small{\gamma=0.5}\).

```
<- function(D, f, V, k01, k10, t) {
C.fun # PK (one compartment model)
<- D * f * k01 / (V * (k01 - k10)) * (exp(-k10 * t) - exp(-k01 * t))
C <- log(k01 / k10) / (k01 - k10)
tmax <- f * D * k01 / (V * (k01 - k10)) *
Cmax exp(-k10 * tmax) - exp(-k01 * tmax))
(<- f * D / V / k10
AUC <- list(C = C, Cmax = Cmax, tmax = tmax, AUC = AUC)
res return(res)
}<- function(Emax, EC50, gamma, x) {
E.fun # PD (sigmoidal link model)
<- (Emax * x^gamma)/(x^gamma + EC50^gamma)
Ef return(Ef)
}<- function(t, C) {
NCA <- data.frame(t = t, C = C, pAUC = 0)
x for (i in 1:(nrow(x) - 1)) {
if (x$C[i+1] < x$C[i]) {
$pAUC[i+1] <- (x$t[i+1] - x$t[i]) * (x$C[i+1] - x$C[i]) /
xlog(x$C[i+1] / x$C[i])
else {
}$pAUC[i+1] <- 0.5 * (x$t[i+1] - x$t[i]) *
x$C[i+1] + x$C[i])
(x
}
}<- max(C)
Cmax <- t[C == Cmax]
tmax <- tail(cumsum(x$pAUC), 1)
AUClast <- tail(x[, 1:2], 3)
x <- lm(log(C) ~ t, data = x)
m <- -coef(m)[[2]]
lambda.z <- AUClast + tail(x$C, 1) / lambda.z
AUCinf <- list(Cmax = Cmax, tmax = tmax,
res AUClast = AUClast, AUCinf = AUCinf)
return(res)
}<- 100
D <- 3
V <- 1
f <- log(2) / (30 / 60)
k01.R <- log(2) / (21 / 60)
k01.T <- log(2) / 4
k10 <- 125
Emax <- 50
EC50 <- 0.5
gamma <- seq(0, 24, length.out = 1001)
t <- c(seq(0, 100, 10), 120, 135, 150, 180,
smpl 210, 240, 360, 540, 720, 960, 1440) / 60
<- C.fun(D, f, V, k01.R, k10, t)
tmp <- tmp$C
C.R <- tmp$AUC
AUC.R <- tmp$Cmax
Cmax.R <- tmp$tmax
tmax.R <- C.fun(D, f, V, k01.T, k10, t)
tmp <- tmp$C
C.T <- tmp$AUC
AUC.T <- tmp$Cmax
Cmax.T <- tmp$tmax
tmax.T <- E.fun(Emax, EC50, gamma, C.R)
E.R <- E.fun(Emax, EC50, gamma, C.T)
E.T <- optimize(E.fun, interval = range(C.R), maximum = TRUE,
Emax.R Emax = Emax, EC50 = EC50, gamma = gamma)$objective
<- optimize(E.fun, interval = range(C.T), maximum = TRUE,
Emax.T Emax = Emax, EC50 = EC50, gamma = gamma)$objective
<- data.frame(AUC = c(AUC.R, AUC.T, AUC.T/AUC.R, NA),
res Cmax = c(Cmax.R, Cmax.T, Cmax.T / Cmax.R, NA),
tmax = c(tmax.R, tmax.T, tmax.T / tmax.R, tmax.T - tmax.R),
Emax = c(Emax.R, Emax.T, Emax.T / Emax.R, NA),
t.Emax = c(t[E.R == max(E.R)], t[E.T == max(E.T)], NA,
== max(E.T)] - t[E.R == max(E.R)]))
t[E.T <- C.fun(D, f, V, k01.R, k10, smpl)$C
obs.R <- C.fun(D, f, V, k01.T, k10, smpl)$C
obs.T <- NCA(smpl, obs.R)
NCA.R <- NCA(smpl, obs.T)
NCA.T <- data.frame(AUClast = c(NCA.R$AUClast, NCA.T$AUClast,
obs $AUClast / NCA.R$AUClast, NA),
NCA.TAUCinf = c(NCA.R$AUCinf, NCA.T$AUCinf,
$AUCinf / NCA.R$AUCinf, NA),
NCA.TCmax = c(NCA.R$Cmax, NCA.T$Cmax,
$Cmax / NCA.R$Cmax, NA), RE1 = NA,
NCA.Ttmax = c(NCA.R$tmax, NCA.T$tmax, NCA.T$tmax / NCA.R$tmax,
$tmax - NCA.R$tmax), RE2 = NA)
NCA.T$RE1 <- 100 * (obs$Cmax - res$Cmax) / res$Cmax
obs$RE2 <- 100 * (obs$tmax - res$tmax) / res$tmax
obsnames(obs)[c(4, 6)] <- rep("% RE", 2)
row.names(obs) <- row.names(res) <- c(" R", " T", "T / R", "T - R")
dev.new(width = 4.5, height = 4.5, record = TRUE)
<- par(no.readonly = TRUE)
op invisible(split.screen(c(2, 1)))
screen(1)
par(mar = c(3, 2.9, 1.1, 0.4), cex.axis = 0.8,
xaxs = "i", yaxs = "i")
plot(t, C.R, type = "n", axes = FALSE,
ylim = 1.04*c(0, max(Cmax.T, Cmax.R)),
xlab = "", ylab = "")
axis(1, at = seq(0, 24, 4))
axis(2, las = 1)
mtext("concentration", 2, line = 2)
lines(t, C.R, lwd = 3, col = "#1010FF80")
lines(t, C.T, lwd = 3, col = "#FF101080")
legend("topright", legend = c("R", "T"),
col = c("#1010FF80", "#FF101080"),
box.lty = 0, cex = 0.8, lwd = 3)
box()
screen(2)
par(mar = c(4, 2.9, 0.1, 0.4), cex.axis = 0.8,
xaxs = "i", yaxs = "i")
plot(t, E.R, type = "n", axes = FALSE,
ylim = 1.04*c(0, max(E.R, E.T)),
xlab = "time", ylab = "")
axis(1, at = seq(0, 24, 4))
axis(2, las = 1)
mtext("effect", 2, line = 2)
lines(t, E.R, lwd = 3, col = "#1010FF80")
lines(t, E.T, lwd = 3, col = "#FF101080")
legend("topright", legend = c("R", "T"),
col = c("#1010FF80", "#FF101080"),
box.lty = 0, cex = 0.8, lwd = 3)
box()
close.screen(all.screens = TRUE)
par(op)
print(round(res, 4))
```

```
# AUC Cmax tmax Emax t.Emax
# R 192.3593 24.7666 1.7143 51.6343 1.704
# T 192.3593 26.3893 1.3481 52.5986 1.344
# T / R 1.0000 1.0655 0.7864 1.0187 NA
# T - R NA NA -0.3662 NA -0.360
```

Although absorption of the Test is 30 % faster than the one of the Reference, its \(\small{t_\text{max}}\) occurs only ≈21 minutes earlier. The T/R-ratio of \(\small{C_\text{max}}\) is 106.6 %. Of course, the \(\small{AUC\text{s}}\) are identical.

It’s a common misconception that safety is *directly* related
to \(\small{C_\text{max}}\). Linking to
the effect site dampens the peaks. Hence, the T/R-ratio of \(\small{E_\text{max}}\) is only 101.9 %.^{39} The
maximum effect of the Test occurs ≈22 minutes earlier than the one of
the Reference.

As an aside, \(\small{E_\text{max}}\)
occurs commonly *earlier* than \(\small{C_\text{max}}\).

What do we obtain by NCA with rich sampling (every ten minutes till two hours, 2.25, 2.5, 3, 3.5, 4, 6, 9, 12, 16, and 24 hours)?

`print(round(obs, 4))`

```
# AUClast AUCinf Cmax % RE tmax % RE
# R 188.7237 192.1587 24.7597 -0.0279 1.6667 -2.7778
# T 188.8800 192.1739 26.3883 -0.0038 1.3333 -1.0921
# T / R 1.0008 1.0001 1.0658 0.0242 0.8000 1.7338
# T - R NA NA NA NA -0.3333 -8.9826
```

We see a practically negligible bias of \(\small{C_\text{max}}\) – though it would be
larger with wider sampling intervals – and a substantial one of \(\small{t_\text{max}}\), since we are never
able to ‘catch’ the true values. The observed \(\small{C_\text{max}}\) is always lower than
the true one because with any given schedule we will sample either
*before* or *after* the true \(\small{t_\text{max}}\).

If we follow the recent draft product-specific guidances,^{11} ^{12} ^{13} it will be a close shave
(*t*_{max} –20%).

Nonparametric analysis __is__ the appropriate approach for
estimating the discretely recorded \(\small{t_\text{max}}\).^{31} ^{40} ^{41}

It is a misconception that the Wilcoxon
signed-rank test and the Mann–Whitney
*U* test compare medians. The Wilcoxon signed-rank test (for
paired samples) employs the Hodges-Lehmann
estimator. The Mann–Whitney *U* test (for independent
samples) compares the median of the difference between a sample
from \(\small{x}\) and a sample
from \(\small{y}\).

Both are unbiased estimators of a shift in location *only* if
distributions are identical (though not necessarily symmetrical).
However, in well-controlled studies (*e.g.*, in
BE) this is likely the case.^{41} See also the simulations above, where the skewness of products’ \(\small{\widetilde{t}_\text{max}}\) was
similar. Recall that in parametric methods independent
and identical distributions are assumed as well.

Due to the discrete nature of the underlying distribution, the ≈90 %
confidence interval is always *apparently* wider than the 90 %
CI based on the normal
distribution. However, such a comparison is not fair, since the
distribution *is not* normal.

Below the exact confidence levels in balanced and slightly imbalanced 2×2×2 crossover designs.

```
# Distribution of the Wilcoxon signed-rank (aka rank sum) Statistic
<- function(n) {
nvec <- trunc(n / 2)
ni <- rep.int(ni, times = 2)
nv <- n - 2 * ni
r if(!r == 0) nv <- nv + c(rep.int(1, r), rep.int(0, 2 - r))
return(as.integer(nv))
}<- 0.05 # for desired ~90% CI
alpha <- 12L:120L
n <- data.frame(N = n, m = NA_integer_, n = NA_integer_)
res for (i in seq_along(n)) {
2:3] <- nvec(n[i])
res[i, # quantile function
<- qwilcox(p = alpha,
k m = res$m[i], n = res$n[i])
if (k == 0) k <- k + 1
# distribution function
$conf.level[i] <- 1 - 2 * pwilcox(q = k - 1,
resm = res$m[i], n = res$n[i])
}print(res, row.names = FALSE)
```

```
# N m n conf.level
# 12 6 6 0.9069264
# 13 7 6 0.9265734
# 14 7 7 0.9026807
# 15 8 7 0.9061383
# 16 8 8 0.9170163
# 17 9 8 0.9072810
# 18 9 9 0.9060880
# 19 10 9 0.9052805
# 20 10 10 0.9107904
# 21 11 10 0.9013824
# 22 11 11 0.9120539
# 23 12 11 0.9091576
# 24 12 12 0.9112662
# 25 13 12 0.9023576
# 26 13 13 0.9091527
# 27 14 13 0.9055210
# 28 14 14 0.9061319
# 29 15 14 0.9067843
# 30 15 15 0.9024742
# 31 16 15 0.9067410
# 32 16 16 0.9061876
# 33 17 16 0.9057654
# 34 17 17 0.9013126
# 35 18 17 0.9041047
# 36 18 18 0.9029409
# 37 19 18 0.9019280
# 38 19 19 0.9035907
# 39 20 19 0.9051653
# 40 20 20 0.9035004
# 41 21 20 0.9019906
# 42 21 21 0.9028384
# 43 22 21 0.9036705
# 44 22 22 0.9017265
# 45 23 22 0.9046107
# 46 23 23 0.9002555
# 47 24 23 0.9006013
# 48 24 24 0.9027958
# 49 25 24 0.9007585
# 50 25 25 0.9006068
# 51 26 25 0.9004985
# 52 26 26 0.9020722
# 53 27 26 0.9035707
# 54 27 27 0.9030133
# 55 28 27 0.9025131
# 56 28 28 0.9001460
# 57 29 28 0.9012392
# 58 29 29 0.9004810
# 59 30 29 0.9029184
# 60 30 30 0.9004952
# 61 31 30 0.9011928
# 62 31 31 0.9002387
# 63 32 31 0.9021996
# 64 32 32 0.9025308
# 65 33 32 0.9001599
# 66 33 33 0.9017373
# 67 34 33 0.9006684
# 68 34 34 0.9007852
# 69 35 34 0.9009153
# 70 35 35 0.9021280
# 71 36 35 0.9009372
# 72 36 36 0.9008405
# 73 37 36 0.9007642
# 74 37 37 0.9016981
# 75 38 37 0.9004213
# 76 38 38 0.9001633
# 77 39 38 0.9020363
# 78 39 39 0.9006456
# 79 40 39 0.9013446
# 80 40 40 0.9009302
# 81 41 40 0.9005431
# 82 41 41 0.9010410
# 83 42 41 0.9015318
# 84 42 42 0.9009986
# 85 43 42 0.9004941
# 86 43 43 0.9008202
# 87 44 43 0.9011458
# 88 44 44 0.9005209
# 89 45 44 0.9016222
# 90 45 45 0.9001134
# 91 46 45 0.9003059
# 92 46 46 0.9012281
# 93 47 46 0.9005427
# 94 47 47 0.9005926
# 95 48 47 0.9006506
# 96 48 48 0.9013977
# 97 49 48 0.9006428
# 98 49 49 0.9005814
# 99 50 49 0.9005307
# 100 50 50 0.9011339
# 101 51 50 0.9003244
# 102 51 51 0.9001724
# 103 52 51 0.9000328
# 104 52 52 0.9005161
# 105 53 52 0.9009919
# 106 53 53 0.9007486
# 107 54 53 0.9005194
# 108 54 54 0.9008802
# 109 55 54 0.9012374
# 110 55 55 0.9009200
# 111 56 55 0.9006174
# 112 56 56 0.9008761
# 113 57 56 0.9011342
# 114 57 57 0.9007557
# 115 58 57 0.9003925
# 116 58 58 0.9005651
# 117 59 58 0.9007395
# 118 59 59 0.9003103
# 119 60 59 0.9009961
# 120 60 60 0.9010812
```

Alternatives^{42} ^{43} ^{44} not requiring identical distributions are
out of scope of this article.

Though stated in the
WHO’s guideline,^{27} it is also a misconception that a statistical
comparison of \(\small{t_\text{max}}\)
with a nonparametric method has low power. For instance, the Wilcoxon
signed-rank test (for paired differences) and the Mann–Whitney
*U* test (for independent samples) have an asymptotical power
of \(\small{3/\pi\approx95.5\%}\) for
normal distributed data. For small sample sizes sufficient power was
confirmed in simulations.^{45} Of course, if data don’t follow a normal
distribution, nonparametric tests are more powerful than their
parametric counterparts.

Rich sampling in the early part of the profile (\(\small{\leq 2\times t_\textrm{max}}\)) is
generally recommended in order ‘catch’ \(\small{C_\text{max}}\). If one wants to
follow the recent draft product-specific guidances,^{11} ^{12} ^{13} it might be necessary to aim at sampling
intervals narrower than usual. Sample size estimation for this approach
is not implemented in any software.

An alternative to comparing \(\small{t_\text{max}}\) would be to assess
the ‘early exposure’ (*i.e.*, by a partial \(\small{AUC}\)) as recommended by the
FDA^{46} and
Health Canada^{47} for drugs with rapid onset of action.

However, this PK metric is
extremely variable and might require a replicate design in order to
apply reference-scaling (RSABE
or ABEL).
Note that reference-scaling for partial \(\small{AUC\textsf{s}}\) is acceptable for
the EMA.^{10}

Health Canada requires for drugs with rapid onset of action the cut-off
time as the subject’s \(\small{t_\text{max}}\) of the reference,^{48} which
can be difficult to set up in some software packages. Similarly to the
comparison of \(\small{C_\text{max}}\),
only the point estimate has to lie within 80.0 – 125.0 % and hence, a
parallel or a simple 2×2×2 crossover design would be sufficient.

Interlude: FDA, Health Canada

Interesting is an older guidance of the FDA.

“An early exposure measure may be informative on the basis of appropriate clinical efficacy/safety trials and/or pharmacokinetic/pharmacodynamic studies that call for better control of drug absorption into the systemic circulation (e.g., to ensure rapid onset of an analgesic effect or to avoid an excessive hypotensive action of an antihypertensive). In this setting, the guidance recommends use of partial AUC as an early exposure measure. We recommend that the partial area be truncated at the population median of Tmax values for the reference formulation.

Sorry folks but the population [*sic*]
median is *unknown*. Is the sample median
meant? Or is it the value reported in the
RLD’s label? What if – stupid
enough – only the arithmetic mean is given?

In the latest guidance for NDAs or INDs we find:

“ […] for certain classes of drugs (e.g., analgesic drug products), an evaluation of the partial exposure could be scientifically appropriate to support the determination of the relative BA of the drug product. The FDA recommends the use of partial AUC as a partial exposure measure. The time to truncate the partial AUC should be related to a clinically relevant response measure. The sponsor should collect sufficient quantifiable samples to allow an adequate estimation of the partial AUC. Sponsors should consult the appropriate review division for questions on the suitability of the response measure or the use of partial exposure.

It makes sense to base cut-off times on __pharmacodynamics__,
which is in line with guidances for multiphasic release products of the
FDA and Health
Canada.

“The partial AUCs, AUC_{0–1.5}and AUC_{1.5–t}, have been determined to be the most appropriate parameters for evaluation of the drug bioavailability responsible for the sleep onset and sleep maintenance phases, respectively. […] These two partial AUCs will ensure that the pharmacokinetic profiles of test and reference products are sufficiently similar that there will be no significant difference in sleep onset, sleep maintenance, and lack of residual effects between the test and reference products. “The sampling time (T1) for the first pAUC is based on time at which 90–95% of subjects are likely to achieve optimal early onset of response. Because the rate of initial methylphenidate absorption is associated with the rate of early onset of response, the sampling time “T1” is determined based on T_{max}of the immediate-release portion of the formulation. […] T2 is based on the school hours and […] T3 is based on the efficacy duration claim of Concerta^{®}from the pivotal clinical studies. “The requirement for pAUC assessment metrics for multiphasic modified‐release formulations will be based on data available from various sources, including but not limited to peer‐reviewed scientific literature and the approved Canadian labelling, as applicable. The time course of changes in the rate of drug delivery throughout the day should be reconciled with generally accepted and clinically relevant response data generated from a well‐designed randomized clinical trial program. Specifically, standards based on the 90% confidence interval of pAUC metrics should be met. The specific pAUC time intervals to be considered will be based on clinical data showing the therapeutic relevance of the particular time interval (e.g. early onset, maintenance, dose clearance, fasted versus fed state). Selected time intervals should be justified, specified a priori and applied to all study subjects for both the test and reference products.

For the EMA cut-off
time(s) should be based on __pharmacokinetics__.^{10} Nice on paper… Mean plots in the originator’s
approval package will be misleading. Even if the package contains
‘spaghetti plots’,^{53} ^{54} it will be difficult to identify a
‘trough’ between release phases.

Furthermore, contrary to the
FDA and Health
Canada, partial \(\small{AUC\textsf{s}}\) are required
*additionally* to \(\small{t_\text{max}}\).

Recall that in the simple simulations we
had perfectly matching products (the average
PK-parameters were
*identical*). If we apply an appropriate statistical method, we
could confirm that (*i.e.*, detected no differences in \(\small{t_\text{max}}\)).

There is no guarantee that by *looking* (‽) at reported
medians / IQRs / ranges an
assessor will arrive at the same conclusions as the applicant. A great
deal of discussions on its way.

**Assessing t_{max} based on eyeballing
‘apparent’ differences of medians, its ‘variability’ or range is bad
science and thus, should be abandoned.**

»*A* [formal] *statistical evaluation of t*_{max}
*is not required*«

does not preclude to perform one:

Only (‼) if clinically relevant, pre-specify an acceptance range for
*t*_{max} and assess the ≈90% confidence interval by an
appropriate nonparametric method.

I did so since the mid 1980s and – even after the
EMA’s
BE guideline^{9} of 2010 – never received a *single*
deficiency letter in this respect.

Given the theoretical considerations,^{32} as well as my observations
and simulations, at least a modified
criterion \(\small{-20\%\leq\Delta\,\widehat{t}_\text{max}\leq25\%}\)
could be considered as a quick – though doubtful – fix.

»*Comparable median (≤ 20 % difference) and range for
T*_{max}.«

is statistically questionable and should be challenged. Provide comments using this template and send it to [email protected] until 31 July 2022.

“Enlightenment is man’s release from his self-incurred tutelage. Tutelage is man’s inability to make use of his understanding without direction from another. Self-incurred is this tutelage when its cause lies not in lack of reason but in lack of resolution and courage to use it without direction from another.Sapere aude!“Have courage to use your own reason!” – that is the motto of enlightenment.

In order to ‘catch’ \(\small{t_\text{max}}\) in all subjects under all treatments, likely a much tighter sampling schedule than usual is required.

In the simulations we observed in the CI inclusion approach no significantly inflated Type I Error. However,

- the tight sampling ended too early. Hence, in some cases
*t*_{max}was observed later than the true one. - The Wilcoxon
signed-rank test (for paired differences) and the Mann–Whitney
*U*test give unbiased estimates of the shift in location only if distributions are*identical*. Based on the skewness in the simulations the distributions were only*similar*.

Alternatives^{42}^{43}^{44}not requiring identical distributions are not trivial and computationally intensive. Furthermore, they give only a probability and no confidence interval. A less powerful method (requiring fewer assumptions) for*paired*samples is the sign test, provided in the function`sign_test()`

of package`coin`

.^{2}

For any approach sample size estimation is difficult and computationally demanding. Since it is currently not supported in software, you have to roll out your own code – relying on numerous assumptions.

top of section ↩︎ previous section ↩︎

Acknowledgment

Susana Almeida (Medicines for Europe), Jean-Michel Cardot, Jiři Hofmann (Zentiva), and members of the BEBA-Forum ElMaestro and Ohlbe for fruitful discussions.

License

Helmut Schütz 2022

`R`

and all packages GPL 3.0,
`pandoc`

GPL 2.0.

1^{st} version April 19, 2021. Rendered July 29, 2022 15:35 CEST
by rmarkdown
via pandoc in 0.79 seconds.

Footnotes and References

Therneau TM, Lumley T, Atkinson E, Crowson C.

*Survival Analysis.*Package version 3.3.1. 2022-02-20. CRAN↩︎Hothorn T, Winell H, Hornik K, van de Wiel MA, Zeileis A.

*Conditional Inference Procedures in a Permutation Test Framework.*Package version 1.4.2. 2021-10-07. CRAN↩︎Mersman O, Trautmann H, Steuer D, Bornkamp B.

*truncnorm: Truncated Normal Distribution.*Package version 1.0.8. 2018-02-26. CRAN.↩︎Wickham H.

*Flexibly Reshape Data.*Package version 0.8.9. 2022-04-12. CRAN.↩︎Sarkar D, Andrews F, Wright K, Klepeis, N, Murrell P.

*Trellis Graphics for R.*Package version 0.20.45. 2021-09-18. CRAN.↩︎Komsta L, Novomestky F.

*moments: Moments, Cumulants, Skewness, Kurtosis and Related Tests.*Package version 0.14.1. 2015-01-05. CRAN.↩︎Commission of the EC.

*Note for Guidance. Investigation of Bioavailability and Bioequivalence. Appendix III: Technical Aspects of Bioequivalence Statistics.*3CC15a. Brussels. December 1991. Online.↩︎EMEA, CPMP.

*Note for Guidance on the Investigation of Bioavailability and Bioequivalence.*CPMP/EWP/QWP/1401/98. London. 26 July 2001. Online.↩︎EMA, CHMP.

*Guideline on the Investigation of Bioequivalence.*CPMP/EWP/QWP/1401/98 Rev. 1/ Corr **. London. 20 January 2010. Online.↩︎EMA, CHMP.

*Guideline on the pharmacokinetic and clinical evaluation of modified release dosage forms.*EMA/CHMP/EWP/280/96 Rev1. London. 20 November 2014. Online.↩︎EMA, CHMP.

*Ibuprofen oral use immediate release formulations 200 – 800 mg product-specific bioequivalence guidance.*EMA/CHMP/356876/2017 Rev.1*. Amsterdam. 25 March 2022. Online.↩︎EMA, CHMP.

*Paracetamol oral use immediate release formulations product-specific bioequivalence guidance.*EMA/CHMP/356877/2022 Rev.1*. Amsterdam. 25 March 2022. Online.↩︎EMA, CHMP.

*Tadalafil film-coated tablets 2.5 mg, 5 mg, 10 mg and 20 mg product-specific bioequivalence guidance.*EMA/CHMP/315234/2014 Rev.1*. Amsterdam. 25 March 2022. Online.↩︎Sauter R, Steinijans VW, Diletti E, Böhm E, Schulz H-U.

*Presentation of results from bioequivalence studies.*Int J Clin Pharm Ther Toxicol. 1992; 30(7): 233–56. PMID:1506127.↩︎ANMAT.

*Régimen de Buenas Prácticas para la Realización de Estudios de Biodisponibilidad/Bioequivalencia.*Buenos Aires. 2006. Online. [Spanish].↩︎NIHS, Division of Drugs.

*Guideline for Bioequivalence Studies of Generic Products. Q & A.*Tokio. February 29, 2012. Online.↩︎NIHS, Division of Drugs.

*Guideline for Bioequivalence Studies of Generic Products. Q & A.*Tokio. March 19, 2020. Online.↩︎MCC. Registration of Medicines.

*Biostudies.*Pretoria. June 2015. Online.↩︎EMEA, CPMP.

*Note for Guidance on Modified Release Oral and Transdermal Dosage Forms: Section II (Pharmacokinetic and Clinical Evaluation).*CPMP/EWP/280/96 Corr *. London. 28 July 1999. Online.↩︎Department of Health, TGA.

*European Union and ICH Guidelines adopted in Australia. Guideline on the Investigation of Bioequivalence with TGA Annotations.*Canberra. 16 June 2011. Online↩︎ASEAN States Pharmaceutical Product Working Group.

*ASEAN Guideline for the Conduct of Bioequivalence Studies.*Vientiane. March 2015. Online.↩︎ANAMED. Instituto de Salud Pública de Chile.

*Guia para La realización de estudios de biodisponibilidad comparativa en formas farmacéuticas sólidas de administración oral y acción sistémica.*Santiago. December 2018. [Spanish].↩︎ECC.

*Regulations Conducting Bioequivalence Studies within the Framework of the Eurasian Economic Union.*3 November 2016, amended 4 September 2020. Online.↩︎Ministry of Health and Population, The Specialized Scientific Committee for Evaluation of Bioavailability & Bioequivalence Studies.

*Egyptian Guideline For Conducting Bioequivalence Studies for Marketing Authorization of Generic Products.*Cairo. February 2017. Internet Archive.↩︎GHC.

*The GCC Guidelines for Bioequivalence.*May 2021. Online.↩︎New Zealand Medicines and Medical Devices Safety Authority.

*Guideline on the Regulation of Therapeutic Products in New Zealand. Part 6: Bioequivalence of medicines.*Wellington. February 2018. Online.↩︎WHO, Essential Medicines and Health Products:

*Multisource (generic) pharmaceutical products: guidelines on registration requirements to establish interchangeability.*WHO Technical Report Series, No. 1003, Annex 6. Geneva. 28 April 2017. Online↩︎The

__true__*t*_{max}follows a*continuous*distribution. However, due to the sampling schedule the distribution is*discretized*.↩︎Think about normal distributed data. The arithmetic mean is the statistic of

*location*(central tendency) and the standard deviation the statistic of*dispersion*. Both are estimated values. Talking about the ‘variability’ of the mean would be nonsense as well.↩︎Johnston RG, Schroder SA, Mallawaaratchy AR.

*Statistical Artifacts in the Ratio of Discrete Quantities.*Am Stat. 1995; 49(3): 285–91. JSTOR:2684202.↩︎Basson RP, Cerimele BJ, DeSante KA, Howey DJ.

*Tmax: An Unconfounded Metric for Rate of Absorption in Single Dose Bioequivalence Studies.*Pharm Res. 1996; 13(2): 324–8. doi:10.1023/A:1016019904520.↩︎Tóthfálusi L, Endrényi L.

*Estimation of C*_{max}*and T*_{max}*in Populations After Single and Multiple Drug Administration.*J Pharmacokin Pharmacodyn. 2003; 30(5): 363–85. doi:10.1023/b:jopa.0000008159.97748.09.↩︎Endrényi L, Tóthfalusi L.

*Metrics for the Evaluation of Bioequivalence of Modified-Release Formulations.*AAPS J. 2012; 14(4): 813–9. doi:10.1208/s12248-012-9396-8. Free Full Text.↩︎Geyer CJ.

*Nonparametric Tests and Confidence Intervals.*April 13, 2003. Online.↩︎Imagine:

\(\small{x=\left\{1,\cdots,1,2\right\}\rightarrow \tilde{x}={\color{Blue}1}\;\wedge_{}^{}\;R(x)={\color{Blue}1}}\)

\(\small{y=\left\{1,\cdots,1,3\right\}\rightarrow \tilde{x}={\color{Blue}1}\;\wedge_{}^{}\;R(y)={\color{Red}2}}\)

Is the range of \(\small{y}\) ‘apparently’ different from the one of \(\small{x}\)?

\(\small{z=\left\{1,\cdots,1,1\right\}\rightarrow \tilde{x}={\color{Blue}1}\;\wedge_{}^{}\;R(z)={\color{Red}0}}\)

Is the range of \(\small{z}\) ‘apparently’ different from the other ones?↩︎Note that the function

`wilcox.test()`

of Base R is not exact for tied (equally ranked) data. Hence, I used the function`wilcox_test()`

of package`coin`

with its default average ranking.↩︎Wagener HH, Vögtle-Junkert U.

*Intrasubject variability in bioequivalence studies illustrated by the example of ibuprofen.*Int J Clin Pharmacol Ther. 1996; 34(1): 21–31. PMID:8688993.↩︎In models with more than one compartment \(\small{t_\text{max}}\) and \(\small{C_\text{max}}\) cannot be analytically derived. In software numeric optimization is employed to locate the maximum of the function.↩︎

That explains why PD is less sensitive to detect differences in formulations and PK-studies are preferred.↩︎

Basson RP, Ghosh A, Cerimele BJ, DeSante KA, Howey DC.

*Why Rate of Absorption Inferences in Single Dose Bioequivalence Studies are Often Inappropriate.*Pharm Res. 1998; 15(2): 276–9. doi:10.1023/a:1011974803996.↩︎Hauschke D, Steinijans VW, Diletti E.

*A distribution-free procedure for the statistical analysis of bioequivalence studies.*Int J Clin Pharm Ther Toxicol. 1990; 28(2): 72–8. PMID:2307548.↩︎Brunner E, Munzel U.

*The Nonparametric Behrens‐Fisher Problem: Asymptotic Theory and a Small‐Sample Approximation.*Biom. J. 2000; 42(1): 17–25. doi:10.1002/(SICI)1521-4036(200001)42:1%3C17::AID-BIMJ17%3E3.0.CO;2-U.↩︎Neubert K, Brunner E.

*A studentized permutation test for the non-parametric Behrens–Fisher problem.*Comput Stat Data Anal. 2007; 51(10): 5192–204. doi:10.1016/j.csda.2006.05.024.↩︎Wilcox RA.

*Introduction to Robust Estimation and Hypothesis Testing.*London: Academic Press; 4^{th}ed. 2017. p. 192–8.↩︎Chow S-C, Liu J-p.

*Design and Analysis of Bioavailability and Bioequivalence Studies.*Boca Raton: Chapman & Hall/CRC; 3^{rd}ed. 2009. Table 5.3.4. p. 146.↩︎FDA, CDER.

*Draft Guidance for Industry. Bioequivalence Studies with Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA.*Rockville. December 2013. Download↩︎Health Canada.

*Guidance Document. Comparative Bioavailability Standards: Formulations Used for Systemic Effects.*Ottawa. 2018/06/08. Online↩︎Will that inflate the Type I Error because it depends on the

*observed*data? Perhaps. However, since only the point estimate is assessed, likely the overall Type I Error is controlled by the 90% CI of*AUC*.↩︎FDA, CDER.

*Guidance for Industry. Bioavailability and Bioequivalence Studies for Orally Administered Drug Products — General Considerations.*Rockville. March 2003. Internet Archive.↩︎FDA, CDER.

*Guidance for Industry. Bioavailability Studies Submitted in NDAs or INDs — General Considerations.*Silver Spring. April 2022. Download.↩︎FDA, OGD.

*Draft Guidance on Methylphenidate Hydrochloride.*Recommended Sept 2012; Revised Nov 2014; Jul 2018. Download.↩︎Bonate PL.

*Pharmacokinetic-Pharmacodynamic Modeling and Simulation.*New York: Springer; 2^{nd}ed. 2011. p. 359.↩︎Gabrielsson J, Weiner D.

*Pharmacokinetic and Pharmacodynamic Data Analysis: Concepts and Applications.*Stockholm: Apotekarsocieteten; 5^{th}ed. 2016. p. 335–6, p. 704.↩︎Kant I.

*Answering the Question: What Is Enlightenment?*Berlinische Monatsschrift. December 1784. [Original in German].↩︎