
Examples in this article were generated with R 4.1.0 by the package PowerTOST.1



Introduction

Have you ever wondered where all these nice numbers come from?

\(\small{\alpha\;0.05}\), bioequivalence limits \(\small{80.00-125.00\%}\), switching to SABE at \(\small{CV_\textrm{wR}\;30\%}\), upper cap of scaling at \(\small{CV_\textrm{wR}\;50\%}\), \(\small{k\;0.760}\), \(\small{s_\textrm{wR}\;0.294}\)

The list is endless and as interesting as a telephone directory.

What the Heck?

α 0.05

Why this number? OK, at least this one is easy to explain: Mr Fisher considered it convenient.

The value for which p = 0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not.
Ronald A. Fisher. 1925.2

For nearly a century this number has been the holy grail of frequentists. Members of the other church, the Bayesians, hold other beliefs.

90% Confidence Interval

Did you ever wonder why (based on Mr Fisher’s \(\small{\alpha=0.05}\)) we use a \(\small{100(1-2\,\alpha)=90\%}\) confidence interval in bioequivalence and not the \(\small{95\%}\) CI like in clinical Phase III?

In Phase III we want to demonstrate superiority (for nitpickers: non-inferiority) of verum to placebo. Once a drug is approved, we can be 95% sure that it is safe and efficacious (though there is a 5% chance that even a mega-selling blockbuster drug is not better than snake-oil).

In the early days of BE a \(\small{95\%}\) CI was indeed applied. Below is a scan of my conference proceedings.3


Fig. 1 Note my handwritten comment (“generell” is German for generally).
The guideline contained five text passages stating
»The 95% confidence interval…«

So why did we change to the \(\small{90\%}\) CI?

In a particular patient the bioavailability can be either too low \(\small{(p(\textrm{BA}<80\%)\leq5\%)}\) or too high \((\small{p(\textrm{BA}>125\%)\leq5\%})\) but evidently not at the same time.
The \(\small{90\%}\) CI controls the risk for the population of patients. Therefore, if a study passes, the risk for patients does still not exceed \(\small{5\%}\).4 5

sd   <- 0.15
n    <- 501
ylim <- c(0, dnorm(x = 0, sd = sd))
dev.new(width = 4.5, height = 2.25, record = TRUE)
op   <- par(no.readonly = TRUE)
par(mar = c(4, 1, 0, 0), cex.axis = 0.9)
# TOST (left)
plot(log(c(0.5, 2)), c(0, 0), type = "n", xlim = log(c(0.5, 2)), axes = FALSE,
     ylim = ylim, xlab = expression(mu[T]/mu[R]*" (%)"), ylab = "")
axis(1, at = c(log(0.5), log(1/1.5), log(0.8), 0, log(1.25), log(1.5), log(2)),
     labels = sprintf("%.0f", 100*c(0.5, 1/1.5, 0.8, 1, 1.25, 1.5, 2)))
mtext(side = 2, text = expression(italic(p)*" (%)"), line = -0.1)
x <- seq(log(0.8), log(2), length.out = n)
polygon(x = c(x, rev(x)), y = c(rep(0, n), rev(dnorm(x = x, sd = sd))),
        col = "#90EE90", border = "#90EE90")
text(0, ylim[2]/2, adj = 0.5, labels = 95, cex = 0.9)
x <- seq(log(0.5), log(0.8), length.out = n)
polygon(x = c(x, rev(x)), y = c(rep(0, n), rev(dnorm(x = x, sd = sd))),
        col = "#FA8072", border = "#FA8072")
text(log(0.8), dnorm(x = log(1.25), sd = sd)*0.2, pos = 2, offset = 0.5,
           labels = 5, cex = 0.9)
# TOST (right)
plot(log(c(0.5, 2)), c(0, 0), type = "n", xlim = log(c(0.5, 2)), axes = FALSE,
     ylim = ylim, xlab = expression(mu[T]/mu[R]*" (%)"), ylab = "")
axis(1, at = c(log(0.5), log(1/1.5), log(0.8), 0, log(1.25), log(1.5), log(2)),
     labels = sprintf("%.0f", 100*c(0.5, 1/1.5, 0.8, 1, 1.25, 1.5, 2)))
mtext(side = 2, text = expression(italic(p)*" (%)"), line = -0.1)
x <- seq(log(0.5), log(1.25), length.out = n)
polygon(x = c(x, rev(x)), y = c(rep(0, n), rev(dnorm(x = x, sd = sd))),
        col = "#90EE90", border = "#90EE90")
text(0, ylim[2]/2, adj = 0.5, labels = 95, cex = 0.9)
x <- seq(log(1.25), log(2), length.out = n)
polygon(x = c(x, rev(x)), y = c(rep(0, n), rev(dnorm(x = x, sd = sd))),
        col = "#FA8072", border = "#FA8072")
text(log(1.25), dnorm(x = log(1.25), sd = sd)*0.2, pos = 4, offset = 0.5,
           labels = 5, cex = 0.9)
# 90% CI inclusion
plot(log(c(0.5, 2)), c(0, 0), type = "n", xlim = log(c(0.5, 2)), axes = FALSE,
     ylim = ylim, xlab = expression(mu[T]/mu[R]*" (%)"), ylab = "")
axis(1, at = c(log(0.5), log(1/1.5), log(0.8), 0, log(1.25), log(1.5), log(2)),
     labels = sprintf("%.0f", 100*c(0.5, 1/1.5, 0.8, 1, 1.25, 1.5, 2)))
mtext(side = 2, text = expression(italic(p)*" (%)"), line = -0.1)
x <- seq(log(0.8), log(1.25), length.out = n)
polygon(x = c(x, rev(x)), y = c(rep(0, n), rev(dnorm(x = x, sd = sd))),
        col = "#90EE90", border = "#90EE90")
text(0, ylim[2]/2, adj = 0.5, labels = 90, cex = 0.9)
x <- seq(log(0.5), log(0.8), length.out = n)
polygon(x = c(x, rev(x)), y = c(rep(0, n), rev(dnorm(x = x, sd = sd))),
        col = "#FA8072", border = "#FA8072")
text(log(0.8), dnorm(x = log(1.25), sd = sd)*0.2, pos = 2, offset = 0.5,
           labels = 5, cex = 0.9)
x <- seq(log(1.25), log(2), length.out = n)
polygon(x = c(x, rev(x)), y = c(rep(0, n), rev(dnorm(x = x, sd = sd))),
        col = "#FA8072", border = "#FA8072")
text(log(1.25), dnorm(x = log(1.25), sd = sd)*0.2, pos = 4, offset = 0.5,
           labels = 5, cex = 0.9)
par(op)

Fig. 2 90% confidence interval unveiled.

Had we kept the \(\small{95\%}\) CI, the patient’s risk would be only 2.5%. Fine in principle, but we are only 95% confident that the reference product works at all. Hence, keeping the \(\small{95\%}\) CI would have meant applying double standards.
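That the two one-sided tests, each at \(\small{\alpha=0.05}\), lead to the same decision as the inclusion of the \(\small{90\%}\) CI can be checked numerically. A minimal sketch in base R; the point estimate, standard error, and degrees of freedom are made-up values for illustration and not taken from any study.

```r
# Hypothetical study results (assumptions for illustration only)
alpha  <- 0.05
theta1 <- 0.80 # lower limit
theta2 <- 1.25 # upper limit
pe     <- 0.92 # point estimate (T/R)
se     <- 0.07 # standard error of the difference (log-scale)
df     <- 26   # residual degrees of freedom

# 100(1 - 2 alpha) = 90% confidence interval, back-transformed
ci <- exp(log(pe) + c(-1, +1) * qt(1 - alpha, df) * se)

# Two one-sided t-tests, each at level alpha
# H01: ratio <= theta1 (reject if p1 < alpha)
p1 <- pt((log(pe) - log(theta1)) / se, df, lower.tail = FALSE)
# H02: ratio >= theta2 (reject if p2 < alpha)
p2 <- pt((log(pe) - log(theta2)) / se, df, lower.tail = TRUE)

pass.ci   <- ci[1] > theta1 && ci[2] < theta2
pass.tost <- max(p1, p2) < alpha
cat("90% CI      :", sprintf("%.2f%%", 100 * ci),
    "\nCI inclusion:", pass.ci,
    "\nTOST        :", pass.tost, "\n")
```

Both decisions agree for any combination of inputs because the critical value `qt(1 - alpha, df)` is used in both approaches.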

80 – 125%

In the early days of bioequivalence, the analysis was performed on raw (untransformed) data, assuming an additive model. Based on a clinically not relevant difference \(\small{\Delta=20\%}\) one gets: \[\left\{\theta_1,\theta_2\right\}=\left\{100(1-\Delta),100(1+\Delta)\right\}=80-120\%\tag{1}\] One should not forget that bioequivalence was never a scientific theory in the Popperian sense but an ad hoc solution to a pressing problem in the 1970s.6 7 The commonly assumed clinically not relevant difference \(\small{\Delta=20\%}\) is arbitrary (as any other). However, we have decades of empiric evidence that the concept is sufficient in practice. Apart from occasional anecdotal reports (mainly dealing with narrow therapeutic index drugs), no problems are evident switching between the originator and generics (and vice versa) in terms of lacking efficacy and safety problems.

Some argued that concentrations and derived pharmacokinetic metrics like the Area Under the Curve (AUC) do not follow a normal distribution – which is a prerequisite of the ANOVA. Makes sense.
Working with the arithmetic mean and its variance implies a certain probability of negative values because the domain of \(\small{\mathcal{N}(\mu;\sigma^2)}\) is \(\small{-\infty<x<+\infty}\) for \(\small{x\in \mathbb{R}}\).

A fascinating example of the FDA:8

Fig. 3 Arithmetic mean ± SD (n 238) in Excel.
Sampling times: pre-dose, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 24, 30, 36, and 48 h post-dose

A line plot instead of an XY plot. Oh dear, someone clicked the first button in Excel!
One hour intervals in the beginning are as wide as the 12 hours at the end. Do these guys and dolls really believe that at seven hours there’s a ~16% probability that concentrations are ≤–232 ng/mL‽ Any statistic implies an underlying distribution.
Which cult of Pastafarianism do they belong to? The one holding that negative mass exists or the other believing in negative lengths?

Exploring large data sets we see that the distribution is skewed to the right. The lognormal distribution not only pulls the right tail in but, with its domain of \(\small{0<x<+\infty}\) for \(\small{x\in \mathbb{R}^{+}}\), also avoids the ‘problem’ of negative concentrations.
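The difference between the two distributions is easy to quantify. A minimal sketch in base R, assuming a made-up arithmetic mean and standard deviation (the data behind the FDA’s plot are not given here); the lognormal parameters are chosen such that its mean and CV match the assumed ones.

```r
# Hypothetical summary statistics (assumptions for illustration only)
m  <- 100   # arithmetic mean
s  <- 150   # arithmetic standard deviation
CV <- s / m

# Normal distribution: probability of values below zero
p.neg.norm <- pnorm(0, mean = m, sd = s)

# Lognormal distribution with the same mean and CV
sdlog       <- sqrt(log(CV^2 + 1))
meanlog     <- log(m) - sdlog^2 / 2
p.neg.lnorm <- plnorm(0, meanlog = meanlog, sdlog = sdlog)

# Under normality: probability of a value at or below (mean - 1 SD),
# i.e., below the lower whisker of a mean +/- SD plot
p.below.1sd <- pnorm(-1)

cat("p(x < 0), normal      :", sprintf("%.2f%%", 100 * p.neg.norm),
    "\np(x < 0), lognormal   :", sprintf("%.2f%%", 100 * p.neg.lnorm),
    "\np(x <= mean - SD)     :", sprintf("%.2f%%", 100 * p.below.1sd), "\n")
```

The last line is where a probability of roughly 16% below the lower whisker comes from, whatever the mean and SD are.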

Fig. 4 Geometric mean ± SDgeom (n 238).

This plot reflects the extreme variability of the drug much better and shows that high concentrations are more likely than low ones.

After a dose we know only one thing for sure: The concentration is not zero.
Harold Boxenbaum9

His statement ended in shouting matches. Don’t know why.10

<nitpick>
The distribution of data per se is not important. Only the model’s residual errors – as estimates of \(\epsilon\) – have to be normally distributed. However, exploring large data sets again, we see that they aren’t when the analysis is performed on untransformed data.

</nitpick>

Another argument is based on the fundamental equation of pharmacokinetics, namely \[AUC=\frac{f\cdot D}{CL},\tag{2}\] where \(\small{f}\) is the fraction absorbed, \(\small{D}\) the dose, and \(\small{CL}\) the clearance.
When we want to compare the bioavailabilities of two drugs (\(\small{f_\textrm{T},f_\textrm{R}}\)) we arrive at \[\frac{f_\textrm{T}}{f_\textrm{R}}=\frac{AUC_\textrm{T}\cdot CL_\textrm{T}}{D_\textrm{T}}\Big{/}\frac{AUC_\textrm{R}\cdot CL_\textrm{R}}{D_\textrm{R}}\tag{3}\] By assuming11 identical doses and clearances (\(\small{D_\textrm{T}\equiv D_\textrm{R}, CL_\textrm{T}\equiv CL_\textrm{R}}\)) we can cancel them out \[\require{cancel}\frac{f_\textrm{T}}{f_\textrm{R}}=\frac{AUC_\textrm{T}\cdot \cancel{CL_\textrm{T}}}{\cancel{D_\textrm{T}}}\Big{/}\frac{AUC_\textrm{R}\cdot \cancel{CL_\textrm{R}}}{\cancel{D_\textrm{R}}}\tag{4}\] to get what we want: \[\frac{f_\textrm{T}}{f_\textrm{R}}=\frac{AUC_\textrm{T}}{AUC_\textrm{R}}.\tag{5}\] Hey, that’s a ratio and hence, we have to use a multiplicative model.

But we need differences in the ANOVA (which is an additive model)… Nothing easier than that. \(\small{(2)}\) can be rewritten to \[\log_{e}AUC=\log_{e}f\,+\log_{e}D\,-\log_{e}CL.\tag{6}\] After the same substitutions and cancelations as in \(\small{(3)}\) and \(\small{(4)}\), perform the analysis \[\log_{e}PE=\log_{e}f_\textrm{T}\,-\log_{e}f_\textrm{R}=\log_{e}AUC_\textrm{T}\,-\log_{e}AUC_\textrm{R},\tag{7}\] and use \(\small{\exp(\log_{e}PE)}\) to get with \(\small{PE}\) a number clinicians can comprehend.
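That the additive model in \(\small{\log_{e}}\)-scale and the multiplicative model give the same point estimate can be seen with a few numbers. A minimal sketch in base R; the AUCs are made up for illustration, and simple means are used instead of the least squares means of a full ANOVA.

```r
# Hypothetical AUCs of six subjects (assumptions for illustration only)
AUC.T <- c(95, 110, 78, 130, 102, 88)
AUC.R <- c(100, 105, 85, 120, 110, 90)

# Additive model in log-scale: difference of the means of log(AUC), cf. (7)
log.PE <- mean(log(AUC.T)) - mean(log(AUC.R))
PE     <- exp(log.PE) # back-transformed point estimate

# Multiplicative model: ratio of geometric means, cf. (5)
gm     <- function(x) exp(mean(log(x)))
ratio  <- gm(AUC.T) / gm(AUC.R)

cat("exp(difference of log-means):", sprintf("%.4f", PE),
    "\nratio of geometric means    :", sprintf("%.4f", ratio), "\n")
```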

At the first BioInternational conference12 there was a poll among the participants about the transformation. Outcome: ⅓ never, ⅓ always, ⅓ case by case (i.e., perform both analyses and report the one with narrower confidence interval ‘because it fits the data better’). Let’s be silent about the last camp.13

Wait a minute! The original acceptance range \(\small{(1)}\) was symmetrical around \(\small{100\%}\). In \(\small{\log_{e}}\)-scale it should be symmetrical around \(\small{0}\) (because \(\small{\log_{e}1=0}\)).
What happens to our \(\small{\Delta}\) which should still be \(\small{20\%}\)? Due to the positive skewness of the lognormal distribution a [heated debate] lively discussion started after early publications proposing \(\small{80-125\%}\).14 15 Keeping \(\small{80-120\%}\) would be flawed because the maximum power should be obtained at \(\small{\mu_\textrm{T}/\mu_\textrm{R}=1}\) for \[\exp\left((\log_{e}\theta_1+\log_{e}\theta_2)/2\right)\tag{8}\] which works only if \(\small{\theta_2=\theta_1^{-1}}\) or \(\small{\theta_1=\theta_2^{-1}}\). Keeping the original limits, maximum power would be obtained at \(\small{\mu_\textrm{T}/\mu_\textrm{R}=\exp((\log_{e}0.8+\log_{e}1.2)/2)\approx0.979796}\).
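Expression \(\small{(8)}\) is the geometric midpoint of the limits, which makes the check a one-liner. A minimal sketch in base R; the function name is hypothetical.

```r
# T/R-ratio at which power is maximal: geometric midpoint of the limits, cf. (8)
max.power.at <- function(theta1, theta2) {
  exp((log(theta1) + log(theta2)) / 2)
}
old <- max.power.at(0.80, 1.20) # original limits kept after log-transformation
new <- max.power.at(0.80, 1.25) # upper limit is the reciprocal of the lower one
cat("80-120%:", sprintf("%.6f", old),
    "\n80-125%:", sprintf("%.6f", new), "\n")
```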

Fig. 5 Power curve for {θ1 0.80, θ2 1.20}, 2×2×2 design, n 28.
Note that the x-axis is in log-scale.

There were three parties (all agreed that the acceptance range should be symmetrical in \(\small{\log_{e}}\)-scale and consequently asymmetrical back-transformed).
These were their suggestions:

\[\left\{\theta_1,\theta_2\right\}=81.98-121.98\%\tag{9}\] \[\left\{\theta_1,\theta_2\right\}=\left\{100/(1+\Delta),100(1+\Delta)\right\}=8\dot{3}.33-120\%\tag{10}\] \[\left\{\theta_1,\theta_2\right\}=\left\{100(1-\Delta),100/(1-\Delta)\right\}=80-125\%\tag{11}\]

  • The argument of the first party was essentially:
    The width of the acceptance range was 40% and we have empiric evidence that the concept of bioequivalence ‘worked’ – let’s keep it.

  • The second party argued:
    Since that’s a new method we don’t want to face safety issues with a higher limit. Furthermore, a more restrictive lower limit prevents issues with insufficient efficacy.

  • The third party argued:
    80% as the lower limit served us well in the past. Hence, 125% is the way to go because it is simply the reciprocal of the lower limit and the coverage probability in the log-domain is the same as the one we had.
    Furthermore, these are nice numbers.
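The three suggestions \(\small{(9)}\) to \(\small{(11)}\) can be reproduced from \(\small{\Delta=20\%}\) alone. A minimal sketch in base R; how \(\small{(9)}\) was obtained is my assumption (keep the width of \(\small{2\Delta}\) and require \(\small{\theta_1\cdot\theta_2=1}\)), which agrees with the first party’s argument and the reported numbers.

```r
Delta <- 0.20
# (9): keep the width 2 * Delta and require theta1 * theta2 = 1, i.e.,
#      solve theta1^2 + 2 * Delta * theta1 - 1 = 0 for theta1 > 0 (assumption)
th1 <- -Delta + sqrt(Delta^2 + 1)
p9  <- c(th1, th1 + 2 * Delta)
# (10): keep the upper limit, its reciprocal is the lower one
p10 <- c(1 / (1 + Delta), 1 + Delta)
# (11): keep the lower limit, its reciprocal is the upper one
p11 <- c(1 - Delta, 1 / (1 - Delta))

res <- rbind(p9, p10, p11)
dimnames(res) <- list(c("(9)", "(10)", "(11)"), c("lower", "upper"))
print(round(100 * res, 2))
```

In all three cases the product of the limits is one, i.e., the range is symmetrical around zero in \(\small{\log_{e}}\)-scale.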

We all know which party prevailed. Can you imagine why?


To round or not to round

How I love rounding of the confidence interval according to the guidelines!
All jurisdictions require the result in percent rounded to two decimal places, except Health Canada, which requires only one.

If we do that16 it will increase the patient’s risk. Why? Simple: With the exact result, a study with a lower confidence limit of \(\small{79.995\%}\) would fail, as would one with an upper confidence limit of \(\small{125.0049\%}\). Both would pass after rounding.
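The effect on a single borderline study can be shown before simulating many. A minimal sketch in base R; the function `passes()` and its arguments are hypothetical, and the second confidence limit in each call is an arbitrary value well inside the acceptance range.

```r
# Pass/fail decision for a CI given in percent.
# digits = NULL assesses the CI in full precision (no rounding).
passes <- function(lower, upper, digits = NULL,
                   theta1 = 80, theta2 = 125) {
  if (!is.null(digits)) {
    lower <- round(lower, digits)
    upper <- round(upper, digits)
  }
  lower >= theta1 & upper <= theta2
}
# Borderline lower confidence limit
passes(lower = 79.995, upper = 110)             # full precision
passes(lower = 79.995, upper = 110, digits = 2) # rounded to two decimals
# Borderline upper confidence limit
passes(lower = 91, upper = 125.0049)             # full precision
passes(lower = 91, upper = 125.0049, digits = 2) # rounded to two decimals
```

Note that `round()` works on binary floating point numbers; a value which looks like an exact tie in decimal notation might be stored slightly above or below it.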

library(PowerTOST) # attach it to run the examples
# Cave: long runtime
set.seed(123456)
nsims  <- 1e5L # number of simulations
target <- 0.80 # target power
theta0 <- 0.95 # assumed T/R-ratio
CV     <- 0.25 # assumed CV
n      <- sampleN.TOST(CV = CV, theta0 = theta0,
                       targetpower = target,
                       print = FALSE)[["Sample size"]]
CV.sim <- mse2CV(CV2mse(CV) * rchisq(nsims, df = n - 2) / (n - 2))
pe.sim <- exp(rnorm(nsims, mean = log(theta0),
                           sd = sqrt(0.5 / n) * sqrt(CV.sim)))
df     <- data.frame(CV = CV.sim, PE = 100 * pe.sim,
                     lower = NA, upper = NA,
                     exact = FALSE, round2 = FALSE, round1 = FALSE)
for (i in 1:nsims) {
  df[i, 3:4] <- 100*CI.BE(CV = CV.sim[i], pe = pe.sim[i], n = n)
  if (df$lower[i] >= 80 & df$upper[i] <= 125) df$exact[i] <- TRUE
  if (round(df$lower[i], 2) >= 80 &
      round(df$upper[i], 2) <= 125) {
    df$round2[i] <- TRUE
  }
  if (round(df$lower[i], 1) >= 80 &
      round(df$upper[i], 1) <= 125) {
    df$round1[i] <- TRUE
  }
}
pass.exact  <- sum(df$exact)/nsims
pass.round2 <- sum(df$round2)/nsims
pass.round1 <- sum(df$round1)/nsims
cat("Percentage of", formatC(nsims, format = "d", big.mark = ","),
    "simulated studies",
    "\npassing the acceptance limits with the 90% CI",
    "\n  full precision:", sprintf("%.3f%%", 100*pass.exact),
    "\n  2 decimals    :", sprintf("%.3f%%", 100*pass.round2),
    "\n  1 decimal     :", sprintf("%.3f%%", 100*pass.round1), "\n")
# Percentage of 100,000 simulated studies 
# passing the acceptance limits with the 90% CI 
#   full precision: 80.341% 
#   2 decimals    : 80.375% 
#   1 decimal     : 80.605%

Splendid. Am I picky?
In all textbooks and publications the confidence interval inclusion approach is given as \[\begin{matrix}\tag{12} \theta_1=1-\Delta,\theta_2=\left(1-\Delta\right)^{-1}\\ H_0:\;\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\notin\left\{\theta_1,\,\theta_2\right\}\;vs\;H_1:\;\theta_1<\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2, \end{matrix}\] where \(\small{H_0}\) is the null hypothesis of inequivalence and \(\small{H_1}\) the alternative hypothesis. \(\small{\theta_1}\) and \(\small{\theta_2}\) are the lower and upper limits of the acceptance range, and \(\small{\mu_\textrm{T}}\) and \(\small{\mu_\textrm{R}}\) the geometric least squares means of \(\small{\textrm{T}}\) and \(\small{\textrm{R}}\), respectively.
Alternatively by the TOST procedure:17 \[\begin{matrix}\tag{13} H_{01}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\leq \theta_1 & vs & H_{11}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}>\theta_1\\ H_{02}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\geq \theta_2 & vs & H_{12}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2 \end{matrix}\] Do you see anything about rounding here? I don’t. If you can point me to a publication which supports this regulatory ‘invention’, let me know.

In older releases of Phoenix/WinNonlin there was a verbatim statement based on the confidence interval in full precision like
Average bioequivalence shown for confidence=0.90 and percent=0.20
or
Failed to show average bioequivalence for confidence=0.90 and percent=0.20
After rounding according to the guidelines that may not be ‘correct’ any more. I mused about writing a one-sentence SOP:
»Delete the verbatim BE assessment from the output because it might contradict
rounded results and provoke questions.«

For good reasons Certara removed the text from the output in later releases.

90.00 – 111.11%

These limits are stated in most guidelines when dealing with Narrow Therapeutic Index Drugs (NTIDs). Should we take the upper limit literally and not based on \(\small{\Delta=10\%}\)? \[\left\{\theta_1,\theta_2\right\}=\left\{100(1-\Delta),100/(1-\Delta)\right\}=90-\dot{1}11.11\%\tag{14}\] If yes, we would have to apply double rounding (of the CI and the upper limit).

90.0 – 112.0%

Health Canada18 requires for ‘critical dose drugs’ (i.e., NTIDs) that the confidence interval of AUC lies within \(\small{90.0-112.0\%}\). I always thought that \(\small{100/0.900=\dot{1}11.11}\).

Consequently, on average products will be approved with \(\small{\sqrt{90.0\times112.0}\approx 100.4\%}\). When asked, the reply was:
»These numbers are more easy to remember.«
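The shift of the centre of the acceptance range is quickly verified. A minimal sketch in base R; the function name is hypothetical.

```r
# Geometric midpoint of an acceptance range given in percent
gm.limits <- function(theta1, theta2) sqrt(theta1 * theta2)
hc  <- gm.limits(90.0, 112.0)    # Health Canada's limits for NTIDs
rec <- gm.limits(90, 100 / 0.90) # upper limit as reciprocal of the lower one
cat("90.0-112.0%    :", sprintf("%.4f%%", hc),
    "\n90-(100/0.90)% :", sprintf("%.4f%%", rec), "\n")
```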

75.00 – 133.33%

A similar story, this time dealing with widening the limits for Highly Variable Drugs / Drug Products.19 20 Based on \(\small{\Delta=25\%}\): \[\left\{\theta_1,\theta_2\right\}=\left\{100(1-\Delta),100/(1-\Delta)\right\}=75-1\dot{3}3.33\%\tag{15}\] Double rounding again?
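All of these acceptance ranges follow from \(\small{\Delta}\) by the same formula; only the upper limit has to be rounded. A minimal sketch in base R; the function name is hypothetical.

```r
# Limits based on Delta: lower = 1 - Delta, upper = its reciprocal
be.limits <- function(Delta) {
  c(lower = 1 - Delta, upper = 1 / (1 - Delta))
}
for (Delta in c(0.10, 0.20, 0.25)) {
  lim <- 100 * be.limits(Delta)
  cat(sprintf("Delta %.2f: %.6f - %.6f%% (rounded: %.2f - %.2f%%)\n",
              Delta, lim[["lower"]], lim[["upper"]],
              round(lim[["lower"]], 2), round(lim[["upper"]], 2)))
}
```

Assessing a rounded confidence limit against a rounded upper limit is what is meant by double rounding above.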

k 0.760

All jurisdictions accepting Average Bioequivalence with Expanding Limits (ABEL) for HVD(P)s give the regulatory constant \(\small{k}\) in the expansion formula \[\left\{\theta_1,\theta_2\right\}=\exp(\mp k\,\cdot s_\textrm{wR})\tag{16}\] with \(\small{0.760}\). Where does it come from?

Based on the switching \(\small{CV_0=30\%}\) we get \[k=\log_{e}1.25 \Big{/}\sqrt{\log_{e}(CV_0^{2}+1)}\approx 0.7601283\ldots\tag{17}\] Just another ‘nice’ number. If we would back-transform \(\small{k=0.760}\), the switching \(\small{CV_0}\) would be \(\small{30.00529\%}\) and not \(\small{30\%}\).21
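Both directions of \(\small{(17)}\) in a minimal sketch in base R; the variable names are mine.

```r
CV0     <- 0.30
# Regulatory constant in full precision, cf. (17)
k       <- log(1.25) / sqrt(log(CV0^2 + 1))
# Back-transformation of the rounded constant
k.reg   <- 0.760
s.back  <- log(1.25) / k.reg     # s_wR at which expansion starts
CV.back <- sqrt(exp(s.back^2) - 1) # the implied switching CV0
cat("k (full precision):", sprintf("%.7f", k),
    "\nCV0 implied by k = 0.760:", sprintf("%.5f%%", 100 * CV.back), "\n")
```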

Upper cap 57.4%

Health Canada gives the upper cap of scaling in ABEL with \(\small{CV_\textrm{wR}\;57.4\%}\) but states also that the expansion is limited to \(\small{66.7-150.0\%}\).22 ‘Nice’ numbers as usual but there’s a contradiction.

With \(\small{CV_\textrm{wR}\;57.4\%}\) we get \[\begin{matrix}\tag{18} s_\textrm{wR}=\sqrt{\log_{e}(0.574^2+1)}\approx 0.5336524\ldots\\ \left\{\theta_{\textrm{s}_1},\theta_{\textrm{s}_2}\right\}=\exp(\mp0.760\cdot s_\textrm{wR})\approx 66.65929-150.0166\% \end{matrix}\] In order to obtain \(\small{66.7-150.0\%}\), in the power and sample size functions of PowerTOST we decided to use a cap of \(\small{0.57382}\) instead.

hc <- reg_const(regulator = "HC")
print(hc); round(100*scABEL(CV = hc$CVcap, regulator = "HC"), 1)
# HC regulatory settings
# - CVswitch            = 0.3 
# - cap on scABEL if CVw(R) > 0.57382
# - regulatory constant = 0.76 
# - pe constraint applied
# lower upper 
#  66.7 150.0

It’s guesswork. We assumed that \(\small{66.7-150.0\%}\) is more ‘important’ than a cap at \(\small{57.4\%}\).
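The expanded limits at both caps can also be compared without the package. A minimal sketch in base R applying \(\small{(16)}\) directly; the helper functions are hypothetical and not the implementation used in PowerTOST.

```r
k      <- 0.760 # regulatory constant
cv2s   <- function(CV) sqrt(log(CV^2 + 1)) # CV to s_wR
limits <- function(CV) {
  # expanded limits according to (16)
  setNames(exp(c(-1, +1) * k * cv2s(CV)), c("lower", "upper"))
}
for (cap in c(0.574, 0.57382)) {
  lim <- 100 * limits(cap)
  cat(sprintf("cap %.5f: %.5f - %.5f%%\n",
              cap, lim[["lower"]], lim[["upper"]]))
}
```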

s0 0.294

A similar goody in the FDA’s Reference-Scaled Average Bioequivalence (RSABE). Again we have the switching \(\small{CV_0=30\%}\). But then \[s_0=\sqrt{\log_{e}(CV_0^{2}+1)}\approx 0.2935604\ldots\tag{19}\] I thought that there is a consensus about the classification of HVD(P)s, i.e., \(\small{CV_0\;30\%}\). Not for the FDA,23 giving the regulatory constant with \(\small{0.294}\), which translates to \(\small{30.04689\%}\).

σw0 0.25

There is a consensus about \(\small{CV_\textrm{wR}=30\%}\) classifying drugs as highly variable24 but ones with \(\small{CV_\textrm{wR}=25\%}\) were considered ‘problematic’.25

The FDA prefers the ‘nicer’ \(\small{\sigma_\textrm{w0}\;0.25}\), which translates to \(\small{25.39576\%}\).
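The conversions behind \(\small{s_0}\) and \(\small{\sigma_\textrm{w0}}\) in a minimal sketch in base R; the two helper functions are hypothetical names for the usual lognormal relationships.

```r
cv2s <- function(CV) sqrt(log(CV^2 + 1)) # CV to standard deviation (log-scale)
s2cv <- function(s) sqrt(exp(s^2) - 1)   # and back

s0      <- cv2s(0.30)        # full precision, cf. (19)
CV.s0   <- 100 * s2cv(0.294) # CV (%) implied by the FDA's s0
CV.sw0  <- 100 * s2cv(0.25)  # CV (%) implied by the FDA's sigma_w0
cat("s0 for CV0 30%       :", sprintf("%.7f", s0),
    "\nCV for s0 0.294      :", sprintf("%.5f%%", CV.s0),
    "\nCV for sigma_w0 0.25 :", sprintf("%.5f%%", CV.sw0), "\n")
```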

Utopia

Instead of ‘nice’ numbers guidelines should state the formulas which are not more than elementary maths any eighth-grader could master. It’s not rocket science. We are not retarded.

All we need are numbers of \(\small{\Delta}\) (and \(\small{CV_0/\sigma_\textrm{w0}}\) in scaled average bioequivalence).
Everything else should be calculated and assessed with full precision.

License

CC BY 4.0 Helmut Schütz 2021
1st version March 30, 2021.
Rendered 2021-06-24 10:42:50 CEST by rmarkdown in 0.23 seconds.

Footnotes and References


  1. Labes D, Schütz H, Lang B. PowerTOST: Power and Sample Size for (Bio)Equivalence Studies. 2021-01-18. CRAN.↩︎

  2. Fisher RA. Statistical Methods for Research Workers. Edinburgh: Oliver & Boyd; 1925. Chapter III. Distributions.↩︎

  3. Canadian Health Protection Branch, the U.S. Food and Drug Administration, the United States Pharmacopeia. International Open Conference on Dissolution, Bioavailability, and Bioequivalence. Toronto. June 15–18, 1992.↩︎

  4. Steinijans VW, Hauschke D, Jonkman JHG. Controversies in Bioequivalence Studies. Clin Pharmacokinet. 1992; 22(4): 247–53. doi:10.2165/00003088-199222040-00001.↩︎

  5. Berger RL, Hsu JC. Bioequivalence Trials, Intersection-Union Tests and Equivalence Confidence Sets. Stat Sci. 1996; 11(4): 283–302. JSTOR:2246021.↩︎

  6. Levy G, Gibaldi M. Bioavailability of Drugs. Circulation. 1974; 49(3): 391–394. doi:10.1161/01.CIR.49.3.391.  Open Access.↩︎

  7. Skelly JP. A History of Biopharmaceutics in the Food and Drug Administration 1968–1993. AAPS J. 2010; 12(1): 44–50. doi:10.1208/s12248-009-9154-8. PMC Free Full Text.↩︎

  8. U.S. FDA, CDER. NDA 204-412. Clinical Pharmacology and Biopharmaceutics Review(s). Reference ID: 3244307.↩︎

  9. American Association of Pharmaceutical Scientists, U.S. Food and Drug Administration, Federation International Pharmaceutique, Health Protection Branch (Canada), Association of Official Analytical Chemists. Analytical Methods Validation: Bioavailability, Bioequivalence and Pharmacokinetic Studies. Arlington. December 3–5, 1990.↩︎

  10. Think about paracetamol/acetaminophen. With its molecular mass of 151.165 g·mol⁻¹ and \(\small{f}\) 0.75, after a 500 mg oral dose we start with ~4.94·10²¹ molecules in the circulation. Given its half-life of 2½ hours, after one week ≈35 molecules happily float around.↩︎

  11. The declared content \(\small{\neq}\) the measured one. Furthermore, we can never be sure that the measured content is the true one. Don’t forget analytical (in)accuracy and (im)precision. BTW, our method was validated for the test product and not for the reference. A dose-correction is only acceptable for Health Canada and under certain conditions for the EMA.
    The fact that clearances are not identical inflates the confidence interval, esp. for highly variable drugs.↩︎

  12. McGilveray IJ, Dighe SV, French IW, Midha KK (eds). BioInternational ’89. Issues in the Evaluation of Bioavailability Data. Toronto. October 1–4, 1989.↩︎

  13. Keene ON. The log transformation is special. Stat Med. 1995; 14(8): 811–9. doi:10.1002/sim.4780140810.  Open Access.↩︎

  14. Mantel N. Do We Want Confidence Intervals Symmetrical About the Null Value? Biometrics. 1977; 33(4): 759–60.↩︎

  15. Kirkwood TBL. Bioequivalence Testing – A Need to Rethink. Biometrics. 1981; 37(3): 589–91. doi:10.2307/2530573.↩︎

  16. I can imagine what people would say:
    »Of course, we do. It’s written in the holy books and the probability of passing increases. If regulators don’t care about an inflated Type I Error, why should we?«↩︎

  17. Schuirmann DJ. A comparison of the Two One-Sided Tests Procedure and the Power Approach for Assessing the Equivalence of Average Bioavailability. J Pharmacokin Biopharm. 1987; 15(6): 657–80. doi:10.1007/BF01068419.↩︎

  18. Health Canada. Guidance Document. Comparative Bioavailability Standards: Formulations Used for Systemic Effects. Pub.:170501. Ottawa. 2018/06/08.↩︎

  19. Executive Board of the Health Ministers’ Council for GCC States. The GCC Guidelines for Bioequivalence. March 2016. online.↩︎

  20. Medicines Control Council. Registration of Medicines. Biostudies. Pretoria. June 2015. online.↩︎

  21. Karalis V, Symillides M, Macheras P. On the leveling-off properties of the new bioequivalence limits for highly variable drugs of the EMA guideline. Eur J Pharm Sci. 2011; 44: 497–505. doi:10.1016/j.ejps.2011.09.008.↩︎

  22. Health Canada. Therapeutic Products Directorate. Notice: Policy on Bioequivalence Standards for Highly Variable Drug Products. File number 16-104293-140. Ottawa. April 18, 2016.↩︎

  23. U.S. FDA, OGD. Draft Guidance on Progesterone. Recommended Apr 2010, Revised Feb 2011. download.↩︎

  24. McGilveray IJ, Midha KK, Skelly J, Dighe S, Doluisio JT, French IW, Karim A, Burford R. Consensus Report from “Bio International ’89”: Issues in the Evaluation of Bioavailability Data. J Pharm Sci. 1990, 79(10): 945–6. doi:10.1002/jps.2600791022.↩︎

  25. Blume HH, Midha KK. Conference Report. Bio-International 92. Conference on Bioavailability, Bioequivalence and Pharmacokinetic Studies. Bad Homburg, Germany, May 20–22, 1992. In: Midha KK, Blume HH (eds). Bio-International. Bioavailability, Bioequivalence and Pharmacokinetics. Stuttgart. medpharm; 1993.↩︎