
Examples in this article were generated with R 4.0.5 by the package PowerTOST.1


Introduction

Have you ever wondered where all these nice numbers come from?

\(\small{\alpha\;0.05}\), bioequivalence limits 80.00 – 125.00%, switching to SABE at \(\small{CV_\textrm{wR}\;30\%}\), upper cap of scaling at \(\small{CV_\textrm{wR}\;50\%}\), \(\small{k\;0.760}\), \(\small{s_\textrm{wR}\;0.294}\)

The list is endless and as interesting as a telephone directory.

What the Heck?

α 0.05

Why this number? OK, at least this one is easy to explain: Mr Fisher considered it convenient.

The value for which p = 0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not.
Ronald A. Fisher, Statistical Methods for Research Workers. 1925.

For nearly a century this number has been the holy grail of frequentists. Members of the other church, the Bayesians, hold other beliefs.

90% Confidence Interval

Did you ever wonder why (based on Mr Fisher’s \(\small{\alpha=0.05}\)) we use a \(\small{100(1-2\alpha)=90\%}\) confidence interval in bioequivalence and not the \(\small{95\%}\) CI like in clinical Phase III?

In Phase III we want to demonstrate superiority (for nitpickers: non-inferiority) of verum to placebo. Once a drug is approved, we can be 95% sure that it is safe and efficacious (though there is a 5% chance that even a mega-selling blockbuster drug is not better than snake-oil).

In the early days of BE a \(\small{95\%}\) CI was applied indeed. Below a scan of conference proceedings.2


Fig. 1 Note my handwritten comment (“generell” is German for generally).
The guideline contained five text passages stating
»The 95% confidence interval…«

So why did we change to the \(\small{90\%}\) CI? It controls the risk for the population of patients. In a particular patient the bioavailability can be either too low or too high but evidently not at the same time. Therefore, the risk for a particular patient does not exceed 5%.3

Had we kept the \(\small{95\%}\) CI, the patient’s risk would be only 2.5%. Fine in principle, but we are only 95% confident that the reference works at all. Hence, keeping the \(\small{95\%}\) CI would have meant double standards.
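The link between \(\small{\alpha=0.05}\) and the \(\small{90\%}\) CI can be checked numerically. A minimal sketch in base R, assuming a hypothetical point estimate, standard error, and degrees of freedom (none of them are from a real study): two one-sided t-tests, each at level \(\small{\alpha}\), lead to the same decision as the inclusion of the \(\small{100(1-2\alpha)}\) CI within the limits.

```r
alpha  <- 0.05
theta1 <- 0.80                  # lower acceptance limit
theta2 <- 1.25                  # upper acceptance limit
pe     <- 0.93                  # hypothetical point estimate (T/R)
se     <- 0.07                  # hypothetical standard error (log-scale)
df     <- 26                    # hypothetical degrees of freedom
# decision by two one-sided t-tests, each at level alpha
tost <- function(pe, se, df, alpha, theta1, theta2) {
  p.lo <- pt((log(pe) - log(theta1)) / se, df = df, lower.tail = FALSE)
  p.hi <- pt((log(pe) - log(theta2)) / se, df = df, lower.tail = TRUE)
  max(p.lo, p.hi) < alpha
}
# decision by inclusion of the 100(1 - 2 alpha) confidence interval
incl <- function(pe, se, df, alpha, theta1, theta2) {
  ci <- exp(log(pe) + c(-1, +1) * qt(1 - alpha, df = df) * se)
  ci[1] > theta1 && ci[2] < theta2
}
cat("TOST passes       :", tost(pe, se, df, alpha, theta1, theta2),
    "\n90% CI is included:", incl(pe, se, df, alpha, theta1, theta2), "\n")
```

Both decisions agree for any point estimate, which is why the \(\small{90\%}\) CI is all that has to be reported.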

80 – 125%

In the early days of bioequivalence, the analysis was performed on raw (untransformed) data, assuming an additive model. Based on a clinically not relevant difference \(\small{\Delta=20\%}\) one gets: \[\left\{\theta_1,\theta_2\right\}=\left\{100(1-\Delta),100(1+\Delta)\right\}=80-120\%\tag{1}\] So far, so good.

Some argued that concentrations and derived pharmacokinetic metrics like the Area Under the Curve (AUC) do not follow a normal distribution – which is a prerequisite of the ANOVA. Makes sense.
If one works with the arithmetic mean and its variance, it implies that there is a certain probability of negative values. The domain of \(\small{\mathcal{N}(\mu;\sigma^2)}\) is \(\small{-\infty<x<+\infty}\) for \(\small{x\in \mathbb{R}}\).

A fascinating example of the FDA:4

Fig. 2 Arithmetic mean ± SD (n 238) in Excel.
Sampling times: pre-dose, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14, 16, 24, 30, 36, and 48 h post-dose

Line plot instead of XY plot. Oh dear, clicked the first button!
One hour intervals in the beginning are as wide as the 12 hours at the end. Do these guys and dolls really believe that at seven hours there’s a ~16% probability that concentrations are ≤–232 ng/mL and a ~1% probability that concentrations are ≤–731 ng/mL‽ Any statistic implies an underlying distribution.
Which cult of Pastafarianism do they belong to? The one holding that negative mass exists or the one believing in negative lengths?
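What a normal distribution implies is easy to check. A sketch in base R; the mean and SD are hypothetical values which I back-calculated from the two probabilities quoted above (they are not taken from the FDA’s review).

```r
m <- 144 # hypothetical arithmetic mean (ng/mL), back-calculated
s <- 376 # hypothetical standard deviation (ng/mL), back-calculated
# probabilities of concentrations at or below the two quoted values
p     <- pnorm(c(-232, -731), mean = m, sd = s)
# probability of any negative concentration under this model
p.neg <- pnorm(0, mean = m, sd = s)
cat(sprintf("P(C <= -232) = %.1f%%", 100 * p[1]),
    sprintf("\nP(C <= -731) = %.1f%%", 100 * p[2]),
    sprintf("\nP(C <     0) = %.1f%%", 100 * p.neg), "\n")
```

With these assumed values roughly a third of all concentrations would have to be negative.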

Exploring large data sets we see that the distribution is skewed to the right. The lognormal distribution not only pulls the right tail in but, with its domain of \(\small{0<x<+\infty}\) for \(\small{x\in \mathbb{R}^{+}}\), also avoids the ‘problem’ of negative concentrations.

Fig. 3 Geometric mean ± SDgeom (n 238).

This plot reflects the extreme variability of the drug much better and shows that high concentrations are more likely than low ones.
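The two ways of summarising such data can be compared in a few lines. A sketch in base R with simulated, purely hypothetical concentrations (lognormal, large variability): the arithmetic mean ± SD can fall below zero, whereas the geometric mean multiplied and divided by the geometric SD cannot.

```r
set.seed(42)
# hypothetical concentrations of 238 subjects at one time point
x     <- rlnorm(238, meanlog = log(100), sdlog = 1)
# arithmetic summary (implies a normal distribution)
arith <- mean(x) + c(-1, +1) * sd(x)
# geometric summary (implies a lognormal distribution)
gm    <- exp(mean(log(x)))   # geometric mean
gsd   <- exp(sd(log(x)))     # geometric standard deviation
geom  <- c(gm / gsd, gm * gsd)
res   <- rbind(arithmetic = arith, geometric = geom)
colnames(res) <- c("lower", "upper")
print(signif(res, 4))
```

The geometric limits are asymmetrical around the geometric mean in the linear scale, which is exactly what skewed data call for.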

After a dose we know only one thing for sure: The concentration is not zero.
Harold Boxenbaum5

His statement ended in shouting matches. Don’t know why.6

<nitpick>
  • The distribution of data per se is not important. Only the model’s residual errors as estimates of ε have to be normally distributed. However, exploring large data sets again, we see that they aren’t when the analysis is performed on untransformed data.

</nitpick>

Another argument is based on the fundamental equation of pharmacokinetics. \[AUC=\frac{f\cdot D}{CL},\tag{2}\] where \(\small{f}\) is the fraction absorbed, \(\small{D}\) the dose, and \(\small{CL}\) the clearance.
When we want to compare the bioavailabilities of two drugs (\(\small{f_\textrm{T},f_\textrm{R}}\)) we arrive at \[\frac{f_\textrm{T}}{f_\textrm{R}}=\frac{AUC_\textrm{T}\cdot CL_\textrm{T}}{D_\textrm{T}}\Big{/}\frac{AUC_\textrm{R}\cdot CL_\textrm{R}}{D_\textrm{R}}\tag{3}\] By assuming7 identical doses and clearances (\(\small{D_\textrm{T}\equiv D_\textrm{R}, CL_\textrm{T}\equiv CL_\textrm{R}}\)) we can cancel them out \[\require{cancel}\frac{f_\textrm{T}}{f_\textrm{R}}=\frac{AUC_\textrm{T}\cdot \cancel{CL_\textrm{T}}}{\cancel{D_\textrm{T}}}\Big{/}\frac{AUC_\textrm{R}\cdot \cancel{CL_\textrm{R}}}{\cancel{D_\textrm{R}}}\tag{4}\] to get what we want: \[\frac{f_\textrm{T}}{f_\textrm{R}}=\frac{AUC_\textrm{T}}{AUC_\textrm{R}}.\tag{5}\] Hey, that’s a ratio and we have to use a multiplicative model.

But we need differences in the ANOVA (which is an additive model)… Nothing easier than that. \((2)\) can be rewritten to \[\log_{e}AUC=\log_{e}f\,+\log_{e}D\,-\log_{e}CL.\tag{6}\] After the same substitutions and cancellations as in \((3)\) and \((4)\), perform the analysis \[\log_{e}PE=\log_{e}f_\textrm{T}\,-\log_{e}f_\textrm{R}=\log_{e}AUC_\textrm{T}\,-\log_{e}AUC_\textrm{R},\tag{7}\] and use \(\small{\exp(\log_{e}PE)}\) to obtain \(\small{PE}\), a number clinicians can comprehend.
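That the back-transformed difference of \((7)\) is nothing but a ratio can be verified with a few numbers. A sketch in base R with simulated, hypothetical AUCs; period and sequence effects of a crossover are ignored here for simplicity.

```r
set.seed(1)
# hypothetical AUCs of 12 subjects after reference (R) and test (T)
AUC.R  <- rlnorm(12, meanlog = log(100), sdlog = 0.30)
AUC.T  <- AUC.R * rlnorm(12, meanlog = log(0.95), sdlog = 0.15)
# additive model in log-scale: difference of the means
log.PE <- mean(log(AUC.T)) - mean(log(AUC.R))
PE     <- exp(log.PE)                     # back-transformed
# multiplicative model: ratio of the geometric means
gm     <- function(x) exp(mean(log(x)))
ratio  <- gm(AUC.T) / gm(AUC.R)
cat("exp(difference of log-means):", signif(PE, 6),
    "\nratio of geometric means    :", signif(ratio, 6), "\n")
```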

At the first BioInternational conference (Toronto, 1989) there was a poll among the participants about the log-transformation. Outcome: ⅓ never, ⅓ always, ⅓ case by case (i.e., perform both analyses and report the one with narrower confidence interval ‘because it fits the data better’). Let’s be silent about the last camp.

Wait a minute! Our original acceptance range was symmetrical around 100%. OK, in \(\small{\log_{e}}\)-scale it should be symmetrical around 0 (because \(\small{\log_{e}1=0}\)).
What happens to our \(\small{\Delta}\) which should still be 20%? Due to the positive skewness of the lognormal distribution a [heated debate] lively discussion started after early publications proposing 80 – 125%.8 9 Keeping 80 – 120% would be flawed because the maximum power should be obtained at \(\small{\mu_\textrm{T}/\mu_\textrm{R}=1}\) for \[\exp\left((\log_{e}\theta_1+\log_{e}\theta_2)/2\right)\tag{8}\] which works only for \(\small{\theta_2=\theta_1^{-1}}\) or \(\small{\theta_1=\theta_2^{-1}}\). Keeping the original limits, maximum power would be obtained at \(\small{\mu_\textrm{T}/\mu_\textrm{R}=\exp((\log_{e}0.8+\log_{e}1.2)/2)\approx0.979796}\).

Fig. 4 Power curve for {θ1 0.80, θ2 1.20}, 2×2×2 design, n 28.
Note that the x-axis is in log-scale.
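The location of maximum power according to \((8)\) takes one line. A sketch in base R; the function name `centre` is mine.

```r
# geometric midpoint of the limits as in (8)
centre <- function(theta1, theta2) {
  exp((log(theta1) + log(theta2)) / 2)
}
old <- centre(0.80, 1.20) # additive limits kept after log-transformation
new <- centre(0.80, 1.25) # limits which are reciprocal to each other
cat("80 - 120%: maximum power at T/R =", signif(old, 7),
    "\n80 - 125%: maximum power at T/R =", signif(new, 7), "\n")
```

The geometric midpoint is simply \(\small{\sqrt{\theta_1\cdot\theta_2}}\), which equals 1 only if one limit is the reciprocal of the other.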

There were three parties (all agreed that the acceptance range should be symmetrical in \(\small{\log_{e}}\)-scale and consequently asymmetrical back-transformed).
These were their suggestions:

\[\left\{\theta_1,\theta_2\right\}=81.98-121.98\%\tag{9}\] \[\left\{\theta_1,\theta_2\right\}=\left\{100/(1+\Delta),100(1+\Delta)\right\}=8\dot{3}.33-120\%\tag{10}\] \[\left\{\theta_1,\theta_2\right\}=\left\{100(1-\Delta),100/(1-\Delta)\right\}=80-125\%\tag{11}\]

  • The argument of the first party was essentially:
    The width of the acceptance range was 40% and we have empiric evidence that the concept of bioequivalence ‘worked’ – let’s keep it.

  • The second party argued:
    Since that’s a new method we don’t want to face safety issues with a higher limit. Furthermore, a more restrictive lower limit prevents issues with insufficient efficacy.

  • The third party argued:
    80% as the lower limit served us well in the past. Hence, 125% is the way to go because it is simply the reciprocal of the lower limit and the coverage probability in the log-domain is the same as the one we had. Furthermore, these are nice numbers.
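The three suggestions \((9)\) to \((11)\) can be reproduced in base R. A sketch; for \((9)\) I assume that the limits keep a width of 40 percentage points and are reciprocal to each other, which is not spelled out above but gives the stated numbers.

```r
Delta  <- 0.20
# (9): theta1 * theta2 = 1 and theta2 - theta1 = 2 * Delta (assumption)
#      solve theta1^2 + 2 * Delta * theta1 - 1 = 0 for the positive root
theta1 <- -Delta + sqrt(Delta^2 + 1)
party1 <- 100 * c(theta1, 1 / theta1)
# (10): keep the upper limit, the lower one is its reciprocal
party2 <- 100 * c(1 / (1 + Delta), 1 + Delta)
# (11): keep the lower limit, the upper one is its reciprocal
party3 <- 100 * c(1 - Delta, 1 / (1 - Delta))
res    <- rbind(party1, party2, party3)
colnames(res) <- c("lower", "upper")
print(round(res, 2))
```

In all three cases the product of the limits is \(\small{100^2}\), i.e., the range is symmetrical around 0 in \(\small{\log_{e}}\)-scale.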

We all know which party prevailed. Can you imagine why?


To round or not to round

How I love rounding of the confidence interval according to the guidelines! All other jurisdictions require rounding to two decimal places, whereas Health Canada requires only one.

If we do that10 it will increase the patient’s risk. Why? Simple: With the exact result, a study with a lower confidence limit of 79.995% would fail, as would one with an upper confidence limit of 125.0049%. Both would pass after rounding.

library(PowerTOST) # attach it to run the examples
set.seed(123456)
nsims  <- 1e5L # number of simulations
target <- 0.80 # target power
theta0 <- 0.95 # assumed T/R-ratio
CV     <- 0.25 # assumed CV
n      <- sampleN.TOST(CV = CV, theta0 = theta0,
                       targetpower = target,
                       print = FALSE)[["Sample size"]]
CV.sim <- mse2CV(CV2mse(CV) * rchisq(nsims, df = n - 2) / (n - 2))
pe.sim <- exp(rnorm(nsims, mean = log(theta0),
                           sd = sqrt(0.5 / n) * sqrt(CV.sim)))
df     <- data.frame(CV = CV.sim, PE = 100 * pe.sim,
                     lower = NA, upper = NA,
                     exact = FALSE, round2 = FALSE,
                     round1 = FALSE)
for (i in 1:nsims) {
  df[i, 3:4] <- 100*CI.BE(CV = CV.sim[i], pe = pe.sim[i], n = n)
  if (df$lower[i] >= 80 & df$upper[i] <= 125) {
    df$exact[i] <- TRUE
  }
  if (round(df$lower[i], 2) >= 80 &
      round(df$upper[i], 2) <= 125) {
    df$round2[i] <- TRUE
  }
  if (round(df$lower[i], 1) >= 80 &
      round(df$upper[i], 1) <= 125) {
    df$round1[i] <- TRUE
  }
}
pass.exact  <- sum(df$exact)/nsims
pass.round2 <- sum(df$round2)/nsims
pass.round1 <- sum(df$round1)/nsims
cat("Percentage of", nsims, "simulated studies",
    "\npassing the acceptance limits with the 90% CI",
    "\n  full precision:", sprintf("%.3f%%", 100*pass.exact),
    "\n  2 decimals    :", sprintf("%.3f%%", 100*pass.round2),
    "\n  1 decimal     :", sprintf("%.3f%%", 100*pass.round1), "\n")
R> Percentage of 100000 simulated studies 
R> passing the acceptance limits with the 90% CI 
R>   full precision: 80.341% 
R>   2 decimals    : 80.375% 
R>   1 decimal     : 80.605%

Splendid. Am I picky? In all textbooks and publications the confidence interval inclusion approach of ABE is \[\begin{matrix}\tag{12} \theta_1=1-\Delta,\theta_2=\left(1-\Delta\right)^{-1}\\ H_0:\;\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\not\in\left\{\theta_1,\,\theta_2\right\}\;vs\;H_1:\;\theta_1<\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2, \end{matrix}\] where \(\small{H_0}\) is the null hypothesis of inequivalence and \(\small{H_1}\) the alternative hypothesis, \(\small{\theta_1}\) and \(\small{\theta_2}\) the lower and upper limits of the acceptance range, and \(\small{\mu_\textrm{T}}\) and \(\small{\mu_\textrm{R}}\) the geometric least squares means of \(\small{\textrm{T}}\) and \(\small{\textrm{R}}\), respectively. Do you see anything about rounding here? I don’t. If you can point me to a publication which supports this regulatory ‘invention’, let me know.

In older releases of Phoenix/WinNonlin there was a verbatim statement based on the CI in full precision like
Average bioequivalence shown for confidence=0.90 and percent=0.20 or
Failed to show average bioequivalence for confidence=0.90 and percent=0.20

After rounding according to the guidelines that may not be ‘correct’ any more. I considered writing a one-sentence SOP:
  • »Delete the verbatim BE assessment from the output because it might contradict
    rounded results and provoke questions.«

For good reasons Certara removed the text from the output in later releases.

90.00 – 111.11%

Well, these limits are stated in most guidelines when dealing with NTIDs. Should we take the upper limit literally and not based on \(\small{\Delta=10\%}\)? \[\left\{\theta_1,\theta_2\right\}=\left\{100(1-\Delta),100/(1-\Delta)\right\}=90-1\dot{1}1.11\%\tag{13}\] If yes, we would have to apply double rounding (of the CI and the upper limit).
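What double rounding does is shown best with a borderline case. A sketch in base R; the upper confidence limit is a hypothetical value chosen to fall between the rounded and the exact limit.

```r
Delta       <- 0.10
upper.limit <- 100 / (1 - Delta) # 111.1111... in full precision
upper.CL    <- 111.114           # hypothetical upper confidence limit
# full precision: compare the exact CL with the exact limit
pass.exact  <- upper.CL <= upper.limit
# double rounding: both the CL and the limit with two decimal places
pass.round  <- round(upper.CL, 2) <= round(upper.limit, 2)
print(c(exact = pass.exact, rounded = pass.round))
```

The study fails in full precision but passes once both numbers are rounded.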

90.0 – 112.0%

Health Canada requires for critical dose drugs that the confidence interval of AUC lies within 90.0 – 112.0%. I always thought that \(\small{100/0.9=1\dot{1}1.11}\).

Consequently, on the average products will be approved with \(\small{\sqrt{90.0\times112.0}\approx 100.4\%}\). When asked, the reply was:
»These numbers are more easy to remember.«

75.00 – 133.33%

Similar story, this time dealing with widening the limits for HVD(P)s.11 12 Based on \(\small{\Delta=25\%}\): \[\left\{\theta_1,\theta_2\right\}=\left\{100(1-\Delta),100/(1-\Delta)\right\}=75-1\dot{3}3.33\%\tag{14}\] Double rounding again?

k = 0.760

All jurisdictions accepting ABEL give the regulatory constant \(\small{k}\) as 0.760. How come?

Based on the switching \(\small{CV_0=30\%}\) we get \[k=\log_{e}1.25 \Big{/}\sqrt{\log_{e}(CV_0^{2}+1)}\approx 0.7601283\ldots.\tag{15}\] Just another ‘nice’ number. If we would back-transform \(\small{k=0.760}\), the switching \(\small{CV_\textrm{wR}}\) would be 30.00529% and not 30%.13
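Both numbers follow from \((15)\) and its inverse. A sketch in base R; the helper functions `CV2s` and `s2CV` are mine and convert between the CV and the standard deviation in \(\small{\log_{e}}\)-scale.

```r
CV2s <- function(CV) sqrt(log(CV^2 + 1)) # CV to SD in log-scale
s2CV <- function(s) sqrt(exp(s^2) - 1)   # SD in log-scale to CV
CV0  <- 0.30                             # switching CV
k    <- log(1.25) / CV2s(CV0)            # k in full precision
CV.k <- s2CV(log(1.25) / 0.760)          # switching CV implied by 0.760
cat(sprintf("k in full precision  : %.7f", k),
    sprintf("\nCV implied by k 0.760: %.5f%%", 100 * CV.k), "\n")
```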

Upper cap 57.4%

Health Canada gives the upper cap of scaling as \(\small{CV_\textrm{wR}\;57.4\%}\) but also states that the expansion is limited to \(\small{66.7-150.0\%}\).14 ‘Nice’ numbers as usual but there’s a contradiction.

With \(\small{CV_\textrm{wR}\;57.4\%}\) we get \[\begin{matrix}\tag{16} s_\textrm{wR}=\sqrt{\log_{e}(0.574^2+1)}\approx 0.5336524\ldots\\ \left\{\theta_{\textrm{s}_1},\theta_{\textrm{s}_2}\right\}=\exp(\mp0.760\cdot s_\textrm{wR})\approx 66.65929-150.0166\% \end{matrix}\] In the power and sample size functions of PowerTOST we decided to use a cap of 0.57382 instead in order to obtain 66.7–150.0%.

hc <- reg_const(regulator = "HC")
print(hc); round(100*scABEL(CV = hc$CVcap, regulator = "HC"), 1)
R> HC regulatory settings
R> - CVswitch            = 0.3 
R> - cap on scABEL if CVw(R) > 0.57382
R> - regulatory constant = 0.76 
R> - pe constraint applied
R> lower upper 
R>  66.7 150.0

It’s guesswork. We assumed that 66.7–150.0% is more ‘important’ than a cap at 57.4%.
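The contradiction and a cap which resolves it can be reproduced without the package. A sketch in base R, assuming that the cap is the \(\small{CV_\textrm{wR}}\) at which the upper limit is exactly 150%; rounded to five decimal places this gives the 0.57382 shown above.

```r
k    <- 0.760
CV2s <- function(CV) sqrt(log(CV^2 + 1)) # CV to SD in log-scale
s2CV <- function(s) sqrt(exp(s^2) - 1)   # SD in log-scale to CV
# limits implied by the cap of 57.4% as stated in the guidance
lim  <- 100 * exp(c(-1, +1) * k * CV2s(0.574))
# cap at which the upper limit is exactly 150% (assumption)
cap  <- s2CV(log(1.5) / k)
cat(sprintf("limits at CVwR 57.4%%: %.5f - %.4f%%", lim[1], lim[2]),
    sprintf("\ncap for 150%%        : %.5f", cap), "\n")
```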

s0 = 0.294

A similar goody from RSABE. Again we have the switching \(\small{CV_0=30\%}\). But then \[s_0=\sqrt{\log_{e}(CV_0^{2}+1)}\approx 0.2935604\ldots.\tag{17}\] I thought that there is a consensus about the classification of HVD(P)s, i.e., \(\small{CV_\textrm{wR}\;30\%}\). Not for the FDA:15 30.04689%.

σw0 = 0.25

At the first Bio-International conference (Toronto, 1989) there was a consensus about \(\small{CV_\textrm{wR}\;30\%}\) but \(\small{CV_\textrm{wR}\;25\%}\) was considered ‘problematic’.

The FDA prefers the ‘nicer’ \(\small{\sigma_\textrm{w0}\;0.25}\), which means \(\small{CV_\textrm{wR}\;25.39576\%}\).
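The back-transformation of the FDA’s two ‘nice’ numbers is the inverse of \((17)\). A sketch in base R; the helper functions are mine.

```r
CV2s   <- function(CV) sqrt(log(CV^2 + 1)) # CV to SD in log-scale
s2CV   <- function(s) sqrt(exp(s^2) - 1)   # SD in log-scale to CV
s0     <- CV2s(0.30)         # full precision for CV0 30%
CV.s0  <- 100 * s2CV(0.294)  # CV (%) implied by s0 0.294
CV.sw0 <- 100 * s2CV(0.25)   # CV (%) implied by sigma.w0 0.25
cat(sprintf("s0 for CV0 30%%       : %.7f", s0),
    sprintf("\nCVwR for s0 0.294    : %.5f%%", CV.s0),
    sprintf("\nCVwR for sigma.w0 0.25: %.5f%%", CV.sw0), "\n")
```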

Utopia

Instead of ‘nice’ numbers, guidelines should state the formulas, which are nothing more than elementary maths any eighth-grader could master. It’s not rocket science. We are not stupid.

All we need are numbers of \(\small{\Delta}\) (and \(\small{CV_0}\) in scaled average bioequivalence).
Everything else should be calculated and assessed with full precision.

License

CC BY 4.0 Helmut Schütz 2021
1st version March 30, 2021.

Footnotes and References


  1. Labes D, Schütz H, Lang B. PowerTOST: Power and Sample Size for (Bio)Equivalence Studies. 2021-01-18. CRAN.

  2. Canadian Health Protection Branch, the U.S. Food and Drug Administration, the United States Pharmacopeia. International Open Conference on Dissolution, Bioavailability, and Bioequivalence. Toronto, June 15–18, 1992.

  3. Steinijans VW, Hauschke D, Jonkman JHG. Controversies in Bioequivalence Studies. Clin Pharmacokinet. 1992; 22(4): 247–53. doi:10.2165/00003088-199222040-00001.

  4. FDA. Center for Drug Evaluation and Research. NDA 204-412. Clinical Pharmacology and Biopharmaceutics Review(s). Reference ID: 3244307.

  5. American Association of Pharmaceutical Scientists, U.S. Food and Drug Administration, Federation International Pharmaceutique, Health Protection Branch (Canada), Association of Official Analytical Chemists. Analytical Methods Validation: Bioavailability, Bioequivalence and Pharmacokinetic Studies. Arlington, December 3–5, 1990.

  6. Think about paracetamol/acetaminophen. With its molecular mass 151.165 g·mol⁻¹ and ƒ 0.75 after a 500 mg oral dose we start with ~4.94·10²¹ molecules in the circulation. Given its half life of 2½ hours, after one week ≈35 molecules happily float around.

  7. The declared content \(\small{\neq}\) the measured one. Furthermore, we can never be sure that the measured content is the true one. Don’t forget analytical (in)accuracy and (im)precision. BTW, our method was validated for the test product and not for the reference. A dose-correction is only acceptable for Health Canada and under certain conditions for the EMA.
    The fact that clearances are not identical inflates the confidence interval, esp. for highly variable drugs.

  8. Mantel N. Do We Want Confidence Intervals Symmetrical About the Null Value? Biometrics. 1977; 33(4): 759–60.

  9. Kirkwood TBL. Bioequivalence Testing – A Need to Rethink. Biometrics. 1981; 37(3): 589–91. doi:10.2307/2530573.

  10. I can imagine what people would say: »Of course, we do. It’s written in the holy books and the probability of passing increases. If regulators don’t care about an inflated Type I Error, why should we?«

  11. Executive Board of the Health Ministers’ Council for GCC States. The GCC Guidelines for Bioequivalence. March 2016.

  12. Medicines Control Council. Registration of Medicines. Biostudies. Pretoria, June 2015.

  13. Karalis V, Symillides M, Macheras P. On the leveling-off properties of the new bioequivalence limits for highly variable drugs of the EMA guideline. Eur J Pharm Sci. 2011; 44: 497–505. doi:10.1016/j.ejps.2011.09.008.

  14. Health Canada. Therapeutic Products Directorate. Notice: Policy on Bioequivalence Standards for Highly Variable Drug Products. Ottawa, April 18, 2016. File number 16-104293-140.

  15. FDA. Office of Generic Drugs. Draft Guidance on Progesterone. Recommended Apr 2010; Revised Feb 2011.