

  • The right-hand badges give the respective section’s ‘level’.
    
  1. Basics requiring no or only limited statistical expertise.
    
  2. These sections are the most important ones. They are – hopefully – easily comprehensible even for novices. A basic knowledge of R does not hurt.
    
  3. A somewhat higher knowledge of statistics and/or R is required. May be skipped or reserved for a later reading.

Introduction

If this article is perceived as overly focused on statistics, I apologize. Blame my professional background – crafting engaging narratives is not my strong suit.

‘Bioavailability’ (a portmanteau of ‘biologic availability’) in its current meaning was coined in 19731 and ‘bioequivalence’ saw the light of day in 1975.2

The MeSH term ‘Biological Availability’ was introduced in 1979.
The extent to which the active ingredient of a drug dosage form becomes available at the site of drug action or in a biological medium believed to reflect accessibility to a site of action.

The site of action (i.e., a receptor) is inaccessible. There should be no space for beliefs in science. The best definition of bioequivalence is given by the ICH.3

Two drug products containing the same drug substance(s) are considered bioequivalent if their relative bioavailability (BA) (rate and extent of drug absorption) after administration in the same molar dose lies within acceptable predefined limits. These limits are set to ensure comparable in vivo performance, i.e., similarity in terms of safety and efficacy.
ICH (2020)3

We will use a simple example in the following: a two-treatment two-sequence two-period (2×2×2) crossover design, where subjects 1–6 were in sequence \(\small{\text{TR}}\) and subjects 7–12 in sequence \(\small{\text{RT}}\). \[\small{\begin{array}{ccc} \textsf{Table I}\phantom{0}\\ \text{subject} & \text{T} & \text{R}\\\hline \phantom{1}1 & 71 & 81\\ \phantom{1}2 & 61 & 65\\ \phantom{1}3 & 80 & 94\\ \phantom{1}4 & 66 & 74\\ \phantom{1}5 & 94 & 54\\ \phantom{1}6 & 97 & 63\\ \phantom{1}7 & 70 & 85\\ \phantom{1}8 & 76 & 90\\ \phantom{1}9 & 53 & 54\\ 10 & 99 & 56\\ 11 & 83 & 90\\ 12 & 51 & 68\\\hline \end{array}}\]


    

The 1970s

Problems were reported with formulations of Narrow Therapeutic Index Drugs (NTIDs) like phenytoin,4 5 6 7 digoxin,1 8 9 warfarin,10 theophylline,11 and primidone.12 Some show nonlinear pharmacokinetics (phenytoin) or are auto-inducers (warfarin).

  • Excipient changed from CaSO4 to lactose5 6
  • The API was altered (e.g., particle size,7 9 amorphous to crystalline10)
  • Variable disintegration time
  • Dissolution testing not mandatory
  • No in vivo studies were performed comparing the new to the approved formulation
  • Breakthrough-seizures4 and intoxications5 6 (phenytoin) and variable or poor effect (digoxin, theophylline)

Generic drugs in the current sense did not yet exist at that time; only the content had to meet the USP requirements.

Although in 1969 Professor John Wagner demonstrated to the Bureau of Medicine, methods for comparing areas under the serum versus time curve (AUC) to estimate bioequivalence, his approach was ignored inasmuch as the FDA hierarchy did not believe a problem existed, and therefore such studies would not be necessary. For their part the Offices of Pharmaceutical Research and Compliance in the Bureau of Medicine and the Commissioner’s Office believed that the “Bioavailability Problem” as some called it was a “Content Uniformity Problem”.13 In 1971 for example, when notified of a “Bioavailability Problem” with a generic digoxin product, FDA investigated and ascertained that one manufacturer first added all the excipients into a 55-gal drum, then added digoxin, closed the lid, and mixed it by rolling the drum across the floor a few times. The content uniformity of those tablets varied from 10% to 156%.
Jerome Philip Skelly (2010)14

Following a ‘Conference on Bioavailability of Drugs’ held at the National Academy of Sciences of the United States in 1971, a guideline was published the following year.15

Oh dear! © 2008 hobvias sudoneighm @ flickr

[…] the mean of AUC of the generic had to be within 20% of the mean AUC of the approved product. At first this was determined by using serum versus time plots on specially weighted paper, cutting the plot out and then weighing each separately.
Jerome Philip Skelly (2010)14


    

80/20 Rule

The FDA’s 80/20 Rule or ‘Power Approach’ (at least 80% power to detect a 20% difference) of 1972 consisted of testing the hypothesis of no difference at the \(\small{\alpha=0.05}\) level of significance.14 16 \[H_0:\;\mu_\text{T}-\mu_\text{R}=0\;vs\;H_1:\;\mu_\text{T}-\mu_\text{R}\neq 0,\tag{1}\] where \(\small{H_0}\) is the null hypothesis of equality (no difference) and \(\small{H_1}\) the alternative hypothesis of a difference. \(\small{\mu_\text{T}}\) and \(\small{\mu_\text{R}}\) are the (true) means of \(\small{\text{T}}\) and \(\small{\text{R}}\), respectively. In order to pass the test, the estimated (post hoc, a posteriori, retrospective) power had to be at least 80%. The power depends on the true value of \(\small{\sigma}\), which is unknown. There exists a value \(\small{\sigma_{\,0.80}}\) such that if \(\small{\sigma\leq\sigma_{\,0.80}}\), the power of the test of no difference \(\small{H_0}\) is greater than or equal to 0.80. Since \(\small{\sigma}\) is unknown, it has to be approximated by the sample standard deviation \(\small{s}\). The Power Approach in a simple 2×2×2 crossover design then consists of rejecting \(\small{H_0}\) and concluding that \({\small{\mu_\text{T}}}\) and \({\small{\mu_\text{R}}}\) are equivalent if \[-t_{1-\alpha/2,\nu}\leq\frac{\bar{x}_\text{T}-\bar{x}_\text{R}}{s\sqrt{\tfrac{1}{2}\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right)}}\leq t_{1-\alpha/2,\nu}\:\text{and}\:s\leq\sigma_{0.80},\tag{2}\] where \(\small{n_1,\,n_2}\) are the number of subjects in sequences 1 and 2, the degrees of freedom \(\small{\nu=n_1+n_2-2}\), and \(\small{\bar{x}_\text{T}\,,\bar{x}_\text{R}}\) are the means of \(\small{\text{T}}\) and \(\small{\text{R}}\), respectively.
Note that this procedure is based on estimated power \(\small{\widehat{\pi}}\), since the true power is a function of the unknown \(\small{\sigma}\). It was the only approach based on post hoc power and was never implemented in any other jurisdiction.

For the example we estimate a power of only 47.2% to detect a 20% difference and the study would fail.
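The post hoc power can be reproduced from the data. A minimal sketch in base R (variable names are mine), using the common approximation of the power of the two-sided test by a shifted central t-distribution:

```r
# 80/20 Rule for the example: power to detect a difference of 20% of the
# reference mean at alpha = 0.05 (shifted central-t approximation)
example <- data.frame(subject   = factor(rep(1:12, each = 2)),
                      period    = factor(rep(1:2, 12)),
                      treatment = factor(c(rep(c("T", "R"), 6),
                                           rep(c("R", "T"), 6))),
                      Y         = c(71, 81, 61, 65, 80, 94,
                                    66, 74, 94, 54, 97, 63,
                                    85, 70, 90, 76, 54, 53,
                                    56, 99, 90, 83, 68, 51))
model <- lm(Y ~ subject + period + treatment, data = example)
nu    <- df.residual(model)                            # n1 + n2 - 2 = 10
se    <- sqrt(vcov(model)["treatmentT", "treatmentT"]) # SE of the difference
delta <- 0.20 * mean(example$Y[example$treatment == "R"])
power <- pt(delta / se - qt(1 - 0.05 / 2, nu), nu)
cat(sprintf("Estimated power %.1f%% (< 80%%): the study fails.\n", 100 * power))
```

The noncentrality term `delta / se` plugs the observed residual variability into \(\small{(2)}\); with these data the estimated power comes out at the 47.2% quoted above.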

First proposals by the biostatistical community were published.17 18 19 20


    

95% CI

The analysis was performed on untransformed data (i.e., by an additive model assuming normally distributed data) and bioequivalence was concluded if the 95% confidence interval (CI) of the point estimate (PE) lay entirely within 80 – 120%.

We get for our example in R:

example          <- data.frame(subject   = rep(1:12, each = 2),
                               sequence  = c(rep("TR", 12), rep("RT", 12)),
                               treatment = c(rep(c("T", "R"), 6),
                                             rep(c("R", "T"), 6)),
                               period    = rep(1:2, 12),
                               Y         = c(71, 81, 61, 65, 80, 94,
                                             66, 74, 94, 54, 97, 63,
                                             85, 70, 90, 76, 54, 53,
                                             56, 99, 90, 83, 68, 51))
factors          <- c("subject", "period", "treatment")
example[factors] <- lapply(example[factors], factor) # factorize the data
# additive model (untransformed data, differences); sequence not in the model!
muddle           <- lm(Y ~ subject + period + treatment, data = example)
CI               <- as.numeric(confint(muddle, level = 0.95)["treatmentT", ])
PE               <- coef(muddle)[["treatmentT"]]
# Percentages (flawed!)
mean.T           <- mean(example$Y[example$treatment == "T"])
mean.R           <- mean(example$Y[example$treatment == "R"])
PE.pct           <- 100 * mean.T / mean.R
CI.pct           <- 100 * (CI + mean.R) / mean.R
result           <- data.frame(method = c("differences", "percentages"),
                               PE = c(sprintf("%+.3f", PE),
                                      sprintf("%6.2f%%",  PE.pct)),
                               lower = c(sprintf("%+.3f", CI[1]),
                                         sprintf("%.2f%%",  CI.pct[1])),
                               upper = c(sprintf("%+.3f", CI[2]),
                                         sprintf("%6.2f%%",  CI.pct[2])),
                               BE = c("", "fail"))
if (CI.pct[1] >= 80 & CI.pct[2] <= 120) result$BE[2] <- "pass"
names(result)[3:4] <- c("lower CL", "upper CL")
print(result, row.names = FALSE)
#       method      PE lower CL upper CL   BE
#  differences  +2.250  -12.807  +17.307     
#  percentages 103.09%   82.42%  123.76% fail

If data are analyzed by an additive model, the results are differences. It is a fundamental error to naïvely transform differences to percentages – that would require Fieller’s CI.21 22 However, this was not done back in the day. We get a 95% CI of 82.42 – 123.76%, and the study would fail because the upper confidence limit (CL) is > 120%.


    

Westlake’s CI

Westlake18 mused that the shortest CI – which is symmetrical about the PE – would be too difficult for non-statisticians to comprehend. He suggested splitting the t-values in such a way that the probability of the two tails sums to \(\small{\alpha}\) and the respective CI is symmetrical around 0 (or 100%). In the example we obtain ±21.48%, and the study would fail as well because the confidence limits are > ±20%. As above, calculating a percentage is flawed.
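Westlake’s interval can be reconstructed numerically. A sketch in base R (variable names are mine): find \(\small{t_1,t_2}\) covering \(\small{1-\alpha}\) such that the interval \(\small{D-t_1\,SE}\) to \(\small{D+t_2\,SE}\) is symmetric about zero, which forces \(\small{t_1-t_2=2D/SE}\):

```r
# Westlake's symmetric CI for the example (sketch; assumes D > 0)
example <- data.frame(subject   = factor(rep(1:12, each = 2)),
                      period    = factor(rep(1:2, 12)),
                      treatment = factor(c(rep(c("T", "R"), 6),
                                           rep(c("R", "T"), 6))),
                      Y         = c(71, 81, 61, 65, 80, 94,
                                    66, 74, 94, 54, 97, 63,
                                    85, 70, 90, 76, 54, 53,
                                    56, 99, 90, 83, 68, 51))
model  <- lm(Y ~ subject + period + treatment, data = example)
D      <- coef(model)[["treatmentT"]]                  # difference T - R
SE     <- sqrt(vcov(model)["treatmentT", "treatmentT"])
nu     <- df.residual(model)
alpha  <- 0.05
# coverage pt(t1, nu) + pt(t2, nu) - 1 = 1 - alpha with t1 = t2 + 2 D / SE
t2     <- uniroot(function(t2) pt(t2 + 2 * D / SE, nu) + pt(t2, nu) -
                    1 - (1 - alpha), interval = c(0, 10), tol = 1e-9)$root
Delta  <- D + t2 * SE                                  # symmetric limit
mean.R <- mean(example$Y[example$treatment == "R"])
cat(sprintf("Westlake's CI: \u00B1%.2f%% around 100%%\n", 100 * Delta / mean.R))
```

The naïve division by the reference mean in the last line mirrors the flawed percentage conversion of the era and reproduces the ±21.48% quoted above.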

However, such a result is misleading. The information about the location of the difference is lost; one cannot know any more whether the BA of \(\small{\text{T}}\) is lower or higher than the one of \(\small{\text{R}}\). Therefore, the method was criticized19 and never implemented in practice. It took me years to convince Certara to remove Westlake’s CI from the results in Phoenix WinNonlin. In 2016, I was successful with version 6.4… Since then the differences are given in the additive model.


    

The Roaring 1980s

The generic boom started in 1984 in the U.S. with the ‘Drug Price Competition and Patent Term Restoration Act’ (informally known as the ‘Hatch-Waxman Act’).23

The approval process was different for innovator (originator) and generic companies.

Innovators:

  • Preclinical data
  • Documentation of pharmaceutical quality
  • In clinical phase I documentation of pharmacokinetics (PK) in healthy subjects, dose finding, safety / tolerability, food effect
  • In phase II efficacy & safety in small groups of patients
  • In phase III demonstration of efficacy & safety versus placebo in well-powered studies:
    Efficacy: Non-Inferiority/Superiority
    Safety: Non-Superiority

Generic companies:

  • Documentation of pharmaceutical quality
  • No in vivo studies required
  • Sometimes disintegration was compared, rarely dissolution

Regulatory concerns about generic substitution arose, leading to extensive discussions about which method could be used to compare formulations.

  • Pharmaceutical equivalence
  • Bioequivalence (BE)
  • Therapeutic equivalence

There was an early agreement that pharmaceutical equivalence is too permissive and therapeutic equivalence would require extremely large studies in patients.24 Hence, comparing the bioavailability (BA) in healthy volunteers seemed to be a reasonable compromise.17

What is the justification for studying bioequivalence in healthy volunteers?
“Variability is the enemy of therapeutics” and is also the enemy of bioequivalence. We are trying to determine if two dosage forms of the same drug behave similarly. Therefore we want to keep any other variability not due to the dosage forms at a minimum. We choose the least variable “test tube”, that is, a healthy volunteer.
Disease states can definitely change bioavailability, but we are testing for bioequivalence, not bioavailability.

Whereas in pharmacokinetics (PK) ‘bioavailability’ refers exclusively to the area under the curve extrapolated to infinite time (\(\small{AUC_{0-\infty}}\)), the FDA introduced two new terms, namely

  1. the ‘rate of bioavailability’ – measured by the maximum concentration (\(\small{C_\text{max}}\)) and
  2. the ‘extent of bioavailability’ – measured by the \(\small{AUC}\).

Therefore, these are PK metrics, whereas PK parameters refer to modeling.

The former is understood as a surrogate for the absorption rate \(\small{k\,_\text{a}}\) in a PK model. I prefer – like the ICH3 and the FDA since 200326 – rate and extent of absorption, in order not to contaminate the original meaning of BA in PK.

    

Let us consider the basic equation of pharmacokinetics \[\frac{f\cdot D}{CL}=\frac{f\cdot D}{V\cdot k_\text{ el}}=AUC_{0-\infty}=\int_{0}^{\infty}C(t)\,dt,\tag{3}\] where \(\small{f}\) is the fraction absorbed (we are interested in the comparison of formulations), \(\small{D}\) is the dose, \(\small{CL}\) is the clearance, \(\small{V}\) is the apparent volume of distribution, \(\small{k\,_\text{el}}\) is the elimination rate constant, and \(\small{C(t)}\) is the plasma concentration over time. We see immediately that for identical27 doses and invariant28 \(\small{CL}\), \(\small{V}\), \(\small{k\,_\text{el}}\) (which are drug-specific), comparing the \(\small{AUC}\text{s}\) allows us to compare the fractions absorbed.
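Equation \(\small{(3)}\) is easy to verify numerically. A sketch for a one-compartment model with assumed values (all constants below are mine, chosen for illustration only):

```r
# numerical check of (3): the integral of C(t) equals f * D / CL
f  <- 0.8; D <- 100; V <- 50   # fraction absorbed, dose, volume
ka <- 1.5; ke <- 0.1           # absorption / elimination rate constants
CL <- V * ke                   # clearance
C  <- function(t) f * D * ka / (V * (ka - ke)) * (exp(-ke * t) - exp(-ka * t))
AUC <- integrate(C, lower = 0, upper = Inf)$value
c(AUC = AUC, f.D.CL = f * D / CL)   # both 16, as (3) requires
```

Doubling \(\small{f}\) doubles both sides; that is exactly why the comparison of \(\small{AUC}\text{s}\) compares fractions absorbed.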

Pharmacokinetics: one of the magic arts of divination whereby needles are stuck into dummies in an attempt to predict profits.
Stephen Senn (2004)

It must be mentioned that \(\small{C_\text{max}}\) is not sensitive to even substantial changes in the rate of absorption \(\small{k\,_\text{a}}\), since it is a composite metric.29 In a one-compartment model it depends on \(\small{k\,_\text{a}}\) and \(\small{f}\) as well as on the elimination rate constant \(\small{k\,_\text{el}}\) and \(\small{V}\) (or \(\small{CL}\) if you belong to the other church).30 Whereas \(\small{k\,_\text{a}}\) and \(\small{f}\) are properties of the formulation – which we are interested in – the others are properties of the drug. \[\eqalign{ t_\textrm{max}&=\frac{\log_{e}(k\,_\text{a}/k\,_\text{el})}{k\,_\text{a}-k\,_\text{el}}\\ C_\textrm{max}&=\frac{f\cdot D\cdot k\,_\text{a}}{V\cdot (k\,_\text{a}-k\,_\text{el})}\large(\small\exp(-k\,_\text{el}\cdot t_\textrm{max})-\exp(-k\,_\text{a}\cdot t_\textrm{max})\large)\tag{4}}\] Therefore, when using it as a surrogate for the absorption rate one must keep in mind that formulations with different fractions absorbed and \(\small{t_\text{max}}\) might show the same \(\small{C_\text{max}}\).
It took ten years before the alternative metric \(\small{C_\text{max}/AUC}\) (based on theoretical considerations and simulations) was proposed.31 32 33 Apart from being independent of \(\small{f}\), it is substantially less variable than \(\small{C_\text{max}}\). Regrettably, it was never implemented in any guideline.
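Both points can be illustrated with \(\small{(4)}\). In this sketch (all parameter values are mine, chosen for illustration) a hypothetical test formulation absorbs 25% more drug than the reference yet shows exactly the same \(\small{C_\text{max}}\), whereas \(\small{C_\text{max}/AUC}\) still discriminates because it does not depend on \(\small{f}\):

```r
# Cmax of a one-compartment model, eq. (4); requires ka != ke
Cmax <- function(f, D, ka, ke, V) {
  tmax <- log(ka / ke) / (ka - ke)
  f * D * ka / (V * (ka - ke)) * (exp(-ke * tmax) - exp(-ka * tmax))
}
D   <- 100; V <- 50; ke <- 0.1  # properties of the drug (assumed)
f.R <- 0.8; ka.R <- 2           # reference formulation
f.T <- 1.0                      # test: 25% more drug absorbed
# find the (slower) ka of the test giving exactly the Cmax of the reference
ka.T <- uniroot(function(ka) Cmax(f.T, D, ka, ke, V) -
                  Cmax(f.R, D, ka.R, ke, V),
                interval = c(ke * 1.01, ka.R))$root
AUC  <- function(f) f * D / (V * ke)             # eq. (3)
cat(sprintf("Cmax     : R %.4f, T %.4f (identical)\n",
            Cmax(f.R, D, ka.R, ke, V), Cmax(f.T, D, ka.T, ke, V)),
    sprintf("Cmax/AUC : R %.4f, T %.4f (different)\n",
            Cmax(f.R, D, ka.R, ke, V) / AUC(f.R),
            Cmax(f.T, D, ka.T, ke, V) / AUC(f.T)))
```

Despite identical \(\small{C_\text{max}}\), the two formulations differ in \(\small{f}\), \(\small{k\,_\text{a}}\), \(\small{t_\text{max}}\), and \(\small{AUC}\) – and \(\small{C_\text{max}/AUC}\) reveals it.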

    

In the early 1980s originators failed in trying to falsify the concept (i.e., comparing BE in healthy volunteers to large therapeutic equivalence (TE) studies in patients): if BE passed, TE passed as well and vice versa. Had they succeeded (BE passed while TE failed), generic companies would have had to demonstrate TE in order to get products approved. Such studies would have to be much larger than the originators’ phase III studies, making them economically infeasible.24 Essentially, that would have meant an early end of the young generic industry.

However, comparative BA is also used by originators in scaling up formulations used in phase III to the to-be-marketed formulation, in supporting post-approval changes, in line extensions of approved products, and for testing drug-drug interactions or food effects. Hence, a substantial part of BE trials are performed by originators. Had they succeeded in refuting the concept, they would have shot themselves in the foot.

In the mid 1980s a consensus was reached that generic approval should only be acceptable after demonstration of suitable in vivo equivalence.

The main assumption in BE was (and still is) that ‘similar’ plasma concentrations in healthy volunteers will lead to similar concentrations at the target site (i.e., a receptor) and thus, to similar effects in patients. It was still an open issue whether BE should be interpreted as a surrogate of clinical efficacy/safety or as a measure of pharmaceutical quality. Whereas in the 1980s the former was prevalent, since the 1990s the latter is mainstream.
A somewhat naïve interpretation of the PK metrics is that \(\small{AUC}\) directly translates to efficacy and \(\small{C_\text{max}}\) to safety. Especially the latter is not correct because any difference in \(\small{C_\text{max}}\) leads to a relatively smaller difference in the maximum effect \(\small{E_\text{max}}\).
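The damping of concentration differences can be sketched with a simple \(\small{E_\text{max}}\) model \(\small{E=E_\text{max}\,C/(EC_{50}+C)}\); all values below are assumed for illustration only:

```r
# hyperbolic Emax model: a 25% higher Cmax translates into a much
# smaller relative difference in effect (values chosen for illustration)
E   <- function(C, Emax = 100, EC50 = 1) Emax * C / (EC50 + C)
C.R <- 2                 # 'reference' Cmax, in units of EC50
C.T <- 1.25 * C.R        # 'test': +25% in Cmax
cat(sprintf("+%.0f%% in Cmax, but only +%.1f%% in effect\n",
            100 * (C.T / C.R - 1), 100 * (E(C.T) / E(C.R) - 1)))
```

The higher the concentrations sit on the saturating part of the curve, the more a difference in \(\small{C_\text{max}}\) is damped in the effect.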

There was no consensus about the definition of ‘similarity’ and the statistical methodology to compare plasma profiles. Two early methods are outlined in the following.


    

75/75 Rule

An approach employed by the FDA: two drugs were considered bioequivalent if at least 75% of subjects showed \(\small{\text{T}/\text{R}\textsf{-}}\)ratios within 75 – 125%.14 34 35 It is not a statistic and, thus, was immediately criticized because variable formulations or studies with some extreme values may pass the criterion by pure chance.36

    

We get for our example in R:

example       <- data.frame(subject   = rep(1:12, each = 2),
                            sequence  = c(rep("TR", 12), rep("RT", 12)),
                            treatment = c(rep(c("T", "R"), 6),
                                          rep(c("R", "T"), 6)),
                            period    = rep(1:2, 12),
                            Y         = c(71, 81, 61, 65, 80, 94,
                                          66, 74, 94, 54, 97, 63,
                                          85, 70, 90, 76, 54, 53,
                                          56, 99, 90, 83, 68, 51))
rule.75.75    <- reshape(example, idvar = "subject", timevar = "treatment",
                         drop = c("sequence", "period"), direction = "wide")
names(rule.75.75)[2:3] <- c("T", "R")
rule.75.75$T.R <- 100 * (rule.75.75$T / rule.75.75$R)
for (i in 1:nrow(rule.75.75)) {
  if (rule.75.75$T.R[i] >= 75 & rule.75.75$T.R[i] <= 125) {
    rule.75.75$BE[i]     <- TRUE
    rule.75.75$within[i] <- "yes"
  } else {
    rule.75.75$BE[i]     <- FALSE
    rule.75.75$within[i] <- "no"
  }
}
names(rule.75.75)[c(4, 6)] <- c("T/R (%)", "±25%")
BE            <- "Failed BE by the"
if (sum(rule.75.75$BE) / nrow(rule.75.75) >= 0.75) BE <- "Passed BE by the"
print(rule.75.75[, c(1:4, 6)], row.names = FALSE); cat(BE, "75/75 Rule.\n")
#  subject  T  R   T/R (%) ±25%
#        1 71 81  87.65432  yes
#        2 61 65  93.84615  yes
#        3 80 94  85.10638  yes
#        4 66 74  89.18919  yes
#        5 94 54 174.07407   no
#        6 97 63 153.96825   no
#        7 70 85  82.35294  yes
#        8 76 90  84.44444  yes
#        9 53 54  98.14815  yes
#       10 99 56 176.78571   no
#       11 83 90  92.22222  yes
#       12 51 68  75.00000  yes
# Passed BE by the 75/75 Rule.

Nine of the twelve subjects (75%) have a T/R-ratio within 75 – 125% and the study would pass, despite the three subjects with extreme \(\small{\text{T}/\text{R}\textsf{-}}\)ratios.


    

t-test

Another suggestion was testing for a statistically significant difference at level \(\small{\alpha=0.05}\). The null hypothesis was that formulations are equal (\(\small{\mu_\text{T}-\mu_\text{R}=0}\)).

Let’s assess our example in R again:

example        <- data.frame(subject   = rep(1:12, each = 2),
                             sequence  = c(rep("TR", 12), rep("RT", 12)),
                             treatment = c(rep(c("T", "R"), 6),
                                           rep(c("R", "T"), 6)),
                             period    = rep(1:2, 12),
                             Y         = c(71, 81, 61, 65, 80, 94,
                                           66, 74, 94, 54, 97, 63,
                                           85, 70, 90, 76, 54, 53,
                                           56, 99, 90, 83, 68, 51))
tt             <- reshape(example, idvar = "subject", timevar = "treatment",
                          drop = c("sequence", "period"), direction = "wide")
tt$T.R         <- tt[, 2] - tt[, 3]
names(tt)[2:4] <- c("T", "R", "T–R")
p              <- t.test(x = tt$T, y = tt$R, paired = TRUE)$p.value
BE             <- "Failed BE"
if (p >= 0.05) BE <- "Passed BE"
print(tt, row.names = FALSE); cat(sprintf("%s by a paired t-test (p = %.4f).\n", BE, p))
#  subject  T  R T–R
#        1 71 81 -10
#        2 61 65  -4
#        3 80 94 -14
#        4 66 74  -8
#        5 94 54  40
#        6 97 63  34
#        7 70 85 -15
#        8 76 90 -14
#        9 53 54  -1
#       10 99 56  43
#       11 83 90  -7
#       12 51 68 -17
# Passed BE by a paired t-test (p = 0.7381).

We calculate a \(\small{p}\)-value of 0.7381, which is statistically not significant (\(\small{\geq\alpha}\)) and the study would pass again.

However, we face a similar problem as with the 75/75 Rule. If the differences show high variability, the study passes; if the variability is low, it fails. This is counterintuitive and actually the opposite of what regulators want.
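The paradox is easy to demonstrate with a hypothetical modification of the example’s differences (the shrink factor is mine): keep the mean difference of +2.25 as is, but reduce the spread around it tenfold.

```r
# same mean difference, lower variability -> 'significant' -> fails
d     <- c(-10, -4, -14, -8, 40, 34, -15, -14, -1, 43, -7, -17) # T - R
p.hi  <- t.test(d)$p.value              # high variability: not significant
d.low <- mean(d) + (d - mean(d)) / 10   # same mean, one tenth of the spread
p.lo  <- t.test(d.low)$p.value          # low variability: significant
cat(sprintf("high variability: p = %.4f ('pass')\nlow  variability: p = %.4f ('fail')\n",
            p.hi, p.lo))
```

A one-sample t-test on the differences is identical to the paired t-test above; the first p-value reproduces the 0.7381 of the example, while the shrunken data – closer agreement of the formulations in every single subject – would fail.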

    
Interlude 1

One of my early sins37 – it was not the last…
After phenytoin intoxications in Austria38 we compared three generics (containing the free acid like the originator, the Na-, or the Ca-salt) to the reference in a crossover design. All formulations had been approved and were marketed in Austria. Although at that time I already calculated a 95% CI, the reviewers of our manuscript insisted on testing for a significant difference ‘because it is state of the art’.


Fig. 1 Phenytoin 3 × 100 mg equivalent, single dose fasting.

Two generics were statistically significantly different from the reference (\(\small{\text{T}_1}\) containing the free acid like the originator and \(\small{\text{T}_3}\) containing the Ca-salt). \(\small{\text{T}_2}\) containing the Na-salt was not statistically significantly different and, thus, considered equivalent – despite its high \(\small{\text{T}/\text{R}\textsf{-}}\)ratio (Table II). \[\small{ \begin{array}{ccccc} \textsf{Table II}\phantom{0000}\\ \text{formulation} & \text{T}/\text{R (%)} & p & & \text{BE}\\\hline \text{T}_1 & 146 & 0.0195\phantom{6} & \text{*} & \text{fail}\\ \text{T}_2 & 134 & 0.151\phantom{96} & \text{n.s.} & \text{pass}\\ \text{T}_3 & \phantom{1}28 & 0.00596 & \text{**} & \text{fail}\\\hline \end{array}}\] If we evaluated the study according to current standards (i.e., by the 90% CI inclusion approach based on \(\small{\log_{e}\textsf{-}}\)transformed data and acceptance limits of 80.00–125.00%), all generics would fail. \(\small{\text{T}_3}\) would even be bioinequivalent because its upper CL is way below 80% (Table III).
If we adjusted for multiplicity (\(\small{\alpha_\text{adj}=0.05/3=0.01\dot{6}\mapsto 96.6\dot{6}\text{% CI}}\)) – although not required in an exploratory study – the outcome would be even worse (Table IV). \[\small{\begin{array}{ccccc} \textsf{Table III}\phantom{0000}\\ \text{formulation} & \text{PE (%)} & \text{CL}_\text{lower}\text{(%)} & \text{CL}_\text{upper}\text{ (%)} & \text{BE}\\\hline \text{T}_1 & 151.12 & 118.75 & 192.32 & \text{fail (inconclusive)}\\ \text{T}_2 & 139.39 & \phantom{1}95.91 & 202.60 & \text{fail (inconclusive)}\\ \text{T}_3 & \phantom{1}21.67 & \phantom{1}10.25 & \phantom{2}45.81 & \text{fail (inequivalent)}\\\hline \end{array}}\] \[\small{\begin{array}{ccccc} \textsf{Table IV}\phantom{0000}\\ \text{formulation} & \text{PE (%)} & \text{CL}_\text{lower}\text{(%)} & \text{CL}_\text{upper}\text{ (%)} & \text{BE}\\\hline \text{T}_1 & 151.12 & 106.67 & 214.09 & \text{fail (inconclusive)}\\ \text{T}_2 & 139.39 & \phantom{1}81.20 & 239.28 & \text{fail (inconclusive)}\\ \text{T}_3 & \phantom{1}21.67 & \phantom{10}7.34 & \phantom{2}63.93 & \text{fail (inequivalent)}\\\hline \end{array}}\] Given the nonlinear PK of phenytoin,39 40 switching a patient from the originator to the generics with high \(\small{\text{T}/\text{R}\textsf{-}}\)ratios would be problematic – potentially leading to toxicity after multiple doses. Even worse would be switching from the generic \(\small{\text{T}_3}\) with its low \(\small{\text{T}/\text{R}\textsf{-}}\)ratio to any of the other formulations.


    

ANOVA and beyond

An Analysis of Variance (ANOVA) instead of a t-test allows period effects to be taken into account.41 42 43 This decade was also the heyday of Bayesian methods.44 45 46 47 Nomograms for sample size estimation were also Bayesian48 but happily misused by frequentists. New parametric49 50 as well as nonparametric methods entered the stage.50 51 Metrics to compare controlled release formulations in steady state were proposed.52 53 54 The first software to evaluate 2×2×2 crossover studies was released in the public domain.55

    

The acceptance range in bioequivalence is based on a ‘clinically relevant difference’ \(\small{\Delta}\), i.e., for data following a lognormal dis­tri­bu­tion \[\left\{\theta_1,\theta_2\right\}=\left\{100\,(1-\Delta),100\,(1-\Delta)^{-1}\right\}\tag{5}\] It must be mentioned that the commonly assumed \(\small{\Delta=20\%}\) leading to \(\small\left\{80.00\%,125.00\%\right\}\)56 is arbitrary (as is any other).
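A quick sketch of \(\small{(5)}\) in R, for the conventional \(\small{\Delta=20\%}\) and the tighter \(\small{\Delta=10\%}\) sometimes applied to narrow therapeutic index drugs:

```r
# acceptance limits (5) for a clinically relevant difference Delta
limits <- function(Delta) 100 * c(lower = 1 - Delta, upper = 1 / (1 - Delta))
round(limits(0.20), 2)   # 80.00, 125.00
round(limits(0.10), 2)   # 90.00, 111.11
```

Note the asymmetry on the original scale: the limits are reciprocal, and hence symmetrical only on the log-scale.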

    

An important leap forward was the Two One-Sided Tests Procedure (TOST)16 – although it was never implemented in its original form \(\small{(6)}\) in regulatory practice. Instead, the confidence interval inclusion approach \(\small{(7)}\) made it to the guidelines. Although these approaches are operationally identical (i.e., their outcomes [pass | fail] are the same), they are statistically different methods:

  1. The TOST Procedure gives two \(\small{p}\)-values, namely \(\small{p(\theta_0\geq\theta_1)}\) and \(\small{p(\theta_0\leq\theta_2)}\). BE is concluded if both \(\small{p}\)-values are \(\small{\leq\alpha}\).

\[\begin{matrix}\tag{6} H_\textrm{0L}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\leq\theta_1\:vs\:H_\textrm{1L}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}>\theta_1\\ H_\textrm{0U}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\geq\theta_2\:vs\:H_\textrm{1U}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2 \end{matrix}\]

  2. In the CI inclusion approach BE is concluded if the two-sided \(\small{1-2\,\alpha}\) CI lies entirely within the acceptance range \(\small{\left\{\theta_1,\theta_2\right\}}\). For an explanation why a 90% CI (and not a 95% CI like in phase III) is used, see another article. \[H_0:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\ni\left\{\theta_1,\theta_2\right\}\:vs\:H_1:\theta_1<\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2\tag{7}\]

When we evaluate our example by \(\small{(6)}\), we get \(\small{p(\theta_0\geq\theta_1)=0.0155}\) and \(\small{p(\theta_0\leq\theta_2)=0.0515}\). Since one of the \(\small{p\textsf{-}}\)values is \(\small{>\alpha}\), the study would fail.
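For comparison – anachronistic at this point of the story – a sketch evaluating the same example by the CI inclusion approach \(\small{(7)}\) as in current guidelines (\(\small{\log_{e}\textsf{-}}\)transformed data, 90% CI, acceptance range 80.00 – 125.00%):

```r
# CI inclusion approach (7) on log-transformed data, 90% CI, 80.00-125.00%
example <- data.frame(subject   = factor(rep(1:12, each = 2)),
                      period    = factor(rep(1:2, 12)),
                      treatment = factor(c(rep(c("T", "R"), 6),
                                           rep(c("R", "T"), 6))),
                      Y         = c(71, 81, 61, 65, 80, 94,
                                    66, 74, 94, 54, 97, 63,
                                    85, 70, 90, 76, 54, 53,
                                    56, 99, 90, 83, 68, 51))
model <- lm(log(Y) ~ subject + period + treatment, data = example)
PE    <- 100 * exp(coef(model)[["treatmentT"]])
CI    <- 100 * exp(confint(model, level = 0.90)["treatmentT", ])
BE    <- ifelse(CI[1] >= 80 & CI[2] <= 125, "pass", "fail")
cat(sprintf("PE %.2f%%, 90%% CI %.2f%% to %.2f%%: %s\n", PE, CI[1], CI[2], BE))
```

With these data the example happens to meet today’s criteria – another reminder that the multiplicative and the additive evaluations are different models, not merely different presentations.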

    
Interlude 2

It is a misconception that a certain CI of a sample (i.e., a particular study) contains the – true but unknown – population mean \(\small{\mu}\) with \(\small{1-\alpha}\) probability. Let’s simulate some studies and evaluate them by \(\small{(7)}\):

invisible(library(PowerTOST))
set.seed(123) # for reproducibility of simulations
mue      <- 1 # true population mean
CV       <- 0.25
studies  <- 100
x        <- sampleN.TOST(CV = CV, theta0 = mue, targetpower = 0.8, print = FALSE)
subjects <- x[["Sample size"]]
power    <- x[["Achieved power"]]
# simulate subjects within studies, lognormal distribution
samples  <- data.frame(study     = rep(1:studies, each = subjects * 2),
                       subject   = rep(rep(1:subjects, studies), each = 2),
                       period    = rep(rep(1:2, studies), 2),
                       sequence  = rep(c(rep(c("TR"), subjects),
                                         rep(c("RT"), subjects)), studies),
                       treatment = c(rep(c("T", "R"), subjects / 2),
                                     rep(c("R", "T"), subjects / 2)),
                       Y         = rlnorm(n = subjects * studies * 2,
                                          meanlog = log(mue) - 0.5 * log(CV^2 + 1),
                                          sdlog = sqrt(log(CV^2 + 1))))
facs     <- c("subject", "period", "treatment")
samples[facs] <- lapply(samples[facs], factor) # factorize the data
result   <- data.frame(study = 1:studies, PE = NA_real_,
                       lower = NA_real_, upper = NA_real_,
                       BE = FALSE, contain = TRUE)
grand.PE <- numeric(studies)
for (i in 1:studies) {
  temp           <- samples[samples$study == i, ]
  heretic        <- lm(log(Y) ~ period + subject + treatment, data = temp)
  result$PE[i]   <- 100 * exp(coef(heretic)[["treatmentT"]])
  result[i, 3:4] <- 100 * exp(confint(heretic, level = 0.90)["treatmentT", ])
  if (round(result[i, 3], 2) >= 80 & round(result[i, 4], 2) <= 125)
    result$BE[i] <- TRUE
  if (result$lower[i] > 100 * mue | result$upper[i] < 100 * mue) result$contain[i] <- FALSE
  grand.PE[i]    <- mean(result$PE[1:i]) # (cumulative) grand means
}
dev.new(width = 4.5, height = 4.5)
op       <- par(no.readonly = TRUE)
par(mar = c(3.05, 2.9, 1.4, 0.75), cex.axis = 0.9, mgp = c(2, 0.5, 0))
xlim     <- range(c(min(result$lower), 1e4 / min(result$lower),
                    max(result$upper), 1e4 / max(result$upper)))
plot(1:2, 100 * rep(mue, 2), type = "n", log = "x", xlab = "PE [90% CI]",
     ylab = "study  #", axes = FALSE,
     xlim = xlim, ylim = range(result$study))
abline(v = 100 * c(0.8, mue, 1.25), lty = c(2, 1, 2))
axis(1, at = c(125, pretty(xlim)),
     labels = sprintf("%.0f%%", c(125, pretty(xlim))))
axis(2, at = c(1, pretty(1:studies)[-1]), las = 1)
axis(3, at = 100 * mue, label = expression(mu))
box()
lines(grand.PE, 1:studies, lwd = 2)
for (i in 1:studies) {
  if (result$BE[i]) {       # pass
    clr <- "blue"
  } else {                  # fail
    if (result$contain[i]) {# mue within CI
      clr <- "magenta"
    } else {                # mue not in CI
      clr <- "red"
    }
  }
  lines(c(result$lower[i], result$upper[i]), rep(i, 2), col = clr)
  points(result$PE[i], i, pch = 16, cex = 0.6, col = clr)
}
par(op)


Fig. 2 2×2×2 crossover studies (\(\small{\mu}\) = 100%, \(\small{CV}\) = 25%: \(\small{n}\) = 24 for ≥80% power).

In 7% of studies the population mean \(\small{\mu}\) is not contained in the 90% CI (red lines). In other words, given the result of a single study we can never know where \(\small{\mu}\) lies. Only the grand mean (mean of sample means \(\small{\frac{1}{n}\sum_{i=1}^{i=n}\overline{x_i}}\)) approaches \(\small{\mu}\) for a large num­ber of samples. After the 100th study it is with 99.44% pretty close to \(\small{\mu}\) (for geeks: The convergence is poor; when simulating 25,000 studies, it is 100.23%). How­ever, nobody would repeat a – passing – study (blue lines) for such a rather un­inter­esting information, right?
This also explains why a particular study might fail by pure chance even if a formulation is equivalent (here 15% of studies; red or magenta lines). Such cases are related to the producer’s risk (Type II Error = 1 – power), which is 16.3% under the given conditions. On the other hand, it is also possible that a formulation which is not equivalent passes. These cases are related to the patient’s risk (Type I Error).
For details see the articles about hypotheses, treatment effects, post hoc power, and sample size estimation. Science is a cruel mistress.

    

At a hearing in 1986 the FDA confirmed that \(\small{(6)}\) or \(\small{(7)}\) should be used with untransformed data and \(\small{\Delta=20\%}\). If clinically relevant, tighter limits (\(\small{\Delta=10\%}\)) might be needed.57

The first German guideline was drafted by the Working Group for Pharmaceutical Pro­cess Engineering (Ar­beits­ge­mein­schaft für Phar­ma­zeu­tische Ver­fah­rens­tech­nik) in 1985.58 It was presented and discussed in 1987.59 60 61

In 1988 wider acceptance limits of 70 – 130% were proposed for \(\small{C_\text{max}}\) due to its inherent high variability62 (as a single-point metric its variability is practically always larger than that of the integrated metric \(\small{AUC}\)).

The Australian draft guideline was published in 1988.63 It was the first covering not only the design and evaluation but also validation of bioanalytical methods. The model with effects period, subject, treatment20 43 was rec­om­mend­ed and a test for se­quence-ef­fects was not considered necessary. The problematic conversion of differences to percentages was acknowledged and Fieller’s CI21 22 discussed. Kudos to both!

In 1989 a series of loose-leaf binders was started.64 It contained the raw data of generic drugs marketed in Germany, the evaluations provided by the companies, as well as results recalculated by the ZL (Central Laboratory of German Phar­ma­cists). Including the 6th supplement of 1996 it comprised more than 2,000 pages… It was an indispensable resource for planning new studies and also showed the ‘journey’ of dossiers (i.e., the same study being used by different companies).

The BioInternational conference series set milestones in the development of testing for bioequivalence. The first in Toronto 1989 dealt with the \(\small{\log_{e}\textsf{-}}\)transformation of data and the definition of highly variable drugs (HVDs).65 There was a poll among the participants about the \(\small{\log_{e}\textsf{-}}\)transformation. Out­come: ⅓ never, ⅓ always, ⅓ case by case (i.e., perform both analyses and report the one with the narrower CI ‘because it fits the data better’). Let’s be silent about the last team.66 HVDs were defined as drugs with intra-subject variabilities of more than 30%, although problems may already be evident at 25%.
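The 30% cut-off refers to the intra-subject coefficient of variation, which – for \(\small{\log_{e}\textsf{-}}\)transformed data – corresponds to a standard deviation in log-scale via the standard lognormal relationship. A minimal sketch in base R:

```r
# Relationship between the intra-subject CV and the standard deviation
# in log(e)-scale for lognormal data: sw = sqrt(log(CV^2 + 1)).
CV2sw <- function(CV) sqrt(log(CV^2 + 1))
sw2CV <- function(sw) sqrt(exp(sw^2) - 1) # inverse transformation
CV2sw(0.30)        # HVD cut-off: sw ~0.2936
sw2CV(CV2sw(0.25)) # round-trip returns 0.25
```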

top of section ↩︎ previous section ↩︎

The Boring (?) 1990s

    

The original acceptance range was symmetrical around 100%. In \(\small{\log_{e}\textsf{-}}\)scale it should be symmetrical around \(\small{0}\) (because \(\small{\log_{e}1=0}\)). What happens to our \(\small{\Delta}\), which should still be 20%? Due to the positive skewness of the lognormal distribution a lively discussion started after early publications proposing 80 – 125%.19 41 Keeping 80 – 120% would have been flawed because maximum power should be obtained at \(\small{\mu_\text{T}/\mu_\text{R}=1}\), whereas it is actually obtained at \[\exp\left((\log_{e}\theta_1+\log_{e}\theta_2)/2\right),\tag{8}\] which equals \(\small{1}\) only if \(\small{\theta_2=\theta_1^{-1}}\) (or, equivalently, \(\small{\theta_1=\theta_2^{-1}}\)). Keeping the original limits, maximum power would be obtained at \(\small{\mu_\text{T}/\mu_\text{R}=\exp((\log_{e}0.8+\log_{e}1.2)/2)\approx0.979796}\).
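A quick numerical check of \(\small{(8)}\) in base R (a minimal sketch): the \(\small{\text{T}/\text{R}\textsf{-}}\)ratio of maximum power is simply the geometric mean of the acceptance limits.

```r
# T/R-ratio giving maximum power acc. to (8): the geometric mean of
# the acceptance limits theta1 and theta2.
max.power.at <- function(theta1, theta2) exp(mean(log(c(theta1, theta2))))
max.power.at(0.80, 1.20) # ~0.9798, i.e., not at unity
max.power.at(0.80, 1.25) # 1, i.e., at a T/R-ratio of 100%
```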


Fig. 3 Power for a 2×2×2 design and limits 0.80 – 1.20.
Note that the \(\small{\theta}\)-axis is in log-scale.

There were three parties (all agreed that the acceptance range should be symmetrical in \(\small{\log_{e}\textsf{-}}\)scale and consequently asymmetrical when back-transformed). These were their arguments and suggestions:

The width of the acceptance range was 40% and we have empirical evidence that the concept of BE ‘worked’ – let’s keep it.
\[\left\{\theta_1,\theta_2\right\}=81.98-121.98\%\tag{9}\]
Since that’s a new method, we don’t want to face safety issues with a higher limit. Furthermore, a more restrictive lower limit prevents issues with insufficient efficacy.
\[\left\{\theta_1,\theta_2\right\}=\left\{100/(1+\Delta),100\,(1+\Delta)\right\}=83.\dot{3}-120\%\tag{10}\]
80% as the lower limit served us well in the past. Hence, 125% is the way to go because it is simply the reciprocal of the lower limit and the coverage probability in the log-domain is the same as the one we had before. Furthermore, these are nice numbers.

\[\left\{\theta_1,\theta_2\right\}=\left\{100\,(1-\Delta),100/(1-\Delta)\right\}=80-125\%\tag{11}\]
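The first party’s limits are easily reproduced (a small sketch in base R): keeping the 40% width while requiring \(\small{\theta_2=\theta_1^{-1}}\) gives the quadratic \(\small{\theta_1^2+0.4\,\theta_1-1=0}\), whose positive root is the lower limit.

```r
# Acceptance range with a width of 40%, symmetrical in log-scale
# (theta2 = 1 / theta1): solve theta1^2 + 0.4 * theta1 - 1 = 0.
theta1 <- (-0.4 + sqrt(0.4^2 + 4)) / 2 # positive root of the quadratic
theta2 <- 1 / theta1
sprintf("%.2f%% - %.2f%%", 100 * theta1, 100 * theta2) # "81.98% - 121.98%"
```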

    

The 90% CI inclusion approach \(\small{(7)}\) based on \(\small{\log_{e}\textsf{-}}\)transformed data with acceptance limits of 80.00 – 125.00% \(\small{(5)}\) was the winner.


Fig. 4 Power for a 2×2×2 design and limits 0.80 – 1.25.
Note the symmetry: the power at any \(\small{1/\theta}\) equals the power at \(\small{\theta}\).

First sample size tables for the multiplicative model with the acceptance range 80 – 125% were published67 and ex­tended for narrower (90 – 111%) and wider (70 – 143%) acceptance ranges.68 The nonparametric method was improved taking period-effects into account.69 70 Drug-drug and food-in­ter­action studies should be assessed for equi­va­lence.71 The general applicability of average BE was challenged and the concept of individual and population bioequivalence outlined.72 73 74 The first textbook dealing exclusively with BA/BE was published.75
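Such sample size tables can be approximated without specialized software. Below a sketch of the common noncentral-\(\small{t}\) approximation of the power of the TOST procedure for a 2×2×2 design and \(\small{\log_{e}\textsf{-}}\)transformed data (illustrative only; the published tables rely on the exact method based on Owen’s Q, and the assumed \(\small{CV}\), \(\small{\theta_0}\), and \(\small{n}\) are just examples):

```r
# Approximate power of TOST for a 2x2x2 crossover, log-transformed data,
# using the noncentral t-distribution (exact methods employ Owen's Q).
power.tost <- function(CV, theta0, theta1 = 0.80, theta2 = 1.25,
                       n, alpha = 0.05) {
  sw <- sqrt(log(CV^2 + 1))     # within-subject SD in log-scale
  se <- sw * sqrt(2 / n)        # SE of the difference T - R
  df <- n - 2                   # error degrees of freedom
  tc <- qt(1 - alpha, df)       # critical value
  d1 <- (log(theta0) - log(theta1)) / se
  d2 <- (log(theta0) - log(theta2)) / se
  max(0, pt(-tc, df, ncp = d2) - pt(tc, df, ncp = d1))
}
power.tost(CV = 0.25, theta0 = 0.95, n = 28) # ~0.81
```

With a \(\small{CV}\) of 25% and an assumed \(\small{\text{T}/\text{R}\textsf{-}}\)ratio of 0.95, 28 subjects give approximately 81% power; the exact method yields a very similar value.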

This was also the decade of updated and new guidelines. A European draft guidance was published in 1990;76 the final guideline was published in December 1991 and came into force in June 1992.77 The 90% CI inclusion approach of \(\small{\log_{e}\textsf{-}}\)transformed data with an acceptance range of 80 – 125% was recommended and for NTIDs the acceptance range may need to be tightened. Due to its inherent higher variability a wider acceptance range may be acceptable for \(\small{C_\text{max}}\). If inevitable and clinically acceptable, a wider acceptance range may also be used for \(\small{AUC}\). Only if clinically relevant, a nonparametric analysis of \(\small{t_\text{max}}\) was re­comm­end­ed.
An in vivo study was not required if the new formulation

  1. is to be parenterally administered as a solution and contains the same API(s) and excipients in the same concentrations as the reference or
  2. is a liquid oral form in solution (elixir, syrup, etc.) containing the API(s) in the same concentration and form as the reference, not containing excipients that may significantly affect gastric passage or absorption of the active substance.

Similar statements about solutions were given in all later guidelines. The second led to the application of the Bio­phar­ma­ceu­tics Classi­fi­cation System (BCS).78 More about that later.

In July 1992 the first guidance of the FDA was published.79 An ANOVA of \(\small{\log_{e}\textsf{-}}\)transformed data was re­com­mend­ed and the nested subject(sequence) term in the statistical model entered the scene. It must be mentioned that in com­pa­rative BA studies subjects are usually uniquely coded. Hence, the term subject(sequence) is a bogus one80 and could be replaced by a simple subject term as well (see below for an example). Regrettably, this model has been implemented in all global guidelines ever since.

In the same year the Canadian guidance for Immediate Release (IR) formulations was published.81 At that time it was the most extensive one because it gave not only the method of evaluation, but also information about the study design, sample size, ethics, bioanalytics, etc. It differed from the others in the relaxed requirement for \(\small{C_\text{max}}\), where only the \(\small{\text{T}/\text{R}\textsf{-}}\)ratio has to lie within 80 – 125% (instead of its CI).

In 1998 the World Health Organization published its first guideline,82 which was similar to the European one.

Table V shows the result of the example evaluated by the various methods. \[\small{\begin{array}{lcccc} \textsf{Table V}\phantom{0}\\ \phantom{0}\text{Method} & \text{Model} & \text{PE} & \text{power},p,\text{CI} & \text{BE?}\\\hline \text{80/20 Rule} & \text{additive} & - & 47.22\% & \text{fail}\\ \text{TOST} & \text{additive} & +2.250\;(103.09\%) & 0.0155,\,0.0515 & \text{fail}\\ \text{95% CI} & \text{additive} & +2.250\;(103.09\%) & -12.807\,,+17.307\;(82.61-123.76\%) & \text{fail}\\ \text{Westlake} & \text{additive} & \pm0.000\;(100.00\%) & \pm16.143\;(\pm21.48\%) & \text{fail}\\\hline \text{80/20 Rule} & \text{multiplicative} & - & 73.57\% & \text{fail}\\ \text{TOST} & \text{multiplicative} & 102.82\% & 0.0099,\,0.0283 & \text{pass}\\ \text{90% CI} & \text{multiplicative} & 102.82\% & \phantom{1}87.25-121.17\% & \text{pass}\\ \text{Westlake} & \text{multiplicative} & 100.00\% & \pm17.72\% & \text{pass}\\ \text{75/75 Rule} & \text{multiplicative} & - & - & \text{pass}\\\hline \end{array}}\] In the additive model the acceptance range was 80 – 120%, whereas in the multiplicative model it is 80 – 125%. Since differences are assessed in the former, the percentages given in brackets are – strictly speaking – wrong.

    

As of today only the 90% CI inclusion approach is globally accepted. Our example in R again:

example       <- data.frame(subject   = rep(1:12, each = 2),
                            sequence  = c(rep("TR", 12), rep("RT", 12)),
                            treatment = c(rep(c("T", "R"), 6),
                                          rep(c("R", "T"), 6)),
                            period    = rep(1:2, 12),
                            Y         = c(71, 81, 61, 65, 80, 94,
                                          66, 74, 94, 54, 97, 63,
                                          85, 70, 90, 76, 54, 53,
                                          56, 99, 90, 83, 68, 51))
facs          <- c("subject", "sequence", "treatment", "period")
example[facs] <- lapply(example[facs], factor) # factorize the data
txt           <- paste("nested model : period, subject(sequence), treatment",
                       "\nsimple model : period, subject, sequence, treatment",
                       "\nheretic model: period, subject, treatment\n\n")
result        <- data.frame(model = c("nested", "simple", "heretic"),
                            PE = NA, lower = NA, upper = NA, BE = "fail", na = 0)
for (i in 1:3) {
  if (result$model[i] == "nested") { # bogus nested model (guidelines)
    nested         <- lm(log(Y) ~ period +
                                  subject %in% sequence +
                                  treatment, data = example)
    result$PE[i]   <- 100 * exp(coef(nested)[["treatmentT"]])
    result[i, 3:4] <- 100 * exp(confint(nested, level = 0.90)["treatmentT", ])
    result[i, 6]   <- sum(is.na(coef(nested)))
  }
  if (result$model[i] == "simple") { # simple model (subjects are uniquely coded)
    simple         <- lm(log(Y) ~ period +
                                  subject +
                                  sequence +
                                  treatment, data = example)
    result$PE[i]   <- 100 * exp(coef(simple)[["treatmentT"]])
    result[i, 3:4] <- 100 * exp(confint(simple, level = 0.90)["treatmentT", ])
    result[i, 6]   <- sum(is.na(coef(simple)))
  }
  if (result$model[i] == "heretic") { # heretic model (without sequence)
    heretic        <- lm(log(Y) ~ period +
                                  subject +
                                  treatment, data = example)
    result$PE[i]   <- 100 * exp(coef(heretic)[["treatmentT"]])
    result[i, 3:4] <- 100 * exp(confint(heretic, level = 0.90)["treatmentT", ])
    result[i, 6]   <- sum(is.na(coef(heretic)))
  }
  # rounding acc. to guidelines
  if (round(result[i, 3], 2) >= 80 & round(result[i, 4], 2) <= 125)
    result$BE[i] <- "pass"
}
# cosmetics
result$PE     <- sprintf("%6.2f%%", result$PE)
result$lower  <- sprintf("%6.2f%%", result$lower)
result$upper  <- sprintf("%6.2f%%", result$upper)
names(result)[c(3:4, 6)] <- c("lower CL", "upper CL", "NE")
cat(txt); print(result, row.names = FALSE)
# nested model : period, subject(sequence), treatment 
# simple model : period, subject, sequence, treatment 
# heretic model: period, subject, treatment
# 
#    model      PE lower CL upper CL   BE NE
#   nested 102.82%   87.25%  121.17% pass 13
#   simple 102.82%   87.25%  121.17% pass  1
#  heretic 102.82%   87.25%  121.17% pass  0

As already outlined above, the nested model recommended in all [sic] guidelines is over-specified because subjects are uniquely coded. In the example we get 13 non-estimable (aliased) effects (in the output of R lines with NA, in SAS ., and in Win­Non­lin not estimable). Correct, because we are asking for something the data cannot provide.80 In the simple mod­el only one effect cannot be estimated. Even sequence can be removed from the model. I call it he­re­tic because regulators will grill you if you use it. It was the model proposed by Westlake20 43 and I used it in hundreds (‼) of my stud­ies. Note that the results of all models are exactly the same; if you don’t believe me, try it with one of your stud­ies.

    

A ‘Positive List’ was published by the German regulatory authority, i.e., for 90 drugs BE was not required.83 In order to comply with the European Note for Guidance of 200184 it had to be removed by the BfArM.

Two (of five) sessions of the BioInternational ’92 conference in Bad Homburg dealt with BE of Highly Variable Drugs.85 86 Vari­ous approaches were discussed: multiple dose instead of single dose studies, the metabolite instead of the parent compound, stable isotope tech­niques,87 add-on designs, and – for the first time – replicate designs.

Although the BioInternational 2 in Munich 1994 was – with over 600 participants – the largest in the series, no sub­stan­ti­al progress for HVD(P)s was achieved.88 Following a suggestion,89 widening the conventional acceptance limits of 80.00 – 125.00% was considered at a joint AAPS/FDA workshop in 1995.90

For some highly variable drugs and drug products, the bioequivalence standard should be modified by changing the BE limits while maintaining the current confidence interval at 90%. […] the bioequivalence limits should be determined based in part upon the intrasubject varia­bility for the reference product.
Shah et al. (1996)90

A hot topic ever since… Why have we been discussing it for 35 (‼) years (since the first Bio­Inter­national conference)? Is it really that com­pli­cated91 or are we too stupid?

Studies in steady state were proposed as an option for HVD(P)s in a European draft guideline92 but were removed from the final version of 2001.84

Validation of bioanalytical methods93 94 95 96 was partly covered in Australia and Canada. However, no specific guideline existed. A series of conferences (informally known as ‘Crys­tal City’) was initiated in 1990.97 Procedures stated in the conference report98 were discussed at the Bio­In­ter­na­tio­nal 2 in Munich 1994 and quickly adopted by bioanalytical sites. Updates were subsequently published.99 100

TODO: SUPAC (FDA)

top of section ↩︎ previous section ↩︎

21st century

    

After a wealth of – controversial – publications in the 1990s,72 73 74 101 102 103 104 105 106 107 108 109 the FDA introduced two new concepts as alternatives to average bio­equi­va­lence (ABE), namely population bioequivalence (PBE) and individual bio­equi­va­lence (IBE).110 ABE focuses only on the comparison of the po­pu­lation averages of the PK metrics of the formulations, not on their variances. It also does not assess a sub­ject-by-for­mu­lat­ion interaction variance, that is, the variation of the average \(\small{\text{T}}\) and \(\small{\text{R}}\) difference among individuals. In contrast, PBE and IBE include com­pa­ri­sons of both averages and variances of PK metrics. The PBE approach assesses the total variability of the PK metrics in the population. The IBE approach assesses the within-subject variability for the \(\small{\text{T}}\) and \(\small{\text{R}}\) formulations, as well as the sub­ject-by-formulation interaction.
Demonstrated PBE would support ‘Prescribability’ (i.e., a drug-naïve patient could start treatment with a generic), whereas IBE would support ‘Switchability’ (i.e., a patient could switch formulations during treatment).109 Contrary to ABE, both PBE and IBE require studies in a full replicate design, which means that both \(\small{\text{T}}\) and \(\small{\text{R}}\) are administered twice. The acceptance limits for ABE were kept at 80.00–125.00%, but for the others scaling to the variability of the reference was possible. That would have meant an incentive for test formulations with lower variability than the reference but a penalty for ones with higher variability.

However, the underlying statistical concepts were not trivial and the results practically incomprehensible for non-statisticians. Furthermore, both approaches had a discontinuity (when moving from constant- to reference-scaling), which led to an inflated Type I Error (patient’s risk) of approximately 6.5% for \(\small{CV_\text{wR}}\) of 18.1 – 20.2%.110 111 112
PBE and IBE faced criticism, e.g., »responses [to the guidance] were still doubt-filled as to whether the new bioequivalence criteria really provided added value compared to average bioequivalence«113 and were regarded a »‘theoretical’ solution to a ‘theoretical’ problem«,114 leading to their omission from a subsequent guid­ance115 and a return to conventional ABE.116

[ABE should suffice based upon grounds of] ‘practicality, plausibility, historical adequacy, and purpose’ and ‘because we have better things to do.’ […] ‘Statisticians have a bad track record in bioequivalence, […] the literature is full of ludicrous recommendations from statisticians, […] regulatory recommendations (of dubious validity) have been hastily implemented, and practical realities have been ignored’.
Stephen Senn (2000)117

I remember a Dutch regulator standing up in the BioInternational conference (London 2003) saying: »I’m glad that PBE and IBE are dead. I never understood them.«

    

Poland happily adopted Germany’s ‘Positive List’83 when it wanted to join the European Union, only to learn that in the mean­time Germany had abandoned it. Until 2015 a similar (but shorter) list existed in The Netherlands for national market authori­sa­tions only. It must have been a schizophrenic situation for assessors of the MEB: In the morning a dossier for a national MA with­out any in vivo comparison → approved. In the afternoon another dossier of the same product in the course of a European submission. BE performed, but lower 90% CI 79.99% → rejected. Bizarre.
Until 2012 Denmark required for NTIDs that the 90% CI had to include 100% (i.e., that there is no significant treatment effect). Bizarre as well. For details see Example 3 in this article.

The first bioanalytical method validation guidance was published by the FDA in 2001 and revised in 2018.118 119 Before the European draft guideline was published in 2009,120 some inspectors raised an eyebrow if sites worked according to the FDA’s guidance.

The validation of bioanalytical methods and the analysis of study samples should be per­form­ed in accordance with the principles of Good Laboratory Practice (GLP). However, as human bio­ana­ly­ti­cal studies fall outside of the scope of GLP, as defined in Directive 2004/10/EC, the sites con­duct­ing the human studies are not required to be monitored as part of a national GLP compliance programme.
EMEA (2009)120

Well roared, lions! My CRO had been GLP-certified since 1991, although we performed only phase I studies. In other countries (e.g., Spain), this was not possible. In Germany GLP is subject to state law. Hence, it was possible to get certified in one federal state but not in another… However, this ‘issue’ was resolved with the final guideline published in 2011121 and the ICH M10 guideline of 2022,122 superseding all local guidelines.

TODO: BCS-based biowaivers, reference-scaling, two-stage designs, NTIDs, current guidelines in various jurisdictions…

Still unresolved or not harmonized issues:

  1. Scaled ABE for HVD(P)s (RSABE123 or ABEL124 125 126);
    control of the type I error,127 agreement on which of the metrics can be scaled, outliers124 125
  2. Method for NTIDs (fixed narrower acceptance limits124 or reference-scaling123 128)
  3. Comparison of ‘early exposure’129 if clinically relevant? (\(\small{t_\text{max}}\) by a nonparametric method or first partial \(\small{AUC}\)); see also this article
  4. Cut-off times of partial \(\small{AUC}\textsf{s}\) (based on PD – like the FDA or PK – like the EMA?)
  5. Alternative surrogate for the rate of absorption (\(\small{C_\text{max}/AUC}\)31 32 33)?
  6. Reduce variability of \(\small{AUC}\)130 of HVDs by using \(\small{AUC/\hat{\lambda}_z}\)?
  7. Studies in fed state mandatory?
  8. Multiple dose studies of modified release products really131 necessary?
  9. Adaptive sequential two-stage designs (only exact or simulation-based as well?)
  10. Potency-correction if measured contents differ by more than 5% (arbitrary)
See also some of my presentations, a – somewhat outdated – collection of guidelines, and further readings on the topic.112 113 132 133 134 135 136 137 138 139 140 141
    

A word of warning: The textbooks dealing with statistics (marked with ★ in the references) are rather tough cookies and not recommended for beginners.

top of section ↩︎ previous section ↩︎

Acknowledgments

Henning Blume and José Augusto Guimarães Morais for discussions about the Bio­Inter­national conferences and early days of bioequivalence.

Licenses

CC BY 4.0 Helmut Schütz 2024
R GPL 3.0, klippy MIT, pandoc GPL 2.0.
1st version April 9, 2024. Rendered May 1, 2024 18:03 CEST by rmarkdown via pandoc in 0.09 seconds.

Footnotes and References


  1. Lindenbaum J, Preibisz JJ, Butler VP Jr., Saha JR. Variation in digoxin bioavailability: a continuing problem. J Chron Dis. 1973; 16: 749–54. Open Access Open Access.↩︎

  2. DeSante KA, DiSanto AR, Chodos DJ, Stoll RG. Antibiotic Batch Certification and Bioequivalence. JAMA. 1975; 232(13): 1349–51. doi:10.1001/jama.1975.03250130033016.↩︎

  3. International Council for Har­mo­ni­sa­tion of Techni­cal Require­ments for Pharmaceuticals for Human Use. Bioequivalence for Immediate-Release Solid Oral Dosage Forms. M13A. Draft version 20 December 2022. Online.↩︎

  4. Hall DG, In: Hearing Before the Subcommittee on Monopolies Select Committee on Small Business. U.S. Senate, Government Printing Office, Washington D.C. 1967: 258–81.↩︎

  5. Tyrer JH, Eadie MJ, Sutherland JM, Hooper WD. Outbreak of anticonvulsant intoxication in an Australian city. Br Med J. 1970; 4: 271–3. doi:10.1136/bmj.4.5730.271. Open Access Open Access.↩︎

  6. Bochner F, Hooper WD, Tyrer JH, Eadie MJ. Factors involved in an outbreak of phenytoin intoxications. J Neurol Sci. 1972; 16(4): 481–7. doi:10.1016/0022-510x(72)90053-6.↩︎

  7. Lund L. Clinical significance of generic inequivalence of three different pharmaceutical preparations of phenytoin. Eur J Clin Phar­ma­col. 1974; 7: 119–24. doi:10.1007/bf00561325.↩︎

  8. Lindenbaum J, Mellow MH, Blackstone MO, Butler VP. Variations in biological activity of digoxin from four preparations. N Engl J Med. 1971; 285(24): 1344–7. doi:10.1056/nejm197112092852403.↩︎

  9. Jounela AJ, Pentikäinen PJ, Sothmann. Effect of particle size on the bioavailability of digoxin. Eur J Clin Phar­ma­col. 1975; 8(5): 365–70. doi:10.1007/BF00562664.↩︎

  10. Richton-Hewett S, Foster E, Apstein CS. Medical and Economic Consequences of a Blinded Oral Anticoagulant Brand Change at a Municipal Hospital. Arch Intern Med. 1988; 148(4): 806–8. doi:10.1001/archinte.1988.00380040046010.↩︎

  11. Weinberger M, Hendeles L, Bighley L, Speer J. The Relation of Product Formulation to Absorption of Oral Theo­phyl­line. N Engl J Med. 1978; 299(16): 852–7. doi:10.1056/nejm197810192991603.↩︎

  12. Bielmann B, Levac TH, Langlois Y, L Tetreault L. Bioavailability of primidone in epileptic patients. Int J Clin Phar­ma­col. 1974; 9(2): 132–7. PMID 4208031↩︎

  13. Skelly JP, Knapp G. Biologic availability of digoxin tablets. JAMA. 1973; 224(2): 243. doi:10.1001/jama.1973.03220150051015.↩︎

  14. Skelly JP. A History of Biopharmaceutics in the Food and Drug Administration 1968–1993. AAPS J. 2010; 12(1): 44–50. doi:10.1208/s12248-009-9154-8. PMC Free Full Text Free Full Text.↩︎

  15. APhA Academy of Pharmaceutical Sciences. Guidelines for Biopharmaceutic Studies in Man. Washington D.C. February 1972.↩︎

  16. Schuirmann DJ. A comparison of the Two One-Sided Tests Procedure and the Power Approach for Assessing the Equivalence of Average Bioavailability. J Pharmacokin Bio­pharm. 1987; 15(6): 657–80. doi:10.1007/BF01068419.↩︎

  17. Metzler CM. Bioavailability – A Problem in Equivalence. Biometrics. 1974; 30(2): 309–17. PMID 4833140.↩︎

  18. Westlake WJ. Symmetrical Confidence Intervals for Bioequivalence Trials. Bio­metrics. 1976; 32(4): 741–4. PMID 1009222.↩︎

  19. Mantel N. Do We Want Confidence Intervals Symmetrical About the Null Value? Bio­metrics. 1977; 33: 759–60. [Letter to the Editor]↩︎

  20. Westlake WJ. Design and Evaluation of Bioequivalence Studies in Man. In: Blanchard J, Sawchuk RJ, Brodie BB, editors. Prin­cip­les and perspectives in Drug Bio­avail­abi­li­ty. Basel: Karger; 1979. ISBN 3-8055-2440-4. p. 192–210.↩︎

  21. Fieller EC. Some Problems In Interval Estimation. J Royal Stat Soc B. 1954; 16(2): 175–85. JSTOR:2984043.↩︎

  22. Locke CS. An Exact Confidence Interval from Untransformed Data for the Ratio of Two Formulation Means. J. Phar­ma­co­kin. Biopharm. 1984; 12(6): 649–55. doi:10.1007/bf01059558.↩︎

  23. Public Law 98-417. Sept. 24, 1984. Online.↩︎

  24. In phase III we try to demonstrate that verum performs ‘better’ than placebo, i.e., one-sided tests for non-inferiority (effect) and non-superiority (adverse reactions). Such studies are already large: Approving sta­tins and CO­VID-19 vaccines required ten thousands volunteers. Can you imagine how many it would need to detect a 20% difference between two treatments?↩︎

  25. Benet LZ. Why Do Bioequivalence Studies in Healthy Volunteers? 1st MENA Regulatory Conference on Bio­equi­va­lence, Bio­wai­vers, Bioanalysis and Dissolution. Amman. 23 September 2013.  Internet Archive.↩︎

  26. Office of the Federal Register. Code of Federal Regulations, Title 21, Part 320, Subpart A, § 320.23(a)(1) Online.↩︎

  27. This is an assumption, i.e., based on the labelled content instead of the measured potency.↩︎

  28. Yet another assumption. Incorrect for highly variable drugs and, thus, inflates the confidence interval.↩︎

  29. Tóthfálusi L, Endrényi L. Estimation of Cmax and Tmax in Populations After Single and Multiple Drug Ad­mi­ni­stra­tion. J Pharma­co­kin Pharma­codyn. 2003; 30(5): 363–85. doi:10.1023/b:jopa.0000008159.97748.09.↩︎

  30. In models with more than one compartment \(\small{t_\text{max}}\) and \(\small{C_\text{max}}\) cannot be analytically derived. In software numeric optimization is employed to locate the maximum of the function.↩︎

  31. Endrényi L, Fritsch S, Yan W. Cmax/AUC is a clearer measure than Cmax for absorption rates in investigations of bio­equi­va­lence. Int J Clin Pharmacol Ther Toxicol. 1991; 29(10): 394–9. PMID 1748540.↩︎

  32. Schall R, Luus HG. Comparison of absorption rates in bioequivalence studies of immediate release drug formulations. Int J Clin Phar­ma­col Ther To­xi­col. 1992; 30(5): 153–9. PMID 1592542.↩︎

  33. Endrényi L, Yan W. Variation of Cmax and Cmax/AUC in investigations of bio­equi­va­lence. Int J Clin Pharm Ther To­xi­col. 1993; 31(4): 184–9. PMID 8500920.↩︎

  34. Haynes JD. Statistical simulation study of new proposed uniformity requirement for bioequivalency studies. J Pharm Sci. 1981; 70(6): 673–5. doi:10.1002/jps.2600700625.↩︎

  35. Cabana BE. Assessment of 75/75 Rule: FDA Viewpoint. Pharm Sci. 1983; 72(1): 98–99. doi:10.1002/jps.2600720127.↩︎

  36. Haynes JD. FDA 75/75 Rule: A Response. Pharm Sci. 1983; 72: 99–100.↩︎

  37. Nitsche V, Mascher H, Schütz H. Comparative bioavailability of several phenytoin preparations marketed in Austria. Int J Clin Pharmacol Ther Toxicol. 1984; 22(2): 104–7. PMID 6698663.↩︎

  38. Klingler D, Nitsche V, Schmidbauer H. Hydantoin-Intoxikation nach Austausch schein­bar gleich­wertiger Di­phenyl­hy­dan­toin-Präparate. Wr Med Wschr. 1981; 131: 295–300. [German]↩︎

  39. Glazko AJ, Chang T, Bouhema J, Dill WA, Goulet JR, Buchanan RA. Metabolic disposition of diphenylhydantoin in normal human subjects following intravenous administration. Clin Pharmacol Ther. 1969; 10(4): 498–504. doi:10.1002/cpt1969104498.↩︎

  40. Bochner F, Hooper WD, Tyrer JH, Eadi MJ. Effect of dosage increments on blood pheny­toin concentrations. J Neu­rol Neuro­surg Psychiatr. 1972; 35(6): 873–6. doi:10.1136/jnnp.35.6.873.↩︎

  41. Kirkwood TBL. Bioequivalence Testing – A Need to Rethink [reader reaction]. Biometrics. 1981, 37: 589–91. doi:10.2307/2530573.↩︎

  42. Westlake WJ. Response to Bioequivalence Testing – A Need to Rethink [reader reaction response]. Biometrics. 1981, 37: 591–93.↩︎

  43. Westlake WJ. Bioavailability and Bioequivalence of Pharmaceutical Formulations. In: Pearce KE, editor. Bio­phar­ma­ceu­tical Statistics for Drug Development. New York: Marcel Dekker; 1988. p. 329–53. ISBN 0-8247-7798-0.↩︎

  44. Rodda BE, Davis RL. Determining the probability of an important difference in bio­availability. Clin Pharmacol Ther. 1980; 28: 247–52. doi:10.1038/clpt.1980.157.↩︎

  45. Mandallaz D, Mau J. Comparison of Different Methods for Decision-Making in Bio­equi­valence Assessment. Bio­me­trics. 1981; 37: 213–22. PMID 6895040.↩︎

  46. Fluehler H, Hirtz J, Moser HA. An Aid to Decision-Making in Bioequivalence Assessment. J Pharmacokin Bio­pharm. 1981; 9: 235–43. doi:10.1007/BF01068085.↩︎

  47. Selwyn MR, Hall NR. On Bayesian Methods for Bioequivalence. Biometrics. 1984; 40: 1103–8. PMID 6398710.↩︎

  48. Fluehler H, Grieve AP, Mandallaz D, Mau J, Moser HA. Bayesian Approach to Bio­equivalence Assessment: An Example. J Pharm Sci. 1983; 72(10): 1178–81. doi:10.1002/jps.2600721018.↩︎

  49. Anderson S, Hauck WW. A New Procedure for Testing Bioequivalence in Comparative Bioavailability and Other Clinical Trials. Commun Stat Ther Meth. 1983; 12(23): 2663–92. doi:10.1080/03610928308828634.↩︎

  50. Steinijans VW, Diletti E. Statistical Analysis of Bioavailability Studies: Parametric and Nonparametric Confidence Intervals. Eur J Clin Pharmacol. 1983; 24: 127–36. doi:10.1007/BF00613939.↩︎

  51. Steinijans VW, Diletti E. Generalization of Distribution-Free Confidence Intervals for Bioavailability Ratios. Eur J Clin Phar­ma­col. 1985; 28: 85–8. doi:10.1007/BF00635713.↩︎

  52. Steinijans VW, Schulz H-U, Beier W, Radtke HW. Once daily theophylline: multiple-dose comparison of an encapsulated micro-osmotic system (Euphylong) with a tablet (Uniphyllin). Int J Clin Pharm Ther Toxi­col. 1986; 24(8): 438–47. PMID 3759279.↩︎

  53. Steinijans VW. Pharmacokinetic Characteristics of Controlled Release Products and Their Biostatistical Analysis. In: Gundert-Remy U, Möller H, editors. Oral Controlled Release Products – Therapeutic and Biopharmaceutic Assess­ment. Stutt­gart: Wis­sen­schaftliche Verlagsanstalt; 1988, p. 99–115.↩︎

  54. Blume H, Siewert M, Steinijans V. Bioäquivalenz von per os applizierten Retard-Arzneimitteln; Konzeption der Stu­dien und Ent­scheidung über Austauschbarkeit. Pharm Ind. 1989; 51: 1025–33. [German]↩︎

  55. Wijnand HP, Timmer CJ. Mini-computer programs for bioequivalence testing of pharmaceutical drug formulations in two-way cross-over studies. Comput Programs Bio­med. 1983; 17(1–2): 73–88. doi:10.1016/0010-468x(83)90027-2.↩︎

  56. Where did it come from? Two stories:
    Les Benet recounted that there was a poll at the FDA and – essentially based on gut feeling – the 20% saw the light of day.
    I’ve heard another one, which I like more. Wilfred J. Westlake, one of the pioneers of BE, was a statistician at SKF. During a coffee and cigarette break (everybody was smoking in the 1970s) he asked his colleagues in the clinical pharmacology department »Which difference in blood concentrations do you consider relevant?« Yep, the 20% were born.↩︎

  57. Rheinstein P. Report by the Bioequivalence Task Force on Recommendations from the Bioequivalence Hearing conducted by the Food and Drug Administration. September 29 – October 1, 1986. January 1988.↩︎

  58. APV. Richtlinie und Kommentar. Pharm Ind. 1985; 47(6): 627–32. [German]↩︎

  59. Arbeitsgemeinschaft Pharmazeutische Verfahrenstechnik (APV). International Symposium. Bioavail­abi­lity/Bio­equi­va­lence, Pharmaceutical Equivalence and The­ra­peu­tic Equivalence. Würzburg. 9–11 February, 1987.↩︎

  60. Junginger H. APV-Richtlinie – »Untersuchungen zur Bioverfügbarkeit, Bioäquivalenz« Pharm Ztg. 1987; 132: 1952–55. [German]↩︎

  61. Junginger H. Studies on Bioavailability and Bioequivalence – APV Guideline. Drugs Made in Germany. 1987; 30: 161–6.↩︎

  62. Blume H, Kübel-Thiel K, Reutter B, Siewert M, Stenzhorn G. Nifedipin: Monographie zur Prüfung der Bio­ver­füg­bar­keit / Bio­äqui­va­lenz von schnell-freisetzenden Zubereitungen (1). Pharm Ztg. 1988; 133(6): 398–93. [German]↩︎

  63. TGA. Guidelines for Bioavailability and Bioequivalency Studies. Draft C06:6723c (29/11/88).↩︎

  64. Blume H, Mutschler E. Bioäquivalenz – Qualitätsbewertung wirkstoffgleicher Fertigarzneimittel: An­lei­tung-Me­tho­den-Ma­te­ri­a­lien. Frank­furt/Main: Govi-Ver­lag; 1989. [German]↩︎

  65. McGilveray IJ, Midha KK, Skelly JP, Dighe S, Doluisio JT, French IW, Karim A, Burford R. Consensus Report from “Bio International ’89”: Issues in the Evaluation of Bioavailability Data. J Pharm Sci. 1990; 79(10): 945–6. doi:10.1002/jps.2600791022.↩︎

  66. Keene ON. The log transformation is special. Stat Med. 1995; 14(8): 811–9. doi:10.1002/sim.4780140810. Open Access.↩︎

  67. Diletti E, Hauschke D, Steinijans VW. Sample size determination for bioequivalence assessment by means of confidence intervals. Int J Clin Pharm Ther Toxicol. 1991; 29(1): 1–8. PMID 2004861.↩︎

  68. Diletti E, Hauschke D, Steinijans VW. Sample size determination: Extended tables for the multiplicative model and bioequivalence ranges of 0.9 to 1.11 and 0.7 to 1.43. Int J Clin Pharm Ther Toxicol. 1992; 30(Suppl.1): S59–62. PMID 1601533.↩︎

  69. Hauschke D, Steinijans VW, Diletti E. A distribution-free procedure for the statistical analysis of bioequivalence studies. Int J Clin Pharm Ther Toxicol. 1990; 28(2): 72–8.↩︎

  70. Steinijans VW, Hauschke D. Update on the statistical analysis of bioequivalence studies. Int J Clin Pharm Ther To­xi­col. 1990; 28(3): 105–10. PMID 2318545.↩︎

  71. Steinijans VW, Hartmann M, Huber R, Radtke HW. Lack of pharmacokinetic interaction as an equivalence problem. Int J Clin Pharm Ther To­xi­col. 1991; 29(8): 323–8. PMID 1835963.↩︎

  72. Anderson S, Hauck WW. Consideration of individual bioequivalence. J Pharmacokinet Biopharm. 1990; 18(3): 259–73. doi:10.1007/bf01062202.↩︎

  73. Schall R, Luus HG. On population and individual bioequivalence. Stat Med. 1993; 12(12): 1109–24. doi:10.1002/sim.4780121202.↩︎

  74. Schall R. A unified view of individual, population, and average bioequivalence. In: Blume HH, Midha KK, editors. Bio-Inter­na­tio­nal 2. Bioavailability, Bioequivalence and Pharmacokinetic Studies. Stuttgart: med­pharm; 1995: 91–106.↩︎

  75. Chow S-C, Liu J-p. Design and Analysis of Bioavailability and Bioequivalence Studies. New York: Marcel Dekker; 1992. ISBN 0-8247-8682-3. ★↩︎

  76. CPMP Working Party. Investigation of Bioavailability and Bioequivalence: Note for Guidance. III/54/89-EN, 8th Draft. June 1990.↩︎

  77. Commission of the European Community. Investigation of Bioavailability and Bioequivalence. Brussels. December 1991. Online.↩︎

  78. Amidon GL, Lennernäs H, Shah VP, Crison JR. A Theoretical Basis for a Biopharmaceutic Drug Classification: The Correlation of in Vitro Drug Product Dissolution and in Vivo Bioavailability. Pharm Res. 1995; 12(3): 413–20. doi:10.1023/a:1016212804288. Open Access.↩︎

  79. FDA, CDER. Guidance for Industry. Statistical Procedures for Bioequivalence Studies using a Standard Two-Treatment Crossover Design. Rockville. July 1992. Internet Archive.↩︎

  80. If Subject 1 is randomized to sequence \(\small{\text{TR}}\), there is not ‘another’ Subject 1 randomized to sequence \(\small{\text{RT}}\). Ran­dom­iza­tion is not like Schrödinger’s cat. Hence, the nested term in the guidelines is an insult to the mind.↩︎

  81. Health Canada, HPFB. Guidance for Industry. Conduct and Analysis of Bioavailability and Bioequivalence Studies – Part A: Oral Dosage Formulations Used for Systemic Effects. Ottawa. 1992. Online.↩︎

  82. WHO Marketing Authorization of Pharmaceutical Products with Special Reference to Multisource (Generic) Pro­ducts: A Manual for Drug Regulatory Authorities. Geneva. 1998. Internet Archive.↩︎

  83. Gleiter CH, Klotz U, Kuhlmann J, Blume H, Stanislaus F, Harder S, Paulus H, Poethko-Müller C, Holz-Slomczyk M. When Are Bioavailability Studies Required? A German Proposal. J Clin Pharmacol. 1998; 38: 904–11. doi:10.1002/j.1552-4604.1998.tb04385.x. Open Access.↩︎

  84. EMEA, CPMP. Note for Guidance on the Investigation of Bioavailability and Bio­equi­va­lence. London. 26 July 2001. Online.↩︎

  85. Midha KK, Blume HH, editors. Bio-International. Bioavailability, Bio­equi­va­lence and Pharmacokinetics. Stutt­gart: med­pharm; 1993. ISBN 3-88763-019-X.↩︎

  86. Blume HH, Midha KK. Bio-International 92, Conference on Bioavailability, Bioequivalence, and Pharmacokinetic Studies. J Pharm Sci. 1993; 82(11): 1186–9. doi:10.1002/jps.2600821125.↩︎

  87. Simultaneous administration of a stable isotope labelled IV dose would allow calculation of the true clearance in each period. Then it would no longer be necessary to assume identical clearances in \(\small{(3)}\), and the problem of highly variable drugs (inflating the CI) could be avoided. However, it would require that the IV formulation is manufactured according to the rules of cGMP and differs from the internal standard in MS, which is generally not feasible. Such an approach is only mentioned in Japanese guidelines.↩︎

  88. Blume HH, Midha KK, editors. Bio-International 2. Bioavailability, Bioequivalence and Pharmacokinetic Studies. Stutt­gart: med­pharm; 1995.↩︎

  89. Boddy AW, Snikeris FC, Kringle RO, Wei GCG, Opperman JA, Midha KK. An approach for widening the bio­equi­va­lence acceptance limits in the case of highly variable drugs. Pharm Res. 1995; 12(12): 1865–8. doi:10.1023/a:1016219317744.↩︎

  90. Shah VP, Yacobi A, Barr WH, Benet LZ, Breimer D, Dobrinska MR, Endrényi L, Fairweather W, Gillespie W, Gonzalez MA, Hooper J, Jackson A, Lesko LL, Midha KK, Noonan PK, Patnaik R, Williams RL. Workshop Report. Evaluation of Orally Ad­mi­nis­tered Highly Variable Drugs and Drug Formulations. Pharm Res. 1996; 13(11): 1590–4. doi:10.1023/a:1016468018478.↩︎

  91. Schütz H, Labes D, Wolfsegger MJ. Critical Remarks on Reference-Scaled Average Bioequivalence. J Pharm Pharmaceut Sci. 2022; 25: 285–96. doi:10.18433/jpps32892.↩︎

  92. EMEA Human Medicines Evaluation Unit / CPMP. Note for Guidance on the Investigation of Bioavailability and Bio­equi­va­lence. Draft. London. 17 December 1998.↩︎

  93. Brooks MA, Weinfeld RE. A Validation Process for Data from the Analysis of Drugs in Biological Fluids. Drug Devel Ind Pharm. 1985; 11: 1703–28.↩︎

  94. Pachla LA, Wright DS, Reynolds DL. Bioanalytical Considerations for Pharmacokinetic and Biopharmaceutic Studies. J Clin Phar­ma­col. 1986; 26(5): 332–5. doi:10.1002/j.1552-4604.1986.tb03534.x.↩︎

  95. Buick AR, Doig MV, Jeal SC, Land GS, McDowall RD. Method Validation in the Bioanalytical Laboratory. J Pharm Biomed Anal. 1990; 8(8–12): 629–37. doi:10.1016/0731-7085(90)80093-5. Open Access.↩︎

  96. Karnes HT, Shiu G, Shah VP. Validation of Bioanalytical Methods. Pharm Res. 1991; 8(4): 421–6. doi:10.1023/a:1015882607690.↩︎

  97. AAPS, FDA, FIP, HPB, AOAC. Analytical Methods Validation: Bioavailability, Bioequivalence and Pharma­co­ki­netic Studies. Arlington, VA. December 3–5, 1990.↩︎

  98. Shah VP, Midha KK, Dighe S, McGilveray IJ, Skelly JP, Yacobi A, Layloff T, Viswanathan CT, Cook CE, McDowall RD, Pittman, Spector S. Analytical methods validation: Bioavailability, bioequivalence and pharmacokinetic studies. Eur J Drug Metab Pharmacokinet. 1991; 16(4): 249–55. doi:10.1007/bf03189968.↩︎

  99. Shah VP, Midha KK, Findlay JWA, Hill HM, Hulse JD, McGilveray IJ, McKay G, Miller KJ, Patnaik RN, Powell ML, Tonelli A, Viswanathan CT, Yacobi A. Bioanalytical Method Validation – A Revisit with a Decade of Progress. Pharm Res. 2000; 17: 1551–7. doi:10.1023/a:1007669411738.↩︎

  100. Viswanathan CT, Bansal S, Booth B, DeStefano AJ, Rose MJ, Sailstad J, Shah VP, Skelly JP, Swann PG, Weiner R. Workshop / Conference Report – Quantitative Bioanalytical Methods Validation and Implementation: Best Practices for Chromatographic and Ligand Binding Assays. Pharm Res. 2007; 24(10): 1962–73. doi:10.1007/s11095-007-9291-7.↩︎

  101. Anderson S. Individual Bioequivalence: A problem of Switchability. Biopharm Rep. 1993; 2(2): 1–11.↩︎

  102. Endrényi L, Schulz M. Individual Variation and the Acceptance of Average Bioequivalence. Drug Inform J. 1993; 27(1): 195–201. doi:10.1177/009286159302700135.↩︎

  103. Endrényi L. A method for the evaluation of individual bioequivalence. Int J Clin Pharmacol. 1994; 32(9): 497–508. PMID 7820334.↩︎

  104. Esinhart JD, Chinchilli VM. Extension to use of tolerance intervals for the assessment of individual bioequivalence. J Biopharm Stat. 1994; 4: 39–52. doi:10.1080/10543409408835071.↩︎

  105. Chow S-C, Liu J-p. Current issues in bioequivalence trials. Drug Inform J. 1995; 29: 795–804. doi:10.1177/009286159502900302.↩︎

  106. Chen ML. Individual bioequivalence. A regulatory update. J Biopharm Stat. 1997; 7(1): 5–11. doi:10.1080/10543409708835162.↩︎

  107. Hauck WW, Anderson S. Commentary on individual bioequivalence by ML Chen. J Biopharm Stat. 1997; 7(1): 13–6. doi:10.1080/10543409708835163.↩︎

  108. Liu J-p, Chow S-C. Some thoughts on individual bioequivalence. J Biopharm Stat. 1997; 7(1): 41–8. doi:10.1080/10543409708835168.↩︎

  109. Midha KK, Rawson MJ, Hubbard JW. Prescribability and switchability of highly variable drugs and drug products. J Contr Rel. 1999; 62(1–2): 33–40. doi:10.1016/s0168-3659(99)00050-4.↩︎

  110. FDA, CDER. Guidance for Industry. Statistical Approaches to Establishing Bio­equi­va­lence. Rockville. Jan 2001. Download.↩︎

  111. Chow S-C, Shao J, Wang H. Individual bioequivalence testing under 2 × 3 designs. Stat Med. 2002; 21(5): 629–48. doi:10.1002/sim.1056.↩︎

  112. Chow S-C, Liu J-p. Design and Analysis of Bioavailability and Bioequivalence Studies. Boca Raton: Chapman & Hall/CRC Press; 3rd edition 2009. ISBN 978-1-58488-668-6. ★ p. 596–8.↩︎

  113. Hauschke D, Steinijans VW, Pigeot I. Bioequivalence Studies in Drug Development. Methods and Applications. Chichester: Wiley; 2007. ISBN 0-470-09475-3. ★ p. 209.↩︎

  114. Patterson S. A Review of the Development of Biostatistical Design and Analysis Techniques for Assessing In Vivo Bioequivalence: Part Two. Ind J Pharm Sci. 2001; 63(3): 169–86. Open Access.↩︎

  115. FDA, CDER. Guidance for Industry. Bioavailability and Bioequivalence Studies for Orally Administered Drug Products — General Considerations. Rockville. March 2003. Internet Archive.↩︎

  116. Schall R, Endrényi L. Bioequivalence: tried and tested. Cardiovasc J Afr. 2010; 21(2): 69–70. PMCID 3721767. Free Full Text.↩︎

  117. Senn S. Conference Proceedings: Challenging Statistical Issues in Clinical Trials. Decisions and Bioequivalence. 2000.↩︎

  118. FDA, CDER, CVM. Guidance for Industry. Bioanalytical Method Validation. Rockville. May 2001. Internet Archive.↩︎

  119. FDA, CDER, CVM. Guidance for Industry. Bioanalytical Method Validation. Silver Spring. May 2018. Download.↩︎

  120. EMEA, CHMP. Guideline on Validation of Bioanalytical Methods. Draft. London. 19 November 2009. Online.↩︎

  121. EMA, CHMP. Guideline on Validation of Bioanalytical Methods. London. 21 July 2011. Online.↩︎

  122. ICH. Bioanalytical Method Validation and Study Sample Analysis. M10. 22 May 2022. Online.↩︎

  123. FDA, CDER. Guidance for Industry. Bioequivalence Studies With Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA. Draft. Silver Spring. August 2021. Download.↩︎

  124. EMEA, CHMP. Guideline on the Investigation of Bioequivalence. London. 20 January 2010. Online.↩︎

  125. Health Canada. Guidance Document. Comparative Bioavailability Standards: Formulations Used for Sys­temic Effects. Ottawa. 2018/06/08. Online.↩︎

  126. WHO/PQT: medicines. Application of reference-scaled criteria for AUC in bioequivalence studies conducted for sub­mis­sion to PQT/MED. Geneva. 02 July 2021. Online.↩︎

  127. Schütz H. Highly Variable Drugs and Type I Error. Presentation at: 6th International Workshop – GBHI 2024. Rockville, MD. 16 April 2024. Online.↩︎

  128. Paixão P, García Arieta A, Silva N, Petric Z, Bonelli M, Morais JAG, Blake K, Gouveia LF. A Two-Way Proposal for the Determination of Bioequivalence for Narrow Therapeutic Index Drugs in the European Union. Pharmaceutics. 2024; 16(5): 598. doi:10.3390/pharmaceutics16050598. Open Access.↩︎

  129. Hofmann J. Bioequivalence of early exposure: tmax & pAUC. Presentation at: BioBridges. Prague. 21 September 2023. Online.↩︎

  130. Abdallah HY. An area correction method to reduce intrasubject variability in bioequivalence studies. J Pharm Pharmaceut Sci. 1998; 1(2): 60–5. Open Access.↩︎

  131. Paixão P, Gouveia LF, Morais JAG. An alternative single dose parameter to avoid the need for steady-state studies on oral ex­tend­ed-release drug products. Eur J Phar­ma­ceut Bio­phar­ma­ceut. 2012; 80(2): 410–7. doi:10.1016/j.ejpb.2011.11.001.↩︎

  132. Senn S. Cross-over Trials in Clinical Research. Chichester: Wiley; 2nd edition 2002. ISBN 0-471-49653-7. ★↩︎

  133. Wellek S. Testing Statistical Hypotheses of Equivalence. Boca Raton: Chapman & Hall/CRC; 2003. ISBN 978-1-5848-8160-5. ★↩︎

  134. Amidon G, Lesko L, Midha K, Shah V, Hilfinger J. International Bioequivalence Standards: A New Era. Ann Arbor: TSRL; 2006. ISBN 0-9790119-0-6.↩︎

  135. Kanfer I, Shargel L, editors. Generic Product Development. International Regulatory Requirements for Bio­equi­va­lence. New York: informa healthcare; 2010. ISBN 978-0-8493-7785-3.↩︎

  136. Bolton S, Bon C. Pharmaceutical Statistics. Practical and Clinical Applications. New York: informa healthcare; 5th edition 2010. ISBN 978-1-4200-7422-2. ★↩︎

  137. Davit B, Braddy AC, Conner DP, Yu LX. International Guidelines for Bioequivalence of Systemically Available Orally Administered Generic Drug Products: A Survey of Similarities and Differences. AAPS J. 2013; 15(4): 974–90. doi:10.1208/s12248-013-9499-x. Free Full Text.↩︎

  138. Yu LX, Li BV, editors. FDA Bioequivalence Standards. New York: Springer; 2014. ISBN 978-1-4939-1251-0.↩︎

  139. Jones B, Kenward MG. Design and Analysis of Cross-Over Trials. Boca Raton: CRC Press. 3rd edition 2015. ISBN 978-1-4398-6142-4. ★↩︎

  140. Kanfer I, editor. Bioequivalence Requirements in Various Global Jurisdictions. New York: Springer; 2017. ISBN 978-3-319-88542-1.↩︎

  141. Patterson S, Jones B. Bioequivalence and Statistics in Clinical Pharmacology. Boca Raton: CRC Press; 2nd edition 2019. ISBN 978-0-3677-8244-3. ★↩︎