

  • The right-hand badges give the respective section’s ‘level’.
    
  1. Basics requiring no or only limited statistical expertise.
    
  2. These sections are the most important ones. They are – hopefully – easily comprehensible even for novices. A basic knowledge of R does not hurt.
    
  3. A somewhat higher knowledge of statistics and/or R is required. May be skipped or reserved for a later reading.

Introduction

If this article is perceived as overly focused on statistics, I apologize. Blame my professional background – crafting engaging narratives is not my strong suit.

‘Bioavailability’ (a portmanteau of ‘biologic availability’) in its current meaning was coined in 19731 and ‘bioequivalence’ saw the light of day in 1975.2

The MeSH term ‘Biological Availability’ was introduced in 1979.
The extent to which the active ingredient of a drug dosage form becomes available at the site of drug action or in a biological medium believed to reflect accessibility to a site of action.

The site of action (i.e., a receptor) is inaccessible. There should be no space for beliefs in science. The best definition of bioequivalence is given by the ICH.3

Two drug products containing the same drug substance(s) are considered bioequivalent if their relative bioavailability (BA) (rate and extent of drug absorption) after administration in the same molar dose lies within acceptable predefined limits. These limits are set to ensure comparable in vivo performance, i.e., similarity in terms of safety and efficacy.
ICH (2020)3

We will use a simple example in the following: a two-treatment two-sequence two-period (2×2×2) crossover design, where subjects 1–6 were in sequence \(\small{\text{TR}}\) and subjects 7–12 in sequence \(\small{\text{RT}}\). \[\small{\begin{array}{ccc} \textsf{Table I}\phantom{0}\\ \text{subject} & \text{T} & \text{R}\\\hline \phantom{1}1 & 71 & 81\\ \phantom{1}2 & 61 & 65\\ \phantom{1}3 & 80 & 94\\ \phantom{1}4 & 66 & 74\\ \phantom{1}5 & 94 & 54\\ \phantom{1}6 & 97 & 63\\ \phantom{1}7 & 70 & 85\\ \phantom{1}8 & 76 & 90\\ \phantom{1}9 & 53 & 54\\ 10 & 99 & 56\\ 11 & 83 & 90\\ 12 & 51 & 68\\\hline \end{array}}\]


    

The 1970s

Problems were reported with formulations of Narrow Therapeutic Index Drugs (NTIDs) like phenytoin,4 5 6 7 digoxin,1 8 9 warfarin,10 theophylline,11 and primidone.12 Some show nonlinear pharmacokinetics (phenytoin) or are auto-inducers (warfarin).

  • Excipient changed from CaSO4 to lactose5 6
  • The API was altered (e.g., particle size,7 9 amorphous to crystalline10)
  • Variable disintegration time
  • Dissolution testing not mandatory
  • No in vivo studies were performed comparing the new to the approved formulation
  • Breakthrough-seizures4 and intoxications5 6 (phenytoin) and variable or poor effect (digoxin, theophylline)

Generic drugs in the current sense did not yet exist at that time; only the content had to meet the USP requirements.

Although in 1969 Professor John Wagner demonstrated to the Bureau of Medicine, methods for comparing areas under the serum versus time curve (AUC) to estimate bioequivalence, his approach was ignored inasmuch as the FDA hierarchy did not believe a problem existed, and therefore such studies would not be necessary. For their part the Offices of Pharmaceutical Research and Compliance in the Bureau of Medicine and the Commissioner’s Office believed that the “Bioavailability Problem” as some called it was a “Content Uniformity Problem”.13 In 1971 for example, when notified of a “Bioavailability Problem” with a generic digoxin product, FDA investigated and ascertained that one manufacturer first added all the excipients into a 55-gal drum, then added digoxin, closed the lid, and mixed it by rolling the drum across the floor a few times. The content uniformity of those tablets varied from 10% to 156%.
Jerome Philip Skelly (2010)14

Following a ‘Conference on Bioavailability of Drugs’ held at the National Academy of Sciences of the United States in 1971, a guideline was published the following year.15

Oh dear! © 2008 hobvias sudoneighm @ flickr

[…] the mean of AUC of the generic had to be within 20% of the mean AUC of the approved product. At first this was determined by using serum versus time plots on specially weighted paper, cutting the plot out and then weighing each separately.
Jerome Philip Skelly (2010)14


    

80/20 Rule

The FDA’s 80/20 Rule or ‘Power Approach’ (at least 80% power to detect a 20% difference) of 1972 consisted of testing the hypothesis of no difference at the \(\small{\alpha=0.05}\) level of significance.14 16 \[H_0:\;\mu_\text{T}-\mu_\text{R}=0\;vs\;H_1:\;\mu_\text{T}-\mu_\text{R}\neq 0,\tag{1}\] where \(\small{H_0}\) is the null hypothesis of equality (no difference) and \(\small{H_1}\) the alternative hypothesis of a difference. \(\small{\mu_\text{T}}\) and \(\small{\mu_\text{R}}\) are the (true) means of \(\small{\text{T}}\) and \(\small{\text{R}}\), respectively. In order to pass the test, the estimated (post hoc, a posteriori, retrospective) power had to be at least 80%. The power depends on the true value of \(\small{\sigma}\), which is unknown. There exists a value \(\small{\sigma_{\,0.80}}\) such that if \(\small{\sigma\leq\sigma_{\,0.80}}\), the power of the test of no difference \(\small{H_0}\) is greater than or equal to 0.80. Since \(\small{\sigma}\) is unknown, it has to be approximated by the sample standard deviation \(\small{s}\). The Power Approach in a simple 2×2×2 crossover design then consists of rejecting \(\small{H_0}\) and concluding that \({\small{\mu_\text{T}}}\) and \({\small{\mu_\text{R}}}\) are equivalent if \[-t_{1-\alpha/2,\nu}\leq\frac{\bar{x}_\text{T}-\bar{x}_\text{R}}{s\sqrt{\tfrac{1}{2}\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right)}}\leq t_{1-\alpha/2,\nu}\:\text{and}\:s\leq\sigma_{0.80},\tag{2}\] where \(\small{n_1,\,n_2}\) are the number of subjects in sequences 1 and 2, the degrees of freedom \(\small{\nu=n_1+n_2-2}\), and \(\small{\bar{x}_\text{T}\,,\bar{x}_\text{R}}\) are the means of \(\small{\text{T}}\) and \(\small{\text{R}}\), respectively.
Note that this procedure is based on estimated power \(\small{\widehat{\pi}}\), since the true power is a function of the unknown \(\small{\sigma}\). It was the only approach based on post hoc power and was never implemented in any other jurisdiction.

For the example we estimate a power of only 47.2% to detect a 20% difference and the study would fail.
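The post hoc power can be reproduced from the data. A minimal sketch in base R (variable names are mine), using the common approximation of the power of the two-sided test by a shifted central t-distribution:

```r
# 80/20 Rule for the example: power to detect a difference of 20% of the
# reference mean at alpha = 0.05 (shifted central-t approximation)
example <- data.frame(subject   = factor(rep(1:12, each = 2)),
                      period    = factor(rep(1:2, 12)),
                      treatment = factor(c(rep(c("T", "R"), 6),
                                           rep(c("R", "T"), 6))),
                      Y         = c(71, 81, 61, 65, 80, 94,
                                    66, 74, 94, 54, 97, 63,
                                    85, 70, 90, 76, 54, 53,
                                    56, 99, 90, 83, 68, 51))
model <- lm(Y ~ subject + period + treatment, data = example)
nu    <- df.residual(model)                            # n1 + n2 - 2 = 10
se    <- sqrt(vcov(model)["treatmentT", "treatmentT"]) # SE of the difference
delta <- 0.20 * mean(example$Y[example$treatment == "R"])
power <- pt(delta / se - qt(1 - 0.05 / 2, nu), nu)
cat(sprintf("Estimated power %.1f%% (< 80%%): the study fails.\n", 100 * power))
```

The noncentrality term `delta / se` plugs the observed residual variability into \(\small{(2)}\); with these data the estimated power comes out at the 47.2% quoted above.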

First proposals by the biostatistical community were published.17 18 19 20


    

95% CI

The analysis was performed on untransformed data (i.e., by an additive model assuming normally distributed data) and bioequivalence was concluded if the 95% confidence interval (CI) of the point estimate (PE) lay entirely within 80 – 120%.

We get for our example in R:

example          <- data.frame(subject   = rep(1:12, each = 2),
                               sequence  = c(rep("TR", 12), rep("RT", 12)),
                               treatment = c(rep(c("T", "R"), 6),
                                             rep(c("R", "T"), 6)),
                               period    = rep(1:2, 12),
                               Y         = c(71, 81, 61, 65, 80, 94,
                                             66, 74, 94, 54, 97, 63,
                                             85, 70, 90, 76, 54, 53,
                                             56, 99, 90, 83, 68, 51))
factors          <- c("subject", "period", "treatment")
example[factors] <- lapply(example[factors], factor) # factorize the data
# additive model (untransformed data, differences); sequence not in the model!
muddle           <- lm(Y ~ subject + period + treatment, data = example)
CI               <- as.numeric(confint(muddle, level = 0.95)["treatmentT", ])
PE               <- coef(muddle)[["treatmentT"]]
# Percentages (flawed!)
mean.T           <- mean(example$Y[example$treatment == "T"])
mean.R           <- mean(example$Y[example$treatment == "R"])
PE.pct           <- 100 * mean.T / mean.R
CI.pct           <- 100 * (CI + mean.R) / mean.R
result           <- data.frame(method = c("differences", "percentages"),
                               PE = c(sprintf("%+.3f", PE),
                                      sprintf("%6.2f%%",  PE.pct)),
                               lower = c(sprintf("%+.3f", CI[1]),
                                         sprintf("%.2f%%",  CI.pct[1])),
                               upper = c(sprintf("%+.3f", CI[2]),
                                         sprintf("%6.2f%%",  CI.pct[2])),
                               BE = c("", "fail"))
if (CI.pct[1] >= 80 & CI.pct[2] <= 120) result$BE[2] <- "pass"
names(result)[3:4] <- c("lower CL", "upper CL")
print(result, row.names = FALSE)
#       method      PE lower CL upper CL   BE
#  differences  +2.250  -12.807  +17.307     
#  percentages 103.09%   82.42%  123.76% fail

If data are analyzed by an additive model, the results are differences. It is a fundamental error to naïvely transform differences to percentages – that would require Fieller’s CI.21 22 However, this was not done back in the day. We get a 95% CI of 82.42 – 123.76%, and the study would fail because the upper confidence limit (CL) is > 120%.


    

Westlake’s CI

Westlake18 mused that the shortest CI – which is symmetrical about the PE – would be too difficult for non-statisticians to comprehend. He suggested splitting the t-values in such a way that the probability of the two tails sums to \(\small{\alpha}\) and the respective CI is symmetrical around 0 (or 100%). In the example we obtain ±21.48%, and the study would fail as well because the confidence limits are > ±20%. As above, calculating a percentage is flawed.
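Westlake’s interval can be reconstructed numerically. A sketch in base R (variable names are mine): find \(\small{t_1,t_2}\) covering \(\small{1-\alpha}\) such that the interval \(\small{D-t_1\,SE}\) to \(\small{D+t_2\,SE}\) is symmetric about zero, which forces \(\small{t_1-t_2=2D/SE}\):

```r
# Westlake's symmetric CI for the example (sketch; assumes D > 0)
example <- data.frame(subject   = factor(rep(1:12, each = 2)),
                      period    = factor(rep(1:2, 12)),
                      treatment = factor(c(rep(c("T", "R"), 6),
                                           rep(c("R", "T"), 6))),
                      Y         = c(71, 81, 61, 65, 80, 94,
                                    66, 74, 94, 54, 97, 63,
                                    85, 70, 90, 76, 54, 53,
                                    56, 99, 90, 83, 68, 51))
model  <- lm(Y ~ subject + period + treatment, data = example)
D      <- coef(model)[["treatmentT"]]                  # difference T - R
SE     <- sqrt(vcov(model)["treatmentT", "treatmentT"])
nu     <- df.residual(model)
alpha  <- 0.05
# coverage pt(t1, nu) + pt(t2, nu) - 1 = 1 - alpha with t1 = t2 + 2 D / SE
t2     <- uniroot(function(t2) pt(t2 + 2 * D / SE, nu) + pt(t2, nu) -
                    1 - (1 - alpha), interval = c(0, 10), tol = 1e-9)$root
Delta  <- D + t2 * SE                                  # symmetric limit
mean.R <- mean(example$Y[example$treatment == "R"])
cat(sprintf("Westlake's CI: \u00B1%.2f%% around 100%%\n", 100 * Delta / mean.R))
```

The naïve division by the reference mean in the last line mirrors the flawed percentage conversion of the era and reproduces the ±21.48% quoted above.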

However, such a result is misleading. The information about the location of the difference is lost; one cannot know any more whether the BA of \(\small{\text{T}}\) is lower or higher than the one of \(\small{\text{R}}\). Therefore, the method was criticized19 and never implemented in practice. It took me years to convince Certara to remove Westlake’s CI from the results in Phoenix WinNonlin. In 2016, I was successful with version 6.4… Since then the differences are given in the additive model.


    

The Roaring 1980s

The generic boom started in 1984 in the U.S. with the ‘Drug Price Competition and Patent Term Restoration Act’ (informally known as the ‘Hatch-Waxman Act’).23

The approval process was different for innovator (originator) and generic companies.

Innovators:

  • Preclinical data
  • Documentation of pharmaceutical quality
  • In clinical phase I documentation of pharmacokinetics (PK) in healthy subjects, dose finding, safety / tolerability, food effect
  • In phase II efficacy & safety in small groups of patients
  • In phase III demonstration of efficacy & safety versus placebo in well-powered studies:
    Efficacy: Non-Inferiority/Superiority
    Safety: Non-Superiority

Generic companies:

  • Documentation of pharmaceutical quality
  • No in vivo studies required
  • Sometimes disintegration was compared, rarely dissolution

Regulatory concerns about generic substitution arose, leading to extensive discussions about which method could be used to compare formulations.

  • Pharmaceutical equivalence
  • Bioequivalence (BE)
  • Therapeutic equivalence

There was an early agreement that pharmaceutical equivalence is too permissive and therapeutic equivalence would require extremely large studies in patients.24 Hence, comparing the bioavailability (BA) in healthy volunteers seemed to be a reasonable compromise.17

What is the justification for studying bioequivalence in healthy volunteers?
“Variability is the enemy of therapeutics” and is also the enemy of bioequivalence. We are trying to determine if two dosage forms of the same drug behave similarly. Therefore we want to keep any other variability not due to the dosage forms at a minimum. We choose the least variable “test tube”, that is, a healthy volunteer.
Disease states can definitely change bioavailability, but we are testing for bioequivalence, not bioavailability.

Whereas in pharmacokinetics (PK) ‘bioavailability’ refers exclusively to the area under the curve extrapolated to infinite time (\(\small{AUC_{0-\infty}}\)), the FDA introduced two new terms, namely

  1. the ‘rate of bioavailability’ – measured by the maximum concentration (\(\small{C_\text{max}}\)) and
  2. the ‘extent of bioavailability’ – measured by the \(\small{AUC}\).

Therefore, these are PK metrics, whereas PK parameters refer to modeling.

The former is understood as a surrogate for the absorption rate \(\small{k\,_\text{a}}\) in a PK model. I prefer – like the ICH3 and the FDA since 200326 – rate and extent of absorption, in order not to contaminate the original meaning of BA in PK.

    

Let us consider the basic equation of pharmacokinetics \[\frac{f\cdot D}{CL}=\frac{f\cdot D}{V\cdot k_\text{ el}}=AUC_{0-\infty}=\int_{0}^{\infty}C(t)\,dt,\tag{3}\] where \(\small{f}\) is the fraction absorbed (we are interested in the comparison of formulations), \(\small{D}\) is the dose, \(\small{CL}\) is the clearance, \(\small{V}\) is the apparent volume of distribution, \(\small{k\,_\text{el}}\) is the elimination rate constant, and \(\small{C(t)}\) is the plasma concentration over time. We see immediately that for identical27 doses and invariant28 \(\small{CL}\), \(\small{V}\), \(\small{k\,_\text{el}}\) (which are drug-specific), comparing the \(\small{AUC}\text{s}\) allows us to compare the fractions absorbed.
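Equation \(\small{(3)}\) is easy to verify numerically. A sketch for a one-compartment model with assumed values (all constants below are mine, chosen for illustration only):

```r
# numerical check of (3): the integral of C(t) equals f * D / CL
f  <- 0.8; D <- 100; V <- 50   # fraction absorbed, dose, volume
ka <- 1.5; ke <- 0.1           # absorption / elimination rate constants
CL <- V * ke                   # clearance
C  <- function(t) f * D * ka / (V * (ka - ke)) * (exp(-ke * t) - exp(-ka * t))
AUC <- integrate(C, lower = 0, upper = Inf)$value
c(AUC = AUC, f.D.CL = f * D / CL)   # both 16, as (3) requires
```

Doubling \(\small{f}\) doubles both sides; that is exactly why the comparison of \(\small{AUC}\text{s}\) compares fractions absorbed.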

Pharmacokinetics: one of the magic arts of divination whereby needles are stuck into dummies in an attempt to predict profits.
Stephen Senn (2004)

It must be mentioned that \(\small{C_\text{max}}\) is not sensitive to even substantial changes in the rate of absorption \(\small{k\,_\text{a}}\), since it is a composite metric.29 In a one-compartment model it depends on \(\small{k\,_\text{a}}\) and \(\small{f}\) as well as on the elimination rate constant \(\small{k\,_\text{el}}\) and \(\small{V}\) (or \(\small{CL}\) if you belong to the other church).30 Whereas \(\small{k\,_\text{a}}\) and \(\small{f}\) are properties of the formulation – which we are interested in – the others are properties of the drug. \[\eqalign{ t_\textrm{max}&=\frac{\log_{e}(k\,_\text{a}/k\,_\text{el})}{k\,_\text{a}-k\,_\text{el}}\\ C_\textrm{max}&=\frac{f\cdot D\cdot k\,_\text{a}}{V\cdot (k\,_\text{a}-k\,_\text{el})}\large(\small\exp(-k\,_\text{el}\cdot t_\textrm{max})-\exp(-k\,_\text{a}\cdot t_\textrm{max})\large)\tag{4}}\] Therefore, when using it as a surrogate for the absorption rate one must keep in mind that formulations with different fractions absorbed and \(\small{t_\text{max}}\) might show the same \(\small{C_\text{max}}\).
It took ten years before the alternative metric \(\small{C_\text{max}/AUC}\) (based on theoretical considerations and simulations) was proposed.31 32 33 Apart from being independent of \(\small{f}\), it is substantially less variable than \(\small{C_\text{max}}\). Regrettably, it was never implemented in any guideline.
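Both points can be illustrated with \(\small{(4)}\). In this sketch (all parameter values are mine, chosen for illustration) a hypothetical test formulation absorbs 25% more drug than the reference yet shows exactly the same \(\small{C_\text{max}}\), whereas \(\small{C_\text{max}/AUC}\) still discriminates because it does not depend on \(\small{f}\):

```r
# Cmax of a one-compartment model, eq. (4); requires ka != ke
Cmax <- function(f, D, ka, ke, V) {
  tmax <- log(ka / ke) / (ka - ke)
  f * D * ka / (V * (ka - ke)) * (exp(-ke * tmax) - exp(-ka * tmax))
}
D   <- 100; V <- 50; ke <- 0.1  # properties of the drug (assumed)
f.R <- 0.8; ka.R <- 2           # reference formulation
f.T <- 1.0                      # test: 25% more drug absorbed
# find the (slower) ka of the test giving exactly the Cmax of the reference
ka.T <- uniroot(function(ka) Cmax(f.T, D, ka, ke, V) -
                  Cmax(f.R, D, ka.R, ke, V),
                interval = c(ke * 1.01, ka.R))$root
AUC  <- function(f) f * D / (V * ke)             # eq. (3)
cat(sprintf("Cmax     : R %.4f, T %.4f (identical)\n",
            Cmax(f.R, D, ka.R, ke, V), Cmax(f.T, D, ka.T, ke, V)),
    sprintf("Cmax/AUC : R %.4f, T %.4f (different)\n",
            Cmax(f.R, D, ka.R, ke, V) / AUC(f.R),
            Cmax(f.T, D, ka.T, ke, V) / AUC(f.T)))
```

Despite identical \(\small{C_\text{max}}\), the two formulations differ in \(\small{f}\), \(\small{k\,_\text{a}}\), \(\small{t_\text{max}}\), and \(\small{AUC}\) – and \(\small{C_\text{max}/AUC}\) reveals it.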

    

In the early 1980s originators failed in trying to falsify the concept (i.e., comparing BE in healthy volunteers to large therapeutic equivalence (TE) studies in patients): if BE passed, TE passed as well and vice versa. Had they succeeded (BE passed while TE failed), generic companies would have had to demonstrate TE in order to get products approved. Such studies would have to be much larger than the originators’ phase III studies, making them economically infeasible.24 Essentially, that would have meant an early end of the young generic industry.

However, comparative BA is also used by originators in scaling up formulations used in phase III to the to-be-marketed formulation, in supporting post-approval changes, in line extensions of approved products, and for testing drug-drug interactions or food effects. Hence, a substantial part of BE trials are performed by originators. Had they succeeded in refuting the concept, they would have shot themselves in the foot.

In the mid 1980s a consensus was reached that generic approval should only be acceptable after demonstration of suitable in vivo equivalence.

The main assumption in BE was (and still is) that ‘similar’ plasma concentrations in healthy volunteers will lead to similar concentrations at the target site (i.e., a receptor) and thus, to similar effects in patients. It was still an open issue whether BE should be interpreted as a surrogate of clinical efficacy/safety or as a measure of pharmaceutical quality. Whereas in the 1980s the former was prevalent, since the 1990s the latter is mainstream.
A somewhat naïve interpretation of the PK metrics is that \(\small{AUC}\) directly translates to efficacy and \(\small{C_\text{max}}\) to safety. Especially the latter is not correct because any difference in \(\small{C_\text{max}}\) leads to a relatively smaller difference in the maximum effect \(\small{E_\text{max}}\).
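The damping of concentration differences can be sketched with a simple \(\small{E_\text{max}}\) model \(\small{E=E_\text{max}\,C/(EC_{50}+C)}\); all values below are assumed for illustration only:

```r
# hyperbolic Emax model: a 25% higher Cmax translates into a much
# smaller relative difference in effect (values chosen for illustration)
E   <- function(C, Emax = 100, EC50 = 1) Emax * C / (EC50 + C)
C.R <- 2                 # 'reference' Cmax, in units of EC50
C.T <- 1.25 * C.R        # 'test': +25% in Cmax
cat(sprintf("+%.0f%% in Cmax, but only +%.1f%% in effect\n",
            100 * (C.T / C.R - 1), 100 * (E(C.T) / E(C.R) - 1)))
```

The higher the concentrations sit on the saturating part of the curve, the more a difference in \(\small{C_\text{max}}\) is damped in the effect.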

There was no consensus about the definition of ‘similarity’ and the statistical methodology to compare plasma profiles. Two early methods are outlined in the following.


    

75/75 Rule

An approach employed by the FDA: two drugs were considered bioequivalent if at least 75% of subjects showed \(\small{\text{T}/\text{R}\textsf{-}}\)ratios within 75 – 125%.14 34 35 It is not a statistic and, thus, was immediately criticized because variable formulations or studies with some extreme values may pass the criterion by pure chance.36

    

We get for our example in R:

example       <- data.frame(subject   = rep(1:12, each = 2),
                            sequence  = c(rep("TR", 12), rep("RT", 12)),
                            treatment = c(rep(c("T", "R"), 6),
                                          rep(c("R", "T"), 6)),
                            period    = rep(1:2, 12),
                            Y         = c(71, 81, 61, 65, 80, 94,
                                          66, 74, 94, 54, 97, 63,
                                          85, 70, 90, 76, 54, 53,
                                          56, 99, 90, 83, 68, 51))
rule.75.75    <- reshape(example, idvar = "subject", timevar = "treatment",
                         drop = c("sequence", "period"), direction = "wide")
names(rule.75.75)[2:3] <- c("T", "R")
rule.75.75$T.R <- 100 * (rule.75.75$T / rule.75.75$R)
for (i in 1:nrow(rule.75.75)) {
  if (rule.75.75$T.R[i] >= 75 & rule.75.75$T.R[i] <= 125) {
    rule.75.75$BE[i]     <- TRUE
    rule.75.75$within[i] <- "yes"
  } else {
    rule.75.75$BE[i]     <- FALSE
    rule.75.75$within[i] <- "no"
  }
}
names(rule.75.75)[c(4, 6)] <- c("T/R (%)", "±25%")
BE            <- "Failed BE by the"
if (sum(rule.75.75$BE) / nrow(rule.75.75) >= 0.75) BE <- "Passed BE by the"
print(rule.75.75[, c(1:4, 6)], row.names = FALSE); cat(BE, "75/75 Rule.\n")
#  subject  T  R   T/R (%) ±25%
#        1 71 81  87.65432  yes
#        2 61 65  93.84615  yes
#        3 80 94  85.10638  yes
#        4 66 74  89.18919  yes
#        5 94 54 174.07407   no
#        6 97 63 153.96825   no
#        7 70 85  82.35294  yes
#        8 76 90  84.44444  yes
#        9 53 54  98.14815  yes
#       10 99 56 176.78571   no
#       11 83 90  92.22222  yes
#       12 51 68  75.00000  yes
# Passed BE by the 75/75 Rule.

Nine of the twelve subjects (75%) have a T/R-ratio within 75 – 125% and the study would pass, despite the three subjects with extreme \(\small{\text{T}/\text{R}\textsf{-}}\)ratios.


    

t-test

Another suggestion was testing for a statistically significant difference at level \(\small{\alpha=0.05}\). The null hypothesis was that formulations are equal (\(\small{\mu_\text{T}-\mu_\text{R}=0}\)).

Let’s assess our example in R again:

example        <- data.frame(subject   = rep(1:12, each = 2),
                             sequence  = c(rep("TR", 12), rep("RT", 12)),
                             treatment = c(rep(c("T", "R"), 6),
                                           rep(c("R", "T"), 6)),
                             period    = rep(1:2, 12),
                             Y         = c(71, 81, 61, 65, 80, 94,
                                           66, 74, 94, 54, 97, 63,
                                           85, 70, 90, 76, 54, 53,
                                           56, 99, 90, 83, 68, 51))
tt             <- reshape(example, idvar = "subject", timevar = "treatment",
                          drop = c("sequence", "period"), direction = "wide")
tt$T.R         <- tt[, 2] - tt[, 3]
names(tt)[2:4] <- c("T", "R", "T–R")
p              <- t.test(x = tt$T, y = tt$R, paired = TRUE)$p.value
BE             <- "Failed BE"
if (p >= 0.05) BE <- "Passed BE"
print(tt, row.names = FALSE); cat(sprintf("%s by a paired t-test (p = %.4f).\n", BE, p))
#  subject  T  R T–R
#        1 71 81 -10
#        2 61 65  -4
#        3 80 94 -14
#        4 66 74  -8
#        5 94 54  40
#        6 97 63  34
#        7 70 85 -15
#        8 76 90 -14
#        9 53 54  -1
#       10 99 56  43
#       11 83 90  -7
#       12 51 68 -17
# Passed BE by a paired t-test (p = 0.7381).

We calculate a \(\small{p}\)-value of 0.7381, which is statistically not significant (\(\small{\geq\alpha}\)) and the study would pass again.

However, we face a similar problem as with the 75/75 Rule. If the differences show high variability, the study passes; if the variability is low, it fails. This is counterintuitive and actually the opposite of what regulators want.
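The paradox is easy to demonstrate with a hypothetical modification of the example’s differences (the shrink factor is mine): keep the mean difference of +2.25 as is, but reduce the spread around it tenfold.

```r
# same mean difference, lower variability -> 'significant' -> fails
d     <- c(-10, -4, -14, -8, 40, 34, -15, -14, -1, 43, -7, -17) # T - R
p.hi  <- t.test(d)$p.value              # high variability: not significant
d.low <- mean(d) + (d - mean(d)) / 10   # same mean, one tenth of the spread
p.lo  <- t.test(d.low)$p.value          # low variability: significant
cat(sprintf("high variability: p = %.4f ('pass')\nlow  variability: p = %.4f ('fail')\n",
            p.hi, p.lo))
```

A one-sample t-test on the differences is identical to the paired t-test above; the first p-value reproduces the 0.7381 of the example, while the shrunken data – closer agreement of the formulations in every single subject – would fail.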

    
Interlude 1

One of my early sins37 – it was not the last…
After phenytoin intoxications in Austria38 we compared three generics (containing the free acid like the originator, the Na-, or the Ca-salt) to the reference in a crossover design. All formulations had been approved and were marketed in Austria. Although at that time I already calculated a 95% CI, the reviewers of our manuscript insisted on testing for a significant difference ‘because it is state of the art’.


Fig. 1 Phenytoin 3 × 100 mg equivalent, single dose fasting.

Two generics were statistically significantly different from the reference (\(\small{\text{T}_1}\) containing the free acid like the originator and \(\small{\text{T}_3}\) containing the Ca-salt). \(\small{\text{T}_2}\) containing the Na-salt was not statistically significantly different and, thus, considered equivalent – despite its high \(\small{\text{T}/\text{R}\textsf{-}}\)ratio (Table II). \[\small{ \begin{array}{ccccc} \textsf{Table II}\phantom{0000}\\ \text{formulation} & \text{T}/\text{R (%)} & p & & \text{BE}\\\hline \text{T}_1 & 146 & 0.0195\phantom{6} & \text{*} & \text{fail}\\ \text{T}_2 & 134 & 0.151\phantom{96} & \text{n.s.} & \text{pass}\\ \text{T}_3 & \phantom{1}28 & 0.00596 & \text{**} & \text{fail}\\\hline \end{array}}\] If we evaluated the study according to current standards (i.e., by the 90% CI inclusion approach based on \(\small{\log_{e}\textsf{-}}\)transformed data and acceptance limits of 80.00–125.00%), all generics would fail. \(\small{\text{T}_3}\) would even be bioinequivalent because its upper CL is way below 80% (Table III).
If we adjusted for multiplicity (\(\small{\alpha_\text{adj}=0.05/3=0.01\dot{6}\mapsto 96.6\dot{6}\text{% CI}}\)) – although not required in an exploratory study – the outcome would be even worse (Table IV). \[\small{\begin{array}{ccccc} \textsf{Table III}\phantom{0000}\\ \text{formulation} & \text{PE (%)} & \text{CL}_\text{lower}\text{(%)} & \text{CL}_\text{upper}\text{ (%)} & \text{BE}\\\hline \text{T}_1 & 151.12 & 118.75 & 192.32 & \text{fail (inconclusive)}\\ \text{T}_2 & 139.39 & \phantom{1}95.91 & 202.60 & \text{fail (inconclusive)}\\ \text{T}_3 & \phantom{1}21.67 & \phantom{1}10.25 & \phantom{2}45.81 & \text{fail (inequivalent)}\\\hline \end{array}}\] \[\small{\begin{array}{ccccc} \textsf{Table IV}\phantom{0000}\\ \text{formulation} & \text{PE (%)} & \text{CL}_\text{lower}\text{(%)} & \text{CL}_\text{upper}\text{ (%)} & \text{BE}\\\hline \text{T}_1 & 151.12 & 106.67 & 214.09 & \text{fail (inconclusive)}\\ \text{T}_2 & 139.39 & \phantom{1}81.20 & 239.28 & \text{fail (inconclusive)}\\ \text{T}_3 & \phantom{1}21.67 & \phantom{10}7.34 & \phantom{2}63.93 & \text{fail (inequivalent)}\\\hline \end{array}}\] Given the nonlinear PK of phenytoin,39 40 switching a patient from the originator to the generics with high \(\small{\text{T}/\text{R}\textsf{-}}\)ratios would be problematic – potentially leading to toxicity after multiple doses. Even worse would be switching from the generic \(\small{\text{T}_3}\) with its low \(\small{\text{T}/\text{R}\textsf{-}}\)ratio to any of the other formulations.


    

ANOVA and beyond

An Analysis of Variance (ANOVA) instead of a t-test allows period effects to be taken into account.41 42 43 This decade was also the heyday of Bayesian methods.44 45 46 47 Nomograms for sample size estimation were also Bayesian48 but happily misused by frequentists. New parametric49 50 as well as nonparametric methods entered the stage.50 51 Metrics to compare controlled release formulations in steady state were proposed.52 53 54 The first software to evaluate 2×2×2 crossover studies was released in the public domain.55

    

The acceptance range in bioequivalence is based on a ‘clinically relevant difference’ \(\small{\Delta}\), i.e., for data following a lognormal dis­tri­bu­tion \[\left\{\theta_1,\theta_2\right\}=\left\{100\,(1-\Delta),100\,(1-\Delta)^{-1}\right\}\tag{5}\] It must be mentioned that the commonly assumed \(\small{\Delta=20\%}\) leading to \(\small\left\{80.00\%,125.00\%\right\}\)56 is arbitrary (as is any other).
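A quick sketch of \(\small{(5)}\) in R, for the conventional \(\small{\Delta=20\%}\) and the tighter \(\small{\Delta=10\%}\) sometimes applied to narrow therapeutic index drugs:

```r
# acceptance limits (5) for a clinically relevant difference Delta
limits <- function(Delta) 100 * c(lower = 1 - Delta, upper = 1 / (1 - Delta))
round(limits(0.20), 2)   # 80.00, 125.00
round(limits(0.10), 2)   # 90.00, 111.11
```

Note the asymmetry on the original scale: the limits are reciprocal, and hence symmetrical only on the log-scale.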

    

An important leap forward was the Two One-Sided Tests Procedure (TOST)16 – although it was never implemented in its original form \(\small{(6)}\) in regulatory practice. Instead, the confidence interval inclusion approach \(\small{(7)}\) made it to the guidelines. Although these approaches are operationally identical (i.e., their outcomes [pass | fail] are the same), they are statistically different methods:

  1. The TOST Procedure gives two \(\small{p}\)-values, namely \(\small{p(\theta_0\geq\theta_1)}\) and \(\small{p(\theta_0\leq\theta_2)}\). BE is concluded if both \(\small{p}\)-values are \(\small{\leq\alpha}\).

\[\begin{matrix}\tag{6} H_\textrm{0L}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\leq\theta_1\:vs\:H_\textrm{1L}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}>\theta_1\\ H_\textrm{0U}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\geq\theta_2\:vs\:H_\textrm{1U}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2 \end{matrix}\]

  2. In the CI inclusion approach BE is concluded if the two-sided \(\small{1-2\,\alpha}\) CI lies entirely within the acceptance range \(\small{\left\{\theta_1,\theta_2\right\}}\). For an explanation why a 90% CI (and not a 95% CI like in phase III) is used, see another article. \[H_0:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\ni\left\{\theta_1,\theta_2\right\}\:vs\:H_1:\theta_1<\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2\tag{7}\]

When we evaluate our example by \(\small{(6)}\), we get \(\small{p(\theta_0\geq\theta_1)=0.0155}\) and \(\small{p(\theta_0\leq\theta_2)=0.0515}\). Since one of the \(\small{p\textsf{-}}\)values is \(\small{>\alpha}\), the study would fail.
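For comparison – anachronistic at this point of the story – a sketch evaluating the same example by the CI inclusion approach \(\small{(7)}\) as in current guidelines (\(\small{\log_{e}\textsf{-}}\)transformed data, 90% CI, acceptance range 80.00 – 125.00%):

```r
# CI inclusion approach (7) on log-transformed data, 90% CI, 80.00-125.00%
example <- data.frame(subject   = factor(rep(1:12, each = 2)),
                      period    = factor(rep(1:2, 12)),
                      treatment = factor(c(rep(c("T", "R"), 6),
                                           rep(c("R", "T"), 6))),
                      Y         = c(71, 81, 61, 65, 80, 94,
                                    66, 74, 94, 54, 97, 63,
                                    85, 70, 90, 76, 54, 53,
                                    56, 99, 90, 83, 68, 51))
model <- lm(log(Y) ~ subject + period + treatment, data = example)
PE    <- 100 * exp(coef(model)[["treatmentT"]])
CI    <- 100 * exp(confint(model, level = 0.90)["treatmentT", ])
BE    <- ifelse(CI[1] >= 80 & CI[2] <= 125, "pass", "fail")
cat(sprintf("PE %.2f%%, 90%% CI %.2f%% to %.2f%%: %s\n", PE, CI[1], CI[2], BE))
```

With these data the example happens to meet today’s criteria – another reminder that the multiplicative and the additive evaluations are different models, not merely different presentations.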

    
Interlude 2

It is a misconception that a certain CI of a sample (i.e., a particular study) contains the – true but unknown – population mean \(\small{\mu}\) with \(\small{1-\alpha}\) probability. Let’s simulate some studies and evaluate them by \(\small{(7)}\):

invisible(library(PowerTOST))
set.seed(123) # for reproducibility of simulations
mue      <- 1 # true population mean
CV       <- 0.25
studies  <- 100
x        <- sampleN.TOST(CV = CV, theta0 = mue, targetpower = 0.8, print = FALSE)
subjects <- x[["Sample size"]]
power    <- x[["Achieved power"]]
# simulate subjects within studies, lognormal distribution
samples  <- data.frame(study     = rep(1:studies, each = subjects * 2),
                       subject   = rep(rep(1:subjects, studies), each = 2),
                       period    = rep(rep(1:2, studies), 2),
                       sequence  = rep(c(rep(c("TR"), subjects),
                                         rep(c("RT"), subjects)), studies),
                       treatment = c(rep(c("T", "R"), subjects / 2),
                                     rep(c("R", "T"), subjects / 2)),
                       Y         = rlnorm(n = subjects * studies * 2,
                                          meanlog = log(mue) - 0.5 * log(CV^2 + 1),
                                          sdlog = sqrt(log(CV^2 + 1))))
facs     <- c("subject", "period", "treatment")
samples[facs] <- lapply(samples[facs], factor) # factorize the data
result   <- data.frame(study = 1:studies, PE = NA_real_,
                       lower = NA_real_, upper = NA_real_,
                       BE = FALSE, contain = TRUE)
grand.PE <- numeric(studies)
for (i in 1:studies) {
  temp           <- samples[samples$study == i, ]
  heretic        <- lm(log(Y) ~ period + subject + treatment, data = temp)
  result$PE[i]   <- 100 * exp(coef(heretic)[["treatmentT"]])
  result[i, 3:4] <- 100 * exp(confint(heretic, level = 0.90)["treatmentT", ])
  if (round(result[i, 3], 2) >= 80 & round(result[i, 4], 2) <= 125)
    result$BE[i] <- TRUE
  if (result$lower[i] > 100 * mue | result$upper[i] < 100 * mue) result$contain[i] <- FALSE
  grand.PE[i]    <- mean(result$PE[1:i]) # (cumulative) grand means
}
dev.new(width = 4.5, height = 4.5)
op       <- par(no.readonly = TRUE)
par(mar = c(3.05, 2.9, 1.4, 0.75), cex.axis = 0.9, mgp = c(2, 0.5, 0))
xlim     <- range(c(min(result$lower), 1e4 / min(result$lower),
                    max(result$upper), 1e4 / max(result$upper)))
plot(1:2, 100 * rep(mue, 2), type = "n", log = "x", xlab = "PE [90% CI]",
     ylab = "study  #", axes = FALSE,
     xlim = xlim, ylim = range(result$study))
abline(v = 100 * c(0.8, mue, 1.25), lty = c(2, 1, 2))
axis(1, at = c(125, pretty(xlim)),
     labels = sprintf("%.0f%%", c(125, pretty(xlim))))
axis(2, at = c(1, pretty(1:studies)[-1]), las = 1)
axis(3, at = 100 * mue, label = expression(mu))
box()
lines(grand.PE, 1:studies, lwd = 2)
for (i in 1:studies) {
  if (result$BE[i]) {       # pass
    clr <- "blue"
  } else {                  # fail
    if (result$contain[i]) {# mue within CI
      clr <- "magenta"
    } else {                # mue not in CI
      clr <- "red"
    }
  }
  lines(c(result$lower[i], result$upper[i]), rep(i, 2), col = clr)
  points(result$PE[i], i, pch = 16, cex = 0.6, col = clr)
}
par(op)


Fig. 2 2×2×2 crossover studies (\(\small{\mu}\) = 100%, \(\small{CV}\) = 25%: \(\small{n}\) = 24 for ≥80% power).

In 7% of studies the population mean \(\small{\mu}\) is not contained in the 90% CI (red lines). In other words, given the result of a single study we can never know where \(\small{\mu}\) lies. Only the grand mean (mean of sample means \(\small{\frac{1}{n}\sum_{i=1}^{i=n}\overline{x_i}}\)) approaches \(\small{\mu}\) for a large num­ber of samples. After the 100th study it is with 99.44% pretty close to \(\small{\mu}\) (for geeks: The convergence is poor; when simulating 25,000 studies, it is 100.23%). How­ever, nobody would repeat a – passing – study (blue lines) for such a rather un­inter­esting information, right?
This also explains why a particular study might fail by pure chance even if a formulation is equivalent (here 15% of studies; red or magenta lines). Such cases are related to the producer’s risk (Type II Error = 1 – power), which is 16.3% under the given conditions. On the other hand, it is also possible that a formulation which is not equivalent passes. These cases are related to the patient’s risk (Type I Error).
For details see the articles about hypotheses, treatment effects, post hoc power, and sample size estimation. Science is a cruel mistress.

    

At a hearing in 1986 the FDA confirmed that \(\small{(6)}\) or \(\small{(7)}\) should be used with untransformed data and \(\small{\Delta=20\%}\). If clinically relevant, tighter limits (\(\small{\Delta=10\%}\)) might be needed.57

The first German guideline was drafted by the Working Group for Pharmaceutical Pro­cess Engineering (Ar­beits­ge­mein­schaft für Phar­ma­zeu­tische Ver­fah­rens­tech­nik) in 1985.58 It was presented and discussed in 1987.59 60 61

In 1988 wider acceptance limits of 70 – 130% were proposed for \(\small{C_\text{max}}\) due to its inherent high variability62 (as a single-point metric its variability is practically always larger than that of the integrated metric \(\small{AUC}\)).

The Australian draft guideline was published in 1988.63 It was the first covering not only the design and evaluation but also validation of bioanalytical methods. The model with effects period, subject, treatment20 43 was rec­om­mend­ed and a test for se­quence-ef­fects was not considered necessary. The problematic conversion of differences to percentages was acknowledged and Fieller’s CI21 22 discussed. Kudos to both!

In 1989 a series of loose-leaf binders was started.64 It contained the raw data of generic drugs marketed in Germany, the evaluations provided by the companies, as well as results recalculated by the ZL (Central Laboratory of German Phar­ma­cists). Including the 6th supplement of 1996 it comprised more than 2,000 pages… It was an indispensable resource for planning new studies and also showed the ‘journey’ of dossiers (i.e., the same study being used by different companies).

The BioInternational conference series set milestones in the development of testing for bioequivalence. The first in Toronto 1989 dealt with the \(\small{\log_{e}\textsf{-}}\)transformation of data and the definition of highly variable drugs (HVDs).65 There was a poll among the participants about the \(\small{\log_{e}\textsf{-}}\)transformation. Out­come: ⅓ never, ⅓ always, ⅓ case by case (i.e., perform both analyses and report the one with the narrower CI ‘because it fits the data better’). Let’s be silent about the last team.66 HVDs were defined as drugs with intra-subject variabilities of more than 30%, although problems may already be evident at 25%.
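The 30% cut-off refers to the intra-subject coefficient of variation, which – for \(\small{\log_{e}\textsf{-}}\)transformed data – corresponds to a standard deviation in log-scale via the standard lognormal relationship. A minimal sketch in base R:

```r
# Relationship between the intra-subject CV and the standard deviation
# in log(e)-scale for lognormal data: sw = sqrt(log(CV^2 + 1)).
CV2sw <- function(CV) sqrt(log(CV^2 + 1))
sw2CV <- function(sw) sqrt(exp(sw^2) - 1) # inverse transformation
CV2sw(0.30)        # HVD cut-off: sw ~0.2936
sw2CV(CV2sw(0.25)) # round-trip returns 0.25
```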

top of section ↩︎ previous section ↩︎

The Boring (?) 1990s

    

The original acceptance range was symmetrical around 100%. In \(\small{\log_{e}\textsf{-}}\)scale it should be symmetrical around \(\small{0}\) (because \(\small{\log_{e}1=0}\)). What happens to our \(\small{\Delta}\), which should still be 20%? Due to the positive skewness of the lognormal distribution a lively discussion started after early publications proposing 80 – 125%.19 41 Keeping 80 – 120% would have been flawed because maximum power should be obtained at \(\small{\mu_\text{T}/\mu_\text{R}=1}\), whereas it is actually obtained at \[\exp\left((\log_{e}\theta_1+\log_{e}\theta_2)/2\right),\tag{8}\] which equals \(\small{1}\) only if \(\small{\theta_2=\theta_1^{-1}}\) (or, equivalently, \(\small{\theta_1=\theta_2^{-1}}\)). Keeping the original limits, maximum power would be obtained at \(\small{\mu_\text{T}/\mu_\text{R}=\exp((\log_{e}0.8+\log_{e}1.2)/2)\approx0.979796}\).
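A quick numerical check of \(\small{(8)}\) in base R (a minimal sketch): the \(\small{\text{T}/\text{R}\textsf{-}}\)ratio of maximum power is simply the geometric mean of the acceptance limits.

```r
# T/R-ratio giving maximum power acc. to (8): the geometric mean of
# the acceptance limits theta1 and theta2.
max.power.at <- function(theta1, theta2) exp(mean(log(c(theta1, theta2))))
max.power.at(0.80, 1.20) # ~0.9798, i.e., not at unity
max.power.at(0.80, 1.25) # 1, i.e., at a T/R-ratio of 100%
```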


Fig. 3 Power for a 2×2×2 design and limits 0.80 – 1.20.
Note that the \(\small{\theta}\)-axis is in log-scale.

There were three parties (all agreed that the acceptance range should be symmetrical in \(\small{\log_{e}\textsf{-}}\)scale and consequently asymmetrical when back-transformed). These were their arguments and suggestions:

The width of the acceptance range was 40% and we have empirical evidence that the concept of BE ‘worked’ – let’s keep it.
\[\left\{\theta_1,\theta_2\right\}=81.98-121.98\%\tag{9}\]
Since that’s a new method, we don’t want to face safety issues with a higher limit. Furthermore, a more restrictive lower limit prevents issues with insufficient efficacy.
\[\left\{\theta_1,\theta_2\right\}=\left\{100/(1+\Delta),100\,(1+\Delta)\right\}=83.\dot{3}-120\%\tag{10}\]
80% as the lower limit served us well in the past. Hence, 125% is the way to go because it is simply the reciprocal of the lower limit and the coverage probability in the log-domain is the same as the one we had before. Furthermore, these are nice numbers.

\[\left\{\theta_1,\theta_2\right\}=\left\{100\,(1-\Delta),100/(1-\Delta)\right\}=80-125\%\tag{11}\]
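The first party’s limits are easily reproduced (a small sketch in base R): keeping the 40% width while requiring \(\small{\theta_2=\theta_1^{-1}}\) gives the quadratic \(\small{\theta_1^2+0.4\,\theta_1-1=0}\), whose positive root is the lower limit.

```r
# Acceptance range with a width of 40%, symmetrical in log-scale
# (theta2 = 1 / theta1): solve theta1^2 + 0.4 * theta1 - 1 = 0.
theta1 <- (-0.4 + sqrt(0.4^2 + 4)) / 2 # positive root of the quadratic
theta2 <- 1 / theta1
sprintf("%.2f%% - %.2f%%", 100 * theta1, 100 * theta2) # "81.98% - 121.98%"
```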

    

The 90% CI inclusion approach \(\small{(7)}\) based on \(\small{\log_{e}\textsf{-}}\)transformed data with acceptance limits of 80.00 – 125.00% \(\small{(5)}\) was the winner.


Fig. 4 Power for a 2×2×2 design and limits 0.80 – 1.25.
Note the symmetry: the power at any \(\small{1/\theta}\) equals the power at \(\small{\theta}\).

First sample size tables for the multiplicative model with the acceptance range 80 – 125% were published67 and ex­tended for narrower (90 – 111%) and wider (70 – 143%) acceptance ranges.68 The nonparametric method was improved taking period-effects into account.69 70 Drug-drug and food-in­ter­action studies should be assessed for equi­va­lence.71 The general applicability of average BE was challenged and the concept of individual and population bioequivalence outlined.72 73 74 The first textbook dealing exclusively with BA/BE was published.75
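Such sample size tables can be approximated without specialized software. Below a sketch of the common noncentral-\(\small{t}\) approximation of the power of the TOST procedure for a 2×2×2 design and \(\small{\log_{e}\textsf{-}}\)transformed data (illustrative only; the published tables rely on the exact method based on Owen’s Q, and the assumed \(\small{CV}\), \(\small{\theta_0}\), and \(\small{n}\) are just examples):

```r
# Approximate power of TOST for a 2x2x2 crossover, log-transformed data,
# using the noncentral t-distribution (exact methods employ Owen's Q).
power.tost <- function(CV, theta0, theta1 = 0.80, theta2 = 1.25,
                       n, alpha = 0.05) {
  sw <- sqrt(log(CV^2 + 1))     # within-subject SD in log-scale
  se <- sw * sqrt(2 / n)        # SE of the difference T - R
  df <- n - 2                   # error degrees of freedom
  tc <- qt(1 - alpha, df)       # critical value
  d1 <- (log(theta0) - log(theta1)) / se
  d2 <- (log(theta0) - log(theta2)) / se
  max(0, pt(-tc, df, ncp = d2) - pt(tc, df, ncp = d1))
}
power.tost(CV = 0.25, theta0 = 0.95, n = 28) # ~0.81
```

With a \(\small{CV}\) of 25% and an assumed \(\small{\text{T}/\text{R}\textsf{-}}\)ratio of 0.95, 28 subjects give approximately 81% power; the exact method yields a very similar value.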

This was also the decade of updated and new guidelines. A European draft guidance was published in 1990;76 the final guideline was published in December 1991 and came into force in June 1992.77 The 90% CI inclusion approach of \(\small{\log_{e}\textsf{-}}\)transformed data with an acceptance range of 80 – 125% was recommended and for NTIDs the acceptance range may need to be tightened. Due to its inherent higher variability a wider acceptance range may be acceptable for \(\small{C_\text{max}}\). If inevitable and clinically acceptable, a wider acceptance range may also be used for \(\small{AUC}\). Only if clinically relevant, a nonparametric analysis of \(\small{t_\text{max}}\) was re­comm­end­ed.
An in vivo study was not required if the new formulation

  1. is to be parenterally administered as a solution and contains the same API(s) and excipients in the same concentrations as the reference or
  2. is a liquid oral form in solution (elixir, syrup, etc.) containing the API(s) in the same concentration and form as the reference, not containing excipients that may significantly affect gastric passage or absorption of the active substance.

Similar statements about solutions were given in all later guidelines. The second led to the application of the Bio­phar­ma­ceu­tics Classi­fi­cation System (BCS).78 More about that later.

In July 1992 the first guidance of the FDA was published.79 An ANOVA of \(\small{\log_{e}\textsf{-}}\)transformed data was re­com­mend­ed and the nested subject(sequence) term in the statistical model entered the scene. It must be mentioned that in com­pa­rative BA studies subjects are usually uniquely coded. Hence, the term subject(sequence) is a bogus one80 and could be replaced by a simple subject term as well (see below for an example). Regrettably, this model has been implemented in all global guidelines ever since.

In the same year the Canadian guidance for Immediate Release (IR) formulations was published.81 At that time it was the most extensive one because it gave not only the method of evaluation, but also information about the study design, sample size, ethics, bioanalytics, etc. It differed from the others in the relaxed requirement for \(\small{C_\text{max}}\), where only the \(\small{\text{T}/\text{R}\textsf{-}}\)ratio has to lie within 80 – 125% (instead of its CI).

In 1998 the World Health Organization published its first guideline,82 which was similar to the European one.

Table V shows the result of the example evaluated by the various methods. \[\small{\begin{array}{lcccc} \textsf{Table V}\phantom{0}\\ \phantom{0}\text{Method} & \text{Model} & \text{PE} & \text{power},p,\text{CI} & \text{BE?}\\\hline \text{80/20 Rule} & \text{additive} & - & 47.22\% & \text{fail}\\ \text{TOST} & \text{additive} & +2.250\;(103.09\%) & 0.0155,\,0.0515 & \text{fail}\\ \text{95% CI} & \text{additive} & +2.250\;(103.09\%) & -12.807\,,+17.307\;(82.61-123.76\%) & \text{fail}\\ \text{Westlake} & \text{additive} & \pm0.000\;(100.00\%) & \pm16.143\;(\pm21.48\%) & \text{fail}\\\hline \text{80/20 Rule} & \text{multiplicative} & - & 73.57\% & \text{fail}\\ \text{TOST} & \text{multiplicative} & 102.82\% & 0.0099,\,0.0283 & \text{pass}\\ \text{90% CI} & \text{multiplicative} & 102.82\% & \phantom{1}87.25-121.17\% & \text{pass}\\ \text{Westlake} & \text{multiplicative} & 100.00\% & \pm17.72\% & \text{pass}\\ \text{75/75 Rule} & \text{multiplicative} & - & - & \text{pass}\\\hline \end{array}}\] In the additive model the acceptance range was 80 – 120%, whereas in the multiplicative model it is 80 – 125%. Since differences are assessed in the former, the percentages given in brackets are – strictly speaking – wrong.

    

As of today only the 90% CI inclusion approach is globally accepted. Our example in R again:

example       <- data.frame(subject   = rep(1:12, each = 2),
                            sequence  = c(rep("TR", 12), rep("RT", 12)),
                            treatment = c(rep(c("T", "R"), 6),
                                          rep(c("R", "T"), 6)),
                            period    = rep(1:2, 12),
                            Y         = c(71, 81, 61, 65, 80, 94,
                                          66, 74, 94, 54, 97, 63,
                                          85, 70, 90, 76, 54, 53,
                                          56, 99, 90, 83, 68, 51))
facs          <- c("subject", "sequence", "treatment", "period")
example[facs] <- lapply(example[facs], factor) # factorize the data
txt           <- paste("nested model : period, subject(sequence), treatment",
                       "\nsimple model : period, subject, sequence, treatment",
                       "\nheretic model: period, subject, treatment\n\n")
result        <- data.frame(model = c("nested", "simple", "heretic"),
                            PE = NA, lower = NA, upper = NA, BE = "fail", na = 0)
for (i in 1:3) {
  if (result$model[i] == "nested") { # bogus nested model (guidelines)
    nested         <- lm(log(Y) ~ period +
                                  subject %in% sequence +
                                  treatment, data = example)
    result$PE[i]   <- 100 * exp(coef(nested)[["treatmentT"]])
    result[i, 3:4] <- 100 * exp(confint(nested, level = 0.90)["treatmentT", ])
    result[i, 6]   <- sum(is.na(coef(nested)))
  }
  if (result$model[i] == "simple") { # simple model (subjects are uniquely coded)
    simple         <- lm(log(Y) ~ period +
                                  subject +
                                  sequence +
                                  treatment, data = example)
    result$PE[i]   <- 100 * exp(coef(simple)[["treatmentT"]])
    result[i, 3:4] <- 100 * exp(confint(simple, level = 0.90)["treatmentT", ])
    result[i, 6]   <- sum(is.na(coef(simple)))
  }
  if (result$model[i] == "heretic") { # heretic model (without sequence)
    heretic        <- lm(log(Y) ~ period +
                                  subject +
                                  treatment, data = example)
    result$PE[i]   <- 100 * exp(coef(heretic)[["treatmentT"]])
    result[i, 3:4] <- 100 * exp(confint(heretic, level = 0.90)["treatmentT", ])
    result[i, 6]   <- sum(is.na(coef(heretic)))
  }
  # rounding acc. to guidelines
  if (round(result[i, 3], 2) >= 80 & round(result[i, 4], 2) <= 125)
    result$BE[i] <- "pass"
}
# cosmetics
result$PE     <- sprintf("%6.2f%%", result$PE)
result$lower  <- sprintf("%6.2f%%", result$lower)
result$upper  <- sprintf("%6.2f%%", result$upper)
names(result)[c(3:4, 6)] <- c("lower CL", "upper CL", "NE")
cat(txt); print(result, row.names = FALSE)
# nested model : period, subject(sequence), treatment 
# simple model : period, subject, sequence, treatment 
# heretic model: period, subject, treatment
# 
#    model      PE lower CL upper CL   BE NE
#   nested 102.82%   87.25%  121.17% pass 13
#   simple 102.82%   87.25%  121.17% pass  1
#  heretic 102.82%   87.25%  121.17% pass  0

As already outlined above, the nested model recommended in all [sic] guidelines is over-specified because subjects are uniquely coded. In the example we get 13 non-estimable (aliased) effects (in the output of R lines with NA, in SAS ., and in Win­Non­lin not estimable). Correct, because we are asking for something the data cannot provide.80 In the simple mod­el only one effect cannot be estimated. Even sequence can be removed from the model. I call it he­re­tic because regulators will grill you if you use it. It was the model proposed by Westlake20 43 and I used it in hundreds (‼) of my stud­ies. Note that the results of all models are exactly the same; if you don’t believe me, try it with one of your stud­ies.

    

A ‘Positive List’ was published by the German regulatory authority, i.e., for 90 drugs BE was not required.83 In order to comply with the European Note for Guidance of 200184 it had to be removed by the BfArM.

Two (of five) sessions of the BioInternational ’92 conference in Bad Homburg dealt with BE of Highly Variable Drugs.85 86 Vari­ous approaches were discussed: multiple dose instead of single dose studies, the metabolite instead of the parent compound, stable isotope tech­niques,87 add-on designs, and – for the first time – replicate designs.

Although the BioInternational 2 in Munich 1994 was – with over 600 participants – the largest in the series, no sub­stan­ti­al progress for HVD(P)s was achieved.88 Following a suggestion,89 widening the conventional acceptance limits of 80.00 – 125.00% was considered at a joint AAPS/FDA workshop in 1995.90

For some highly variable drugs and drug products, the bioequivalence standard should be modified by changing the BE limits while maintaining the current confidence interval at 90%. […] the bioequivalence limits should be determined based in part upon the intrasubject varia­bility for the reference product.
Shah et al. (1996)90

A hot topic ever since… Why have we been discussing it for 35 (‼) years (since the first Bio­Inter­national conference)? Is it really that com­pli­cated91 or are we too stupid?

Studies in steady state were proposed as an option for HVD(P)s in a European draft guideline92 but were removed from the final version of 2001.84

Validation of bioanalytical methods93 94 95 96 was partly covered in Australia and Canada. However, no specific guideline existed. A series of conferences (informally known as ‘Crys­tal City’) was initiated in 1990.97 Procedures stated in the conference report98 were discussed at the Bio­In­ter­na­tio­nal 2 in Munich 1994 and quickly adopted by bioanalytical sites. Updates were subsequently published.99 100

TODO: SUPAC (FDA)

top of section ↩︎ previous section ↩︎

21st century

    

After a wealth of – controversial – publications in the 1990s,72 73 74 101 102 103 104 105 106 107 108 109 the FDA introduced two new concepts as alternatives to average bio­equi­va­lence (ABE), namely population bioequivalence (PBE) and individual bio­equi­va­lence (IBE).110 ABE focuses only on the comparison of the po­pu­lation averages of the PK metrics of the formulations, not on their variances. It also does not assess a sub­ject-by-for­mu­lat­ion interaction variance, that is, the variation of the average \(\small{\text{T}}\) and \(\small{\text{R}}\) difference among individuals. In contrast, PBE and IBE include com­pa­ri­sons of both averages and variances of PK metrics. The PBE approach assesses the total variability of the PK metrics in the population. The IBE approach assesses the within-subject variability for the \(\small{\text{T}}\) and \(\small{\text{R}}\) formulations, as well as the sub­ject-by-formulation interaction.
Demonstrated PBE would support ‘Prescribability’ (i.e., a drug-naïve patient could start treatment with a generic), whereas IBE would support ‘Switchability’ (i.e., a patient could switch formulations during treatment).109 Contrary to ABE, both PBE and IBE require studies in a full replicate design, which means that both \(\small{\text{T}}\) and \(\small{\text{R}}\) are administered twice. The acceptance limits for ABE were kept at 80.00–125.00%, but for the others scaling to the variability of the reference was possible. That would have meant an incentive for test formulations with lower variability than the reference but a penalty for ones with higher variability.

However, the underlying statistical concepts were not trivial and the results practically incomprehensible for non-statisticians. Furthermore, both approaches had a discontinuity (when moving from constant- to reference-scaling), which led to an inflated Type I Error (patient’s risk) of approximately 6.5% for \(\small{CV_\text{wR}}\) of 18.1 – 20.2%.110 111 112
PBE and IBE faced criticism, e.g., »responses [to the guidance] were still doubt-filled as to whether the new bioequivalence criteria really provided added value compared to average bioequivalence«113 and were regarded a »‘theoretical’ solution to a ‘theoretical’ problem«,114 leading to their omission from a subsequent guid­ance115 and a return to conventional ABE.116

[ABE should suffice based upon grounds of] ‘practicality, plausibility, historical adequacy, and purpose’ and ‘because we have better things to do.’ […] ‘Statisticians have a bad track record in bioequivalence, […] the literature is full of ludicrous recommendations from statisticians, […] regulatory recommendations (of dubious validity) have been hastily implemented, and practical realities have been ignored’.
Stephen Senn (2000)117

I remember a Dutch regulator standing up in the BioInternational conference (London 2003) saying: »I’m glad that PBE and IBE are dead. I never understood them.«

    

Poland happily adopted Germany’s ‘Positive List’83 when it wanted to join the European Union, only to learn that in the mean­time Germany had abandoned it. Until 2015 a similar (but shorter) list existed in The Netherlands for national market authori­sa­tions only. It must have been a schizophrenic situation for assessors of the MEB: In the morning a dossier for a national MA with­out any in vivo comparison → approved. In the afternoon another dossier of the same product in the course of a European submission. BE performed, but lower 90% CI 79.99% → rejected. Bizarre.
Until 2012 Denmark required for NTIDs that the 90% CI had to include 100% (i.e., that there is no significant treatment effect). Bizarre as well. For details see Example 3 in this article.

The first bioanalytical method validation guidance was published by the FDA in 2001 and revised in 2018.118 119 Before the European draft guideline was published in 2009,120 some inspectors raised an eyebrow if sites worked according to the FDA’s guidance.

The validation of bioanalytical methods and the analysis of study samples should be per­form­ed in accordance with the principles of Good Laboratory Practice (GLP). However, as human bio­ana­ly­ti­cal studies fall outside of the scope of GLP, as defined in Directive 2004/10/EC, the sites con­duct­ing the human studies are not required to be monitored as part of a national GLP compliance programme.
EMEA (2009)120

Well roared, lions! My CRO had been GLP-certified since 1991, although we performed only phase I studies. In other countries (e.g., Spain), this was not possible. In Germany GLP is subject to state law. Hence, it was possible to get certified in one federal state but not in another… However, this ‘issue’ was resolved with the final guideline published in 2011121 and the ICH M10 guideline of 2022,122 superseding all local guidelines.

TODO: BCS-based biowaivers, reference-scaling, two-stage designs, NTIDs, current guidelines in various jurisdictions…

Still unresolved or not harmonized issues:

  1. Scaled ABE for HVD(P)s (RSABE123 or ABEL124 125 126);
    control of the type I error,127 agreement on which of the metrics can be scaled, outliers124 125
  2. Method for NTIDs (fixed narrower acceptance limits124 or reference-scaling123 128)
  3. Comparison of ‘early exposure’129 if clinically relevant? (\(\small{t_\text{max}}\) by a nonparametric method or first partial \(\small{AUC}\)); see also this article
  4. Cut-off times of partial \(\small{AUC}\textsf{s}\) (based on PD – like the FDA or PK – like the EMA?)
  5. Alternative surrogate for the rate of absorption (\(\small{C_\text{max}/AUC}\)31 32 33)?
  6. Reduce variability of \(\small{AUC}\)130 of HVDs by using \(\small{AUC/\hat{\lambda}_z}\)?
  7. Studies in fed state mandatory?
  8. Multiple dose studies of modified release products really131 necessary?
  9. Adaptive sequential two-stage designs (only exact or simulation-based as well?)
  10. Potency-correction if measured contents differ by more than 5% (arbitrary)
See also some of my presentations, a – somewhat outdated – collection of guidelines, and further readings on the topic.112 113 132 133 134 135 136 137 138 139 140 141
    

A word of warning: The textbooks dealing with statistics (marked with ★ in the references) are rather tough cookies and not recommended for beginners.

top of section ↩︎ previous section ↩︎

Acknowledgments

Henning Blume and José Augusto Guimarães Morais for discussions about the Bio­Inter­national conferences and early days of bioequivalence.

Licenses

CC BY 4.0 Helmut Schütz 2024
R GPL 3.0, klippy MIT, pandoc GPL 2.0.
1st version April 9, 2024. Rendered May 1, 2024 18:03 CEST by rmarkdown via pandoc in 0.09 seconds.

Footnotes and References


  1. Lindenbaum J, Preibisz JJ, Butler VP Jr., Saha JR. Variation in digoxin bioavailability: a continuing problem. J Chron Dis. 1973; 16: 749–54. Open Access Open Access.↩︎

  2. DeSante KA, DiSanto AR, Chodos DJ, Stoll RG. Antibiotic Batch Certification and Bioequivalence. JAMA. 1975; 232(13): 1349–51. doi:10.1001/jama.1975.03250130033016.↩︎

  3. International Council for Har­mo­ni­sa­tion of Techni­cal Require­ments for Pharmaceuticals for Human Use. Bioequivalence for Immediate-Release Solid Oral Dosage Forms. M13A. Draft version 20 December 2022. Online.↩︎

  4. Hall DG, In: Hearing Before the Subcommittee on Monopolies Select Committee on Small Business. U.S. Senate, Government Printing Office, Washington D.C. 1967: 258–81.↩︎

  5. Tyrer JH, Eadie MJ, Sutherland JM, Hooper WD. Outbreak of anticonvulsant intoxication in an Australian city. Br Med J. 1970; 4: 271–3. doi:10.1136/bmj.4.5730.271. Open Access Open Access.↩︎

  6. Bochner F, Hooper WD, Tyrer JH, Eadie MJ. Factors involved in an outbreak of phenytoin intoxications. J Neurol Sci. 1972; 16(4): 481–7. doi:10.1016/0022-510x(72)90053-6.↩︎

  7. Lund L. Clinical significance of generic inequivalence of three different pharmaceutical preparations of phenytoin. Eur J Clin Phar­ma­col. 1974; 7: 119–24. doi:10.1007/bf00561325.↩︎

  8. Lindenbaum J, Mellow MH, Blackstone MO, Butler VP. Variations in biological activity of digoxin from four preparations. N Engl J Med. 1971; 285(24): 1344–7. doi:10.1056/nejm197112092852403.↩︎

  9. Jounela AJ, Pentikäinen PJ, Sothmann. Effect of particle size on the bioavailability of digoxin. Eur J Clin Phar­ma­col. 1975; 8(5): 365–70. doi:10.1007/BF00562664.↩︎

  10. Richton-Hewett S, Foster E, Apstein CS. Medical and Economic Consequences of a Blinded Oral Anticoagulant Brand Change at a Municipal Hospital. Arch Intern Med. 1988; 148(4): 806–8. doi:10.1001/archinte.1988.00380040046010.↩︎

  11. Weinberger M, Hendeles L, Bighley L, Speer J. The Relation of Product Formulation to Absorption of Oral Theo­phyl­line. N Engl J Med. 1978; 299(16): 852–7. doi:10.1056/nejm197810192991603.↩︎

  12. Bielmann B, Levac TH, Langlois Y, L Tetreault L. Bioavailability of primidone in epileptic patients. Int J Clin Phar­ma­col. 1974; 9(2): 132–7. PMID 4208031↩︎

  13. Skelly JP, Knapp G. Biologic availability of digoxin tablets. JAMA. 1973; 224(2): 243. doi:10.1001/jama.1973.03220150051015.↩︎

  14. Skelly JP. A History of Biopharmaceutics in the Food and Drug Administration 1968–1993. AAPS J. 2010; 12(1): 44–50. doi:10.1208/s12248-009-9154-8. PMC Free Full Text Free Full Text.↩︎

  15. APhA Academy of Pharmaceutical Sciences. Guidelines for Biopharmaceutic Studies in Man. Washington D.C. February 1972.↩︎

  16. Schuirmann DJ. A comparison of the Two One-Sided Tests Procedure and the Power Approach for Assessing the Equivalence of Average Bioavailability. J Pharmacokin Bio­pharm. 1987; 15(6): 657–80. doi:10.1007/BF01068419.↩︎

  17. Metzler CM. Bioavailability – A Problem in Equivalence. Biometrics. 1974; 30(2): 309–17. PMID 4833140.↩︎

  18. Westlake WJ. Symmetrical Confidence Intervals for Bioequivalence Trials. Bio­metrics. 1976; 32(4): 741–4. PMID 1009222.↩︎

  19. Mantel N. Do We Want Confidence Intervals Symmetrical About the Null Value? Bio­metrics. 1977; 33: 759–60. [Letter to the Editor]↩︎

  20. Westlake WJ. Design and Evaluation of Bioequivalence Studies in Man. In: Blanchard J, Sawchuk RJ, Brodie BB, editors. Prin­cip­les and perspectives in Drug Bio­avail­abi­li­ty. Basel: Karger; 1979. ISBN 3-8055-2440-4. p. 192–210.↩︎

  21. Fieller EC. Some Problems In Interval Estimation. J Royal Stat Soc B. 1954; 16(2): 175–85. JSTOR:2984043.↩︎

  22. Locke CS. An Exact Confidence Interval from Untransformed Data for the Ratio of Two Formulation Means. J. Phar­ma­co­kin. Biopharm. 1984; 12(6): 649–55. doi:10.1007/bf01059558.↩︎

  23. Public Law 98-417. Sept. 24, 1984. Online.↩︎

  24. In phase III we try to demonstrate that verum performs ‘better’ than placebo, i.e., one-sided tests for non-inferiority (effect) and non-superiority (adverse reactions). Such studies are already large: Approving sta­tins and CO­VID-19 vaccines required ten thousands volunteers. Can you imagine how many it would need to detect a 20% difference between two treatments?↩︎

  25. Benet LZ. Why Do Bioequivalence Studies in Healthy Volunteers? 1st MENA Regulatory Conference on Bio­equi­va­lence, Bio­wai­vers, Bioanalysis and Dissolution. Amman. 23 September 2013.  Internet Archive.↩︎

  26. Office of the Federal Register. Code of Federal Regulations, Title 21, Part 320, Subpart A, § 320.23(a)(1) Online.↩︎

  27. This is an assumption, i.e., based on the labelled content instead of the measured potency.↩︎

  28. Yet another assumption. Incorrect for highly variable drugs and, thus, inflates the confidence interval.↩︎

  29. Tóthfálusi L, Endrényi L. Estimation of Cmax and Tmax in Populations After Single and Multiple Drug Ad­mi­ni­stra­tion. J Pharma­co­kin Pharma­codyn. 2003; 30(5): 363–85. doi:10.1023/b:jopa.0000008159.97748.09.↩︎

  30. In models with more than one compartment \(\small{t_\text{max}}\) and \(\small{C_\text{max}}\) cannot be analytically derived. In software numeric optimization is employed to locate the maximum of the function.↩︎

  31. Endrényi L, Fritsch S, Yan W. Cmax/AUC is a clearer measure than Cmax for absorption rates in investigations of bio­equi­va­lence. Int J Clin Pharmacol Ther Toxicol. 1991; 29(10): 394–9. PMID 1748540.↩︎

  32. Schall R, Luus HG. Comparison of absorption rates in bioequivalence studies of immediate release drug formulations. Int J Clin Phar­ma­col Ther To­xi­col. 1992; 30(5): 153–9. PMID 1592542.↩︎

  33. Endrényi L, Yan W. Variation of Cmax and Cmax/AUC in investigations of bio­equi­va­lence. Int J Clin Pharm Ther To­xi­col. 1993; 31(4): 184–9. PMID 8500920.↩︎

  34. Haynes JD. Statistical simulation study of new proposed uniformity requirement for bioequivalency studies. J Pharm Sci. 1981; 70(6): 673–5. doi:10.1002/jps.2600700625.↩︎

  35. Cabana BE. Assessment of 75/75 Rule: FDA Viewpoint. Pharm Sci. 1983; 72(1): 98–99. doi:10.1002/jps.2600720127.↩︎

  36. Haynes JD. FDA 75/75 Rule: A Response. Pharm Sci. 1983; 72: 99–100.↩︎

  37. Nitsche V, Mascher H, Schütz H. Comparative bioavailability of several phenytoin preparations marketed in Austria. Int J Clin Pharmacol Ther Toxicol. 1984; 22(2): 104–7. PMID 6698663.↩︎

  38. Klingler D, Nitsche V, Schmidbauer H. Hydantoin-Intoxikation nach Austausch schein­bar gleich­wertiger Di­phenyl­hy­dan­toin-Präparate. Wr Med Wschr. 1981; 131: 295–300. [German]↩︎

  39. Glazko AJ, Chang T, Bouhema J, Dill WA, Goulet JR, Buchanan RA. Metabolic disposition of diphenylhydantoin in normal human subjects following intravenous administration. Clin Pharmacol Ther. 1969; 10(4): 498–504. doi:10.1002/cpt1969104498.↩︎

  40. Bochner F, Hooper WD, Tyrer JH, Eadi MJ. Effect of dosage increments on blood pheny­toin concentrations. J Neu­rol Neuro­surg Psychiatr. 1972; 35(6): 873–6. doi:10.1136/jnnp.35.6.873.↩︎

  41. Kirkwood TBL. Bioequivalence Testing – A Need to Rethink [reader reaction]. Biometrics. 1981, 37: 589–91. doi:10.2307/2530573.↩︎

  42. Westlake WJ. Response to Bioequivalence Testing – A Need to Rethink [reader reaction response]. Biometrics. 1981, 37: 591–93.↩︎

  43. Westlake WJ. Bioavailability and Bioequivalence of Pharmaceutical Formulations. In: Pearce KE, editor. Bio­phar­ma­ceu­tical Statistics for Drug Development. New York: Marcel Dekker; 1988. p. 329–53. ISBN 0-8247-7798-0.↩︎

  44. Rodda BE, Davis RL. Determining the probability of an important difference in bio­availability. Clin Pharmacol Ther. 1980; 28: 247–52. doi:10.1038/clpt.1980.157.↩︎

  45. Mandallaz D, Mau J. Comparison of Different Methods for Decision-Making in Bio­equi­valence Assessment. Bio­me­trics. 1981; 37: 213–22. PMID 6895040.↩︎

  46. Fluehler H, Hirtz J, Moser HA. An Aid to Decision-Making in Bioequivalence Assessment. J Pharmacokin Bio­pharm. 1981; 9: 235–43. doi:10.1007/BF01068085.↩︎

  47. Selwyn MR, Hall NR. On Bayesian Methods for Bioequivalence. Biometrics. 1984; 40: 1103–8. PMID 6398710.↩︎

  48. Fluehler H, Grieve AP, Mandallaz D, Mau J, Moser HA. Bayesian Approach to Bio­equivalence Assessment: An Example. J Pharm Sci. 1983; 72(10): 1178–81. doi:10.1002/jps.2600721018.↩︎

  49. Anderson S, Hauck WW. A New Procedure for Testing Bioequivalence in Comparative Bioavailability and Other Clinical Trials. Commun Stat Ther Meth. 1983; 12(23): 2663–92. doi:10.1080/03610928308828634.↩︎

  50. Steinijans VW, Diletti E. Statistical Analysis of Bioavailability Studies: Parametric and Nonparametric Confidence Intervals. Eur J Clin Pharmacol. 1983; 24: 127–36. doi:10.1007/BF00613939.↩︎

  51. Steinijans VW, Diletti E. Generalization of Distribution-Free Confidence Intervals for Bioavailability Ratios. Eur J Clin Phar­ma­col. 1985; 28: 85–8. doi:10.1007/BF00635713.↩︎

  52. Steinijans VW, Schulz H-U, Beier W, Radtke HW. Once daily theophylline: multiple-dose comparison of an encapsulated micro-osmotic system (Euphylong) with a tablet (Uniphyllin). Int J Clin Pharm Ther Toxi­col. 1986; 24(8): 438–47. PMID 3759279.↩︎

  53. Steinijans VW. Pharmacokinetic Characteristics of Controlled Release Products and Their Biostatistical Analysis. In: Gundert-Remy U, Möller H, editors. Oral Controlled Release Products – Therapeutic and Biopharmaceutic Assess­ment. Stutt­gart: Wis­sen­schaftliche Verlagsanstalt; 1988, p. 99–115.↩︎

  54. Blume H, Siewert M, Steinijans V. Bioäquivalenz von per os applizierten Retard-Arzneimitteln; Konzeption der Stu­dien und Ent­scheidung über Austauschbarkeit. Pharm Ind. 1989; 51: 1025–33. [German]↩︎

  55. Wijnand HP, Timmer CJ. Mini-computer programs for bioequivalence testing of pharmaceutical drug formulations in two-way cross-over studies. Comput Programs Bio­med. 1983; 17(1–2): 73–88. doi:10.1016/0010-468x(83)90027-2.↩︎

  56. Where did it come from? Two stories:
    Les Benet recounted that there was a poll at the FDA and – essentially based on gut feeling – the 20% saw the light of day.
    I’ve heard another one, which I like more. Wilfred J. Westlake, one of the pioneers of BE, was a statistician at SKF. During a coffee and cigarette break (everybody was smoking in the 1970s) he asked his colleagues in the clinical pharmacology department »Which difference in blood concentrations do you consider relevant?« Yep, the 20% were born.↩︎

  57. Rheinstein P. Report by the Bioequivalence Task Force on Recommendations from the Bioequivalence Hearing conducted by the Food and Drug Administration. September 29 – October 1, 1986. January 1988.↩︎

  58. APV. Richtlinie und Kommentar. Pharm Ind. 1985; 47(6): 627–32. [German]↩︎

  59. Arbeitsgemeinschaft Pharmazeutische Verfahrenstechnik (APV). International Symposium. Bioavail­abi­lity/Bio­equi­va­lence, Pharmaceutical Equivalence and The­ra­peu­tic Equivalence. Würzburg. 9–11 February, 1987.↩︎

  60. Junginger H. APV-Richtlinie – »Untersuchungen zur Bioverfügbarkeit, Bioäquivalenz« Pharm Ztg. 1987; 132: 1952–55. [German]↩︎

  61. Junginger H. Studies on Bioavailability and Bioequivalence – APV Guideline. Drugs Made in Germany. 1987; 30: 161–6.↩︎

  62. Blume H, Kübel-Thiel K, Reutter B, Siewert M, Stenzhorn G. Nifedipin: Monographie zur Prüfung der Bio­ver­füg­bar­keit / Bio­äqui­va­lenz von schnell-freisetzenden Zubereitungen (1). Pharm Ztg. 1988; 133(6): 398–93. [German]↩︎

  63. TGA. Guidelines for Bioavailability and Bioequivalency Studies. Draft C06:6723c (29/11/88).↩︎

  64. Blume H, Mutschler E. Bioäquivalenz – Qualitätsbewertung wirkstoffgleicher Fertigarzneimittel: An­lei­tung-Me­tho­den-Ma­te­ri­a­lien. Frank­furt/Main: Govi-Ver­lag; 1989. [German]↩︎

  65. McGilveray IJ, Midha KK, Skelly JP, Dighe S, Doluisio JT, French IW, Karim A, Burford R. Consensus Report from “Bio International ’89”: Issues in the Evaluation of Bioavailability Data. J Pharm Sci. 1990; 79(10): 945–6. doi:10.1002/jps.2600791022.↩︎

  66. Keene ON. The log transformation is special. Stat Med. 1995; 14(8): 811–9. doi:10.1002/sim.4780140810. Open Access.↩︎

  67. Diletti E, Hauschke D, Steinijans VW. Sample size determination for bioequivalence assessment by means of confidence intervals. Int J Clin Pharm Ther Toxicol. 1991; 29(1): 1–8. PMID 2004861.↩︎

  68. Diletti E, Hauschke D, Steinijans VW. Sample size determination: Extended tables for the multiplicative model and bioequivalence ranges of 0.9 to 1.11 and 0.7 to 1.43. Int J Clin Pharm Ther Toxicol. 1992; 30(Suppl.1): S59–62. PMID 1601533.↩︎

  69. Hauschke D, Steinijans VW, Diletti E. A distribution-free procedure for the statistical analysis of bioequivalence studies. Int J Clin Pharm Ther Toxicol. 1990; 28(2): 72–8.↩︎

  70. Steinijans VW, Hauschke D. Update on the statistical analysis of bioequivalence studies. Int J Clin Pharm Ther To­xi­col. 1990; 28(3): 105–10. PMID 2318545.↩︎

  71. Steinijans VW, Hartmann M, Huber R, Radtke HW. Lack of pharmacokinetic interaction as an equivalence problem. Int J Clin Pharm Ther To­xi­col. 1991; 29(8): 323–8. PMID 1835963.↩︎

  72. Anderson S, Hauck WW. Consideration of individual bioequivalence. J Pharmacokinet Biopharm. 1990; 18(3): 259–73. doi:10.1007/bf01062202.↩︎

  73. Schall R, Luus HG. On population and individual bioequivalence. Stat Med. 1993; 12(12): 1109–24. doi:10.1002/sim.4780121202.↩︎

  74. Schall R. A unified view of individual, population, and average bioequivalence. In: Blume HH, Midha KK, editors. Bio-Inter­na­tio­nal 2. Bioavailability, Bioequivalence and Pharmacokinetic Studies. Stuttgart: med­pharm; 1995: 91–106.↩︎

  75. Chow S-C, Liu J-p. Design and Analysis of Bioavailability and Bioequivalence Studies. New York: Marcel Dekker; 1992. ISBN 0-8247-8682-3. ★↩︎

  76. CPMP Working Party. Investigation of Bioavailability and Bioequivalence: Note for Guidance. III/54/89-EN, 8th Draft. June 1990.↩︎

  77. Commission of the European Community. Investigation of Bioavailability and Bioequivalence. Brussels. December 1991. Online.↩︎

  78. Amidon GL, Lennernäs H, Shah VP, Crison JR. A Theoretical Basis for a Biopharmaceutic Drug Classification: The Correlation of in Vitro Drug Product Dissolution and in Vivo Bioavailability. Pharm Res. 1995; 12(3): 413–20. doi:10.1023/a:1016212804288. Open Access.↩︎

  79. FDA, CDER. Guidance for Industry. Statistical Procedures for Bioequivalence Studies using a Standard Two-Treatment Crossover Design. Rockville. July 1992. Internet Archive.↩︎

  80. If Subject 1 is randomized to sequence \(\small{\text{TR}}\), there is not ‘another’ Subject 1 randomized to sequence \(\small{\text{RT}}\). Ran­dom­iza­tion is not like Schrödinger’s cat. Hence, the nested term in the guidelines is an insult to the mind.↩︎

  81. Health Canada, HPFB. Guidance for Industry. Conduct and Analysis of Bioavailability and Bioequivalence Studies – Part A: Oral Dosage Formulations Used for Systemic Effects. Ottawa. 1992. Online.↩︎

  82. WHO Marketing Authorization of Pharmaceutical Products with Special Reference to Multisource (Generic) Pro­ducts: A Manual for Drug Regulatory Authorities. Geneva. 1998. Internet Archive.↩︎

  83. Gleiter CH, Klotz U, Kuhlmann J, Blume H, Stanislaus F, Harder S, Paulus H, Poethko-Müller C, Holz-Slomczyk M. When Are Bioavailability Studies Required? A German Proposal. J Clin Pharmacol. 1998; 38: 904–11. doi:10.1002/j.1552-4604.1998.tb04385.x. Open Access.↩︎

  84. EMEA, CPMP. Note for Guidance on the Investigation of Bioavailability and Bio­equi­va­lence. London. 26 July 2001. Online.↩︎

  85. Midha KK, Blume HH, editors. Bio-International. Bioavailability, Bio­equi­va­lence and Pharmacokinetics. Stutt­gart: med­pharm; 1993. ISBN 3-88763-019-X.↩︎

  86. Blume HH, Midha KK. Bio-International 92, Conference on Bioavailability, Bioequivalence, and Pharmacokinetic Studies. J Pharm Sci. 1993; 82(11): 1186–9. doi:10.1002/jps.2600821125.↩︎

  87. Simultaneous administration of a stable isotope labelled IV dose would allow calculation of the true clearance in each period. Then it would no longer be necessary to assume identical clearances in \(\small{(3)}\), and the problem of highly variable drugs (inflating the CI) could be avoided. However, it would require that the IV formulation is manufactured according to the rules of cGMP and differs from the internal standard in MS, which is generally not feasible. Such an approach is only mentioned in Japanese guidelines.↩︎

  88. Blume HH, Midha KK, editors. Bio-International 2. Bioavailability, Bioequivalence and Pharmacokinetic Studies. Stutt­gart: med­pharm; 1995.↩︎

  89. Boddy AW, Snikeris FC, Kringle RO, Wei GCG, Opperman JA, Midha KK. An approach for widening the bio­equi­va­lence acceptance limits in the case of highly variable drugs. Pharm Res. 1995; 12(12): 1865–8. doi:10.1023/a:1016219317744.↩︎

  90. Shah VP, Yacobi A, Barr WH, Benet LZ, Breimer D, Dobrinska MR, Endrényi L, Fairweather W, Gillespie W, Gonzalez MA, Hooper J, Jackson A, Lesko LL, Midha KK, Noonan PK, Patnaik R, Williams RL. Workshop Report. Evaluation of Orally Ad­mi­nis­tered Highly Variable Drugs and Drug Formulations. Pharm Res. 1996; 13(11): 1590–4. doi:10.1023/a:1016468018478.↩︎

  91. Schütz H, Labes D, Wolfsegger MJ. Critical Remarks on Reference-Scaled Average Bioequivalence. J Pharm Pharmaceut Sci. 2022; 25: 285–96. doi:10.18433/jpps32892.↩︎

  92. EMEA Human Medicines Evaluation Unit / CPMP. Note for Guidance on the Investigation of Bioavailability and Bio­equi­va­lence. Draft. London. 17 December 1998.↩︎

  93. Brooks MA, Weinfeld RE. A Validation Process for Data from the Analysis of Drugs in Biological Fluids. Drug Devel Ind Pharm. 1985; 11: 1703–28.↩︎

  94. Pachla LA, Wright DS, Reynolds DL. Bioanalytical Considerations for Pharmacokinetic and Biopharmaceutic Studies. J Clin Phar­ma­col. 1986; 26(5): 332–5. doi:10.1002/j.1552-4604.1986.tb03534.x.↩︎

  95. Buick AR, Doig MV, Jeal SC, Land GS, McDowall RD. Method Validation in the Bioanalytical Laboratory. J Pharm Biomed Anal. 1990; 8(8–12): 629–37. doi:10.1016/0731-7085(90)80093-5. Open Access.↩︎

  96. Karnes HT, Shiu G, Shah VP. Validation of Bioanalytical Methods. Pharm Res. 1991; 8(4): 421–6. doi:10.1023/a:1015882607690.↩︎

  97. AAPS, FDA, FIP, HPB, AOAC. Analytical Methods Validation: Bioavailability, Bioequivalence and Pharma­co­ki­netic Studies. Arlington, VA. December 3–5, 1990.↩︎

  98. Shah VP, Midha KK, Dighe S, McGilveray IJ, Skelly JP, Yacobi A, Layloff T, Viswanathan CT, Cook CE, McDowall RD, Pittman, Spector S. Analytical methods validation: Bioavailability, bioequivalence and pharmacokinetic studies. Eur J Drug Metab Pharmacokinet. 1991; 16(4): 249–55. doi:10.1007/bf03189968.↩︎

  99. Shah VP, Midha KK, Findlay JWA, Hill HM, Hulse JD, McGilveray IJ, McKay G, Miller KJ, Patnaik RN, Powell ML, Tonelli A, Viswanathan CT, Yacobi A. Bioanalytical Method Validation – A Revisit with a Decade of Progress. Pharm Res. 2000; 17: 1551–7. doi:10.1023/a:1007669411738.↩︎

  100. Viswanathan CT, Bansal S, Booth B, DeStefano AJ, Rose MJ, Sailstad J, Shah VP, Skelly JP, Swann PG, Weiner R. Workshop / Conference Report – Quantitative Bioanalytical Methods Validation and Implementation: Best Practices for Chromatographic and Ligand Binding Assays. Pharm Res. 2007; 24(10): 1962–73. doi:10.1007/s11095-007-9291-7.↩︎

  101. Anderson S. Individual Bioequivalence: A problem of Switchability. Biopharm Rep. 1993; 2(2): 1–11.↩︎

  102. Endrényi L, Schulz M. Individual Variation and the Acceptance of Average Bioequivalence. Drug Inform J. 1993; 27(1): 195–201. doi:10.1177/009286159302700135.↩︎

  103. Endrényi L. A method for the evaluation of individual bioequivalence. Int J Clin Pharmacol. 1994; 32(9): 497–508. PMID 7820334.↩︎

  104. Esinhart JD, Chinchilli VM. Extension to use of tolerance intervals for the assessment of individual bioequivalence. J Biopharm Stat. 1994; 4: 39–52. doi:10.1080/10543409408835071.↩︎

  105. Chow S-C, Liu J-p. Current issues in bioequivalence trials. Drug Inform J. 1995; 29: 795–804. doi:10.1177/009286159502900302.↩︎

  106. Chen ML. Individual bioequivalence. A regulatory update. J Biopharm Stat. 1997; 7(1): 5–11. doi:10.1080/10543409708835162.↩︎

  107. Hauck WW, Anderson S. Commentary on individual bioequivalence by ML Chen. J Biopharm Stat. 1997; 7(1): 13–6. doi:10.1080/10543409708835163.↩︎

  108. Liu J-p, Chow S-C. Some thoughts on individual bioequivalence. J Biopharm Stat. 1997; 7(1): 41–8. doi:10.1080/10543409708835168.↩︎

  109. Midha KK, Rawson MJ, Hubbard JW. Prescribability and switchability of highly variable drugs and drug products. J Contr Rel. 1999; 62(1–2): 33–40. doi:10.1016/s0168-3659(99)00050-4.↩︎

  110. FDA, CDER. Guidance for Industry. Statistical Approaches to Establishing Bio­equi­va­lence. Rockville. Jan 2001. Download.↩︎

  111. Chow S-C, Shao J, Wang H. Individual bioequivalence testing under 2 × 3 designs. Stat Med. 2002; 21(5): 629–48. doi:10.1002/sim.1056.↩︎

  112. Chow S-C, Liu J-p. Design and Analysis of Bioavailability and Bioequivalence Studies. Boca Raton: Chapman & Hall/CRC Press; 3rd edition 2009. ISBN 978-1-58488-668-6. ★ p. 596–8.↩︎

  113. Hauschke D, Steinijans VW, Pigeot I. Bioequivalence Studies in Drug Development. Methods and Applications. Chichester: Wiley; 2007. ISBN 0-470-09475-3. ★ p. 209.↩︎

  114. Patterson S. A Review of the Development of Biostatistical Design and Analysis Techniques for Assessing In Vivo Bioequivalence: Part Two. Ind J Pharm Sci. 2001; 63(3): 169–86. Open Access.↩︎

  115. FDA, CDER. Guidance for Industry. Bioavailability and Bioequivalence Studies for Orally Administered Drug Products — General Considerations. Rockville. March 2003. Internet Archive.↩︎

  116. Schall R, Endrényi L. Bioequivalence: tried and tested. Cardiovasc J Afr. 2010; 21(2): 69–70. PMCID 3721767. Free Full Text.↩︎

  117. Senn S. Conference Proceedings: Challenging Statistical Issues in Clinical Trials. Decisions and Bioequivalence. 2000.↩︎

  118. FDA, CDER, CVM. Guidance for Industry. Bioanalytical Method Validation. Rockville. May 2001. Internet Archive.↩︎

  119. FDA, CDER, CVM. Guidance for Industry. Bioanalytical Method Validation. Silver Spring. May 2018. Download.↩︎

  120. EMEA, CHMP. Guideline on Validation of Bioanalytical Methods. Draft. London. 19 November 2009. Online.↩︎

  121. EMA, CHMP. Guideline on Validation of Bioanalytical Methods. London. 21 July 2011. Online.↩︎

  122. ICH. Bioanalytical Method Validation and Study Sample Analysis. M10. 22 May 2022. Online.↩︎

  123. FDA, CDER. Guidance for Industry. Bioequivalence Studies With Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA. Draft. Silver Spring. August 2021. Download.↩︎

  124. EMEA, CHMP. Guideline on the Investigation of Bioequivalence. London. 20 January 2010. Online.↩︎

  125. Health Canada. Guidance Document. Comparative Bioavailability Standards: Formulations Used for Sys­temic Effects. Ottawa. 2018/06/08. Online.↩︎

  126. WHO/PQT: medicines. Application of reference-scaled criteria for AUC in bioequivalence studies conducted for sub­mis­sion to PQT/MED. Geneva. 02 July 2021. Online.↩︎

  127. Schütz H. Highly Variable Drugs and Type I Error. Presentation at: 6th International Workshop – GBHI 2024. Rockville, MD. 16 April 2024. Online.↩︎

  128. Paixão P, García Arieta A, Silva N, Petric Z, Bonelli M, Morais JAG, Blake K, Gouveia LF. A Two-Way Proposal for the Determination of Bioequivalence for Narrow Therapeutic Index Drugs in the European Union. Pharmaceutics. 2024; 16(5): 598. doi:10.3390/pharmaceutics16050598. Open Access.↩︎

  129. Hofmann J. Bioequivalence of early exposure: tmax & pAUC. Presentation at: BioBridges. Prague. 21 September 2023. Online.↩︎

  130. Abdallah HY. An area correction method to reduce intrasubject variability in bioequivalence studies. J Pharm Pharmaceut Sci. 1998; 1(2): 60–5. Open Access.↩︎

  131. Paixão P, Gouveia LF, Morais JAG. An alternative single dose parameter to avoid the need for steady-state studies on oral ex­tend­ed-release drug products. Eur J Phar­ma­ceut Bio­phar­ma­ceut. 2012; 80(2): 410–7. doi:10.1016/j.ejpb.2011.11.001.↩︎

  132. Senn S. Cross-over Trials in Clinical Research. Chichester: Wiley; 2nd edition 2002. ISBN 0-471-49653-7. ★↩︎

  133. Wellek S. Testing Statistical Hypotheses of Equivalence. Boca Raton: Chapman & Hall/CRC; 2003. ISBN 978-1-5848-8160-5. ★↩︎

  134. Amidon G, Lesko L, Midha K, Shah V, Hilfinger J. International Bioequivalence Standards: A New Era. Ann Arbor: TSRL; 2006. ISBN 0-9790119-0-6.↩︎

  135. Kanfer I, Shargel L, editors. Generic Product Development. International Regulatory Requirements for Bio­equi­va­lence. New York: informa healthcare; 2010. ISBN 978-0-8493-7785-3.↩︎

  136. Bolton S, Bon C. Pharmaceutical Statistics. Practical and Clinical Applications. New York: informa healthcare; 5th edition 2010. ISBN 978-1-4200-7422-2. ★↩︎

  137. Davit B, Braddy AC, Conner DP, Yu LX. International Guidelines for Bioequivalence of Systemically Available Orally Administered Generic Drug Products: A Survey of Similarities and Differences. AAPS J. 2013; 15(4): 974–90. doi:10.1208/s12248-013-9499-x. Free Full Text.↩︎

  138. Yu LX, Li BV, editors. FDA Bioequivalence Standards. New York: Springer; 2014. ISBN 978-1-4939-1251-0.↩︎

  139. Jones B, Kenward MG. Design and Analysis of Cross-Over Trials. Boca Raton: CRC Press. 3rd edition 2015. ISBN 978-1-4398-6142-4. ★↩︎

  140. Kanfer I, editor. Bioequivalence Requirements in Various Global Jurisdictions. New York: Springer; 2017. ISBN 978-3-319-88542-1.↩︎

  141. Patterson S, Jones B. Bioequivalence and Statistics in Clinical Pharmacology. Boca Raton: CRC Press; 2nd edition 2019. ISBN 978-0-3677-8244-3. ★↩︎