If this article is perceived as overly focused on statistics, I apologize. This is due to my professional background, which has led me to be less skilled at crafting engaging narratives.
‘Bioavailability’ (a portmanteau of ‘biologic availability’) in its current meaning was coined in 19731 and ‘Bioequivalence’ saw the light of day in 1975.2
The MeSH term ‘Biological Availability’ was introduced in 1979. The site of action (i.e., a receptor) is inaccessible. There should be no space for beliefs in science. The best definition of bioequivalence is given by the ICH.3
“Two drug products containing the same drug substance(s) are considered bioequivalent if their relative bioavailability (BA) (rate and extent of drug absorption) after administration in the same molar dose lies within acceptable predefined limits. These limits are set to ensure comparable in vivo performance, i.e., similarity in terms of safety and efficacy.”
We will use a simple example in the following: a two-treatment two-sequence two-period (2×2×2) crossover design, where subjects 1–6 were in sequence \(\small{\text{TR}}\) and subjects 7–12 in sequence \(\small{\text{RT}}\). \[\small{\begin{array}{ccc} \textsf{Table I}\phantom{0}\\ \text{subject} & \text{T} & \text{R}\\\hline \phantom{1}1 & 71 & 81\\ \phantom{1}2 & 61 & 65\\ \phantom{1}3 & 80 & 94\\ \phantom{1}4 & 66 & 74\\ \phantom{1}5 & 94 & 54\\ \phantom{1}6 & 97 & 63\\ \phantom{1}7 & 70 & 85\\ \phantom{1}8 & 76 & 90\\ \phantom{1}9 & 53 & 54\\ 10 & 99 & 56\\ 11 & 83 & 90\\ 12 & 51 & 68\\\hline \end{array}}\]
Problems were reported with formulations of Narrow Therapeutic Index Drugs (NTIDs) like phenytoin,4 5 6 7 digoxin,1 8 9 warfarin,10 theophylline,11 primidone.12 Some show nonlinear pharmacokinetics (phenytoin) or are auto-inducers (warfarin).
Generic drugs in the current sense did not yet exist at that time; only the content had to meet the USP requirements.
“Although in 1969 Professor John Wagner demonstrated to the Bureau of Medicine, methods for comparing areas under the serum versus time curve (AUC) to estimate bioequivalence, his approach was ignored inasmuch as the FDA hierarchy did not believe a problem existed, and therefore such studies would not be necessary. For their part the Offices of Pharmaceutical Research and Compliance in the Bureau of Medicine and the Commissioner’s Office believed that the “Bioavailability Problem” as some called it was a “Content Uniformity Problem”.13 In 1971 for example, when notified of a “Bioavailability Problem” with a generic digoxin product, FDA investigated and ascertained that one manufacturer first added all the excipients into a 55-gal drum, then added digoxin, closed the lid, and mixed it by rolling the drum across the floor a few times. The content uniformity of those tablets varied from 10% to 156%.”
Following a ‘Conference on Bioavailability of Drugs’ held at the National Academy of Sciences of the United States in 1971, a guideline was published the following year.15
“[…] the mean of AUC of the generic had to be within 20% of the mean AUC of the approved product. At first this was determined by using serum versus time plots on specially weighted paper, cutting the plot out and then weighing each separately.”
The FDA’s
80/20 Rule or ‘Power Approach’ (at least 80% power to detect a 20%
difference) of 1972 consisted of testing the hypothesis of no difference
at the \(\small{\alpha=0.05}\) level of
significance.14 16 \[H_0:\;\mu_\text{T}-\mu_\text{R}=0\;vs\;H_1:\;\mu_\text{T}-\mu_\text{R}\neq
0,\tag{1}\] where \(\small{H_0}\) is the null
hypothesis of equivalence and \(\small{H_1}\) the alternative
hypothesis of inequivalence. \(\small{\mu_\text{T}}\) and \(\small{\mu_\text{R}}\) are the (true) means
of \(\small{\text{T}}\) and \(\small{\text{R}}\), respectively. In order
to pass the test, the estimated (post hoc, a
posteriori, retrospective) power had to be at least 80%. The power
depends on the true value of \(\small{\sigma}\), which is unknown. There
exists a value of \(\small{\sigma_{\,0.80}}\) such that if
\(\small{\sigma\leq\sigma_{\,0.80}}\),
the power of the test of no difference \(\small{H_0}\) is greater or equal to 0.80.
Since \(\small{\sigma}\) is unknown, it
has to be approximated by the sample standard deviation \(\small{s}\). The Power Approach in a simple
2×2×2 crossover design then consists of failing to reject \(\small{H_0}\) and concluding that \({\small{\mu_\text{T}}}\) and \({\small{\mu_\text{R}}}\) are equivalent if
\[-t_{1-\alpha/2,\nu}\leq\frac{\bar{x}_\text{T}-\bar{x}_\text{R}}{s\sqrt{\tfrac{1}{2}\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right)}}\leq
t_{1-\alpha/2,\nu}\:\text{and}\:s\leq\sigma_{0.80},\tag{2}\]
where \(\small{n_1,\,n_2}\) are the
number of subjects in sequences 1 and 2, the degrees of freedom \(\small{\nu=n_1+n_2-2}\), and \(\small{\bar{x}_\text{T}\,,\bar{x}_\text{R}}\)
are the means of \(\small{\text{T}}\)
and \(\small{\text{R}}\),
respectively.
Note that this procedure is based on estimated power \(\small{\widehat{\pi}}\), since the
true power is a function of the unknown \(\small{\sigma}\). It was the only approach
based on post hoc power and
was never implemented in any other jurisdiction.
For the example we estimate a power of only 47.2% to detect a 20% difference and the study would fail.
First proposals by the biostatistical community were published.17 18 19 20
The analysis was performed on untransformed data (i.e., by an additive model assuming normally distributed data) and bioequivalence was concluded if the 95% confidence interval (CI) of the point estimate (PE) was entirely within 80 – 120%.
We get for our example in R:
example <- data.frame(subject   = rep(1:12, each = 2),
                      sequence  = c(rep("TR", 12), rep("RT", 12)),
                      treatment = c(rep(c("T", "R"), 6),
                                    rep(c("R", "T"), 6)),
                      period    = rep(1:2, 12),
                      Y         = c(71, 81, 61, 65, 80, 94,
                                    66, 74, 94, 54, 97, 63,
                                    85, 70, 90, 76, 54, 53,
                                    56, 99, 90, 83, 68, 51))
factors          <- c("subject", "period", "treatment")
example[factors] <- lapply(example[factors], factor) # factorize the data
# additive model (untransformed data, differences); sequence not in the model!
muddle <- lm(Y ~ subject + period + treatment, data = example)
CI     <- as.numeric(confint(muddle, level = 0.95)["treatmentT", ])
PE     <- coef(muddle)[["treatmentT"]]
# Percentages (flawed!)
mean.T <- mean(example$Y[example$treatment == "T"])
mean.R <- mean(example$Y[example$treatment == "R"])
PE.pct <- 100 * mean.T / mean.R
CI.pct <- 100 * (CI + mean.R) / mean.R
result <- data.frame(method = c("differences", "percentages"),
                     PE     = c(sprintf("%+.3f", PE),
                                sprintf("%6.2f%%", PE.pct)),
                     lower  = c(sprintf("%+.3f", CI[1]),
                                sprintf("%.2f%%", CI.pct[1])),
                     upper  = c(sprintf("%+.3f", CI[2]),
                                sprintf("%6.2f%%", CI.pct[2])),
                     BE     = c("", "fail"))
if (CI.pct[1] >= 80 & CI.pct[2] <= 120) result$BE[2] <- "pass"
names(result)[3:4] <- c("lower CL", "upper CL")
print(result, row.names = FALSE)
#      method      PE lower CL upper CL   BE
# differences  +2.250  -12.807  +17.307
# percentages 103.09%   82.42%  123.76% fail
If data are analyzed by an additive model the results are differences. It is a fundamental error to naïvely transform differences to percentages – it would require Fieller’s CI.21 22 However, this was not done back in the day. We get a 95% CI of 82.42 – 123.76%, and the study would fail because the upper confidence limit (CL) is > 120%.
Westlake18 mused that the shortest CI – which is symmetrical about the PE – would be too difficult for non-statisticians to comprehend. He suggested splitting the t-values in such a way that the probabilities of the two tails sum to \(\small{\alpha}\) and the respective CI is symmetrical around 0 (or 100%). In the example we obtain ±21.48%, and the study would fail as well because the confidence limits are > ±20%. As above, calculating a percentage is flawed.
However, such a result is misleading. The information about the location of the difference is lost; one can no longer know whether the BA of \(\small{\text{T}}\) is lower or higher than that of \(\small{\text{R}}\). Therefore, the method was criticized19 and never implemented in practice. It took me years to convince Certara to remove Westlake’s CI from the results in Phoenix WinNonlin. In 2016, I was successful with version 6.4… Since then the differences are given in the additive model.
The generic boom started in 1984 in the U.S. with the ‘Drug Price Competition and Patent Term Restoration Act’ (informally known as the ‘Hatch-Waxman Act’).23
The approval process was different for innovator (originator) and generic companies.
Innovators:
Generic companies:
Regulatory concerns about generic substitution arose, leading to extensive discussions about which method could be used to compare formulations.
There was an early agreement that pharmaceutical equivalence is too permissive and therapeutic equivalence would require extremely large studies in patients.24 Hence, comparing the bioavailability (BA) in healthy volunteers seemed to be a reasonable compromise.17
“What is the justification for studying bioequivalence in healthy volunteers?
“Variability is the enemy of therapeutics” and is also the enemy of bioequivalence. We are trying to determine if two dosage forms of the same drug behave similarly. Therefore we want to keep any other variability not due to the dosage forms at a minimum. We choose the least variable “test tube”, that is, a healthy volunteer.
Disease states can definitely change bioavailability, but we are testing for bioequivalence, not bioavailability.
Whereas in pharmacokinetics (PK) ‘bioavailability’ exclusively denotes the area under the curve extrapolated to infinite time (\(\small{AUC_{0-\infty}}\)), the FDA introduced two new terms, namely
The former is understood as a surrogate for the absorption rate \(\small{k\,_\text{a}}\) in a PK model. I prefer – like the ICH3 and the FDA since 200326 – rate and extent of absorption, in order not to contaminate the original meaning of BA in PK.
Let us consider the basic equation of pharmacokinetics \[\frac{f\cdot D}{CL}=\frac{f\cdot D}{V\cdot k_\text{el}}=AUC_{0-\infty}=\int_{0}^{\infty}C(t)\,dt,\tag{3}\] where \(\small{f}\) is the fraction absorbed (we are interested in the comparison of formulations), \(\small{D}\) is the dose, \(\small{CL}\) is the clearance, \(\small{V}\) is the apparent volume of distribution, \(\small{k\,_\text{el}}\) is the elimination rate constant, and \(\small{C(t)}\) is the plasma concentration over time. We see immediately that for identical27 doses and invariant28 \(\small{CL}\), \(\small{V}\), \(\small{k\,_\text{el}}\) (which are drug-specific), comparing the \(\small{AUC}\text{s}\) allows comparing the fractions absorbed.
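As a quick numerical check of \(\small{(3)}\) – with hypothetical values for \(\small{f}\), \(\small{D}\), \(\small{V}\), \(\small{k\,_\text{a}}\), and \(\small{k\,_\text{el}}\) chosen only for illustration – integrating a one-compartment profile recovers \(\small{f\cdot D/CL}\):

```r
# Numerical check of (3) for a one-compartment model with first-order
# absorption; f, D, V, ka, kel are hypothetical illustration values
f   <- 0.8 # fraction absorbed
D   <- 100 # dose
V   <- 20  # apparent volume of distribution
kel <- 0.1 # elimination rate constant
ka  <- 1   # absorption rate constant
CL  <- V * kel
C   <- function(t) { # concentration-time profile
  f * D * ka / (V * (ka - kel)) * (exp(-kel * t) - exp(-ka * t))
}
AUC <- integrate(C, lower = 0, upper = Inf)$value
c(AUC = AUC, f.D.CL = f * D / CL) # both equal 40
```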
“Pharmacokinetics: one of the magic arts of divination whereby needles are stuck into dummies in an attempt to predict profits.
It must be mentioned that \(\small{C_\text{max}}\) is not sensitive to
even substantial changes in the rate of absorption \(\small{k\,_\text{a}}\), since it is a
composite metric.29 In a one compartment model it depends on
\(\small{k\,_\text{a}}\), \(\small{f}\) and both the elimination
rate constant \(\small{k\,_\text{el}}\)
and \(\small{V}\) (or \(\small{CL}\) if you belong to the other
church).30 Whereas \(\small{k\,_\text{a}}\) and \(\small{f}\) are properties of the formulation – which we are interested in – the others are properties of the drug. \[\eqalign{
t_\textrm{max}&=\frac{\log_{e}(k\,_\text{a}/k\,_\text{el})}{k\,_\text{a}-k\,_\text{el}}\\
C_\textrm{max}&=\frac{f\cdot D\cdot k\,_\text{a}}{V\cdot
(k\,_\text{a}-k\,_\text{el})}\large(\small\exp(-k\,_\text{el}\cdot
t_\textrm{max})-\exp(-k\,_\text{a}\cdot
t_\textrm{max})\large)\tag{4}}\] Therefore, when using it
as a surrogate for the absorption rate one must keep in mind that
formulations with different fractions absorbed and \(\small{t_\text{max}}\) might show the same
\(\small{C_\text{max}}\).
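This insensitivity can be sketched with \(\small{(4)}\) and hypothetical parameters: a four-fold slower \(\small{k\,_\text{a}}\) combined with a larger \(\small{f}\) gives exactly the same \(\small{C_\text{max}}\) (the matching \(\small{f}\) is solved numerically with uniroot()).

```r
# Hypothetical illustration of (4): a four-fold difference in ka and a
# different f give the same Cmax (drug-specific D, V, kel identical)
D <- 100; V <- 20; kel <- 0.1
cmax <- function(f, ka) {  # Cmax of (4), one-compartment model
  tmax <- log(ka / kel) / (ka - kel)
  f * D * ka / (V * (ka - kel)) * (exp(-kel * tmax) - exp(-ka * tmax))
}
f.A <- 0.68; ka.A <- 2     # formulation A: fast absorption
ka.B <- 0.5                # formulation B: four-fold slower absorption
# solve for the fraction absorbed of B which gives the same Cmax as A
f.B <- uniroot(function(f) cmax(f, ka.B) - cmax(f.A, ka.A),
               interval = c(0.01, 1))$root
round(c(f.B = f.B, Cmax.A = cmax(f.A, ka.A), Cmax.B = cmax(f.B, ka.B)), 4)
```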
It took ten years before the alternative metric \(\small{C_\text{max}/AUC}\) (based on
theoretical considerations and simulations) was proposed.31 32 33 Apart from being
independent of \(\small{f}\), it is
substantially less variable than \(\small{C_\text{max}}\). Regrettably, it was
never implemented in any guideline.
In the early 1980s originators tried – and failed – to falsify the concept (i.e., comparing BE in healthy volunteers to large therapeutic equivalence (TE) studies in patients): if BE passed, TE passed as well and vice versa. Had they succeeded (BE passed while TE failed), generic companies would have had to demonstrate TE in order to get products approved. Such studies would have to be much larger than the originators’ phase III studies, making them economically infeasible.24 Essentially, that would have meant an early end of the young generic industry.
However, comparative BA is also used by originators in scale-up of formulations used in phase III to the to-be-marketed formulation, in supporting post-approval changes, in line extensions of approved products, and for testing drug-drug interactions or food effects. Hence, a substantial part of BE trials is performed by originators. Had they succeeded in refuting the concept, they would have shot themselves in the foot.
In the mid 1980s a consensus was reached, i.e., that generic approval should only be acceptable after demonstrating suitable in vivo equivalence.
The main assumption in BE was
(and still is) that ‘similar’ plasma concentrations in healthy
volunteers will lead to similar concentrations at the target site
(i.e., a receptor) and thus, to similar effects in patients. It
was still an open issue whether BE
should be interpreted as a surrogate of clinical efficacy/safety or a
measure of pharmaceutical quality. Whereas in the 1980s the former was
prevalent, since the 1990s the latter is mainstream.
A somewhat naïve interpretation of the
PK metrics is that \(\small{AUC}\) directly translates to
efficacy and \(\small{C_\text{max}}\)
to safety. Especially the latter is not correct because any difference
in \(\small{C_\text{max}}\) leads to a
relatively smaller difference in the maximum effect \(\small{E_\text{max}}\).
There was no consensus about the definition of ‘similarity’ and the statistical methodology to compare plasma profiles. Two early methods are outlined in the following.
An approach employed by the FDA: two drugs were considered bioequivalent if at least 75% of subjects showed \(\small{\text{T}/\text{R}\textsf{-}}\)ratios within 75 – 125%.14 34 35 It is not a statistic and, thus, was immediately criticized because variable formulations or studies with some extreme values may pass the criterion by pure chance.36
We get for our example in R:
example <- data.frame(subject   = rep(1:12, each = 2),
                      sequence  = c(rep("TR", 12), rep("RT", 12)),
                      treatment = c(rep(c("T", "R"), 6),
                                    rep(c("R", "T"), 6)),
                      period    = rep(1:2, 12),
                      Y         = c(71, 81, 61, 65, 80, 94,
                                    66, 74, 94, 54, 97, 63,
                                    85, 70, 90, 76, 54, 53,
                                    56, 99, 90, 83, 68, 51))
rule.75.75 <- reshape(example, idvar = "subject", timevar = "treatment",
                      drop = c("sequence", "period"), direction = "wide")
names(rule.75.75)[2:3] <- c("T", "R")
rule.75.75$T.R <- 100 * (rule.75.75$T / rule.75.75$R)
for (i in 1:nrow(rule.75.75)) {
  if (rule.75.75$T.R[i] >= 75 & rule.75.75$T.R[i] <= 125) {
    rule.75.75$BE[i]     <- TRUE
    rule.75.75$within[i] <- "yes"
  } else {
    rule.75.75$BE[i]     <- FALSE
    rule.75.75$within[i] <- "no"
  }
}
names(rule.75.75)[c(4, 6)] <- c("T/R (%)", "±25%")
BE <- "Failed BE by the"
if (sum(rule.75.75$BE) / nrow(rule.75.75) >= 0.75) BE <- "Passed BE by the"
print(rule.75.75[, c(1:4, 6)], row.names = FALSE); cat(BE, "75/75 Rule.\n")
# subject T R T/R (%) ±25%
# 1 71 81 87.65432 yes
# 2 61 65 93.84615 yes
# 3 80 94 85.10638 yes
# 4 66 74 89.18919 yes
# 5 94 54 174.07407 no
# 6 97 63 153.96825 no
# 7 70 85 82.35294 yes
# 8 76 90 84.44444 yes
# 9 53 54 98.14815 yes
# 10 99 56 176.78571 no
# 11 83 90 92.22222 yes
# 12 51 68 75.00000 yes
# Passed BE by the 75/75 Rule.
Nine of the twelve subjects (75%) have a T/R-ratio within 75 – 125% and the study would pass, despite the three subjects with extreme \(\small{\text{T}/\text{R}\textsf{-}}\)ratios.
Another suggestion was testing for a statistically significant difference at level \(\small{\alpha=0.05}\). The null hypothesis was that formulations are equal (\(\small{\mu_\text{T}-\mu_\text{R}=0}\)).
Let’s assess our example in R again:
example <- data.frame(subject   = rep(1:12, each = 2),
                      sequence  = c(rep("TR", 12), rep("RT", 12)),
                      treatment = c(rep(c("T", "R"), 6),
                                    rep(c("R", "T"), 6)),
                      period    = rep(1:2, 12),
                      Y         = c(71, 81, 61, 65, 80, 94,
                                    66, 74, 94, 54, 97, 63,
                                    85, 70, 90, 76, 54, 53,
                                    56, 99, 90, 83, 68, 51))
tt <- reshape(example, idvar = "subject", timevar = "treatment",
              drop = c("sequence", "period"), direction = "wide")
tt$T.R <- tt[, 2] - tt[, 3]
names(tt)[2:4] <- c("T", "R", "T–R")
p  <- t.test(x = tt$T, y = tt$R, paired = TRUE)$p.value
BE <- "Failed BE"
if (p >= 0.05) BE <- "Passed BE"
print(tt, row.names = FALSE); cat(sprintf("%s by a paired t-test (p = %.4f).\n", BE, p))
# subject T R T–R
# 1 71 81 -10
# 2 61 65 -4
# 3 80 94 -14
# 4 66 74 -8
# 5 94 54 40
# 6 97 63 34
# 7 70 85 -15
# 8 76 90 -14
# 9 53 54 -1
# 10 99 56 43
# 11 83 90 -7
# 12 51 68 -17
# Passed BE by a paired t-test (p = 0.7381).
We calculate a \(\small{p}\)-value of 0.7381, which is statistically not significant (\(\small{\geq\alpha}\)) and the study would pass again.
However, we face a similar problem as with the 75/75 Rule. If the differences show high variability, the study would pass; if there is low variability in the differences, it would fail. This is counterintuitive and actually the opposite of what regulators want: the more variable the data, the easier it is to demonstrate ‘equivalence’ by a significance test.
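A quick sketch makes this concrete: doubling the spread of the period differences around the same mean difference (a purely hypothetical manipulation of the example data) increases the p-value, i.e., makes ‘passing’ easier.

```r
# Period differences T - R of the example
d      <- c(-10, -4, -14, -8, 40, 34, -15, -14, -1, 43, -7, -17)
# hypothetical manipulation: double the spread around the same mean
d.wide <- 2 * (d - mean(d)) + mean(d)
p      <- c(orig = t.test(d)$p.value, wide = t.test(d.wide)$p.value)
round(p, 4) # the noisier data give the larger p-value
```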
One of my early sins37 – it was not the last…
After phenytoin intoxications in Austria38 we compared three
generics (containing the free acid like the originator, Na-, or Ca-salt)
to the reference in a crossover design. All formulations have been
approved and were marketed in Austria. Although at that time I already
calculated a 95% CI, the
reviewers of our manuscript insisted on testing for a significant difference ‘because it is state of the art’.
Two generics were statistically significantly different from the reference (\(\small{\text{T}_1}\) containing the free acid like the originator and \(\small{\text{T}_3}\) containing the Ca-salt). \(\small{\text{T}_2}\) containing the Na-salt was not statistically significantly different and, thus, considered equivalent – despite its high \(\small{\text{T}/\text{R}\textsf{-}}\)ratio
(Table II). \[\small{
\begin{array}{ccccc}
\textsf{Table II}\phantom{0000}\\
\text{formulation} & \text{T}/\text{R (%)} & p & &
\text{BE}\\\hline
\text{T}_1 & 146 & 0.0195\phantom{6} & \text{*} &
\text{fail}\\
\text{T}_2 & 134 & 0.151\phantom{96} & \text{n.s.} &
\text{pass}\\
\text{T}_3 & \phantom{1}28 & 0.00596 & \text{**} &
\text{fail}\\\hline
\end{array}}\] If we evaluated the study according to current standards (i.e., by the 90% CI inclusion approach based on \(\small{\log_{e}\textsf{-}}\)transformed data and acceptance limits of 80.00 – 125.00%), all generics would fail.
\(\small{\text{T}_3}\) would even be
bioinequivalent because its upper
CL is way below 80% (Table III).
If we adjusted for multiplicity (\(\small{\alpha_\text{adj}=0.05/3=0.01\dot{6}\mapsto 96.6\dot{6}\text{% CI}}\)) – although not required in an exploratory study – the outcome would be even worse (Table IV). \[\small{\begin{array}{ccccc}
\textsf{Table III}\phantom{0000}\\
\text{formulation} & \text{PE (%)} &
\text{CL}_\text{lower}\text{(%)} & \text{CL}_\text{upper}\text{
(%)} & \text{BE}\\\hline
\text{T}_1 & 151.12 & 118.75 & 192.32 & \text{fail
(inconclusive)}\\
\text{T}_2 & 139.39 & \phantom{1}95.91 & 202.60 &
\text{fail (inconclusive)}\\
\text{T}_3 & \phantom{1}21.67 & \phantom{1}10.25 &
\phantom{2}45.81 & \text{fail (inequivalent)}\\\hline
\end{array}}\] \[\small{\begin{array}{ccccc}
\textsf{Table IV}\phantom{0000}\\
\text{formulation} & \text{PE (%)} &
\text{CL}_\text{lower}\text{(%)} & \text{CL}_\text{upper}\text{
(%)} & \text{BE}\\\hline
\text{T}_1 & 151.12 & 106.67 & 214.09 & \text{fail
(inconclusive)}\\
\text{T}_2 & 139.39 & \phantom{1}81.20 & 239.28 &
\text{fail (inconclusive)}\\
\text{T}_3 & \phantom{1}21.67 & \phantom{10}7.34 &
\phantom{2}63.93 & \text{fail (inequivalent)}\\\hline
\end{array}}\] Given the nonlinear
PK of phenytoin,39 40 switching a patient
from the originator to the generics with high \(\small{\text{T}/\text{R}\textsf{-}}\)ratios
would be problematic – potentially leading to toxicity after multiple
doses. Even worse would be switching from the generic \(\small{\text{T}_3}\) with its low \(\small{\text{T}/\text{R}\textsf{-}}\)ratio
to any of the other formulations.
An Analysis of Variance (ANOVA) instead of a t-test allows taking period effects into account.41 42 43 This decade was also the heyday of Bayesian methods.44 45 46 47 Nomograms for sample size estimation were also Bayesian48 but happily misused by frequentists. New parametric49 50 as well as nonparametric methods entered the stage.50 51 Metrics to compare controlled release formulations in steady state were proposed.52 53 54 The first software to evaluate 2×2×2 crossover studies was released in the public domain.55
The acceptance range in bioequivalence is based on a ‘clinically relevant difference’ \(\small{\Delta}\), i.e., for data following a lognormal distribution \[\left\{\theta_1,\theta_2\right\}=\left\{100\,(1-\Delta),100\,(1-\Delta)^{-1}\right\}\tag{5}\] It must be mentioned that the commonly assumed \(\small{\Delta=20\%}\) leading to \(\small\left\{80.00\%,125.00\%\right\}\)56 is arbitrary (as is any other).
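Evaluating \(\small{(5)}\) for the common \(\small{\Delta=20\%}\) and a tighter \(\small{\Delta=10\%}\):

```r
# Acceptance limits (5) for a clinically relevant difference Delta
limits <- function(Delta) 100 * c(lower = 1 - Delta, upper = 1 / (1 - Delta))
round(limits(0.20), 2) # 80.00 125.00
round(limits(0.10), 2) # 90.00 111.11
```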
An important leap forward was the Two One-Sided Tests Procedure (TOST)16 – although it was never implemented in its original form \(\small{(6)}\) in regulatory practice. Instead, the confidence interval inclusion approach \(\small{(7)}\) made it to the guidelines. Although the two approaches are operationally identical (i.e., their outcomes [pass | fail] are the same), they are statistically different methods:
\[\begin{matrix}\tag{6} H_\textrm{0L}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\leq\theta_1\:vs\:H_\textrm{1L}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}>\theta_1\\ H_\textrm{0U}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\geq\theta_2\:vs\:H_\textrm{1U}:\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2 \end{matrix}\]
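The confidence interval inclusion approach \(\small{(7)}\) can be sketched – a reconstruction on my part, using the notation of \(\small{(2)}\) for \(\small{\log_{e}\textsf{-}}\)transformed data – as \[\theta_1\leq\exp\left(\overline{\log_{e}x}_\text{T}-\overline{\log_{e}x}_\text{R}\mp t_{1-\alpha,\nu}\,s_\text{w}\sqrt{\tfrac{1}{2}\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right)}\right)\leq\theta_2,\tag{7}\] where \(\small{s_\text{w}}\) is the residual (within-subject) standard deviation: bioequivalence is concluded only if the entire \(\small{100\,(1-2\,\alpha)\%}\) CI lies within \(\small{\{\theta_1,\theta_2\}}\).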
When we evaluate our example by \(\small{(6)}\), we get \(\small{p(\theta_0\geq\theta_1)=0.0155}\) and \(\small{p(\theta_0\leq\theta_2)=0.0515}\). Since one of the \(\small{p\textsf{-}}\)values is \(\small{>\alpha}\), the study would fail.
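For the multiplicative (\(\small{\log_{e}\textsf{-}}\)transformed) analysis the two one-sided \(\small{p\textsf{-}}\)values can be obtained directly from the model fit – a sketch, with the model and limits as used throughout:

```r
# TOST (6) on the multiplicative (log-transformed) scale for the example
example <- data.frame(subject   = factor(rep(1:12, each = 2)),
                      period    = factor(rep(1:2, 12)),
                      treatment = factor(c(rep(c("T", "R"), 6),
                                           rep(c("R", "T"), 6))),
                      Y = c(71, 81, 61, 65, 80, 94,
                            66, 74, 94, 54, 97, 63,
                            85, 70, 90, 76, 54, 53,
                            56, 99, 90, 83, 68, 51))
m   <- lm(log(Y) ~ subject + period + treatment, data = example)
est <- coef(m)[["treatmentT"]]
se  <- coef(summary(m))["treatmentT", "Std. Error"]
nu  <- df.residual(m)
p.L <- pt((est - log(0.80)) / se, nu, lower.tail = FALSE) # H0L: ratio <= 0.80
p.U <- pt((log(1.25) - est) / se, nu, lower.tail = FALSE) # H0U: ratio >= 1.25
round(c(p.L = p.L, p.U = p.U), 4)
```

Both \(\small{p\textsf{-}}\)values are \(\small{<\alpha}\); on this scale the study would pass.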
It is a misconception that a certain CI of a sample (i.e., a particular study) contains the – true but unknown – population mean \(\small{\mu}\) with \(\small{1-\alpha}\) probability. Let’s simulate some studies and evaluate them by \(\small{(7)}\):
invisible(library(PowerTOST))
set.seed(123) # for reproducibility of simulations
mue      <- 1 # true population mean
CV       <- 0.25
studies  <- 100
x        <- sampleN.TOST(CV = CV, theta0 = mue, targetpower = 0.8, print = FALSE)
subjects <- x[["Sample size"]]
power    <- x[["Achieved power"]]
# simulate subjects within studies, lognormal distribution
samples <- data.frame(study = rep(1:studies, each = subjects * 2),
                      subject = rep(rep(1:subjects, studies), each = 2),
                      period = rep(rep(1:2, studies), 2),
                      sequence = rep(c(rep(c("TR"), subjects),
                                       rep(c("RT"), subjects)), studies),
                      treatment = c(rep(c("T", "R"), subjects / 2),
                                    rep(c("R", "T"), subjects / 2)),
                      Y = rlnorm(n = subjects * studies * 2,
                                 meanlog = log(mue) - 0.5 * log(CV^2 + 1),
                                 sdlog = sqrt(log(CV^2 + 1))))
facs          <- c("subject", "period", "treatment")
samples[facs] <- lapply(samples[facs], factor) # factorize the data
result <- data.frame(study = 1:studies, PE = NA_real_,
                     lower = NA_real_, upper = NA_real_,
                     BE = FALSE, contain = TRUE)
grand.PE <- numeric(studies)
for (i in 1:studies) {
  temp    <- samples[samples$study == i, ]
  heretic <- lm(log(Y) ~ period + subject + treatment, data = temp)
  result$PE[i]   <- 100 * exp(coef(heretic)[["treatmentT"]])
  result[i, 3:4] <- 100 * exp(confint(heretic, level = 0.90)["treatmentT", ])
  if (round(result[i, 3], 2) >= 80 & round(result[i, 4], 2) <= 125)
    result$BE[i] <- TRUE
  if (result$lower[i] > 100 * mue | result$upper[i] < 100 * mue)
    result$contain[i] <- FALSE
  grand.PE[i] <- mean(result$PE[1:i]) # (cumulative) grand means
}
dev.new(width = 4.5, height = 4.5)
op <- par(no.readonly = TRUE)
par(mar = c(3.05, 2.9, 1.4, 0.75), cex.axis = 0.9, mgp = c(2, 0.5, 0))
xlim <- range(c(min(result$lower), 1e4 / min(result$lower),
                max(result$upper), 1e4 / max(result$upper)))
plot(1:2, 100 * rep(mue, 2), type = "n", log = "x", xlab = "PE [90% CI]",
     ylab = "study #", axes = FALSE,
     xlim = xlim, ylim = range(result$study))
abline(v = 100 * c(0.8, mue, 1.25), lty = c(2, 1, 2))
axis(1, at = c(125, pretty(xlim)),
     labels = sprintf("%.0f%%", c(125, pretty(xlim))))
axis(2, at = c(1, pretty(1:studies)[-1]), las = 1)
axis(3, at = 100 * mue, labels = expression(mu))
box()
lines(grand.PE, 1:studies, lwd = 2)
for (i in 1:studies) {
  if (result$BE[i]) {        # pass
    clr <- "blue"
  } else {                   # fail
    if (result$contain[i]) { # mue within CI
      clr <- "magenta"
    } else {                 # mue not in CI
      clr <- "red"
    }
  }
  lines(c(result$lower[i], result$upper[i]), rep(i, 2), col = clr)
  points(result$PE[i], i, pch = 16, cex = 0.6, col = clr)
}
par(op)
In 7% of studies the population mean \(\small{\mu}\) is not contained in
the 90% CI (red lines). In
other words, given the result of a single study we can never
know where \(\small{\mu}\) lies. Only
the grand mean (mean of sample means \(\small{\frac{1}{n}\sum_{i=1}^{i=n}\overline{x_i}}\))
approaches \(\small{\mu}\) for a large number
of samples. After the 100th study the grand mean of 99.44% is pretty close to \(\small{\mu}\) (for geeks: the convergence is poor; when simulating 25,000 studies, it is 100.23%). However, nobody would repeat a – passing – study (blue lines) for such rather uninteresting information, right?
This also explains why a particular study might fail by pure chance even if a formulation is equivalent (here 15% of
studies; red or magenta lines). Such cases are related to the producer’s
risk (Type II
Error = 1 – power), which is for the given conditions 16.3%. On the
other hand, it is also possible that a formulation which is not
equivalent might pass. These cases are related to the patient’s
risk (Type I
Error).
For details see the articles about hypotheses, treatment effects, post hoc power, and sample size
estimation. Science is a cruel mistress.
At a hearing in 1986 the FDA confirmed that \(\small{(6)}\) or \(\small{(7)}\) applied to untransformed data should be used with \(\small{\Delta=20\%}\). If clinically relevant, tighter limits (\(\small{\Delta=10\%}\)) might be needed.57
The first German guideline was drafted by the Working Group for Pharmaceutical Process Engineering (Arbeitsgemeinschaft für Pharmazeutische Verfahrenstechnik) in 1985.58 It was presented and discussed in 1987.59 60 61
In 1988 wider acceptance limits of 70 – 130% were proposed for \(\small{C_\text{max}}\) due to its inherently high variability62 (as a single-point metric its variability is practically always larger than that of the integrated metric \(\small{AUC}\)).
The Australian draft guideline was published in 1988.63 It was the first covering not only the design and evaluation but also validation of bioanalytical methods. The model with effects period, subject, treatment20 43 was recommended and a test for sequence-effects was not considered necessary. The problematic conversion of differences to percentages was acknowledged and Fieller’s CI21 22 discussed. Kudos to both!
In 1989 a series of loose-leaf binders was started.64 It contained raw data of generic drugs marketed in Germany, the evaluations provided by companies, as well as results recalculated by the ZL (Central Laboratory of German Pharmacists). Including the 6th supplement of 1996 it comprised more than 2,000 pages… It was an indispensable resource for planning new studies and also showed the ‘journey’ of dossiers (i.e., the same study being used by different companies).
The BioInternational conference series set milestones in the development of testing for bioequivalence. The first in Toronto 1989 dealt with the \(\small{\log_{e}\textsf{-}}\)transformation of data and the definition of highly variable drugs (HVDs).65 There was a poll among the participants about the \(\small{\log_{e}\textsf{-}}\)transformation. Outcome: ⅓ never, ⅓ always, ⅓ case by case (i.e., perform both analyses and report the one with narrower CI ‘because it fits the data better’). Let’s be silent about the last team.66 HVDs were defined as drugs with intra-subject variabilities of more than 30% but problems might be evident already at 25%.
The original acceptance range was symmetrical around 100%. In \(\small{\log_{e}\textsf{-}}\)scale it should be symmetrical around \(\small{0}\) (because \(\small{\log_{e}1=0}\)). What happens to our \(\small{\Delta}\), which should still be 20%? Due to the positive skewness of the lognormal distribution a lively discussion started after early publications proposing 80 – 125%.19 41 Keeping 80 – 120% would have been flawed because maximum power is obtained at \[\mu_\text{T}/\mu_\text{R}=\exp\left((\log_{e}\theta_1+\log_{e}\theta_2)/2\right),\tag{8}\] which equals \(\small{1}\) only if \(\small{\theta_2=\theta_1^{-1}}\) or \(\small{\theta_1=\theta_2^{-1}}\). Keeping the original limits, maximum power would be obtained at \(\small{\mu_\text{T}/\mu_\text{R}=\exp((\log_{e}0.8+\log_{e}1.2)/2)\approx0.979796}\).
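A quick numerical check of \(\small{(8)}\) for the old and the new limits:

```r
# maximum power sits at a true ratio of 1 only if the limits are
# symmetrical in log-scale
exp((log(0.80) + log(1.20)) / 2) # ~0.9798, i.e., not at a true ratio of 1
exp((log(0.80) + log(1.25)) / 2) # exactly 1
```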
There were three parties (all agreed that the acceptance range should be symmetrical in \(\small{\log_{e}\textsf{-}}\)scale and consequently asymmetrical when back-transformed). These were their arguments and suggestions:
\[\left\{\theta_1,\theta_2\right\}=\left\{100\,(1-\Delta),100/(1-\Delta)\right\}=80-125\%\tag{11}\]
The 90% CI inclusion approach \(\small{(7)}\) based on \(\small{\log_{e}\textsf{-}}\)transformed data with acceptance limits of 80.00 – 125.00% \(\small{(5)}\) was the winner.
First sample size tables for the multiplicative model with the acceptance range 80 – 125% were published67 and extended for narrower (90 – 111%) and wider (70 – 143%) acceptance ranges.68 The nonparametric method was improved taking period-effects into account.69 70 Drug-drug and food-interaction studies should be assessed for equivalence.71 The general applicability of average BE was challenged and the concept of individual and population bioequivalence outlined.72 73 74 The first textbook dealing exclusively with BA/BE was published.75
This was also the decade of updated and new guidelines. A European
draft guidance was published in 1990;76 the final guideline
was published in December 1991 and came into force in June 1992.77 The 90%
CI inclusion approach of \(\small{\log_{e}\textsf{-}}\)transformed
data with an acceptance range of 80 – 125% was recommended and for
NTIDs the
acceptance range may need to be tightened. Due to its inherently higher
variability a wider acceptance range may be acceptable for \(\small{C_\text{max}}\). If inevitable and
clinically acceptable, a wider acceptance range may also be used for
\(\small{AUC}\). Only if clinically
relevant, a nonparametric analysis of \(\small{t_\text{max}}\) was
recommended.
An in vivo study was not required if the new formulation is
Similar statements about solutions were given in all later guidelines. The second led to the application of the Biopharmaceutics Classification System (BCS).78 More about that later.
In July 1992 the first guidance of the FDA was published.79 An ANOVA of \(\small{\log_{e}\textsf{-}}\)transformed data was recommended and the nested subject(sequence) term in the statistical model entered the scene. It must be mentioned that in comparative BA studies subjects are usually uniquely coded. Hence, the term subject(sequence) is a bogus one80 and could be replaced by a simple subject term as well (see below for an example). Regrettably, this model has been implemented in all global guidelines ever since.
In the same year the Canadian guidance for Immediate Release (IR) formulations was published.81 At that time it was the most extensive one because it gave not only the method of evaluation, but also information about the study design, sample size, ethics, bioanalytics, etc. It differed from the others in the relaxed requirement for \(\small{C_\text{max}}\), where only the \(\small{\text{T}/\text{R}\textsf{-}}\)ratio had to lie within 80 – 125% (instead of its CI).
In 1998 the World Health Organization published its first guideline,82 which was similar to the European one.
Table V shows the result of the example evaluated by the various methods. \[\small{\begin{array}{lcccc} \textsf{Table V}\phantom{0}\\ \phantom{0}\text{Method} & \text{Model} & \text{PE} & \text{power},p,\text{CI} & \text{BE?}\\\hline \text{80/20 Rule} & \text{additive} & - & 47.22\% & \text{fail}\\ \text{TOST} & \text{additive} & +2.250\;(103.09\%) & 0.0155,\,0.0515 & \text{fail}\\ \text{95% CI} & \text{additive} & +2.250\;(103.09\%) & -12.807\,,+17.307\;(82.61-123.76\%) & \text{fail}\\ \text{Westlake} & \text{additive} & \pm0.000\;(100.00\%) & \pm16.143\;(\pm21.48\%) & \text{fail}\\\hline \text{80/20 Rule} & \text{multiplicative} & - & 73.57\% & \text{fail}\\ \text{TOST} & \text{multiplicative} & 102.82\% & 0.0099,\,0.0283 & \text{pass}\\ \text{90% CI} & \text{multiplicative} & 102.82\% & \phantom{1}87.25-121.17\% & \text{pass}\\ \text{Westlake} & \text{multiplicative} & 100.00\% & \pm17.72\% & \text{pass}\\ \text{75/75 Rule} & \text{multiplicative} & - & - & \text{pass}\\\hline \end{array}}\] In the additive model the acceptance range was 80 – 120%, whereas in the multiplicative model it is 80 – 125%. Since the former assesses differences – which is wrong – the corresponding percentages are given in brackets.
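As a quick cross-check (not part of any guideline evaluation), the TOST p-values of Table V for the multiplicative model can be recovered from the reported point estimate and 90% CI alone; the residual df of 10 follows from the 24 observations and 14 model parameters of the 2×2×2 example:

```r
pe <- log(1.0282)                   # point estimate in log-scale
ci <- log(c(0.8725, 1.2117))        # 90% confidence limits in log-scale
df <- 10                            # residual df: 24 obs. - 14 parameters
se <- diff(ci) / (2 * qt(0.95, df)) # back-calculated standard error
p1 <- pt((pe - log(0.80)) / se, df, lower.tail = FALSE) # H0: ratio <=  80%
p2 <- pt((log(1.25) - pe) / se, df, lower.tail = FALSE) # H0: ratio >= 125%
signif(c(p1, p2), 2)                # close to the 0.0099, 0.0283 of Table V
```

Since both one-sided p-values are below 0.05, TOST passes – in agreement with the 90% CI lying entirely within 80.00 – 125.00%.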
As of today only the 90% CI inclusion approach is globally accepted. Our example in R again:
example <- data.frame(subject   = rep(1:12, each = 2),
                      sequence  = c(rep("TR", 12), rep("RT", 12)),
                      treatment = c(rep(c("T", "R"), 6),
                                    rep(c("R", "T"), 6)),
                      period    = rep(1:2, 12),
                      Y         = c(71, 81, 61, 65, 80, 94,
                                    66, 74, 94, 54, 97, 63,
                                    85, 70, 90, 76, 54, 53,
                                    56, 99, 90, 83, 68, 51))
facs          <- c("subject", "sequence", "treatment", "period")
example[facs] <- lapply(example[facs], factor) # factorize the data
txt    <- paste("nested model : period, subject(sequence), treatment",
                "\nsimple model : period, subject, sequence, treatment",
                "\nheretic model: period, subject, treatment\n\n")
result <- data.frame(model = c("nested", "simple", "heretic"),
                     PE = NA, lower = NA, upper = NA, BE = "fail", na = 0)
for (i in 1:3) {
  if (result$model[i] == "nested") {  # bogus nested model (guidelines)
    nested <- lm(log(Y) ~ period +
                          subject %in% sequence +
                          treatment, data = example)
    result$PE[i]   <- 100 * exp(coef(nested)[["treatmentT"]])
    result[i, 3:4] <- 100 * exp(confint(nested, level = 0.90)["treatmentT", ])
    result[i, 6]   <- sum(is.na(coef(nested)))
  }
  if (result$model[i] == "simple") {  # simple model (subjects are uniquely coded)
    simple <- lm(log(Y) ~ period +
                          subject +
                          sequence +
                          treatment, data = example)
    result$PE[i]   <- 100 * exp(coef(simple)[["treatmentT"]])
    result[i, 3:4] <- 100 * exp(confint(simple, level = 0.90)["treatmentT", ])
    result[i, 6]   <- sum(is.na(coef(simple)))
  }
  if (result$model[i] == "heretic") { # heretic model (without sequence)
    heretic <- lm(log(Y) ~ period +
                           subject +
                           treatment, data = example)
    result$PE[i]   <- 100 * exp(coef(heretic)[["treatmentT"]])
    result[i, 3:4] <- 100 * exp(confint(heretic, level = 0.90)["treatmentT", ])
    result[i, 6]   <- sum(is.na(coef(heretic)))
  }
  # rounding acc. to guidelines
  if (round(result[i, 3], 2) >= 80 & round(result[i, 4], 2) <= 125)
    result$BE[i] <- "pass"
}
# cosmetics
result$PE    <- sprintf("%6.2f%%", result$PE)
result$lower <- sprintf("%6.2f%%", result$lower)
result$upper <- sprintf("%6.2f%%", result$upper)
names(result)[c(3:4, 6)] <- c("lower CL", "upper CL", "NE")
cat(txt); print(result, row.names = FALSE)
# nested model : period, subject(sequence), treatment
# simple model : period, subject, sequence, treatment
# heretic model: period, subject, treatment
#
# model PE lower CL upper CL BE NE
# nested 102.82% 87.25% 121.17% pass 13
# simple 102.82% 87.25% 121.17% pass 1
# heretic 102.82% 87.25% 121.17% pass 0
As already outlined above, the nested model recommended in all [sic] guidelines is over-specified because subjects are uniquely coded. In the example we get 13 not estimable (aliased) effects (in the output of R lines with NA, in SAS with ‘.’, and in WinNonlin with ‘not estimable’). Correct, because we are asking for something the data cannot provide.80 In the simple model only one effect cannot be estimated. Even sequence can be removed from the model. I call it heretic because regulators will grill you if you use it. It was the model proposed by Westlake20 43 and I used it in hundreds (‼) of my studies. Note that the results of all models are exactly the same; if you don’t believe me, try it with one of your studies.
A ‘Positive List’ was published by the German regulatory authority: for 90 drugs BE was not required.83 In order to comply with the European Note for Guidance of 200184 it had to be removed by the BfArM.
Two (of five) sessions of the BioInternational ’92 conference in Bad Homburg dealt with BE of Highly Variable Drugs.85 86 Various approaches were discussed: multiple dose instead of single dose studies, the metabolite instead of the parent compound, stable isotope techniques,87 add-on designs, and – for the first time – replicate designs.
Although the BioInternational 2 in Munich (1994) was, with over 600 participants, the largest conference in the series, no substantial progress for HVD(P)s was achieved.88 Following a suggestion89 at a joint AAPS/FDA workshop in 1995, widening the conventional acceptance limits of 80.00 – 125.00% was considered.90
“For some highly variable drugs and drug products, the bioequivalence standard should be modified by changing the BE limits while maintaining the current confidence interval at 90%. […] the bioequivalence limits should be determined based in part upon the intrasubject variability for the reference product.
A hot topic ever since… Why are we discussing it for 35 (‼) years (since the first BioInternational conference)? Is it really that complicated91 or are we too stupid?
Studies in steady state were proposed as an option for HVD(P)s in a European draft guideline92 but were removed from the final version of 2001.84
Validation of bioanalytical methods93 94 95 96 was partly covered in Australia and Canada. However, no specific guideline existed. A series of conferences (informally known as ‘Crystal City’) was initiated in 1990.97 Procedures stated in the conference report98 were discussed at the BioInternational 2 in Munich 1994 and quickly adopted by bioanalytical sites. Updates were subsequently published.99 100
TODO: SUPAC (FDA)
After a wealth of – controversial – publications in the 1990s,72 73 74 101 102 103 104 105 106 107 108 109 the FDA introduced two new concepts as alternatives to average bioequivalence (ABE), namely population bioequivalence (PBE) and individual bioequivalence (IBE).110
ABE focuses only on the comparison of population averages of the PK metrics of formulations, not on their variances. It also does not assess a subject-by-formulation interaction variance, that is, the variation of the average \(\small{\text{T}}\) vs \(\small{\text{R}}\) difference among individuals. In contrast, PBE and IBE include comparisons of both averages and variances of PK metrics. The PBE approach assesses the total variability of the PK metrics in the population. The IBE approach assesses the within-subject variability for the \(\small{\text{T}}\) and \(\small{\text{R}}\) formulations, as well as the subject-by-formulation interaction. Demonstrated PBE would support ‘Prescribability’ (i.e., a drug-naïve patient could start treatment with a generic), whereas demonstrated IBE would support ‘Switchability’ (i.e., a patient could switch formulations during treatment).109 Contrary to ABE, both PBE and IBE require studies in a full replicate design, which means that both \(\small{\text{T}}\) and \(\small{\text{R}}\) are administered twice.
The acceptance limits for
ABE were kept at
80.00–125.00% but for the others scaling to the variability of the
reference was possible. That would mean an incentive for test
formulations with lower variability than the reference but a penalty for
ones with a higher variability.
However, the underlying statistical concepts were not trivial and the results practically incomprehensible for non-statisticians. Furthermore, both approaches had a discontinuity (when moving from constant- to reference-scaling), which led to an inflated type I error (patient’s risk) of approximately 6.5% for CVwR of 18.1 – 20.2%.110 111 112
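The upper end of that interval is no accident: it is the regulatory switching variability expressed as a CV. A one-liner, assuming the usual lognormal relationship \(\small{CV=\sqrt{\exp(\sigma^2)-1}}\) and a switching standard deviation \(\small{\sigma_\text{w0}=0.2}\) (the value given in the 2001 guidance):

```r
sigma.w0 <- 0.2                  # switching std. dev. of the 2001 guidance
100 * sqrt(exp(sigma.w0^2) - 1)  # ~20.20% CVwR: where the criterion switches
```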
The PBE/IBE concepts faced criticism, e.g., »responses [to the guidance] were still doubt-filled as to whether the new bioequivalence criteria really provided added value compared to average bioequivalence«113 and were regarded a »‘theoretical’ solution to a ‘theoretical’ problem«,114 leading to their omission from a subsequent guidance115 and a return to conventional ABE.116
“[ABE should suffice based upon grounds of] ‘practicality, plausibility, historical adequacy, and purpose’ and ‘because we have better things to do.’ […] ‘Statisticians have a bad track record in bioequivalence, […] the literature is full of ludicrous recommendations from statisticians, […] regulatory recommendations (of dubious validity) have been hastily implemented, and practical realities have been ignored’.
I remember a Dutch regulator standing up in the BioInternational conference (London 2003) saying: »I’m glad that PBE and IBE are dead. I never understood them.«
Poland happily adopted Germany’s ‘Positive List’83 when it wanted to join the European Union – only to learn that in the meantime Germany had abandoned it. Until 2015 a similar (but shorter) list existed in The Netherlands for national marketing authorisations only. It must have been a schizophrenic situation for assessors of the MEB: In the morning a dossier for a national MA without any in vivo comparison → ☑. In the afternoon another dossier of the same product in the course of a European submission; BE performed, but the lower limit of the 90% CI 79.99% → ☒. Bizarre.
Until 2012 Denmark required for
NTIDs that the 90%
CI had to include 100%
(i.e., that there is no significant treatment effect). Bizarre
as well. For details see Example 3 in this
article.
The first bioanalytical method validation guidance was published by the FDA in 2001 and revised in 2018.118 119 Before the European draft guideline was published in 2009,120 some inspectors raised an eyebrow if sites worked according to the FDA’s guidance.
“The validation of bioanalytical methods and the analysis of study samples should be performed in accordance with the principles of Good Laboratory Practice (GLP). However, as human bioanalytical studies fall outside of the scope of GLP, as defined in Directive 2004/10/EC, the sites conducting the human studies are not required to be monitored as part of a national GLP compliance programme.
Well roared, lions! My CRO had been GLP-certified since 1991, although we performed only phase I studies. In other countries (e.g., Spain), this was not possible. In Germany GLP is subject to state law; hence, it was possible to get certified in one federal state but not in another… However, this ‘issue’ was resolved with the final guideline published in 2011121 and the ICH M10 guideline of 2022,122 superseding all local guidelines.
TODO: BCS-based biowaivers, reference-scaling, two-stage designs, NTIDs, current guidelines in various jurisdictions…
Still unresolved or not harmonized issues:
A word of warning: The textbooks dealing with statistics (marked with ★ in the references) are rather tough cookies and not recommended for beginners.
Thanks to Henning Blume and José Augusto Guimarães Morais for discussions about the BioInternational conferences and the early days of bioequivalence.
Helmut Schütz 2024
Licenses: R GPL 3.0, klippy MIT, pandoc GPL 2.0.
1st version April 9, 2024. Rendered May 1, 2024 18:03 CEST by rmarkdown via pandoc in 0.09 seconds.
Lindenbaum J, Preibisz JJ, Butler VP Jr., Saha JR. Variation in digoxin bioavailability: a continuing problem. J Chron Dis. 1973; 16: 749–54. Open Access.↩︎
DeSante KA, DiSanto AR, Chodos DJ, Stoll RG. Antibiotic Batch Certification and Bioequivalence. JAMA. 1975; 232(13): 1349–51. doi:10.1001/jama.1975.03250130033016.↩︎
International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use. Bioequivalence for Immediate-Release Solid Oral Dosage Forms. M13A. Draft version 20 December 2022. Online.↩︎
Hall DG, In: Hearing Before the Subcommittee on Monopolies Select Committee on Small Business. U.S. Senate, Government Printing Office, Washington D.C. 1967: 258–81.↩︎
Tyrer JH, Eadie MJ, Sutherland JM, Hooper WD. Outbreak of anticonvulsant intoxication in an Australian city. Br Med J. 1970; 4: 271–3. doi:10.1136/bmj.4.5730.271. Open Access.↩︎
Bochner F, Hooper WD, Tyrer JH, Eadie MJ. Factors involved in an outbreak of phenytoin intoxications. J Neurol Sci. 1972; 16(4): 481–7. doi:10.1016/0022-510x(72)90053-6.↩︎
Lund L. Clinical significance of generic inequivalence of three different pharmaceutical preparations of phenytoin. Eur J Clin Pharmacol. 1974; 7: 119–24. doi:10.1007/bf00561325.↩︎
Lindenbaum J, Mellow MH, Blackstone MO, Butler VP. Variations in biological activity of digoxin from four preparations. N Engl J Med. 1971; 285(24): 1344–7. doi:10.1056/nejm197112092852403.↩︎
Jounela AJ, Pentikäinen PJ, Sothmann. Effect of particle size on the bioavailability of digoxin. Eur J Clin Pharmacol. 1975; 8(5): 365–70. doi:10.1007/BF00562664.↩︎
Richton-Hewett S, Foster E, Apstein CS. Medical and Economic Consequences of a Blinded Oral Anticoagulant Brand Change at a Municipal Hospital. Arch Intern Med. 1988; 148(4): 806–8. doi:10.1001/archinte.1988.00380040046010.↩︎
Weinberger M, Hendeles L, Bighley L, Speer J. The Relation of Product Formulation to Absorption of Oral Theophylline. N Engl J Med. 1978; 299(16): 852–7. doi:10.1056/nejm197810192991603.↩︎
Bielmann B, Levac TH, Langlois Y, Tetreault L. Bioavailability of primidone in epileptic patients. Int J Clin Pharmacol. 1974; 9(2): 132–7. PMID 4208031↩︎
Skelly JP, Knapp G. Biologic availability of digoxin tablets. JAMA. 1973; 224(2): 243. doi:10.1001/jama.1973.03220150051015.↩︎
Skelly JP. A History of Biopharmaceutics in the Food and Drug Administration 1968–1993. AAPS J. 2010; 12(1): 44–50. doi:10.1208/s12248-009-9154-8. Free Full Text.↩︎
APhA Academy of Pharmaceutical Sciences. Guidelines for Biopharmaceutic Studies in Man. Washington D.C. February 1972.↩︎
Schuirmann DJ. A comparison of the Two One-Sided Tests Procedure and the Power Approach for Assessing the Equivalence of Average Bioavailability. J Pharmacokin Biopharm. 1987; 15(6): 657–80. doi:10.1007/BF01068419.↩︎
Metzler CM. Bioavailability – A Problem in Equivalence. Biometrics. 1974; 30(2): 309–17. PMID 4833140.↩︎
Westlake WJ. Symmetrical Confidence Intervals for Bioequivalence Trials. Biometrics. 1976; 32(4): 741–4. PMID 1009222.↩︎
Mantel N. Do We Want Confidence Intervals Symmetrical About the Null Value? Biometrics. 1977; 33: 759–60. [Letter to the Editor]↩︎
Westlake WJ. Design and Evaluation of Bioequivalence Studies in Man. In: Blanchard J, Sawchuk RJ, Brodie BB, editors. Principles and perspectives in Drug Bioavailability. Basel: Karger; 1979. ISBN 3-8055-2440-4. p. 192–210.↩︎
Fieller EC. Some Problems In Interval Estimation. J Royal Stat Soc B. 1954; 16(2): 175–85. JSTOR:2984043.↩︎
Locke CS. An Exact Confidence Interval from Untransformed Data for the Ratio of Two Formulation Means. J. Pharmacokin. Biopharm. 1984; 12(6): 649–55. doi:10.1007/bf01059558.↩︎
In phase III we try to demonstrate that verum performs ‘better’ than placebo, i.e., one-sided tests for non-inferiority (effect) and non-superiority (adverse reactions). Such studies are already large: Approving statins and COVID-19 vaccines required ten thousands volunteers. Can you imagine how many it would need to detect a 20% difference between two treatments?↩︎
Benet LZ. Why Do Bioequivalence Studies in Healthy Volunteers? 1st MENA Regulatory Conference on Bioequivalence, Biowaivers, Bioanalysis and Dissolution. Amman. 23 September 2013. Internet Archive.↩︎
Office of the Federal Register. Code of Federal Regulations, Title 21, Part 320, Subpart A, § 320.23(a)(1) Online.↩︎
This is an assumption, i.e., based on the labelled content instead of the measured potency.↩︎
Yet another assumption. Incorrect for highly variable drugs and, thus, inflates the confidence interval.↩︎
Tóthfálusi L, Endrényi L. Estimation of Cmax and Tmax in Populations After Single and Multiple Drug Administration. J Pharmacokin Pharmacodyn. 2003; 30(5): 363–85. doi:10.1023/b:jopa.0000008159.97748.09.↩︎
In models with more than one compartment \(\small{t_\text{max}}\) and \(\small{C_\text{max}}\) cannot be analytically derived. In software numeric optimization is employed to locate the maximum of the function.↩︎
Endrényi L, Fritsch S, Yan W. Cmax/AUC is a clearer measure than Cmax for absorption rates in investigations of bioequivalence. Int J Clin Pharmacol Ther Toxicol. 1991; 29(10): 394–9. PMID 1748540.↩︎
Schall R, Luus HG. Comparison of absorption rates in bioequivalence studies of immediate release drug formulations. Int J Clin Pharmacol Ther Toxicol. 1992; 30(5): 153–9. PMID 1592542.↩︎
Endrényi L, Yan W. Variation of Cmax and Cmax/AUC in investigations of bioequivalence. Int J Clin Pharm Ther Toxicol. 1993; 31(4): 184–9. PMID 8500920.↩︎
Haynes JD. Statistical simulation study of new proposed uniformity requirement for bioequivalency studies. J Pharm Sci. 1981; 70(6): 673–5. doi:10.1002/jps.2600700625.↩︎
Cabana BE. Assessment of 75/75 Rule: FDA Viewpoint. Pharm Sci. 1983; 72(1): 98–99. doi:10.1002/jps.2600720127.↩︎
Haynes JD. FDA 75/75 Rule: A Response. Pharm Sci. 1983; 72: 99–100.↩︎
Nitsche V, Mascher H, Schütz H. Comparative bioavailability of several phenytoin preparations marketed in Austria. Int J Clin Pharmacol Ther Toxicol. 1984; 22(2): 104–7. PMID 6698663.↩︎
Klingler D, Nitsche V, Schmidbauer H. Hydantoin-Intoxikation nach Austausch scheinbar gleichwertiger Diphenylhydantoin-Präparate. Wr Med Wschr. 1981; 131: 295–300. [German]↩︎
Glazko AJ, Chang T, Bouhema J, Dill WA, Goulet JR, Buchanan RA. Metabolic disposition of diphenylhydantoin in normal human subjects following intravenous administration. Clin Pharmacol Ther. 1969; 10(4): 498–504. doi:10.1002/cpt1969104498.↩︎
Bochner F, Hooper WD, Tyrer JH, Eadie MJ. Effect of dosage increments on blood phenytoin concentrations. J Neurol Neurosurg Psychiatr. 1972; 35(6): 873–6. doi:10.1136/jnnp.35.6.873.↩︎
Kirkwood TBL. Bioequivalence Testing – A Need to Rethink [reader reaction]. Biometrics. 1981; 37: 589–91. doi:10.2307/2530573.↩︎
Westlake WJ. Response to Bioequivalence Testing – A Need to Rethink [reader reaction response]. Biometrics. 1981; 37: 591–93.↩︎
Westlake WJ. Bioavailability and Bioequivalence of Pharmaceutical Formulations. In: Pearce KE, editor. Biopharmaceutical Statistics for Drug Development. New York: Marcel Dekker; 1988. p. 329–53. ISBN 0-8247-7798-0.↩︎
Rodda BE, Davis RL. Determining the probability of an important difference in bioavailability. Clin Pharmacol Ther. 1980; 28: 247–52. doi:10.1038/clpt.1980.157.↩︎
Mandallaz D, Mau J. Comparison of Different Methods for Decision-Making in Bioequivalence Assessment. Biometrics. 1981; 37: 213–22. PMID 6895040.↩︎
Fluehler H, Hirtz J, Moser HA. An Aid to Decision-Making in Bioequivalence Assessment. J Pharmacokin Biopharm. 1981; 9: 235–43. doi:10.1007/BF01068085.↩︎
Selwyn MR, Hall NR. On Bayesian Methods for Bioequivalence. Biometrics. 1984; 40: 1103–8. PMID 6398710.↩︎
Fluehler H, Grieve AP, Mandallaz D, Mau J, Moser HA. Bayesian Approach to Bioequivalence Assessment: An Example. J Pharm Sci. 1983; 72(10): 1178–81. doi:10.1002/jps.2600721018.↩︎
Anderson S, Hauck WW. A New Procedure for Testing Bioequivalence in Comparative Bioavailability and Other Clinical Trials. Commun Stat Ther Meth. 1983; 12(23): 2663–92. doi:10.1080/03610928308828634.↩︎
Steinijans VW, Diletti E. Statistical Analysis of Bioavailability Studies: Parametric and Nonparametric Confidence Intervals. Eur J Clin Pharmacol. 1983; 24: 127–36. doi:10.1007/BF00613939.↩︎
Steinijans VW, Diletti E. Generalization of Distribution-Free Confidence Intervals for Bioavailability Ratios. Eur J Clin Pharmacol. 1985; 28: 85–8. doi:10.1007/BF00635713.↩︎
Steinijans VW, Schulz H-U, Beier W, Radtke HW. Once daily theophylline: multiple-dose comparison of an encapsulated micro-osmotic system (Euphylong) with a tablet (Uniphyllin). Int J Clin Pharm Ther Toxicol. 1986; 24(8): 438–47. PMID 3759279.↩︎
Steinijans VW. Pharmacokinetic Characteristics of Controlled Release Products and Their Biostatistical Analysis. In: Gundert-Remy U, Möller H, editors. Oral Controlled Release Products – Therapeutic and Biopharmaceutic Assessment. Stuttgart: Wissenschaftliche Verlagsanstalt; 1988, p. 99–115.↩︎
Blume H, Siewert M, Steinijans V. Bioäquivalenz von per os applizierten Retard-Arzneimitteln; Konzeption der Studien und Entscheidung über Austauschbarkeit. Pharm Ind. 1989; 51: 1025–33. [German]↩︎
Wijnand HP, Timmer CJ. Mini-computer programs for bioequivalence testing of pharmaceutical drug formulations in two-way cross-over studies. Comput Programs Biomed. 1983; 17(1–2): 73–88. doi:10.1016/0010-468x(83)90027-2.↩︎
Where did it come from? Two stories: Les Benet told that there was a poll at the FDA and – essentially based on gut feeling – the 20% saw the light of day. I’ve heard another one, which I like more. Wilfred J. Westlake, one of the pioneers of BE, was a statistician at SKF. During a coffee and cig break (everybody was smoking in the 1970s) he asked his fellows of the clinical pharmacology department »Which difference in blood concentrations do you consider relevant?« Yep, the 20% were born.↩︎
Rheinstein P. Report by the Bioequivalence Task Force on Recommendations from the Bioequivalence Hearing conducted by the Food and Drug Administration. September 29 – October 1986. January 1988.↩︎
APV. Richtlinie und Kommentar. Pharmazeutische Industrie. 1985; 47(6): 627–32. [German]↩︎
Arbeitsgemeinschaft Pharmazeutische Verfahrenstechnik (APV). International Symposium. Bioavailability/Bioequivalence, Pharmaceutical Equivalence and Therapeutic Equivalence. Würzburg. 9–11 February, 1987.↩︎
Junginger H. APV-Richtlinie – »Untersuchungen zur Bioverfügbarkeit, Bioäquivalenz« Pharm Ztg. 1987; 132: 1952–55. [German]↩︎
Junginger H. Studies on Bioavailability and Bioequivalence – APV Guideline. Drugs Made in Germany. 1987; 30: 161–6.↩︎
Blume H, Kübel-Thiel K, Reutter B, Siewert M, Stenzhorn G. Nifedipin: Monographie zur Prüfung der Bioverfügbarkeit / Bioäquivalenz von schnell-freisetzenden Zubereitungen (1). Pharm Ztg. 1988; 133(6): 398–93. [German]↩︎
TGA. Guidelines for Bioavailability and Bioequivalency Studies. Draft C06:6723c (29/11/88).↩︎
Blume H, Mutschler E. Bioäquivalenz – Qualitätsbewertung wirkstoffgleicher Fertigarzneimittel: Anleitung-Methoden-Materialien. Frankfurt/Main: Govi-Verlag; 1989. [German]↩︎
McGilveray IJ, Midha KK, Skelly JP, Dighe S, Doluiso JT, French IW, Karim A, Burford R. Consensus Report from “Bio International ’89”: Issues in the Evaluation of Bioavailability Data. J Pharm Sci. 1990; 79(10): 945–6. doi:10.1002/jps.2600791022.↩︎
Keene ON. The log transformation is special. Stat Med. 1995; 14(8): 811–9. doi:10.1002/sim.4780140810. Open Access.↩︎
Diletti E, Hauschke D, Steinijans VW. Sample size determination for bioequivalence assessment by means of confidence intervals. Int J Clin Pharm Ther Toxicol. 1991; 29(1): 1–8. PMID 2004861.↩︎
Diletti E, Hauschke D, Steinijans VW. Sample size determination: Extended tables for the multiplicative model and bioequivalence ranges of 0.9 to 1.11 and 0.7 to 1.43. Int J Clin Pharm Ther Toxicol. 1992; 30(Suppl.1): S59–62. PMID 1601533.↩︎
Hauschke D, Steinijans VW, Diletti E. A distribution-free procedure for the statistical analysis of bioequivalence studies. Int J Clin Pharm Ther Toxicol. 1990; 28(2): 72–8.↩︎
Steinijans VW, Hauschke D. Update on the statistical analysis of bioequivalence studies. Int J Clin Pharm Ther Toxicol. 1990; 28(3): 105–10. PMID 2318545.↩︎
Steinijans VW, Hartmann M, Huber R, Radtke HW. Lack of pharmacokinetic interaction as an equivalence problem. Int J Clin Pharm Ther Toxicol. 1991; 29(8): 323–8. PMID 1835963.↩︎
Anderson S, Hauck WW. Consideration of individual bioequivalence. J Pharmacokinet Biopharm 1990; 18(3): 259–73. doi:10.1007/bf01062202.↩︎
Schall R, Luus HG. On population and individual bioequivalence. Stat Med 1993; 12(12): 1109–24. doi:10.1002/sim.4780121202.↩︎
Schall R. A unified view of individual, population, and average bioequivalence. In: Blume HH, Midha KK, editors. Bio-International 2. Bioavailability, Bioequivalence and Pharmacokinetic Studies. Stuttgart: medpharm; 1995: 91–106.↩︎
Chow S-C, Liu J-p. Design and Analysis of Bioavailability and Bioequivalence Studies. New York: Marcel Dekker; 1992. ISBN 0-8247-8682-3. ★↩︎
CPMP Working Party. Investigation of Bioavailability and Bioequivalence: Note for Guidance. III/54/89-EN, 8th Draft. June 1990.↩︎
Commission of the European Community. Investigation of Bioavailability and Bioequivalence. Brussels. December 1991. Online.↩︎
Amidon GL, Lennernäs H, Shah VV, Crison JR. A Theoretical Basis for a Biopharmaceutic Drug Classification: The Correlation of in Vitro Drug Product Dissolution and in Vivo Bioavailability. Pharm Res. 1995; 12(3): 413–20. doi:10.1023/a:1016212804288. Open Access.↩︎
FDA, CDER. Guidance for Industry. Statistical Procedures for Bioequivalence Studies using a Standard Two-Treatment Crossover Design. Rockville. Jul 1992. Internet Archive.↩︎
If Subject 1 is randomized to sequence \(\small{\text{TR}}\), there is not ‘another’ Subject 1 randomized to sequence \(\small{\text{RT}}\). Randomization is not like Schrödinger’s cat. Hence, the nested term in the guidelines is an insult to the mind.↩︎
Health Canada, HPFB. Guidance for Industry. Conduct and Analysis of Bioavailability and Bioequivalence Studies – Part A: Oral Dosage Formulations Used for Systemic Effects. Ottawa. 1992. Online.↩︎
WHO Marketing Authorization of Pharmaceutical Products with Special Reference to Multisource (Generic) Products: A Manual for Drug Regulatory Authorities. Geneva. 1998. Internet Archive.↩︎
Gleiter CH, Klotz U, Kuhlmann J, Blume H, Stanislaus F, Harder S, Paulus H, Poethko-Müller C, Holz-Slomczyk M. When Are Bioavailability Studies Required? A German Proposal. J Clin Pharmacol. 1998; 38: 904–11. doi:10.1002/j.1552-4604.1998.tb04385.x. Open Access.↩︎
EMEA, CPMP. Note for Guidance on the Investigation of Bioavailability and Bioequivalence. London. 26 July 2001. Online.↩︎
Midha KK, Blume HH, editors. Bio-International. Bioavailability, Bioequivalence and Pharmacokinetics. Stuttgart: medpharm; 1993. ISBN 3-88763-019-X.↩︎
Blume HH, Midha KK. Bio-International 92, Conference on Bioavailability, Bioequivalence, and Pharmacokinetic Studies. J Pharm Sci. 1993; 82(11): 1186–9. doi:10.1002/jps.2600821125.↩︎
Simultaneous administration of a stable isotope labelled IV dose would allow to calculate the true clearance in each period. Then it would not be necessary to assume identical clearances in \(\small{(3)}\) any more and the problem of highly variable drugs (inflating the CI) could be avoided. However, it would require that the IV formulation is manufactured according to the rules of cGMP and different from the internal standard in MS, which is generally not feasible. Such an approach is only mentioned in Japanese guidelines.↩︎
Blume HH, Midha KK, editors. Bio-International 2. Bioavailability, Bioequivalence and Pharmacokinetic Studies. Stuttgart: medpharm; 1995.↩︎
Boddy AW, Snikeris FC, Kringle RO, Wei GCG, Opperman JA, Midha KK. An approach for widening the bioequivalence acceptance limits in the case of highly variable drugs. Pharm Res. 1995; 12(12): 1865–8. doi:10.1023/a:1016219317744.↩︎
Shah VP, Yacobi A, Barr WH, Benet LZ, Breimer D, Dobrinska MR, Endrényi L, Fairweather W, Gillespie W, Gonzalez MA, Hooper J, Jackson A, Lesko LL, Midha KK, Noonan PK, Patnaik R, Williams RL. Workshop Report. Evaluation of Orally Administered Highly Variable Drugs and Drug Formulations. Pharm Res. 1996; 13(11): 1590–4. doi:10.1023/a:1016468018478.↩︎
Schütz H, Labes D, Wolfsegger MJ. Critical Remarks on Reference-Scaled Average Bioequivalence. J Pharm Pharmaceut Sci. 2022; 25: 285–96. doi:10.18433/jpps32892.↩︎
EMEA Human Medicines Evaluation Unit / CPMP. Note for Guidance on the Investigation of Bioavailability and Bioequivalence. Draft. London. 17 December 1998.↩︎
Brooks MA, Weifeld RE. A Validation Process for Data from the Analysis of Drugs in Biological Fluids. Drug Devel Ind Pharm. 1985; 11: 1703–28.↩︎
Pachla LA, Wright DS, Reynolds DL. Bioanalytical Considerations for Pharmacokinetic and Biopharmaceutic Studies. J Clin Pharmacol. 1986; 26(5): 332–5. doi:10.1002/j.1552-4604.1986.tb03534.x.↩︎
Buick AR, Doig MV, Jeal SC, Land GS, McDowall RD, Method Validation in the Bioanalytical Laboratory. J Pharm Biomed Anal. 1990; 8(8–12): 629–37. doi:10.1016/0731-7085(90)80093-5. Open Access.↩︎
Karnes ST, Shiu G, Shah VP. Validation of Bioanalytical Methods. Pharm Res. 1991; 8(4): 421–6. doi:10.1023/a:1015882607690.↩︎
AAPS, FDA, FIP, HPB, AOAC. Analytical Methods Validation: Bioavailability, Bioequivalence and Pharmacokinetic Studies. Arlington, VA. December 3–5, 1990.↩︎
Shah VP, Midha KK, Dighe S, McGilveray IJ, Skelly JP, Yacobi A, Layloff T, Viswanathan CT, Cook CE, McDowall RD, Pittman, Spector S. Analytical methods validation: Bioavailability, bioequivalence and pharmacokinetic studies. Eur J Drug Metab Pharmacokinet. 1991; 16(4): 249–55. doi:10.1007/bf03189968.↩︎
Shah VP, Midha KK, Findlay JWA, Hill HM, Hulse JD, McGilveray IJ, McKay G, Miller KJ, Patnaik RN, Powell ML, Tonelli A, Viswanathan CT, Yacobi A. Bioanalytical Method Validation – A Revisit with a Decade of Progress. Pharm Res. 2000; 17: 1551–7. doi:10.1023/a:1007669411738↩︎
Viswanathan CT, Bansal S, Booth B, DeStefano AJ, Rose MJ, Sailstad J, Shah VP, Skelly JP, Swann PG, Weiner R. Workshop / Conference Report – Quantitative Bioanalytical Methods Validation and Implementation: Best Practices for Chromatographic and Ligand Binding Assays. AAPS J. 2007; 24(10): 1962–73. doi:10.1007/s11095-007-9291-7.↩︎
Anderson S. Individual Bioequivalence: A problem of Switchability. Biopharm Rep. 1993; 2(2): 1–11.↩︎
Endrényi L, Schulz M. Individual Variation and the Acceptance of Average Bioequivalence. Drug Inform J. 1993; 27(1): 195–201. doi:10.1177/009286159302700135.↩︎
Endrényi L. A method for the evaluation of individual bioequivalence. Int J Clin Pharmacol. 1994; 32(9): 497–508. PMID 7820334.↩︎
Esinhart JD, Chinchilli VM. Extension to use of tolerance intervals for the assessment of individual bioequivalence. J Biopharm Stat. 1994; 4: 39–52. doi:10.1080/10543409408835071.↩︎
Chow S-C, Liu J-p. Current issues in bioequivalence trials. Drug Inform J. 1995; 29: 795–804. doi:10.1177/009286159502900302.↩︎
Chen ML. Individual bioequivalence. A regulatory update. J Biopharm Stat. 1997. 7(1): 5–11. doi:10.1080/10543409708835162.↩︎
Hauck WW, Anderson S. Commentary on individual bioequivalence by ML Chen. J Biopharm Stat. 1997; 7(1): 13–6. doi:10.1080/10543409708835163.↩︎
Liu J-p, Chow S-C. Some thoughts on individual bioequivalence. J Biopharm Stat. 1997; 7(1): 41–8. doi:10.1080/10543409708835168.↩︎
Midha KK, Rawson MJ, Hubbard JW. Prescribability and switchability of highly variable drugs and drug products. J Contr Rel. 1999; 62(1-2): 33–40. doi:10.1016/s0168-3659(99)00050-4.↩︎
FDA, CDER. Guidance for Industry. Statistical Approaches to Establishing Bioequivalence. Rockville. January 2001. Download.↩︎
Chow S-C, Shao J, Wang H. Individual bioequivalence testing under 2 × 3 designs. Stat Med. 2002; 21(5): 629–48. doi:10.1002/sim.1056.↩︎
Chow S-C, Liu J-p. Design and Analysis of Bioavailability and Bioequivalence Studies. Boca Raton: Chapman & Hall/CRC Press; 3rd edition 2009. ISBN 978-1-58488-668-6. ★ p. 596–8.↩︎
Hauschke D, Steinijans VW, Pigeot I. Bioequivalence Studies in Drug Development. Methods and Applications. Chichester: Wiley; 2007. ISBN 0-470-09475-3. ★ p. 209.↩︎
Patterson S. A Review of the Development of Biostatistical Design and Analysis Techniques for Assessing In Vivo Bioequivalence: Part Two. Ind J Pharm Sci. 2001; 63(3): 169–86. Open Access.↩︎
FDA, CDER. Guidance for Industry. Bioavailability and Bioequivalence Studies for Orally Administered Drug Products — General Considerations. Rockville. March 2003. Internet Archive.↩︎
Schall R, Endrényi L. Bioequivalence: tried and tested. Cardiovasc J Afr. 2010; 21(2): 69–70. PMCID 3721767. Free Full Text.↩︎
Senn S. Conference Proceedings: Challenging Statistical Issues in Clinical Trials. Decisions and Bioequivalence. 2000.↩︎
FDA, CDER, CVM. Guidance for Industry. Bioanalytical Method Validation. Rockville. May 2001. Internet Archive.↩︎
FDA, CDER, CVM. Guidance for Industry. Bioanalytical Method Validation. Silver Spring. May 2018. Download.↩︎
EMEA, CHMP. Guideline on Validation of Bioanalytical Methods. Draft. London. 19 November 2009. Online.↩︎
EMA, CHMP. Guideline on Validation of Bioanalytical Methods. London. 21 July 2011. Online.↩︎
ICH. Bioanalytical Method Validation and Study Sample Analysis. M10. 22 May 2022. Online.↩︎
FDA, CDER. Guidance for Industry. Bioequivalence Studies With Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA. Draft. Silver Spring. August 2021. Download.↩︎
EMA, CHMP. Guideline on the Investigation of Bioequivalence. London. 20 January 2010. Online.↩︎
Health Canada. Guidance Document. Comparative Bioavailability Standards: Formulations Used for Systemic Effects. Ottawa. 2018/06/08. Online.↩︎
WHO/PQT: medicines. Application of reference-scaled criteria for AUC in bioequivalence studies conducted for submission to PQT/MED. Geneva. 02 July 2021. Online.↩︎
Schütz H. Highly Variable Drugs and Type I Error. Presentation at: 6th International Workshop – GBHI 2024. Rockville, MD. 16 April 2024. Online.↩︎
Paixão P, García Arieta A, Silva N, Petric Z, Bonelli M, Morais JAG, Blake K, Gouveia LF. A Two-Way Proposal for the Determination of Bioequivalence for Narrow Therapeutic Index Drugs in the European Union. Pharmaceut. 2024; 16: 598. doi:10.3390/pharmaceutics16050598. Open Access.↩︎
Hofmann J. Bioequivalence of early exposure: tmax & pAUC. Presentation at: BioBridges. Prague. 21 September 2023. Online.↩︎
Abdallah HY. An area correction method to reduce intrasubject variability in bioequivalence studies. J Pharm Pharmaceut Sci. 1998; 1(2): 60–5. Open Access.↩︎
Paixão P, Gouveia LF, Morais JAG. An alternative single dose parameter to avoid the need for steady-state studies on oral extended-release drug products. Eur J Pharmaceut Biopharmaceut. 2012; 80(2): 410–7. doi:10.1016/j.ejpb.2011.11.001.↩︎
Senn S. Cross-over Trials in Clinical Research. Chichester: Wiley; 2nd edition 2002. ISBN 0-471-49653-7. ★↩︎
Wellek S. Testing Statistical Hypotheses of Equivalence. Boca Raton: Chapman & Hall/CRC; 2003. ISBN 978-1-5848-8160-5. ★↩︎
Amidon G, Lesko L, Midha K, Shah V, Hilfinger J. International Bioequivalence Standards: A New Era. Ann Arbor: TSRL; 2006. ISBN 10-0-9790119-0-6.↩︎
Kanfer I, Shargel L, editors. Generic Product Development. International Regulatory Requirements for Bioequivalence. New York: informa healthcare; 2010. ISBN 978-0-8493-7785-3.↩︎
Bolton S, Bon C. Pharmaceutical Statistics. Practical and Clinical Applications. New York: informa healthcare; 5th edition 2010. ISBN 978-1-4200-7422-2. ★↩︎
Davit B, Braddy AC, Conner DP, Yu LX. International Guidelines for Bioequivalence of Systemically Available Orally Administered Generic Drug Products: A Survey of Similarities and Differences. AAPS J. 2013; 15(4): 974–90. doi:10.1208/s12248-013-9499-x. Free Full Text.↩︎
Yu LX, Li BV, editors. FDA Bioequivalence Standards. New York: Springer; 2014. ISBN 978-1-4939-1251-0.↩︎
Jones B, Kenward MG. Design and Analysis of Cross-Over Trials. Boca Raton: CRC Press. 3rd edition 2015. ISBN 978-1-4398-6142-4. ★↩︎
Kanfer I, editor. Bioequivalence Requirements in Various Global Jurisdictions. New York: Springer; 2017. ISBN 978-3-319-88542-1.↩︎
Patterson S, Jones B. Bioequivalence and Statistics in Clinical Pharmacology. Boca Raton: CRC Press; 2nd edition 2019. ISBN 978-0-3677-8244-3. ★↩︎