Examples in this article were generated with R 4.0.5 by the package PowerTOST.^{1}
More examples are given in the respective vignette.^{2} See also the README on GitHub for an overview and the online manual^{3} for details and a collection of other articles.
Abbreviation  Meaning 

(A)BE  (Average) Bioequivalence 
ABEL  Average Bioequivalence with Expanding Limits 
CV_{b}  Between-subject Coefficient of Variation 
CV_{w}  Within-subject Coefficient of Variation 
CV_{wT}, CV_{wR}  Within-subject Coefficient of Variation of the Test and Reference treatment 
H_{0}  Null hypothesis 
H_{1}  Alternative hypothesis (also H_{a}) 
HVD(P)  Highly Variable Drug (Product) 
SABE  Scaled Average Bioequivalence 
What is Average Bioequivalence with Expanding Limits?
For background about inferential statistics see the article about average bioequivalence in a replicate design.
Definitions:
The concept of Scaled Average Bioequivalence (SABE) for HVD(P)s is based on the following considerations:
The conventional confidence interval inclusion approach of ABE \[\begin{matrix}\tag{1}
\theta_1=1-\Delta,\;\theta_2=\left(1-\Delta\right)^{-1}\\
H_0:\;\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\notin\left\{\theta_1,\,\theta_2\right\}\;vs\;H_1:\;\theta_1<\frac{\mu_\textrm{T}}{\mu_\textrm{R}}<\theta_2,
\end{matrix}\] where \(\small{H_0}\) is the null hypothesis of inequivalence and \(\small{H_1}\) the alternative hypothesis, \(\small{\theta_1}\) and \(\small{\theta_2}\) are the fixed lower and upper limits of the acceptance range, and \(\small{\mu_\textrm{T}}\) and \(\small{\mu_\textrm{R}}\) are the geometric least squares means of \(\small{\textrm{T}}\) and \(\small{\textrm{R}}\), respectively
is in Scaled Average Bioequivalence (SABE)^{5} modified to \[H_0:\;\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\Big{/}\sigma_\textrm{wR}\notin\left\{\theta_{\textrm{s}_1},\,\theta_{\textrm{s}_2}\right\}\;vs\;H_1:\;\theta_{\textrm{s}_1}<\frac{\mu_\textrm{T}}{\mu_\textrm{R}}\Big{/}\sigma_\textrm{wR}<\theta_{\textrm{s}_2},\tag{2}\] where \(\small{\sigma_\textrm{wR}}\) is the within-subject standard deviation of the reference and the scaled limits \(\small{\left\{\theta_{\textrm{s}_1},\,\theta_{\textrm{s}_2}\right\}}\) of the acceptance range depend on conditions given by the agency.
Average Bioequivalence with Expanding Limits (ABEL)^{6} for HVD(P)s is acceptable in numerous jurisdictions (see Fig. 2), whereas directly widening the limits is recommended in the member states of the Gulf Cooperation Council (see Fig. 3).
Alas, we are far away from global harmonization. Expanding / widening of the limits is acceptable for different pharmacokinetic metrics.
PK metric  Jurisdiction 

C_{max}  EMA,^{7} the WHO,^{8} ^{9} Australia,^{10} the East African Community,^{11} ASEAN states,^{12} the Eurasian Economic Union,^{13} Egypt,^{14} New Zealand,^{15} Chile,^{16} Brazil,^{17} Canada,^{18} GCC.^{19} 
AUC  Canada; WHO (if in a 4-period full replicate design). 
C_{min}, C_{τ}  EMA (controlled release products in steady state). 
_{partial} AUC  EMA (controlled release products). 
Based on the switching \(\small{CV_0=30\%}\) we get the switching standard deviation \(\small{s_0=\sqrt{\log_{e}(CV_{0}^{2}+1)}\approx0.2935604\ldots}\), the (rounded) regulatory constant \(\small{k=\frac{\log_{e}1.25}{s_0}\sim0.760}\), and finally the expanded limits \(\small{\left\{\theta_{\textrm{s}_1},\theta_{\textrm{s}_2}\right\}=100\left(\exp(\mp0.760\cdot s_{\textrm{wR}})\right)}\).
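The arithmetic behind the regulatory constant can be retraced in a few lines of base R. This is only a sketch; the variable names (`CV0`, `s0`, `k`, `k.r`) are chosen here for illustration and are not taken from any package.

```r
# Sketch in base R: derive the regulatory constant from the switching CV
CV0 <- 0.30                  # switching CV (30%), given as a ratio
s0  <- sqrt(log(CV0^2 + 1))  # switching standard deviation
k   <- log(1.25) / s0        # unrounded regulatory constant
k.r <- round(k, 3)           # rounded to three decimals as in the text
print(c(s0 = s0, k = k, k.rounded = k.r))
```

The unrounded value differs slightly from the rounded constant, which is why limits computed with the rounded constant at exactly 30% are not precisely 80.00 and 125.00% before rounding.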
In order to apply the methods the following conditions have to be fulfilled:
The wording in the guidelines gives the false impression that the methods are straightforward. In fact they are decision schemes which hinge on the estimated variance of the reference treatment \(\small{s_{\textrm{wR}}^{2}}\).
If \(\small{CV_\textrm{wR}\leq30\%}\) the study has to be assessed for ABE (left branch), otherwise for ABEL (right branch).
In the ABEL-branch there is an ‘upper cap’ of scaling (\(\small{uc=50\%}\) except for Health Canada, where \(\small{uc\approx 57.382\%}\)). Furthermore, the point estimate (\(\small{PE}\)) has to lie within 80.00 – 125.00%.
The Gulf Cooperation Council recommends a simplified variant with fixed widened limits of 75.00 – 133.33% if \(\small{CV_\textrm{wR}>30\%}\). There is no upper cap of scaling. The \(\small{PE}\) also has to lie within 80.00 – 125.00%.
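The decision scheme described above can be sketched in base R. The function `abel.limits()` is a hypothetical helper written for this illustration (it is not part of PowerTOST), it assumes the rounded constant 0.76 and the caps stated in the text, and it returns only the limits; the additional PE-constraint of 80.00 – 125.00% is not checked here.

```r
# Hypothetical helper (not a package function): limits of the decision scheme
# CVwR is given as a ratio, not in percent
abel.limits <- function(CVwR, regulator = c("EMA", "HC", "GCC")) {
  regulator <- match.arg(regulator)
  # ABE branch: conventional limits
  if (CVwR <= 0.30) return(c(lower = 0.80, upper = 1.25))
  # GCC: fixed widened limits, no upper cap
  if (regulator == "GCC") return(c(lower = 0.75, upper = 1 / 0.75))
  k   <- 0.76                                    # regulatory constant
  cap <- if (regulator == "HC") 0.57382 else 0.50 # upper cap of scaling
  swR <- sqrt(log(min(CVwR, cap)^2 + 1))         # scaling stops at the cap
  c(lower = exp(-k * swR), upper = exp(+k * swR))
}
round(100 * abel.limits(0.45), 2)         # expanded limits (EMA)
round(100 * abel.limits(0.60), 2)         # capped at 50% (EMA)
round(100 * abel.limits(0.60, "HC"), 2)   # capped at 57.382% (Health Canada)
round(100 * abel.limits(0.45, "GCC"), 2)  # fixed widened limits
```

For actual work the package's own function should be preferred; the sketch only makes the branches of the scheme explicit.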
Since the applicability of these approaches depends on the realized values (\(\small{CV_\textrm{wR}}\), \(\small{PE}\)) in the particular study – which are naturally unknown beforehand – analytical solutions for power (and hence, the sample size) do not exist.
Therefore, extensive simulations of potential combinations have to be employed.
Cave: Under certain conditions the methods may lead to an inflated Type I Error (increased patient’s risk).^{20} It will be elaborated in another article.
For the evaluation I recommend the package replicateBE^{21} (note that evaluation for the GCC is not implemented yet).
Where do these numbers come from?
With the regulatory constant \(\small{k=0.76}\) we get in the \(\small{\log_{e}}\)-scale a linear expansion from \(\small{CV_\textrm{wR}=30\%}\) to \(\small{uc}\) based on \(\small{s_\textrm{wR}=\sqrt{\log_{e}(CV_\textrm{wR}^{2}+1)}}\) and \(\small{100\left(\exp(\mp k\cdot s_\textrm{wR})\right)}\).
Hence, with \(\small{CV_\textrm{wR}=30\%}\) we get
\(\small{\left\{\theta_1,\theta_2\right\}=100\left(\exp(\mp0.76\cdot 0.2935604)\right)=\left\{80.00,\,125.00\right\}}\).
With \(\small{uc=50\%}\) we get
\(\small{\left\{\theta_{\textrm{s}_1},\,\theta_{\textrm{s}_2}\right\}=100\left(\exp(\mp0.76\cdot 0.4723807)\right)=\left\{69.83678,\,143.19102\right\}}\).
For Health Canada with \(\small{uc=57.382\%}\) we get \(\small{\left\{\theta_{\textrm{s}_1},\,\theta_{\textrm{s}_2}\right\}=100\left(\exp(\mp0.76\cdot 0.5335068)\right)=\left\{66.66667,\,150.00000\right\}}\).
A clinically not relevant \(\small{\Delta}\) of 30% (leading to a fixed range of 70.00 – 142.86%) was acceptable in Europe (in exceptional cases even for AUC) if prespecified in the protocol. A replicate design was not required.^{22} ^{23} ^{24}
A clinically not relevant \(\small{\Delta}\) of 25% for C_{max} (75.00 – 133.33%) was acceptable for the EMA if the study was performed in a replicate design and \(\small{CV_\textrm{wR}>30\%}\).^{25} A clinically not relevant \(\small{\Delta}\) of 25% for C_{max} (75.00 – 133.33%) is currently acceptable in South Africa.^{26}
Hence, this excursion into history may explain the upper cap of scaling in ABEL and the widened limits for the GCC.
I assume that Health Canada’s 66.7 – 150.0% are no more than ‘nice numbers’ – as usual.^{27}
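The fixed ranges quoted in this excursion follow directly from \(\small{\theta_1=1-\Delta}\) and \(\small{\theta_2=(1-\Delta)^{-1}}\). A minimal base R sketch; `fixed.limits()` is a name made up for this illustration.

```r
# Hypothetical helper: fixed limits from a clinically not relevant difference
fixed.limits <- function(Delta) {
  c(lower = 1 - Delta, upper = 1 / (1 - Delta))
}
Delta <- c(0.20, 0.25, 0.30)
res   <- t(sapply(Delta, fixed.limits)) # one row per Delta
data.frame(Delta = Delta, round(100 * res, 2))
```

The first row gives the conventional limits of ABE, the second the widened limits still in use for the GCC, and the third the range formerly accepted in Europe.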
A basic knowledge of R is required. To run the scripts at least version 1.4.8 (2019-08-29) of PowerTOST is required and 1.5.3 (2021-01-18) suggested.
Any version of R would likely do, though the current release of PowerTOST was only tested with version 3.6.3 (2020-02-29) and later.
Note that in all functions of PowerTOST the arguments (say, the assumed T/R-ratio theta0, the assumed coefficient of variation CV, etc.) have to be given as ratios and not in percent.
sampleN.scABEL() gives balanced sequences (i.e., an equal number of subjects is allocated to all sequences). Furthermore, the estimated sample size is the total number of subjects.
All examples deal with studies where the response variables likely follow a lognormal distribution, i.e., we assume a multiplicative model (ratios instead of differences). We work with \(\small{\log_{e}}\)-transformed data in order to allow analysis by the t-test (requiring differences).
It may sound picky but ‘sample size calculation’ (as used in most guidelines and alas, in some publications and textbooks) is sloppy terminology. In order to get prospective power (and hence, a sample size), we need five values:
where
In other words, obtaining a sample size is not an exact calculation like \(\small{2\times2=4}\) but always just an estimation.
“Power Calculation – A guess masquerading as mathematics.”
Of note, it is extremely unlikely that all assumptions will be exactly realized in a particular study. Hence, calculating retrospective (a.k.a. post hoc, a posteriori) power is not only futile but plain nonsense.^{29}
Crossover studies are so popular because generally the within-subject variability \(\small{CV_\textrm{w}}\) is smaller than the between-subject variability \(\small{CV_\textrm{b}}\). Of note, there is no relationship between \(\small{CV_\textrm{w}}\) and \(\small{CV_\textrm{b}}\). Examples are drugs which are subject to polymorphic metabolism. For these drugs \(\small{CV_\textrm{w}\ll CV_\textrm{b}}\).
Except for Health Canada (where a mixed-effects model is required) the recommended evaluation by an ANOVA assumes homoscedasticity (\(\small{CV_\textrm{wR}=CV_\textrm{wT}}\)), which is – more often than not – wrong.
It is a prerequisite that no carryover from one period to the next exists. Only then the comparison of treatments will be unbiased.^{30} Carryover is elaborated in another article.
Studies in a replicate design can be performed not only in healthy volunteers but also in patients with a stable disease (e.g., asthma).
The sample size cannot be directly estimated; in SABE only power can be simulated for an already given sample size.
“Power. That which statisticians are always calculating but never have.”
Let’s start with PowerTOST.
library(PowerTOST) # attach it to run the examples
The sample size functions’ defaults are:
Argument  Default  Meaning 

alpha  0.05  Nominal level of the test. 
targetpower  0.80  Target (desired) power. 
theta0  0.90  Assumed T/R-ratio. 
theta1  0.80  Lower BE-limit in ABE and lower PE-constraint in ABEL. 
theta2  1.25  Upper BE-limit in ABE and upper PE-constraint in ABEL. 
design  "2x3x3"  Treatments × Sequences × Periods. 
regulator  "EMA"  Guess… 
nsims  1e+05  Number of simulations. 
print  TRUE  Output to the console. 
details  TRUE  Show regulatory settings and sample size search. 
setseed  TRUE  Set a fixed seed (recommended for reproducibility). 
For a quick overview of the regulatory limits use the function scABEL() – for once in percent according to the guidelines.
df1 <- data.frame(regulator = "EMA", CV = c(30, 50),
                  L = NA, U = NA,
                  cap = c("lower", "upper"))
df2 <- data.frame(regulator = "HC",
                  CV = c(30, 57.382),
                  L = NA, U = NA,
                  cap = c("lower", "upper"))
# limits at the switching CV and at the upper cap
for (i in 1:2) {
  df1[i, 3:4] <- sprintf("%.2f%%", 100*scABEL(df1$CV[i]/100))
  df2[i, 3:4] <- sprintf("%.1f%%", 100*scABEL(df2$CV[i]/100,
                                              regulator = "HC"))
}
df1$CV <- sprintf("%.3f%%", df1$CV)
df2$CV <- sprintf("%.3f%%", df2$CV)
# the GCC's approach requires PowerTOST 1.5.3 or later
if (packageVersion("PowerTOST") >= "1.5.3") {
  df3 <- data.frame(regulator = "GCC",
                    CV = c(30, 50), L = NA, U = NA,
                    cap = c("lower", "  "))
  for (i in 1:2) {
    df3[i, 3:4] <- sprintf("%.2f%%", 100*scABEL(df3$CV[i]/100,
                                                regulator = "GCC"))
  }
  df3$CV <- sprintf("%.3f%%", df3$CV)
}
if (packageVersion("PowerTOST") >= "1.5.3") {
  print(df1, row.names = FALSE); print(df2, row.names = FALSE); print(df3, row.names = FALSE)
} else {
  print(df1, row.names = FALSE); print(df2, row.names = FALSE)
}
R> regulator CV L U cap
R> EMA 30.000% 80.00% 125.00% lower
R> EMA 50.000% 69.84% 143.19% upper
R> regulator CV L U cap
R> HC 30.000% 80.0% 125.0% lower
R> HC 57.382% 66.7% 150.0% upper
R> regulator CV L U cap
R> GCC 30.000% 80.00% 125.00% lower
R> GCC 50.000% 75.00% 133.33% 
The sample size functions of PowerTOST use a modification of Zhang’s method^{31} for the first guess.
# Note that theta0 = 0.90 is the default
sampleN.scABEL(CV = 0.45, targetpower = 0.80,
design = "2x2x4", details = TRUE)
R>
R> +++++++++++ scaled (widened) ABEL +++++++++++
R> Sample size estimation
R> (simulation based on ANOVA evaluation)
R> 
R> Study design: 2x2x4 (4 period full replicate)
R> log-transformed data (multiplicative model)
R> 1e+05 studies for each step simulated.
R>
R> alpha = 0.05, target power = 0.8
R> CVw(T) = 0.45; CVw(R) = 0.45
R> True ratio = 0.9
R> ABE limits / PE constraint = 0.8 ... 1.25
R> EMA regulatory settings
R> - CVswitch = 0.3
R> - cap on scABEL if CVw(R) > 0.5
R> - regulatory constant = 0.76
R> - pe constraint applied
R>
R>
R> Sample size search
R> n power
R> 24 0.7539
R> 26 0.7846
R> 28 0.8112
An alternative to simulating the ‘key statistics’ is by subject simulations via the function sampleN.scABEL.sdsims(). However, it comes with a price: speed.
R> method n power rel.speed
R> ‘key statistics’ 28 0.81116 1
R> subject simulations 28 0.81196 18
Throughout the examples I’m referring to studies in a single center – not multiple groups within them or multicenter studies. That’s another story.
We assume a CV of 0.45, a T/R-ratio of 0.90, a target power of 0.80, and want to perform the study in a 2-sequence 4-period full replicate design (TRTR|RTRT or TRRT|RTTR or TTRR|RRTT) for the EMA’s ABEL.
Since theta0 = 0.90,^{32} targetpower = 0.80, and regulator = "EMA" are defaults of the function, we don’t have to give them explicitly. As usual in bioequivalence, alpha = 0.05 is employed (we will assess the study by a \(\small{100(1-2\,\alpha)=90\%}\) confidence interval). Hence, you need to specify only the CV (assuming \(\small{CV_\textrm{wT}=CV_\textrm{wR}}\)) and design = "2x2x4".
To shorten the output, use the argument details = FALSE.
sampleN.scABEL(CV = 0.45, design = "2x2x4", details = FALSE)
R>
R> +++++++++++ scaled (widened) ABEL +++++++++++
R> Sample size estimation
R> (simulation based on ANOVA evaluation)
R> 
R> Study design: 2x2x4 (4 period full replicate)
R> log-transformed data (multiplicative model)
R> 1e+05 studies for each step simulated.
R>
R> alpha = 0.05, target power = 0.8
R> CVw(T) = 0.45; CVw(R) = 0.45
R> True ratio = 0.9
R> ABE limits / PE constraint = 0.8 ... 1.25
R> Regulatory settings: EMA
R>
R> Sample size
R> n power
R> 28 0.8112
Sometimes we are not interested in the entire output and want to use only a part of the results in subsequent calculations. We can suppress the output by stating the additional argument print = FALSE and assign the result to a data.frame (here df).
df <- sampleN.scABEL(CV = 0.45, design = "2x2x4",
                     details = FALSE, print = FALSE)
Although you could access the elements by the number of the column(s), I don’t recommend that, since in various functions these numbers are different and hence, difficult to remember.
Let’s retrieve the column names of df:
names(df)
R>  [1] "Design"         "alpha"          "CVwT"
R>  [4] "CVwR"           "theta0"         "theta1"
R>  [7] "theta2"         "Sample size"    "Achieved power"
R> [10] "Target power"   "nlast"
Now we can access the elements of df by their names. Note that double square brackets [[…]] have to be used (single ones are used to access elements by their numbers).
df[["Sample size"]]
R> [1] 28
df[["Achieved power"]]
R> [1] 0.81116
If you insist on accessing elements by column-numbers, use single square brackets […].
df[8:9]
R>   Sample size Achieved power
R> 1          28        0.81116
With 28 subjects (14 per sequence) we achieve the power we desire.
What happens if we have one dropout?
power.scABEL(CV = 0.45, design = "2x2x4",
             n = df[["Sample size"]] - 1)
R> Unbalanced design. n(i)=14/13 assumed.
R> [1] 0.79848
Below the 0.80 we desire.
Since dropouts are common, it makes sense to include / dose more subjects in order to end up with a number of eligible subjects which is not lower than our initial estimate.
Let us explore that in the next section.
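We define two supportive functions:
The first provides equally sized sequences, i.e., any total sample size n will be rounded up to achieve balance.
balance <- function(n, n.seq) {
  return(as.integer(n.seq * (n %/% n.seq + as.logical(n %% n.seq))))
}
The second provides the adjusted sample size based on the estimated sample size n and the anticipated dropout-rate do.rate.
nadj <- function(n, do.rate, n.seq) {
  return(as.integer(balance(n / (1 - do.rate), n.seq)))
}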
In order to come up with a suggestion we have to anticipate a (realistic!) dropout rate. Note that this is not the job of the statistician; ask the Principal Investigator.
“It is a capital mistake to theorise before one has data.”
The dropout-rate is calculated from the eligible and dosed subjects
or simply \[\begin{equation}\tag{3}
do.rate=1-n_\textrm{eligible}/n_\textrm{dosed}
\end{equation}\] Of course, we know it only after the study was performed.
By substituting \(n_\textrm{eligible}\) with the estimated sample size \(n\), providing an anticipated dropout-rate and rearrangement to find the adjusted number of dosed subjects \(n_\textrm{adj}\) we should use \[\begin{equation}\tag{4} n_\textrm{adj}=\;\upharpoonleft n\,/\,(1-do.rate) \end{equation}\] where \(\upharpoonleft\) denotes rounding up to the next even number as implemented in the functions above.
An all too common mistake is to increase the estimated sample size \(n\) by the dropout-rate according to \[\begin{equation}\tag{5} n_\textrm{adj}=\;\upharpoonleft n\times(1+do.rate) \end{equation}\] If you used \((5)\) in the past – you are not alone. In a small survey a whopping 29% of respondents reported using it.^{33} Consider changing your routine.
“There are no routine statistical questions, only questionable statistical routines.”
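To see where (4) and (5) part ways, here is a small base R sketch. The helpers `up.even()`, `n.adj4()` and `n.adj5()` are hypothetical names introduced only for this comparison, and the number of eligible subjects assumes that the anticipated dropout-rate is exactly realized (rounded down to whole subjects).

```r
# Hypothetical helpers mirroring equations (4) and (5); not package functions
up.even <- function(x) 2 * ceiling(x / 2)                  # round up to the next even number
n.adj4  <- function(n, do.rate) up.even(n / (1 - do.rate)) # equation (4)
n.adj5  <- function(n, do.rate) up.even(n * (1 + do.rate)) # equation (5)
n       <- 28                                              # estimated sample size
do.rate <- c(0.10, 0.15, 0.25)                             # anticipated dropout-rates
res     <- data.frame(do.rate = do.rate,
                      eq4 = n.adj4(n, do.rate),
                      eq5 = n.adj5(n, do.rate))
# eligible subjects if the anticipated dropout-rate is realized
res$eligible4 <- floor(res$eq4 * (1 - res$do.rate))
res$eligible5 <- floor(res$eq5 * (1 - res$do.rate))
print(res, row.names = FALSE)
```

For the two lower rates both equations happen to give the same number of dosed subjects. With a rate of 25%, (5) suggests dosing 36 subjects, which leaves only 27 eligible ones, i.e., fewer than the 28 we estimated.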
In the following I specified more arguments to make the function more flexible.
Note that I wrapped the function power.scABEL() in suppressMessages(). Otherwise, the function will throw for any odd sample size a message telling us that the design is unbalanced. Well, we know that.
CV      <- 0.45 # within-subject CV
target  <- 0.80 # target (desired) power
theta0  <- 0.90 # assumed T/R-ratio
design  <- "2x2x4"
do.rate <- 0.15 # anticipated dropout-rate 15%
                # might be relatively high due
                # to the 4 periods
n.seq   <- as.integer(substr(design, 3, 3))
lims    <- scABEL(CV) # expanded limits
df      <- sampleN.scABEL(CV = CV, theta0 = theta0,
                          targetpower = target,
                          design = design,
                          details = FALSE,
                          print = FALSE)
# calculate the adjusted sample size
n.adj   <- nadj(df[["Sample size"]], do.rate, n.seq)
# (decreasing) vector of eligible subjects
n.elig  <- n.adj:df[["Sample size"]]
info    <- paste0("Assumed CV : ",
                  CV,
                  "\nAssumed T/R ratio : ",
                  theta0,
                  "\nExpanded limits : ",
                  sprintf("%.4f\u2026%.4f",
                          lims[1], lims[2]),
                  "\nPE constraints : ",
                  sprintf("%.4f\u2026%.4f",
                          0.80, 1.25), # fixed in ABEL
                  "\nTarget (desired) power : ",
                  target,
                  "\nAnticipated dropout-rate: ",
                  do.rate,
                  "\nEstimated sample size : ",
                  df[["Sample size"]], " (",
                  df[["Sample size"]]/n.seq, "/sequence)",
                  "\nAchieved power : ",
                  signif(df[["Achieved power"]], 4),
                  "\nAdjusted sample size : ",
                  n.adj, " (", n.adj/n.seq, "/sequence)",
                  "\n\n")
# explore the potential outcome for
# an increasing number of dropouts
do.act <- signif((n.adj - n.elig) / n.adj, 4)
df     <- data.frame(dosed = n.adj,
                     eligible = n.elig,
                     dropouts = n.adj - n.elig,
                     do.act = do.act,
                     power = NA)
for (i in 1:nrow(df)) {
  df$power[i] <- suppressMessages(
                   power.scABEL(CV = CV,
                                theta0 = theta0,
                                design = design,
                                n = df$eligible[i]))
}
cat(info); print(round(df, 4), row.names = FALSE)
R> Assumed CV : 0.45
R> Assumed T/R ratio : 0.9
R> Expanded limits : 0.7215…1.3859
R> PE constraints : 0.8000…1.2500
R> Target (desired) power : 0.8
R> Anticipated dropout-rate: 0.15
R> Estimated sample size : 28 (14/sequence)
R> Achieved power : 0.8112
R> Adjusted sample size : 34 (17/sequence)
R> dosed eligible dropouts do.act power
R> 34 34 0 0.0000 0.8720
R> 34 33 1 0.0294 0.8630
R> 34 32 2 0.0588 0.8553
R> 34 31 3 0.0882 0.8456
R> 34 30 4 0.1176 0.8340
R> 34 29 5 0.1471 0.8237
R> 34 28 6 0.1765 0.8112
In the worst case (6 dropouts) we end up with the originally estimated sample size of 28. Power preserved, mission accomplished. If we have fewer dropouts, splendid – we gain power.
Had we adjusted the sample size acc. to (5), we would also have dosed 34 subjects.
Cave: This might not always be the case… If the anticipated dropout rate of 15% is realized in the study, we would also have 28 eligible subjects (power 0.8112). In this example we still achieve more than our target power but the loss might be relevant in other cases.
As said in the preliminaries, calculating post hoc power is futile.
“There is simple intuition behind results like these: If my car made it to the top of the hill, then it is powerful enough to climb that hill; if it didn’t, then it obviously isn’t powerful enough. Retrospective power is an obvious answer to a rather uninteresting question. A more meaningful question is to ask whether the car is powerful enough to climb a particular hill never climbed before; or whether a different car can climb that new hill. Such questions are prospective, not retrospective.”
However, sometimes we are interested in it for planning the next study.
If you give an odd total sample size n, power.scABEL() will try to keep sequences as balanced as possible and show in a message how that was done.
n.act <- 27
signif(power.scABEL(CV = 0.45, n = n.act,
                    design = "2x2x4"), 6)
R> Unbalanced design. n(i)=14/13 assumed.
R> [1] 0.79848
Say, our study was more unbalanced. Let us assume that we dosed 34 subjects, the total number of eligible subjects was again 27, but all dropouts occurred in one sequence (unlikely but possible).
Instead of the total sample size n we can give the number of subjects of each sequence as a vector (the order is generally^{34} not relevant, i.e., it does not matter which element refers to which sequence).
By setting details = TRUE we can retrieve the components of the simulations (probability to pass each test).
design   <- "2x2x4"
CV       <- 0.45
n.adj    <- 34
n.act    <- 27
n.s1     <- n.adj / 2    # no dropouts in sequence 1
n.s2     <- n.act - n.s1 # all dropouts in sequence 2
theta0   <- 0.90
post.hoc <- suppressMessages(
              power.scABEL(CV = CV,
                           n = c(n.s1, n.s2),
                           theta0 = theta0,
                           design = design,
                           details = TRUE))
# exact power of conventional ABE for comparison
ABE.xact <- power.TOST(CV = CV,
                       n = c(n.s1, n.s2),
                       theta0 = theta0,
                       design = design)
sig.dig  <- nchar(as.character(n.adj))
fmt      <- paste0("%", sig.dig, ".0f (%",
                   sig.dig, ".0f dropouts)")
cat(paste0("Dosed subjects: ", sprintf("%2.0f", n.adj),
           "\nEligible : ",
           sprintf(fmt, n.act, n.adj - n.act),
           "\n Sequence 1 : ",
           sprintf(fmt, n.s1, n.adj / 2 - n.s1),
           "\n Sequence 2 : ",
           sprintf(fmt, n.s2, n.adj / 2 - n.s2),
           "\nPower overall : ",
           sprintf("%.5f", post.hoc[1]),
           "\n p(ABEL) : ",
           sprintf("%.5f", post.hoc[2]),
           "\n p(PE) : ",
           sprintf("%.5f", post.hoc[3]),
           "\n p(ABE) : ",
           sprintf("%.5f", post.hoc[4]),
           "\n p(ABE) exact: ",
           sprintf("%.5f", ABE.xact), "\n"))
R> Dosed subjects: 34
R> Eligible : 27 ( 7 dropouts)
R> Sequence 1 : 17 ( 0 dropouts)
R> Sequence 2 : 10 ( 7 dropouts)
R> Power overall : 0.77670
R> p(ABEL) : 0.77671
R> p(PE) : 0.91595
R> p(ABE) : 0.37628
R> p(ABE) exact: 0.37418
The components of overall power are:
p(ABEL) is the probability that the confidence interval is within the expanded / widened limits.
p(PE) is the probability that the point estimate is within 80.00–125.00%.
p(ABE) is the probability of passing conventional Average Bioequivalence.
The last line gives the exact power of ABE obtained by power.TOST(), confirming the simulation’s result.
Of course, in a particular study you will provide the numbers in the n vector directly.
The CV and the T/R-ratio are only assumptions. Whatever their origin might be (literature, previous studies) they carry some degree of uncertainty. Hence, believing^{35} that they are the true ones may be risky.
Some statisticians call that the ‘Carved-in-Stone’ approach.
Say, we performed a pilot study in 16 subjects and estimated the CV as 0.45.
The \(\small{100(1-\alpha)}\) confidence interval of the CV is obtained via the \(\chi^2\)-distribution of its error variance \(\sigma^2\) with \(\small{n-2}\) degrees of freedom. \[\begin{matrix}\tag{6} s^2=\log_{e}(CV^2+1)\\ L=\frac{(n-2)\,s^2}{\chi_{\alpha/2,\,n-2}^{2}}\leq\sigma^2\leq\frac{(n-2)\,s^2}{\chi_{1-\alpha/2,\,n-2}^{2}}=U\\ \left\{lower\;CL,\;upper\;CL\right\}=\left\{\sqrt{\exp(L)-1},\sqrt{\exp(U)-1}\right\} \end{matrix}\] Let’s calculate the 95% confidence interval of the CV to get an idea.
m  <- 16 # pilot study
ci <- CVCL(CV = 0.45, df = m - 2,
           side = "2-sided", alpha = 0.05)
signif(ci, 4)
R> lower CL upper CL
R> 0.3223 0.7629
Surprised? Although 0.45 is the best estimate for planning the next study, there is no guarantee that we will get exactly the same outcome. Since the \(\chi^2\)-distribution is skewed to the right, it is more likely to get a higher CV than a lower one in the planned study.
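The confidence limits can be retraced in base R without the package. This is only a sketch: `cv.ci()` is a hypothetical name, and it assumes that the degrees of freedom (here 16 − 2) also enter the numerator, which reproduces the limits shown above. Since R's qchisq() returns lower-tail quantiles, the larger quantile goes into the lower limit.

```r
# Base R sketch of the confidence interval of a CV; cv.ci() is hypothetical
cv.ci <- function(CV, df, alpha = 0.05) {
  s2 <- log(CV^2 + 1)                        # variance in log-scale
  L  <- df * s2 / qchisq(1 - alpha / 2, df)  # lower limit of the variance
  U  <- df * s2 / qchisq(alpha / 2, df)      # upper limit of the variance
  c("lower CL" = sqrt(exp(L) - 1),           # back-transformed to CVs
    "upper CL" = sqrt(exp(U) - 1))
}
ci.base <- cv.ci(CV = 0.45, df = 16 - 2)
signif(ci.base, 4)
```

The asymmetry of the interval around 0.45 directly reflects the skewness of the distribution.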
If we plan the study based on 0.45, we would opt for 28 subjects like in the examples before (not adjusted for the dropout-rate).
If the CV will be lower, we lose power (less expansion). But what if it will be higher? Depends. Since we may expand more, we gain power. However, if we cross the upper cap of scaling (50% for the EMA), we will lose power. But how much?
Let’s explore what might happen at the confidence limits of the CV.
m    <- 16
ci   <- CVCL(CV = 0.45, df = m - 2,
             side = "2-sided", alpha = 0.05)
n    <- 28
comp <- data.frame(CV = c(ci[["lower CL"]], 0.45,
                          ci[["upper CL"]]),
                   power = NA)
# power at the confidence limits of the CV
for (i in 1:nrow(comp)) {
  comp$power[i] <- power.scABEL(CV = comp$CV[i],
                                design = "2x2x4",
                                n = n)
}
comp[, 1] <- signif(comp[, 1], 4)
comp[, 2] <- signif(comp[, 2], 6)
print(comp, row.names = FALSE)
R> CV power
R> 0.3223 0.73551
R> 0.4500 0.81116
R> 0.7629 0.60158
Might hurt.
What can we do? The larger the previous study was, the larger the degrees of freedom and hence, the narrower the confidence interval of the CV. In simple terms: The estimate is more certain. On the other hand, it also means that very small pilot studies are practically useless. What happens when we plan the study based on the confidence interval of the CV?
m  <- seq(12, 30, 6)
df <- data.frame(n.pilot = m, CV = 0.45,
                 l = NA, u = NA,
                 n.low = NA, n.CV = NA, n.hi = NA)
for (i in 1:nrow(df)) {
  # confidence limits of the CV for the given pilot size
  df[i, 3:4] <- CVCL(CV = 0.45, df = m[i] - 2,
                     side = "2-sided",
                     alpha = 0.05)
  # sample sizes at the lower CL, the CV itself, and the upper CL
  df[i, 5]   <- sampleN.scABEL(CV = df$l[i], design = "2x2x4",
                               details = FALSE,
                               print = FALSE)[["Sample size"]]
  df[i, 6]   <- sampleN.scABEL(CV = 0.45, design = "2x2x4",
                               details = FALSE,
                               print = FALSE)[["Sample size"]]
  df[i, 7]   <- sampleN.scABEL(CV = df$u[i], design = "2x2x4",
                               details = FALSE,
                               print = FALSE)[["Sample size"]]
}
df[, 3:4]      <- signif(df[, 3:4], 4)
names(df)[3:4] <- c("lower CL", "upper CL")
print(df, row.names = FALSE)
R> n.pilot CV lower CL upper CL n.low n.CV n.hi
R> 12 0.45 0.3069 0.8744 36 28 56
R> 18 0.45 0.3282 0.7300 34 28 42
R> 24 0.45 0.3415 0.6685 34 28 38
R> 30 0.45 0.3509 0.6334 34 28 34
Small pilot studies are practically useless. One leading generic company has an internal rule to perform pilot studies of HVD(P)s in a full replicate design with at least 24 subjects. Makes sense.
Furthermore, we don’t know where the true T/R-ratio lies but we can calculate the lower 95% confidence limit of the pilot study’s point estimate to get an idea about a worst case. Say, it was 0.90.
m  <- 16
CV <- 0.45
pe <- 0.90
ci <- round(CI.BE(CV = CV, pe = pe, n = m,
                  design = "2x2x4"), 4)
# pick the confidence limit farther away from 1
if (pe <= 1) {
  cl <- ci[["lower"]]
} else {
  cl <- ci[["upper"]]
}
print(cl)
R> [1] 0.7515
Explore the impact of a relatively 5% lower CV (less expansion) and a relatively 5% lower T/R-ratio on power for the given sample size.
n      <- 28
CV     <- 0.45
theta0 <- 0.90
comp1  <- data.frame(CV = c(CV, CV*0.95),
                     power = NA)
comp2  <- data.frame(theta0 = c(theta0, theta0*0.95),
                     power = NA)
for (i in 1:2) {
  comp1$power[i] <- power.scABEL(CV = comp1$CV[i],
                                 theta0 = theta0,
                                 design = "2x2x4",
                                 n = n)
}
comp1$power <- signif(comp1$power, 5)
for (i in 1:2) {
  comp2$power[i] <- power.scABEL(CV = CV,
                                 theta0 = comp2$theta0[i],
                                 design = "2x2x4",
                                 n = n)
}
comp2$power <- signif(comp2$power, 5)
print(comp1, row.names = FALSE); print(comp2, row.names = FALSE)
R> CV power
R> 0.4500 0.81116
R> 0.4275 0.80095
R> theta0 power
R> 0.900 0.81116
R> 0.855 0.61952
Note the log-scale of the x-axis. It demonstrates that power curves are symmetrical around 1 (\(\small{\log_{e}(1)=0}\), where \(\small{\log_{e}(\theta_2)=\left|\log_{e}(\theta_1)\right|}\)) and we will achieve the same power for \(\small{\theta_0}\) and \(\small{1/\theta_0}\) (e.g., for 0.90 and 1.1111). Contrary to ABE, power is maintained, unless we cross the upper scaling limit, where additionally the PE-constraint becomes increasingly important.
<nitpick>
CV      <- 0.45
delta   <- 0.10 # direction unknown
design  <- "2x2x4"
theta0s <- c(1 - delta, 1 / (1 + delta),
             1 + delta, 1 / (1 - delta))
# sample size based on theta0 = 1 - delta
n       <- sampleN.scABEL(CV = CV, theta0 = 1 - delta,
                          design = design,
                          details = FALSE,
                          print = FALSE)[["Sample size"]]
comp1   <- data.frame(CV = CV, theta0 = theta0s,
                      base = c(TRUE, rep(FALSE, 3)),
                      n = n, power = NA)
for (i in 1:nrow(comp1)) {
  comp1$power[i] <- power.scABEL(CV = CV,
                                 theta0 = comp1$theta0[i],
                                 design = design, n = n)
}
# sample size based on theta0 = 1 + delta
n       <- sampleN.scABEL(CV = CV, theta0 = 1 + delta,
                          design = design,
                          details = FALSE,
                          print = FALSE)[["Sample size"]]
comp2   <- data.frame(CV = CV, theta0 = theta0s,
                      base = c(FALSE, FALSE, TRUE, FALSE),
                      n = n, power = NA)
for (i in 1:nrow(comp2)) {
  comp2$power[i] <- power.scABEL(CV = CV,
                                 theta0 = comp2$theta0[i],
                                 design = design, n = n)
}
comp1[, c(2, 5)] <- signif(comp1[, c(2, 5)], 4)
comp2[, c(2, 5)] <- signif(comp2[, c(2, 5)], 4)
print(comp1, row.names = FALSE); print(comp2, row.names = FALSE)
R> CV theta0 base n power
R> 0.45 0.9000 TRUE 28 0.8112
R> 0.45 0.9091 FALSE 28 0.8388
R> 0.45 1.1000 FALSE 28 0.8397
R> 0.45 1.1110 FALSE 28 0.8101
R> CV theta0 base n power
R> 0.45 0.9000 FALSE 26 0.7846
R> 0.45 0.9091 FALSE 26 0.8149
R> 0.45 1.1000 TRUE 26 0.8140
R> 0.45 1.1110 FALSE 26 0.7837
</nitpick>
Essentially this leads to the murky waters of prospective sensitivity analyses, which are covered in another article.
An appetizer to show the maximum deviations (CV, T/R-ratio and decreased sample size due to dropouts) which still give a minimum acceptable power of ≥ 0.70:
CV       <- 0.45
theta0   <- 0.90
target   <- 0.80
minpower <- 0.70
pa       <- pa.scABE(CV = CV, theta0 = theta0,
                     targetpower = target,
                     minpower = minpower,
                     design = "2x2x4")
# relative changes (in percent) compared to the plan
change.CV     <- 100*(tail(pa$paCV[["CV"]], 1) -
                      pa$plan[["CVwR"]]) /
                 pa$plan[["CVwR"]]
change.theta0 <- 100*(head(pa$paGMR$theta0, 1) -
                      pa$plan$theta0) /
                 pa$plan[["theta0"]]
change.n      <- 100*(tail(pa$paN[["N"]], 1) -
                      pa$plan[["Sample size"]]) /
                 pa$plan[["Sample size"]]
comp          <- data.frame(parameter = c("CV", "theta0", "n"),
                            change = c(change.CV,
                                       change.theta0,
                                       change.n))
comp$change    <- sprintf("%+.2f%%", comp$change)
names(comp)[2] <- "relative change"
print(pa, plotit = FALSE); print(comp, row.names = FALSE)
R> Sample size plan scABE (EMA/ABEL)
R> Design alpha CVwT CVwR theta0 theta1 theta2 Sample size
R> 2x2x4 0.05 0.45 0.45 0.9 0.8 1.25 28
R> Achieved power Target power
R> 0.81116 0.8
R>
R> Power analysis
R> CV, theta0 and number of subjects leading to min. acceptable power of =0.7:
R> CV= 0.6629, theta0= 0.8719
R> n = 22 (power= 0.7185)
R> parameter relative change
R> CV +47.32%
R> theta0 -3.12%
R> n -21.43%
Confirms what we have seen above. As expected, the method is robust to changes of the CV. The sample size is also not very sensitive; many overrate the impact of dropouts on power.
As we have seen already above for an ANOVA we have to assume homoscedasticity.
I recommend performing pilot studies in one of the fully replicated designs. When you are concerned about dropouts or the bioanalytical method requires large sample volumes, opt for one of the 2-sequence 3-period designs (TRT|RTR or TRR|RTT).
Contrary to the partial replicate design (TRR|RTR|RRT) you get estimates of both \(\small{CV_\textrm{wT}}\) and \(\small{CV_\textrm{wR}}\). Since pharmaceutical technology improves, it is not uncommon that \(\small{CV_\textrm{wT}<CV_\textrm{wR}}\). If this is the case, you get an incentive in the sample size of the pivotal study (expanding the limits is based on \(\small{CV_\textrm{wR}}\) but the 90% CI on the – pooled – \(\small{s_\textrm{w}^{2}}\)).
\[\begin{matrix}\tag{7} s_\textrm{wT}^{2}=\log_{e}(CV_\textrm{wT}^{2}+1)\\ s_\textrm{wR}^{2}=\log_{e}(CV_\textrm{wR}^{2}+1)\\ s_\textrm{w}^{2}=\left(s_\textrm{wT}^{2}+s_\textrm{wR}^{2}\right)/2\\ CV_\textrm{w}=\sqrt{\exp(s_\textrm{w}^{2})-1}\end{matrix}\]
Say, we performed two pilot studies. In the partial replicate we estimated the \(\small{CV_\textrm{w}}\) with 0.45. In the full replicate we estimated \(\small{CV_\textrm{wT}}\) with 0.414 and \(\small{CV_\textrm{wR}}\) with 0.484. Note that the \(\small{CV_\textrm{w}}\) is 0.45 as well. How will that impact the sample size of the pivotal 4period full replicate design?
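The pooling of the two within-subject CVs can be checked in base R. A minimal sketch; `cv.pooled()` is a hypothetical helper that simply follows the steps of the pooling formula (variances in the log-scale, their mean, back-transformation).

```r
# Hypothetical helper: pool the within-subject CVs of T and R
cv.pooled <- function(CVwT, CVwR) {
  s2.wT <- log(CVwT^2 + 1)       # variance of T in log-scale
  s2.wR <- log(CVwR^2 + 1)       # variance of R in log-scale
  s2.w  <- (s2.wT + s2.wR) / 2   # pooled variance
  sqrt(exp(s2.w) - 1)            # back-transformed to a CV
}
signif(cv.pooled(CVwT = 0.414, CVwR = 0.484), 3)
```

Note that the pooling is done on the variances, not on the CVs themselves; the arithmetic mean of 0.414 and 0.484 would be 0.449.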
library(PowerTOST) # attach the package if not done already
comp <- data.frame(pilot = c("TRR|RTR|RRT", "TRT|RTR"),
                   CVwT  = c(0.45, 0.414),
                   CVwR  = c(0.45, 0.484),
                   CVw   = NA,
                   n = NA, power = NA)
for (i in 1:nrow(comp)) {
  # pooled CVw according to (7)
  comp[i, 4]   <- signif(
                    mse2CV((CV2mse(comp$CVwT[i]) +
                            CV2mse(comp$CVwR[i])) / 2), 3)
  # sample size and achieved power of the pivotal 4-period full replicate
  comp[i, 5:6] <- sampleN.scABEL(CV = c(comp$CVwT[i], comp$CVwR[i]),
                                 design = "2x2x4", details = FALSE,
                                 print = FALSE)[8:9]
}
print(comp, row.names = FALSE)
R>        pilot  CVwT  CVwR  CVw  n   power
R>  TRR|RTR|RRT 0.450 0.450 0.45 28 0.81116
R>      TRT|RTR 0.414 0.484 0.45 24 0.80193
Since bioanalytics drives study costs to a great extent, we may save ~14%.
Note that when you give CV as a two-element vector, the first element has to be \(\small{CV_\textrm{wT}}\) and the second \(\small{CV_\textrm{wR}}\).
Although according to the guidelines it is not required to estimate \(\small{CV_\textrm{wT}}\), its value is ‘nice to know’. Sometimes studies fail only because a large \(\small{CV_\textrm{wR}}\) inflates the confidence interval. In such a case you have at least ammunition to start an argument.
Even if you plan the pivotal study in a partial replicate design (why on earth?), knowing both \(\small{CV_\textrm{wT}}\) and \(\small{CV_\textrm{wR}}\) is useful.
library(PowerTOST) # attach the package if not done already
comp <- data.frame(pilot = c("TRR|RTR|RRT", "TRT|RTR"),
                   CVwT  = c(0.45, 0.414),
                   CVwR  = c(0.45, 0.484),
                   CVw   = NA,
                   n = NA, power = NA)
for (i in 1:nrow(comp)) {
  # pooled CVw according to (7)
  comp[i, 4]   <- signif(
                    mse2CV((CV2mse(comp$CVwT[i]) +
                            CV2mse(comp$CVwR[i])) / 2), 3)
  # sample size and achieved power of a pivotal partial replicate
  comp[i, 5:6] <- sampleN.scABEL(CV = c(comp$CVwT[i], comp$CVwR[i]),
                                 design = "2x3x3", details = FALSE,
                                 print = FALSE)[8:9]
}
print(comp, row.names = FALSE)
R>        pilot  CVwT  CVwR  CVw  n   power
R>  TRR|RTR|RRT 0.450 0.450 0.45 39 0.80588
R>      TRT|RTR 0.414 0.484 0.45 36 0.80973
Again, a smaller sample size is possible.
Note that sampleN.scABEL() is inaccurate for the partial replicate design if \(\small{CV_\textrm{wT}>CV_\textrm{wR}}\). Let’s reverse the values and compare the results.
library(PowerTOST) # attach the package if not done already
comp <- data.frame(method = c("key", "subj"), n = NA,
                   power = NA, rel.speed = NA)
# simulations based on the 'key statistics'
st   <- proc.time()[[3]]
comp[1, 2:3] <- sampleN.scABEL(CV = c(0.484, 0.414),
                               design = "2x3x3",
                               details = FALSE,
                               print = FALSE)[8:9]
et   <- proc.time()[[3]]
comp$rel.speed[1] <- et - st
# simulations of subject data
st   <- proc.time()[[3]]
comp[2, 2:3] <- sampleN.scABEL.sdsims(CV = c(0.484, 0.414),
                                      design = "2x3x3",
                                      details = FALSE,
                                      print = FALSE)[8:9]
et   <- proc.time()[[3]]
comp$rel.speed[2] <- et - st
# run times relative to the first method
comp$rel.speed <- signif(comp$rel.speed /
                         comp$rel.speed[1], 3)
comp[1] <- c("\u2018key statistics\u2019",
             "subject simulations")
print(comp, row.names = FALSE)
R>               method  n   power rel.speed
R>     ‘key statistics’ 45 0.80212       1.0
R>  subject simulations 48 0.80938      80.5
Hence, in such a case always use the function sampleN.scABEL.sdsims().
In bioequivalence the pharmacokinetic metrics C_{max} and AUC_{0–t} are mandatory (in some jurisdictions like the FDA additionally AUC_{0–∞}).
We don’t have to worry about multiplicity issues (inflated Type I Error) since if all tests must pass at level \(\alpha\), we are protected by the intersection-union principle.^{36} ^{37}
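The reasoning behind this principle can be sketched in two lines. The notation (\(\small{k}\) PK metrics with component hypotheses \(\small{H_{0i}}\) and \(\small{H_{1i}}\)) is introduced here only for illustration and is not taken from the guidelines. \[\begin{matrix}
H_0=\bigcup_{i=1}^{k}H_{0i}\;vs\;H_1=\bigcap_{i=1}^{k}H_{1i}\\
P\left(\textrm{reject}\;H_0\right)=P\Big(\bigcap_{i=1}^{k}\left\{\textrm{reject}\;H_{0i}\right\}\Big)\leq\min_{i}P\left(\textrm{reject}\;H_{0i}\right)\leq\alpha
\end{matrix}\] The last inequality holds whenever at least one \(\small{H_{0i}}\) is true, because the test of that component is itself of level \(\small{\alpha}\). Hence, no adjustment of \(\small{\alpha}\) is needed when all metrics have to pass.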
We always design the study for the worst-case combination, i.e., based on the PK metric requiring the largest sample size. In most jurisdictions wider BE limits are acceptable only for C_{max}. Let’s explore that with different CVs and T/R-ratios.
library(PowerTOST) # attach the package if not done already
metrics <- c("Cmax", "AUC")
methods <- c("ABEL", "ABE")
CV      <- c(0.45, 0.30)
theta0  <- c(0.90, 0.925)
design  <- "2x2x4"
df      <- data.frame(metric = metrics, method = methods,
                      CV = CV, theta0 = theta0, n = NA,
                      power = NA)
# Cmax: ABEL
df[1, 5:6] <- sampleN.scABEL(CV = CV[1], theta0 = theta0[1],
                             design = design, details = FALSE,
                             print = FALSE)[8:9]
# AUC: ABE
# note: CV[1] = 0.45 is passed, whereas the table lists CV[2] = 0.30 for AUC
df[2, 5:6] <- sampleN.TOST(CV = CV[1], theta0 = theta0[2],
                           design = design, print = FALSE)[7:8]
df$power   <- signif(df$power, 5)
txt        <- paste0("Sample size based on ",
                     df$metric[df$n == max(df$n)], ".\n")
print(df, row.names = FALSE); cat(txt)
R>  metric method   CV theta0  n   power
R>    Cmax   ABEL 0.45  0.900 28 0.81116
R>     AUC    ABE 0.30  0.925 56 0.80896
R> Sample size based on AUC.
Commonly the PK metric evaluated by ABE drives the sample size. That means, the study is ‘overpowered’ for C_{max}.
Let us assume the same T/R-ratios for both metrics. Which are the extreme T/R-ratios (largest deviations of T from R) for C_{max} still giving the target power?
library(PowerTOST) # attach the package if not done already
# objective function: difference between power for Cmax and the target
opt <- function(x) {
  power.scABEL(theta0 = x, CV = df$CV[1],
               design = design,
               n = df$n[2]) - target
}
metrics <- c("Cmax", "AUC")
methods <- c("ABEL", "ABE")
CV      <- c(0.45, 0.30)
theta0  <- c(0.90, 0.90)
target  <- 0.80
design  <- "2x2x4"
df      <- data.frame(metric = metrics, method = methods,
                      CV = CV, theta0 = theta0, n = NA,
                      power = NA)
df[1, 5:6] <- sampleN.scABEL(CV = CV[1], theta0 = theta0[1],
                             design = design, details = FALSE,
                             print = FALSE)[8:9]
# note: CV[1] = 0.45 is passed, whereas the table lists CV[2] = 0.30 for AUC
df[2, 5:6] <- sampleN.TOST(CV = CV[1], theta0 = theta0[2],
                           design = design, print = FALSE)[7:8]
df$power   <- signif(df$power, 5)
# search the T/R-ratio of Cmax giving exactly the target power
if (theta0[1] < 1) {
  res <- uniroot(opt, tol = 1e-8,
                 interval = c(0.80 + 1e-4, theta0[1]))
} else {
  res <- uniroot(opt, tol = 1e-8,
                 interval = c(theta0[1], 1.25 - 1e-4))
}
res     <- unlist(res)
theta0s <- c(res[["root"]], 1/res[["root"]])
txt     <- paste0("Target power for ", metrics[1],
                  " and sample size ",
                  df$n[2], "\nachieved for theta0 ",
                  sprintf("%.4f", theta0s[1]), " or ",
                  sprintf("%.4f", theta0s[2]), ".\n")
print(df, row.names = FALSE); cat(txt)
R>  metric method   CV theta0  n   power
R>    Cmax   ABEL 0.45    0.9 28 0.81116
R>     AUC    ABE 0.30    0.9 84 0.80569
R> Target power for Cmax and sample size 84
R> achieved for theta0 0.8340 or 1.1990.
That means, although we assumed for C_{max} the same T/R-ratio as for AUC, with the sample size of 84 required for AUC the T/R-ratio of C_{max} can be as low as 0.834 or as high as 1.199 and we still achieve the target power, which is a soothing side-effect.
Health Canada allows ABEL only for AUC whereas for C_{max} conventional ABE has to be employed. Hence, it is the other way ’round.
library(PowerTOST) # attach the package if not done already
metrics <- c("Cmax", "AUC")
methods <- c("ABE", "ABEL")
CV      <- c(0.45, 0.30)
theta0  <- c(0.90, 0.90)
design  <- "2x2x4"
df      <- data.frame(metric = metrics, method = methods,
                      CV = CV, theta0 = theta0, n = NA,
                      power = NA)
# Cmax: ABE
df[1, 5:6] <- sampleN.TOST(CV = CV[1], theta0 = theta0[2],
                           design = design, print = FALSE)[7:8]
# AUC: ABEL according to Health Canada
# note: CV[1] = 0.45 is passed, whereas the table lists CV[2] = 0.30 for AUC
df[2, 5:6] <- sampleN.scABEL(CV = CV[1], theta0 = theta0[1],
                             design = design,
                             regulator = "HC",
                             details = FALSE,
                             print = FALSE)[8:9]
df$power   <- signif(df$power, 5)
txt        <- paste0("Sample size based on ",
                     df$metric[df$n == max(df$n)], ".\n")
print(df, row.names = FALSE); cat(txt)
R>  metric method   CV theta0  n   power
R>    Cmax    ABE 0.45    0.9 84 0.80569
R>     AUC   ABEL 0.30    0.9 30 0.81892
R> Sample size based on Cmax.
Q: Can we use R in a regulated environment and is PowerTOST validated?
A: About the acceptability of Base R see ‘A Guidance Document for the Use of R in Regulated Clinical Trial Environments’.
The authors of PowerTOST tried to do their best to provide reliable and valid results. The ‘NEWS’-file on CRAN documents the development of the package, bug fixes, and the introduction of new methods.
Validation of any software (yes, of SAS as well…) lies in the hands of the user. Execute the script test_ABEL.R, which can be found in the /tests subdirectory of the package, to reproduce tables given in the literature.^{38} You will notice some discrepancies: The authors employed only 10,000 simulations – which is not sufficient for a stable result (see below). Furthermore, they reported the minimum sample size which gives at least the target power, whereas sampleN.scABEL() always rounds up to give balanced sequences.
Q: Shall we throw away our sample size tables?
A: Not at all. File them in your archives to collect dust. Maybe in the future you will be asked by an agency how you arrived at a sample size. But: Don’t use them any more. What you should not do (and hopefully haven’t done before): Interpolate. Power, and therefore the sample size, depends in a highly nonlinear fashion on the five conditions listed above, which makes interpolation of values given in a table a nontrivial job.
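Why linear interpolation misleads can be sketched in base R. The function n_approx() is a hypothetical helper using a crude large-sample normal approximation for conventional ABE in a 2×2×2 crossover; it is neither what PowerTOST computes nor valid for ABEL, and the CV and T/R-ratios are example values only.

```r
# Crude large-sample approximation of the total sample size for ABE
# in a 2x2x2 crossover (illustration only, valid for theta0 != 1)
n_approx <- function(CV, theta0, alpha = 0.05, target = 0.80,
                     theta2 = 1.25) {
  s2w <- log(CV^2 + 1)                     # within-subject variance
  z   <- qnorm(1 - alpha) + qnorm(target)  # sum of normal quantiles
  2 * s2w * z^2 / (log(theta2) - abs(log(theta0)))^2
}
CV       <- 0.30                                  # example value
n_table  <- n_approx(CV, theta0 = c(0.90, 0.95))  # two 'tabulated' entries
n_interp <- mean(n_table)                  # linear interpolation at 0.925
n_direct <- n_approx(CV, theta0 = 0.925)   # evaluated directly at 0.925
print(round(c(interpolated = n_interp, direct = n_direct), 1))
```

Even in this simple setting the interpolated value overshoots the directly computed one, because the sample size is not a linear function of the T/R-ratio.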
Q: Which of the methods should we use in our daily practice?
A: sampleN.scABEL()/power.scABEL() for speed reasons. Only for the partial replicate design and the – rare – case of CV_{wT} > CV_{wR}, use sampleN.scABEL.sdsims()/power.scABEL.sdsims() instead.
Q: I fail to understand your example about dropouts. We finish the study with 28 eligible subjects as desired. Why is the dropout-rate ~18% and not the anticipated 15%?
A: That’s due to rounding up the calculated adjusted sample size (32.94…) to the next even number (34).
If you manage to dose fractional subjects (I can’t), your dropout rate would indeed equal the anticipated one: 100(1 – 28/32.94…) = 15%.
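The arithmetic of this answer in base R, as a minimal sketch (the variable names are chosen here for illustration):

```r
n_desired <- 28                          # eligible subjects needed
do_rate   <- 0.15                        # anticipated dropout rate
n_adj     <- n_desired / (1 - do_rate)   # adjusted sample size (fractional)
n_dosed   <- 2 * ceiling(n_adj / 2)      # rounded up to the next even number
realized  <- 1 - n_desired / n_dosed     # dropout rate if exactly 28 finish
cat(sprintf("adjusted: %.2f, dosed: %d, realized dropout rate: %.1f%%\n",
            n_adj, as.integer(n_dosed), 100 * realized))
```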
Q: Do we have to worry about unbalanced sequences?
A: sampleN.scABEL()/sampleN.scABEL.sdsims() will always give the total number of subjects for balanced sequences.
If you are interested in post hoc power, give the sample size as a vector, i.e., power.scABEL(..., n = c(foo, bar, baz)), where foo, bar, and baz are the number of subjects per sequence.
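A minimal sketch of such a call for a 4-period full replicate design, which has two sequences and hence needs a two-element vector. The CV, the T/R-ratio, and the 13 and 14 subjects per sequence are hypothetical values; the call is skipped if PowerTOST is not installed.

```r
# Post hoc power with unbalanced sequences (all numbers are hypothetical)
if (requireNamespace("PowerTOST", quietly = TRUE)) {
  # design "2x2x4": one element of n per sequence
  pwr <- PowerTOST::power.scABEL(CV = 0.45, theta0 = 0.90,
                                 design = "2x2x4", n = c(13, 14))
  print(signif(pwr, 5))
} else {
  pwr <- NA_real_
  message("PowerTOST is not installed; skipping the example.")
}
```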
Q: The default number of simulations in the sample size estimation is 100,000. Why?
A: We found that with this number the simulations are stable. For the background see another article. Of course you can give a larger number in the argument nsims. However, you shouldn’t decrease the number.
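A rough feeling for the precision can be obtained from the binomial standard error of a simulated power. This is only a back-of-the-envelope sketch in base R, assuming that each simulated study is an independent pass/fail trial; the power of 0.80 is an example value.

```r
# Rough binomial standard error of a simulated power estimate
power <- 0.80                               # example 'true' power
nsims <- c(1e4, 1e5, 1e6)                   # numbers of simulations
se    <- sqrt(power * (1 - power) / nsims)  # SE of the empirical power
print(data.frame(nsims = nsims, SE = signif(se, 3)), row.names = FALSE)
```

With 10,000 simulations the estimate is already uncertain in the third decimal place, which can be enough to shift an estimated sample size if the power is close to the target.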
Q: How reliable are the results?
A: As stated above, an exact method doesn’t exist. We can only compare the empiric power of the ABE-component to the exact one obtained by power.TOST(). For an example see ‘Post hoc Power’ in the section about Dropouts.
Q: I still have questions. How to proceed?
A: The preferred method is to register at the BEBA Forum and post your question there (please read its Policy first).
You can contact me at [email protected]. Be warned – I will charge you for anything beyond most basic questions.
License
Helmut Schütz 2021
1^{st} version March 23, 2021.
Rendered 2021-05-04 12:23:41 CEST by rmarkdown in 1.78 seconds.
Footnotes and References
1. Labes D, Schütz H, Lang B. PowerTOST: Power and Sample Size for (Bio)Equivalence Studies. 2021-01-18. CRAN.
2. Schütz H. Reference-Scaled Average Bioequivalence. 2020-12-23. CRAN.
3. Labes D, Schütz H, Lang B. Package ‘PowerTOST’. January 18, 2021. CRAN.
4. Some gastric resistant formulations of diclofenac are HVDPs, practically all topical formulations are HVDPs, whereas diclofenac itself is not an HVD (CV_{w} of a solution ~8%).
5. Tóthfalusi L, Endrényi L, García-Arieta A. Evaluation of bioequivalence for highly variable drugs with scaled average bioequivalence. Clin Pharmacokinet. 2009; 48(11): 725–43. doi:10.2165/11318040-000000000-00000.
6. ABEL is one variant of SABE. Reference-Scaled Average Bioequivalence (RSABE) is preferred by the FDA and in China. It will be covered in another article.
7. European Medicines Agency, Committee for Medicinal Products for Human Use. Guideline on the Investigation of Bioequivalence. London, 20 January 2010. online.
8. World Health Organization, Essential Medicines and Health Products: Multisource (generic) pharmaceutical products: guidelines on registration requirements to establish interchangeability. WHO Technical Report Series, No. 1003, Annex 6. Geneva, 28 April 2017. online.
9. World Health Organization, Prequalification Team: medicines. Guidance Document: Application of reference-scaled criteria for AUC in bioequivalence studies conducted for submission to PQTm. Geneva, 22 November 2018. online.
10. Australian Government, Department of Health, Therapeutic Goods Administration. European Union and ICH Guidelines adopted in Australia. Guideline on the Investigation of Bioequivalence with TGA Annotations. online.
11. East African Community, Medicines and Food Safety Unit. Compendium of Medicines Evaluation and Registration for Medicine Regulation Harmonization in the East African Community, Part III: EAC Guidelines on Therapeutic Equivalence Requirements. online.
12. ASEAN States Pharmaceutical Product Working Group. ASEAN Guideline for the Conduct of Bioequivalence Studies. Vientiane, March 2015. online.
13. Eurasian Economic Commission. Regulations Conducting Bioequivalence Studies within the Framework of the Eurasian Economic Union. 3 November 2016. online. Russian.
14. Ministry of Health and Population, The Specialized Scientific Committee for Evaluation of Bioavailability & Bioequivalence Studies. Egyptian Guideline For Conducting Bioequivalence Studies for Marketing Authorization of Generic Products. Cairo, February 2017. online.
15. New Zealand Medicines and Medical Devices Safety Authority. Guideline on the Regulation of Therapeutic Products in New Zealand. Part 6: Bioequivalence of medicines. Wellington, February 2018. online.
16. Departamento Agencia Nacional de Medicamentos. Instituto de Salud Pública de Chile. Guia para La realización de estudios de biodisponibilidad comparativa en formas farmacéuticas sólidas de administración oral y acción sistémica. Santiago, December 2018. Spanish.
17. ANVISA. Critérios para a condução de estudos de biodisponibilidade relativa/bioequivalência. Consulta Pública Nº 760/2019. Brasilia, December 27, 2019. Portuguese.
18. Health Canada. Guidance Document. Comparative Bioavailability Standards: Formulations Used for Systemic Effects. Ottawa, 08 June 2018. online.
19. Executive Board of the Health Ministers’ Council for GCC States. The GCC Guidelines for Bioequivalence. March 2016. online.
20. Labes D, Schütz H. Inflation of Type I Error in the Evaluation of Scaled Average Bioequivalence, and a Method for its Control. Pharm Res. 2016; 33(11): 2805–14. doi:10.1007/s11095-016-2006-1.
21. Schütz H, Tomashevskiy M, Labes D. replicateBE: Average Bioequivalence with Expanding Limits (ABEL). Version 1.0.15. 2020-07-24. CRAN.
22. Commission of the European Communities, CPMP Working Party on Efficacy of Medicinal Products. Note for Guidance: Investigation of Bioavailability and Bioequivalence. Appendix III: Technical Aspects of Bioequivalence Statistics. Brussels, December 1991.
23. Commission of the European Communities, CPMP Working Party on Efficacy of Medicinal Products. Note for Guidance: Investigation of Bioavailability and Bioequivalence. June 1992.
24. Blume H, Mutschler E, editors. Bioäquivalenz. Qualitätsbewertung wirkstoffgleicher Fertigarzneimittel. Frankfurt/Main: Govi-Verlag; 6. Ergänzungslieferung 1996. German.
25. European Medicines Agency. CHMP Efficacy Working Party, Therapeutic Subgroup on Pharmacokinetics. Questions & Answers on the Bioavailability and Bioequivalence Guideline. London, 27 July 2006. download.
26. Medicines Control Council. Registration of Medicines. Biostudies. Pretoria, June 2015. online.
27. Of note, for NTIDs Health Canada’s limits are 90.0 – 112.0% and not 90.00 – 111.11% like in other jurisdictions. Guess the reason.
28. That’s contrary to other methods, where the CV is an assumption as well.
29. Hoenig JM, Heisey DM. The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis. Am Stat. 2001; 55(1): 19–24. doi:10.1198/000313001300339897.
30. There is no statistical method to ‘correct’ for unequal carryover. It can only be avoided by design, i.e., a sufficiently long washout between periods. According to the guidelines subjects with pre-dose concentrations > 5% of their C_{max} can be excluded from the comparison if stated in the protocol.
31. Zhang P. A Simple Formula for Sample Size Calculation in Equivalence Studies. J Biopharm Stat. 2003; 13(3): 529–38. doi:10.1081/BIP-120022772.
32. Don’t be tempted to give a ‘better’ T/R-ratio – even if based on a pilot or a previous study. It is a natural property of HVD(P)s that the T/R-ratio varies between studies. Don’t be overly optimistic!
33. Schütz H. Sample Size Estimation in Bioequivalence. Evaluation. 2020-10-23. BEBA Forum.
34. The only exception is design = "2x2x3" (the full replicate with 3 periods and sequences TRT|RTR). Then the first element is for sequence TRT and the second for RTR.
35. Quoting my late father: »If you believe, go to church.«
36. Berger RL, Hsu JC. Bioequivalence Trials, Intersection-Union Tests and Equivalence Confidence Sets. Stat Sci. 1996; 11(4): 283–302. JSTOR:2246021.
37. Zeng A. The TOST confidence intervals and the coverage probabilities with R simulation. March 14, 2014.
38. Tóthfalusi L, Endrényi L. Sample Sizes for Designing Bioequivalence Studies for Highly Variable Drugs. J Pharm Pharm Sci. 2012; 15(1): 73–84. doi:10.18433/j3z88f. Open access.