Consider allowing JavaScript. Otherwise, you have to be proficient in reading since formulas will not be rendered. Furthermore, the table of contents in the left column for navigation will not be available and code-folding not supported. Sorry for the inconvenience.

Examples in this article were generated with 4.0.5 by the package PowerTOST.1

More examples are given in the respective vignette.2 See also the README on GitHub for an overview and the online manual3 for details and a collection of other articles.

• The right-hand badges give the respective section’s ‘level’.

1. Basics about sample size methodology – requiring no or only limited statistical expertise.

1. These sections are the most important ones. They are – hopefully – easily comprehensible even for novices.

1. A somewhat higher knowledge of statistics and/or R is required. May be skipped or reserved for a later reading.
• Click to show / hide R code.

# Introduction

What is the porpose of a power analysis?

Since it is unlikely that we will observed exactly our assumed values in the planned study, in a power analysis we explore the impact on power for potential deviations.

top of section ↩︎

## Preliminaries

A basic knowledge of R is required. To run the scripts at least version 1.4.3 (2016-11-01) of PowerTOST is suggested. Any version of R would likely do, though the current release of PowerTOST was only tested with version 3.6.3 (2020-02-29) and later.

Note that in all functions of PowerTOST the arguments (say, the assumed T/R-ratio theta0, the BE-limits (theta1, theta2), the assumed coefficient of variation CV, etc.) have to be given as ratios and not in percent.

Sample sizes are given for equally sized groups (parallel design) and for balanced sequences (crossovers, replicate designs). Furthermore, the estimated sample size is the total number of subjects, which is always an even number (parallel design) or a multiple of the number of sequence (crossovers, replicate designs) not subjects per group or sequence – like in some other software packages).

## Terminology

In order to get prospective power (and hence, a sample size), we need five values:

1. The level of the test (in BE commonly 0.05),
2. the BE-margins (commonly 0.80 – 1.25),
3. the desired (or target) power,
4. the variance (commonly expressed as a coefficient of variation), and
5. the deviation of the test from the reference treatment.

1 – 2 are fixed by the agency,
3 is set by the sponsor (commonly to 0.80 – 0.90), and
4 – 5 are just (uncertain!) assumptions.

Realization: Observations (in a sample) of a random variable (of the population).

Since it is extremely unlikely that all assumptions will be exactly realized in a particular study, it is worthwhile to prospectively explore the impact of potential deviations from assumptions on power.

We don’t have to be concerned about values which are ‘better’ (i.e., a lower CV, and/or a T/R-ratio closer to unity, and/or a lower than anticipated dropout-rate) because we will gain power. On the other hand, any change towards the ‘worse’ will negatively impact power.

# Sample size → Power

The sample size cannot be directly estimated, only power calculated for an already given sample size.

What we can do: Estimate the sample size based on the conditions given above and vary our assumptions (4 – 5) accordingly. Furthermore, we can decrease the sample size to explore the impact of dropouts on power.

# Examples

Let’s start with PowerTOST.

library(PowerTOST) # attach it to run the examples

Although we could do that ‘by hand’, there a three functions supporting power analyses: pa.ABE() for conventional Average Bioequivalence (ABE), pa.scABE() for Scaled Average Bioequivalence (ABEL, RSABE), and pa.NTIDFDA() for the FDA’s RSABE for NTIDs.

Note that the functions support only homoscedasticity (CVwT = CVwR).

The’ defaults common to all functions are:

Argument Default Meaning
alpha 0.05 Nominal level of the test.
targetpower 0.80 Target (desired) power.
minpower 0.70 Minimum acceptable power.
theta1 0.80 Lower BE-limit in ABE and lower PE-constraint in Scaled Average Bioequivalence.
theta2 1.25 Upper BE-limit in ABE and upper PE-constraint in Scaled Average Bioequivalence.

The default of the assumed T/R-ratio theta0 is 0.95 in pa.ABE(), 0.90 in pa.scABE(), and 0.975 in pa.NTIDFDA(). In pa.ABE() any design can be specified (defaults to "2x2").
In pa.scABE() the default design is "2x3x3" and regulator = "EMA".
In pa.NTIDFDA() only a 2-sequence 4-period full replicate design is implemented in accordance with the FDA’s guidance.4

Let’s explore in the following examples given in other articles.

## Parallel Design

We assume a CV of 0.40, a T/R-ratio of 0.95, target a power of 0.80 (see also the example in the corresponding article about the parallel design), and keep the default minimum power 0.70.

CV <- 0.40
x  <- pa.ABE(CV = CV, design = "parallel") # assign to an object
dev.new(width = 6.16, height = 6.7)
plot(x, pct = FALSE, ratiolabel = "theta0")
par(op) Fig.1 Parallel design (CV 0.40, T/R ratio 0.95).

Such patterns are typical for ABE. Note that the x-axes of the second and third panels are reversed (decreasing).

Most sensitive parameter is the T/R-ratio $$\small{\theta_0}$$, which can decrease from the assumed 0.95 to 0.9276 (maxium relative deviation –2.36%) until we reach our minimum acceptable power of 0.70. The CV is much less sensitive, which can increase from the assumed 0.40 to 0.4552 (relative +13.8%). The least sensitive is the sample size itslef, which can decrease from the planned 130 subjects to 103 (relative –20.8%). Hence, dropouts are in general of the least concern.

## Crossover Design

### Conventional ABE

We assume a CV of 0.25, a T/R-ratio of 0.95, target a power of 0.80 (see also the example in the corresponding article about the 2×2×2 design), and keep the default minimum power 0.70.

CV <- 0.25
x  <- pa.ABE(CV = CV)
dev.new(width = 6.16, height = 6.7)
plot(x, pct = FALSE, ratiolabel = "theta0")
par(op) Fig.2 2×2×2 design (CV 0.25, T/R ratio 0.95).

The order of parameters influencing power is as common in ABE: $$\small{\theta_0}$$ $$\small{\gg}$$ CV $$\small{>}$$ n.

### ABE for NTIDs

We assume a CV of 0.125, a T/R-ratio of 0.975, target a power of 0.80 (see also the example in the corresponding article about the 2×2×2 design), and keep the default minimum power 0.70.

CV     <- 0.125
theta0 <- 0.975
theta1 <- 0.90
x  <- pa.ABE(CV = CV, theta0 = theta0, theta1 = theta1)
dev.new(width = 6.16, height = 6.7)
plot(x, pct = FALSE, ratiolabel = "theta0")
par(op) Fig.3 2×2×2 design(CV 0.125, T/R ratio 0.975, NTID).

## HVDs / HVDPs

### ABEL

#### EMA and most others

We assume a CV of 0.45, a T/R-ratio of 0.90, target a power of 0.80 (see also the example in the corresponding article), keep the default minimum power 0.70, and want to perform the study in a 2-sequence 4-period full replicate study (TRTR|RTRT or TRRT|RTTR or TTRR|RRTT).

CV     <- 0.45
design <- "2x2x4"
x      <- pa.scABE(CV = CV, design = design)
dev.new(width = 6.16, height = 6.7)
plot(x, pct = FALSE, ratiolabel = "theta0")
par(op) Fig.3 2×2×4 replicate design (CV 0.45, T/R ratio 0.90, ABEL for the EMA).

Here the pattern is different to ABE. The idea of reference-scaling is to maintain power irrespective of the CV. Therefore, the CV is the least sensitive parameter – it can increase from the assumed 0.45 to 0.6629 (relative +47.3%). If the CV decreases, power decreases as well because we can expand limits less.

Let’s dive deeper. x is an S3 object.5 Contained in its list are the data.frames paCV, paGMR, and paN. With the function power.scABEL() we can assess the components of power at specific values of the CV.

n       <- x$paN$N
max.pwr <- which(x$paCV$pwr == max(x$paCV$pwr))
CVs     <- c(head(x$paCV$CV, 1),
x$paCV$CV[x$paCV$pwr == min(x$paCV$pwr[1:max.pwr])],
CV, x$paCV$CV[max.pwr], tail(x$paCV$CV, 1))
res     <- data.frame(CV = CVs, V1 = NA, V2 = NA, V3 = NA, V4 = NA)
for (i in 1:nrow(res)) {
res[i, 2:5] <- suppressMessages(
power.scABEL(CV = CVs[i], design = design,
n = n, details = TRUE))
}
names(res)[2:5] <- c("p(BE)", "p(BE-ABEL)",
"p(BE-PE)", "p(BE-ABE)")
print(signif(res, 4), row.names = FALSE)
R>      CV  p(BE) p(BE-ABEL) p(BE-PE) p(BE-ABE)
R>  0.3000 0.7383     0.7383   0.9832    0.6786
R>  0.3167 0.7351     0.7351   0.9784    0.6403
R>  0.4500 0.8112     0.8116   0.9266    0.4107
R>  0.4766 0.8164     0.8165   0.9161    0.3764
R>  0.6629 0.7000     0.7000   0.8467    0.1546

p(BE) is the overall power, p(BE-ABEL) is the power of ‘pure’ ABEL (without the PE-restriction), and p(BE-PE) is the power of the criterion ‘point estimate within acceptance range’ alone. p(BE-ABE) is the power of the conventional ABE test for comparative purposes.

We see that close to the upper cap of scaling (at CV 50%) power starts to decrease because we cannot expanding the limits any more (maximum expansion 69.84 – 143-19%). Furthermore, the PE-restriction becomes more important.

As above but according to the rules of Health Canada (upper cap of scaling ~57.4% instead of 50%).6

CV     <- 0.45
design <- "2x2x4"
x      <- pa.scABE(CV = CV, design = design,
regulator = "HC")
dev.new(width = 6.16, height = 6.7)
plot(x, pct = FALSE, ratiolabel = "theta0")
par(op) Fig.4 2×2×4 replicate design (CV 0.45, T/R ratio 0.90, ABEL for Health Canada).

Similar to the EMA but more relaxed due to the higher upper cap.

#### GCC

That’s a special case because for any CV ≥ 30% the limits are directly widened to 75.00 – 133.33% and there is no upper cap of scaling.7

CV     <- 0.45
design <- "2x2x4"
x      <- pa.scABE(CV = CV, design = design,
regulator = "GCC")
dev.new(width = 6.16, height = 6.7)
plot(x, pct = FALSE, ratiolabel = "theta0")
par(op) Fig.5 2×2×4 replicate design (CV 0.45, T/R ratio 0.90, ABEL for the GCC).

### RSABE

As above but for the U.S. FDA and China’s CDE (see also the example in the corresponding article).

CV     <- 0.45
design <- "2x2x4"
x      <- pa.scABE(CV = CV, design = design,
regulator = "FDA")
dev.new(width = 6.16, height = 6.7)
plot(x, pct = FALSE, ratiolabel = "theta0")
par(op) Fig.6 2×2×4 replicate design (CV 0.45, T/R ratio 0.90, RSABE).

Due to unlimited scaling the CV is even less important than in the other methods.

## NTIDs (FDA)

We assume a CV of 0.125, a T/R-ratio of 0.975 (the function’s default), target a power of 0.80, and want to perform the study in a 2-sequence 4-period full replicate study (TRTR|RTRT or TRRT|RTTR or TTRR|RRTT). Note that such a design is mandatory for the FDA.

CV     <- 0.125
design <- "2x2x4"
x      <- pa.NTIDFDA(CV = CV, design = design)
dev.new(width = 6.16, height = 6.9)
plot(x, pct = FALSE, ratiolabel = "theta0")
par(op) Fig.7 2×2×4 replicate design (CV 0.125, T/R ratio 0.975, RSABE for NTIDs).

Here the CV shows a different behavior to both RSABE for HVDs / HVDPs and ABE (see above). Maximum power is seen at ~20.4% and we observe two minima.

# Cave!

A power analysis is  not  a substitute for the ‘Sensitivity Analysis’ recommended by the ICH8 because in a real study a combination of all effects occurs simultaneously. It is up to you to decide on reasonable combinations and analyze their respective power.

How to explore deviations simultaneously will be elaborated in another article.

Footnotes and References

1. Labes D, Schütz H, Lang B. PowerTOST: Power and Sample Size for (Bio)Equivalence Studies. 2021-01-18. CRAN.↩︎

2. Schütz H. Power Analysis. 2021-01-18. CRAN.↩︎

3. Labes D, Schütz H, Lang B. Package ‘PowerTOST’. January 18, 2021. CRAN.↩︎

4. FDA. Office of Generic Drugs. Draft Guidance on Warfarin Sodium.. Recommended Dec 2012.↩︎

5. Wickham H. Advanced R. 2019-08-08. The S3 object system.↩︎

6. Health Canada. Guidance Document – Comparative Bioavailability Standards: Formulations Used for Systemic Effects. Ottawa, 2018/06/08.↩︎

7. Executive Board of the Health Ministers’ Council for GCC States. The GCC Guidelines for Bioequivalence. Version 2.4. 3/02/2011.↩︎

8. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. ICH Harmonised Tripartite Guideline. Statistical Principles for Clinical Trials. 5 February 1998. E9 Step 4.↩︎