# Introduction

What is a significant sequence (carryover) effect and do we have to care about one?

Sometimes regulatory assessors ask for the ‘justification’ of a significant sequence effect.

I will try to clarify why such a justification is not possible and – a bit provocatively – why asking for one demonstrates a lack of understanding of the underlying statistical concepts.

All examples deal with the 2×2×2 Crossover (RT|TR) but are applicable to any kind of Crossover (Higher-Order, Replicate Designs) as well. A basic knowledge of R does not hurt.

# Model

As the simplest case, the 2×2×2 design ($$\small{\textrm{RT}|\textrm{TR}}$$) including a term for carryover is considered.1

Let sequences and periods be indexed by $$\small{i}$$ and $$\small{k}$$ ($$\small{i,k=1,2}$$), with $$\small{n_i}$$ subjects randomized to sequence $$\small{i}$$. Let $$\small{Y_{ijk}}$$ be the $$\small{\log_{e}}$$-transformed PK-response of the $$\small{j}$$th subject in the $$\small{i}$$th sequence at the $$\small{k}$$th period. Then $Y_{ijk}=\mu_h+s_{ij}+\pi_k+\lambda_c+e_{ijk},\tag{1}$ where $$\small{\mu_h}$$ is the effect of treatment $$\small{h}$$, where $$\small{h=\textrm{R}}$$ if $$\small{i=k}$$ and $$\small{h=\textrm{T}}$$ if $$\small{i\neq k}$$,
$$\small{s_{ij}}$$ is the fixed effect of the $$\small{j}$$th subject in the $$\small{i}$$th sequence,
$$\small{\pi_{k}}$$ is the fixed effect of the $$\small{k}$$th period,
$$\small{\lambda_{c}}$$ is the carryover effect of the corresponding formulation from period 1 to period 2, where
$$\small{c=\textrm{R}}$$ if $$\small{i=1,k=2}$$,
$$\small{c=\textrm{T}}$$ if $$\small{i=2,k=2}$$,
$$\small{\lambda_{c}=0}$$ if $$\small{i=1,2,k=1}$$,
$$\small{e_{ijk}}$$ is the random error in observing $$\small{Y_{ijk}}$$ (of the $$\small{j}$$th subject in the $$\small{k}$$th period and $$\small{i}$$th sequence).

The subject effects $$\small{s_{ij}}$$ are independently2 normally distributed with expected mean $$\small{0}$$ and between-subject variance $$\small{\sigma_{\textrm{b}}^{2}}$$. The random errors $$\small{e_{ijk}}$$ are independent and normally distributed with expected mean $$\small{0}$$ and variances $$\small{\sigma_{\textrm{wR}}^{2}}$$, $$\small{\sigma_{\textrm{wT}}^{2}}$$ for the reference and test treatment. The treatment variances are given by $$\small{\sigma_{\textrm{R}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wR}}^{2}}$$ and $$\small{\sigma_{\textrm{T}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wT}}^{2}}$$. Note that these components cannot be separately estimated in a nonreplicative design.

Therefore, the layout of the RT|TR design is:

| Sequence | Period 1 | Period 2 |
|---|---|---|
| 1 ($$\small{\textrm{RT}}$$) | $$\small{Y_{1j1}=\mu_\textrm{R}+s_{1j}+\pi_1+e_{1j1}}$$, $$\small{j=1,\ldots,n_1}$$ | $$\small{Y_{1j2}=\mu_\textrm{T}+s_{1j}+\pi_2+\lambda_\textrm{R}+e_{1j2}}$$, $$\small{j=1,\ldots,n_1}$$ |
| 2 ($$\small{\textrm{TR}}$$) | $$\small{Y_{2j1}=\mu_\textrm{T}+s_{2j}+\pi_1+e_{2j1}}$$, $$\small{j=1,\ldots,n_2}$$ | $$\small{Y_{2j2}=\mu_\textrm{R}+s_{2j}+\pi_2+\lambda_\textrm{T}+e_{2j2}}$$, $$\small{j=1,\ldots,n_2}$$ |

The expected population means and variances are given by:

| Sequence | Period 1 | Period 2 |
|---|---|---|
| 1 ($$\small{\textrm{RT}}$$) | $$\small{E(Y_{1j1})=\mu_\textrm{R}+\pi_1}$$; $$\small{Var(Y_{1j1})=\sigma_{\textrm{R}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wR}}^{2}}$$; $$\small{j=1,\ldots,n_1}$$ | $$\small{E(Y_{1j2})=\mu_\textrm{T}+\pi_2+\lambda_\textrm{R}}$$; $$\small{Var(Y_{1j2})=\sigma_{\textrm{T}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wT}}^{2}}$$; $$\small{j=1,\ldots,n_1}$$ |
| 2 ($$\small{\textrm{TR}}$$) | $$\small{E(Y_{2j1})=\mu_\textrm{T}+\pi_1}$$; $$\small{Var(Y_{2j1})=\sigma_{\textrm{T}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wT}}^{2}}$$; $$\small{j=1,\ldots,n_2}$$ | $$\small{E(Y_{2j2})=\mu_\textrm{R}+\pi_2+\lambda_\textrm{T}}$$; $$\small{Var(Y_{2j2})=\sigma_{\textrm{R}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wR}}^{2}}$$; $$\small{j=1,\ldots,n_2}$$ |

Assuming equal carryover ($$\small{\lambda_\textrm{R}=\lambda_\textrm{T}}$$), the term $$\small{\lambda_c}$$ can be dropped from the model.
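Why equal carryover is harmless can be sketched from the table of expected values. The notation $$\small{\bar{Y}_{i\cdot k}}$$ for the mean of sequence $$\small{i}$$ in period $$\small{k}$$ is introduced here only for illustration, and the sketch assumes that the treatment effect is estimated in the usual way, as the average of the two sequences' mean within-subject T − R differences:

$\small{E\left[\tfrac{1}{2}\left\{\left(\bar{Y}_{1\cdot 2}-\bar{Y}_{1\cdot 1}\right)+\left(\bar{Y}_{2\cdot 1}-\bar{Y}_{2\cdot 2}\right)\right\}\right]=\tfrac{1}{2}\left\{\left(\mu_\textrm{T}-\mu_\textrm{R}+\pi_2-\pi_1+\lambda_\textrm{R}\right)+\left(\mu_\textrm{T}-\mu_\textrm{R}+\pi_1-\pi_2-\lambda_\textrm{T}\right)\right\}=\mu_\textrm{T}-\mu_\textrm{R}+\frac{\lambda_\textrm{R}-\lambda_\textrm{T}}{2}}$

The period effects cancel, and so does the carryover if $$\small{\lambda_\textrm{R}=\lambda_\textrm{T}}$$. With unequal carryover the estimate is shifted by $$\small{(\lambda_\textrm{R}-\lambda_\textrm{T})/2}$$ on the log scale.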

# Implementation

Most agencies (like the EMA) require an ANOVA of $$\small{\log_{e}}$$ transformed responses, i.e., a linear model where all effects are fixed. In R:

model <- lm(log(PK) ~ sequence + subject%in%sequence +
period + treatment, data = data)

In SAS:

proc glm data = data;
class subject period sequence treatment;
model logPK = sequence subject(sequence)
period treatment;
run;

Note that in bioequivalence subjects generally are uniquely coded. If subject 1 is randomized to sequence 1, there is not ‘another’ subject 1 randomized to sequence 2. Hence, the term subject(sequence) stated in all guidelines is a bogus one. Replacing it with the simple term subject gives exactly [sic] the same point estimate and residual variance.
It avoids the many lines in the output denoted with . in SAS and not estimable in Phoenix/WinNonlin.

Heresy: You could remove sequence from the models entirely.

model <- lm(log(PK) ~ subject + period + treatment,
data = data)
Try it with one of your studies. I bet that you get the same PE and residual variance as in the other models. Misusing the first example further down:
| Model | PE | MSE | 90% CI |
|---|---|---|---|
| sequence + subject(sequence) | 0.954135 | 0.0556768 | 0.856834–1.062490 |
| sequence + subject | 0.954135 | 0.0556768 | 0.856834–1.062490 |
| subject | 0.954135 | 0.0556768 | 0.856834–1.062490 |

Quod erat demonstrandum. So much about over-parameterized models.
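If you have no study at hand, the claim can be checked on made-up data. The following is a minimal sketch: the data set, seed, and variable names are assumptions for illustration only and have nothing to do with the example used in the table.

```r
# Hypothetical balanced 2x2x2 data (12 subjects), simulated for illustration
set.seed(42)
n <- 12L
d <- data.frame(subject  = factor(rep(seq_len(n), each = 2L)),
                period   = factor(rep(1:2, times = n)),
                sequence = factor(rep(c("RT", "TR"), each = n)))
# RT: R in period 1, T in period 2; TR: the other way round
d$treatment <- factor(ifelse((d$sequence == "RT") == (d$period == "1"),
                             "R", "T"))
subj.eff <- rnorm(n, sd = 0.4)                    # between-subject part
d$logPK  <- subj.eff[as.integer(d$subject)] +
            ifelse(d$treatment == "T", log(0.95), 0) +
            rnorm(2L * n, sd = 0.25)              # within-subject part
models <- list(
  nested = lm(logPK ~ sequence + subject %in% sequence +
                      period + treatment, data = d),
  simple = lm(logPK ~ sequence + subject + period + treatment, data = d),
  no.seq = lm(logPK ~ subject + period + treatment, data = d))
res <- data.frame(
  PE  = sapply(models, function(m) exp(coef(m)[["treatmentT"]])),
  MSE = sapply(models, function(m) sum(residuals(m)^2) / df.residual(m)))
print(res, digits = 7)
```

All three rows should agree because the three formulas span the same column space. Only the number of aliased (not estimable) coefficients differs.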

Other agencies (the FDA, Health Canada) require a mixed-effects model, where $$\small{s_{ij}}$$ is a random effect.
In SAS:

proc mixed data = data;
class subject period sequence treatment;
model logPK = sequence period treatment;
random subject(sequence);
run;

Unfortunately due to different ‘design philosophies’ the SAS-code cannot be translated to R.

# Problems

## Wrong Test

Note that the MSE of sequence has to be tested against the MSE of subject(sequence) by $$(2)$$3 – or of subject in the simple model4 – and not against the residual MSE by $$(3)$$, which is a within-subject term, as is done in R’s default ANOVA (Type I), i.e.,
$\small{F=\frac{MSE_{\,\textrm{sequence}}}{MSE_{\,\textrm{subject(sequence)}}}}\tag{2}$
$\small{F=\frac{MSE_{\,\textrm{sequence}}}{MSE_{\,\textrm{residual}}}}\tag{3}$

Its $$\small{p}$$-value is calculated by $\small{p(F)=F_{\,\nu_1,\,\nu_2}}\tag{4}$ where $$\small{\nu_1}$$ are the sequences’ degrees of freedom (i.e., the number of sequences minus one) and $$\small{\nu_2}$$ are the subjects’ degrees of freedom (in a balanced 2×2×2 design $$\small{n-2}$$).
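As a minimal sketch, both tests can be computed by hand with `pf()`. The mean squares are the rounded values of the first simulated example further down (n = 28); the variable names are chosen here for illustration only.

```r
# Rounded mean squares of the first simulated example (n = 28)
ms.sequence <- 0.0817 # between-subject term
ms.subject  <- 0.3390 # subject(sequence), between-subject term
ms.residual <- 0.0557 # within-subject term
nu1 <- 1L             # number of sequences minus one
nu2 <- 26L            # n - 2
F.correct <- ms.sequence / ms.subject   # equation (2)
F.wrong   <- ms.sequence / ms.residual  # equation (3)
p.correct <- pf(F.correct, nu1, nu2, lower.tail = FALSE)
p.wrong   <- pf(F.wrong,   nu1, nu2, lower.tail = FALSE)
round(c(F.correct = F.correct, p.correct = p.correct,
        F.wrong   = F.wrong,   p.wrong   = p.wrong), 3)
```

The residual happens to have 26 degrees of freedom as well, hence the same `nu2` in both calls. Small deviations from the ANOVA tables are due to the rounded inputs.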

## Confounding

Confounding factors cannot be statistically separated.

The sequence effect is confounded with5

• the carryover effect, and
• the treatment-by-period interaction.

A statistically significant sequence effect could indicate that there is

• a true sequence effect, or
• a true treatment-by-period interaction, or
• a failure of randomization.

Only the last potential cause can be ruled out during monitoring or in an audit/inspection.
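The confounding can be made tangible with the four cells of the design. This is a sketch with +1/−1 contrast coding; the coding and the vector names are assumptions made here for illustration and are not part of the models above.

```r
# The four cells of the 2x2x2 design
cells <- data.frame(sequence  = c("RT", "RT", "TR", "TR"),
                    period    = c(1L, 2L, 1L, 2L),
                    treatment = c("R", "T", "T", "R"))
x.seq <- ifelse(cells$sequence  == "RT", 1, -1)
x.per <- ifelse(cells$period    == 1L,   1, -1)
x.trt <- ifelse(cells$treatment == "R",  1, -1)
# Difference in carryover: acts only in period 2
# (+1 after R in sequence RT, -1 after T in sequence TR)
x.car <- c(0, 1, 0, -1)
# Sequence contrast equals the treatment-by-period interaction contrast
identical(x.seq, x.per * x.trt)
# The carryover contrast is a linear combination of sequence and treatment
all.equal(x.car, 0.5 * (x.seq - x.trt))
# Four cell means allow only four independent contrasts
qr(cbind(1, x.seq, x.per, x.trt, x.car))$rank
```

Since the rank stays at four, adding a carryover column brings no new information: whatever is attributed to it could equally be attributed to sequence and treatment.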

## Test

A statistical method to ‘correct’ for a true sequence effect does not exist – it can only be avoided by design.6 This requires that

• the study is performed in healthy subjects or patients with a stable disease,
• the drug is not an endogenous entity, and
• an adequate washout is maintained (no pre-dose concentrations in higher periods).

A ‘Two-stage analysis’ was proposed.7

• An F-test for a significant sequence effect at α = 0.10.8
• If p(F) < 0.10, evaluation of the first period as a parallel design.
• Otherwise, evaluate the study as a Crossover.
if (anova(model)["sequence", "Pr(>F)"] < 0.1) {
  mod.par <- lm(logPK ~ treatment,
                data[data$period == 1, ])
}

One of the pioneers of bioequivalence noted already in 1988:

»Note that the carryover effect is, essentially, the sequence effect, which can be tested against the sum of squares within sequence. If this carryover effect exists, then it confounds the test on formulations. […] My own experience with a large number of comparative bioavailability trials has led me to believe that significant carryover effects (at the 0.05 level) tend to occur in about 5% of the trials; in other words, I believe that carryover effects do not normally exist.«
– Wilfred J. Westlake9

In 1989 it was analytically demonstrated that the ‘Two-stage analysis’ is statistically flawed and should be avoided because it leads not only to biased estimates but – as any pre-test – inflates the Type I Error.10

The FDA rightly stated about testing at $$\small{\alpha=0.1}$$ in 1992:11

»Even if there were no true sequence effect, no unequal residual effects, and no period-by-treatment statistical interaction, approximately ten out of one hundred standard two-treatment crossover studies would be likely to show an apparent sequence effect, if the testing is carried out at the ten percent level of significance. If the ANOVA test for the presence of a sequence effect results in statistical significance, the actual cause cannot be determined from the data alone.«

This theoretical consideration was confirmed in a large meta-study of well-controlled 2- and 3-treatment crossover trials.12 As expected, a significant sequence effect was observed at approximately the level of the test and hence, was considered a statistical artifact.

Parts of Stephen Senn’s textbook13 can be understood as an essay against ‘adjusting’ for carryover. My personal interpretation is that – if the conditions stated above hold – even testing for a sequence effect should be abandoned. In a later publication we find:14

»Testing for carryover in bioequivalence studies […] is not recommended and, moreover, can be harmful.
It seems that whenever carry-over is ‘detected’ under such conditions, it is a false positive and researchers will be led to use an inferior estimate, abandoning a superior one.«

In the same sense:

»Our advice therefore is to avoid having a test for a carry-over difference by doing all that is possible to remove the possibility that such a difference will exist. This requires using wash-out periods of adequate length between the treatment periods.«
– Byron Jones & Michael G. Kenward15

It’s laudable that the EMA16 stated in 2010:

»A test for carry-over is not considered relevant and no decisions regarding the analysis (e.g. analysis of the first period only) should be made on the basis of such a test. The potential for carry-over can be directly addressed by examination of the pre-treatment plasma concentrations in period 2 (and beyond if applicable).«

Nothing is mentioned by the FDA.17 Alas, Health Canada states in the current guidance:18

»A summary of the testing of sequence, period and formulation effects should be presented. Explanations for significant effects should be given.«

Oh dear, period effects cancel out. On the lack of relevance of significant formulation effects, see another article.

According to the Consolidated Standards of Reporting Trials (CONSORT) Statement:19

»In particular, given that a carry over effect can neither be identified with sufficient power, nor can adjustment be made for such an effect in the 2×2 crossover design, the assumption needs to be made that any carry over effects are negligible and some justification presented for this. The description of the design should make clear how many interventions were tested, through how many periods, including information on the length of the treatment, run in, and washout periods (if any).«
– CONSORT 2010 statement: extension to randomised crossover trials20

From a consultant’s diary.21 I see a bit more than 10%… Perhaps I’m a victim of selection bias because quite often I get corpses on my desk to perform a – rarely useful – autopsy.22 Assessors regularly ask about a significant sequence effect…

# Carryover Simulations

Here we face a situation where we need simulations. Except in a meta-study, retrospectively assessing studies is of no value because we know neither whether a true carryover was present nor, if so, its extent.

The simulations are based on $$\small{\log_{e}}$$-normally distributed data, true T/R-ratio 0.95, true CV 0.25, n 28 (balanced sequences), and a small period effect $$\small{\pi_2=+0.02}$$ (responses in the second period higher than in the first).

Note how the sequence effect is tested. In the first example below we get an $$\small{F}$$-value of $$\small{0.0817/0.339\approx0.241}$$ by $$(2)$$ instead of $$\small{0.0817/0.0557\approx1.468}$$ by $$(3)$$. Only the former is correct ($$\small{p(F)\approx0.628}$$ instead of $$\small{p(F)\approx0.237}$$).

As expected, we see a significant subject effect because CVinter > CVintra and subjects are different indeed. If you see in a particular study a subject effect which is not highly significant, that’s a suspicious case.

We need a supportive function for the simulations. Cave: 169 LOC.
sim.effects <- function(alpha = 0.05, theta0 = 0.95, CV = 0.25, CVb,
                        n = 28L, per.effect = 0, carryover = c(0, 0)) {
  # carryover: first element R -> T, second element T -> R
  set.seed(123456789)
  if (missing(CVb)) CVb <- CV * 1.5 # arbitrary
  sd      <- sqrt(log(CV^2 + 1))
  sd.b    <- sqrt(log(CVb^2 + 1))
  subj    <- 1:n
  # within subjects
  T       <- rnorm(n = n, mean = log(theta0), sd = sd)
  R       <- rnorm(n = n, mean = 0, sd = sd)
  # between subjects
  TR      <- rnorm(n = n, mean = 0, sd = sd.b)
  T       <- T + TR
  R       <- R + TR
  TR.sim  <- exp(mean(T) - mean(R))
  data    <- data.frame(subject   = rep(subj, each = 2),
                        period    = 1:2L,
                        sequence  = c(rep("RT", n), rep("TR", n)),
                        treatment = c(rep(c("R", "T"), n/2),
                                      rep(c("T", "R"), n/2)),
                        logPK     = NA)
  subj.T  <- subj.R <- 0L # subject counters
  for (i in 1:nrow(data)) { # clumsy but transparent
    if (data$treatment[i] == "T") {
      subj.T <- subj.T + 1L
      if (data$period[i] == 1L) {
        data$logPK[i] <- T[subj.T]
      } else {
        data$logPK[i] <- T[subj.T] + per.effect + carryover[1]
      }
    } else {
      subj.R <- subj.R + 1L
      if (data$period[i] == 1L) {
        data$logPK[i] <- R[subj.R]
      } else {
        data$logPK[i] <- R[subj.R] + per.effect + carryover[2]
      }
    }
  }
  per.mean <- exp(c(mean(data$logPK[data$period == 1]),
                    mean(data$logPK[data$period == 2])))
  seq.mean <- exp(c(mean(data$logPK[data$sequence == "RT"]),
                    mean(data$logPK[data$sequence == "TR"])))
  trt.mean <- exp(c(mean(data$logPK[data$treatment == "R"]),
                    mean(data$logPK[data$treatment == "T"])))
  cs       <- c("subject", "period", "sequence", "treatment")
  data[cs] <- lapply(data[cs], factor)
  heading.typeI   <- paste("\nType I ANOVA Table: Crossover")
  heading.typeIII <- c(paste("\nType III ANOVA Table: Crossover"),
                       "sequence vs")
  model   <- lm(logPK ~ sequence + subject%in%sequence +
                        period + treatment,
                data = data)
  TR.est  <- exp(coef(model)[["treatmentT"]])
  CI      <- as.numeric(exp(confint(model, "treatmentT",
                                    level = 1 - 2 * alpha)))
  m.form  <- toString(model$call)
  m.form  <- substr(m.form, 5, nchar(m.form)-6)
  typeIII <- typeI <- anova(model)
  attr(typeI, "heading")[1] <- m.form
  attr(typeI, "heading")[2] <- heading.typeI
  if ("sequence:subject" %in% rownames(typeIII)) { # nested
    MSdenom <- typeIII["sequence:subject", "Mean Sq"]
    df2     <- typeIII["sequence:subject", "Df"]
  } else {                                         # simple
    MSdenom <- typeIII["subject", "Mean Sq"]
    df2     <- typeIII["subject", "Df"]
  }
  fvalue  <- typeIII["sequence", "Mean Sq"] / MSdenom
  df1     <- typeIII["sequence", "Df"]
  typeIII["sequence", 4] <- fvalue
  typeIII["sequence", 5] <- pf(fvalue, df1, df2, lower.tail = FALSE)
  attr(typeIII, "heading")[1] <- heading.typeIII[1]
  if ("sequence:subject" %in% rownames(typeIII)) {
    attr(typeIII, "heading")[2] <- paste(heading.typeIII[2],
                                         "sequence:subject")
  } else {
    attr(typeIII, "heading")[2] <- paste(heading.typeIII[2],
                                         "subject")
  }
  CV.est  <- sqrt(exp(typeI["Residuals", "Mean Sq"])-1)
  print(typeI, digits = 4, signif.legend = FALSE)
  print(typeIII, digits = 4, signif.legend = FALSE)
  if (typeIII["sequence", "Pr(>F)"] < 0.1) { # ‘Two-stage analysis’
    mod.par    <- lm(logPK ~ treatment,
                     data[data$period == 1, ])
    TR.par.est <- exp(coef(mod.par)[["treatmentT"]])
    CI.par     <- as.numeric(exp(confint(mod.par, "treatmentT",
                                         level = 1 - 2 * alpha)))
    aovPar     <- anova(mod.par)
    m.form     <- toString(mod.par$call)
    m.form     <- substr(m.form, 5, nchar(m.form)-26)
    attr(aovPar, "heading")[1] <- paste0("\n", m.form)
    attr(aovPar, "heading")[2] <- "ANOVA Table: Period 1 parallel"
    CV.par.est <- sqrt(exp(aovPar["Residuals", "Mean Sq"])-1)
    print(aovPar, digits = 4, signif.legend = FALSE)
  }
  txt <- paste("\ntheta0                  ",
               sprintf("%.4f", theta0),
               "\nPeriod effect          ",
               sprintf("%+g", per.effect),
               "\nCarryover  (R->T, T->R)",
               paste(sprintf("%+g", carryover), collapse = ", "),
               "\nPeriod means     (1, 2) ",
               paste(sprintf("%.4f", per.mean), collapse = ", "),
               "\nSequence means (RT, TR) ",
               paste(sprintf("%.4f", seq.mean), collapse = ", "),
               "\nTreatment means  (R, T) ",
               paste(sprintf("%.4f", trt.mean), collapse = ", "),
               "\nSimulated T/R-ratio     ",
               sprintf("%.4f", TR.sim))
  if (typeIII["sequence", "Pr(>F)"] < 0.1) {
    txt <- paste(txt, "\n\n    Analysis of both periods")
  } else {
    txt <- paste(txt, "\n")
  }
  txt <- paste(txt, "\nEstimated T/R-ratio     ",
               sprintf("%.4f", TR.est),
               "\nBias                   ")
  if (sqrt(.Machine$double.eps) >= abs(TR.est - TR.sim)) {
    txt <- paste(txt, "\u00B10.00%")
  } else {
    txt <- paste(txt, sprintf("%+.2f%%",
                              100*(TR.est-TR.sim)/TR.sim))
  }
  txt <- paste(txt, sprintf("%s%.f%% %s",
                            "\n", 100*(1-2*alpha),
                            "CI                  "),
               paste(sprintf("%.4f", CI),
                     collapse = " \u2013 "))
  if (round(CI[1], 4) >= 0.80 &
      round(CI[2], 4) <= 1.25) {
    txt <- paste(txt, "(pass)")
  } else {
    txt <- paste(txt, "(fail)")
  }
  txt <- paste(txt, "\nEstimated CV (within)   ",
               sprintf("%.4f", CV.est))
  if (typeIII["sequence", "Pr(>F)"] < 0.1) {
    txt <- paste(txt, "\n\n    Analysis of first period",
                 "\nEstimated T/R-ratio     ",
                 sprintf("%.4f", TR.par.est),
                 "\nBias                   ",
                 sprintf("%+.2f%%",
                         100*(TR.par.est-TR.sim)/TR.sim),
                 paste0("\n",
                        sprintf("%.f%% CI                  ",
                                100*(1-2*alpha))),
                 paste(sprintf("%.4f", CI.par),
                       collapse = " \u2013 "))
    if (round(CI.par[1], 4) >= 0.80 &
        round(CI.par[2], 4) <= 1.25) {
      txt <- paste(txt, "(pass)")
    } else {
      txt <- paste(txt, "(fail)")
    }
    txt <- paste(txt, "\nEstimated CV (total)    ",
                 sprintf("%.4f", CV.par.est))
  }
  cat(txt, "\n")
}
ANOVA tables’ significance codes
.   <0.1  & ≥0.05
*   <0.05 & ≥0.01
**  <0.01 & ≥0.001
*** <0.001

## Equal

Large but equal carryover ($$\small{\lambda_\textrm{R}=\lambda_\textrm{T}=+0.2}$$).

sim.effects(theta0 = 0.95, CV = 0.25, n = 28,
per.effect = +0.02,
carryover = c(+0.2, +0.2))
R> logPK ~ sequence + subject %in% sequence + period + treatment
R>
R> Type I ANOVA Table: Crossover
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.082  0.0817   1.468 0.236597
R> period            1  0.875  0.8746  15.709 0.000514 ***
R> treatment         1  0.031  0.0309   0.554 0.463251
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> Type III ANOVA Table: Crossover
R> sequence vs sequence:subject
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.082  0.0817   0.241 0.627562
R> period            1  0.875  0.8746  15.709 0.000514 ***
R> treatment         1  0.031  0.0309   0.554 0.463251
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> theta0                   0.9500
R> Period effect           +0.02
R> Carryover  (R->T, T->R) +0.2, +0.2
R> Period means     (1, 2)  1.0101, 1.2970
R> Sequence means (RT, TR)  1.1017, 1.1892
R> Treatment means  (R, T)  1.1718, 1.1180
R> Simulated T/R-ratio      0.9541
R>
R> Estimated T/R-ratio      0.9541
R> Bias                    ±0.00%
R> 90% CI                   0.8568 – 1.0625 (pass)
R> Estimated CV (within)    0.2393

Significant period effect, unbiased T/R-ratio. All is good since equal carryover cancels out in the treatment comparison.

## Unequal Case 1

Positive unequal carryover ($$\small{\lambda_\textrm{R}=+0.05,\,\lambda_\textrm{T}=+0.2}$$).

sim.effects(theta0 = 0.95, CV = 0.25, n = 28,
per.effect = +0.02,
carryover = c(+0.05, +0.20))
R> logPK ~ sequence + subject %in% sequence + period + treatment
R>
R> Type I ANOVA Table: Crossover
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.321  0.3209   5.764   0.0238 *
R> period            1  0.428  0.4285   7.696   0.0101 *
R> treatment         1  0.208  0.2082   3.740   0.0641 .
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> Type III ANOVA Table: Crossover
R> sequence vs sequence:subject
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.321  0.3209   0.947   0.3395
R> period            1  0.428  0.4285   7.696   0.0101 *
R> treatment         1  0.208  0.2082   3.740   0.0641 .
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> theta0                   0.9500
R> Period effect           +0.02
R> Carryover  (R->T, T->R) +0.05, +0.2
R> Period means     (1, 2)  1.0101, 1.2032
R> Sequence means (RT, TR)  1.0221, 1.1892
R> Treatment means  (R, T)  1.1718, 1.0372
R> Simulated T/R-ratio      0.9541
R>
R> Estimated T/R-ratio      0.8852
R> Bias                    -7.23%
R> 90% CI                   0.7949 – 0.9857 (fail)
R> Estimated CV (within)    0.2393

The sequence effect is not significant at the 0.1 level but we get a negatively biased T/R-ratio. Great – the study fails although T and R are equivalent.
Note that the period effect is still significant but to a great extent ‘masked’ ($$\small{p}$$ 0.0101 instead of 0.000514).

## Unequal Case 2

Positive, extremely unequal carryover ($$\small{\lambda_\textrm{R}=+0.05,\,\lambda_\textrm{T}=+0.45}$$).

sim.effects(theta0 = 0.95, CV = 0.25, n = 28,
per.effect = +0.02,
carryover = c(+0.05, +0.45))
R> logPK ~ sequence + subject %in% sequence + period + treatment
R>
R> Type I ANOVA Table: Crossover
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  1.070  1.0696  19.210 0.000171 ***
R> period            1  1.260  1.2596  22.623 6.39e-05 ***
R> treatment         1  0.854  0.8538  15.335 0.000582 ***
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> Type III ANOVA Table: Crossover
R> sequence vs sequence:subject
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  1.070  1.0696   3.155 0.087400 .
R> period            1  1.260  1.2596  22.623 6.39e-05 ***
R> treatment         1  0.854  0.8538  15.335 0.000582 ***
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> logPK ~ treatment
R> ANOVA Table: Period 1 parallel
R>           Df Sum Sq Mean Sq F value Pr(>F)
R> treatment  1  0.006 0.00607   0.031  0.862
R> Residuals 26  5.100 0.19615
R>
R> theta0                   0.9500
R> Period effect           +0.02
R> Carryover  (R->T, T->R) +0.05, +0.45
R> Period means     (1, 2)  1.0101, 1.3634
R> Sequence means (RT, TR)  1.0221, 1.3475
R> Treatment means  (R, T)  1.3278, 1.0372
R> Simulated T/R-ratio      0.9541
R>
R>     Analysis of both periods
R> Estimated T/R-ratio      0.7812
R> Bias                    -18.13%
R> 90% CI                   0.7015 – 0.8699 (fail)
R> Estimated CV (within)    0.2393
R>
R>     Analysis of first period
R> Estimated T/R-ratio      1.0299
R> Bias                    +7.94%
R> 90% CI                   0.7741 – 1.3702 (fail)
R> Estimated CV (total)     0.4655

The sequence effect is significant at the 0.1 level and we get an extremely negatively biased T/R-ratio.
Analysis of the first period as a parallel design gives a positively biased T/R-ratio (even the direction of the difference of T from R changes).

## Unequal Case 3

Unequal carryover, different direction ($$\small{\lambda_\textrm{R}=+0.075,\,\lambda_\textrm{T}=-0.075}$$).

sim.effects(theta0 = 0.95, CV = 0.25, n = 28,
per.effect = +0.02,
carryover = c(+0.075, -0.075))
R> logPK ~ sequence + subject %in% sequence + period + treatment
R>
R> Type I ANOVA Table: Crossover
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.000  0.0000   0.000    0.982
R> period            1  0.035  0.0349   0.627    0.436
R> treatment         1  0.011  0.0110   0.198    0.660
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> Type III ANOVA Table: Crossover
R> sequence vs sequence:subject
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.000  0.0000   0.000    0.993
R> period            1  0.035  0.0349   0.627    0.436
R> treatment         1  0.011  0.0110   0.198    0.660
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> theta0                   0.9500
R> Period effect           +0.02
R> Carryover  (R->T, T->R) +0.075, -0.075
R> Period means     (1, 2)  1.0101, 1.0619
R> Sequence means (RT, TR)  1.0349, 1.0364
R> Treatment means  (R, T)  1.0212, 1.0503
R> Simulated T/R-ratio      0.9541
R>
R> Estimated T/R-ratio      1.0284
R> Bias                    +7.79%
R> 90% CI                   0.9236 – 1.1452 (pass)
R> Estimated CV (within)    0.2393

The sequence effect is not significant (its $$\small{p}$$-value is actually close to 1) but the T/R-ratio is extremely positively biased.
The period effect is not significant any more (0.436 instead of 0.000514).

This example demonstrates the absurdity of testing a sequence effect. The ANOVA looks completely ‘normal’, none of the effects is significant. Study passes, everybody happy, questions from an assessor unlikely.

Nevertheless, the estimated T/R-ratio is completely wrong. One gets the false impression that T is slightly more bioavailable than R, whereas the truth is the other way ’round. The true but unknown carryovers ‘pulled’ the responses of T up and ‘pushed’ the ones of R down.
Here we know it. In reality we don’t have a clue.

To summarize the examples (remember that the true T/R-ratio, CV, n, and the period effect were identical in all, as were the simulated T/R-ratio of 0.9541 and the estimated within-subject CV of 0.2393):

| True carryover | $$\small{p(F)}$$ | PE | Bias | 90% CI | BE |
|---|---|---|---|---|---|
| $$\small{\lambda_\textrm{R}=\lambda_\textrm{T}=+0.20}$$ | 0.6276 | 0.9541 | ±0.00% | 0.8568–1.0625 | pass |
| $$\small{\lambda_\textrm{R}=+0.050,\,\lambda_\textrm{T}=+0.20}$$ | 0.3395 | 0.8852 | –7.23% | 0.7949–0.9857 | fail |
| $$\small{\lambda_\textrm{R}=+0.050,\,\lambda_\textrm{T}=+0.45}$$ | 0.0874 | 0.7812 | –18.13% | 0.7015–0.8699 | fail |
| $$\small{\lambda_\textrm{R}=+0.075,\,\lambda_\textrm{T}=-0.075}$$ | 0.9930 | 1.0284 | +7.79% | 0.9236–1.1452 | pass |
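The biases are no coincidence. In the RT|TR design the treatment estimate is the average of the two sequences’ within-subject T − R differences; $$\small{\lambda_\textrm{R}}$$ enters the first with a positive and $$\small{\lambda_\textrm{T}}$$ the second with a negative sign. Under this reasoning the bias on the log scale should be $$\small{(\lambda_\textrm{R}-\lambda_\textrm{T})/2}$$. A minimal sketch to check that against the four examples, with the carryover values as passed to `sim.effects()` (the object names are chosen here for illustration):

```r
# Carryover values of the four examples, as passed to sim.effects()
carryover <- rbind(equal    = c(0.200,  0.200),
                   unequal1 = c(0.050,  0.200),
                   unequal2 = c(0.050,  0.450),
                   unequal3 = c(0.075, -0.075))
colnames(carryover) <- c("lambda.R", "lambda.T")
# Expected shift of the treatment estimate on the log scale
bias.log <- (carryover[, "lambda.R"] - carryover[, "lambda.T"]) / 2
# Relative bias of the back-transformed T/R-ratio in percent
bias.pct <- 100 * (exp(bias.log) - 1)
round(cbind(carryover, bias.pct), 2)
```

The percentages reproduce the Bias column of the summary. The bias depends only on the difference of the carryovers, not on the particular seed or the residual variability.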

Homework

For an assumed T/R-ratio 0.94 and CV 0.25 studies were powered with ≥0.80 (n 32 achieves 0.818). How to estimate the sample size is shown in another article. No period effect.

sim.effects(theta0 = 0.94, CV = 0.25, n = 32)
sim.effects(theta0 = 0.94, CV = 0.25, n = 32,
carryover = c(+0.001, -0.001))

The 1st study ($$\small{\lambda_\textrm{R}=\lambda_\textrm{T}=0,\,p(F)\sim0.452}$$) fails.23
The 2nd with extremely small but unequal carryover ($$\small{\lambda_\textrm{R}=+0.001,\,\lambda_\textrm{T}=-0.001,\,p(F)\sim0.457}$$) passes.
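That a properly powered study may fail by chance alone can be checked with a quick Monte Carlo simulation. This is a sketch under simplifying assumptions made here (no carryover, balanced sequences, equal within-subject variances, limits 0.80 and 1.25); instead of simulating individual subjects it draws the point estimate and the residual variance directly from their sampling distributions. The seed is arbitrary.

```r
set.seed(20210319)                  # arbitrary seed
nsims  <- 1e5L
theta0 <- 0.94
CV     <- 0.25
n      <- 32L
sd.w   <- sqrt(log(CV^2 + 1))       # within-subject SD on the log scale
df     <- n - 2L
# Balanced 2x2x2: point estimate and residual variance are independent
pe     <- rnorm(nsims, mean = log(theta0), sd = sd.w * sqrt(2 / n))
mse    <- sd.w^2 * rchisq(nsims, df = df) / df
hw     <- qt(1 - 0.05, df) * sqrt(2 * mse / n) # half-width of the 90% CI
pass   <- (pe - hw >= log(0.80)) & (pe + hw <= log(1.25))
mean(pass)                          # empirical power
```

The share of passing studies should be close to the planned power, i.e., roughly one in five studies fails although T and R are equivalent.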

Regulators, do you get the point? Aren’t you scared?

[The] impatience with ambiguity can be criticized in the phrase:
Absence of evidence is not evidence of absence.

— Carl Sagan24

# Conclusion

Apart from the proof given by Freeman already in 1989, the simulations show that even if the sequence effect is not significant, in the case of a true unequal carryover the estimated T/R-ratio will always be biased.

On the other hand – like in every statistical test – a significant sequence effect may be pure chance, i.e., does not prove that a true carryover exists and can be a false positive as well.

Coming back to the questions asked in the introduction. To repeat:

What is a significant sequence (carryover) effect … ?

It is a natural property of a test at level $$\small{\alpha}$$ that it will be significant in about a fraction $$\small{\alpha}$$ of studies even if the underlying condition (here a true unequal carryover) does not exist.
Furthermore, in a nonreplicative design too many effects are confounded in order to obtain an unequivocal answer.

… and do we have to care about one?

Not in particular. If the study was properly planned (sufficiently long washout periods) and only a limited number of pre-dose concentrations in higher periods were measured (ones which are > 5% of Cmax can be excluded from the comparison), there is nothing to worry about.
If the number of excluded subjects is large, at least you learned to design the next study better.
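Checking the pre-dose concentrations is straightforward. A minimal sketch with made-up period 2 data; the data frame, its column names, and the values are hypothetical:

```r
# Hypothetical period 2 data: pre-dose concentration and Cmax per subject
pk <- data.frame(subject = 1:6,
                 predose = c(0, 0.8, 0, 3.1, 0, 0.2),
                 Cmax    = c(48.2, 51.7, 39.9, 44.5, 60.3, 55.0))
pk$pct     <- 100 * pk$predose / pk$Cmax # pre-dose in percent of Cmax
pk$exclude <- pk$pct > 5                 # candidates for exclusion
pk[pk$predose > 0, ]                     # subjects with measurable pre-dose
```

In this made-up example only subject 4 exceeds the 5% threshold.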

Hoping that assessing the sequence effect can give information whether a study was properly designed and performed is futile.

The topic of suitable washouts from a PK/PD perspective will be covered in another article.


# Open Question

What to do if we have to deal with an endogenous compound?

• Carryover considered »not unlikely« by the FDA.
• The EMA states: »In bioequivalence studies with endogenous substances, it cannot be directly assessed whether carryover has occurred, so extra care should be taken to ensure that the washout period is of an adequate duration.«

I see. However, I fail to understand why baseline-corrected pre-dose concentrations could not be used like the ones of other drugs. Is that meant by ‘indirectly assessed’?


Helmut Schütz 2021
1st version March 19, 2021.
Rendered 2021-05-11 17:02:15 CEST by rmarkdown in 0.79 seconds.

Footnotes and References

1. Hauschke D, Steinijans V, Pigeot I. Bioequivalence Studies in Drug Development. Methods and Applications. Chichester: Wiley; 2007.↩︎

2. No monozygotic twins or triplets in the study, if you don’t mind.↩︎

3. Chow S-C, Liu J-p. Design and Analysis of Bioavailability and Bioequivalence Studies. Boca Raton: CRC Press; 3rd ed. 2009. Chapter 3.2.↩︎

4. Schütz H, Labes D, Fuglsang A. Reference Datasets for 2-Treatment, 2-Sequence, 2-Period Bioequivalence Studies. AAPS J. 2014; 16(6): 1292–7. doi:10.1208/s12248-014-9661-0. Free Full Text.↩︎

5. Dallal GE. The Little Handbook of Statistical Practice. Note 95 Crossover Studies. eBook 2012. online 2000.↩︎

6. FDA, Center for Drug Evaluation and Research. Guidance for Industry. Statistical Approaches to Establish Bioequivalence. Section VII.B. Carryover Effects. January 2001. download.↩︎

7. Grizzle JE. The two-period change over design and its use in clinical trials. Biometrics. 1965; 21(2): 467–80. doi:10.2307/2528104.↩︎

8. Since this is a between-subject comparison, it has low power. IMHO, the level 0.10 is arbitrary (I failed to find any justification).↩︎

9. Westlake WJ. Bioavailability and Bioequivalence of Pharmaceutical Formulations. In: Peace KE, editor. Biopharmaceutical Statistics for Drug Development. New York: Marcel Dekker; 1988. p 336–7.↩︎

10. Freeman P. The performance of the two-stage analysis of two-treatment, two-period cross-over trials. Stat Med. 1989; 8(12): 1421–32. doi:10.1002/sim.4780081202.↩︎

11. FDA. Center for Drug Evaluation and Research. Guidance for Industry. Statistical Procedures for BE Studies using a Standard Two-Treatment Crossover Design. Jul 1992.↩︎

12. D’Angelo G, Potvin D, Turgeon J. Carry-Over Effects in Bioequivalence Studies. J Biopharm Stat. 2001; 11(1–2): 35–43. doi:10.1081/BIP-100104196.↩︎

13. Senn S. Cross-over Trials in Clinical Research. Chichester: Wiley; 2nd ed. 2002. p 35–88.↩︎

14. Senn S, D’Angelo G, Potvin D. Carry-over in cross-over trials in bioequivalence: theoretical concerns and empirical evidence. Pharm Stat. 2004; 3: 133-142. doi:10.1002/pst.111.↩︎

15. Jones B, Kenward MG. Design and Analysis of Cross-Over Trials. Boca Raton: CRC Press; 3rd ed. 2015. Chapter 2.7.↩︎

16. EMA, CHMP. Guideline on the Investigation of Bioequivalence. London, 20 January 2010. CPMP/EWP/QWP/1401/98 Rev. 1/Corr **.↩︎

17. FDA. Center for Drug Evaluation and Research. Draft Guidance for Industry. Bioequivalence Studies with Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA. December 2013. download.↩︎

18. Health Canada. Guidance Document: Conduct and Analysis of Comparative Bioavailability Studies. Section 2.7.4.3 Testing of Fixed Effects. Ottawa, 2018/06/08. online.↩︎

19. The CONSORT Group. The CONSORT 2010 Statement.↩︎

20. Dwan K, Li T, Altman DG, Elbourne D. CONSORT 2010 statement: extension to randomised crossover trials. BMJ. 2019; 366: l4378. doi:10.1136/bmj.l4378.↩︎

21. Why so few? Well, I deal a lot with higher-order crossovers (pilot studies, dose proportionality), replicate designs, and studies of endogenous compounds. More than 500 2×2×2 crossovers of my CRO rest in peace on DDS backups in a proprietary HP-UX binary format. I neither have the hard- nor the software any more to retrieve the data.↩︎

22. R.A. Fisher. »To call the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.«↩︎

23. Why on earth? These are simulations. With this seed of the PRNG it was just bad luck. If you simulated a lot of studies with different seeds, ~20% would fail. That’s the Type II Error (producer’s risk).↩︎

24. Sagan C. The Demon-Haunted World. Science as a Candle in the Dark. 1995. Chapter 12 »The Fine Art of Baloney Detection«↩︎