Examples in this article were generated with Base 4.0.5.
What is a significant sequence (carryover) effect and do we have to care about one?
Sometimes regulatory assessors ask for the ‘justification’ of a significant sequence effect.
I will try to clarify why such a justification is not possible and – a bit provocatively – why asking for one demonstrates a lack of understanding of the underlying statistical concepts.
All examples deal with the 2×2×2 crossover (RT|TR) but are applicable to any kind of crossover (Higher-Order, Replicate Designs) as well. A basic knowledge of R does not hurt.
As the simplest case, the 2×2×2 design (\(\small{\textrm{RT}|\textrm{TR}}\)) including a term for carryover is considered.^{1}
Let sequences and periods be indexed by \(\small{i}\) and \(\small{k}\) (\(\small{i,k=1,2}\)) and let \(\small{n_i}\) subjects be randomized to sequence \(\small{i}\). Let \(\small{Y_{ijk}}\) be the \(\small{\log_{e}}\)-transformed PK response of the \(\small{j}\)-th subject in the \(\small{i}\)-th sequence at the \(\small{k}\)-th period. Then \[Y_{ijk}=\mu_h+s_{ij}+\pi_k+\lambda_c+e_{ijk},\tag{1}\] where
\(\small{\mu_h}\) is the effect of treatment \(\small{h}\), where \(\small{h=\textrm{R}}\) if \(\small{i=k}\) and \(\small{h=\textrm{T}}\) if \(\small{i\neq k}\),
\(\small{s_{ij}}\) is the fixed effect of the \(\small{j}\)-th subject in the \(\small{i}\)-th sequence,
\(\small{\pi_{k}}\) is the fixed effect of the \(\small{k}\)-th period,
\(\small{\lambda_{c}}\) is the carryover effect of the corresponding formulation from period 1 to period 2, where
\(\small{c=\textrm{R}}\) if \(\small{i=1,k=2}\),
\(\small{c=\textrm{T}}\) if \(\small{i=2,k=2}\),
\(\small{\lambda_{c}=0}\) if \(\small{i=1,2,k=1}\),
\(\small{e_{ijk}}\) is the random error in observing \(\small{Y_{ijk}}\) (of the \(\small{j}\)-th subject in the \(\small{k}\)-th period and \(\small{i}\)-th sequence).
The subject effects \(\small{s_{ij}}\) are independently^{2} normally distributed with expected mean \(\small{0}\) and between-subject variance \(\small{\sigma_{\textrm{b}}^{2}}\). The random errors \(\small{e_{ijk}}\) are independent and normally distributed with expected mean \(\small{0}\) and variances \(\small{\sigma_{\textrm{wR}}^{2}}\), \(\small{\sigma_{\textrm{wT}}^{2}}\) for the reference and test treatment. The treatment variances are given by \(\small{\sigma_{\textrm{R}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wR}}^{2}}\) and \(\small{\sigma_{\textrm{T}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wT}}^{2}}\). Note that these components cannot be separately estimated in a non-replicative design.
Therefore, the layout of the RT|TR design is:

| Sequence | Period 1 | Period 2 |
|---|---|---|
| 1 (\(\small{\textrm{RT}}\)) | \(\small{Y_{1j1}=\mu_\textrm{R}+s_{1j}+\pi_1+e_{1j1}}\), \(\small{j=1,\ldots,n_1}\) | \(\small{Y_{1j2}=\mu_\textrm{T}+s_{1j}+\pi_2+\lambda_\textrm{R}+e_{1j2}}\), \(\small{j=1,\ldots,n_1}\) |
| 2 (\(\small{\textrm{TR}}\)) | \(\small{Y_{2j1}=\mu_\textrm{T}+s_{2j}+\pi_1+e_{2j1}}\), \(\small{j=1,\ldots,n_2}\) | \(\small{Y_{2j2}=\mu_\textrm{R}+s_{2j}+\pi_2+\lambda_\textrm{T}+e_{2j2}}\), \(\small{j=1,\ldots,n_2}\) |

The expected population means and variances are given by:

| Sequence | Period 1 | Period 2 |
|---|---|---|
| 1 (\(\small{\textrm{RT}}\)) | \(\small{E(Y_{1j1})=\mu_\textrm{R}+\pi_1}\), \(\small{Var(Y_{1j1})=\sigma_{\textrm{R}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wR}}^{2}}\), \(\small{j=1,\ldots,n_1}\) | \(\small{E(Y_{1j2})=\mu_\textrm{T}+\pi_2+\lambda_\textrm{R}}\), \(\small{Var(Y_{1j2})=\sigma_{\textrm{T}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wT}}^{2}}\), \(\small{j=1,\ldots,n_1}\) |
| 2 (\(\small{\textrm{TR}}\)) | \(\small{E(Y_{2j1})=\mu_\textrm{T}+\pi_1}\), \(\small{Var(Y_{2j1})=\sigma_{\textrm{T}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wT}}^{2}}\), \(\small{j=1,\ldots,n_2}\) | \(\small{E(Y_{2j2})=\mu_\textrm{R}+\pi_2+\lambda_\textrm{T}}\), \(\small{Var(Y_{2j2})=\sigma_{\textrm{R}}^{2}=\sigma_{\textrm{b}}^{2}+\sigma_{\textrm{wR}}^{2}}\), \(\small{j=1,\ldots,n_2}\) |

Assuming equal carryover (\(\small{\lambda_\textrm{R}=\lambda_\textrm{T}}\)), the term \(\small{\lambda_c}\) can be dropped from the model.
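Why only unequal carryover matters can be seen in a short derivation from the expectations above. This is a sketch, not taken from the cited textbook: \(\small{d_{ij}}\) (period 2 minus period 1) and \(\small{t_{ij}}\) (sum of both periods) are symbols introduced here for illustration only.

```latex
\begin{align}
d_{ij} &= Y_{ij2}-Y_{ij1}, \qquad t_{ij} = Y_{ij1}+Y_{ij2}\\
E(d_{1j}) &= (\mu_\textrm{T}-\mu_\textrm{R})+(\pi_2-\pi_1)+\lambda_\textrm{R}\\
E(d_{2j}) &= (\mu_\textrm{R}-\mu_\textrm{T})+(\pi_2-\pi_1)+\lambda_\textrm{T}\\
E\left(\tfrac{1}{2}\left(\bar{d}_{1}-\bar{d}_{2}\right)\right) &= (\mu_\textrm{T}-\mu_\textrm{R})+\tfrac{1}{2}(\lambda_\textrm{R}-\lambda_\textrm{T})\\
E\left(\tfrac{1}{2}\left(\bar{d}_{1}+\bar{d}_{2}\right)\right) &= (\pi_2-\pi_1)+\tfrac{1}{2}(\lambda_\textrm{R}+\lambda_\textrm{T})\\
E(t_{1j})-E(t_{2j}) &= \lambda_\textrm{R}-\lambda_\textrm{T}
\end{align}
```

With \(\small{\lambda_\textrm{R}=\lambda_\textrm{T}}\) the treatment contrast is unbiased and the common carryover is absorbed by the period contrast. With \(\small{\lambda_\textrm{R}\neq\lambda_\textrm{T}}\) the treatment contrast is biased by \(\small{(\lambda_\textrm{R}-\lambda_\textrm{T})/2}\), and the same difference is all that separates the sequences' subject totals.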
Most agencies (like the EMA) require an ANOVA of \(\small{\log_{e}}\)-transformed responses, i.e., a linear model where all effects are fixed. In R:
model <- lm(log(PK) ~ sequence + subject %in% sequence +
                      period + treatment, data = data)
In SAS:
proc glm data = data;
class subject period sequence treatment;
model logPK = sequence subject(sequence)
period treatment;
run;
Note that in bioequivalence subjects generally are uniquely coded. If subject 1 is randomized to sequence 1, there is not ‘another’ subject 1 randomized to sequence 2. Hence, the term subject(sequence) stated in all guidelines is a bogus one. Replacing it with the simple term subject gives exactly [sic] the same point estimate and residual variance.
It avoids the many lines in the output denoted with ‘.’ in SAS and ‘not estimable’ in Phoenix/WinNonlin.
Heresy: You could remove sequence from the models entirely.
model <- lm(log(PK) ~ subject + period + treatment,
            data = data)

| Model | PE | MSE | 90% CI |
|---|---|---|---|
| sequence + subject(sequence) | 0.954135 | 0.0556768 | 0.856834–1.062490 |
| sequence + subject | 0.954135 | 0.0556768 | 0.856834–1.062490 |
| subject | 0.954135 | 0.0556768 | 0.856834–1.062490 |

Quod erat demonstrandum. So much about overparameterized models.
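The equivalence of the three models can be checked on any data set with uniquely coded subjects. The following is a minimal sketch on simulated data; the seed, sample size and variability are arbitrary assumptions and the data are not the ones behind the table above.

```r
set.seed(42)                                   # arbitrary seed (assumption)
n   <- 12L                                     # subjects, balanced sequences
dat <- data.frame(
  subject   = factor(rep(seq_len(n), each = 2)),
  period    = factor(rep(1:2, times = n)),
  sequence  = factor(rep(c("RT", "TR"), each = n)),
  treatment = factor(c(rep(c("R", "T"), n / 2),
                       rep(c("T", "R"), n / 2))))
# log-scale responses: subject effect + treatment effect + residual error
dat$logPK <- rep(rnorm(n, sd = 0.4), each = 2) +
             log(0.95) * (dat$treatment == "T") +
             rnorm(2 * n, sd = 0.25)
forms <- list(
  nested = logPK ~ sequence + subject %in% sequence + period + treatment,
  simple = logPK ~ sequence + subject + period + treatment,
  heresy = logPK ~ subject + period + treatment)
fits <- lapply(forms, lm, data = dat)
# point estimate (back-transformed) and residual mean square of each model
res  <- t(sapply(fits, function(m) c(PE  = exp(coef(m)[["treatmentT"]]),
                                     MSE = sigma(m)^2)))
print(res, digits = 7)
```

All three rows should agree to numerical precision because the three model matrices span the same column space; only the aliased (not estimable) coefficients differ.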
Other agencies (the FDA, Health Canada) require a mixed-effects model, where \(\small{s_{ij}}\) is a random effect.
In SAS:
proc mixed data = data;
class subject period sequence treatment;
model logY = sequence period treatment;
random subject(sequence);
run;
Unfortunately, due to different ‘design philosophies’, the SAS code cannot be translated to R.
Note that the MSE of sequence has to be tested against the MSE of subject(sequence) by \((2)\)^{3} – or of subject in the simple model^{4} – and not against the residual MSE by \((3)\), which is a within-subject term, as done in R’s default ANOVA (Type I), i.e., \[\small{F=\frac{MSE_{\,\textrm{sequence}}}{MSE_{\,\textrm{subject(sequence)}}}}\tag{2}\] \[\small{F=\frac{MSE_{\,\textrm{sequence}}}{MSE_{\,\textrm{residual}}}}\tag{3}\]
Its \(\small{p}\)-value is calculated by \[\small{p(F)=\Pr\left(F_{\,\nu_1,\,\nu_2}\geq F\right)}\tag{4}\] where \(\small{\nu_1}\) are the sequences’ degrees of freedom (i.e., the number of sequences minus one) and \(\small{\nu_2}\) are the subjects’ degrees of freedom (in a balanced 2×2×2 design \(\small{n-2}\)).
The sequence effect is confounded with^{5}
A statistically significant sequence effect could indicate that there is
Only the last potential cause can be ruled out during monitoring or in an audit/inspection.
A statistical method to ‘correct’ for a true sequence effect does not exist – it can only be avoided by design.^{6}
A ‘Two-stage analysis’ was proposed.^{7}
if (anova(model)["sequence", "Pr(>F)"] < 0.1) {
  # significant sequence effect: evaluate period 1 only, as a parallel design
  mod.par <- lm(logPK ~ treatment,
                data[data$period == 1, ])
}
One of the pioneers of bioequivalence noted already in 1988:
“Note that the carryover effect is, essentially, the sequence effect, which can be tested against the sum of squares within sequence. If this carryover effect exists, then it confounds the test on formulations. […] My own experience with a large number of comparative bioavailability trials has led me to believe that significant carryover effects (at the 0.05 level) tend to occur in about 5% of the trials; in other words, I believe that carryover effects do not normally exist.
— Wilfred J. Westlake^{9}
In 1989 it was analytically demonstrated that the ‘Two-stage analysis’ is statistically flawed and should be avoided because it leads not only to biased estimates but – as any pre-test – inflates the Type I Error.^{10}
The FDA rightly stated about testing at \(\small{\alpha=0.1}\) in 1992:^{11}
“Even if there were no true sequence effect, no unequal residual effects, and no period-by-treatment statistical interaction, approximately ten out of one hundred standard two-treatment crossover studies would be likely to show an apparent sequence effect, if the testing is carried out at the ten percent level of significance.
If the ANOVA test for the presence of a sequence effect results in statistical significance, the actual cause cannot be determined from the data alone.
This theoretical consideration was confirmed in a large meta-study of well-controlled 2- and 3-treatment crossover trials.^{12} As expected, a significant sequence effect was observed at approximately the level of the test and hence, was considered a statistical artifact.
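The ‘ten out of one hundred’ can be reproduced by simulation. The sketch below is not the meta-study's method: it uses the fact that in the 2×2×2 design the \(\small{F}\)-test of sequence against subject(sequence) is equivalent to a two-sample \(\small{t}\)-test of the subjects' totals over both periods, in which treatment and period effects cancel. Seed, number of simulations and variabilities are assumptions.

```r
set.seed(2021)                       # arbitrary seed (assumption)
nsim <- 4000L                        # number of simulated studies
n    <- 28L                          # subjects, balanced sequences
sd.w <- sqrt(log(0.25^2 + 1))        # within-subject SD  (CV 0.25)
sd.b <- sqrt(log(0.375^2 + 1))       # between-subject SD (CV 0.375, arbitrary)
sq   <- factor(rep(c("RT", "TR"), each = n / 2))
p    <- replicate(nsim, {
  s   <- rnorm(n, sd = sd.b)         # subject effects
  tot <- (s + rnorm(n, sd = sd.w)) + # period 1
         (s + rnorm(n, sd = sd.w))   # period 2, no carryover at all
  # equal-variance t-test of subject totals = F-test of equation (2)
  t.test(tot ~ sq, var.equal = TRUE)$p.value
})
mean(p < 0.10)                       # share of 'significant' sequence effects
```

The share should be close to 0.10 although no carryover was simulated at all.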
Parts of Stephen Senn’s textbook^{13} can be understood as an essay against ‘adjusting’ for carryover. My personal interpretation is that – if conditions stated above hold – even testing for a sequence effect should be abandoned. In a later publication we find:^{14}
“Testing for carryover in bioequivalence studies […] is not recommended and, moreover, can be harmful. It seems that whenever carryover is ‘detected’ under such conditions, it is a false positive and researchers will be led to use an inferior estimate, abandoning a superior one.
In the same sense:
“Our advice therefore is to avoid having a test for a carryover difference by doing all that is possible to remove the possibility that such a difference will exist. This requires using washout periods of adequate length between the treatment periods.
— Byron Jones & Michael G. Kenward^{15}
It’s laudable that the EMA^{16} stated in 2010:
“A test for carryover is not considered relevant and no decisions regarding the analysis (e.g. analysis of the first period only) should be made on the basis of such a test. The potential for carryover can be directly addressed by examination of the pretreatment plasma concentrations in period 2 (and beyond if applicable).
Nothing is mentioned by the FDA.^{17}
Alas, Health Canada states in the current guidance:^{18}
“A summary of the testing of sequence, period and formulation effects should be presented. Explanations for significant effects should be given.
Oh dear, period effects mean out. For the lacking relevance of significant formulation effects see another article.
According to the Consolidated Standards of Reporting Trials (CONSORT) Statement:^{19}
“In particular, given that a carry over effect can neither be identified with sufficient power, nor can adjustment be made for such an effect in the 2×2 crossover design, the assumption needs to be made that any carry over effects are negligible and some justification presented for this. The description of the design should make clear how many interventions were tested, through how many periods, including information on the length of the treatment, run in, and washout periods (if any).
— CONSORT 2010 statement: extension to randomised crossover trials^{20}
From a consultant’s diary.^{21} I see a bit more than 10%…
Perhaps I’m a victim of selection bias because quite often I get corpses on my desk to perform a – rarely useful – autopsy.^{22} Assessors regularly ask about a significant sequence effect…
Here we face a situation where we need simulations. Except in a meta-study, retrospectively assessing studies is of no value because we know neither whether a true carryover was present nor, if so, its extent.
The simulations are based on log-normally distributed data, true T/R-ratio 0.95, true CV 0.25, n 28 (balanced sequences), and a small period effect \(\small{\pi_2=+0.02}\) (responses in the second period higher than in the first).
Note how the sequence effect is tested. In the first example below we get an \(\small{F}\)value of \(\small{0.0817/0.339\approx0.241}\) by \((2)\) instead of \(\small{0.0817/0.0557\approx1.468}\) by \((3)\). Only the former is correct \(\small{(p(F)\approx0.628}\) instead of \(\small{p(F)\approx0.237)}\).
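The arithmetic can be retraced with base R's `pf()`. This sketch uses the rounded mean squares quoted above, so the results will agree with the reported ones only to about three decimals; the variable names are chosen here for illustration.

```r
# Rounded mean squares of the first example's ANOVA table
ms.seq  <- 0.0817   # sequence
ms.subj <- 0.3390   # subject(sequence), a between-subject term
ms.res  <- 0.0557   # residual, a within-subject term
df.seq  <- 1        # number of sequences minus one
df.subj <- 26       # n - 2 with n = 28 (here the residual df is 26 as well)
F.correct <- ms.seq / ms.subj   # equation (2)
F.wrong   <- ms.seq / ms.res    # equation (3)
p.correct <- pf(F.correct, df.seq, df.subj, lower.tail = FALSE)
p.wrong   <- pf(F.wrong,   df.seq, df.subj, lower.tail = FALSE)
round(c(F.correct = F.correct, p.correct = p.correct,
        F.wrong   = F.wrong,   p.wrong   = p.wrong), 3)
```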
As expected, we see a significant subject effect because CV_{inter} > CV_{intra} and subjects are different indeed. If you see in a particular study a subject effect which is not highly significant, that’s a suspicious case.
We need a supportive function for the simulations. Cave: 169 LOC.
sim.effects <- function(alpha = 0.05, theta0 = 0.95,
                        CV = 0.25, CVb, n = 28L,
                        per.effect = 0,
                        carryover = c(0, 0)) {
  # carryover: first element R -> T, second element T -> R
  set.seed(123456789)
  if (missing(CVb)) CVb <- CV * 1.5 # arbitrary
  sd     <- sqrt(log(CV^2 + 1))
  sd.b   <- sqrt(log(CVb^2 + 1))
  subj   <- 1:n
  # within subjects
  T      <- rnorm(n = n, mean = log(theta0), sd = sd)
  R      <- rnorm(n = n, mean = 0, sd = sd)
  # between subjects
  TR     <- rnorm(n = n, mean = 0, sd = sd.b)
  T      <- T + TR
  R      <- R + TR
  TR.sim <- exp(mean(T) - mean(R))
  data   <- data.frame(subject   = rep(subj, each = 2),
                       period    = 1:2L,
                       sequence  = c(rep("RT", n),
                                     rep("TR", n)),
                       treatment = c(rep(c("R", "T"), n/2),
                                     rep(c("T", "R"), n/2)),
                       logPK     = NA)
  subj.T <- subj.R <- 0L # subject counters
  for (i in 1:nrow(data)) { # clumsy but transparent
    if (data$treatment[i] == "T") {
      subj.T <- subj.T + 1L
      if (data$period[i] == 1L) {
        data$logPK[i] <- T[subj.T]
      } else {
        data$logPK[i] <- T[subj.T] + per.effect + carryover[1]
      }
    } else {
      subj.R <- subj.R + 1L
      if (data$period[i] == 1L) {
        data$logPK[i] <- R[subj.R]
      } else { # both counters are equal here (T was given in period 1)
        data$logPK[i] <- R[subj.R] + per.effect + carryover[2]
      }
    }
  }
  per.mean <- exp(c(mean(data$logPK[data$period == 1]),
                    mean(data$logPK[data$period == 2])))
  seq.mean <- exp(c(mean(data$logPK[data$sequence == "RT"]),
                    mean(data$logPK[data$sequence == "TR"])))
  trt.mean <- exp(c(mean(data$logPK[data$treatment == "R"]),
                    mean(data$logPK[data$treatment == "T"])))
  cs       <- c("subject", "period", "sequence", "treatment")
  data[cs] <- lapply(data[cs], factor)
  heading.typeI   <- paste("\nType I ANOVA Table: Crossover")
  heading.typeIII <- c(paste("\nType III ANOVA Table: Crossover"),
                       "sequence vs")
  model    <- lm(logPK ~ sequence + subject %in% sequence +
                         period + treatment,
                 data = data)
  TR.est   <- exp(coef(model)[["treatmentT"]])
  CI       <- as.numeric(exp(confint(model, "treatmentT",
                                     level = 1 - 2 * alpha)))
  m.form   <- toString(model$call)
  m.form   <- substr(m.form, 5, nchar(m.form)-6)
  typeIII  <- typeI <- anova(model)
  attr(typeI, "heading")[1] <- m.form
  attr(typeI, "heading")[2] <- heading.typeI
  if ("sequence:subject" %in% rownames(typeIII)) { # nested
    MSdenom <- typeIII["sequence:subject", "Mean Sq"]
    df2     <- typeIII["sequence:subject", "Df"]
  } else {                                         # simple
    MSdenom <- typeIII["subject", "Mean Sq"]
    df2     <- typeIII["subject", "Df"]
  }
  # test sequence against the between-subject term, equation (2)
  fvalue <- typeIII["sequence", "Mean Sq"] / MSdenom
  df1    <- typeIII["sequence", "Df"]
  typeIII["sequence", 4] <- fvalue
  typeIII["sequence", 5] <- pf(fvalue, df1, df2,
                               lower.tail = FALSE)
  attr(typeIII, "heading")[1] <- heading.typeIII[1]
  if ("sequence:subject" %in% rownames(typeIII)) {
    attr(typeIII, "heading")[2] <- paste(heading.typeIII[2],
                                         "sequence:subject")
  } else {
    attr(typeIII, "heading")[2] <- paste(heading.typeIII[2],
                                         "subject")
  }
  CV.est <- sqrt(exp(typeI["Residuals", "Mean Sq"])-1)
  print(typeI, digits = 4, signif.legend = FALSE)
  print(typeIII, digits = 4, signif.legend = FALSE)
  if (typeIII["sequence", "Pr(>F)"] < 0.1) { # ‘Two-stage analysis’
    mod.par    <- lm(logPK ~ treatment, data[data$period == 1, ])
    TR.par.est <- exp(coef(mod.par)[["treatmentT"]])
    CI.par     <- as.numeric(exp(confint(mod.par, "treatmentT",
                                         level = 1 - 2 * alpha)))
    aovPar     <- anova(mod.par)
    m.form     <- toString(mod.par$call)
    m.form     <- substr(m.form, 5, nchar(m.form)-26)
    attr(aovPar, "heading")[1] <- paste0("\n", m.form)
    attr(aovPar, "heading")[2] <- "ANOVA Table: Period 1 parallel"
    CV.par.est <- sqrt(exp(aovPar["Residuals", "Mean Sq"])-1)
    print(aovPar, digits = 4, signif.legend = FALSE)
  }
  txt <- paste("\ntheta0 ",
               sprintf("%.4f", theta0),
               "\nPeriod effect ",
               sprintf("%+g", per.effect),
               "\nCarryover (R->T, T->R)",
               paste(sprintf("%+g", carryover),
                     collapse = ", "),
               "\nPeriod means (1, 2) ",
               paste(sprintf("%.4f", per.mean),
                     collapse = ", "),
               "\nSequence means (RT, TR) ",
               paste(sprintf("%.4f", seq.mean),
                     collapse = ", "),
               "\nTreatment means (R, T) ",
               paste(sprintf("%.4f", trt.mean),
                     collapse = ", "),
               "\nSimulated T/R-ratio ",
               sprintf("%.4f", TR.sim))
  if (typeIII["sequence", "Pr(>F)"] < 0.1) {
    txt <- paste(txt, "\n\n Analysis of both periods")
  } else {
    txt <- paste(txt, "\n")
  }
  txt <- paste(txt, "\nEstimated T/R-ratio ",
               sprintf("%.4f", TR.est),
               "\nBias ")
  if (sqrt(.Machine$double.eps) >= abs(TR.est - TR.sim)) {
    txt <- paste(txt, "\u00B10.00%")
  } else {
    txt <- paste(txt, sprintf("%+.2f%%",
                              100*(TR.est-TR.sim)/TR.sim))
  }
  txt <- paste(txt, sprintf("%s%.f%% %s",
                            "\n", 100*(1-2*alpha),
                            "CI "),
               paste(sprintf("%.4f", CI),
                     collapse = " \u2013 "))
  if (round(CI[1], 4) >= 0.80 &
      round(CI[2], 4) <= 1.25) {
    txt <- paste(txt, "(pass)")
  } else {
    txt <- paste(txt, "(fail)")
  }
  txt <- paste(txt, "\nEstimated CV (within) ",
               sprintf("%.4f", CV.est))
  if (typeIII["sequence", "Pr(>F)"] < 0.1) {
    txt <- paste(txt, "\n\n Analysis of first period",
                 "\nEstimated T/R-ratio ",
                 sprintf("%.4f", TR.par.est),
                 "\nBias ",
                 sprintf("%+.2f%%",
                         100*(TR.par.est-TR.sim)/TR.sim),
                 paste0("\n",
                        sprintf("%.f%% CI ",
                                100*(1-2*alpha))),
                 paste(sprintf("%.4f", CI.par),
                       collapse = " \u2013 "))
    if (round(CI.par[1], 4) >= 0.80 &
        round(CI.par[2], 4) <= 1.25) {
      txt <- paste(txt, "(pass)")
    } else {
      txt <- paste(txt, "(fail)")
    }
    txt <- paste(txt, "\nEstimated CV (total) ",
                 sprintf("%.4f", CV.par.est))
  }
  cat(txt, "\n")
}
ANOVA tables’ significance codes
. <0.1 & ≥0.05
* <0.05 & ≥0.01
** <0.01 & ≥0.001
*** <0.001
Large but equal carryover (\(\small{\lambda_\textrm{R}=\lambda_\textrm{T}=+0.2}\)).
sim.effects(theta0 = 0.95, CV = 0.25, n = 28,
per.effect = +0.02,
carryover = c(+0.2, +0.2))
R> logPK ~ sequence + subject %in% sequence + period + treatment
R>
R> Type I ANOVA Table: Crossover
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.082  0.0817   1.468 0.236597
R> period            1  0.875  0.8746  15.709 0.000514 ***
R> treatment         1  0.031  0.0309   0.554 0.463251
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> Type III ANOVA Table: Crossover
R> sequence vs sequence:subject
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.082  0.0817   0.241 0.627562
R> period            1  0.875  0.8746  15.709 0.000514 ***
R> treatment         1  0.031  0.0309   0.554 0.463251
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> theta0 0.9500
R> Period effect +0.02
R> Carryover (R->T, T->R) +0.2, +0.2
R> Period means (1, 2) 1.0101, 1.2970
R> Sequence means (RT, TR) 1.1017, 1.1892
R> Treatment means (R, T) 1.1718, 1.1180
R> Simulated T/R-ratio 0.9541
R>
R> Estimated T/R-ratio 0.9541
R> Bias ±0.00%
R> 90% CI 0.8568 – 1.0625 (pass)
R> Estimated CV (within) 0.2393
Significant period effect, unbiased T/R-ratio. All is good since the equal carryover means out.
Positive unequal carryover (\(\small{\lambda_\textrm{R}=+0.05,\,\lambda_\textrm{T}=+0.2}\)).
sim.effects(theta0 = 0.95, CV = 0.25, n = 28,
per.effect = +0.02,
carryover = c(+0.05, +0.20))
R> logPK ~ sequence + subject %in% sequence + period + treatment
R>
R> Type I ANOVA Table: Crossover
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.321  0.3209   5.764   0.0238 *
R> period            1  0.428  0.4285   7.696   0.0101 *
R> treatment         1  0.208  0.2082   3.740   0.0641 .
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> Type III ANOVA Table: Crossover
R> sequence vs sequence:subject
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.321  0.3209   0.947   0.3395
R> period            1  0.428  0.4285   7.696   0.0101 *
R> treatment         1  0.208  0.2082   3.740   0.0641 .
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> theta0 0.9500
R> Period effect +0.02
R> Carryover (R->T, T->R) +0.05, +0.2
R> Period means (1, 2) 1.0101, 1.2032
R> Sequence means (RT, TR) 1.0221, 1.1892
R> Treatment means (R, T) 1.1718, 1.0372
R> Simulated T/R-ratio 0.9541
R>
R> Estimated T/R-ratio 0.8852
R> Bias -7.23%
R> 90% CI 0.7949 – 0.9857 (fail)
R> Estimated CV (within) 0.2393
The sequence effect is not significant at the 0.1 level but we get a negatively biased T/R-ratio. Great – the study fails although T and R are equivalent.
Note that the period effect is still significant but to a great extent ‘masked’ (0.0101 instead of 0.000514).
Positive, extremely unequal carryover (\(\small{\lambda_\textrm{R}=+0.05,\,\lambda_\textrm{T}=+0.45}\)).
sim.effects(theta0 = 0.95, CV = 0.25, n = 28,
            per.effect = +0.02,
            carryover = c(+0.05, +0.45))
R> logPK ~ sequence + subject %in% sequence + period + treatment
R>
R> Type I ANOVA Table: Crossover
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  1.070  1.0696  19.210 0.000171 ***
R> period            1  1.260  1.2596  22.623 6.39e-05 ***
R> treatment         1  0.854  0.8538  15.335 0.000582 ***
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> Type III ANOVA Table: Crossover
R> sequence vs sequence:subject
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  1.070  1.0696   3.155 0.087400 .
R> period            1  1.260  1.2596  22.623 6.39e-05 ***
R> treatment         1  0.854  0.8538  15.335 0.000582 ***
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> logPK ~ treatment
R> ANOVA Table: Period 1 parallel
R>           Df Sum Sq Mean Sq F value Pr(>F)
R> treatment  1  0.006 0.00607   0.031  0.862
R> Residuals 26  5.100 0.19615
R>
R> theta0 0.9500
R> Period effect +0.02
R> Carryover (R->T, T->R) +0.05, +0.45
R> Period means (1, 2) 1.0101, 1.3634
R> Sequence means (RT, TR) 1.0221, 1.3475
R> Treatment means (R, T) 1.3278, 1.0372
R> Simulated T/R-ratio 0.9541
R>
R> Analysis of both periods
R> Estimated T/R-ratio 0.7812
R> Bias -18.13%
R> 90% CI 0.7015 – 0.8699 (fail)
R> Estimated CV (within) 0.2393
R>
R> Analysis of first period
R> Estimated T/R-ratio 1.0299
R> Bias +7.94%
R> 90% CI 0.7741 – 1.3702 (fail)
R> Estimated CV (total) 0.4655
The sequence effect is significant at the 0.1 level and we get an extremely negatively biased T/R-ratio.
Analysis of the first period as a parallel design gives a positively biased T/R-ratio (even the direction of the difference of T from R changes).
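The two CVs in the output follow from the respective residual mean squares on the log scale. A minimal sketch with a hypothetical helper `cv.from.mse()` (not part of `sim.effects()`), fed with the rounded mean squares 0.0557 (crossover) and 0.19615 (first period only):

```r
# CV of log-normally distributed data from a variance on the log scale
cv.from.mse <- function(mse) sqrt(exp(mse) - 1)
# inverse: log-scale variance from a CV, as used to simulate the data
mse.from.cv <- function(cv) log(cv^2 + 1)

cv <- cv.from.mse(c(within = 0.0557, total = 0.19615))
round(cv, 4)
```

The first value is the within-subject CV of the crossover. The second is a total (between plus within) CV, which explains the much wider confidence interval of the first-period analysis.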
Unequal carryover, different direction (\(\small{\lambda_\textrm{R}=+0.075,\,\lambda_\textrm{T}=-0.075}\)).
sim.effects(theta0 = 0.95, CV = 0.25, n = 28,
            per.effect = +0.02,
            carryover = c(+0.075, -0.075))
R> logPK ~ sequence + subject %in% sequence + period + treatment
R>
R> Type I ANOVA Table: Crossover
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.000  0.0000   0.000    0.982
R> period            1  0.035  0.0349   0.627    0.436
R> treatment         1  0.011  0.0110   0.198    0.660
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> Type III ANOVA Table: Crossover
R> sequence vs sequence:subject
R>                  Df Sum Sq Mean Sq F value   Pr(>F)
R> sequence          1  0.000  0.0000   0.000    0.993
R> period            1  0.035  0.0349   0.627    0.436
R> treatment         1  0.011  0.0110   0.198    0.660
R> sequence:subject 26  8.814  0.3390   6.089 8.52e-06 ***
R> Residuals        26  1.448  0.0557
R>
R> theta0 0.9500
R> Period effect +0.02
R> Carryover (R->T, T->R) +0.075, -0.075
R> Period means (1, 2) 1.0101, 1.0619
R> Sequence means (RT, TR) 1.0349, 1.0364
R> Treatment means (R, T) 1.0212, 1.0503
R> Simulated T/R-ratio 0.9541
R>
R> Estimated T/R-ratio 1.0284
R> Bias +7.79%
R> 90% CI 0.9236 – 1.1452 (pass)
R> Estimated CV (within) 0.2393
The sequence effect is not significant (its \(\small{p}\)-value is actually close to 1) but the T/R-ratio is extremely positively biased.
The period effect is not significant any more (0.436 instead of 0.000514).
This example demonstrates the absurdity of testing a sequence effect. The ANOVA looks completely ‘normal’, none of the effects is significant. Study passes, everybody happy, questions from an assessor unlikely.
Nevertheless, the estimated T/R-ratio is completely wrong. One gets the false impression that T is slightly more bioavailable than R, whereas the truth is the other way ’round. The true but unknown carryovers ‘pulled’ the responses of T up and ‘pushed’ the ones of R down.
Here we know it. In reality we don’t have a clue.
To summarize the examples (remember that the true T/R-ratio, CV, n, and the period effect were identical in all, as were the simulated T/R-ratio of 0.9541 and the estimated CV of 0.2393).

| True carryover | \(\small{p(F)}\) | PE | Bias | 90% CI | BE |
|---|---|---|---|---|---|
| \(\small{\lambda_\textrm{R}=\lambda_\textrm{T}=+0.20}\) | 0.6276 | 0.9541 | ±0.00% | 0.8568–1.0625 | pass |
| \(\small{\lambda_\textrm{R}=+0.050,\,\lambda_\textrm{T}=+0.20}\) | 0.3395 | 0.8852 | –7.23% | 0.7949–0.9857 | fail |
| \(\small{\lambda_\textrm{R}=+0.050,\,\lambda_\textrm{T}=+0.45}\) | 0.0874 | 0.7812 | –18.13% | 0.7015–0.8699 | fail |
| \(\small{\lambda_\textrm{R}=+0.075,\,\lambda_\textrm{T}=-0.075}\) | 0.9930 | 1.0284 | +7.79% | 0.9236–1.1452 | pass |

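The bias column can be retraced without any simulation. In this design the treatment contrast on the log scale is shifted by \(\small{(\lambda_\textrm{R}-\lambda_\textrm{T})/2}\), which follows from the expected values of model \((1)\). A minimal sketch, where the carryover pairs are the ones passed to `sim.effects()` and 0.9541 is the simulated T/R-ratio (rounded):

```r
# carryover pairs (lambda_R, lambda_T) of the four examples
lambda <- rbind(c(+0.200, +0.200),
                c(+0.050, +0.200),
                c(+0.050, +0.450),
                c(+0.075, -0.075))
shift  <- (lambda[, 1] - lambda[, 2]) / 2   # bias on the log scale
bias   <- exp(shift) - 1                    # relative bias of the T/R-ratio
PE     <- 0.9541 * exp(shift)               # expected point estimates
data.frame(lambda.R = lambda[, 1], lambda.T = lambda[, 2],
           bias = sprintf("%+.2f%%", 100 * bias),
           PE   = sprintf("%.4f", PE))
```

The bias depends only on the difference of the two carryovers, not on their size: the large but equal carryover of the first example is harmless, whereas the small carryovers of opposite sign in the last one are not.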
Homework
For an assumed T/R-ratio 0.94 and CV 0.25 studies were powered with ≥0.80 (n 32 achieves 0.818). How to estimate the sample size is shown in another article. No period effect.
sim.effects(theta0 = 0.94, CV = 0.25, n = 32)
sim.effects(theta0 = 0.94, CV = 0.25, n = 32,
            carryover = c(+0.001, -0.001))
The 1^{st} study (\(\small{\lambda_\textrm{R}=\lambda_\textrm{T}=0,\,p(F)\sim0.452}\)) fails.^{23}
The 2^{nd} with extremely small but unequal carryover (\(\small{\lambda_\textrm{R}=+0.001,\,\lambda_\textrm{T}=-0.001,\,p(F)\sim0.457}\)) passes.
Regulators, do you get the point? Aren’t you scared?
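That a correctly powered study without any carryover can fail by chance alone is easily checked. The sketch below does not call `sim.effects()` (its seed is fixed). Instead it simulates the within-subject differences directly and evaluates them by the usual 2×2×2 analysis; seed and number of simulations are assumptions.

```r
set.seed(1234)                      # arbitrary seed (assumption)
nsim   <- 10000L                    # number of simulated studies
n      <- 32L; n1 <- n2 <- n / 2    # balanced sequences
theta0 <- 0.94; CV <- 0.25
sd.d   <- sqrt(2 * log(CV^2 + 1))   # SD of a subject's T - R difference
pass   <- replicate(nsim, {
  d1 <- rnorm(n1, mean = log(theta0), sd = sd.d)  # T - R in sequence RT
  d2 <- rnorm(n2, mean = log(theta0), sd = sd.d)  # T - R in sequence TR
  pe <- (mean(d1) + mean(d2)) / 2                 # free of period effects
  s2 <- ((n1 - 1) * var(d1) + (n2 - 1) * var(d2)) / (n - 2)
  se <- sqrt(s2 / 4 * (1 / n1 + 1 / n2))
  ci <- exp(pe + c(-1, +1) * qt(0.95, df = n - 2) * se)  # 90% CI
  ci[1] >= 0.80 && ci[2] <= 1.25
})
mean(pass)                          # empirical power
```

The share of passing studies should be close to the 0.818 stated above, i.e., roughly one in five of such studies fails although T and R are equivalent and no carryover exists.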
“[The] impatience with ambiguity can be criticized in the phrase:
Absence of evidence is not evidence of absence.
— Carl Sagan^{24}
Apart from the proof given by Freeman already in 1989, the simulations show that even if the sequence effect is not significant, in the case of a true unequal carryover the estimated T/R-ratio will always be biased.
On the other hand – like in every statistical test – a significant sequence effect may be pure chance, i.e., does not prove that a true carryover exists and can be a false positive as well.
Coming back to the questions asked in the introduction. To repeat:
What is a significant sequence (carryover) effect … ?
It is a natural property of a test at level \(\small{\alpha}\) that it will be significant in a fraction \(\small{\alpha}\) of cases even if the underlying condition (here a true unequal carryover) does not exist.
Furthermore, in a non-replicative design too many effects are confounded in order to obtain an unequivocal answer.
… and do we have to care about one?
Not in particular. If the study was properly planned (sufficiently long washout periods) and only a limited number of pre-dose concentrations in higher periods were measured (ones which are > 5% of C_{max} can be excluded from the comparison), there is nothing to worry about.
If the number of excluded subjects is large, at least you learned to design the next study better.
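Screening for relevant pre-dose concentrations is a simple exercise. A minimal sketch with made-up data; the data frame `pk` and its column names are hypothetical.

```r
# hypothetical pre-dose concentrations and Cmax per subject and period
pk <- data.frame(subject = c(1, 1, 2, 2, 3, 3),
                 period  = c(1, 2, 1, 2, 1, 2),
                 predose = c(0, 0.4, 0, 12.5, 0, 0),
                 Cmax    = c(98, 105, 120, 131, 87, 90))
pk$ratio   <- pk$predose / pk$Cmax
# flag pre-dose concentrations > 5% of Cmax in periods after the first
pk$exclude <- pk$period > 1 & pk$ratio > 0.05
pk[pk$exclude, c("subject", "period", "ratio")]
```

Here only subject 2 in period 2 is flagged (12.5/131, about 9.5% of C_{max}).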
Hoping that assessing the sequence effect can give information whether a study was properly designed and performed is futile.
The topic of suitable washouts from a PK/PD perspective will be covered in another article.
What to do if we have to deal with an endogenous compound?
I see. However, I fail to understand why baseline-corrected pre-dose concentrations could not be used like the ones of other drugs. Is that meant by ‘indirectly assessed’?
License
Helmut Schütz 2021
1^{st} version March 19, 2021.
Rendered 2021-05-11 17:02:15 CEST by rmarkdown in 0.79 seconds.
Footnotes and References
1. Hauschke D, Steinijans V, Pigeot I. Bioequivalence Studies in Drug Development. Methods and Applications. Chichester: Wiley; 2007.
2. No monozygotic twins or triplets in the study, if you don’t mind.
3. Chow S-C, Liu J-p. Design and Analysis of Bioavailability and Bioequivalence Studies. Boca Raton: CRC Press; 3^{rd} ed. 2009. Chapter 3.2.
4. Schütz H, Labes D, Fuglsang A. Reference Datasets for 2-Treatment, 2-Sequence, 2-Period Bioequivalence Studies. AAPS J. 2014; 16(6): 1292–7. doi:10.1208/s12248-014-9661-0.
5. Dallal GE. The Little Handbook of Statistical Practice. Note 95 Crossover Studies. eBook 2012. online 2000.
6. FDA, Center for Drug Evaluation and Research. Guidance for Industry. Statistical Approaches to Establish Bioequivalence. Section VII.B. Carryover Effects. January 2001.
7. Grizzle JE. The two-period change-over design and its use in clinical trials. Biometrics. 1965; 21(2): 467–80. doi:10.2307/2528104.
8. Since this is a between-subject comparison, it has low power. IMHO, the level 0.10 is arbitrary (I failed to find any justification).
9. Westlake WJ. Bioavailability and Bioequivalence of Pharmaceutical Formulations. In: Peace KE, editor. Biopharmaceutical Statistics for Drug Development. New York: Marcel Dekker; 1988. p 336–7.
10. Freeman P. The performance of the two-stage analysis of two-treatment, two-period crossover trials. Stat Med. 1989; 8(12): 1421–32. doi:10.1002/sim.4780081202.
11. FDA, Center for Drug Evaluation and Research. Guidance for Industry. Statistical Procedures for BE Studies using a Standard Two-Treatment Crossover Design. Jul 1992.
12. D’Angelo G, Potvin D, Turgeon J. Carry-Over Effects in Bioequivalence Studies. J Biopharm Stat. 2001; 11(1–2): 35–43. doi:10.1081/BIP-100104196.
13. Senn S. Cross-over Trials in Clinical Research. Chichester: Wiley; 2^{nd} ed. 2002. p 35–88.
14. Senn S, D’Angelo G, Potvin D. Carry-over in cross-over trials in bioequivalence: theoretical concerns and empirical evidence. Pharm Stat. 2004; 3: 133–42. doi:10.1002/pst.111.
15. Jones B, Kenward MG. Design and Analysis of Cross-Over Trials. Boca Raton: CRC Press; 3^{rd} ed. 2015. Chapter 2.7.
16. EMA, CHMP. Guideline on the Investigation of Bioequivalence. London, 20 January 2010. CPMP/EWP/QWP/1401/98 Rev. 1/Corr **.
17. FDA, Center for Drug Evaluation and Research. Draft Guidance for Industry. Bioequivalence Studies with Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA. December 2013.
18. Health Canada. Guidance Document: Conduct and Analysis of Comparative Bioavailability Studies. Section 2.7.4.3 Testing of Fixed Effects. Ottawa, 2018/06/08.
19. The CONSORT Group. The CONSORT 2010 Statement.
20. Dwan K, Li T, Altman DG, Elbourne D. CONSORT 2010 statement: extension to randomised crossover trials. BMJ. 2019; 366: l4378. doi:10.1136/bmj.l4378.
21. Why so few? Well, I deal a lot with higher-order crossovers (pilot studies, dose proportionality), replicate designs, and studies of endogenous compounds. More than 500 2×2×2 crossovers of my CRO rest in peace on DDS backups in a proprietary HP-UX binary format. I have neither the hardware nor the software any more to retrieve the data.
22. R.A. Fisher. »To call the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.«
23. Why on earth? These are simulations. With this seed of the PRNG it was just bad luck. If you would simulate a lot of studies with different seeds, ~20% would fail. That’s the Type II Error (producer’s risk).
24. Sagan C. The Demon-Haunted World. Science as a Candle in the Dark. 1995. Chapter 12 »The Fine Art of Baloney Detection«