
The examples in this article were generated with R 4.0.5 by the packages truncnorm,1 coin,2 reshape,3 and lattice.4 A basic knowledge of R is required. Any version of R would likely be sufficient to run the examples.


Introduction

    

What is an ‘apparent’ difference in tmax?

A good question. It has puzzled me for years…

The European Medicines Agency states for immediate release products:

A statistical evaluation of tmax is not required. However, if rapid release is claimed to be clinically relevant and of importance for onset of action or is related to adverse events, there should be no apparent difference in median tmax and its variability between test and reference product.
EMA. 2010.5

Later the EMA states for delayed and multiphasic release formulations:

A formal statistical evaluation of tmax is not required. However, there should be no apparent difference in median tmax and its range between test and reference product.
EMA. 2014.6


Terminology

    

‘Apparent’ is – and for good reasons – not a term contained in the statistical toolbox. Not surprising, since according to the guidelines »a [formal] statistical evaluation of tmax is not required«.

These definitions of ‘apparent’ are given in dictionaries:

Macmillan (British English)

  1. Easy to see or understand.
  2. An apparent quality, feeling, or situation seems to exist although it may not be real.

Merriam-Webster (American English)

  1. Open to view, visible.
  2. Clear or manifest to the understanding.
  3. Appearing as actual to the eye or mind.
  4. Manifest to the senses or mind as real or true on the basis of evidence that may or may not be factually valid.

Beauty Lies in the Eye of the Beholder. Paper doesn’t blush…

Hence, the assessment of formulation differences opens the door for interpretation, which is – from a scientific perspective – extremely unsatisfactory.


History & Status

    

Whereas the underlying distribution of tmax is continuous, due to the sampling schedule the observed distribution is discrete. This calls for a comparison by a robust (i.e., nonparametric) statistical method.

A comparison of tmax was never – and is not – required by the FDA.

In the past a comparison by a nonparametric method was recommended in Europe.7 8 That was – and still is – statistically sound.

For modified release products tmax was not mentioned in the previous Note for Guidance at all.9

A nonparametric method is recommended in other jurisdictions (e.g., Argentina,10 Japan,11 South Africa12).

Alas, the EMA’s vague recommendation of 2010 was adopted in numerous other jurisdictions (e.g., Australia,13 the ASEAN states,14 Chile,15 the EEU,16 Egypt,17 members of the Gulf Cooperation Council,18 New Zealand19).

The WHO leaves more scope for interpretation.

Where tmax is considered clinically relevant, median and range of tmax should be compared between test and comparator to exclude numerical differences with clinical importance. A formal statistical comparison is rarely necessary. Generally the sample size is not calculated to have enough statistical power for tmax. However, if tmax is to be subjected to a statistical analysis, this should be based on non-parametric methods and should be applied to untransformed data.
WHO. 2017.20


Problem Statement

    
  • »A [formal] statistical evaluation of tmax is not required.«
    Well. Does that imply that if one is performed anyhow, it is not welcome?

  • »[…] there should be no apparent difference in median tmax and its variability between test and reference product.«
    Leaving the ambiguous statement about an ‘apparent’ difference aside – what is the ‘variability’ of the median? The median is a statistic (one of many estimators) whose value is a certain number (the estimate).
    If we agree that tmax follows a discrete distribution, we can only make a statement about the variability of the sample.21 A robust measure is the interquartile range (IQR). Is that meant?

  • »[…] there should be no apparent difference in median tmax and its range between test and reference product.«
    Similarly, the median does not have a range; only the data do. At least the range is a statistical term we can work with.


Basics

    

The median has a breakdown point of 50%. That means more than half of the data would have to be contaminated by false observations before the median of the sample can be distorted at will. Hence, it is a robust estimator.
Compare that to the mean, where a single contaminated observation suffices (a finite-sample breakdown point of \(\small{1/n}\), asymptotically zero). Imagine a sample of identical values (\(\small{x_{i=1}=\ldots=x_{i=n}}\)). When we contaminate the data with a single (‼) value of \(\small{\infty}\), the mean becomes infinite as well and hence, all other values of the data are practically ignored.

The asymptotic breakdown point of the IQR is 25%: one of the quartiles can be distorted once more than a quarter of the data are contaminated. In small samples the exact number of contaminations needed depends on the sample size and on their values compared to the ‘normal’ ones.

On the other hand, for the range – as for the mean – a single contaminated observation suffices. Therefore, it is not a robust statistic. Furthermore, if we compare two data sets whose extremes are contaminated in opposite directions (i.e., the minimum of one and the maximum of the other), both the median and the range can be identical.22 Without inspecting the data or performing an EDA (e.g., box plots) the range is not informative and for comparisons practically useless.23
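
A minimal sketch illustrating these breakdown points; the sample of 16 values and the contaminating value of 200 are arbitrary:

# contaminate one value of a sample with an extreme observation
x  <- c(rep(1, 15), 2)   # 16 observations, one slightly 'unusual'
x1 <- c(rep(1, 15), 200) # same sample, contamination extreme
c(median(x), median(x1)) # unaffected: both 1
c(mean(x), mean(x1))     # a single value drags the mean along
c(IQR(x), IQR(x1))       # unaffected here: both 0
c(diff(range(x)), diff(range(x1))) # the range follows the contamination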

    
  • Imagine a study where all tmax values for three treatments were 1. However, in one case after R it was 2 and in one case after T1 it was 3. Now what?
  • library(coin)
    n     <- 16
    R     <- c(rep(1, n-1), 2) # all tmax 1 except a single 2
    sumR  <- summary(R)
    T1    <- c(rep(1, n-1), 3) # all tmax 1 except a single 3
    sumT1 <- summary(T1)
    T2    <- c(rep(1, n-1), 1) # all tmax identical
    sumT2 <- summary(T2)
    comp1 <- data.frame(tmax = c(R, T1),
                        treatment = factor(rep(c("R", "T1"), c(n, n))))
    comp2 <- data.frame(tmax = c(R, T2),
                        treatment = factor(rep(c("R", "T2"), c(n, n))))
    res   <- data.frame(treatment = c("R", "T1", "T2"),
                        Median = c(sumR[3],
                                   sumT1[3],
                                   sumT2[3]),
                        IQR = c(sumR[5] - sumR[2],
                                sumT1[5] - sumT1[2],
                                sumT2[5] - sumT2[2]),
                        Range = c(sumR[6] - sumR[1],
                                  sumT1[6] - sumT1[1],
                                  sumT2[6] - sumT2[1]))
    wt1   <- wilcox_test(tmax ~ treatment, data = comp1,
                         distribution = "exact", conf.int = TRUE,
                         conf.level = 0.90)
    wt2   <- wilcox_test(tmax ~ treatment, data = comp2,
                         distribution = "exact", conf.int = TRUE,
                         conf.level = 0.90)
    # we need suppressWarnings() if data are identical
    # (the CI cannot be computed)
    PE    <- c(suppressWarnings(confint(wt1)[[2]]),
               suppressWarnings(confint(wt2)[[2]]))
    lower <- c(suppressWarnings(confint(wt1)$conf.int[1]),
               suppressWarnings(confint(wt2)$conf.int[1]))
    upper <- c(suppressWarnings(confint(wt1)$conf.int[2]),
               suppressWarnings(confint(wt2)$conf.int[2]))
    wilc  <- data.frame(comparison = c("T1 vs R", "T2 vs R"),
                        PE = PE, lower.CL = lower, upper.CL = upper)
    print(res, row.names = FALSE); print(wilc, row.names = FALSE)
    R>  treatment Median IQR Range
    R>          R      1   0     1
    R>         T1      1   0     2
    R>         T2      1   0     0
    R>  comparison PE lower.CL upper.CL
    R>     T1 vs R  0        0        0
    R>     T2 vs R  0       NA       NA
  • The medians are not only ‘apparently’ identical – they are identical. The IQRs are funny. Are the ranges ‘apparently’ different? I would say so. Is T1 inferior to R due to its wider range? Is T2 superior to R due to its narrower range? That doesn’t make sense to me.
    No differences are found with the nonparametric test. However, in the second comparison the confidence interval cannot be calculated because all values of T2 are identical.
  • Interested in a more ‘realistic’ example? 48 subjects, sampling every 20 minutes between 40 and 100 minutes post dose. As above, R was ‘contaminated’ with one tmax value of 2 hours and T1 with one of 3 hours.
  • library(coin)
    roundClosest <- function(x, y) {
      # round x to the closest multiple of y
      return(y * round(x / y))
    }
    set.seed(1234567)
    n     <- 48
    lo    <- 40/60
    hi    <- 100/60
    y     <- 20/60
    x     <- runif(n - 1, min = lo, max = hi)
    x     <- roundClosest(x, y)
    R     <- c(x, 2) # one tmax of 2 h added
    x     <- runif(n - 1, min = lo, max = hi)
    x     <- roundClosest(x, y)
    T1    <- c(x, 3) # one tmax of 3 h added
    x     <- runif(n - 1, min = lo, max = hi)
    x     <- roundClosest(x, y)
    T2    <- c(x, 1) # one tmax of 1 h added (within the regular range)
    sumR  <- summary(R)
    sumT1 <- summary(T1)
    sumT2 <- summary(T2)
    comp1 <- data.frame(tmax = c(R, T1),
                        treatment = factor(rep(c("R", "T1"), c(n, n))))
    comp2 <- data.frame(tmax = c(R, T2),
                        treatment = factor(rep(c("R", "T2"), c(n, n))))
    res   <- data.frame(treatment = c("R", "T1", "T2"),
                        Median = c(sumR[3],
                                   sumT1[3],
                                   sumT2[3]),
                        IQR = c(sumR[5] - sumR[2],
                                sumT1[5] - sumT1[2],
                                sumT2[5] - sumT2[2]),
                        Min = c(sumR[1],
                                sumT1[1],
                                sumT2[1]),
                        Max = c(sumR[6],
                                sumT1[6],
                                sumT2[6]),
                        Range = c(sumR[6] - sumR[1],
                                  sumT1[6] - sumT1[1],
                                  sumT2[6] - sumT2[1]))
    res[, 2:6] <- signif(res[, 2:6], 4)
    wt1   <- wilcox_test(tmax ~ treatment, data = comp1,
                         distribution = "exact", conf.int = TRUE,
                         conf.level = 0.90)
    wt2   <- wilcox_test(tmax ~ treatment, data = comp2,
                         distribution = "exact", conf.int = TRUE,
                         conf.level = 0.90)
    PE    <- c(confint(wt1)[[2]],
               confint(wt2)[[2]])
    lower <- c(confint(wt1)$conf.int[1],
               confint(wt2)$conf.int[1])
    upper <- c(confint(wt1)$conf.int[2],
               confint(wt2)$conf.int[2])
    wilc  <- data.frame(comparison = c("T1 vs R", "T2 vs R"),
                        PE = PE, lower.CL = lower, upper.CL = upper)
    wilc[, 2:4] <- signif(wilc[, 2:4], 4)
    ylim  <- 1.04 * c(0, max(c(R, T1, T2), na.rm = TRUE))
    # box plots
    dev.new(width = 4.6, height = 4.6)
    op    <- par(no.readonly = TRUE)
    par(mar = c(2.1, 4.1, 0, 0), cex.axis = 0.9)
    plot(c(0.5, 3.5), c(0, 0), type = "n", axes = FALSE,
         ylim = ylim, xlab = "", ylab = expression(italic(t)[max]))
    axis(1, at = 1:3, labels = c("R", "T1", "T2"), tick = FALSE)
    axis(2, las = 1)
    boxplot(tmax ~ treatment, data = comp1, add = TRUE, at = 1:2,
            boxwex = 0.4, ylim = ylim, names = "",
            main = "", axes = FALSE, col = "bisque", outcol = "red")
    boxplot(tmax ~ treatment, data = comp2, add = TRUE, at = c(1, 3),
            boxwex = 0.4, ylim = ylim, names = "", ylab = "",
            main = "", axes = FALSE, col = "bisque", outcol = "red")
    par(op)
    print(res, row.names = FALSE); print(wilc, row.names = FALSE)
    R>  treatment Median    IQR    Min   Max Range
    R>          R  1.333 0.3333 0.6667 2.000 1.333
    R>         T1  1.000 0.3333 0.6667 3.000 2.333
    R>         T2  1.333 0.3333 0.6667 1.667 1.000
    R>  comparison PE lower.CL upper.CL
    R>     T1 vs R  0        0   0.3333
    R>     T2 vs R  0        0   0.3333

    Fig. 1 Box plots of simulated tmax values.

  • What might an almighty assessor conclude?
    

Cmax and tmax are not sensitive to even substantial changes of the rate of absorption.24 Both are composite pharmacokinetic metrics: they depend on the rates of absorption and elimination, and Cmax additionally on the extent of absorption (\(\small{f}\) as reflected in AUC). See also Interlude 2 below.

    

Though stated in the WHO’s guideline, it is a misconception that a statistical comparison of tmax with a nonparametric method has low power. The Wilcoxon signed-rank test (for paired differences) and the Mann–Whitney U test (for independent samples) have an asymptotic relative efficiency of \(\small{3/\pi\approx95.5\%}\) relative to the t-test for normally distributed data. For small sample sizes sufficient power was confirmed in simulations.25
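
As a quick – admittedly crude – check, one can simulate the power of the Mann–Whitney U test against the t-test for normally distributed data. The shift of one standard deviation and 24 subjects per group are arbitrary choices; wilcox.test() of Base R is fine here because the data are continuous (no ties expected):

# empirical power of the Mann-Whitney U test vs. the t-test
# (normal data, shift of one standard deviation, 24 / group)
set.seed(123456)
n    <- 24
sims <- 1000
p.w  <- p.t <- numeric(sims)
for (i in 1:sims) {
  x      <- rnorm(n)
  y      <- rnorm(n, mean = 1)
  p.w[i] <- wilcox.test(x, y)$p.value # exact for untied data
  p.t[i] <- t.test(x, y)$p.value
}
# fraction of significant results at alpha 0.05 - nearly identical
c(Wilcoxon = mean(p.w < 0.05), t.test = mean(p.t < 0.05))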


Simulation

    

I picked an easy one. An immediate release formulation, one compartment model, PK-parameters: \(\small{D=100}\), fraction absorbed \(\small{f=0.80}\), volume of distribution \(\small{V=4}\), absorption rate constant \(\small{k_{\,01}=2\,\textrm{h}^{-1}}\), lag time \(\small{t_\textrm{lag}=5\,\textrm{min}}\), elimination rate constant \(\small{k_{\,10}=0.25\,\textrm{h}^{-1}}\).
The sampling schedule was 0, 10, 15, 20, 30, 45 minutes, 1, 1.25, 1.5, 2, 2.5, 3.25, 4.25, 5.5, 7, 9.25, and 12 hours in order to get a reliable estimate of tmax (the theoretical value based on the model is 1.272 hours).
Error distributions were uniform for \(\small{f}\) (0.6–1), log-normal for \(\small{V}\) (CV 50%), \(\small{k_{\,01}}\) (CV 35%), \(\small{k_{\,10}}\) (CV 40%), and truncated normal for \(\small{t_\textrm{lag}}\) (limits 0–15 min). Analytical error was log-normal with a CV of 5% of the simulated concentration.

I simulated 25 studies with 48 subjects each. The LLOQ was set to 5% of the theoretical Cmax.

I split the data of each study into halves (mimicking a parallel design with 24 subjects / group) and compared the medians, interquartile ranges, and ranges. Furthermore, I compared the tmax values by the Mann–Whitney U test.26

Cave: 141 LOC.

library(truncnorm)
library(coin)
library(reshape)
library(lattice)
Ct <- function(x, D, F, V, k01, k10, tlag) {
  # one-compartment model with first-order absorption and a lag time
  C <- F*D/V*k01/(k01-k10)*
       (exp(-k10*(x-tlag))-exp(-k01*(x-tlag)))
  C[C < 0] <- 0 # negative concentrations (before tlag) set to zero
  return(C)
}
set.seed(123456)
t       <- c(0, 10/60, 15/60, 20/60, 30/60, 45/60, 1, 1.25,
             1.5, 2, 2.5, 3.25, 4.25, 5.5, 7, 9.25, 12)
studies <- 25   # number of studies
n       <- 48   # number of subjects / study
D       <- 100  # dose
# Index ".d" denotes theoretical value, ".c" its CV
F.d     <- 0.8  # fraction absorbed
F.l     <- 0.6  # lower limit
F.u     <- 1    # upper limit
V.d     <- 4    # volume of distribution
V.c     <- 0.50 # CV 50%, lognormal
k01.d   <- 2    # absorption rate constant
k01.c   <- 0.35 # CV 35%, lognormal
k10.d   <- 0.25 # elimination rate constant
k10.c   <- 0.40 # CV 40%, lognormal
tlag.d  <- 5/60 # lag time
tlag.l  <- 0    # lower truncation limit
tlag.u  <- 0.25 # upper truncation limit
tlag.c  <- 0.5  # SD 0.5 h, truncated normal
AErr    <- 0.05 # analytical error CV 5%, lognormal
LLOQ.f  <- 0.05 # fraction of theoretical Cmax
# theoretical profile
x       <- seq(min(t), max(t), length.out=2500)
C.th    <- Ct(x = x, D = D, F = F.d, V = V.d,
              k01 = k01.d, k10 = k10.d, tlag = tlag.d)
tmax    <- log(k01.d/k10.d)/(k01.d-k10.d)+tlag.d
Cmax    <- Ct(x = tmax, D = D, F = F.d, V = V.d,
              k01 = k01.d, k10 = k10.d, tlag = tlag.d)
LLOQ    <- LLOQ.f * Cmax
data    <- data.frame(study = rep(1:studies, each = n*length(t)),
                      subject = rep(1:n, each = length(t)),
                      t = t, C = NA)
for (i in 1:studies) {
  # individual PK parameters
  F    <- runif(n = n, min = F.l, max = F.u)
  V    <- rlnorm(n = n, meanlog = log(V.d)-0.5*log(V.c^2+1),
                 sdlog = sqrt(log(V.c^2+1)))
  k01  <- rlnorm(n = n, meanlog = log(k01.d)-0.5*log(k01.c^2+1),
                 sdlog = sqrt(log(k01.c^2+1)))
  k10  <- rlnorm(n = n, meanlog = log(k10.d)-0.5*log(k10.c^2+1),
                 sdlog = sqrt(log(k10.c^2+1)))
  tlag <- rtruncnorm(n = n, a = tlag.l, b = tlag.u,
                     mean = tlag.d, sd = tlag.c)
  for (j in 1:n) {
    # individual profiles
    C <- Ct(x = t, D = D, F = F[j], V = V[j],
            k01 = k01[j], k10 = k10[j],
            tlag = tlag[j])
    # analytical error (multiplicative)
    AErr1 <- rnorm(n = length(t), mean = 0, sd = abs(C*AErr))
    C <- C + AErr1     # add analytical error
    C[C < LLOQ] <- NA  # assign NAs to Cs below LLOQ
    data$C[data$study == i & data$subject == j] <- C
  }
}
# simple NCA
res <- data.frame(study = rep(1:studies, each = n),
                  subject = rep(1:n, studies),
                  tlag = NA, tmax = NA, Cmax = NA)
i <- 0
for (j in 1:studies) {
  for (k in 1:n) {
    i <- i + 1
    tmp.t       <- data$t[data$study == j & data$subject == k]
    tmp.C       <- data$C[data$study == j & data$subject == k]
    res$tlag[i] <- tmp.t[which(tmp.t == t[!is.na(tmp.C)][1])-1] # last t before first C >= LLOQ
    res$Cmax[i] <- max(tmp.C, na.rm = TRUE)
    res$tmax[i] <- tmp.t[which(tmp.C == res$Cmax[i])]
  }
}
comp <- data.frame(study = 1:studies, me.1 = NA, me.2 = NA,
                   iqr.1 = NA, iqr.2 = NA, rg.1 = NA, rg.2 = NA,
                   CL.lo = NA, CL.hi = NA )
for (i in 1:studies) {
  study <- res[res$study == i, ]
  sum1  <- summary(study$tmax[study$subject %in% 1:(n/2)])
  sum2  <- summary(study$tmax[study$subject %in% (n/2+1):n])
  comp$me.1[i]  <- sum1[3]
  comp$me.2[i]  <- sum2[3]
  comp$iqr.1[i] <- sum1[5] - sum1[2]
  comp$iqr.2[i] <- sum2[5] - sum2[2]
  comp$rg.1[i]  <- sum1[6] - sum1[1]
  comp$rg.2[i]  <- sum2[6] - sum2[1]
  stud.tmax <- data.frame(tmax = study$tmax,
                          part = factor(rep(c(1, 2),
                                         c(n/2, n/2))))
  wt <- wilcox_test(tmax ~ part, data = stud.tmax,
                    distribution = "exact", conf.int = TRUE,
                    conf.level = 0.90)
  comp$CL.lo[i] <- sprintf("%+.2f", confint(wt)$conf.int[1])
  comp$CL.hi[i] <- sprintf("%+.2f", confint(wt)$conf.int[2])
  if (comp$CL.lo[i] == "+0.00") comp$CL.lo[i] <- "\u00B10.00"
  if (comp$CL.hi[i] == "+0.00") comp$CL.hi[i] <- "\u00B10.00"
}
diff.med <- sprintf("%+.3f", unique(sort(comp$me.1-comp$me.2)))
diff.iqr <- sprintf("%+.4f", unique(sort(comp$iqr.1-comp$iqr.2)))
diff.rge <- sprintf("%+.2f", unique(sort(comp$rg.1-comp$rg.2)))
txt <- paste("Unique differences of medians:\n",
             paste(diff.med, collapse = ", "),
             "\nUnique differences of interquartile ranges:\n",
             paste(diff.iqr, collapse = ", "),
             "\nUnique differences of ranges:\n",
             paste(diff.rge, collapse = ", "))
print(comp, row.names = FALSE); cat(txt)
R>  study  me.1 me.2  iqr.1  iqr.2 rg.1 rg.2 CL.lo CL.hi
R>      1 2.000 2.00 0.5000 1.0000 2.00 3.25 ±0.00 +0.50
R>      2 1.250 1.25 0.2500 0.3125 1.50 1.25 ±0.00 +0.25
R>      3 1.250 1.25 0.5000 0.3125 0.75 1.25 ±0.00 +0.25
R>      4 1.250 1.25 0.5000 0.5000 1.25 1.00 ±0.00 +0.25
R>      5 1.250 1.25 0.0000 0.2500 1.25 1.50 -0.25 ±0.00
R>      6 1.500 1.50 0.7500 0.2500 1.50 1.00 ±0.00 +0.25
R>      7 1.500 1.50 0.5000 0.1875 1.25 1.50 ±0.00 +0.25
R>      8 1.500 1.50 0.3750 0.7500 1.00 1.50 -0.25 ±0.00
R>      9 1.500 1.50 0.7500 0.5625 1.50 1.50 -0.25 ±0.00
R>     10 1.375 1.50 0.2500 0.2500 1.25 1.75 -0.25 ±0.00
R>     11 2.000 2.00 0.2500 1.0000 1.00 2.25 -0.50 ±0.00
R>     12 2.000 2.00 0.6250 0.5000 1.25 1.50 ±0.00 ±0.00
R>     13 1.250 1.00 0.2500 0.2500 1.25 0.75 ±0.00 +0.25
R>     14 1.250 1.25 0.2500 0.2500 1.00 1.25 ±0.00 ±0.00
R>     15 1.250 1.25 0.3125 0.0625 1.75 0.75 ±0.00 +0.25
R>     16 1.500 1.50 0.2500 0.2500 1.50 1.00 ±0.00 +0.25
R>     17 2.000 2.00 0.5000 0.5000 2.00 1.25 ±0.00 +0.25
R>     18 1.500 1.25 0.2500 0.2500 1.50 1.75 ±0.00 +0.25
R>     19 2.000 2.00 0.6250 0.5000 1.75 1.25 ±0.00 +0.50
R>     20 1.500 1.50 0.6250 0.5000 1.25 1.50 ±0.00 +0.25
R>     21 1.500 1.50 0.7500 0.7500 1.75 1.50 -0.25 ±0.00
R>     22 1.500 1.50 0.5000 0.5000 0.75 2.00 ±0.00 ±0.00
R>     23 1.000 1.00 0.3125 0.3125 1.25 1.25 -0.25 ±0.00
R>     24 1.250 1.25 0.2500 0.3125 1.50 1.00 ±0.00 ±0.00
R>     25 1.500 1.25 0.2500 0.5000 1.00 1.00 ±0.00 +0.25
R> Unique differences of medians:
R>  -0.125, +0.000, +0.250 
R> Unique differences of interquartile ranges:
R>  -0.7500, -0.5000, -0.3750, -0.2500, -0.0625, +0.0000, +0.1250, +0.1875, +0.2500, +0.3125, +0.5000 
R> Unique differences of ranges:
R>  -1.25, -0.50, -0.25, +0.00, +0.25, +0.50, +0.75, +1.00
prep <- as.data.frame(
          melt(res, id = c("study", "subject"), measure = "tmax"))
prep$group   <- ifelse(prep$subject %in% 1:(n/2), 1, 2)
prep$study   <- as.factor(prep$study)
prep$subject <- as.factor(prep$subject)
prep$group   <- factor(prep$group, levels = 2:1, ordered = TRUE)
# box plots
dev.new(width = 6, height = 6)
op <- par(no.readonly = TRUE)
bwplot(value ~ study | variable*group, data = prep, group = group,
       xlab = "study", ylab = "hours",
       ylim = 1.04 * c(0, max(res$tmax, na.rm = TRUE)))
par(op)

Fig. 2 Box plots of studies’ tmax ordered by group
(subjects 1–24, top and 25–48, bottom).

    

In none of the simulated studies did the nonparametric ~90% confidence interval27 exclude zero, i.e., no statistically significant difference in tmax was detected.

    

Interlude 1

It is a misconception that the Wilcoxon signed-rank test and the Mann–Whitney U test compare medians. The Wilcoxon signed-rank test (paired samples) employs the Hodges–Lehmann estimator, i.e., the median of the Walsh averages of the paired differences. The estimator associated with the Mann–Whitney U test (independent samples) is the median of all pairwise differences between a sample from \(\small{x}\) and a sample from \(\small{y}\).
Both are unbiased estimators of a shift in location only if the distributions are identical. However, in well-controlled studies (e.g., in BE) this is likely the case.28
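
A minimal sketch (with arbitrary data) confirming that the estimator reported by wilcox.test() of Base R for independent samples is the median of all pairwise differences:

# Hodges-Lehmann estimator for two independent samples:
# the median of all pairwise differences x[i] - y[j]
set.seed(123)
x  <- rnorm(12, mean = 1.25, sd = 0.25)
y  <- rnorm(12, mean = 1.50, sd = 0.25)
hl <- median(outer(x, y, "-"))
wt <- wilcox.test(x, y, conf.int = TRUE, conf.level = 0.90)
c(pairwise = hl, wilcox.test = unname(wt$estimate)) # agree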

    

An alternative not requiring identical distributions was proposed.29 It is beyond the scope of this article.

In a real study – if tmax is a clinically relevant PK metric – one would have to pre-specify the acceptance limits and assess whether the ~90% CI lies entirely within them. For an analgesic intended for rapid relief of pain the acceptance range might be as narrow as ±20 minutes. Had that been the case in the example above, all studies would have passed because none of the confidence limits deviated by more than ±15 minutes from the reference.
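
A sketch of how such an assessment could look; the acceptance range of ±20 minutes, the 15-minute sampling grid, and the data are made up for illustration:

library(coin)
limits <- c(-1, +1) * 20/60 # pre-specified acceptance range (h)
set.seed(654321)
dta  <- data.frame(tmax = round(runif(48, min = 2, max = 6) * 4) / 4,
                   treatment = factor(rep(c("R", "T"), each = 24)))
wt   <- wilcox_test(tmax ~ treatment, data = dta,
                    distribution = "exact", conf.int = TRUE,
                    conf.level = 0.90)
ci   <- confint(wt)$conf.int
# pass only if the ~90% CI lies entirely within the limits
data.frame(lower.CL = ci[1], upper.CL = ci[2],
           pass = ci[1] >= limits[1] & ci[2] <= limits[2])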

However, if we follow the guidelines we are left in the dark.

  • Is a difference in medians of –7.5 min ‘apparent’ – or a difference of +15 min? No idea.

  • Even worse are the interquartile ranges. Here we have differences from –45 min to +30 min. Are they ‘apparently’ different?

  • Let’s forget the range, please. What might differences from –1.25 to +1 hours mean?

    

Interlude 2

tmax is a poor predictor of differences in the rate of absorption.
Example: An immediate release formulation, one compartment model with complete absorption, no lag time. \(\small{D=100}\), \(\small{V=3}\), absorption half-lives 30 minutes (Reference) and 20 minutes (Test), elimination half-life 4 hours. Sigmoid effect model identical for R and T: \(\small{E_\textrm{max}=125}\), \(\small{EC_{50}=50}\), \(\small{\gamma=0.5}\).

Ct <- function(D, f, V, k01, k10, t) {
  C <- D * f * k01 / (V * (k01 - k10)) *
       (exp(-k10 * t) - exp(-k01 * t))
  return(C) # concentration
}
Et <- function(Emax, EC50, gamma, C) {
  E <- (Emax * C^gamma)/(C^gamma + EC50^gamma)
  return(E) # effect (sigmoidal Emax model)
}
# PK model
D       <- 100
V       <- 3
f       <- 1
k01.R   <- log(2)/0.5   # t½,abs 30 minutes
k01.T   <- log(2)/(1/3) # t½,abs 20 minutes
k10     <- log(2)/4     # t½,el 4 hours
# PD (sigmoidal link model)
Emax    <- 125
EC50    <- 50
gamma   <- 0.5
t       <- seq(0, 24, length.out = 1001)
C.R     <- Ct(D, f, V, k01.R, k10, t)
C.T     <- Ct(D, f, V, k01.T, k10, t)
E.R     <- Et(Emax, EC50, gamma, C.R)
E.T     <- Et(Emax, EC50, gamma, C.T)
res     <- data.frame(trt = c("R", "T", "T/R", "T-R"),
             Cmax = c(max(C.R), max(C.T),
                      max(C.T)/max(C.R), NA),
             tmax = c(t[C.R == max(C.R)],
                      t[C.T == max(C.T)], NA,
                      t[C.T == max(C.T)]-t[C.R == max(C.R)]),
             Emax = c(max(E.R), max(E.T),
                      max(E.T)/max(E.R), NA),
             t.Emax = c(t[E.R == max(E.R)],
                        t[E.T == max(E.T)], NA,
                        t[E.T == max(E.T)]-t[E.R == max(E.R)]))
res[, 2:5] <- signif(res[, 2:5], 4)
dev.new(width = 4.5, height = 4.5, record = TRUE)
op      <- par(no.readonly = TRUE)
invisible(split.screen(c(2, 1)))
screen(1)
  par(mar = c(3, 2.9, 1.1, 0.1), cex.axis = 0.8,
      xaxs = "r", yaxs = "i")
  plot(t, C.R, type = "n", axes = FALSE,
       ylim = 1.04*c(0, max(C.R, C.T)),
       xlab = "", ylab = "")
  axis(1, at = seq(0, 24, 4))
  axis(2, las = 1)
  mtext("concentration", 2, line = 2)
  lines(t, C.R, lwd = 3, col = "#1010FF80")
  lines(t, C.T, lwd = 3, col = "#FF101080")
  legend("topright", legend = c("R", "T"),
         col = c("#1010FF80", "#FF101080"),
         box.lty = 0, cex = 0.8, lwd = 3)
  box()
screen(2)
  par(mar = c(4, 2.9, 0.1, 0.1), cex.axis = 0.8,
      xaxs = "r", yaxs = "i")
  plot(t, E.R, type = "n", axes = FALSE,
       ylim = 1.04*c(0, max(E.R, E.T)),
       xlab = "time", ylab = "")
  axis(1, at = seq(0, 24, 4))
  axis(2, las = 1)
  mtext("effect", 2, line = 2)
  lines(t, E.R, lwd = 3, col = "#1010FF80")
  lines(t, E.T, lwd = 3, col = "#FF101080")
  legend("topright", legend = c("R", "T"),
         col = c("#1010FF80", "#FF101080"),
         box.lty = 0, cex = 0.8, lwd = 3)
  box()
close.screen(all.screens = TRUE)
par(op)
print(res, row.names = FALSE)
R>  trt   Cmax   tmax   Emax t.Emax
R>    R 24.770  1.704 51.630  1.704
R>    T 26.590  1.296 52.720  1.296
R>  T/R  1.074     NA  1.021     NA
R>  T-R     NA -0.408     NA -0.408

Fig. 3 Concentration and linked effect.

Although absorption of the Test is 50% faster than that of the Reference (its absorption half-life is 33% shorter), its tmax occurs only ~25 minutes earlier. The T/R-ratio of Cmax is 107.4%.
It’s another common misconception that Cmax is directly related to safety. Linking to the effect site dampens the peaks. Hence, the T/R-ratio of Emax is only 102.1%.30


Alternatives

    

An alternative to comparing tmax would be to assess the ‘early exposure’ (i.e., by a partial AUC), as recommended by Health Canada,31 the FDA,32 and by the EMA for multiphasic modified release products. However, this PK metric is extremely variable and might require a replicate design (for the FDA’s RSABE or the EMA’s ABEL). Similarly to the comparison of Cmax, Health Canada requires only that the point estimate lies within 80.0–125.0% and hence, a parallel or a simple 2×2×2 crossover design would be sufficient.

An older guidance of the FDA is interesting.

An early exposure measure may be informative on the basis of appropriate clinical efficacy/safety trials and/or pharmacokinetic/pharmacodynamic studies that call for better control of drug absorption into the systemic circulation (e.g., to ensure rapid onset of an analgesic effect or to avoid an excessive hypotensive action of an antihypertensive). In this setting, the guidance recommends use of partial AUC as an early exposure measure. We recommend that the partial area be truncated at the population median of Tmax values for the reference formulation.
FDA. 2003.33

Sorry folks, but the population median is unknown. Is the sample median meant? Or the value reported in the reference formulation’s label?

Regulations differ in their definitions of ‘early exposure’. Health Canada requires the cut-off time to be the subject’s tmax of the reference,34 which can be difficult to set up in some software packages. The FDA and the EMA require a pre-specified cut-off time (identical for all subjects under all treatments). Whereas the FDA recommends setting the cut-off time on clinical grounds (i.e., PD), for the EMA it should be based on PK. If the reference product’s tmax is not stated in the SmPC, a justification might be difficult. To complicate things: quite often the arithmetic mean is given instead of the median.
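
A minimal sketch of such a partial AUC by the linear trapezoidal rule; the profile and the cut-off of 1.5 hours are arbitrary, and the sketch assumes that the cut-off coincides with a sampling time point (otherwise the concentration at the cut-off would have to be interpolated first):

pAUC <- function(t, C, cutoff) {
  # linear trapezoidal rule up to the pre-specified cut-off
  # (assumes the cut-off coincides with a sampling time point)
  keep <- t <= cutoff
  t    <- t[keep]
  C    <- C[keep]
  sum(diff(t) * (head(C, -1) + tail(C, -1)) / 2)
}
t <- c(0, 0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4)              # hours
C <- c(0, 12.1, 18.5, 21.0, 21.9, 20.4, 17.8, 12.3, 8.1) # concentrations
pAUC(t, C, cutoff = 1.5)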


Conclusion

    

Recall that in the simulations we had perfectly matching products (the average PK-parameters were identical). If we apply an appropriate statistical method, we can confirm that (i.e., detect no significant differences).

There is no guarantee that by looking (‼) at reported medians / IQRs / ranges an assessor will arrive at the same conclusions as the applicant.

Assessing tmax based on ‘apparent’ differences of medians, its ‘variability’, or its range is bad science and thus should be abandoned.

Nevertheless, the statement in the guidelines
  • »A [formal] statistical evaluation of tmax is not required«

does not preclude performing one.

Pre-specify a clinically relevant acceptance range for tmax and assess the ~90% CI by an appropriate nonparametric method.

I have done so since the mid-1980s and – even after the 2010 guideline – never received a single deficiency letter in this respect.

Enlightenment is man’s release from his self-incurred tutelage. Tutelage is man’s inability to make use of his understanding without direction from another. Self-incurred is this tutelage when its cause lies not in lack of reason but in lack of resolution and courage to use it without direction from another. Sapere aude! “Have courage to use your own reason!” – that is the motto of enlightenment.
Immanuel Kant. 1784. What is Enlightenment?


License

CC BY 4.0 Helmut Schütz 2021
1st version April 19, 2021.

Footnotes and References


  1. Mersmann O, Trautmann H, Steuer D, Bornkamp B. truncnorm: Truncated Normal Distribution. 2018-02-27. v1.0-8. CRAN.↩︎

  2. Hothorn T, Winell H, Hornik K, van de Wiel MA, Zeileis A. coin: Conditional Inference Procedures in a Permutation Test Framework. 2021-02-08. v1.4-1. CRAN.↩︎

  3. Wickham H. reshape: Flexibly Reshape Data. 2018-10-23. v0.8.8. CRAN.↩︎

  4. Sarkar D, Andrews F, Wright K, Klepeis N, Murrell P. lattice: Trellis Graphics for R. 2020-04-01. v0.20-41. CRAN.↩︎

  5. European Medicines Agency. CHMP. London. 20 January 2010. Guideline on the Investigation of Bioequivalence. CPMP/EWP/QWP/1401/98 Rev. 1/ Corr **.↩︎

  6. European Medicines Agency. CHMP. London. 20 November 2014. Guideline on the pharmacokinetic and clinical evaluation of modified release dosage forms. EMA/CHMP/EWP/280/96 Rev1.↩︎

  7. Commission of the European Communities. CPMP Working Party on Efficacy of Medicinal Products. Brussels. December 1991. Note for Guidance: Investigation of Bioavailability and Bioequivalence. Appendix III: Technical Aspects of Bioequivalence Statistics.↩︎

  8. Commission of the European Communities. CPMP Working Party on Efficacy of Medicinal Products. Brussels. June 1992. Note for Guidance: Investigation of Bioavailability and Bioequivalence.↩︎

  9. European Agency for the Evaluation of Medicinal Products. CPMP. London. 28 July 1999. Note for Guidance on Modified Release Oral and Transdermal Dosage Forms: Section II (Pharmacokinetic and Clinical Evaluation). CPMP/EWP/280/96 Corr *.↩︎

  10. Ministerio de Salud. Secretaría de Políticas, Regulación y Relaciones Sanitarias. Buenos Aires. 2006. Régimen de Buenas Prácticas para la Realización de Estudios de Biodisponibilidad/Bioequivalencia. Spanish.↩︎

  11. Pharmaceuticals and Medical Devices Agency. Pharmaceutical and Food Safety Bureau. Guideline for Bioequivalence Studies of Generic Products. Tokyo. February 29, 2012. Q & A.↩︎

  12. Medicines Control Council. Pretoria. July 2015. Registration of Medicines. Biostudies.↩︎

  13. Department of Health. Therapeutic Goods Administration. Canberra. 16 June 2011. European Union and ICH Guidelines adopted in Australia. Guideline on the Investigation of Bioequivalence with TGA Annotations.↩︎

  14. ASEAN States Pharmaceutical Product Working Group. Vientiane. March 2015. ASEAN Guideline for the Conduct of Bioequivalence Studies.↩︎

  15. Departamento Agencia Nacional de Medicamentos. Instituto de Salud Pública de Chile. Santiago. December 2018. Guia para La realización de estudios de biodisponibilidad comparativa en formas farmacéuticas sólidas de administración oral y acción sistémica. Spanish.↩︎

  16. Eurasian Economic Commission. 3 November 2016. Regulations Conducting Bioequivalence Studies within the Framework of the Eurasian Economic Union. Russian.↩︎

  17. Ministry of Health and Population, The Specialized Scientific Committee for Evaluation of Bioavailability & Bioequivalence Studies. Cairo. February 2017. Egyptian Guideline For Conducting Bioequivalence Studies for Marketing Authorization of Generic Products.↩︎

  18. Executive Board of the Health Ministers’ Council for GCC States. March 2016. The GCC Guidelines for Bioequivalence.↩︎

  19. New Zealand Medicines and Medical Devices Safety Authority. Wellington. February 2018. Guideline on the Regulation of Therapeutic Products in New Zealand. Part 6: Bioequivalence of medicines.↩︎

  20. WHO Expert Committee on Specifications for Pharmaceutical Preparations, Fifty-first Report (WHO TRS No. 992). Geneva. 2017. Annex 6. Multisource (generic) pharmaceutical products: guidelines on registration requirements to establish interchangeability.↩︎

  21. Think about normally distributed data. The arithmetic mean is the statistic of location and the standard deviation the statistic of dispersion. Both are estimates. Talking about the ‘variability’ of the mean would be nonsense as well.↩︎

  22. Let \(\small{x_{i=1}=\ldots=x_{i=n-1},x_{i=n}=10\cdot x_{i=1}}\) and \(\small{y_{i=1}=10\cdot y_{i=n},y_{i=2}=\ldots=y_{i=n}}\). Then \(\small{\tilde{x}\equiv\tilde{y}}\) and \(\small{G(x)\equiv G(y)}\).↩︎

  23. Imagine:
    \(\small{R:\left\{1,\ldots,1,2\right\}}\) → \(\small{\tilde{R}={\color{Blue}1}}\) and \(\small{G(R)={\color{Blue}1}}\).
    \(\small{T:\left\{1,\ldots,1,3\right\}}\) → \(\small{\tilde{T}={\color{Blue}1}}\) and \(\small{G(T)={\color{Red}2}}\).
    Is the range of T ‘apparently’ different from R?
    \(\small{T:\left\{1,\ldots,1,1\right\}}\) → \(\small{\tilde{T}={\color{Blue}1}}\) and \(\small{G(T)={\color{Red}0}}\).
    ‘Apparently’ different ranges?↩︎

  24. Tóthfalusi L, Endrényi L. Estimation of Cmax and Tmax in Populations After Single and Multiple Drug Administration. J. Pharmacokin. Pharmacodyn. 2003; 30(5): 363–85. doi:10.1023/B:JOPA.0000008159.97748.09.↩︎

  25. Chow S-C, Liu J-p. Design and Analysis of Bioavailability and Bioequivalence Studies. Boca Raton: Chapman & Hall/CRC; 3rd ed. 2009. Table 5.3.4. p. 146.↩︎

  26. Note that the function wilcox.test() of Base R is not exact for tied data. Hence, I used the function wilcox_test() of package coin with its default average ranking.↩︎

  27. Due to the discrete nature of the underlying distribution, the \(\small{100(1-2\,\alpha)}\) CI is always wider than the 90% CI based on the normal distribution. In a 2×2×2 crossover design with 24 subjects the exact confidence level is 91.12% and with 48 subjects 90.28%. However, such a comparison is not fair since the distribution is not normal.↩︎

  28. Hauschke D, Steinijans VW, Diletti E. A distribution-free procedure for the statistical analysis of bioequivalence studies. Int J Clin Pharm Ther Toxicol. 1990; 28(2): 72–8. PMID:2307548.↩︎

  29. Brunner E, Munzel U. The Nonparametric Behrens‐Fisher Problem: Asymptotic Theory and a Small‐Sample Approximation. Biom. J. 2000; 42(1): 17–25. doi:10.1002/(SICI)1521-4036(200001)42:1%3C17::AID-BIMJ17%3E3.0.CO;2-U.↩︎

  30. That explains why PD is less sensitive to differences in formulations and PK-studies are preferred.↩︎

  31. Health Canada. Ottawa. 2018/06/08. Guidance Document. Comparative Bioavailability Standards: Formulations Used for Systemic Effects.↩︎

  32. U.S. FDA. Center for Drug Evaluation and Research. Rockville. December 2013. Draft Guidance for Industry. Bioequivalence Studies with Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA.↩︎

  33. U.S. FDA. Center for Drug Evaluation and Research. Rockville. March 2003. Guidance for Industry. Bioavailability and Bioequivalence Studies for Orally Administered Drug Products — General Considerations. webarchive.↩︎

  34. Will that inflate the Type I Error because it depends on the observed data? Perhaps.↩︎