An adaptive balance-ensuring big stick randomization procedure

Cornelia Ursula Kunz
Shannon Amy Zellner
Sonja Drescher
Johannes Krisam

18 April 2023

Simplified motivating trial example

Imagine a trial with the following planning assumptions:

  • one-sided \(\alpha = 0.025\) and desired power \(1-\tilde\beta = 0.9\)
  • 1:k randomization (here: \(k = 1\), i.e. 1:1 allocation)
  • assumed effect size \(\tilde\delta = 0.4585\) and standard deviation \(\sigma = 1\)
  • planned sample sizes \(\tilde N = \frac{(k+1)^2}{k} \cdot \frac{\left(\Phi^{-1}(1-\alpha)+\Phi^{-1}(1-\tilde\beta)\right)^2}{\tilde\delta^2}\sigma^2 = 200\)
    with \(\tilde n_1 = \frac{1}{k+1}\tilde N = 100\) and \(\tilde n_2=\frac{k}{k+1}\tilde N = 100\) (numerically checked in the sketch below)
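
As a quick numerical check, the planned sample size can be reproduced with a few lines of Python (a minimal sketch assuming a two-sample \(z\)-test with known \(\sigma\); function and argument names are illustrative):

```python
from scipy.stats import norm

def planned_sample_size(alpha=0.025, beta=0.10, delta=0.4585, sigma=1.0, k=1):
    """Planned total and per-group sample sizes for a 1:k randomized z-test."""
    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(1 - beta)
    N = (k + 1) ** 2 / k * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return N, N / (k + 1), k * N / (k + 1)

print(planned_sample_size())  # approximately (200, 100, 100)
```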

We plan to conduct an interim analysis when half the patients have been observed using an O’Brien-Fleming-like spending function.

The desired power at the interim analysis for true effect \(\delta = \tilde\delta\) and standard deviation \(\sigma=\tilde\sigma\) is \(1-\tilde{\beta}_{IA} = 0.2525\), and the desired overall power is (approximately) \(1-\tilde{\beta} = 0.9\).

Note that in this example, the sample size for the two-stage design is approximately the same as for the single-stage design.

Trial team concerns

Inclusion criteria (among other things)

  • test negative

However, the test outcome only becomes known after randomization, and the number of ineligible patients is unknown (estimates range from 20% to 80%!).

Concerns for Power

The true power depends not just on the true values for the effect size and the standard deviation but also on the realized sample sizes.

For a fixed design, the resulting power (for true effect \(\delta = \tilde\delta\) and standard deviation \(\sigma=\tilde\sigma\)), depending on the realized sample sizes \(n_1\) and \(n_2\), is given by \[1-\beta = 1-\Phi\left(\Phi^{-1}(1-\alpha) - \left(\Phi^{-1}(1-\alpha) + \Phi^{-1}(1-\tilde\beta)\right) \sqrt{f}\right), \qquad f = \frac{n_1 n_2}{\tilde n_1 \tilde n_2}.\]

Note that \(f\) does not depend on \(k\). It expresses how much the realized sample sizes deviate from the planned sample sizes (which would yield a power of \(1-\tilde{\beta}\)).
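
The loss in power caused by deviations from the planned sample sizes can be evaluated directly from this formula, e.g. in Python (a sketch with illustrative names; planned values as in the example above):

```python
from scipy.stats import norm

def realized_power(n1, n2, n1_plan=100, n2_plan=100, alpha=0.025, beta_plan=0.10):
    """Power of the fixed design for realized sample sizes n1, n2 (true delta = delta_tilde)."""
    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(1 - beta_plan)
    f = (n1 * n2) / (n1_plan * n2_plan)              # deviation factor f
    return 1 - norm.cdf(z_alpha - (z_alpha + z_beta) * f ** 0.5)

print(realized_power(100, 100))   # planned situation: 0.90
print(realized_power(115, 85))    # imbalanced allocation: roughly 0.893
```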

Realized sample sizes and power

Realized sample sizes and power at interim

Randomization

Randomization is a key element of randomized controlled trials (RCTs). It reduces the risk of systematic bias in the estimated treatment effect due to confounding variables.

Several methods for randomization exist. In general, methods with low predictability of treatment assignments that, at the same time, lead to balanced sample sizes are preferable. In practice, a compromise between these two features often has to be made.

Some examples:

  • Complete (Coin) Randomization (CR)
  • Permuted Block Design (PBD)
  • Big Stick Design (BSD)

Complete randomization (CR)

Complete randomization corresponds to tossing a fair coin at each treatment assignment for balanced allocation in the two-group case.

Pros

  • Easy to implement
  • Equal probabilities for subject allocation in each step
  • Lowest possible predictability of allocation outcomes

Cons

  • No limit on the maximum tolerated imbalance \(mti\)
  • High probability of unbalanced sample sizes (illustrated by the simulation sketch below)
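
A minimal simulation sketch of the final imbalance under CR (assumptions: 1:1 allocation, \(\tilde N = 200\); the threshold of 10 is picked purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, reps = 200, 50_000

# 1 = experimental, 0 = control; each assignment is a fair coin toss
assignments = rng.integers(0, 2, size=(reps, N))
final_imbalance = np.abs(2 * assignments.sum(axis=1) - N)   # |n1 - n2| at the end

print(np.mean(final_imbalance > 10))   # probability of a final imbalance above 10
```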

Permuted Block Design (PBD)

Pros

  • Easy to implement
  • High probability for equal sample sizes
  • \(mti\) given by half of the block length

Cons

  • Unequal probabilities for subject allocation in each step (except for the first patient in each block)
  • High predictability of allocation outcomes (at least for the last patient in each block), especially for small block lengths

Notes

  • Block length more or less arbitrary, often rather small (e.g. length = 8); a list-generation sketch is given below
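
A permuted-block allocation list for 1:1 allocation can be pre-generated, for example as follows (a sketch; block length 8 as mentioned above, names illustrative):

```python
import numpy as np

def permuted_block_list(n_total=200, block_length=8, seed=1):
    """Pre-generated 1:1 allocation list from permuted blocks (1 = experimental, 0 = control)."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(-(-n_total // block_length)):          # ceiling division: number of blocks
        block = np.array([0, 1] * (block_length // 2))    # balanced block (even length assumed)
        rng.shuffle(block)                                # random order within the block
        blocks.append(block)
    return np.concatenate(blocks)[:n_total]

print(permuted_block_list()[:16])   # first two blocks of the list
```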

Block randomization

Big Stick Design

The Big Stick Design (BSD) is a variation of complete randomization with the aim to restrict the imbalance.

BSD adds the restriction that the treatment imbalance \(D_j\) at a given allocation step \(j\) must not exceed a pre-defined value, the \(mti\), in either direction. This ensures that the imbalance never exceeds the \(mti\) over the whole course of the trial.

The procedure is defined as follows:

  • In case \(|D_{j-1}|<mti\), the assignment \(j\) is based on a fair coin toss.
  • If, however, \(|D_{j-1}|=mti\), assignment \(j\) is made deterministically to the underrepresented treatment.

Accordingly, \(|D_{j}|\leq mti\) at all allocation steps \(j=1,...,\tilde{N}\), and the procedure corresponds to complete randomization with a reflecting barrier at \(-mti\) and \(+mti\).
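
A minimal sketch of the BSD allocation rule in Python (1:1 allocation; names are illustrative):

```python
import numpy as np

def big_stick_list(n_total=200, mti=3, seed=1):
    """Generate a BSD allocation list (1 = experimental, 0 = control)."""
    rng = np.random.default_rng(seed)
    allocations, D = [], 0                  # D = (# experimental) - (# control)
    for _ in range(n_total):
        if D == mti:                        # experimental over-represented at the boundary
            a = 0                           # deterministic assignment to control
        elif D == -mti:                     # control over-represented at the boundary
            a = 1                           # deterministic assignment to experimental
        else:
            a = int(rng.integers(0, 2))     # fair coin toss
        allocations.append(a)
        D += 1 if a == 1 else -1
    return np.array(allocations)

alloc = big_stick_list()
print(np.abs(np.cumsum(2 * alloc - 1)).max())   # never exceeds mti
```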

The \(mti\) is often chosen in a rather arbitrary way. One of our first aims was to connect the \(mti\) to an acceptable loss in power.

Big Stick Design

Pros

  • As long as the \(mti\) is not reached, the procedure is highly unpredictable
  • The maximum imbalance can be strictly controlled via the design parameter \(mti\)

Cons

  • Design only defined for balanced allocation and no more than two treatment groups
  • In case of a small \(mti\), many deterministic assignments are made, leading to high predictability
  • In case of a large \(mti\), predictability is low, but large imbalances are possible

As the \(mti\) is a fixed number, our second aim was to allow the \(mti\) to depend on the number of currently enrolled patients.

Flexible Stick Design

  • The BSD can be adapted by allowing the \(mti\) to increase over time with the number of patients enrolled.
  • The \(mti\) is defined in such a manner that \(f = \frac{n_1n_2}{\tilde n_1 \tilde n_2}\) does not fall below a critical boundary \(f_{crit}\) that still ensures the minimally desired power at the end of the trial (in this case 0.8937); see the sketch after this list.
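
One possible way to operationalize such a flexible \(mti\) is sketched below in Python; the sketch assumes 1:1 allocation, that \(f_{crit}\) is obtained by inverting the power formula above at the minimally desired power, and that the \(mti\) after \(n\) enrolled patients is taken proportional to \(n\) (the exact definition used for the FSD may differ):

```python
from math import floor, sqrt
from scipy.stats import norm

def f_crit(power_min=0.8937, alpha=0.025, beta_plan=0.10):
    """Smallest deviation factor f that still yields the minimally desired power."""
    z_alpha = norm.ppf(1 - alpha)
    z_beta = norm.ppf(1 - beta_plan)
    return ((z_alpha + norm.ppf(power_min)) / (z_alpha + z_beta)) ** 2

def flexible_mti(n_enrolled, power_min=0.8937):
    """Largest tolerated imbalance after n enrolled patients (increases linearly with n)."""
    return floor(n_enrolled * sqrt(1 - f_crit(power_min)))

print([flexible_mti(n) for n in (20, 50, 100, 200)])
```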

Pros

  • For small sample sizes, balance is ensured by means of a stricter \(mti\)
  • For larger sample sizes, imbalance has a smaller impact on power, so the larger \(mti\) decreases the predictability

Cons

  • A flexible \(mti\) is more difficult to implement

Flexible Stick Design (FSD) in Group-Sequential Designs

  • In a group-sequential design, power also matters at the interim analysis \(\rightarrow\) stricter control of the \(mti\) might be required for stage I of the trial.

  • A desired minimal power for the interim analysis can be guaranteed by adapting the \(mti\) for stage I (here: planned power is 0.2525, minimally desired power is 0.2508); a sketch is given after this list.

  • After the interim analysis, the \(mti\) is defined as for the standard FSD.
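
For stage I, the same construction can be sketched once the interim critical value is available. The sketch below assumes that the interim critical value \(c_1\) is supplied, that the planned interim sizes are those of the example, and that the \(f\)-type power formula carries over to the interim analysis; these are assumptions for illustration, not the exact stage-I rule:

```python
from math import floor, sqrt
from scipy.stats import norm

def stage1_mti(n_enrolled, c1, power_plan_ia=0.2525, power_min_ia=0.2508):
    """Stage-I imbalance bound after n enrolled patients, calibrated so that the
    interim power (f-type formula with interim critical value c1) does not drop
    below power_min_ia at the interim look."""
    f_crit_ia = ((c1 + norm.ppf(power_min_ia)) / (c1 + norm.ppf(power_plan_ia))) ** 2
    return floor(n_enrolled * sqrt(1 - f_crit_ia))

# c1 = 2.96 is only an illustrative value for an OBF-type boundary at half information
print([stage1_mti(n, c1=2.96) for n in (20, 50, 100)])
```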

Pros

  • Imbalance (and thus power) control also ensured for interim analysis
  • Same flexibility for stage II as for FSD

FSD in Group-Sequential Designs

  • As an alternative, one might also avoid the jump in the \(mti\) after the interim by linearly increasing the \(mti\).

Adaptive Stick Design (ASD) in Group-Sequential Designs

In case a spending function is used (e.g. OBF-type alpha spending), one may increase the flexibility by allowing a \(mti\) that ensures the minimally desired power regardless of when the interim analysis is conducted.

Pros

  • More flexibility with regard to interim analysis
  • Minimally desired power is ensured at all times

Cons

  • \(mti\) is no longer a linear function of the enrolled sample size

Density for imbalance at interim and final


Power

Joint assessment of randomness and imbalance

Besides imbalance and power, randomness is another important issue for a randomization procedure.

  • The Forcing Index (FI) has been proposed as a measure that ranges between 0 (\(\hat{=}\) “high randomness”) and 1 (\(\hat{=}\)“low randomness”).

  • If \(\phi_i\) denotes the probability that a patient is allocated to the experimental treatment given all previous allocations, then \(FI(i)=\sum_{j=1}^i \frac{E[|\phi_j-0.5|]}{i/4}\), where \(E [|\phi_j-0.5|]\) is the expected deviation of the conditional probability of allocating the experimental treatment from the unconditional target value of 0.5 (Berger et al. 2021).

  • This means that for CR, we have \(FI(i)\equiv 0\) for all \(i\geq 1\).

  • For PBD with block size 2, \(FI(i)\rightarrow 1\) for \(i \rightarrow \infty\).

In order to jointly compare balance and predictability, we also assess imbalance via a measure that can take values between 0 (\(\hat{=}\) “low imbalance”) and 1 (\(\hat{=}\)“high imbalance”).

  • Here, we take the cumulative average loss after \(n\) allocations, which is defined as \(Imb(n)=\frac{1}{n}\sum_{i=1}^n E[D_i^2]/i\), where \(D_i\) is the difference between the allocations in the two treatment groups (Berger et al. 2021).

  • If \(n\rightarrow\infty\), \(Imb(n)\) converges to 1 for CR.

  • \(Imb(n)\) converges to 0 for PBD with block length 2 (both measures are estimated by simulation in the sketch after this list).
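
Both measures can be estimated by Monte Carlo simulation for a given procedure, e.g. for the BSD, where \(\phi_j = 0.5\) unless the boundary is reached (a sketch with illustrative names):

```python
import numpy as np

def fi_imb_bsd(n=200, mti=3, reps=10_000, seed=1):
    """Monte Carlo estimates of FI(n) and Imb(n) for the Big Stick Design."""
    rng = np.random.default_rng(seed)
    dev_sum = np.zeros(n)    # accumulates |phi_j - 0.5| over replications
    dsq_sum = np.zeros(n)    # accumulates D_j^2 over replications
    for _ in range(reps):
        D = 0                # (# experimental) - (# control)
        for j in range(n):
            if abs(D) == mti:
                phi = 0.0 if D == mti else 1.0       # deterministic assignment
            else:
                phi = 0.5                            # fair coin toss
            a = 1 if rng.random() < phi else 0
            dev_sum[j] += abs(phi - 0.5)
            D += 1 if a == 1 else -1
            dsq_sum[j] += D ** 2
    fi = 4 / n * (dev_sum / reps).sum()                       # Forcing Index FI(n)
    imb = np.mean((dsq_sum / reps) / np.arange(1, n + 1))     # cumulative average loss Imb(n)
    return fi, imb

print(fi_imb_bsd())
```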

Randomness/Balance trade-off

Summary

  • While PBD ensures balance in a quite strict manner, less strict methods can achieve the same targeted power
  • This brings an advantage in terms of predictability, which is a crucial (and often ignored) issue, especially in open-label trials
  • Our methods allow the minimally desired power to be achieved with much lower predictability
  • While the \(mti\) itself changes with the enrolled sample size, the proposed procedures do not depend on the observed treatment allocations; thus, randomization lists can be pre-defined and no “dynamic” allocation technique is required
  • Outlook: Generalization of the procedure to unequal allocation and multiple treatment arms might be desirable

Thank you

References

Berger, V., Bour, L., Carter, K. et al. A roadmap to using randomization in clinical trials. BMC Med Res Methodol 21, 168 (2021). https://doi.org/10.1186/s12874-021-01303-z

Back-Up

Unequal sample sizes and power