
School of Mathematical Sciences

MTHM038: MSc Mathematics Dissertation

Response-Adaptive Randomisation in
Clinical Trials with Binary Responses

2014-15
Mateusz Matyjaszczyk
140293481
Supervisor: Dr. D.S. Coad

Abstract
Randomisation is a fundamental concept in experimental design as it is
the best known way of removing unwanted bias. The classical approach to
randomisation is to balance the number of patients receiving each treatment.
However, in a clinical trial this has an ethical disadvantage as it could lead
to a high number of treatment failures. We explore various response-adaptive
randomisation schemes which aim to assign more patients to the superior
treatment in order to reduce the number of treatment failures. In this dissertation we only consider clinical trials with binary responses.
We start by introducing the randomised play-the-winner (RPW) rule. The
RPW rule has many statistical disadvantages and a previous application in
a clinical trial led to disastrous results. We therefore introduce three different randomisation rules: the drop-the-loser (DL) rule, the odds ratio based design
(ORBD) and the doubly adaptive biased coin design (DBCD). For these rules to
be applicable in a realistic setting, each one is extended to (i) allow any
number of treatments, (ii) allow delayed responses and (iii) incorporate covariates.
We then analyse the efficient randomised adaptive design (ERADE) which
obtains the Cramer-Rao lower bound on the asymptotic variance.
The final section compares the randomisation rules mentioned. For K = 2
and K = 3 treatment designs, we compare the allocation proportion and define
a hypothesis test which we then use to simulate power and significance level.
Then the same methods are used to compare the randomisation rules under
delayed responses and incorporating covariates.
We find that in general there is an inverse relationship between more ethical allocation and power. A suitable response-adaptive randomisation scheme
needs to have a good balance between these two criteria and thus such a
randomisation procedure should be tailored to a specific clinical trial.


Contents
1 Introduction

2 Response-adaptive randomisation designs
  2.1 Randomised play-the-winner (RPW) rule
    2.1.1 K = 2 treatments design
    2.1.2 Statistical properties and criticisms
  2.2 Drop-the-loser (DL) rule
    2.2.1 K ≥ 2 treatments design
    2.2.2 Allowing delayed responses
    2.2.3 Targeting an alternative allocation proportion
    2.2.4 Incorporating covariates
  2.3 Odds-ratio based designs (ORBD)
    2.3.1 Incorporating covariates
    2.3.2 K > 2 treatments design
  2.4 Doubly adaptive biased coin design (DBCD)
    2.4.1 K = 2 treatments design
    2.4.2 K > 2 treatments design
    2.4.3 Incorporating covariates
  2.5 Efficient randomised adaptive design (ERADE)
    2.5.1 Rule definition

3 Comparing designs
  3.1 Introduction
  3.2 K = 2 treatments
    3.2.1 Allocation proportion
    3.2.2 Inference, significance level and power
  3.3 K = 3 treatments
    3.3.1 Allocation proportion
    3.3.2 Inference, significance level and power
  3.4 Delayed responses
    3.4.1 K = 2 treatments
    3.4.2 K = 3 treatments
  3.5 Covariates
    3.5.1 K = 2 treatments
    3.5.2 K = 3 treatments

4 Conclusion

A R code used in simulations
  A.1 RPW
  A.2 DL
  A.3 GDL
  A.4 DLC
  A.5 ORBD
  A.6 DBCD
  A.7 RDBCD
  A.8 ERADE


1 Introduction

Suppose that a new treatment has been developed and we wish to compare it to
an existing treatment through a clinical trial. In this trial, the patients arrive
sequentially and each patient is assigned to either of the treatments. If the treatment
assignment is systematic, then the physician or medical examiner may be able to
predict the next assignment and choose a patient that they would prefer to receive
the corresponding treatment. This is called selection bias and is highly undesirable
as it can invalidate the trial. The best known way of removing such bias is through
randomisation.
Complete randomisation is the most basic form of randomisation where we assign a treatment to a patient with equal probability. In an experiment with two
treatments this can be compared to a toss of a fair coin. Although such a scheme
minimises selection bias and is quite easy to implement, it has many undesirable
properties e.g. it does not take the medical histories of the patients into account.
Suppose that we have a covariate under which the treatment can have various effects. Then, it is highly desired for this covariate to be equally represented in each
treatment as unbalanced treatments could have an effect on any statistical inference
performed later on. A randomisation scheme that adjusts the assignment probabilities to keep such covariates balanced across treatments is called covariate-adaptive randomisation and historically it is the most widely used type of randomisation in clinical
trials.
Next, consider the responses to this experiment to be binary (i.e. the treatment was
successful or unsuccessful) and instantaneous (i.e. available before the next patient is randomised). After some patients have been randomised and their responses obtained,
one of the treatments may be shown to have a lower proportion of treatment failures than the other treatments. For ethical reasons, we wish to assign more patients to
this superior treatment in order to minimise the number of treatment failures. This
type of randomisation is called response-adaptive randomisation (RAR). Although
Thompson (1933) proposed such adaptive designs as early as the 1930s, they have had a
very limited use in practice with the randomised play-the-winner (RPW) rule being
the design most often applied. In Section 2.1 we will briefly investigate this design
and its limitations. One of the most severe limitations is that when the success
probability is high on both treatments then the variance is unbounded. This in turn
means that the allocation proportion of patients to each treatment heavily depends
on the initial settings of the scheme. We mention the ECMO trial in which a bad
choice of the initial settings of the RPW scheme led to disastrous results.
For a given randomisation scheme to be applicable in a practical setting, we need
to consider the following limitations:
- So far we assumed the responses to be instantaneous. In practice, this is rarely
  the case, as new patients may be assigned to a treatment before a response is
  available for all the previous patients. Hence, for an adaptive design to be
  applicable to a wide range of clinical trials, the design should allow delayed
  responses.
- Covariate imbalance may be an issue, similarly as in complete randomisation.
  Thus, for an adaptive design to be practical, the design should take into consideration the response history so far as well as the covariate balance of the
  treatments.
- We also assumed that there are only two treatments. However, this may not be
  the case in many clinical trials. For example, when comparing a new treatment
  to an existing one, we may wish to include a placebo group. Similarly, if the
  newly proposed treatment is a drug, patients can be assigned to treatments
  with different dosages in order to find the optimum dose.
In Sections 2.2-2.4 three well-studied RAR designs are introduced: the drop-the-loser (DL) rule, the odds-ratio based design (ORBD) and the doubly adaptive biased-coin
design (DBCD). Each of these designs is extended to allow for the three limitations
above: delayed responses; covariate adaptiveness and K > 2 treatments. Of note
is the extension of the ORBD to K = 3 treatments as this has not been previously
explored in the literature. With the exception of the DL rule, none of the rules
mentioned are known to obtain the lower bound on the asymptotic variance and
thus in Section 2.5 we introduce the efficient randomised adaptive design (ERADE)
that obtains this lower bound.
Finally, in Chapter 3 we compare the statistical properties of the randomisation
schemes. Section 3.2 compares the allocation and failure proportions for all the
rules with K = 2 treatments. We find that in general the DL, DBCD and ERADE
rules have the least variable allocation proportions while the ORBD assigns the most
patients to the superior treatment, resulting in the lowest failure proportion. In fact,
we notice the inverse relationship between these two criteria. We then define the
Wald test that can be used to test the hypothesis of no treatment difference. We use
this test to simulate the power and significance level of different randomisation rules
and confirm the well-known inverse relationship between power and variability. That
is, a more variable rule in general leads to reduced power. Thus, the DL, DBCD
and ERADE are the most powerful.
In Section 3.3 we explore the DL, ORBD and DBCD when extended to K = 3
treatments. The allocation and failure proportion are investigated for each rule and
the results reflect the findings of the previous section. That is, the DL and DBCD
are found to be the least variable but assign fewer patients to the superior treatment
than the ORBD. We then define the contrast test of homogeneity which allows us to
compare one treatment (usually the placebo) to the other treatments. We use this
test to simulate the power and significance level for the DL, ORBD and DBCD rules
and it is shown that the DL and DBCD maintain the highest power, when compared
to complete randomisation. Note that a simulation of the ORBD and DBCD when
extended to K = 3 treatments is not reported in the literature.
We then consider the DL, ORBD and DBCD rules under delayed responses in
Section 3.4. Note that out of these rules, the literature only investigates the DL under delayed responses and there is no investigation of power under delayed responses
for any of the rules mentioned. The investigation into ORBD and DBCD with delayed responses is the first known investigation of this type that has been reported.
We note that moderate delay does not have an effect on allocation proportion. We
also simulate power and significance level and find that delayed responses may lower
the power of some designs. We perform such an investigation for K = 2 and K = 3
treatments. Finally, we briefly discuss the performance of RAR designs under severe
delay.
In Section 3.5 we consider extensions of the DL, ORBD and DBCD rules to incorporate
covariates. We see that incorporating covariates can significantly reduce the ethical
allocation, which results in higher failure rates. We also perform an investigation
into the significance level and power of these designs and find that the power can
be severely reduced for these designs. Note that this is the first investigation of the
ORBD and DBCD incorporating covariates that has been reported in the literature.
The DL incorporating covariates with K = 3 has also been studied for the first time.
We conclude that RAR designs can be statistically and ethically desirable. However, we also find that under some realistic assumptions, i.e. delayed responses and
covariate-adaptiveness, these RAR designs do not perform as well. Thus, a given
RAR design should be chosen in such a way that we obtain satisfactory statistical
properties whilst maximising ethical advantages. We then consider extensions to
the work presented here.

2 Response-adaptive randomisation designs

2.1 Randomised play-the-winner (RPW) rule

2.1.1 K = 2 treatments design

Assuming K = 2 treatments with binary and instantaneous responses, Zelen (1969) proposed the following randomisation procedure. Assign the first patient to a treatment with equal probability. Given the response of the first patient, the next patient is assigned the same treatment if the response was a success. Otherwise, if
the treatment response was a failure, then the patient is assigned the other treatment. The procedure continues until all patients have been randomised or a suitable
stopping rule has been reached. Such a design is known as play-the-winner (PW)
rule. Clearly, this design only allows for 2 treatments and there is no immediate
way of incorporating delayed responses. More importantly, the design is completely
predictable as the physician is able to guess the next treatment assignment if the
response and treatment assignment of the previous patient is known.
Wei and Durham (1978) extended the above idea by proposing the randomised
play-the-winner (RPW) rule. In this design sequentially arriving patients are assigned to a treatment by a ball being drawn from an urn. We assume that there are
i = 1, 2 treatments. There are αi balls of the colour corresponding to treatment i.
We would usually choose α1 = α2 so that the urn is balanced in the beginning. Here,
this will always be the case, so we let α = α1 = α2. When a patient is ready to be
randomised, a ball is drawn from the urn. The patient is assigned the corresponding
treatment, the ball is replaced and a response is observed. If the response of the
treatment for this patient was a success, we add β balls of the corresponding colour
to the urn. However, if the response was a failure, then β balls of the opposite kind
are added to the urn. This way we skew the probability of assignment towards the
more successful treatment so far. This process continues until a suitable stopping
criterion has been reached, e.g. a sufficient number of patients has been randomised.
We denote this design by RPW(α, β).
The above design deals with many of the limitations of the PW rule. Firstly,
the design is less predictable than the PW rule since the allocation probability depends
on the whole response history, rather than just the last response. RPW also allows
the responses to be delayed as the urn can be updated once a response is available,
which was not the case with the PW rule.
2.1.2

Statistical properties and criticisms

We now list some interesting statistical properties of the RPW rule. Consider a
trial with treatments i = 1, 2, binary responses and n patients to be assigned.
The treatments have success probabilities 0 < pi < 1. Then, the probabilities of
treatment failure are given by qi = 1 − pi and Ni(n) is the number of patients
assigned to treatment i. Wei and Durham (1978) have shown that

    Ni(n)/n → (1/qi) / (1/q1 + 1/q2)    (1)

almost surely as n → ∞. This is known as the limiting allocation proportion and is
an important feature of any randomisation design. The RPW rule can only target
this specific allocation proportion, which we will refer to as urn allocation. In practice, it is often desired that a given randomisation rule can target different allocation
proportions. For example, Rosenberger et al. (2001) proposed an alternative allocation proportion under which the number of treatment failures would be minimised.
Thus, a randomisation rule that can target any given allocation proportion is highly
desired and such rules will be introduced in Sections 2.2.3, 2.4 and 2.5.
Next, let δj be the treatment assignment of the jth patient, where j = 1, . . . , n.
Atkinson and Biswas (2013) give the probability that the (j + 1)th patient is assigned
to treatment i = 1 as

    P(δj+1 = 1) = 1/2 + dj+1,

where

    dj+1 = γj(p1 − p2) / [2(2 + γj)] + [γ(p1 + p2 − 1) / (2 + γj)] (d1 + · · · + dj)

and γ = β/α. Then clearly the assignment depends on γ rather than α or β alone.
This means that the rules RPW(3, 3) and RPW(1, 1) are equivalent and only the
parameter γ needs to be chosen.
For fixed pi and a small value of γ (i.e. α > β) we have a small dj+1 and thus
P(δj+1 = 1) is close to 1/2. As we make γ even smaller, P(δj+1 = 1) → 1/2.
For the same values of pi and a large γ, dj+1 is further away from zero, depending
on which treatment is more successful, and so P(δj+1 = 1) tends away from a half.
This means that this rule will tend to assign more patients to the superior treatment
when α is small, but it is then much more predictable.
The ECMO trial conducted by Bartlett et al. (1985), which used the RPW(1,1)
rule, serves as a good example of why a proper choice of α is important. When
this trial was concluded, 12 patients had received the ECMO treatment, while only
one patient was assigned to the control group. The researchers concluded that the
reason for this imbalance was the bad choice of the initial urn composition and that
an RPW(3,1) or RPW(2,1) would be a more suitable choice. Such designs have a
larger α and so would increase the probability of assigning patients to the inferior
treatment earlier on in the trial, resulting in more balanced treatments.
Matthews and Rosenberger (1997) derived the exact variance of the allocation
proportion of the RPW rule. The form is quite complicated, requiring at least half
a page, and thus it is not given here. Their result shows that if p1 − q2 > 1/2,
then the variance of the allocation proportion is unbounded and thus the allocation
depends on the initial urn composition. However, in practice such trials with high
success probabilities on both treatments are very rare.
When the asymptotic variance is bounded, that is p1 − q2 < 1/2, Smythe and
Rosenberger (1995) demonstrated that

    n^(1/2) ( N1(n)/n − q2/(q1 + q2) ) → N(0, v),

where

    v = q1 q2 (5 − 2(q1 + q2)) / [(2(q1 + q2) − 1)(q1 + q2)²].

Hu et al. (2006) then showed that the lower bound on this asymptotic variance is
not obtained for the RPW rule. This is an undesirable result as a reduced variability
of a randomisation rule is directly correlated to a gain in statistical power of the
design, as has been shown by Melfi and Page (1998) and Hu and Rosenberger (2003).
Thus, designs that obtain the lower bound on the asymptotic variance are highly
regarded.

(p1, p2)     n    RPW(1,1)     RPW(3,1)     RPW(5,1)     RPW(10,1)
                  γ = 1        γ = 1/3      γ = 1/5      γ = 1/10
(0.8,0.8)   100   0.50(0.16)   0.50(0.13)   0.50(0.11)   0.50(0.09)
(0.8,0.6)   100   0.63(0.12)   0.61(0.10)   0.60(0.09)   0.58(0.08)
(0.8,0.4)   100   0.72(0.09)   0.69(0.08)   0.67(0.07)   0.64(0.07)
(0.8,0.2)   100   0.77(0.06)   0.75(0.06)   0.73(0.06)   0.69(0.06)
(0.6,0.6)   100   0.50(0.10)   0.50(0.09)   0.50(0.08)   0.50(0.07)
(0.6,0.4)   100   0.59(0.08)   0.58(0.07)   0.58(0.07)   0.56(0.06)
(0.6,0.2)   100   0.66(0.06)   0.64(0.06)   0.63(0.06)   0.62(0.05)
(0.4,0.4)   100   0.50(0.06)   0.50(0.06)   0.50(0.06)   0.50(0.06)
(0.4,0.2)   100   0.57(0.05)   0.56(0.05)   0.56(0.05)   0.55(0.05)
(0.2,0.2)   100   0.50(0.05)   0.50(0.04)   0.50(0.04)   0.50(0.04)

Table 1: Allocation proportion (standard deviation in brackets) of the RPW rule for
different choices of the initial urn composition. This simulation used 5,000 replications.
We now investigate some choices of the initial urn composition under different pi
through a simulation. For each choice of pi, n = 100 patients were assigned using
the corresponding RPW rule. The results are given in Table 1. We investigated four
choices of parameters, with the corresponding γ values reported. The table shows
the average allocation to treatment i = 1 over 5,000 replications, with the standard
deviation of this allocation proportion given in brackets. The allocation to i = 2
is not given as it can be obtained by subtracting the allocation to i = 1 from one. It
can be seen that when p1 = p2 the allocation proportion is equal for all choices of the
parameters. However, the corresponding standard deviation is smaller for models
with smaller γ (i.e. higher α). When p1 ≠ p2 the RPW rule assigns more patients
to the superior treatment, which is always the first treatment in the table. The
proportion of patients assigned to the superior treatment increases as the difference
between the two treatments grows, leading to an ethical advantage. It can also be
noticed that the choice of γ can have an effect on the allocation proportion when
p1 ≠ p2. For higher values of γ we allocate more patients to the superior treatment,
but this has a trade-off as the urn models with lower γ have a smaller standard
deviation. Due to the earlier mentioned relation between variance and power, the
models with a larger γ may prove to be less powerful. The code written in the
programming language R used for the RPW rule can be seen in Section A.1.
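To make the urn mechanics concrete, the following is a minimal sketch in R of one replicate of the RPW(α, β) rule for two treatments. It is an illustrative reconstruction under assumed function and argument names (rpw, n, p, alpha, beta), not the program from Section A.1.

    # Minimal sketch of one replicate of the RPW(alpha, beta) rule (illustrative only).
    rpw <- function(n, p, alpha = 5, beta = 1) {
      urn <- c(alpha, alpha)                        # alpha balls of each colour to start
      assign <- success <- numeric(n)
      for (j in 1:n) {
        i <- sample(1:2, 1, prob = urn / sum(urn))  # draw a ball and replace it
        assign[j] <- i
        success[j] <- rbinom(1, 1, p[i])            # observe the binary response
        if (success[j] == 1) {
          urn[i] <- urn[i] + beta                   # success: add beta balls of the same colour
        } else {
          urn[3 - i] <- urn[3 - i] + beta           # failure: add beta balls of the opposite colour
        }
      }
      c(allocation = mean(assign == 1), failures = mean(success == 0))
    }

    set.seed(1)
    rpw(100, p = c(0.8, 0.4))   # allocation proportion to treatment 1 and failure proportion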
Overall, the RPW rule exhibits many undesirable properties meaning that a
practical application is usually troublesome. Despite this, much work has been done
on the topic. Bandyopadhyay and Biswas (1999) extend the rule to incorporate
covariates, while Biswas (1999) studies the rule under delayed responses. Such
extensions are not studied here and we focus on other designs with more desirable
properties.

2.2 Drop-the-loser (DL) rule

2.2.1 K ≥ 2 treatments design

Ivanova (2003) proposed the following urn model. Consider a clinical trial of K
treatments with instantaneous, binary responses. Then, we start with an urn containing K + 1 types of balls. The ball types i = 1, . . . , K correspond to the K
treatments, while the balls of type 0 are the so-called immigration balls. The initial
urn composition is given by Z0 = {Z0,0, . . . , ZK,0}, while after ν draws it is given by
Zν = {Z0,ν, . . . , ZK,ν}. When a patient is ready to be randomised, a ball is drawn
from the urn. If the ball is an immigration ball (of type 0) then no treatment is
assigned and the ball is replaced together with one ball for each of the K treatments.
This process is repeated until a treatment ball (of type i) is drawn and then the
patient is assigned the corresponding treatment. The response of the treatment is
observed. If it is a success, the ball is replaced and the urn composition is unchanged,
so we let Zν+1 = Zν. However, if the response is a failure, then the ball is not
replaced. The urn composition becomes Zi,ν+1 = Zi,ν − 1 and Zj,ν+1 = Zj,ν, j ≠ i.
The process continues until a suitable stopping rule has been triggered, such as all
patients available have been assigned. The inclusion of the immigration balls is an
important feature of the DL rule as it allows a treatment to not die out even if it
has a very small success probability.
The design proposed above is a discrete-time process. The technique of embedding an urn model in a continuous-time birth and death process, proposed by Ivanova
and Flournoy (2001), was used by Ivanova (2003) to obtain some useful statistical
properties of this design. The limiting allocation to treatment i = 1, . . . , K is given
by

    Ni(n)/n → (1/qi) / (1/q1 + · · · + 1/qK)    (2)

almost surely as n → ∞. Note that when K = 2 this limiting proportion is equal
to the allocation of the RPW rule given in (1). Thus, so far we have two RAR
procedures that both target urn allocation.
Ivanova (2003) showed that

    n^(1/2) ( N1(n)/n − q2/(q1 + q2) ) → N(0, v),

where

    v = q1 q2 (p1 + p2) / (q1 + q2)³.

Hu et al. (2006) then demonstrated that the DL attains the lower bound on this
asymptotic variance, unlike the RPW rule. Thus, it could be said that although
both the rules so far target the same allocation proportion, the DL has a theoretical
advantage as it is able to obtain the minimum variance. In fact the overall variability
of this rule is known to be lower than that of the RPW rule, as shown by Ivanova
(2003) and Hu and Rosenberger (2003).
Section A.2 reports an R program that was used to simulate the DL rule.
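For illustration, here is a minimal R sketch of one replicate of the DL rule with an immigration ball; it is not the program in Section A.2 and the names (dl, Z0) are my own.

    # Minimal sketch of one replicate of the drop-the-loser (DL) rule (illustrative only).
    dl <- function(n, p, Z0 = c(3, 3, 3)) {
      K <- length(p)
      Z <- Z0                              # Z[1]: immigration balls, Z[1 + i]: treatment i balls
      assign <- success <- numeric(n)
      j <- 0
      while (j < n) {
        ball <- sample(seq_len(K + 1), 1, prob = Z / sum(Z))
        if (ball == 1) {                   # immigration ball: replace and add one ball per treatment
          Z[-1] <- Z[-1] + 1
          next
        }
        i <- ball - 1
        j <- j + 1
        assign[j] <- i
        success[j] <- rbinom(1, 1, p[i])
        if (success[j] == 0) Z[ball] <- Z[ball] - 1   # failure: the drawn ball is not replaced
      }
      c(allocation = mean(assign == 1), failures = mean(success == 0))
    }

    set.seed(1)
    dl(100, p = c(0.8, 0.4))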
2.2.2 Allowing delayed responses

Zhang et al. (2007) extended the DL rule in order to study delayed responses. Such
a design is called the generalised drop-the-loser (GDL) rule. Sun et al. (2007) then
extended this rule to K > 2 treatments and here we will deal with this extension.
Similarly to the DL rule, we start with K + 1 types of balls. Balls of type 0 are
immigration balls and balls of type i = 1, . . . , K correspond to the treatments.
The initial urn composition is given by Z0 = {Z0,0, . . . , Z0,K}, while the urn
composition after ν draws have been made is given by Zν = {Zν,0, . . . , Zν,K}.
We then let Z⁺ν,k = max(0, Zν,k), k = 0, . . . , K, and Z⁺ν = {Z⁺ν,0, . . . , Z⁺ν,K}. This step
is required as we now allow the urn to have a fractional or negative number of balls.
Then, the probability of selecting a ball of type i is

    Z⁺ν,i / (Z⁺ν,0 + · · · + Z⁺ν,K).

If the ball selected is of type 0 (i.e. an immigration ball) then no treatment is assigned and the ball is returned to the urn together with ai balls of treatment type i,
i = 1, . . . , K. If a treatment type ball is drawn then the subject is assigned the corresponding treatment i. Since we wish to allow delayed responses, the ball is not
replaced immediately. Instead we continue to allocate treatments until a response
is available. Once we obtain the response, we alter the urn by adding Dν,i ≥ 0 balls
to the urn if the treatment was a success, leaving the urn unchanged otherwise. We
continue until a suitable stopping criterion has been reached. This design reduces to
the DL urn when ai = 1 and Dν,i = 0.
2.2.3 Targeting an alternative allocation proportion

By embedding the GDL rule in a continuous-time process, Zhang et al. (2007) have
shown that the asymptotic allocation proportion of the GDL rule is given by

    Ni(n)/n → (ai/qi) / (a1/q1 + · · · + aK/qK)    (3)

almost surely as n → ∞. It can be seen that if ai = 1, i = 1, . . . , K, then the
asymptotic allocation is equivalent to urn allocation, seen in (2).


By carefully choosing the values of ai, we can alter (3) so that the rule targets a
different allocation proportion. For example, for K treatments with success probabilities pi, the allocation proportion

    Ni(n)/n → √pi / (√p1 + · · · + √pK),  i = 1, . . . , K,    (4)

is of particular interest. Rosenberger et al. (2001) have shown that this allocation
proportion minimises the expected number of treatment failures, assuming a fixed
variance of the estimator for the treatment difference. We will refer to this allocation
proportion as RSIHR allocation, named after the initials of the authors. We can
target this allocation by letting

    ai = C √p̂i / (√p̂1 + · · · + √p̂K),  i = 1, . . . , K,    (5)

where C is a constant and p̂i is the current estimate of pi based on the responses so
far. Also let Dν,i = 0 so that balls will only be added when an immigration ball
is selected. This urn skews the allocation by adding more balls corresponding to
the superior treatment when an immigration ball is chosen. Simulations performed
by Zhang et al. (2007) have shown that there is no significant difference among the
different choices of C they investigated, so we usually let C = 2.
In practice, we can obtain the estimate p̂i in (5) by

    p̂i = [(number of observed successes on treatment i) + 1] / [(total number of observed outcomes on treatment i) + 2].    (6)
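As a small illustration, the estimate in (6) and the immigration additions in (5) (with C = 2) could be computed as below; the function names are hypothetical.

    # Shrunken success-probability estimates from (6) (illustrative sketch).
    p_hat <- function(successes, outcomes) (successes + 1) / (outcomes + 2)

    # Immigration additions a_i targeting RSIHR allocation, as in (5), with C = 2.
    rsihr_additions <- function(successes, outcomes, C = 2) {
      p <- p_hat(successes, outcomes)
      C * sqrt(p) / sum(sqrt(p))
    }

    rsihr_additions(successes = c(16, 4), outcomes = c(30, 20))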

We now compare the GDL rule targeting the urn allocation proportion (equivalent
to the DL rule) to the GDL rule targeting RSIHR allocation. We also investigate two
choices of initial urn composition. For each combination of target allocation, initial
urn composition and pi we run the rule 5,000 times. The mean allocation proportion
to treatment i = 1 was obtained, as well as the standard deviation of this allocation.
The results of this simulation are given in Table 2, with the initial urn composition
given in the brackets in the column name. Section A.3 reports an R program that
was used to simulate the GDL rule.
When p1 = p2 all the rules seem to have a similar mean allocation proportion,
but the standard deviation is lower for the GDL rules targeting RSIHR allocation.
When p1 ≠ p2 the GDL rule targeting RSIHR on average assigns fewer patients to the
superior treatment than the standard DL rule. However, the rule targeting RSIHR
is much less variable, which in turn leads to an increase in power. There is little
difference between initial urn compositions in terms of the variability.

(p1, p2)     n    Target: Urn                Target: RSIHR
                  GDL(2,2,2)   GDL(5,5,5)    GDL(2,2,2)   GDL(5,5,5)
(0.8,0.8)   100   0.50(0.06)   0.50(0.05)    0.50(0.02)   0.50(0.02)
(0.8,0.6)   100   0.60(0.05)   0.57(0.05)    0.53(0.02)   0.53(0.02)
(0.8,0.4)   100   0.68(0.04)   0.63(0.04)    0.58(0.03)   0.57(0.03)
(0.8,0.2)   100   0.73(0.03)   0.68(0.03)    0.64(0.04)   0.63(0.04)
(0.6,0.6)   100   0.50(0.05)   0.50(0.04)    0.50(0.03)   0.50(0.03)
(0.6,0.4)   100   0.58(0.04)   0.56(0.04)    0.54(0.03)   0.54(0.03)
(0.6,0.2)   100   0.64(0.04)   0.62(0.03)    0.61(0.04)   0.60(0.04)
(0.4,0.4)   100   0.50(0.04)   0.50(0.04)    0.50(0.04)   0.50(0.04)
(0.4,0.2)   100   0.56(0.03)   0.56(0.03)    0.57(0.04)   0.56(0.04)
(0.2,0.2)   100   0.50(0.03)   0.50(0.03)    0.50(0.05)   0.50(0.05)

Table 2: Allocation proportion (standard deviation in brackets) of the GDL rules for
different choices of the initial urn composition and target allocation. This simulation
used 5,000 replications.
2.2.4 Incorporating covariates

Bandyopadhyay et al. (2009) proposed an extension of the DL rule that incorporates covariates within treatments. For each sequentially entering patient
j = 1, . . . , n the level of the covariate Uj ∈ {0, . . . , G} is obtained. We use 0 for
the most favourable condition and G for the least favourable one. For example,
if Uj is the initial size of a tumour, then a lower category represents a favourable
condition to treat, i.e. a smaller tumour. Let π0 < π1 < · · · < πG with πG = 1 be a
set of probabilities representing the probability of success under the corresponding
grade Uj. In practice, the probabilities πk, k = 0, . . . , G, may be unknown and so
we may use (k + 1)/(G + 1) or another suitable function to obtain an estimate of the
success probabilities.
Similarly as before, we start the urn with the composition Z0 = {Z0,0, . . . , Z0,K}.
We draw a ball from the urn. If the ball drawn is of type 0 then no treatment is
assigned and we return the ball together with K balls, one ball for each treatment. If
a treatment ball is drawn, then the patient is assigned the corresponding treatment.
We note the grade k of this patient and the response is observed. If the response is a
success, then we replace the ball with the probability πk. Otherwise, if the response
is a failure then we replace the ball with the probability 1 − πG−k.
However, treatments are likely to have different success probabilities under different covariate grades. That is, a given treatment is more likely to be successful when
the corresponding patient has the grade 0 than when the grade is G. Therefore, we
determine the success probability of a treatment by

    P(Zj = 1 | i, k) = a^Uj pi,j,

where Zj is the success or failure for the jth patient, pi,j is the success probability
for the jth patient receiving treatment i and a ∈ (0, 1) is the so-called prognostic factor
index that can be estimated. Note that this definition is an extension of the one
originally proposed by Bandyopadhyay et al. (2009) to any number of treatments, as
the authors only considered a trial with K = 2 treatments.
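The ball-replacement step just described can be sketched as follows, using the estimate πk ≈ (k + 1)/(G + 1) mentioned above; this is an illustration with my own naming, not the program in Section A.4.

    # One DLC ball-replacement decision after a response (illustrative sketch).
    dlc_replace_ball <- function(success, k, G) {
      pi_k <- function(g) (g + 1) / (G + 1)            # estimate of pi_g
      prob_replace <- if (success) pi_k(k) else 1 - pi_k(G - k)
      rbinom(1, 1, prob_replace) == 1                  # TRUE: the drawn ball returns to the urn
    }

    set.seed(1)
    dlc_replace_ball(success = FALSE, k = 1, G = 4)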
The DLC design has some strong disadvantages. Defining the number of grades
G can be troublesome. Using the tumour size example above, we may be able to
define G grades that tumours fall into, depending on their size. However, it might
also be possible to make the grade boundaries narrower, increasing the number of
grades. Grades of equal length might also not always be ideal. The number of
grades is likely to have an effect on the allocation proportion and so needs to be chosen
carefully.
Similarly, not all covariates can be split into grades. Clinical trials are often
balanced by institution. In such a case, we might be unable to rank institutions
in terms of grades. Even if such a grading was possible, the grades are likely
to have very similar probabilities πk. Finally, the DLC is not able to incorporate
multiple covariates.
Section A.4 reports an R program that can be used to simulate the DLC rule.
In the next section we define a randomisation procedure that allows a much more
flexible incorporation of covariates.

2.3 Odds-ratio based designs (ORBD)

2.3.1 Incorporating covariates

Rosenberger et al. (2001) proposed the following way to allow covariate balance
within treatments. Considering two treatments i = 1, 2, let Tj be the treatment
indicator (Tj = 1 if treatment is 1 and Tj = 0 if treatment is 2) for the j th patient
with j = 1, . . . , n and let zj be the covariate information for the given patient. We
then define the standard logistic regression model

    logit(pj) = μ + βTj + z′jγ + Tj z′jδ,    (7)

where pj is the probability of success for the jth patient, μ is the global mean, β
is the treatment main effect, γ is the vector of covariate main effects and δ is the
vector of treatment-covariate interactions. Throughout this dissertation we used a
generalised linear model (GLM) to fit this regression model with a logit link function.
It is also possible to consider fitting (7) using a GLM with a probit link function or
another suitable choice.
The design works in the following way. Patients are assigned using another
randomisation scheme (e.g. block randomisation) until the regression equation in
(7) is obtainable using the data for all patients so far, i.e. all possible maximum
likelihood estimates are available. Then, the covariate-adjusted odds ratio is given
by θ = exp(β̂ + z′j+1 δ̂), where β̂ is the current estimate of β, δ̂ is the current estimate
of δ and zj+1 is the covariate information for the (j + 1)th patient. Since θ ∈ (0, ∞),
we need to transform this quantity to (0, 1) in order to represent a probability. We
use the transformation f(θ) = 1/(1 + θ). We thus assign patients to treatment i = 1
with the probability

    1 / (1 + exp(β̂ + z′j+1 δ̂)).
This design has many advantages over the DLC design seen in Section 2.2.4. Firstly,
we no longer need to order the covariates into grades as the logistic model allows us
to have covariates that are continuous. We can also have categorical or binary covariates. The logistic model in (7) can be extended to incorporate multiple covariates
as well as interactions between them.
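As an illustration of the assignment rule above, the sketch below fits the logistic model with R's glm() and returns the probability of assigning the next patient to treatment 1; the data layout and names (resp, trt, z) are my own assumptions and a single covariate is used for simplicity.

    # Illustrative sketch: ORBD assignment probability for the (j + 1)th patient.
    # dat holds the previous patients: resp (0/1), trt (1 if treatment 1, 0 otherwise), z (covariate).
    orbd_prob <- function(dat, z_new) {
      fit <- glm(resp ~ trt * z, family = binomial, data = dat)  # estimates of mu, beta, gamma, delta
      b <- coef(fit)
      eta <- b["trt"] + b["trt:z"] * z_new   # estimated beta + delta * z for the new patient
      1 / (1 + exp(eta))                     # probability of assigning treatment 1
    }

    set.seed(1)
    dat <- data.frame(trt = rbinom(20, 1, 0.5), z = rnorm(20))
    dat$resp <- rbinom(20, 1, plogis(0.2 + 0.5 * dat$trt + 0.3 * dat$z))
    orbd_prob(dat, z_new = 0.1)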
Rosenberger et al. (2001) obtained the limiting allocation to treatment 1 to be

    N1(n)/n → 1 / (1 + exp(β + z′0 δ)),    (8)

where z0 is a fixed vector of covariates. Unfortunately, the ORBD procedure is only
able to target this allocation proportion.
It is worth noting that the logistic model in (7) can be modified to not take the
covariates into account. Then, we instead build the regression model

    logit(pj) = μ + βTj

and assign patients to treatment i = 1 with the probability

    1 / (1 + exp(β̂)).
2.3.2 K > 2 treatments design

Atkinson and Biswas (2013) consider the extension of the two-treatment ORBD
model without covariates to three treatments. Let i = 1, 2, 3 be the treatments
with the respective unknown success and failure probabilities pi and qi = 1 − pi.
Similarly as before, we assign patients using another randomisation scheme to one
of the i = 1, 2, 3 treatments until the logistic model can be estimated. Once the
logistic model

    logit(pj) = μ + β′Tj

can be built on the data for all patients so far, we assign patients to treatments
i = 1, 2, 3 with the respective probabilities

    1 / (1 + exp(β̂2) + exp(β̂3)),   exp(β̂2) / (1 + exp(β̂2) + exp(β̂3)),   exp(β̂3) / (1 + exp(β̂2) + exp(β̂3)),

where β̂2 and β̂3 are the estimates of β from the logistic model. We observe the
response of the patient to the treatment and update the logistic model accordingly.
We may also extend this K = 3 design to incorporate covariates. Such an
extension is not reported in the literature. That is, we now use the same logistic
regression model as in (7) with all terms defined as previously. We assign patients
using complete randomisation until the regression model is estimable. We then
assign patients to treatments i = 1, 2, 3 with the probabilities

    1 / (1 + exp(β̂2 + z′j+1 δ̂) + exp(β̂3 + z′j+1 δ̂)),
    exp(β̂2 + z′j+1 δ̂) / (1 + exp(β̂2 + z′j+1 δ̂) + exp(β̂3 + z′j+1 δ̂)),
    exp(β̂3 + z′j+1 δ̂) / (1 + exp(β̂2 + z′j+1 δ̂) + exp(β̂3 + z′j+1 δ̂)).

We can further extend the model above to clinical trials with K > 3 treatments in
a similar way.
Thus, the ORBD gives a randomisation scheme with much flexibility. We are
able to incorporate multiple covariates of different types (e.g. continuous, categorical or binary) and it even gives us the possibility of including interactions between
covariates. Section A.5 reports an R program that was used to simulate the ORBD
rule incorporating covariates.
The ORBD is also able to incorporate delayed responses. In such a case, the
logistic model uses the data for all patients so far and the model is updated whenever
a response is obtained.
However, there are also some drawbacks of the ORBD. Firstly, we are only able to
target one allocation proportion, namely (8). Also, the design is relatively variable
when compared to the RPW and DL rules. This is mainly caused by the use of
the logistic model and the variability associated with each estimate in the model.
Because a new model is built for each patient, the variances for all these estimates
add up and this results in a highly variable procedure overall.

2.4 Doubly adaptive biased coin design (DBCD)

2.4.1 K = 2 treatments design

With the exception of the GDL rule, all designs so far have been able to target
only one allocation proportion. Eisele (1994) and Eisele and Woodroofe (1995)
introduced the doubly adaptive biased-coin design (DBCD) which overcomes this
problem by allowing the target allocation proportion y(p1 , p2 ) to be specified.
The design heavily relies on a function g(x, y) from [0, 1]² to [0, 1]. This function maps the current allocation proportion and the current estimate of the target
allocation proportion to the probability of assigning the next patient to the treatment.
Selection of g is often problematic due to the very restrictive rules it must follow,
as defined by Eisele (1994). In fact, Melfi et al. (2001) pointed out that the original
choice of g violates one of these rules. Hu and Zhang (2004) propose a more relaxed
set of conditions:

- g is jointly continuous,
- g(x, x) = x,
- g(x, y) is strictly decreasing in x and strictly increasing in y,
- g has bounded derivatives in x and y.
Hu and Zhang (2004) chose g(x, y) to be

    g(x, y) = 1                                                        if x = 0,
    g(x, y) = y(y/x)^γ / [y(y/x)^γ + (1 − y)((1 − y)/(1 − x))^γ]       if 0 < x < 1,
    g(x, y) = 0                                                        if x = 1,

where γ ≥ 0 is a parameter to be chosen which controls the randomness of the
procedure. When γ = 0, g(x, y) = y and the design becomes the adaptive random
design. This design, proposed by Rosenberger et al. (2001), assigns a patient to
treatment i with probability equal to the current estimate of the target allocation
proportion and has high variability. As we increase γ, we obtain a design that is less
variable, but is more deterministic, meaning selection bias might be a problem. When
γ = ∞, we obtain a design that has the variance minimised, but is completely
predictable. Thus, in practice γ needs to be carefully chosen between these two
extremes.
Once g(x, y) has been chosen, the design is as follows. We start by assigning n0
patients to each treatment. This can be done using any type of randomisation, but
throughout this dissertation we will use complete randomisation. When m = 2n0
patients have been randomised, for patient j = m, . . . , n we obtain the current
estimate of pi , i = 1, 2. We may for example use (6), as was used for the GDL. We
call this estimate p̂i and we also obtain the estimate of qi as q̂i = 1 − p̂i.
We then assign the (j + 1)th patient to treatment i = 1 with the probability

    g( N1(j)/j , y(p̂1, p̂2) ),

where y(p̂1, p̂2) is the current estimate of the target allocation proportion using the p̂i,
Ni(j) is the current number of patients assigned to treatment i and j is the number of the
patient being assigned, as defined previously. For example, if we wish to target urn
allocation given in (1), we let

    y(p1, p2) = (1/(1 − p1)) / (1/(1 − p1) + 1/(1 − p2)).

The rule works by skewing the probability of assignment towards the treatment
that is the furthest away from its target allocation. We continue to assign patients
until all patients have been assigned or until a suitable stopping rule has been
triggered.
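To illustrate, a minimal R sketch of the allocation function g(x, y) above and of one DBCD assignment step targeting urn allocation follows; the function names are my own.

    # Hu and Zhang (2004) allocation function g(x, y) for K = 2 (illustrative sketch).
    g_hz <- function(x, y, gamma = 2) {
      if (x <= 0) return(1)
      if (x >= 1) return(0)
      num <- y * (y / x)^gamma
      num / (num + (1 - y) * ((1 - y) / (1 - x))^gamma)
    }

    # Probability that the (j + 1)th patient receives treatment 1 when targeting urn allocation.
    dbcd_prob <- function(N1, j, p_hat, gamma = 2) {
      y <- (1 / (1 - p_hat[1])) / sum(1 / (1 - p_hat))   # current estimate of the target
      g_hz(N1 / j, y, gamma)
    }

    dbcd_prob(N1 = 12, j = 20, p_hat = c(0.7, 0.5))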


We now compare the DBCD rule under different target allocations and different
choices of γ. We considered γ = 2 and γ = 4, and the choice of this parameter
is indicated as DBCD(γ). We compare urn allocation in (1) to RSIHR allocation in
(4). For all of the rules we set n0 = 2. We obtained the allocation proportion to
i = 1 and its standard deviation using a similar simulation as was used for the
RPW and GDL rules and the results are shown in Table 3. In general, there is no
significant difference between the allocation proportions when p1 = p2. However,
the allocation proportion is much more variable for the DBCD rule targeting urn
allocation. When p1 ≠ p2, DBCD targeting urn allocation assigns more patients
to the superior treatment than DBCD targeting RSIHR allocation. This has a
trade-off, as the rules targeting urn allocation have a higher standard deviation,
which may translate to a loss in power. This inverse correlation between allocation
proportion and the variance has been seen in all simulations performed so far and
will be fully investigated in Chapter 3. This simulation has also shown that there is no
significant difference between rules with γ = 2 and γ = 4 in terms of mean allocation
proportion, but the standard deviation of the allocation proportion is slightly lower
when γ = 4.
Section A.6 reports an R program that was used to simulate the DBCD rule.
Hu and Rosenberger (2003) demonstrated that for the DBCD rule

    n^(1/2) ( N1(n)/n − y(p1, p2) ) → N(0, v),

where

    v = q1 q2 ((1 + 2γ)(p1 + p2) + 2) / [(1 + 2γ)(q1 + q2)³].

Hu et al. (2006) then showed that the DBCD rule does not obtain the lower bound on
this asymptotic variance. Thus, we could say that the DL has a theoretical advantage
over the DBCD targeting urn allocation as it is able to obtain its minimum variance.
2.4.2 K > 2 treatments design

The DBCD design can be generalised in order to allow K > 2 treatments, i = 1, . . . , K,
as demonstrated by Hu and Zhang (2004). Let v = {v1, . . . , vK} be the
vector of current allocation proportions for each treatment out of the j patients randomised so far. We define g(x, y) = {g1(x, y), . . . , gK(x, y)}, with sum{x} = 1 and
sum{y} = 1, to be a vector of functions from [0, 1]^K × [0, 1]^K to [0, 1]^K with the conditions:

- g(v, v) = v and g(x, y) − g(x, v) → 0 as y → v.
- For every i,

      (gi(x, v) − gi(v, v)) / (xi − vi) ≤ −δ0   for all xi > vi,

  where 0 ≤ δ0 < 1 is a constant.
- g(x, y) is strictly decreasing in x and strictly increasing in y.
- g(x, y) has bounded derivatives in x and y.

(p1, p2)     n    Target: Urn                Target: RSIHR
                  DBCD(2)      DBCD(4)       DBCD(2)      DBCD(4)
(0.8,0.8)   100   0.50(0.10)   0.50(0.10)    0.50(0.03)   0.50(0.02)
(0.8,0.6)   100   0.66(0.08)   0.66(0.07)    0.54(0.03)   0.54(0.02)
(0.8,0.4)   100   0.74(0.06)   0.74(0.05)    0.59(0.04)   0.59(0.03)
(0.8,0.2)   100   0.79(0.05)   0.79(0.04)    0.67(0.04)   0.67(0.04)
(0.6,0.6)   100   0.50(0.07)   0.50(0.06)    0.50(0.03)   0.50(0.03)
(0.6,0.4)   100   0.60(0.05)   0.60(0.05)    0.56(0.04)   0.56(0.03)
(0.6,0.2)   100   0.66(0.04)   0.66(0.04)    0.64(0.05)   0.64(0.04)
(0.4,0.4)   100   0.50(0.05)   0.51(0.04)    0.50(0.04)   0.50(0.04)
(0.4,0.2)   100   0.57(0.04)   0.57(0.04)    0.59(0.05)   0.59(0.05)
(0.2,0.2)   100   0.50(0.03)   0.50(0.03)    0.50(0.06)   0.51(0.06)

Table 3: Allocation proportion (standard deviation in brackets) of the DBCD rule for
different choices of the target allocation and with n0 = 5. This simulation used 5,000
replications.
Hu and Zhang (2004) propose gi(x, y) to be

    gi(x, y) = (yi(yi/xi)^γ)^L / [(y1(y1/x1)^γ)^L + · · · + (yK(yK/xK)^γ)^L],   i = 1, . . . , K,

where γ ≥ 0 and L > 1 are parameters to be chosen. The purpose of γ is similar as
in the K = 2 case, while L is a constant that has a reduced influence on g(x, y) for
large values.
The allocation of sequential patients is also similar to the K = 2 rule. We
start by allocating n0 patients to each treatment using another form of randomisation. For example, when using complete randomisation we would assign patients
randomly to treatment i with the probability 1/K. Once m = Kn0 patients have
been randomised, we obtain the current estimates of the pi using (6) and we assign
the (m + 1)th patient to treatment i with the probability gi(Ni(m)/m, y(p̂1, . . . , p̂K)),
where y(p̂1, . . . , p̂K) is the target allocation proportion using the current estimates
p̂i for treatment i. That is, if we wish to target urn allocation we can use

    yi(p1, . . . , pK) = (1/(1 − pi)) / (1/(1 − p1) + · · · + 1/(1 − pK)).

We update the estimates p̂i and the procedure continues as above until all patients
have been assigned or a suitable stopping rule has been triggered.
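A vectorised sketch of the multi-treatment allocation function gi(x, y) above; the parameter defaults are my own choices.

    # Multi-treatment allocation probabilities g_i(x, y) (illustrative sketch).
    g_multi <- function(x, y, gamma = 2, L = 2) {
      w <- (y * (y / x)^gamma)^L
      w / sum(w)          # probability of assigning the next patient to each treatment
    }

    # Example for K = 3, targeting urn allocation with current estimates p_hat.
    p_hat <- c(0.8, 0.6, 0.4)
    y <- (1 / (1 - p_hat)) / sum(1 / (1 - p_hat))
    g_multi(x = c(0.40, 0.35, 0.25), y = y)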


2.4.3 Incorporating covariates

Baldi Antognini and Zagoraiou (2012) suggest an extension of the DBCD to include
covariate information, called the reinforced doubly adaptive biased coin design (RDBCD). We start by defining a function g(x, y, z) with the properties:

- g is decreasing in x and increasing in y for any z ∈ (0, 1),
- g(x, x, z) = x for any z ∈ (0, 1),
- g is decreasing in z if x < y and increasing in z if x > y,
- g(x, y, z) = 1 − g(1 − x, 1 − y, z) for any z ∈ (0, 1).

Baldi Antognini and Zagoraiou (2012) suggest

    g(x, y, z) = y(y/x)^(γ/z) / [y(y/x)^(γ/z) + (1 − y)((1 − y)/(1 − x))^(γ/z)]

as a suitable choice of g(x, y, z), with γ > 0 having a similar role as before. Note that
z in the above function corresponds to the covariate information for the patient we
wish to randomise. Due to the properties of this function, namely z ∈ (0, 1), we also
need to transform the covariates so that they are also in this range. A transformation
might also be needed such that high values of z will correspond to a higher value of
g than when z is small. We denote such a transformation by H(z).
The workings of this rule are similar to the K = 2 version of the DBCD rule.
We start by assigning n0 patients to each treatment using another randomisation
method. Since we wish to balance covariates, a covariate-adaptive rule might be
suitable but throughout this dissertation we use complete randomisation. Once
m = 2n0 patients have been assigned, we obtain the estimates p̂i for each treatment
using (6). Given the (m + 1)th patient with covariate information zm+1, we assign this
patient to treatment i with the probability

    g( Ni(m)/m , y(p̂1, p̂2), zm+1 ).

We then update p̂i and assign the next patient using the same method. The rule
continues until all patients have been assigned or a suitable stopping rule has been
triggered.
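A short sketch of the covariate-dependent allocation function, assuming the exponent γ/z reconstructed above; the function name is my own.

    # RDBCD allocation function g(x, y, z) with exponent gamma / z (illustrative sketch).
    g_rdbcd <- function(x, y, z, gamma = 2) {
      e <- gamma / z
      num <- y * (y / x)^e
      num / (num + (1 - y) * ((1 - y) / (1 - x))^e)
    }

    g_rdbcd(x = 0.45, y = 0.60, z = 0.8)   # z is the (possibly transformed) covariate in (0, 1)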
The RDBCD has many disadvantages. Firstly, it is only able to deal with a
single covariate. The covariates also need to be defined in such a way that low z is
favourable to treat, as this produces a larger g(x, y, z) value.
The RDBCD procedure suffers from a similar problem as DLC rule. That is, it
requires the covariate to be defined in such a way that a certain value is favourable
to treat when compared to another one. This may often not be the case in practice.
For example, in many trials involving a number of institutions, we wish to balance
the number of patients treated in each institution. This is done in order to reduce
the effect of institution on the trial. However, the RDBCD is not able to balance
such a covariate as in practice there is often no way of favouring one institution over
the other.
Finally, the RDBCD only allows K = 2 treatments. Although an extension to
K > 2 might be possible, it is not discussed here. It is also worth noting that
Zhang and Hu (2009) obtained an alternative method of extending the DBCD to
incorporate covariates with K > 2 treatments. Section A.7 reports an R program
that can be used to simulate the RDBCD.

2.5 Efficient randomised adaptive design (ERADE)

2.5.1 Rule definition

Recall that the DL rule has been the only rule that is able to obtain the lower bound
on its asymptotic variance. The RPW and DBCD rules are not able to obtain it,
whilst there is no literature on whether the ORBD and GDL obtain their respective
lower bounds. Although the DL attains this lower bound, it has some disadvantages
such as only being able to target urn allocation. We now define a randomisation
procedure that is able to obtain the lower bound on its asymptotic variance and is
able to target any given allocation proportion.
Hu et al. (2009) proposed the following randomisation procedure. We start by
assigning n0 patients to each treatment, similarly as for the DBCD. Once m = 2n0
patients have been assigned and their responses observed, we obtain the estimates
p̂i of pi for each treatment i = 1, 2 using (6). We then obtain the value of the target
allocation proportion using these estimates, that is y(p̂1, p̂2). For example, if we
wish to target urn allocation we use the function

    y(p1, p2) = (1/(1 − p1)) / (1/(1 − p1) + 1/(1 − p2)),

similarly as for the DBCD rule. We then assign the (m + 1)th patient to treatment
i = 1 with the probability

    α y(p̂1, p̂2)              if N1(m)/m > y(p̂1, p̂2),
    y(p̂1, p̂2)                if N1(m)/m = y(p̂1, p̂2),
    1 − α + α y(p̂1, p̂2)      if N1(m)/m < y(p̂1, p̂2),

where 0 ≤ α < 1 is a constant that reflects the degree of randomisation. We continue
until a suitable stopping criterion is reached.
Zhang and Hu (2009) consider an extension of the ERADE to incorporate covariate information, whilst Zhang et al. (2014) extend it to K > 3 treatments. Such
extensions will not be considered here and so we will only consider the ERADE for
K = 2 trials.
Section A.8 reports an R program that can be used to execute the ERADE rule.
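For concreteness, the ERADE assignment probability for treatment 1 when targeting urn allocation can be sketched as follows (with α = 0.7); this is an illustration, not the program from Section A.8.

    # ERADE probability of assigning the (m + 1)th patient to treatment 1 (illustrative sketch).
    erade_prob <- function(N1, m, p_hat, alpha = 0.7) {
      y <- (1 / (1 - p_hat[1])) / sum(1 / (1 - p_hat))  # estimated urn-allocation target
      x <- N1 / m                                       # current allocation proportion
      if (x > y) alpha * y else if (x < y) 1 - alpha * (1 - y) else y
    }

    erade_prob(N1 = 12, m = 20, p_hat = c(0.7, 0.5))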

3 Comparing designs

3.1 Introduction

We start by introducing ways in which different designs can be compared. A RAR design aims to assign more patients to the superior treatment and therefore a design
with a higher allocation proportion to this treatment will be favourable due to an
ethical advantage. We mentioned previously that Melfi and Page (1998) showed that
the power of a design is a decreasing function of the variance of the allocation proportion. Thus, we can assume that a design with a less variable allocation proportion
will also be more powerful. Hence, to compare RAR designs we will consider (i) the allocation proportion, (ii) the variability of the allocation proportion, (iii) the failure proportion
and (iv) the power and significance level. In addition, we mentioned that rules that
are able to target different allocation proportions are also favourable as they give
us more flexibility. These criteria are the standard in the literature for comparing
RAR designs and were first proposed by Hu and Rosenberger (2003).
Given i = 1, 2 treatments with probabilities of success pi and probabilities of
failure qi = 1 − pi, the power of a design is maximised by assigning patients to
treatment i with the proportion

    Ni(n)/n = √(pi qi) / (√(p1 q1) + √(p2 q2)).    (9)

This allocation proportion is known as Neyman allocation and the closer a design
is to this allocation, the higher the power in general. Unfortunately, using the
Neyman allocation has an ethical disadvantage as it can assign more patients to the
inferior treatment (this happens whenever p1 + p2 > 1). The Neyman allocation is the reason for the inverse correlation
between allocation proportion and its variance that we have seen in the simulations so far;
more patients assigned to the superior treatment resulted in a higher variance, which
in turn meant lower power. Thus, we can say that a suitable RAR design should be
balanced between maintaining suitable power and having an ethically advantageous
allocation proportion.
The main reason why we wish to assign more patients to the superior treatments
is to lower the number of treatment failures. Thus, for a design that assigns more
patients to the superior treatment, we expect a lower proportion of treatment failures. Due to (9) we also expect a design that has a lower failure proportion to have
lower power. Once again, we wish to balance the ethical advantages of a design with
its statistical properties.
Finally, by defining a suitable test, we will be able to approximate the power of a design. In general, we will fix n such that the power of complete randomisation
under given pi is roughly 0.90. We can then study power of different RAR designs
under various assumptions such as delayed responses and covariates.

3.2 K = 2 treatments

3.2.1 Allocation proportion

We start by investigating the simplest clinical trial: a K = 2 treatments design with instantaneous responses and no covariate information. We will consider the following designs:

- Complete Randomisation (CR): included for comparison purposes.
- RPW: Results in Table 1 show that the urn is highly dependent on the choice of γ.
  A high γ assigns more patients to the superior treatment but is highly variable, while
  a low γ is less variable but assigns fewer patients to the superior treatment.
  We choose RPW(5, 1) as a sensible value between these two extremes and this is also
  the choice suggested by Rosenberger (1999).
- DL: Table 2 has shown us that the DL rule with a lower initial number of balls is
  more variable but assigns more patients to the better treatment. We choose
  the initial urn composition Z0 = {3, 3, 3}, as suggested by Ivanova (2003).
- ORBD: We use the K = 2 logistic regression model mentioned in Section 2.3.1.
- DBCD: We consider the DBCD targeting urn allocation. In Table 3 we saw
  that for this allocation increasing γ results in lower variability of the allocation
  and a lower allocation to the superior treatment. We choose γ = 2, as suggested
  by Hu and Rosenberger (2003). We also let n0 = 2.
- ERADE: We use the ERADE targeting urn allocation, with α = 0.7, as suggested by
  Hu et al. (2009).
It is worth noting that the suggested RPW, DL, DBCD and ERADE designs target
urn allocation, while the ORBD targets the allocation (8). Throughout all simulations performed from now on, we will use the above parameters, unless specified
otherwise.
We now perform a simulation to compare the allocation proportions (AP) and
failure proportion (FP) and their respective standard deviations (SD) of the rules
mentioned above. Each rule was simulated 5, 000 times under each choice of pi until
n = 100 patients have been assigned to a treatment. We then work out the average
allocation to treatment i and the standard deviation across all 5, 000 simulations of
the rule. We also note the number of treatments failures for each run and also obtain
20

140293481

                 CR                    RPW                   DL                    ORBD                  DBCD                  ERADE
(p1, p2)    n    AP(SD)     FP(SD)     AP(SD)     FP(SD)     AP(SD)     FP(SD)     AP(SD)     FP(SD)     AP(SD)     FP(SD)     AP(SD)     FP(SD)
(0.8,0.8)   100  0.50(0.05) 0.20(0.04) 0.50(0.11) 0.20(0.04) 0.50(0.05) 0.20(0.04) 0.50(0.15) 0.20(0.04) 0.50(0.11) 0.20(0.04) 0.51(0.10) 0.20(0.04)
(0.8,0.6)   100  0.50(0.05) 0.30(0.05) 0.60(0.09) 0.28(0.05) 0.59(0.05) 0.28(0.04) 0.67(0.13) 0.27(0.05) 0.66(0.08) 0.27(0.05) 0.65(0.07) 0.27(0.05)
(0.8,0.4)   100  0.50(0.05) 0.40(0.05) 0.67(0.07) 0.33(0.05) 0.66(0.04) 0.34(0.05) 0.78(0.10) 0.29(0.06) 0.75(0.06) 0.30(0.05) 0.73(0.05) 0.31(0.05)
(0.8,0.2)   100  0.50(0.05) 0.50(0.05) 0.73(0.06) 0.36(0.06) 0.71(0.03) 0.37(0.05) 0.84(0.06) 0.29(0.05) 0.80(0.05) 0.32(0.06) 0.78(0.04) 0.33(0.06)
(0.6,0.6)   100  0.50(0.05) 0.40(0.05) 0.50(0.08) 0.40(0.05) 0.50(0.05) 0.40(0.05) 0.50(0.16) 0.40(0.05) 0.51(0.07) 0.40(0.05) 0.51(0.06) 0.40(0.05)
(0.6,0.4)   100  0.50(0.05) 0.50(0.05) 0.58(0.07) 0.48(0.05) 0.57(0.04) 0.48(0.05) 0.66(0.13) 0.47(0.06) 0.60(0.06) 0.48(0.05) 0.60(0.05) 0.48(0.05)
(0.6,0.2)   100  0.50(0.05) 0.60(0.05) 0.63(0.06) 0.55(0.06) 0.63(0.03) 0.55(0.05) 0.78(0.09) 0.49(0.06) 0.67(0.05) 0.53(0.06) 0.66(0.04) 0.53(0.05)
(0.4,0.4)   100  0.50(0.05) 0.60(0.05) 0.50(0.06) 0.60(0.05) 0.50(0.04) 0.60(0.05) 0.50(0.15) 0.60(0.05) 0.50(0.05) 0.60(0.05) 0.50(0.04) 0.60(0.05)
(0.4,0.2)   100  0.50(0.05) 0.70(0.05) 0.56(0.05) 0.69(0.05) 0.56(0.03) 0.69(0.05) 0.67(0.12) 0.67(0.05) 0.57(0.04) 0.68(0.05) 0.57(0.03) 0.69(0.05)
(0.2,0.2)   100  0.50(0.05) 0.80(0.04) 0.50(0.04) 0.80(0.04) 0.50(0.03) 0.80(0.04) 0.50(0.14) 0.80(0.04) 0.50(0.03) 0.80(0.04) 0.50(0.03) 0.80(0.04)

Table 4: Comparison of allocation proportion (AP) and failure proportion (FP) for some response-adaptive designs targeting the urn allocation. The simulation used 5,000 replications.
As before, the table only reports the allocation proportion to treatment i = 1, since the allocation to treatment i = 2 can be obtained by subtraction.
We start by considering the case p1 = p2. For all rules, the allocation proportion is roughly equal, which results in very similar failure proportions. The only significant difference between the designs is the standard deviation of the allocation proportion. When the success probability is high, the DL displays the lowest variability, very similar to that of CR. As we decrease the success probability, the DL rule actually becomes less variable than complete randomisation. The ORBD is the most variable, with a high standard deviation for all choices of pi. The DBCD and ERADE procedures perform similarly. Their behaviour is interesting, as they are highly variable for high pi and their variability reduces for small pi. In fact, for small pi both rules have a lower variability than CR and one very comparable to the DL. We do not see a significant difference in the variability of the failure proportion between the designs.
We now consider the case p1 ≠ p2. With the exception of CR, all rules assign more patients to the better treatment. The bigger the difference between p1 and p2, the more patients are assigned to the better treatment. The ORBD design performs
the best in this respect, with the highest proportion assigned to the best treatment for all choices of pi. The RPW and DL show very similar allocation proportions to each other, with a maximum difference of 0.02. Finally, the DBCD and ERADE have the least skewed allocations and also have a similar AP to each other. We now compare the variability of the allocation proportion for these procedures. The highly skewed allocation proportion of the ORBD translates into a very high variability for all choices of pi. The RPW also shows high variability, and it is worth noting that although the allocation proportions of the RPW and DL were similar, the DL is much less variable. In fact, the DL is at most as variable as CR for all choices of pi. The DBCD and ERADE designs show a similar pattern to that mentioned above, i.e. they are highly variable for high pi and their variance decreases for lower pi, becoming very similar to the DL rule.
Finally, we compare the failure proportions of the designs. The ORBD displays the lowest failure proportion of all the rules, and this is mostly caused by its highly skewed allocation proportion. We notice that the failure proportions for the RPW and DL are similar, which is most likely caused by their similar allocation proportions. The DL still seems superior due to its less variable allocation proportion. The DBCD and ERADE have a similar failure proportion to each other, which is slightly smaller than that for the RPW and DL rules. We can say that all the designs succeed in a more ethical allocation, as the failure proportion is always lower than that of CR whenever p1 ≠ p2. The standard deviation of the failure proportion differs insignificantly between the designs.
In Table 2 and Table 3 we compared designs that targeted the RSIHR allocation, namely the GDL, DBCD and ERADE rules. We now perform an investigation comparing these rules when targeting the RSIHR allocation. Recall that this allocation proportion is of significant importance as it minimises the expected number of treatment failures. The rules chosen are:
GDL In Table 2 we saw that the initial urn composition has little effect on the allocation proportion or its variance. Thus, we choose Z0 = {3, 3, 3}. We also let ai = C√pi/(√p1 + √p2), D = 0 and C = 2, as before.


DBCD We saw that, when targeting the RSIHR allocation, an increase in γ results in a higher allocation to the superior treatment but also a higher variance. We thus choose γ = 2, as before.
ERADE We choose the ERADE targeting the RSIHR allocation with α = 0.7.
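For reference, a small numerical sketch of the two target allocations compared in this section is given below, assuming the usual forms of the urn (RPW-type limiting) allocation and the RSIHR allocation for two treatments; the function names are our own.

```python
from math import sqrt

def urn_allocation(p1, p2):
    """Limiting allocation to treatment 1 of the RPW-type urn: q2 / (q1 + q2)."""
    q1, q2 = 1 - p1, 1 - p2
    return q2 / (q1 + q2)

def rsihr_allocation(p1, p2):
    """RSIHR allocation to treatment 1, which minimises the expected number of
    failures for a fixed variance of the treatment comparison."""
    return sqrt(p1) / (sqrt(p1) + sqrt(p2))

# Example: p = (0.8, 0.2) gives a far more skewed urn target than RSIHR target.
print(urn_allocation(0.8, 0.2), rsihr_allocation(0.8, 0.2))
```

This difference in skewness is reflected in the simulation results: the rules targeting the urn allocation reach allocation proportions around 0.7 to 0.8 at (0.8, 0.2), whereas the RSIHR-targeting rules stay closer to 0.65.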
We use a similar method to simulate the allocation and failure proportions as was used for the rules targeting the urn allocation, and the results of such a simulation are shown in Table 5. When p1 = p2, all the rules have allocation and failure proportions similar to those of complete randomisation. However, these rules have a much lower standard deviation than the other designs investigated in Table 4.
                 GDL                   DBCD                  ERADE
(p1, p2)    n    AP(SD)     FP(SD)     AP(SD)     FP(SD)     AP(SD)     FP(SD)
(0.8,0.8)   100  0.50(0.02) 0.20(0.04) 0.50(0.03) 0.20(0.04) 0.50(0.02) 0.20(0.04)
(0.8,0.6)   100  0.53(0.02) 0.29(0.04) 0.54(0.03) 0.29(0.04) 0.54(0.02) 0.29(0.04)
(0.8,0.4)   100  0.57(0.03) 0.37(0.04) 0.59(0.04) 0.36(0.04) 0.59(0.03) 0.36(0.04)
(0.8,0.2)   100  0.64(0.04) 0.42(0.04) 0.68(0.05) 0.39(0.04) 0.66(0.04) 0.40(0.04)
(0.6,0.6)   100  0.50(0.03) 0.40(0.05) 0.50(0.03) 0.40(0.05) 0.50(0.02) 0.40(0.05)
(0.6,0.4)   100  0.54(0.03) 0.49(0.05) 0.56(0.04) 0.49(0.05) 0.55(0.03) 0.49(0.05)
(0.6,0.2)   100  0.61(0.04) 0.56(0.05) 0.65(0.06) 0.54(0.05) 0.63(0.04) 0.55(0.05)
(0.4,0.4)   100  0.50(0.04) 0.60(0.05) 0.50(0.04) 0.60(0.05) 0.50(0.03) 0.60(0.05)
(0.4,0.2)   100  0.57(0.04) 0.69(0.05) 0.60(0.06) 0.68(0.05) 0.58(0.05) 0.68(0.05)
(0.2,0.2)   100  0.50(0.05) 0.80(0.04) 0.50(0.07) 0.80(0.04) 0.51(0.05) 0.80(0.04)

Table 5: Comparison of allocation proportion (AP) and failure proportion (FP) for some response-adaptive designs targeting the RSIHR allocation. The simulation used 5,000 replications.
Interestingly, the variance of all the rules targeting the RSIHR allocation seems to increase as we decrease pi, which is the opposite of what happened for the same rules targeting the urn allocation. The failure proportion seems to be the same for all the rules.
We now consider the cases when p1 ≠ p2. We can see that all the rules assign more patients to the superior treatment, with all three rules having a very similar allocation proportion. However, this allocation proportion is smaller than for all the rules considered in Table 4. On the other hand, the rules targeting the RSIHR allocation have a smaller standard deviation. Amongst the three rules, the ERADE and GDL seem to have a very similar variability, with the DBCD only slightly more variable. Overall, we can say that on average the rules targeting the urn allocation have a more ethical allocation, whilst the rules targeting the RSIHR allocation are less variable.
3.2.2 Inference, significance level and power

Consider a trial with i = 1, 2 treatments with binary responses and corresponding probabilities of success pi and failure qi = 1 − pi, as defined previously. We may wish to test for a difference between the two treatments using the hypothesis
H0 : p1 = p2
against the two-sided alternative
H1 : p1 ≠ p2.

                  CR    RPW   DL    GDL    ORBD  DBCD  DBCD   ERADE  ERADE
(p1, p2)     n                Urn   RSIHR        Urn   RSIHR  Urn    RSIHR
Significance level
(0.8,0.8)    100  0.04  0.04  0.04  0.04   0.04  0.04  0.04   0.04   0.05
(0.6,0.6)    100  0.04  0.05  0.05  0.05   0.05  0.05  0.05   0.05   0.04
(0.4,0.4)    100  0.05  0.04  0.05  0.05   0.05  0.04  0.05   0.04   0.04
(0.2,0.2)    100  0.04  0.04  0.04  0.05   0.04  0.04  0.05   0.04   0.05
Power
(0.8,0.6)    206  0.88  0.85  0.87  0.88   0.83  0.88  0.88   0.88   0.88
(0.8,0.4)    62   0.91  0.88  0.90  0.88   0.82  0.89  0.91   0.90   0.91
(0.8,0.2)    27   0.89  0.85  0.88  0.86   0.79  0.79  0.89   0.84   0.89
(0.6,0.4)    256  0.90  0.89  0.89  0.88   0.87  0.90  0.90   0.90   0.90
(0.6,0.2)    57   0.87  0.86  0.87  0.84   0.80  0.81  0.87   0.83   0.87
(0.4,0.2)    217  0.90  0.89  0.90  0.88   0.86  0.88  0.90   0.88   0.90

Table 6: Simulated power and significance level of various RAR designs for a clinical trial with K = 2 treatments. For rules that can target more than one allocation proportion, the line below the rule name gives the target. The results were obtained using a simulation with 5,000 replications.
To test these hypotheses we may use the Wald test used by Hu and Rosenberger (2003), with the test statistic given by
\[ Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}}, \]
where ni is the number of patients assigned to treatment i and the hats denote the corresponding sample estimates. Then Z² is asymptotically chi-squared distributed with 1 degree of freedom.
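A short sketch of this statistic and the corresponding test decision at the 5% level is given below; the helper name and the illustrative numbers are our own.

```python
import numpy as np

def wald_test(successes, n_assigned, crit=3.841):
    """Two-treatment Wald test: returns (Z, reject H0), using Z^2 ~ chi^2 with 1 df."""
    p_hat = successes / n_assigned                  # element-wise estimates (p1_hat, p2_hat)
    q_hat = 1 - p_hat
    se = np.sqrt(np.sum(p_hat * q_hat / n_assigned))
    z = (p_hat[0] - p_hat[1]) / se
    return z, z ** 2 > crit

# Example: 40/60 successes on treatment 1 versus 15/40 on treatment 2.
print(wald_test(np.array([40.0, 15.0]), np.array([60.0, 40.0])))
```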
Recall that the significance level is defined as the probability of incorrectly rejecting H0 when it is true (a type I error), while the statistical power is the probability that the test correctly rejects H0 when it is false (i.e. one minus the probability of a type II error). We now simulate the significance level and power for all the designs considered in Table 4 and Table 5. For each value of pi, we start by assigning n patients to treatments i = 1, 2 using each of the rules. We repeat this 5,000 times and for each repetition we obtain the value of the test statistic Z. Then, we calculate the proportion of values of Z² that exceed 3.841, which is the critical value of the test statistic at the α = 0.05 level of significance. When p1 = p2 we obtain the significance level, while when p1 ≠ p2 we obtain the power. The results of such a simulation can be seen in Table 6. To ease the analysis, we choose n such that the power of complete randomisation is roughly 0.90, whilst we keep n the same when simulating the significance level. For rules that can target multiple allocation proportions, the line below the rule name gives the allocation proportion that the rule targets. We also kept all the parameters the same as previously.
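A self-contained sketch of this power simulation is given below, again using complete randomisation purely for illustration; an adaptive rule would replace the binomial draw for the group sizes. The function name is our own.

```python
import numpy as np

def simulated_power(p1, p2, n, reps=5000, crit=3.841, seed=1):
    """Proportion of replications in which the Wald test rejects H0 at the 5% level.

    With p1 == p2 this estimates the significance level; otherwise the power.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        n1 = rng.binomial(n, 0.5)                     # patients assigned to treatment 1
        n2 = n - n1
        s1, s2 = rng.binomial(n1, p1), rng.binomial(n2, p2)
        ph1, ph2 = s1 / max(n1, 1), s2 / max(n2, 1)
        var = ph1 * (1 - ph1) / max(n1, 1) + ph2 * (1 - ph2) / max(n2, 1)
        if var > 0 and (ph1 - ph2) ** 2 / var > crit:
            rejections += 1
    return rejections / reps

print(simulated_power(0.8, 0.6, n=206))   # should come out at roughly 0.9, as in Table 6
```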
We start by comparing the significance levels of the designs. We see that the simulated significance level is very close to the α = 0.05 level that we have used for the test. Thus, we can say that the significance level is very similar for all these procedures.
It can be seen that CR maintains the highest power of all the randomisation schemes. This is most likely because its allocation proportion is the closest to Neyman allocation, for which the power is maximised. However, various designs maintain a very high level of power. We notice that for the DBCD and ERADE targeting the RSIHR allocation the power matches that of CR. The GDL targeting the RSIHR allocation also maintains a very high level of power. The rules targeting the urn allocation perform slightly worse, with the RPW showing a considerable drop in power for some pi. The ORBD has the lowest power of all the designs.
We can see that the rules that were the least variable (GDL, DBCD and ERADE) also seem to maintain the highest power. We have also previously noticed that the less variable rules assign fewer patients to the superior treatment. Thus, there seems to be an inverse relationship between a more ethical allocation and power. This confirms the simulations previously carried out by Melfi and Page (1998) and Hu and Rosenberger (2003).

3.3 K = 3 treatments
3.3.1 Allocation proportion
In this section we consider designs with K = 3 treatments. We no longer consider the RPW and ERADE designs as these have not been extended to K > 2 treatments. A simulation was performed to investigate the allocation and failure proportions, and the results are shown in Table 7. For now we only consider the rules that target the urn allocation and the ORBD, which targets the allocation (8). The approach to this simulation was similar to the K = 2 case and we set the rule parameters as previously, with the exception of the DBCD rule, which now also requires the parameter L = 1. We start by considering the DL and DBCD rules that target the urn allocation. For each rule, the first and second columns give the allocation proportion to treatments i = 1 and i = 2 respectively. The allocation proportion for i = 3 is not given, as it is simply obtained by subtracting the other two columns from one. As before, the simulated standard deviations are given in brackets.
When p1 = p2 = p3, all the rules seem to allocate a similar proportion to all treatments. This also results in a very comparable failure rate for all rules. However, the standard deviation of the allocations can differ significantly. When pi is high, the CR and DL rules perform very similarly. As pi decreases, the DL rule becomes less variable than CR. The DBCD exhibits an interesting behaviour, as it is more variable than CR for high pi and then becomes less variable when pi is low. Finally, the ORBD shows a higher variability than all the other rules.
Now consider the case when the pi probabilities are not equal. Of all the rules, the ORBD has the most ethical allocation, as it assigns the most patients to the superior treatments.
                      CR                                 DL                                 ORBD                               DBCD
(p1, p2, p3)     n    AP(SD) i=1  AP(SD) i=2  FP(SD)     AP(SD) i=1  AP(SD) i=2  FP(SD)     AP(SD) i=1  AP(SD) i=2  FP(SD)     AP(SD) i=1  AP(SD) i=2  FP(SD)
(0.8,0.8,0.8)    100  0.33(0.05)  0.33(0.05)  0.20(0.04) 0.33(0.05)  0.33(0.05)  0.20(0.04) 0.33(0.11)  0.33(0.11)  0.20(0.04) 0.33(0.09)  0.34(0.09)  0.20(0.04)
(0.8,0.6,0.4)    100  0.33(0.05)  0.33(0.05)  0.27(0.04) 0.36(0.05)  0.36(0.05)  0.25(0.04) 0.39(0.12)  0.39(0.13)  0.24(0.04) 0.39(0.09)  0.39(0.09)  0.24(0.04)
(0.8,0.6,0.2)    100  0.33(0.05)  0.33(0.05)  0.33(0.05) 0.39(0.05)  0.39(0.05)  0.29(0.04) 0.42(0.13)  0.42(0.13)  0.26(0.05) 0.42(0.10)  0.42(0.10)  0.26(0.05)
(0.8,0.4,0.2)    100  0.33(0.05)  0.33(0.05)  0.40(0.05) 0.40(0.05)  0.40(0.05)  0.31(0.04) 0.43(0.13)  0.43(0.13)  0.28(0.05) 0.43(0.10)  0.43(0.10)  0.28(0.05)
(0.6,0.6,0.6)    100  0.33(0.05)  0.33(0.05)  0.40(0.05) 0.33(0.04)  0.33(0.04)  0.40(0.05) 0.33(0.12)  0.34(0.12)  0.40(0.05) 0.33(0.06)  0.33(0.06)  0.40(0.05)
(0.6,0.6,0.4)    100  0.33(0.05)  0.33(0.05)  0.47(0.05) 0.36(0.04)  0.36(0.04)  0.46(0.05) 0.39(0.13)  0.39(0.13)  0.45(0.05) 0.37(0.06)  0.37(0.06)  0.45(0.05)
(0.6,0.4,0.2)    100  0.33(0.05)  0.33(0.05)  0.53(0.05) 0.38(0.04)  0.38(0.04)  0.50(0.05) 0.42(0.13)  0.42(0.13)  0.46(0.05) 0.40(0.06)  0.39(0.06)  0.49(0.05)
(0.4,0.4,0.4)    100  0.33(0.05)  0.33(0.05)  0.60(0.05) 0.33(0.03)  0.33(0.03)  0.60(0.05) 0.33(0.12)  0.33(0.12)  0.60(0.05) 0.33(0.04)  0.33(0.04)  0.60(0.05)
(0.4,0.4,0.2)    100  0.33(0.05)  0.33(0.05)  0.67(0.05) 0.36(0.03)  0.36(0.03)  0.66(0.05) 0.39(0.12)  0.39(0.12)  0.64(0.05) 0.36(0.04)  0.36(0.04)  0.66(0.05)
(0.2,0.2,0.2)    100  0.33(0.05)  0.33(0.05)  0.80(0.04) 0.33(0.02)  0.33(0.02)  0.80(0.04) 0.33(0.10)  0.33(0.10)  0.80(0.04) 0.33(0.03)  0.33(0.03)  0.80(0.04)

Table 7: Comparison of allocation proportion (AP) and failure proportion (FP) for some response-adaptive designs targeting the urn allocation. The exception is the ORBD, which targets the allocation given in (8). The simulation used 5,000 replications.
The DBCD rule targeting the urn allocation also assigns a very favourable allocation, only slightly lower than that of the ORBD. Finally, the DL assigns a slightly smaller proportion of patients to the best treatment, but this allocation is still more desirable than that of the CR rule. The standard deviation of the allocation proportion indicates that the DL is the least variable, always performing at least as well as CR. For high pi the DBCD seems to be much more variable than the DL, but when pi is low the two rules perform very similarly in terms of variability. The ORBD is the most variable rule for all choices of pi.
The ethical allocation translates directly into a reduction in failures, with the ORBD and DBCD having the lowest proportion of failures of all the designs. The DL has a slightly higher failure proportion, but this is still less than that of CR. There does not seem to be a significant difference between the rules in terms of the standard deviation of the failure proportion.
We now consider the rules that can target the RSIHR allocation, namely the GDL and DBCD rules. The simulation used the same parameters as before and the results are shown in Table 8. When p1 = p2 = p3, all treatments have allocation and failure proportions similar to CR and to the results in Table 7. In terms of variability, both rules perform better than all the rules in Table 7, being even less variable than the CR rule.
                      GDL                                DBCD
(p1, p2, p3)     n    AP(SD) i=1  AP(SD) i=2  FP(SD)     AP(SD) i=1  AP(SD) i=2  FP(SD)
(0.8,0.8,0.8)    100  0.33(0.02)  0.33(0.02)  0.20(0.04) 0.33(0.02)  0.33(0.02)  0.20(0.04)
(0.8,0.6,0.4)    100  0.35(0.02)  0.35(0.02)  0.26(0.04) 0.35(0.03)  0.35(0.03)  0.26(0.04)
(0.8,0.6,0.2)    100  0.36(0.02)  0.36(0.02)  0.31(0.04) 0.37(0.03)  0.37(0.03)  0.30(0.04)
(0.8,0.4,0.2)    100  0.39(0.02)  0.38(0.02)  0.34(0.04) 0.40(0.03)  0.40(0.03)  0.32(0.04)
(0.6,0.6,0.6)    100  0.33(0.02)  0.33(0.02)  0.40(0.05) 0.33(0.03)  0.33(0.03)  0.40(0.05)
(0.6,0.6,0.4)    100  0.35(0.03)  0.35(0.03)  0.46(0.05) 0.36(0.03)  0.36(0.03)  0.46(0.05)
(0.6,0.4,0.2)    100  0.37(0.03)  0.37(0.03)  0.50(0.05) 0.39(0.04)  0.39(0.04)  0.49(0.05)
(0.4,0.4,0.4)    100  0.33(0.03)  0.33(0.03)  0.60(0.05) 0.33(0.04)  0.33(0.04)  0.60(0.05)
(0.4,0.4,0.2)    100  0.36(0.03)  0.36(0.03)  0.66(0.05) 0.37(0.04)  0.37(0.04)  0.65(0.05)
(0.2,0.2,0.2)    100  0.33(0.04)  0.33(0.04)  0.80(0.04) 0.33(0.05)  0.33(0.05)  0.80(0.04)

Table 8: Comparison of allocation proportion (AP) and failure proportion (FP) for some response-adaptive designs targeting the RSIHR allocation. The simulation used 5,000 replications.
When the treatment success probabilities are no longer equal, both rules assign most patients to the better treatments, which results in fewer treatment failures. However, this proportion is much lower than for the rules mentioned in Table 7. The variability of the allocation proportion for both rules is lower than for CR. Interestingly, the variability of these procedures increases as pi decreases, which is the opposite of what happens for the rules targeting the urn allocation. This means that when pi is high, the rules targeting the RSIHR allocation are less variable, while when pi is low, the urn allocation seems to be the slightly less variable of the two. We saw a similar behaviour for these two target allocations in the K = 2 case. Once again, we do not notice a significant difference in the variability of the failure proportion.
To conclude, we have seen that the allocation and failure proportions for the K = 3 case behave similarly to those of the rules with K = 2 treatments. In general, a more ethical allocation usually results in a higher variability of the allocation proportion. We have also noticed that the rules targeting the RSIHR allocation are less variable, but have a less ethical allocation than the rules targeting the urn allocation. In addition, the RSIHR allocation seems to be more suitable when pi is high, while the urn allocation seems to perform slightly better for small pi.
3.3.2 Inference, significance level and power

We now define a suitable statistical test for the K = 3 case. Often in a clinical trial we wish to compare K − 1 treatments to a control. We wish to test the hypothesis of no difference between the control and the other treatments against the hypothesis that there is a significant difference between the K − 1 treatments and the control. From now on, we will consider p3 to be the control. Formally, we can formulate the

hypotheses as
\[ H_0 : p_1 - p_3 = 0, \quad p_2 - p_3 = 0 \]
and
\[ H_1 : p_1 - p_3 \neq 0, \quad p_2 - p_3 \neq 0, \]
respectively. Hu and Rosenberger (2003) consider the contrast test of homogeneity to test the above hypotheses. The contrast of interest is defined as
\[ p_c = \{p_1 - p_3,\; p_2 - p_3\}' \]
with the respective estimator
\[ \hat{p}_c = \{\hat{p}_1 - \hat{p}_3,\; \hat{p}_2 - \hat{p}_3\}'. \]
We let
\[ \hat{\Sigma} = \begin{pmatrix} \hat{p}_1\hat{q}_1/N_1 + \hat{p}_3\hat{q}_3/N_3 & \hat{p}_3\hat{q}_3/N_3 \\ \hat{p}_3\hat{q}_3/N_3 & \hat{p}_2\hat{q}_2/N_2 + \hat{p}_3\hat{q}_3/N_3 \end{pmatrix} \]
be the estimator of the variance of \hat{p}_c. Then, the test statistic is given by
\[ H = \hat{p}_c'\, \hat{\Sigma}^{-1} \hat{p}_c, \]
which under H0 follows the chi-square distribution with 2 degrees of freedom.
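A sketch of this statistic for K = 3 with treatment 3 as the control is given below, compared with the chi-square critical value with 2 degrees of freedom at the 5% level (5.991); the array names and illustrative numbers are our own.

```python
import numpy as np

def homogeneity_test(p_hat, n, crit=5.991):
    """Contrast test of homogeneity for K = 3 treatments (treatment 3 = control).

    p_hat, n : length-3 arrays of estimated success probabilities and sample sizes.
    Returns (H, reject H0).
    """
    v = p_hat * (1 - p_hat) / n                    # estimated variances of each p_hat
    contrast = np.array([p_hat[0] - p_hat[2], p_hat[1] - p_hat[2]])
    sigma = np.array([[v[0] + v[2], v[2]],
                      [v[2], v[1] + v[2]]])
    h = contrast @ np.linalg.solve(sigma, contrast)
    return h, h > crit

print(homogeneity_test(np.array([0.8, 0.75, 0.55]), np.array([120, 120, 120])))
```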
We may use the test of homogeneity to simulate the power and significance level. We use a similar approach as for the K = 2 case, but we now perform the test of homogeneity instead of the Wald test. We use the 0.05 level of significance to test H0. The results of such a simulation are shown in Table 9, with all parameters as given previously. We have set n such that the test has a power of roughly 0.90 when using complete randomisation. As previously, for the GDL and DBCD the line below the name of the rule indicates the target allocation proportion.
There is no clear difference between the designs when comparing the simulated significance level. For all designs, the simulated significance level is close to the 0.05 level we have used for the hypothesis test. This means that the probability of incorrectly rejecting H0 (a type I error) is close to the 0.05 level we have allowed for the test.
For all pi, CR produces the highest level of power. However, we can also see that the GDL and DBCD overall maintain a very high level of power. For these two rules, the RSIHR target seems more appropriate, as it gives a higher power than the urn allocation. Finally, the ORBD seems to have the lowest power of all the designs.
To conclude, we have seen that the ORBD on average produces the most ethical allocation by allocating the most patients to the best treatment. However, this design also has a highly variable allocation, and this leads to a relatively large loss in power.
                      CR    DL    GDL    ORBD  DBCD  DBCD
(p1, p2, p3)     n          Urn   RSIHR        Urn   RSIHR
Significance level
(0.8,0.8,0.8)    100  0.04  0.04  0.04   0.04  0.04  0.04
(0.6,0.6,0.6)    100  0.05  0.05  0.04   0.04  0.05  0.05
(0.4,0.4,0.4)    100  0.04  0.04  0.05   0.05  0.05  0.05
(0.2,0.2,0.2)    100  0.04  0.04  0.05   0.04  0.04  0.05
Power
(0.8,0.8,0.6)    290  0.89  0.83  0.88   0.74  0.79  0.87
(0.8,0.8,0.4)    84   0.88  0.84  0.87   0.71  0.83  0.87
(0.8,0.8,0.2)    42   0.93  0.88  0.91   0.83  0.83  0.89
(0.6,0.6,0.4)    338  0.88  0.85  0.87   0.82  0.84  0.87
(0.6,0.6,0.2)    75   0.86  0.81  0.84   0.67  0.81  0.82
(0.4,0.4,0.2)    285  0.90  0.87  0.87   0.79  0.86  0.88

Table 9: Simulated power and significance level of various RAR designs for a clinical trial with K = 3 treatments. The results were obtained using a simulation with 5,000 replications.
On the other hand, the DL, GDL and DBCD rules have less ethical allocations than the ORBD, but their allocations are less variable and the rules maintain a high level of power. It can be said that the rules targeting the RSIHR allocation are less variable for high pi, while the rules targeting the urn allocation exhibit lower variability for low pi. In terms of power, the RSIHR allocation is much more suitable, maintaining a very high level of power. Overall, so far we have seen RAR designs that (i) assign more patients to the superior treatment, (ii) are less variable than CR, (iii) have a lower failure proportion than CR and (iv) maintain a high level of power. Which RAR design is able to meet all these criteria often depends on pi.
We end the investigation of multi-treatment RAR designs here. Since the DL, GDL, ORBD and DBCD have been extended to any number of treatments, it is possible to investigate how the rules behave for K > 3. The contrast test of homogeneity can also be extended to K > 3 treatments, as shown by Hu and Rosenberger (2003). However, such an investigation is not performed here and we now focus on delayed responses.

3.4 Delayed responses
3.4.1 K = 2 treatments

We start by introducing a way of incorporating delayed responses for all the rules being investigated. We allow patients to arrive sequentially, one patient at each time unit. That is, the first patient arrives at time unit 1, the second patient arrives at time unit 2, and so on. We also define the vector d = {d1, . . . , dK}, where di is the mean delay in response for treatment i, measured in the time units defined above. For example, d = {5, 1} means that the mean delay for treatment
i = 1 is 5 time units, and so on average we expect to assign 5 new patients before the response is available for this patient.
The exponential distribution is often used to model waiting times, and thus we use it to generate the delayed responses. Given a patient assigned to treatment i, we define the response time to be the time unit of randomisation plus a random number generated from the exponential distribution with rate 1/di. This is because the mean of the exponential distribution is the inverse of its rate.
For each rule we are also required to alter the stopping rule. So far, each rule stopped once n patients had been assigned, since the responses were immediate. We now let the rule continue until n patients have been randomised and all n responses have been collected.
Since responses are no longer immediate, after each new patient is randomised we check whether any responses are ready. That is, if the current time unit is 7, we look for any patients whose response time is greater than 7 and less than 8. For each such patient we obtain a response, and then the 8th patient is randomised.
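A sketch of this delayed-response mechanism is given below: patients arrive one per time unit, each response becomes available after an exponential delay, and any responses that have fallen due are collected before the next patient is randomised. The queue-handling names are our own and, purely for illustration, the design used is complete randomisation.

```python
import numpy as np

def run_delayed_trial(p, d, n=100, seed=1):
    """Simulate one trial with delayed binary responses.

    p : true success probabilities per treatment
    d : mean response delays per treatment, in time units
    """
    rng = np.random.default_rng(seed)
    pending = []                        # (response_time, treatment, response)
    observed = []                       # responses available to the design so far
    assignments = []
    t = 0
    while len(assignments) < n or pending:
        t += 1
        if len(assignments) < n:
            i = rng.integers(len(p))                   # complete randomisation
            response = rng.random() < p[i]
            delay = rng.exponential(d[i])              # mean delay d[i] (rate 1/d[i])
            pending.append((t + delay, i, response))
            assignments.append(i)
        # Collect every response whose time has now passed.
        due = [r for r in pending if r[0] <= t]
        pending = [r for r in pending if r[0] > t]
        observed.extend(due)
    return assignments, observed

assignments, observed = run_delayed_trial(p=(0.8, 0.6), d=(5, 1))
print(len(assignments), len(observed))     # both equal 100 once all responses are in
```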
We start by considering the K = 2 case for the GDL, ORBD and DBCD rules with delayed responses. A simulation was performed to investigate the allocation proportion of each of these rules; it is similar to the one performed in Section 3.2 and uses the same parameters, but now uses the model above incorporating delayed responses. Three different delay values, d = {1, 1}, d = {5, 1} and d = {5, 5}, were investigated. The first choice represents a low delay on both treatments, the second an unequal delay on the two treatments, and the third a moderate delay on both treatments. The results are given in Table 10, with the row below the rule name indicating the allocation proportion that is being targeted. There was no significant difference in the variability of the failure proportion from the results in Section 3.2, so we only report the mean failure proportion to improve readability.
We start the analysis by considering the p1 = p2 cases. Under all three choices of d, all the rules assign patients equally, as would be expected. However, the standard deviations differ significantly. The ORBD is the most variable, as was seen previously for instantaneous responses. The DBCD targeting the urn allocation is also quite variable, especially for high pi. The GDL targeting the same allocation is less variable. Finally, the GDL and DBCD targeting the RSIHR allocation are the least variable of all the rules for high pi, with the GDL performing slightly better for low pi.
Consider now the case p1 ≠ p2. We see that all the rules assign more patients to the better treatment, as was seen for the rules with instantaneous responses. The ORBD has by far the most ethically advantageous allocation, assigning the most patients to treatment i = 1, i.e. the superior treatment. We notice that the DBCD seems to be more variable than the GDL, as seen previously in the p1 = p2 case.
                GDL (Urn)        GDL (RSIHR)      ORBD             DBCD (Urn)       DBCD (RSIHR)
(p1, p2)    n   AP(SD)      FP   AP(SD)      FP   AP(SD)      FP   AP(SD)      FP   AP(SD)      FP
d = {1, 1}
(0.8,0.8)   100 0.50(0.05) 0.20  0.50(0.02) 0.20  0.50(0.15) 0.20  0.51(0.10) 0.20  0.51(0.03) 0.20
(0.8,0.6)   100 0.57(0.05) 0.29  0.53(0.02) 0.29  0.68(0.13) 0.27  0.66(0.08) 0.27  0.54(0.03) 0.29
(0.8,0.4)   100 0.63(0.04) 0.35  0.57(0.03) 0.37  0.78(0.10) 0.29  0.74(0.06) 0.31  0.59(0.03) 0.36
(0.8,0.2)   100 0.68(0.03) 0.39  0.63(0.04) 0.42  0.84(0.06) 0.29  0.79(0.05) 0.33  0.67(0.04) 0.40
(0.6,0.6)   100 0.50(0.04) 0.40  0.50(0.03) 0.40  0.50(0.16) 0.40  0.51(0.07) 0.40  0.51(0.03) 0.40
(0.6,0.4)   100 0.56(0.04) 0.49  0.54(0.03) 0.49  0.66(0.14) 0.47  0.60(0.05) 0.48  0.55(0.04) 0.49
(0.6,0.2)   100 0.62(0.03) 0.55  0.60(0.04) 0.56  0.78(0.09) 0.49  0.66(0.05) 0.53  0.63(0.05) 0.55
(0.4,0.4)   100 0.50(0.03) 0.60  0.50(0.04) 0.60  0.49(0.15) 0.60  0.51(0.05) 0.60  0.51(0.04) 0.60
(0.4,0.2)   100 0.56(0.03) 0.69  0.56(0.04) 0.68  0.67(0.12) 0.66  0.57(0.04) 0.69  0.59(0.05) 0.68
(0.2,0.2)   100 0.50(0.03) 0.80  0.50(0.05) 0.80  0.50(0.14) 0.80  0.51(0.04) 0.80  0.50(0.06) 0.80
d = {5, 1}
(0.8,0.8)   100 0.48(0.05) 0.20  0.50(0.02) 0.20  0.48(0.15) 0.20  0.50(0.10) 0.20  0.50(0.03) 0.20
(0.8,0.6)   100 0.55(0.05) 0.29  0.53(0.02) 0.29  0.66(0.13) 0.27  0.66(0.08) 0.27  0.54(0.03) 0.29
(0.8,0.4)   100 0.61(0.04) 0.35  0.57(0.03) 0.37  0.77(0.09) 0.29  0.74(0.06) 0.30  0.59(0.03) 0.36
(0.8,0.2)   100 0.66(0.03) 0.40  0.63(0.04) 0.42  0.83(0.06) 0.30  0.79(0.05) 0.33  0.67(0.04) 0.40
(0.6,0.6)   100 0.49(0.04) 0.40  0.50(0.03) 0.40  0.50(0.15) 0.40  0.51(0.07) 0.40  0.51(0.03) 0.40
(0.6,0.4)   100 0.55(0.04) 0.49  0.54(0.03) 0.49  0.65(0.14) 0.47  0.60(0.06) 0.48  0.56(0.04) 0.49
(0.6,0.2)   100 0.60(0.03) 0.56  0.60(0.04) 0.56  0.77(0.09) 0.49  0.66(0.05) 0.53  0.63(0.05) 0.54
(0.4,0.4)   100 0.49(0.04) 0.60  0.50(0.04) 0.60  0.52(0.15) 0.60  0.51(0.05) 0.60  0.51(0.04) 0.60
(0.4,0.2)   100 0.55(0.03) 0.69  0.57(0.04) 0.69  0.67(0.11) 0.67  0.57(0.04) 0.69  0.59(0.05) 0.68
(0.2,0.2)   100 0.50(0.02) 0.80  0.51(0.05) 0.80  0.51(0.14) 0.80  0.51(0.03) 0.80  0.51(0.06) 0.80
d = {5, 5}
(0.8,0.8)   100 0.50(0.04) 0.20  0.50(0.02) 0.20  0.50(0.15) 0.20  0.51(0.11) 0.20  0.51(0.03) 0.20
(0.8,0.6)   100 0.56(0.04) 0.29  0.53(0.02) 0.29  0.66(0.13) 0.27  0.66(0.08) 0.27  0.54(0.03) 0.29
(0.8,0.4)   100 0.61(0.04) 0.36  0.57(0.03) 0.37  0.77(0.10) 0.29  0.74(0.06) 0.30  0.59(0.04) 0.36
(0.8,0.2)   100 0.66(0.03) 0.40  0.62(0.04) 0.42  0.83(0.06) 0.30  0.79(0.05) 0.33  0.67(0.05) 0.40
(0.6,0.6)   100 0.50(0.04) 0.40  0.50(0.03) 0.40  0.50(0.15) 0.40  0.51(0.07) 0.40  0.51(0.03) 0.40
(0.6,0.4)   100 0.56(0.04) 0.49  0.54(0.03) 0.49  0.65(0.13) 0.47  0.60(0.05) 0.48  0.56(0.04) 0.49
(0.6,0.2)   100 0.61(0.03) 0.56  0.60(0.04) 0.56  0.75(0.09) 0.50  0.66(0.05) 0.54  0.63(0.05) 0.54
(0.4,0.4)   100 0.50(0.03) 0.60  0.50(0.04) 0.60  0.50(0.14) 0.60  0.51(0.05) 0.60  0.50(0.04) 0.60
(0.4,0.2)   100 0.55(0.03) 0.69  0.56(0.04) 0.69  0.66(0.12) 0.67  0.57(0.04) 0.69  0.58(0.05) 0.69
(0.2,0.2)   100 0.50(0.02) 0.80  0.50(0.05) 0.80  0.50(0.14) 0.80  0.51(0.04) 0.80  0.51(0.06) 0.80

Table 10: Comparison of allocation proportion (AP) and its standard deviation for some response-adaptive designs with K = 2 treatments and delayed responses. Three different values for the response delay were investigated. The simulation used 5,000 replications.
The difference between these two rules is especially visible for the rules targeting the urn allocation. Once again we see that the rules targeting the urn allocation are more appropriate for low pi, whilst the rules targeting the RSIHR allocation are more appropriate for high pi. There does not seem to be any significant difference between the three settings of d, or between this table and the results found for the same rules with instantaneous responses, as seen in Table 4 and Table 5.
We now perform a simulation of the significance level and power, much like the one reported in Table 6. We still use the Wald test at the 0.05 level of significance and all parameters are as before. The results can be seen in Table 11.
We notice that the significance level does not differ significantly between the procedures and the values of d. In fact, it is very similar to the significance level of the procedures with instantaneous responses reported in Table 6.
When compared to CR, the GDL rule maintains the highest power. The power is high for this design for both target allocations considered, often maintaining a similar
power to that of CR. The DBCD targeting the RSIHR allocation is also highly powerful. The behaviour of the DBCD targeting the urn allocation is particularly interesting: the rule maintains high power when the difference between the pi is small, but when the difference is larger, e.g. pi = (0.8, 0.2), the design has the lowest power of all the designs. The ORBD has the lowest power of all the designs for all other pi values. We notice only a slight difference between the different choices of d, with no clear pattern.

                 CR     GDL    GDL     ORBD   DBCD   DBCD
(p1, p2)    n           Urn    RSIHR          Urn    RSIHR
d = {1, 1}
(0.8,0.8)   100  0.04   0.04   0.04    0.04   0.04   0.05
(0.6,0.6)   100  0.04   0.05   0.04    0.05   0.05   0.05
(0.4,0.4)   100  0.05   0.05   0.04    0.05   0.06   0.06
(0.2,0.2)   100  0.04   0.04   0.04    0.04   0.05   0.05
(0.8,0.6)   206  0.88   0.88   0.87    0.83   0.88   0.88
(0.8,0.4)   62   0.91   0.91   0.91    0.81   0.84   0.90
(0.8,0.2)   27   0.89   0.89   0.89    0.80   0.76   0.88
(0.6,0.4)   256  0.90   0.89   0.88    0.86   0.89   0.90
(0.6,0.2)   57   0.87   0.87   0.86    0.79   0.85   0.87
(0.4,0.2)   217  0.90   0.90   0.89    0.86   0.90   0.90
d = {5, 1}
(0.8,0.8)   100  0.04   0.04   0.05    0.04   0.04   0.05
(0.6,0.6)   100  0.04   0.04   0.05    0.05   0.04   0.04
(0.4,0.4)   100  0.05   0.05   0.05    0.05   0.04   0.05
(0.2,0.2)   100  0.04   0.04   0.04    0.04   0.04   0.05
(0.8,0.6)   206  0.88   0.88   0.89    0.82   0.86   0.87
(0.8,0.4)   62   0.91   0.90   0.90    0.82   0.85   0.90
(0.8,0.2)   27   0.89   0.89   0.89    0.84   0.76   0.87
(0.6,0.4)   256  0.90   0.89   0.87    0.87   0.88   0.90
(0.6,0.2)   57   0.87   0.87   0.87    0.80   0.86   0.87
(0.4,0.2)   217  0.90   0.87   0.90    0.87   0.89   0.90
d = {5, 5}
(0.8,0.8)   100  0.04   0.04   0.04    0.04   0.03   0.03
(0.6,0.6)   100  0.04   0.05   0.06    0.06   0.04   0.04
(0.4,0.4)   100  0.05   0.05   0.04    0.06   0.05   0.05
(0.2,0.2)   100  0.04   0.05   0.04    0.05   0.04   0.06
(0.8,0.6)   206  0.88   0.87   0.87    0.84   0.85   0.87
(0.8,0.4)   62   0.91   0.89   0.88    0.83   0.85   0.91
(0.8,0.2)   27   0.89   0.89   0.92    0.84   0.76   0.89
(0.6,0.4)   256  0.90   0.88   0.90    0.85   0.89   0.90
(0.6,0.2)   57   0.87   0.87   0.87    0.81   0.85   0.87
(0.4,0.2)   217  0.90   0.90   0.88    0.88   0.89   0.88

Table 11: Simulated power of various RAR designs for a clinical trial with K = 2 treatments and delayed responses. Three different values for the response delay were investigated. The results were obtained using a simulation with 5,000 replications.

3.4.2 K = 3 treatments

We now consider RAR procedures with delayed responses and K = 3 treatments. We start by investigating the allocation proportions of the GDL, ORBD and DBCD rules. We run a simulation similar to that of the previous section, but adapt the rules to K = 3 treatments. The results are reported in Table 12. We do not report the failure proportions here, as they did not differ significantly from the instantaneous model, and omitting them improves the readability of the table. For each rule, the table reports the allocation proportion to treatments i = 1, 2.
Consider the case when the pi values are equal. We can see that the allocation proportions are roughly equal for all designs. The variability of the allocation proportion follows a similar pattern to that seen for the same rules with instantaneous responses, investigated in Table 7 and Table 8. That is, the ORBD is the most variable, whilst the performance of the GDL and DBCD depends highly on the allocation proportion being targeted. The urn allocation is less variable for small pi, whilst the RSIHR allocation is more suitable when pi is higher. Of the DBCD and GDL rules, the GDL is slightly less variable. There does not seem to be a significant difference between the different values of d, or between the procedures with delayed and instantaneous responses.
When the pi are unequal, all designs allocate more patients to the superior treatment. As before, the ORBD has the most ethically desirable allocation, followed by the DBCD targeting the urn allocation. The DBCD targeting the RSIHR allocation and both GDL designs have very similar allocations, only slightly skewed from equal allocation. The variability of the allocation proportion also follows similar patterns as before, with the ORBD being the most variable. The variability of the GDL and DBCD rules depends highly on the allocation proportion being targeted and on the value of pi. When pi is high, the RSIHR allocation is less variable, but when pi is low, the urn allocation is less variable. Of the GDL and DBCD, the GDL seems to be slightly less variable for the same target allocation. We also see no significant difference between the different levels of delay. Finally, there is also no significant difference between the allocations of the rules with delayed responses and those of the rules with instantaneous responses explored in Table 7 and Table 8.
We now consider the power of the procedures. We use the contrast test of homogeneity, as in the case of the model with instantaneous responses. All parameters were kept the same and the results are given in Table 13. There does not seem to be a noticeable difference in the significance level between the designs, other than for the ORBD. The power also seems to follow a similar pattern to the K = 2 case. That is, the GDL and DBCD targeting the RSIHR allocation maintain a very high level of power for all pi when compared to CR. The GDL rule targeting the urn allocation also shows a high level of power. The DBCD targeting the urn allocation exhibits interesting behaviour, as was seen in the K = 2 case.
That is, the power is significantly reduced when one pi is smaller than the others. When pi = (0.8, 0.8, 0.2), the DBCD targeting the urn allocation is actually the least powerful. For all other settings, the ORBD has the lowest power. It is worth noting that the true power of the ORBD may be higher, due to the slightly smaller simulated significance level compared with the 0.05 level at which we performed the test.
To conclude, we have seen that a moderate delay in responses does not have a significant effect on the allocation proportion or its variability. However, we have noticed that the power can be affected by delayed responses for the DBCD targeting the urn allocation. We have seen that for this particular design, when the difference between the success probabilities pi is the greatest, the design has the lowest power of all the designs. For all other designs there is no significant drop in power due to delayed responses.
We now consider the case of a clinical trial with a large delay. Since all the RAR designs considered require some responses in order to skew the allocation proportion, a RAR design would not be as effective in this case. For example, when all patients are assigned before any responses are received, all the RAR designs investigated would allocate patients in the same manner as complete randomisation. Thus, none of the RAR designs would be suitable for trials where the delay is very large, e.g. survival trials.
However, in practice there are often ways of overcoming the problem of a large delay. For example, Tamura et al. (1994) explored an application of a RAR design in a study investigating the effect of fluoxetine in patients with a depressive disorder. In this study, the time between the first and final measurement was approximately 8 weeks. The researchers decided that this delay was too large and used a surrogate response instead. The surrogate response was defined to be a success if the patient exhibited at least a 50% reduction in HAMD (a scale measuring the severity of depression) in two consecutive visits after 3 weeks of therapy. A similar surrogate response might be possible for other clinical trials where a RAR design would otherwise be ruled out on the basis of a large delay.
Note that Zhang et al. (2007) compared the allocation proportions of the GDL and DBCD under delayed responses. They only considered K = 3 treatments and did not investigate the power of those designs. However, their comparison also investigated non-uniform patient entry times, which was not explored here. The literature does not report any investigation of the ORBD under delayed responses. This is also the first known instance of the power of the GDL, ORBD and DBCD being investigated under delayed responses.
                      GDL (Urn)              GDL (RSIHR)            ORBD                   DBCD (Urn)             DBCD (RSIHR)
(p1, p2, p3)     n    AP(SD) to i = 1, 2     AP(SD) to i = 1, 2     AP(SD) to i = 1, 2     AP(SD) to i = 1, 2     AP(SD) to i = 1, 2
d = {1, 1, 1}
(0.8,0.8,0.8)    100  0.33(0.04) 0.33(0.04)  0.33(0.02) 0.33(0.02)  0.33(0.11) 0.34(0.11)  0.33(0.09) 0.33(0.10)  0.33(0.03) 0.33(0.02)
(0.8,0.8,0.6)    100  0.36(0.04) 0.34(0.04)  0.35(0.02) 0.35(0.02)  0.39(0.12) 0.39(0.13)  0.39(0.10) 0.39(0.09)  0.35(0.03) 0.35(0.03)
(0.8,0.8,0.4)    100  0.38(0.04) 0.36(0.04)  0.36(0.02) 0.36(0.02)  0.42(0.13) 0.42(0.13)  0.41(0.10) 0.42(0.10)  0.37(0.03) 0.37(0.03)
(0.8,0.8,0.2)    100  0.40(0.04) 0.38(0.04)  0.38(0.02) 0.38(0.02)  0.43(0.13) 0.43(0.13)  0.43(0.10) 0.43(0.10)  0.39(0.03) 0.39(0.03)
(0.6,0.6,0.6)    100  0.33(0.04) 0.32(0.04)  0.33(0.02) 0.33(0.02)  0.33(0.12) 0.33(0.12)  0.33(0.06) 0.33(0.06)  0.33(0.03) 0.33(0.03)
(0.6,0.6,0.4)    100  0.36(0.04) 0.35(0.04)  0.35(0.03) 0.35(0.03)  0.39(0.13) 0.38(0.13)  0.37(0.06) 0.37(0.06)  0.35(0.03) 0.35(0.03)
(0.6,0.6,0.2)    100  0.38(0.04) 0.37(0.04)  0.37(0.03) 0.37(0.03)  0.42(0.13) 0.42(0.13)  0.40(0.06) 0.39(0.06)  0.38(0.03) 0.38(0.03)
(0.4,0.4,0.4)    100  0.33(0.03) 0.33(0.03)  0.33(0.03) 0.33(0.03)  0.33(0.12) 0.33(0.12)  0.33(0.04) 0.33(0.04)  0.33(0.04) 0.33(0.04)
(0.4,0.4,0.2)    100  0.36(0.03) 0.35(0.03)  0.36(0.03) 0.36(0.03)  0.39(0.12) 0.39(0.12)  0.36(0.04) 0.36(0.04)  0.37(0.04) 0.36(0.04)
(0.2,0.2,0.2)    100  0.33(0.02) 0.33(0.02)  0.33(0.04) 0.33(0.04)  0.33(0.10) 0.34(0.10)  0.33(0.03) 0.33(0.03)  0.33(0.05) 0.33(0.05)
d = {5, 1, 1}
(0.8,0.8,0.8)    100  0.33(0.04) 0.33(0.04)  0.33(0.02) 0.33(0.02)  0.33(0.11) 0.34(0.11)  0.33(0.09) 0.33(0.09)  0.33(0.03) 0.33(0.02)
(0.8,0.8,0.6)    100  0.36(0.05) 0.36(0.04)  0.34(0.02) 0.35(0.02)  0.38(0.12) 0.39(0.12)  0.39(0.09) 0.39(0.10)  0.35(0.03) 0.35(0.03)
(0.8,0.8,0.4)    100  0.38(0.05) 0.38(0.04)  0.36(0.02) 0.36(0.02)  0.41(0.13) 0.42(0.13)  0.42(0.10) 0.42(0.10)  0.37(0.03) 0.37(0.03)
(0.8,0.8,0.2)    100  0.40(0.05) 0.40(0.04)  0.38(0.02) 0.38(0.02)  0.43(0.13) 0.43(0.13)  0.44(0.10) 0.44(0.10)  0.39(0.03) 0.39(0.03)
(0.6,0.6,0.6)    100  0.33(0.04) 0.33(0.04)  0.33(0.02) 0.33(0.02)  0.33(0.12) 0.33(0.12)  0.33(0.06) 0.33(0.06)  0.33(0.03) 0.33(0.03)
(0.6,0.6,0.4)    100  0.36(0.04) 0.36(0.04)  0.35(0.03) 0.35(0.02)  0.38(0.13) 0.39(0.12)  0.37(0.06) 0.37(0.06)  0.35(0.03) 0.35(0.03)
(0.6,0.6,0.2)    100  0.38(0.04) 0.38(0.04)  0.37(0.03) 0.37(0.03)  0.42(0.12) 0.42(0.12)  0.39(0.06) 0.39(0.06)  0.38(0.03) 0.38(0.03)
(0.4,0.4,0.4)    100  0.33(0.03) 0.33(0.03)  0.33(0.03) 0.33(0.03)  0.34(0.12) 0.33(0.12)  0.33(0.04) 0.33(0.04)  0.33(0.04) 0.33(0.04)
(0.4,0.4,0.2)    100  0.36(0.03) 0.36(0.03)  0.36(0.03) 0.36(0.03)  0.39(0.12) 0.39(0.12)  0.36(0.04) 0.36(0.04)  0.37(0.04) 0.36(0.04)
(0.2,0.2,0.2)    100  0.33(0.02) 0.33(0.02)  0.34(0.04) 0.33(0.04)  0.34(0.10) 0.33(0.10)  0.33(0.03) 0.33(0.03)  0.33(0.05) 0.33(0.05)
d = {5, 5, 5}
(0.8,0.8,0.8)    100  0.34(0.04) 0.33(0.04)  0.33(0.02) 0.33(0.02)  0.33(0.11) 0.33(0.11)  0.33(0.10) 0.34(0.09)  0.33(0.02) 0.33(0.02)
(0.8,0.8,0.6)    100  0.37(0.05) 0.36(0.04)  0.35(0.02) 0.35(0.02)  0.38(0.12) 0.38(0.12)  0.39(0.09) 0.39(0.09)  0.35(0.03) 0.35(0.03)
(0.8,0.8,0.4)    100  0.40(0.04) 0.39(0.05)  0.36(0.02) 0.36(0.02)  0.41(0.13) 0.41(0.12)  0.42(0.10) 0.42(0.10)  0.37(0.03) 0.37(0.03)
(0.8,0.8,0.2)    100  0.42(0.04) 0.40(0.05)  0.38(0.02) 0.38(0.02)  0.42(0.12) 0.43(0.12)  0.43(0.10) 0.43(0.10)  0.40(0.03) 0.39(0.03)
(0.6,0.6,0.6)    100  0.34(0.04) 0.33(0.04)  0.33(0.02) 0.33(0.02)  0.33(0.12) 0.33(0.12)  0.33(0.06) 0.33(0.06)  0.33(0.03) 0.33(0.03)
(0.6,0.6,0.4)    100  0.37(0.04) 0.36(0.04)  0.35(0.03) 0.35(0.03)  0.39(0.12) 0.38(0.12)  0.37(0.06) 0.37(0.06)  0.35(0.03) 0.35(0.03)
(0.6,0.6,0.2)    100  0.39(0.04) 0.38(0.04)  0.37(0.03) 0.37(0.03)  0.41(0.12) 0.41(0.12)  0.39(0.06) 0.39(0.06)  0.38(0.03) 0.38(0.03)
(0.4,0.4,0.4)    100  0.34(0.03) 0.33(0.03)  0.33(0.03) 0.33(0.03)  0.33(0.11) 0.33(0.11)  0.33(0.04) 0.33(0.04)  0.33(0.04) 0.33(0.04)
(0.4,0.4,0.2)    100  0.36(0.03) 0.36(0.03)  0.36(0.03) 0.36(0.03)  0.38(0.11) 0.39(0.11)  0.36(0.04) 0.36(0.04)  0.37(0.04) 0.37(0.04)
(0.2,0.2,0.2)    100  0.33(0.02) 0.33(0.02)  0.33(0.04) 0.33(0.04)  0.33(0.10) 0.33(0.10)  0.33(0.03) 0.33(0.03)  0.33(0.05) 0.33(0.05)

Table 12: Comparison of allocation proportion (AP) and its standard deviation for some response-adaptive designs with K = 3 treatments and delayed responses. Three different values for the response delay were investigated. The simulation used 5,000 replications.

                      CR     GDL    GDL     ORBD   DBCD   DBCD
(p1, p2, p3)     n           Urn    RSIHR          Urn    RSIHR
d = {1, 1, 1}
(0.8,0.8,0.8)    100  0.04   0.04   0.03    0.03   0.04   0.03
(0.6,0.6,0.6)    100  0.04   0.04   0.05    0.05   0.03   0.05
(0.4,0.4,0.4)    100  0.05   0.05   0.05    0.05   0.06   0.05
(0.2,0.2,0.2)    100  0.04   0.04   0.04    0.03   0.04   0.06
(0.8,0.8,0.6)    290  0.88   0.85   0.87    0.75   0.79   0.88
(0.8,0.8,0.4)    84   0.89   0.85   0.89    0.71   0.69   0.89
(0.8,0.8,0.2)    42   0.93   0.87   0.91    0.84   0.74   0.88
(0.6,0.6,0.4)    338  0.90   0.85   0.88    0.79   0.86   0.87
(0.6,0.6,0.2)    75   0.87   0.80   0.84    0.64   0.79   0.81
(0.4,0.4,0.2)    285  0.91   0.85   0.87    0.79   0.88   0.88
d = {5, 1, 1}
(0.8,0.8,0.8)    100  0.04   0.04   0.04    0.03   0.04   0.04
(0.6,0.6,0.6)    100  0.04   0.04   0.04    0.04   0.04   0.05
(0.4,0.4,0.4)    100  0.05   0.04   0.06    0.04   0.05   0.05
(0.2,0.2,0.2)    100  0.04   0.04   0.05    0.03   0.04   0.06
(0.8,0.8,0.6)    290  0.88   0.85   0.88    0.74   0.80   0.88
(0.8,0.8,0.4)    84   0.89   0.84   0.89    0.74   0.73   0.88
(0.8,0.8,0.2)    42   0.93   0.88   0.90    0.83   0.74   0.90
(0.6,0.6,0.4)    338  0.90   0.85   0.86    0.76   0.84   0.86
(0.6,0.6,0.2)    75   0.87   0.83   0.86    0.69   0.75   0.85
(0.4,0.4,0.2)    285  0.91   0.88   0.88    0.76   0.90   0.88
d = {5, 5, 5}
(0.8,0.8,0.8)    100  0.04   0.05   0.05    0.03   0.04   0.04
(0.6,0.6,0.6)    100  0.04   0.04   0.05    0.04   0.04   0.05
(0.4,0.4,0.4)    100  0.05   0.04   0.04    0.04   0.04   0.05
(0.2,0.2,0.2)    100  0.04   0.05   0.04    0.03   0.04   0.04
(0.8,0.8,0.6)    290  0.88   0.84   0.86    0.74   0.79   0.89
(0.8,0.8,0.4)    84   0.89   0.86   0.87    0.76   0.74   0.87
(0.8,0.8,0.2)    42   0.93   0.90   0.93    0.86   0.77   0.91
(0.6,0.6,0.4)    338  0.90   0.86   0.87    0.79   0.85   0.86
(0.6,0.6,0.2)    75   0.87   0.84   0.84    0.68   0.79   0.83
(0.4,0.4,0.2)    285  0.91   0.88   0.87    0.79   0.86   0.87

Table 13: Simulated power of various RAR designs for a clinical trial with K = 3 treatments and delayed responses. Three different values for the response delay were investigated. The results were obtained using a simulation with 5,000 replications.

3.5 Covariates
3.5.1 K = 2 treatments

So far, we have extended the DL, ORBD and DBCD to allow the incorporation of covariates. The extended version of the DL rule is known as the DLC, while the DBCD version is known as the RDBCD. It is worth noting that the way the ORBD allows for covariate balance is different to that of the DLC and RDBCD. The ORBD balances
covariates in the traditional sense, meaning that it aims to assign patients in such a way that the covariate is equally represented in each treatment. It does this by skewing the assignment probability towards the treatment in which the current covariate value is under-represented. However, it also has an ethical basis, so it also skews the probability of assignment towards the treatment performing best so far. On the other hand, the DLC and RDBCD incorporate the covariates on a more ethical basis: these rules skew the probability in such a way that the best treatment is assigned to the patient in the most favourable condition.
In this section we aim to compare these two ways of incorporating covariates. The ORBD will be compared to the DLC and RDBCD. Recall that the DLC allows the probability of success to vary with the covariate level; that is, a patient assigned to treatment i has success probability a^{Uj} pi, where a ∈ (0, 1) is the so-called prognostic factor index and Uj ∈ {0, . . . , G} is the covariate level of the jth patient. We extended the ORBD and RDBCD to also have this probability of success. Although this change does not alter the inner workings of the rules, it allows us to compare them fairly in a more realistic setting. This is because, in practice, a treatment might have a different probability of success depending on the value of the covariate, and we always want to balance the treatments on a covariate that is likely to have an impact on the treatment outcome. Here, we investigated the value G = 1. We then generate n random numbers from the standard uniform distribution, denoted by zj, with each value corresponding to the jth patient. The zj values are then categorised into the G + 1 levels Uj. Since the zj values come from the uniform distribution, we expect on average that every level Uj will contain the same number of patients. Note that the lower the level Uj, the higher the probability of a success.
Recall that the ORBD and RDBCD are able to incorporate continuous covariates, but the DLC is not. We thus use the zj values rather than Uj for the two former rules. That is, for the ORBD the regression uses the actual covariate values, and Uj is only used for obtaining the response outcome. Similarly, the RDBCD uses zj with the transformation H(z) = 1 − z. This is because the g function has a higher probability of assignment when z is high, and so we need to introduce this transformation in order to correctly reflect that the probability of success is high for low Uj.
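A sketch of the covariate mechanism just described, for G = 1, is given below: the continuous covariate zj is drawn from the standard uniform distribution, categorised into a level Uj using equal-width bins (our assumption about the categorisation), the success probability becomes a^{Uj} pi, and the transformation H(z) = 1 − z is returned for use by the RDBCD. The function name is our own.

```python
import numpy as np

def covariate_patient(p, a=0.7, G=1, rng=None):
    """Generate one patient's covariate and the resulting success probabilities.

    Returns (z, U, success_probs, H_z), where success_probs[i] = a**U * p[i].
    """
    rng = rng or np.random.default_rng()
    z = rng.random()                              # continuous covariate z_j ~ U(0, 1)
    U = min(int(z * (G + 1)), G)                  # categorise into levels 0, ..., G
    success_probs = [a ** U * pi for pi in p]     # lower U -> higher success probability
    return z, U, success_probs, 1 - z             # H(z) = 1 - z, used by the RDBCD

print(covariate_patient(p=(0.8, 0.6)))
```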
A simulation was performed using the above adjustments to the rules and the results are given in Table 14. We chose a = 0.7, as suggested by Bandyopadhyay et al. (2009). We also chose the RDBCD to target the RSIHR allocation. This is because the DLC can only target the urn allocation, and so this will also give us a comparison between the two target allocations. In addition, we have seen that in general the DL is less variable than the DBCD when targeting the urn allocation with no covariate information taken into account.
                 DLC                        ORBD                       RDBCD
(p1, p2)    n    AP(SD)     CI    FP        AP(SD)     CI    FP        AP(SD)     CI    FP
(0.8,0.8)   100  0.50(0.05) 0.50  0.32      0.50(0.15) 0.50  0.32      0.50(0.09) 0.49  0.32
(0.8,0.6)   100  0.53(0.04) 0.50  0.40      0.64(0.14) 0.50  0.38      0.52(0.09) 0.49  0.40
(0.8,0.4)   100  0.56(0.04) 0.50  0.47      0.74(0.11) 0.50  0.40      0.55(0.09) 0.48  0.47
(0.8,0.2)   100  0.59(0.04) 0.50  0.53      0.81(0.06) 0.50  0.42      0.59(0.09) 0.48  0.52
(0.6,0.6)   100  0.50(0.04) 0.50  0.49      0.50(0.15) 0.50  0.49      0.50(0.09) 0.49  0.49
(0.6,0.4)   100  0.53(0.04) 0.50  0.57      0.63(0.14) 0.50  0.55      0.52(0.09) 0.48  0.57
(0.6,0.2)   100  0.56(0.04) 0.50  0.64      0.74(0.09) 0.50  0.58      0.57(0.09) 0.48  0.63
(0.4,0.4)   100  0.50(0.04) 0.50  0.66      0.50(0.14) 0.50  0.66      0.50(0.09) 0.49  0.66
(0.4,0.2)   100  0.53(0.04) 0.50  0.74      0.65(0.12) 0.50  0.72      0.54(0.09) 0.48  0.74
(0.2,0.2)   100  0.50(0.03) 0.50  0.83      0.50(0.13) 0.50  0.83      0.50(0.10) 0.49  0.83

Table 14: The allocation proportion (AP) and its standard deviation, the covariate information (CI) and the failure proportion (FP) for the DLC, ORBD and RDBCD with K = 2 treatments. We chose a = 0.7 and the RDBCD to target the RSIHR allocation. This simulation used 5,000 replications.
The table reports the allocation proportion to treatment i = 1 and its standard deviation, the failure proportion and the covariate information (CI). The latter represents the average value of zj among the patients assigned to treatment i = 1. We do not report the standard deviations of the failure proportion or the covariate information, as there was no significant difference between the rules or across the values of pi.
When p1 = p2, the allocation proportion is roughly equal for all choices of pi. The standard deviation of this allocation proportion is the highest for the ORBD rule, as was seen for the rules not incorporating covariates. The RDBCD is also highly variable, while the DLC exhibits the lowest variability of all the rules. We also notice that the failure proportion is similar for all the rules. The covariate information is the same for the DLC and ORBD, but the RDBCD seems to have slightly unbalanced treatments in terms of the covariates.
In the case p1 ≠ p2, we see that all the rules assign more patients to the better treatment. The ORBD has the most ethically desirable allocation. The DLC and RDBCD skew the allocation only slightly, even for a large difference in the treatment success probabilities. This has the usual translation into the failure proportion: when the difference between the pi is large, the failure proportion for the ORBD is much lower than for the other rules, while when this difference is small, e.g. p = (0.4, 0.2), the gap is also quite small. We notice that the covariate information for the DLC and ORBD is equal for all pi, whilst for the RDBCD it is slightly skewed towards the worse treatment.
We now investigate the power of these designs. We used the Wald test as before and the results are shown in Table 15, using the same values of n as before. We can see that the significance levels for all the rules are near the expected 0.05 level used to perform the test. We see that in general the ORBD is the least powerful, while the RDBCD and DLC maintain slightly higher power than the ORBD.
(p1, p2)    n    DLC   ORBD  RDBCD
(0.8,0.8)   100  0.05  0.06  0.07
(0.6,0.6)   100  0.05  0.06  0.05
(0.4,0.4)   100  0.04  0.06  0.05
(0.2,0.2)   100  0.04  0.04  0.03
(0.8,0.6)   206  0.70  0.65  0.73
(0.8,0.4)   62   0.77  0.68  0.78
(0.8,0.2)   27   0.76  0.68  0.76
(0.6,0.4)   256  0.79  0.75  0.79
(0.6,0.2)   57   0.77  0.72  0.78
(0.4,0.2)   217  0.82  0.77  0.82

Table 15: Simulated significance level and power of various RAR designs for a clinical trial with K = 2 treatments incorporating covariates. The results were obtained using a simulation with 5,000 replications.
3.5.2 K = 3 treatments

We now consider covariates for rules with K = 3 treatments, and thus we compare the DLC and ORBD. Recall that when we introduced the RDBCD rule in Section 2.4.3, we did not define the g function for K > 2 treatments, so we cannot use this rule here. We used a similar approach as before to obtain the allocation proportion and its standard deviation, as well as the failure proportion. The results can be seen in Table 16. We no longer include the column for covariate information, as this was the same for both rules. As before, the table only reports the allocation to the first two treatments, since the allocation proportion for the i = 3 treatment can be obtained by subtraction.
We start the analysis by considering the rules when the pi are equal. We notice that the allocation is the same for both rules, as expected. The standard deviation of this allocation is lower for the DLC on both treatments for all pi. The failure proportions are also very similar.
When the pi are unequal, the ORBD assigns more patients than the DLC to the best treatment. For the DLC, the amount by which the allocation is skewed is very small. We notice a similar pattern as in the K = 2 case; that is, the DLC maintains a comparable level of treatment failures to the ORBD despite a less ethical allocation. Once again, the only difference of note is when the difference between the pi is the largest, that is pi = (0.8, 0.8, 0.2). We also notice that the DLC is much less variable than the ORBD.
We now consider the power and significance level of the ORBD and DLC incorporating covariates and with K = 3 treatments. A simulation was run using all parameters as before, and the results are shown in Table 17. As before, for the power simulation we used the same values of n as for the equivalent rules that are not covariate-adaptive.
                      DLC                                ORBD
(p1, p2, p3)     n    AP(SD) i=1  AP(SD) i=2  FP         AP(SD) i=1  AP(SD) i=2  FP
(0.8,0.8,0.8)    100  0.33(0.04)  0.33(0.04)  0.32       0.33(0.13)  0.30(0.09)  0.32
(0.8,0.8,0.6)    100  0.35(0.04)  0.34(0.04)  0.37       0.37(0.14)  0.33(0.10)  0.37
(0.8,0.8,0.4)    100  0.36(0.04)  0.36(0.04)  0.42       0.41(0.14)  0.36(0.09)  0.40
(0.8,0.8,0.2)    100  0.37(0.04)  0.37(0.04)  0.45       0.42(0.14)  0.37(0.09)  0.42
(0.6,0.6,0.6)    100  0.34(0.04)  0.33(0.04)  0.49       0.33(0.14)  0.29(0.10)  0.49
(0.6,0.6,0.4)    100  0.35(0.04)  0.34(0.04)  0.54       0.38(0.14)  0.33(0.09)  0.54
(0.6,0.6,0.2)    100  0.35(0.04)  0.36(0.04)  0.59       0.40(0.14)  0.36(0.09)  0.57
(0.4,0.4,0.4)    100  0.33(0.04)  0.33(0.04)  0.66       0.33(0.13)  0.29(0.09)  0.66
(0.4,0.4,0.2)    100  0.34(0.03)  0.34(0.04)  0.71       0.38(0.13)  0.34(0.09)  0.71
(0.2,0.2,0.2)    100  0.33(0.03)  0.33(0.03)  0.83       0.33(0.10)  0.31(0.07)  0.83

Table 16: The allocation proportion (AP) and its standard deviation and the failure proportion (FP) for the DLC and ORBD with K = 3 treatments. We chose a = 0.7. This simulation used 5,000 replications.
(p1, p2, p3)     n    DLC   ORBD
(0.8,0.8,0.8)    100  0.05  0.06
(0.6,0.6,0.6)    100  0.03  0.06
(0.4,0.4,0.4)    100  0.04  0.05
(0.2,0.2,0.2)    100  0.04  0.03
(0.8,0.8,0.6)    206  0.66  0.60
(0.8,0.8,0.4)    62   0.70  0.59
(0.8,0.8,0.2)    27   0.65  0.61
(0.6,0.6,0.4)    256  0.75  0.67
(0.6,0.6,0.2)    57   0.74  0.63
(0.4,0.4,0.2)    217  0.81  0.71

Table 17: Simulated significance level and power of the DLC and ORBD for a clinical trial with K = 3 treatments incorporating covariates. The results were obtained using a simulation with 5,000 replications.
We start by noticing that there is no meaningful difference in the significance level between the designs. When comparing the power, we can say that the DLC maintains a higher power than the ORBD.
We conclude that, under covariate-adaptiveness, the ORBD has the most ethical allocation proportion, which results in the smallest failure proportion. However, the trade-off is that this rule is the most variable and has the lowest power. The DLC and RDBCD have less ethical allocations, but are less variable and have higher power. Of these two rules, the DLC is much less variable, although the two rules have similar power.

Conclusion

The initial intention of this dissertation has been to define some well studied RAR
designs, explore the extensions of these RAR design in a practical settings and then
to investigate the behaviour of these designs under those practical settings. We
have achieved this by considering the extensions of a number of rules to multiple
treatments, delayed responses and covariate-adaptiveness.
Throughout this investigation, it has been seen that the performance of a RAR
scheme depends significantly on a number of factors, such as (i) the target allocation,
(ii) the values of pi, (iii) the delay in response and (iv) the covariates. This means
that there is no universal design that is superior to the others in all respects. We
have seen that for the K = 2 treatments design with no delay and no covariates,
the rules that are able to target any given allocation proportion, i.e. the GDL, DBCD
and ERADE, perform the best in terms of maintaining high power and having a less
variable allocation proportion. This can be seen particularly when these rules target
RSIHR allocation. However, the ORBD performs the best in terms of ethical
allocation. We also found that rules targeting urn allocation are less variable when
the pi are low, whilst those targeting RSIHR allocation are less variable for high pi,
meaning that a suitable allocation proportion should also be chosen depending on the
approximate pi levels expected. Similar results were shown for the same designs
with K = 3 treatments.
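As a quick numerical illustration of how the two targets differ, the lines below evaluate the urn and RSIHR targets using compact versions of the allocation functions listed in Appendix A.6; the particular values of pi are chosen purely for illustration.

urn_allocation <- function(p) (1/(1-p))/sum(1/(1-p)) # urn target
rsihr_allocation <- function(p) sqrt(p)/sum(sqrt(p)) # RSIHR target
round(urn_allocation(c(0.8, 0.6)), 2)   # 0.67 0.33: heavily skewed for high pi
round(rsihr_allocation(c(0.8, 0.6)), 2) # 0.54 0.46
round(urn_allocation(c(0.2, 0.1)), 2)   # 0.53 0.47: mildly skewed for low pi
round(rsihr_allocation(c(0.2, 0.1)), 2) # 0.59 0.41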
We then investigated the behaviour of the rules under delayed responses. We
saw that in general the performance of the procedures is not significantly affected
by delayed responses, as long as the delay is moderate.
When RAR procedures incorporate covariates, the ORBD has the most favourable
allocation and failure proportion, but is highly variable. On the other hand, the DLC
is much less variable but skews the allocation only slightly. One limitation of the
covariate-adaptive designs studied here is that they were extended so that the success
probability varies across covariate levels. Although this is a more realistic setting in
practice, it meant that we were unable to compare the covariate-adaptive procedures
with the same procedures without covariates. Thus, if this investigation were to be
done again, such a comparison would be a suitable alternative.
The subject of randomisation in clinical trials is a rapidly growing field of research,
and therefore many extensions to the work presented here could have been considered.
One rule that could have been covered in more detail is the ERADE. We saw that
for K = 2 treatments it performed very well, so investigating the rule under all of
the criteria considered for the other rules (e.g. delayed responses,
covariate-adaptiveness) would be a suitable extension. It would also be interesting
to consider all of the rules incorporating both covariates and delayed responses, as
this could be a common scenario in practice. If more time were available, responses
that are not binary could also have been considered.
References
Atkinson, A. C. and A. Biswas (2013). Randomised Response-Adaptive Designs in Clinical Trials. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Boca Raton: Taylor & Francis Group.
Baldi Antognini, A. and M. Zagoraiou (2012). Multi-objective optimal designs in comparative clinical trials with covariates: The reinforced doubly adaptive biased coin design. Ann. Statist. 40 (3), 1315–1345.
Bandyopadhyay, U. and A. Biswas (1999). Allocation by randomized play-the-winner rule in the presence of prognostic factors. Sankhyā: The Indian Journal of Statistics, Series B (1960-2002) 61 (3), 397–412.
Bandyopadhyay, U., A. Biswas, and R. Bhattacharya (2009). Drop-the-loser design in the presence of covariates. Metrika 69 (1), 1–15.
Bartlett, R. H., D. W. Roloff, R. G. Cornell, A. F. Andrews, P. W. Dillon, and J. B. Zwischenberger (1985). Extracorporeal circulation in neonatal respiratory failure: A prospective randomized study. Pediatrics 76 (4), 479–487.
Biswas, A. (1999). Delayed response in randomized play-the-winner rule revisited. Communications in Statistics - Simulation and Computation 28 (3), 715–731.
Eisele, J. R. (1994). The doubly adaptive biased coin design for sequential clinical trials. Journal of Statistical Planning and Inference 38 (2), 249–261.
Eisele, J. R. and M. B. Woodroofe (1995). Central limit theorems for doubly adaptive biased coin designs. Ann. Statist. 23 (1), 234–254.
Hu, F. and W. F. Rosenberger (2003). Optimality, variability, power. Journal of the American Statistical Association 98, 671–678.
Hu, F., W. F. Rosenberger, and L.-X. Zhang (2006). Asymptotically best response-adaptive randomization procedures. Journal of Statistical Planning and Inference 136 (6), 1911–1922.
Hu, F. and L.-X. Zhang (2004). Asymptotic properties of doubly adaptive biased coin designs for multitreatment clinical trials. Ann. Statist. 32 (1), 268–301.
Hu, F., L.-X. Zhang, and X. He (2009). Efficient randomized-adaptive designs. Ann. Statist. 37 (5A), 2543–2560.
Ivanova, A. (2003). A play-the-winner-type urn design with reduced variability. Metrika 58 (1), 1–13.
Ivanova, A. and C. Flournoy (2001). A birth and death urn for ternary outcomes: Stochastic processes applied to urn models. In C. A. Charalambides, M. V. Koutras, and N. Balakrishnan (Eds.), Probability and Statistical Models with Applications, pp. 583–600. Boca Raton: Chapman and Hall/CRC Press.
Matthews, P. C. and W. F. Rosenberger (1997). Variance in randomized play-the-winner clinical trials. Statistics & Probability Letters 35 (3), 233–240.
Melfi, V. and C. Page (1998). Variability in Adaptive Designs for Estimation of Success Probabilities, Volume 34 of Lecture Notes–Monograph Series, pp. 106–114. Hayward, CA: Institute of Mathematical Statistics.
Melfi, V. F., C. Page, and M. Geraldes (2001). An adaptive randomized design with application to estimation. Canadian Journal of Statistics 29 (1), 107–116.
Rosenberger, W. F. (1999). Randomized play-the-winner clinical trials: review and recommendations. Controlled Clinical Trials 20 (4), 328–342.
Rosenberger, W. F., N. Stallard, A. Ivanova, C. N. Harper, and M. L. Ricks (2001). Optimal adaptive designs for binary response trials. Biometrics 57 (3), 909–913.
Rosenberger, W. F., A. N. Vidyashankar, and D. K. Agarwal (2001). Covariate-adjusted response-adaptive designs for binary response. Journal of Biopharmaceutical Statistics 11 (4), 227–236.
Smythe, R. T. and W. F. Rosenberger (1995). Play-the-winner designs, generalized Pólya urns, and Markov branching processes, Volume 25 of Lecture Notes–Monograph Series, pp. 13–22. Hayward, CA: Institute of Mathematical Statistics.
Sun, R., S. H. Cheung, and L.-X. Zhang (2007). A generalized drop-the-loser rule for multi-treatment clinical trials. Journal of Statistical Planning and Inference 137 (6), 2011–2023.
Tamura, R. N., D. E. Faries, J. S. Andersen, and J. H. Heiligenstein (1994). A case study of an adaptive clinical trial in the treatment of out-patients with depressive disorder. Journal of the American Statistical Association 89, 768–776.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 (3), 285–294.
Wei, L. J. and S. Durham (1978). The randomized play-the-winner rule in medical trials. Journal of the American Statistical Association 73, 840–843.
Zelen, M. (1969). Play the winner rule and the controlled clinical trial. Journal of the American Statistical Association 64, 131–146.
Zhang, L.-X., W. Chan, S. Cheung, and F. Hu (2007). A generalized drop-the-loser urn for clinical trials with delayed responses. Statist. Sinica 17 (1), 387–409.
Zhang, L.-X. and F. Hu (2009). A new family of covariate-adjusted response adaptive designs and their properties. Applied Mathematics-A Journal of Chinese Universities 24 (1), 1–13.
Zhang, L.-X., F. Hu, S. H. Cheung, and W. S. Chan (2014). Multiple-treatment efficient randomized adaptive design with minimum selection bias. Manuscript.

R code used in simulations

Note
The code given in the sections below performs a simple execution of all of the rules
considered throughout this dissertation. Each rule is defined in such a way that it
allows any number of treatments, apart from the RPW, RDBCD and ERADE, which
were not extended to K > 2 treatments. The designs given here also do not allow
delayed responses, but a description of how to extend the procedures is given in
Section 3.4. The ORBD given here allows covariate-adaptiveness but can be altered
to disregard this, as was outlined in Section 2.3.1.

A.1 RPW

rpw <- function(p,n,alpha,beta){
  #declarations
  balls<-rep(alpha,2) #initial urn composition
  allocated<-c() #treatment allocated to each patient
  s<-rep(0,2) #holds number of successes
  f<-rep(0,2) #holds number of failures
  for(i in 1:n){ #main loop
    decision<-runif(1) #random number for deciding treatment
    if (decision<(balls[1]/sum(balls))){ #if ball drawn corresponds to tmnt 1
      allocated[i]<-1 #assign tmnt 1
    }else{
      allocated[i]<-2 #assign tmnt 2
    }
    t_outcome<-runif(1) #decides outcome
    if(t_outcome<p[allocated[i]]){ #outcome is a success
      s[allocated[i]]<-s[allocated[i]]+1 #update number of successes
      balls[allocated[i]]<-balls[allocated[i]]+beta #add beta balls of corresponding tmnt to urn
    }else{ #outcome is a failure
      f[allocated[i]]<-f[allocated[i]]+1 #update number of failures
      if(allocated[i]==1){ #if tmnt 1 failed
        balls[2]<-balls[2]+beta #add beta balls of the opposite tmnt
      }else{
        balls[1]<-balls[1]+beta #add beta balls of the opposite tmnt
      }
    }
  }
  return(append(sum(allocated==1)/n,sum(f)/n)) #return AP and FP
}
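For example, a single trial and a small simulation can be run as follows; the parameter values are illustrative only, with alpha = beta = 1 corresponding to the RPW(1,1) rule.

set.seed(1)
rpw(p = c(0.8, 0.6), n = 100, alpha = 1, beta = 1) # one trial: c(AP to tmnt 1, FP)
rowMeans(replicate(1000, rpw(c(0.8, 0.6), 100, 1, 1))) # averages over 1,000 trials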

A.2 DL

dl <- function(p,n,z){ #this rule works for K treatments
  #declarations
  bounds<-c(0)
  allocated<-rep(1,n) #treatment allocated to each patient
  s<-rep(0,length(p)) #holds number of successes
  f<-rep(0,length(p)) #holds number of failures
  for (i in 1:n){ #main loop
    while (allocated[i]==1){ #while immigration ball is being drawn
      alloc_rnd<-runif(1) #random number to decide tmnt
      for (j in 1:length(z)){ #loop to obtain probabilities of each tmnt
        bounds[j+1]<-sum(z[1:j])/sum(z) #probability of each tmnt based on urn composition
        if (alloc_rnd > bounds[j] && alloc_rnd < bounds[j+1]){
          allocated[i]<-j #allocate based on rnd number generated
        }
      }
      if(allocated[i]==1){ #if immigration ball chosen
        for (k in 2:length(z)){ #add 1 ball for each tmnt
          z[k]<-z[k]+1
        }
      }else{ #if not immigration ball obtain response
        t_choice<-runif(1) #decides outcome
        if (t_choice<p[allocated[i]-1]){ #success
          s[allocated[i]-1]<-s[allocated[i]-1]+1 #update number of successes
        }else{
          z[allocated[i]]<-z[allocated[i]]-1 #update urn composition
          f[allocated[i]-1]<-f[allocated[i]-1]+1 #update number of failures
        }
      }
    }
  }
  return(append(sum(allocated==2)/n,sum(f)/n)) #return AP and FP
}
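For example (illustrative values only), the urn z is supplied with the immigration ball in its first entry and one entry per treatment thereafter.

set.seed(1)
dl(p = c(0.8, 0.6), n = 100, z = c(1, 3, 3)) # z[1] immigration ball, z[2:3] tmnt balls; returns c(AP, FP)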

A.3 GDL

rsihr_alloc <- function(p,tmnt){ #choice of a for rsihr allocation
  return( 2*(sqrt(p[tmnt])/(sum(sqrt(p)))) ) #return a
}
urn_alloc <- function(p,tmnt){ #choice of a for urn allocation e.g. DL rule
  return(1)
}
gdl <- function(p,n,z,D=0,a_function){ #this rule works for K treatments
  #declarations
  bounds<-c(0)
  allocated<-rep(1,n) #treatment allocated to each patient
  s<-rep(0,length(p)) #holds number of successes
  f<-rep(0,length(p)) #holds number of failures
  p_est<-c(rep(0,length(p))) #estimate of p based on current responses
  for (i in 1:n){ #main loop
    p_est<-(s+1)/(s+f+2) #update estimates of p
    while (allocated[i]==1){ #while immigration ball is being drawn
      alloc_rnd<-runif(1) #random number to allocate tmnt
      for(j in 1:length(z)){ #decision prob based on urn composition
        bounds[j+1]<-max(sum(z[1:j])/sum(z),0) #probability of each tmnt based on urn composition
        if ((alloc_rnd > bounds[j]) && (alloc_rnd < bounds[j+1])){
          allocated[i]<-j #allocate tmnt based on rnd number generated
        }
      }
      if(allocated[i]==1){ #if immigration ball is drawn
        for (k in 2:length(z)){ #update urn
          z[k]<-z[k]+a_function(p_est,k-1) #add a balls, depends on target allocation
        }
      }else{
        z[allocated[i]]<-z[allocated[i]]-1 #draw a ball
      }
    }
    #Response
    t_choice<-runif(1)
    if (t_choice < p[allocated[i]-1]){ #success
      s[allocated[i]-1]<-s[allocated[i]-1]+1 #update successes
      z[allocated[i]]<-z[allocated[i]]+D #update urn composition
      #if target allocation is rsihr D=0, if urn allocation D=1
    }else{ #failure
      f[allocated[i]-1]<-f[allocated[i]-1]+1 #update failures
    }
  }
  return(append(sum(allocated==2)/n,sum(f)/n)) #return AP and FP
}
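For example (illustrative values only), the a_function argument selects the target allocation and, as the comments above note, D = 0 is used when targeting RSIHR allocation and D = 1 when targeting urn allocation.

set.seed(1)
gdl(p = c(0.8, 0.6), n = 100, z = c(1, 3, 3), D = 0, a_function = rsihr_alloc) # GDL targeting RSIHR
gdl(p = c(0.8, 0.6), n = 100, z = c(1, 3, 3), D = 1, a_function = urn_alloc)   # equivalent to the DL rule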

A.4 DLC

dlc <- function(p,n,z,u,G,a){ #this rule works for K treatments
  bounds<-c(0)
  allocated<-rep(1,n) #tmnt allocation of each patient
  pi_est<-c() #estimates of p
  s<-rep(0,length(p)) #holds number of successes
  f<-rep(0,length(p)) #holds number of failures
  cov_bound<-c() #bounds to work out covariate levels
  u_lvl<-c() #covariate levels
  #bounds to split the uniformly distributed covariates into grades
  for(m in 1:(G+1)){
    cov_bound[m]<-abs(min(u))+(m/(G+1))
  }
  cov_bound<-append(0,cov_bound) #lower bound for covariates
  for (q in 0:G){
    pi_est[q+1]<-(q+1)/(G+1) #success probabilities for each covariate level
  }
  for (i in 1:n){ #main loop
    for(b in 1:(G+1)){ #place continuous covariate into right covariate level
      if(u[i]>=cov_bound[b] && u[i]<=cov_bound[b+1]){
        u_lvl[i]<-b
      }
    }
    while (allocated[i]==1){ #while immigration ball is drawn
      alloc_rnd<-runif(1) #random number to assign tmnt
      for (j in 1:length(z)){ #decision prob based on urn composition
        bounds[j+1]<-sum(z[1:j])/sum(z) #probability of each tmnt based on urn composition
        if (alloc_rnd > bounds[j] && alloc_rnd < bounds[j+1]){
          allocated[i]<-j #allocate tmnt based on rnd number generated
        }
      }
      if(allocated[i]==1){ #if immigration ball is drawn
        for (k in 2:length(z)){ #update urn
          z[k]<-z[k]+1 #add a ball for each tmnt
        }
      }else{
        t_choice<-runif(1) #decides response
        replace_dec<-runif(1) #decides whether to replace ball
        z[allocated[i]]<-z[allocated[i]]-1 #draw ball
        if (t_choice<((a^(u_lvl[i]-1))*(p[allocated[i]-1]))){ #success, using a parameter
          s[allocated[i]-1]<-s[allocated[i]-1]+1 #update success
          if (replace_dec < pi_est[u_lvl[i]]){ #replace ball with corresponding probability pi
            z[allocated[i]]<-z[allocated[i]]+1
          }
        }else{ #failure
          f[allocated[i]-1]<-f[allocated[i]-1]+1 #update failure
          if (replace_dec < (1-pi_est[(G+2)-u_lvl[i]])){ #replace ball with corresponding probability 1-pi(G-j)
            z[allocated[i]]<-z[allocated[i]]+1
          }
        }
      }
    }
  }
  return(append(sum(allocated==2)/n,sum(f)/n)) #return AP and FP
}
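For example, with one uniformly distributed covariate per patient split into G + 1 grades; a = 0.7 is the value used in Table 16 and the remaining values are illustrative only.

set.seed(1)
dlc(p = c(0.8, 0.6), n = 100, z = c(1, 3, 3), u = runif(100), G = 2, a = 0.7) # returns c(AP, FP)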

A.5 ORBD

orbd_cov <- function(p,n,u){ #this rule works for K treatments
  n_t<-length(p) #number of tmnts
  s<-rep(0,n_t) #holds number of successes for each tmnt
  f<-rep(0,n_t) #holds number of failures for each tmnt
  tmnt<-c() #holds tmnts for each consecutive patient
  response<-c() #holds response for each consecutive patient
  allocated<-c() #tmnt allocated
  cov_tbl<-c() #holds covariate information for each patient
  beta_est<-0 #beta estimates
  for (i in 1:n){ #main loop
    dec_prob<-c(0) #reset decision probability after each patient
    if (any(s==0) || any(f==0)){ #if there are any tmnts with no outcomes yet
      dec_prob<-append(dec_prob,rep(1/n_t,n_t-1)) #equal allocation
    }else{ #if all tmnts have at least one response
      model<-glm(response~tmnt+cov_tbl+cov_tbl*tmnt,
                 family=quasibinomial(link="logit")) #logistic model using glm
      beta_est<-model$coefficients #current model estimates
      beta_est[1]<-0 #since exp(0)=1,
      #allows generalisation since we assign using 1/(B+..), B[2]/(B+..) and so on
      if(any(is.na(beta_est)) || model$converged==0){ #if beta are missing or algorithm did not converge
        dec_prob<-append(dec_prob,rep(1/n_t,n_t-1)) #equal allocation
      }else{
        for (j in 2:(n_t)){ #probability of each treatment based on logistic model
          dec_prob[j]<-exp(beta_est[j-1])/(1 +
            sum(exp(beta_est[2:n_t]+u[i]*beta_est[j-1]))) #using function defined in orbd section
        }
      }
    }
    dec_prob<-append(dec_prob,1) #upper decision probability, used to generalise next loop
    fair_coin<-runif(1) #random number deciding tmnt
    for (k in 1:n_t){
      if(fair_coin > sum(dec_prob[1:k]) && fair_coin < sum(dec_prob[1:(k+1)])){
        allocated[i]<-k
      }
    }
    t_outcome<-runif(1) #random number to decide outcome
    if(t_outcome<p[allocated[i]]){ #success
      s[allocated[i]]<-s[allocated[i]]+1 #update successes
    }else{ #failure
      f[allocated[i]]<-f[allocated[i]]+1 #update failures
    }
    #update variables for the logistic model
    tmnt<-append(tmnt,toString(allocated[i]))
    response<-append(response,(s[allocated[i]]/(s[allocated[i]]+f[allocated[i]])))
    cov_tbl<-append(cov_tbl,u[i])
  }
  return(append(sum(allocated==1)/n,sum(f)/n)) #return AP and FP
}
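For example (illustrative values only), with a single continuous covariate per patient:

set.seed(1)
orbd_cov(p = c(0.8, 0.6, 0.4), n = 100, u = runif(100)) # returns c(AP, FP)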

A.6 DBCD

g_func <- function(x,y,alpha,L){ #g function. This is the multi-tmnt version
  vec<-rep(0,length(y))
  ans<-vec
  for(i in 1:length(vec)){
    vec[i]<-((y[i]*(y[i]/x[i])^alpha)^L)
  }
  for(j in 1:length(vec)){
    ans[j]<-vec[j]/sum(vec)
  }
  return(ans)
}
urn_allocation <- function(p){ #target urn allocation
  ans<-rep(0,length(p))
  for(i in 1:length(ans)){
    ans[i]<-(1/(1-p[i]))/(sum(1/(1-p)))
  }
  return(ans)
}
rsihr_allocation <- function(p){ #target rsihr allocation
  ans<-rep(0,length(p))
  for(i in 1:length(ans)){
    ans[i]<-(sqrt(p[i]))/sum(sqrt(p))
  }
  return(ans)
}
dbcd <- function(p,n,n0,alpha=0,L=1,alloc_prop){
  #declarations
  allocated<-c() #tmnt assigned to each patient
  t<-length(p) #number of tmnts
  s<-rep(0,t) #number of successes
  f<-rep(0,t) #number of failures
  bounds<-c(0) #bounds on each tmnt probability, used for tmnt assignment
  p_est<-c() #estimates of p
  for(i in 1:n){ #main loop
    if(any((s+f) < n0)){ #if less than n0 patients assigned
      bounds<-append(0,rep(1/t,t)) #equal allocation
    }else{
      for(j in 1:t){
        p_est[j]<-(s[j]+1)/(s[j]+f[j]+2) #p estimates
      }
      bounds[2:(t+1)]<-g_func(s+f,alloc_prop(p_est),alpha,L) #assignment probabilities for each tmnt
    }
    fair_coin<-runif(1) #random number to decide tmnt
    for(k in 1:t){
      if(fair_coin>sum(bounds[1:k]) && fair_coin<sum(bounds[1:(k+1)])){
        allocated[i]<-k #assign tmnt based on probabilities above
      }
    }
    response_dec<-runif(1) #random number to obtain response
    if(response_dec < p[allocated[i]]){ #success
      s[allocated[i]]<-s[allocated[i]]+1 #update number of successes
    }else{ #failure
      f[allocated[i]]<-f[allocated[i]]+1 #update number of failures
    }
  }
  return(append(sum(allocated==1)/n,sum(f)/n)) #return AP and FP
}
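For example (illustrative values only), n0 patients per treatment are allocated before the adaptive stage starts, and alpha and L tune how strongly the allocation is pushed towards the target.

set.seed(1)
dbcd(p = c(0.8, 0.6), n = 100, n0 = 5, alpha = 2, L = 1, alloc_prop = rsihr_allocation)      # targets RSIHR
dbcd(p = c(0.8, 0.6, 0.4), n = 150, n0 = 5, alpha = 2, L = 1, alloc_prop = urn_allocation)   # K = 3, targets urn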

A.7 RDBCD

g_func <- function(x,y,z){ #g function for the RDBCD, as defined previously
  ans<-0
  if(x==0 || x==1){
    ans<-1-x
  }else{
    ans<-(y*((y/x)^z))/((y*((y/x)^z))+(((1-y)*((1-y)/(1-x))^z)))
  }
  return(ans)
}
rsihr_alloc <- function(p_a,p_b){ #rsihr allocation proportion
  ans<-(sqrt(p_a)/(sqrt(p_a)+sqrt(p_b)))
  return(ans)
}
rdbcd <- function(p,n,n0,allocation,u){ #only allows K=2 tmnts
  allocated<-rep(0,n) #allocation for each consecutive patient
  s<-rep(0,2) #number of successes for each tmnt
  f<-rep(0,2) #number of failures for each tmnt
  cov_bound<-c()
  u_lvl<-c()
  for(i in 1:n){
    if(sum(allocated==1)<n0 || sum(allocated==2)<n0){ #if patients assigned is less than n0
      dec_prob<-0.5 #equal allocation
    }else{
      p_a<-(s[1]+1)/(s[1]+f[1]+2) #estimate p_1
      p_b<-(s[2]+1)/(s[2]+f[2]+2) #estimate p_2
      dec_prob<-g_func(sum(allocated==1)/i,allocation(p_a,p_b),u[i]) #probability using g function and current estimates of p
    }
    fair_coin<-runif(1) #random number to decide tmnt assignment
    if (fair_coin < dec_prob){
      allocated[i]<-1 #assign tmnt 1
    }else{
      allocated[i]<-2 #assign tmnt 2
    }
    response_dec<-runif(1) #random number to decide response
    if(response_dec<p[allocated[i]]){ #success
      s[allocated[i]]<-s[allocated[i]]+1 #update number of successes
    }else{ #failure
      f[allocated[i]]<-f[allocated[i]]+1 #update number of failures
    }
  }
  return(append(sum(allocated==1)/n,sum(f)/n)) #return AP and FP
}
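For example (illustrative values only), u supplies the per-patient tuning exponent passed to the g function defined in this subsection, and allocation is the two-argument rsihr_alloc defined above.

set.seed(1)
rdbcd(p = c(0.8, 0.6), n = 100, n0 = 5, allocation = rsihr_alloc, u = rep(2, 100)) # constant tuning parameter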

A.8 ERADE

#allocation proportions not defined here to save space
#a two-argument target function returning the proportion for tmnt 1 is used, as for the RDBCD in Section A.7
erade <- function(p,n,n0,alloc_prop,alpha){ #only allows K=2 tmnts
  allocated<-c() #tmnt allocated to each consecutive patient
  s<-rep(0,length(p)) #number of successes for each tmnt
  f<-rep(0,length(p)) #number of failures for each tmnt
  p_est<-rep(0,length(p)) #estimate of p
  for(i in 1:n){ #main loop
    if(any((s+f) < n0)){ #if fewer than n0 patients assigned on some tmnt
      sel_prob<-0.5 #equal allocation
    }else{
      rho_est<-alloc_prop(p_est[1],p_est[2]) #estimate of allocation proportion using current estimates of p
      #piecewise function that defines assignment probability, defined in ERADE section
      if(((s[1]+f[1])/i)>rho_est){ #current proportion on tmnt 1 above target
        sel_prob<-alpha*rho_est
      }else if(((s[1]+f[1])/i)==rho_est){ #current proportion equal to target
        sel_prob<-rho_est
      }else{ #current proportion below target
        sel_prob<-1-alpha*(1-rho_est)
      }
    }
    fair_coin<-runif(1) #random number to decide tmnt
    if(fair_coin<sel_prob){
      allocated[i]<-1 #allocate tmnt 1
    }else{
      allocated[i]<-2 #allocate tmnt 2
    }
    response_dec<-runif(1) #random number to decide response
    if(response_dec < p[allocated[i]]){ #success
      s[allocated[i]]<-s[allocated[i]]+1 #update number of successes
    }else{ #failure
      f[allocated[i]]<-f[allocated[i]]+1 #update number of failures
    }
    p_est<-c((s[1]+1)/(s[1]+f[1]+2),(s[2]+1)/(s[2]+f[2]+2)) #update current p estimates
  }
  return(append(sum(allocated==1)/n,sum(f)/n)) #return AP and FP
}
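For example (illustrative values only), with a two-argument target allocation function as in Section A.7 and alpha in [0, 1) controlling the degree of randomness:

set.seed(1)
rsihr_alloc <- function(p_a, p_b) sqrt(p_a)/(sqrt(p_a)+sqrt(p_b)) # target proportion for tmnt 1
erade(p = c(0.8, 0.6), n = 100, n0 = 5, alloc_prop = rsihr_alloc, alpha = 0.5) # returns c(AP, FP)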
