Hannelore Liero
Institute of Mathematics
University of Potsdam
Contents
1 Introduction
1.1 Explanatory Variables
1.2 Parametric and Semiparametric Models
1.3 Data Structure
Introduction
Survival analysis concerns the times of events. Such data are particularly
common in medicine, engineering and social sciences, but also arise in many
other domains. The responses may be incompletely observed owing to censoring
or truncation. Here we give an introduction to regression analysis of such data;
that is, we investigate the relationship between the survival time and the values of
explanatory variables.
We consider the situation with just one event per individual. Throughout we
use the terms death, failure or event to describe the event of interest, and
refer to the time to death as a lifetime or survival time.
Let us start with an example:
Data example 1.1 (Leukaemia and white blood count.) Table 1.1 contains data on survival of acute leukaemia victims considered by Feigl and Zelen.
The covariate is log10 white blood cell count at time of diagnosis, and the
patients are grouped according to the presence or not of a morphologic characteristic of their white blood cells (AG).
        AG positive                     AG negative
No.  log10(WBC)  Time          No.  log10(WBC)  Time
 1      3.36       65           1      3.64       56
 2      2.88      156           2      3.48       65
 3      3.63      100           3      3.60       17
 4      3.41      134           4      3.18        7
 5      3.78       16           5      3.95       16
 6      4.02      108           6      3.72       22
 7      4.00      121           7      4.00        3
 8      4.23        4           8      4.28        4
 9      3.73       39           9      4.43        2
10      3.85      143          10      4.45        3
11      3.97       56          11      4.49        8
12      4.51       26          12      4.41        4
13      4.54       22          13      4.32        3
14      5.00        1          14      4.90       30
15      5.00        1          15      5.00        4
16      4.72        5          16      5.00       43
17      5.00       65
Table 1.1: Survival time and white blood cells (Feigl and Zelen)
In Data example 1.1 the white blood cell count (WBC) gives additional
information about the survival time. In addition, we have to take into
account the morphologic characteristic. Thus we have two covariates: z =
(log10(WBC), group number).
1.1 Explanatory Variables
1 The data are taken from Klein and Moeschberger and are available in the Appendix:
R-Programs.
In this data set we see that there are three intermediate events that occur during the
transplant recovery process which may be related to the disease-free survival time.
These are the development of acute graft-versus-host disease, the development
of chronic graft-versus-host disease, and the return of the platelet count to a normal level.
Pat.Nr  group    t1    t2  d1  d2  d3    ta  da   tc  dc  tp  dp
     1      1  2081  2081   0   0   0    67   1  121   1  13   1
     2      1  1602  1602   0   0   0  1602   0  139   1  18   1
     3      1  1496  1496   0   0   0  1496   0  307   1  12   1
   ...
    49      2   860   860   0   0   0   860   0  860   0  15   1
    50      2  1258  1258   0   0   0  1258   0  120   1  66   1
    51      2  2246  2246   0   0   0    52   1  380   1  15   1
Pat.Nr  z1  z2  z3  z4  z5  z6    z7  z8  z9  z10
     1  26  33   1   0   1   1    98   0   1    0
     2  21  37   1   1   0   0  1720   0   1    0
     3  26  35   1   1   1   0   127   0   1    0
   ...
    49  25  31   0   1   0   1   180   0   1    0
    50  30  16   0   1   1   0   180   0   2    1
    51  45  39   0   0   0   0   105   0   4    0
In Section 2 and Section 3 we will consider only models with covariates which
do not vary in time.
As in the classical regression theory we can consider the covariate as a value
of a random variable or as a fixed quantity. Roughly speaking, if we know
the values of the covariate before the experiment is carried out or if we observe
2 Taken from Klein/Moeschberger.
1.2 Parametric and Semiparametric Models
Let $\psi$ denote the function describing the influence of the covariate z on the
lifetime X. Suppose that $\psi$ has a parametric form, that is $\psi(\cdot) = \psi(\cdot\,; \beta)$,
where $\psi(\cdot\,; \beta)$ is known up to the finite-dimensional parameter $\beta$.
If the type of the distribution of X is known, we have a parametric model.
One possibility to define such a model is to choose a typical lifetime distribution
where the parameters depend on the covariate. Consider the following examples:
Example 1.1 (Exponential model) Suppose that X given Z = z is exponentially
distributed with expectation $\exp(\beta^t z) = \exp\big(\sum_{j=1}^p z_j \beta_j\big)$. Then the
survival function is defined by
$S(x|z) = \exp(-\psi(z; \beta)\, x)$ with $\psi(z; \beta) = \exp(-\beta^t z)$.
ln x t z
The function $h_0$ is the so-called baseline hazard rate, which does not depend
on the covariate, and $\psi > 0$. The key feature of this type of model is that
the hazard rates of two individuals with distinct values of the covariate are
proportional: Let $z \neq z^*$; then for arbitrary x the ratio of the hazards is
$\frac{h(x|z)}{h(x|z^*)} = \frac{h_0(x)\,\psi(z)}{h_0(x)\,\psi(z^*)} = \frac{\psi(z)}{\psi(z^*)}$,
which is a constant. The baseline hazard $h_0$ can be considered as the hazard function
of an individual whose covariate vector z is such that $\psi(z) = 1$. In other words,
the baseline hazard is the same for all objects/individuals. The function $\psi$ does
not contain a constant.
Fully parametric PH models specify the baseline hazard $h_0(\cdot\,; \alpha)$ and $\psi(\cdot\,; \beta)$
parametrically.
Very often a parametric form is assumed only for the function $\psi$; the
baseline hazard $h_0$ is treated nonparametrically. Such models are called
semiparametric models.
Because h must be positive, a common parametric specification for $\psi$ is
$\psi(z; \beta) = \exp(\beta^t z)$, in which case $h_0$ is the hazard function for z = 0.
An estimation procedure for this class of semiparametric models is given in
Section 3.
Data example 1.4 (Male laryngeal cancer patients; continuation) Let
us fit a PH model to the data of Data example 1.2. Set
$\psi(z; \beta) = \exp(\beta_1 z_1 + \beta_2 z_2 + \beta_3 z_3 + \beta_4 z_4)$.
With the method described in the next section we will obtain the following
estimates
$\hat\beta_1 = 0.1386$, $\hat\beta_2 = 0.6383$, $\hat\beta_3 = 1.6931$, $\hat\beta_4 = 0.0189$.
Thus the relative risk for a 50-year-old patient compared to a 40-year-old
patient (both in Stage IV disease) is $\exp(10\,\hat\beta_4) = 1.2$.
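The arithmetic behind this relative risk can be checked directly. The following is a minimal R sketch; the variable names are illustrative and not part of the original notes.

```r
# Estimated age coefficient reported in Data example 1.4
beta4_hat <- 0.0189

# Relative risk of a 50-year-old versus a 40-year-old patient in the same
# stage: the stage terms cancel, only the age difference remains.
age_diff <- 50 - 40
rr <- exp(age_diff * beta4_hat)
print(round(rr, 2))
```

Because the stage indicators are identical for both patients, the same value would be obtained for any common stage.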
Exercise 1.1 Show that the survival function of a PH model has the form
$S(x|z) = S_0(x)^{\psi(z)}$,
where $S_0$ is the baseline survival function.
Exercise 1.2 Does the Weibull distribution with shape parameter $\alpha$ and scale
parameter $\psi(z)$, i.e.
$S(x|z) = \exp\big(-(x/\psi(z))^{\alpha}\big)$    (1.1)
belong to the PH family?
Another important class of survival models with covariates are the accelerated
life time models (ALT)$^3$. Here the covariates are assumed to act directly on
the lifetime, so as to speed it up or to retard its progress. In terms of the lifetime
X, the speeding up or slowing down is accomplished by the positive covariate
function $\psi$ and we may write for X given Z = z or $X = X_z$, respectively,
$X = \frac{X_0}{\psi(z)}$.    (1.2)
It follows from (1.2) that the survival function of X has the form
$S(x|z) = S_0(x\,\psi(z))$,
where $S_0$ is the survival function of the baseline lifetime $X_0$.
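The step from (1.2) to the survival function can be written out as follows. This is a short derivation that uses only the definitions above, writing $\psi$ for the positive covariate function:

```latex
S(x \mid z) = P(X > x \mid Z = z)
            = P\!\left(\frac{X_0}{\psi(z)} > x\right)
            = P\big(X_0 > x\,\psi(z)\big)
            = S_0\big(x\,\psi(z)\big).
```

The third equality uses $\psi(z) > 0$, so multiplying the inequality by $\psi(z)$ does not change its direction.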
Exercise 1.3 Show that the Weibull distribution (2.10) belongs to the ALT
family.
Exercise 1.4 Show that the survival functions of PH models and ALT models
are ordered in the sense that $S(x|z_1) \le S(x|z_2)$ for all x or $S(x|z_1) \ge S(x|z_2)$ for all x.
An interesting property of the ALT models is the following: Taking the
logarithms on each side of (1.2) gives
$Y = \ln X = -\ln \psi(z) + W$,
where $W = \ln X_0$ is a random error with a distribution independent of the covariates:
$P(W > w \mid z)$
$\int_0^t |\beta_j(u)|\, du < \infty$
for all t, j = 1, . . . , p.
Lin and Ying proposed an alternative additive hazards regression model. In their
model the possibly time-varying coefficients of the Aalen model are replaced by
constants:
$h(x|z) = \beta_0(x) + \beta^t z(x)$,
where $\beta_j$, j = 1, . . . , p, are unknown parameters and $\beta_0$ is an arbitrary baseline
function.
For the investigation of such models the reader is referred to the books
Klein/Moeschberger (Chapter 10) and Martinussen/Scheike (Chapter 5).
1.3 Data Structure
Data of the following structure are the basis of our investigations: An individual
may not be observed on the whole of its lifetime, so that, for example, we
may only know that it survived to the end of the study. In other words, we
have censored data. We will consider random right-censoring of type I: Each
individual is assumed to have a lifetime X and a censoring time C with survival
functions S and 1 − G, respectively. Our observations are realizations of the
independent and identically distributed triples $(T_i, \delta_i, Z_i)$ or of the independent
(not identically distributed) pairs $(T_i, \delta_i)$, where
$T_i = \min(X_i, C_i)$, $\delta_i = 1$ if $X_i \le C_i$ and $\delta_i = 0$ if $X_i > C_i$.    (1.3)
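The construction of the observed pairs in (1.3) can be illustrated with simulated data. The following R sketch is only an illustration: the exponential distributions, their rates and the sample size are assumptions chosen for the example, not part of the notes.

```r
set.seed(1)
n <- 8
x    <- rexp(n, rate = 1 / 10)   # latent lifetimes X_i (hypothetical)
cens <- rexp(n, rate = 1 / 15)   # censoring times C_i (hypothetical)

obs_time <- pmin(x, cens)        # T_i = min(X_i, C_i)
delta <- as.integer(x <= cens)   # delta_i = 1 if the lifetime is observed

print(data.frame(time = round(obs_time, 2), delta = delta))
```

Only `obs_time` and `delta` would be available to the analyst; the latent `x` and `cens` are kept here to show how the observations arise.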
1
0
tc
.
t<c
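$L(\theta) = \prod_{i=1}^n h(t_i|z_i; \theta)^{\delta_i}\, S(t_i|z_i; \theta)$, $\theta = (\alpha^t, \beta^t)^t$    (2.4)
$l(\theta) = \sum_{i=1}^n l_i(\theta; t_i, z_i, \delta_i) = \sum_{i=1}^n \big[\delta_i \ln h(t_i|z_i; \theta) - H(t_i|z_i; \theta)\big]$    (2.5)
$\frac{\partial l(\theta)}{\partial \theta_s} \overset{!}{=} 0$, s = 1, . . . , m + p = k.
$\sqrt{n}\,(\hat\theta_n - \theta) \xrightarrow{D} N_k(0, \Sigma(\theta)^{-1})$    (2.6)
for a positive definite matrix $\Sigma(\theta)$ defined later on. The proof of (2.6) is based
on the following steps: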
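Step 1: By Taylor expansion (assuming that the likelihood function is three
times differentiable) we obtain for the score vector$^4$
$U(\hat\theta_n) - U(\theta) = -U(\theta) \doteq -M(\theta)\,(\hat\theta_n - \theta)$,
where $M(\theta)$ is the $k \times k$ matrix with the elements
$M_{sr}(\theta) = -\frac{\partial^2 l(\theta)}{\partial\theta_s\, \partial\theta_r}$, s, r = 1, . . . , k.
Thus
$\sqrt{n}\,(\hat\theta_n - \theta) \doteq \sqrt{n}\, M(\theta)^{-1} U(\theta) = \Big(\frac{M(\theta)}{n}\Big)^{-1} \frac{U(\theta)}{\sqrt{n}}$.    (2.7)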
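Step 2: Applying the central limit theorem for independent but not necessarily
identically distributed random vectors to
$U(\theta) = \sum_{i=1}^n \frac{\partial l_i(\theta; T_i, z_i, \delta_i)}{\partial\theta}$
with
$\mathrm{Var}\Big(\frac{\partial l_i(\theta; T_i, z_i, \delta_i)}{\partial\theta} \,\Big|\, z_i\Big) = V_i(\theta, z_i)$ and $\frac{1}{n} \sum_{i=1}^n V_i(\theta, z_i) \to \Sigma(\theta)$,
we obtain
$\frac{U(\theta)}{\sqrt{n}} \xrightarrow{D} N_k(0, \Sigma(\theta))$.    (2.8)
4 The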
Step 3: By the law of large numbers for independent, but not necessarily
identically distributed random vectors we obtain
$\frac{M(\theta)}{n} - \frac{E\,M(\theta)}{n} \xrightarrow{P} 0$.
And since (under the usual regularity condition of exchange of integration and
differentiation)
$E\,M(\theta) = \sum_{i=1}^n V_i(\theta, z_i)$
we have
$\frac{M(\theta)}{n} \xrightarrow{P} \Sigma(\theta)$.    (2.9)
The statements (2.7), (2.8) and (2.9) together imply the desired asymptotic
normality (2.6).
$N_k(\theta, (n\,\Sigma(\theta))^{-1})$,
and the limiting covariance matrix can be consistently estimated by
$J(\hat\theta_n)^{-1} = \Big(\sum_{i=1}^n V_i(\hat\theta_n, z_i)\Big)^{-1}$
$l(\beta) = -\sum_{i=1}^n \beta^t z_i\, \delta_i - \sum_{i=1}^n t_i \exp(-\beta^t z_i)$.
$-\sum_{i=1}^n z_{ij}\, \delta_i + \sum_{i=1}^n t_i \exp(-\beta^t z_i)\, z_{ij} = 0$, j = 1, . . . , p.
The Fisher information matrix has (for fixed covariates) the elements
n
I()rs =
i=1
I()rs =
zir zis .
i=1
J()rs =
i=1
(0 + 1 zi1 + 2 zi2 )
i=1
where zi1 is the log10 (wbc) of the ith patient and zi2 = 1 if the ith patient is in
group AG positive and zi2 = 0 otherwise.
The maximum likelihood estimates and their standard errors are computed with
help of the R-procedure
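# negative log-likelihood of the exponential model (all lifetimes are uncensored)
ll <- function(beta) {
  eta <- beta[1] + beta[2] * leuk$wbc + beta[3] * leuk$group
  -sum(-eta - leuk$time * exp(-eta))
}
# nlm minimizes, so it is applied to the negative log-likelihood
outw <- nlm(ll, c(6, 0, 1), hessian = TRUE)
b <- outw$estimate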
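The fit can be reproduced in a self-contained way, including standard errors from the inverse of the numerically computed Hessian. The following R sketch assumes that the data frame `leuk` holds the 33 observations of Table 1.1 with `group = 1` for AG positive; the data frame layout and the names `negll`, `fit`, `est` and `se` are illustrative assumptions.

```r
# Leukaemia data of Table 1.1 (AG positive first, then AG negative)
wbc <- c(3.36, 2.88, 3.63, 3.41, 3.78, 4.02, 4.00, 4.23, 3.73, 3.85, 3.97,
         4.51, 4.54, 5.00, 5.00, 4.72, 5.00,
         3.64, 3.48, 3.60, 3.18, 3.95, 3.72, 4.00, 4.28, 4.43, 4.45, 4.49,
         4.41, 4.32, 4.90, 5.00, 5.00)
time <- c(65, 156, 100, 134, 16, 108, 121, 4, 39, 143, 56, 26, 22, 1, 1, 5, 65,
          56, 65, 17, 7, 16, 22, 3, 4, 2, 3, 8, 4, 3, 30, 4, 43)
group <- rep(c(1, 0), times = c(17, 16))   # 1 = AG positive (assumed coding)
leuk <- data.frame(wbc, time, group)

# Negative log-likelihood of the exponential model with mean exp(eta)
negll <- function(beta) {
  eta <- beta[1] + beta[2] * leuk$wbc + beta[3] * leuk$group
  -sum(-eta - leuk$time * exp(-eta))
}

fit <- nlm(negll, c(6, 0, 1), hessian = TRUE)
est <- fit$estimate
# The Hessian of the negative log-likelihood is the observed information;
# its inverse estimates the covariance matrix of the estimates.
se <- sqrt(diag(solve(fit$hessian)))
print(round(cbind(estimate = est, se = se), 2))
```

A useful sanity check is the likelihood equation for the intercept: at the maximum, the sum of `time * exp(-eta)` equals the number of observations.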
Parameter    Estimate    se
$\beta_0$      5.81     1.29
$\beta_1$     -0.70     0.30
$\beta_2$      1.02     0.35
The results are given in Table 2.1. The fitted means are shown in Figure 2.1.
With outw$hessian we obtain the estimated observed Fisher information,
[Figure 2.1: fitted means plotted against log(wbc) for the groups AG positive and AG negative]
Exercise 2.1 Fit a Weibull distribution with shape parameter $\alpha$ and scale
parameter $\psi(z; \beta)$, i.e.
$S(x|z) = \exp\big(-(x/\psi(z; \beta))^{\alpha}\big)$    (2.10)
to the data of Data example 1.1.
Voltage Level (kV)   ni   Breakdown Times
26                    3   5.79, 1579.52, 2323.7
28                    5   68.85, 426.07, 110.29, 108.29, 1067.6
30                   11   17.05, 22.66, 21.02, 175.88, 139.07, 144.12,
                          20.46, 43.40, 194.90, 47.30, 7.74
32                   15   0.40, 82.85, 9.88, 89.29, 215.10,
                          2.75, 0.79, 15.93, 3.91, 0.27,
                          0.69, 100.58, 27.80, 13.95, 53.24
34                   19   0.96, 4.15, 0.19, 0.78, 8.01,
                          31.75, 7.35, 6.50, 8.27, 33.91,
                          32.52, 3.16, 4.85, 2.78, 4.67,
                          1.31, 12.06, 36.71, 72.89
36                   15   1.97, 0.59, 2.58, 1.69, 2.71,
                          25.50, 0.35, 0.99, 3.99, 3.67,
                          2.07, 0.96, 5.35, 2.90, 13.77
38                    8   0.47, 0.73, 1.40, 0.74, 0.39,
                          1.13, 0.09, 2.38
Table 2.2: Times to breakdown (in minutes) at each of seven voltage levels
x
(z)
16
Parameter
0
1
Estimate
64.85
-17.73
1.288
se
5.62
1.61
0.113
Exercise 2.2 (Steel specimens) In Table$^5$ 2.4 you see data for four independent
rolling contact fatigue tests on hardened steel specimens at four different
stress levels (in psi$^2$/10$^6$).
Failure at stress level
S1 = 0.87   S2 = 0.99   S3 = 1.09   S4 = 1.18
   1.67        0.80       0.012       0.073
   2.20        1.00       0.180       0.098
   2.51        1.37       0.200       0.117
   3.00        2.25       0.240       0.135
   2.90        2.95       0.260       0.175
   4.70        3.70       0.320       0.262
   7.53        6.07       0.320       0.270
  14.70        6.65       0.420       0.350
  27.80        7.05       0.440       0.386
  37.40        7.37       0.880       0.456
Table 2.4: Failure times of certain steel specimens at four stress levels
5 The
17
3.1
(3.11)
(3.12)
h(x|z)
= t (z z ).
h(x|z )
A slight extension of (3.11) is a model of the form $h(x|z) = h_0(x)\, c(\beta^t z)$, where
c is a parametric function taking values in $\mathbb{R}^+$. Model (3.11) can be extended
to a completely nonparametric model
p
j (zj )),
j=1
18
3.2
3.2.1
for the PH
L () =
zi )
(3.13)
i=1
In this case $d = \sum_{i=1}^n \delta_i$. Tied lifetimes have probability zero for continuous
distributions, but nevertheless they arise in data due to rounding. We will
consider tied observations later. Let R(t) denote the set of individuals who are
alive and uncensored just prior to time t; this is referred to as the risk set at
t since it consists of those individuals who could be observed to die at t, given
what has occurred up to that time. The covariate associated with the individual
observed to die at $t_{(i)}$ is denoted by $z_{(i)}$.
We start with a heuristic consideration. The idea behind the partial likelihood
approach is:
By considering $t_{(i)}$ along with its associated risk set $R_i = R(t_{(i)})$ in terms of
the probability of imminent death at $t_{(i)}$ conditional on $R_i$, we may write the
following approximation: Define the events
$A_i$ = {The individual from $R_i$ with $z_{(i)}$ dies at $t_{(i)}$},
$B_i$ = {A member of $R_i$ dies at $t_{(i)}$}.
Since
$P(t \le X < t + dt \mid X \ge t, z) \approx h(t|z)\, dt$
we have
$P(A_i \mid B_i) = \frac{P(A_i)}{P(B_i)} \approx \frac{h(t_{(i)}|z_{(i)})}{\sum_{j \in R_i} h(t_{(i)}|z_j)} = \frac{\exp(\beta^t z_{(i)})}{\sum_{j \in R_i} \exp(\beta^t z_j)}$.    (3.14)
6 Note that for the case of random covariates (3.13) is the likelihood function based on the
conditional distribution.
This leads to the partial likelihood function
$L(\beta) = \prod_{i=1}^d \frac{\exp(\beta^t z_{(i)})}{\sum_{j \in R_i} \exp(\beta^t z_j)}$.    (3.15)
However, the function defined in (3.15) is not a likelihood in the usual sense,
since it does not arise from the probability of some observable outcome. The
mathematical justification of this approach will be given in Section 3.2.3.
Note that the numerator of the partial likelihood depends only on information
from the individual who experiences the event, whereas the denominator utilizes
information about all individuals who have not yet experienced the event
including the individuals who will be censored later.
Here is an example for the construction of the risk sets:
No.   Observation t   Censor
1            7           1
2          124           1
3           88           0
4           13           1
5            2           0
6           79           1
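The risk sets for this small example can be computed mechanically. The following R sketch uses the six observations listed above and takes the risk set at an event time t to be all individuals whose observed time is at least t; the object names are illustrative.

```r
time   <- c(7, 124, 88, 13, 2, 79)
status <- c(1, 1, 0, 1, 0, 1)   # 1 = event observed, 0 = censored

# Ordered event times t_(1) < t_(2) < ...
event_times <- sort(time[status == 1])

# R(t): indices of all individuals still under observation just before t
risk_sets <- lapply(event_times, function(t) which(time >= t))
names(risk_sets) <- event_times
print(risk_sets)
```

Observation No. 5 is censored at time 2, before the first event, and therefore appears in no risk set, while the censored observation No. 3 stays in the risk sets up to time 88.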
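The log-likelihood is
$l(\beta) = \sum_{i=1}^d \sum_{k=1}^p \beta_k z_{(i)k} - \sum_{i=1}^d \ln \sum_{j \in R_i} \exp\Big(\sum_{k=1}^p \beta_k z_{jk}\Big)$.    (3.16)
The likelihood equations are
$\frac{\partial l(\beta)}{\partial \beta_s} = U_s(\beta) \overset{!}{=} 0$,    (3.17)
that is,
$U_s(\beta) = \sum_{i=1}^d z_{(i)s} - \sum_{i=1}^d \frac{\sum_{j \in R_i} z_{js} \exp\big(\sum_{k=1}^p \beta_k z_{jk}\big)}{\sum_{j \in R_i} \exp\big(\sum_{k=1}^p \beta_k z_{jk}\big)} = 0$ for s = 1, . . . , p.    (3.18)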
Modifications when ties are present. Until now we have supposed that all
event times are different. Now, consider the case that ties are present. There
are modifications of the partial likelihood to adjust for simultaneous failure of
elements of the risk set $R_i$: Let $t_{(1)} < t_{(2)} < \dots < t_{(s)}$ denote the s distinct, ordered
death times. Further, $d_i$ is the number of events at $t_{(i)}$ and $D_i$ is the set of all
individuals who die at time $t_{(i)}$. Let $v_i$ be the sum of the vectors $z_j$ over all
individuals who die at $t_{(i)}$. That is
$v_i = \sum_{j \in D_i} z_j$.
The first proposal is due to Breslow (1974): The partial likelihood is expressed
as
$L(\beta) = \prod_{i=1}^s \frac{\exp(\beta^t v_i)}{\big[\sum_{j \in R_i} \exp(\beta^t z_j)\big]^{d_i}}$.    (3.19)
Lifetime t   Censor   Covariate z
    51          1         50
    51          1         47
   322          1         48
   828          0         42
   339          0         54
   551          1         50

Lifetime t(i)   Censor   Ties d   Covariate z(i)
     51          1,1       2        50,47
    322           1        1         48
    339           0        1         54
    551           1        1         50
    838           0        1         42
21
e54
e48
,
+ e50 + e42 )
$L(\beta) = \prod_{i=1}^s \frac{\exp(\beta^t v_i)}{\prod_{j=1}^{d_i} \Big[\sum_{k \in R_i} \exp(\beta^t z_k) - \frac{j-1}{d_i} \sum_{k \in D_i} \exp(\beta^t z_k)\Big]}$    (3.20)
For further proposals the reader is referred to the book of Klein and
Moeschberger.
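The two tie corrections can be compared numerically. The following R sketch evaluates the logarithm of the Breslow form (3.19) and of the second form (3.20) for the six observations of the small tie example with one covariate (using the lifetime 828 from the first table); the function name `log_pl` and its argument `ties` are illustrative, not part of any package.

```r
time   <- c(51, 51, 322, 828, 339, 551)
status <- c(1, 1, 1, 0, 0, 1)
z      <- c(50, 47, 48, 42, 54, 50)

# Log partial likelihood for a scalar beta under two tie corrections
log_pl <- function(beta, ties = c("breslow", "second")) {
  ties <- match.arg(ties)
  total <- 0
  for (t in unique(time[status == 1])) {
    dead <- time == t & status == 1        # D_i: individuals dying at t
    risk <- time >= t                      # R_i: individuals at risk at t
    d <- sum(dead)                         # d_i: number of tied events
    s_risk <- sum(exp(beta * z[risk]))
    s_dead <- sum(exp(beta * z[dead]))
    total <- total + beta * sum(z[dead])   # beta * v_i
    for (j in seq_len(d)) {
      # (3.19) uses the full risk-set sum d_i times; (3.20) removes a
      # fraction (j - 1) / d_i of the contribution of the tied deaths
      correction <- if (ties == "second") (j - 1) / d * s_dead else 0
      total <- total - log(s_risk - correction)
    }
  }
  total
}

print(c(breslow = log_pl(0, "breslow"), second = log_pl(0, "second")))
```

At $\beta = 0$ the Breslow form gives $-\ln(6 \cdot 6 \cdot 4 \cdot 2)$ and the second form gives $-\ln(6 \cdot 5 \cdot 4 \cdot 2)$, so the two differ only through the tied pair at time 51.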
3.2.2
To compute the estimates for $\beta$ we have to solve (3.18) or the corresponding
equations following from a modified partial likelihood function. This can be done
numerically via a Newton-Raphson procedure. The R command
coxph( Surv(time, status) ~ x , data, method=... )
from the survival package can be used to find the maximum of the likelihood
function.
Here x is (as in the regression case) the vector of covariates; time is either the
event or the censoring time and status is a dummy variable coded 1 if the event
is observed or 0 if the observation is censored.
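To make the numerical step concrete, the Newton-Raphson iteration for a single covariate can be sketched in base R. This is not the algorithm used inside `coxph`; it is a minimal illustration that treats ties in the Breslow manner (every tied event uses the full risk set). The data are the two treatment groups listed in Section 3.4, and the coding z = 1 for treatment A is an assumption made for this example.

```r
# Two-sample data (censored observations have status 0)
time_a <- c(1, 3, 3, 6, 7, 7, 10, 12, 14, 15, 18, 19, 22, 26, 28, 29, 34, 40, 48, 49)
stat_a <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0)
time_b <- c(1, 1, 2, 2, 3, 4, 5, 8, 8, 9, 11, 12, 14, 16, 18, 21, 27, 31, 38, 44)
stat_b <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1)

time   <- c(time_a, time_b)
status <- c(stat_a, stat_b)
z      <- rep(c(1, 0), each = 20)   # assumed coding: 1 = treatment A

# Score U(beta) and observed information J(beta) of the partial likelihood
score_info <- function(beta) {
  U <- 0
  J <- 0
  for (i in which(status == 1)) {
    risk <- time >= time[i]                 # risk set at the event time
    w <- exp(beta * z[risk])
    zbar <- sum(w * z[risk]) / sum(w)       # weighted mean of z in the risk set
    U <- U + (z[i] - zbar)
    J <- J + (sum(w * z[risk]^2) / sum(w) - zbar^2)
  }
  c(U = U, J = J)
}

beta <- 0
for (iter in 1:25) {
  si <- score_info(beta)
  step <- si[["U"]] / si[["J"]]             # Newton-Raphson update
  beta <- beta + step
  if (abs(step) < 1e-10) break
}
final <- score_info(beta)
se <- 1 / sqrt(final[["J"]])
print(round(c(coef = beta, se = se), 3))
```

Starting from $\beta = 0$ is the usual choice; the first step then equals the score divided by the information at zero, which is the quantity underlying the score test.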
Let us consider some examples:
Data example 3.1 (Breast-Cancer Trial) A study was designed to determine
whether female breast-cancer patients, originally classified as lymph node negative
by standard light microscopy (SLM), could be more accurately classified
by immunohistochemical (IH) examination of their lymph nodes with an
anticytokeratin, monoclonal antibody cocktail. The data for 45 female breast-cancer
patients with negative axillary lymph nodes and a minimum 10-year
follow-up were selected from The Ohio State University Hospitals Cancer
Registry. Of the 45 patients, 9 were immunoperoxidase positive and the
remaining 36 still negative.
Here are the data (+ denotes a censored observation):
Immunoperoxidase Negative:
19 25 30 34 37 46 47 51 56
57 61 66 67 74 78 86 122+ 123+
130+ 130+ 133+ 134+ 136+ 141+ 143+ 148+ 151+
152+ 153+ 154+ 156+ 162+ 164+ 165+ 182+ 189+
Immunoperoxidase Positive:
22 23 38 42 73 77 89 115 144+
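$l(\beta) = d_1 \beta - \sum_{i=1}^d \ln\big(y_{0i} + y_{1i} \exp(\beta)\big)$
$U(\beta) = d_1 - \sum_{i=1}^d \frac{y_{1i} \exp(\beta)}{y_{0i} + y_{1i} \exp(\beta)} \overset{!}{=} 0$.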
      coef   exp(coef)   se(coef)    z       p
im    0.98     2.66       0.435     2.25   0.024

        exp(coef)   se(coef)     z       p
wbc       2.326      0.313      2.70   0.007
group     0.343      0.429     -2.49   0.013

        exp(coef)   se(coef)     z       p
wbc       2.287      0.312      2.65   0.008
group     0.361      0.423     -2.40   0.016

         coef    exp(coef)   se(coef)     z       p
wbc      0.898     2.454      0.335      2.68   0.0073
group   -1.085     0.338      0.446     -2.43   0.0150
Let us interpret these results: The relative risk of dying for an AG-positive patient
compared to an AG-negative patient with the same wbc is $\exp(\hat\beta_2) = 0.343$.
The relative risk for a patient with log10(wbc) = $z$ compared to a patient with
log10(wbc) = $z^*$ is $\exp(\hat\beta_1 (z - z^*))$. Note that this is the same for both groups.
To include interactions let us fit the following model
$h(x|z) = h_0(x) \exp\big(\beta_1 z_1 + \beta_2 z_2 + \beta_3 z_1 (z_2 - \mathrm{mean}(group))\big)$.
Here are the results
coxph(Surv(time) ~ z_1 * z_2, data = leuk, method = "breslow")
             coef    exp(coef)   se(coef)     z       p
wbc          0.922     2.51       0.320      2.88   0.0040
group       -1.139     0.32       0.428     -2.66   0.0078
wbc:group    1.137     3.12       0.636      1.79   0.0740
Let us discuss the relative risks based on the model with interactions. Here the
relative risk between both groups depends on the white blood counts: Note that
mean(group) = 0.5151.
$\frac{h(x|z_1, 1, z_1(1 - 0.5151))}{h(x|z_1, 0, z_1(0 - 0.5151))} = \exp(\beta_2 + \beta_3 z_1)$.
Consider now the hazard ratios in the same group. In the model without
interaction term we have
$\frac{h(x|z_1, 0)}{h(x|z_1^*, 0)} = \exp(\beta_1 (z_1 - z_1^*)) = \frac{h(x|z_1, 1)}{h(x|z_1^*, 1)}$.
However, with interaction term the relative risks differ:
$\frac{h(x|z_1, 1, z_1(1 - 0.5151))}{h(x|z_1^*, 1, z_1^*(1 - 0.5151))} = \exp\big(\beta_1 (z_1 - z_1^*) + \beta_3 (1 - 0.5151)(z_1 - z_1^*)\big)$
and
$\frac{h(x|z_1, 0, z_1(0 - 0.5151))}{h(x|z_1^*, 0, z_1^*(0 - 0.5151))} = \exp\big(\beta_1 (z_1 - z_1^*) - \beta_3\, 0.5151\, (z_1 - z_1^*)\big)$.
The plot in Figure 3.1 supports this fact.
[Figure 3.1: predictor plotted against log(wbc) for the groups AG positive and AG negative]
3.2.3
The partial likelihood, proposed by Cox (1972), is a useful device for handling
complicated data structures where the full likelihood function is hard to obtain.
It is particularly useful for eliminating nuisance parameters, leading to a simplified
likelihood equation for the parameters of interest.
The basic idea behind the partial likelihood concept in general is as follows: Suppose
that the data vector y is a realization of a random vector Y with the density
$f(y; \beta, \eta)$, where $\beta$ is the parameter of interest and $\eta$ is a nuisance parameter.
Suppose that Y can be transformed into parts $V_1, W_1, V_2, W_2, \dots, V_m, W_m$.
The joint density of $V_1, W_1, V_2, W_2, \dots, V_m, W_m$ can be written as
$\prod_{i=1}^m f_{V_i|S_i}(v_i|S_i; \beta, \eta)\ \prod_{i=1}^m f_{W_i|Q_i}(w_i|Q_i; \beta, \eta)$,    (3.21)
where $S_i = (V_1, W_1, \dots, V_{i-1}, W_{i-1})$, $Q_i = (S_i, V_i)$.
If the second term depends just on $\beta$, i.e.
m
In other words, $V_j$ consists of all events occurring between the time right after
$T_{(j-1)}$ and just before $T_{(j)}$, and the random variable $T_{(j)}$ itself. Then $Q_j$ is
the history from the time 0 to $T_{(j)}^-$ and $T_{(j)}$, where $T_{(j)}^-$ denotes the time
instantaneously before $T_{(j)}$. The following figure illustrates the notation:
[Figure 3.2: observations on a time axis with the event times T(1), T(2), T(3), the blocks V1, V2, V3 and the histories Q1, Q2, Q3]
In Figure 3.2 we have 7 observations, 3 are events. The labels of the events
are: w1 = 4, w2 = 1 and w3 = 6. The risk sets are R1 = {1, 3, 4, 5, 6, 7},
R2 = {1, 3, 6} and R3 = {6}
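Now, let us derive the second term of (3.21) for the proportional hazard model:
$P(W_j = l \mid Q_j) = \lim_{dt \to 0} \frac{P(T_l \in [t_{(j)}, t_{(j)} + dt) \mid Q_j) \prod_{i \in R_j \setminus \{l\}} P(T_i \notin [t_{(j)}, t_{(j)} + dt) \mid Q_j)}{\sum_{k \in R_j} P(T_k \in [t_{(j)}, t_{(j)} + dt) \mid Q_j) \prod_{i \in R_j \setminus \{k\}} P(T_i \notin [t_{(j)}, t_{(j)} + dt) \mid Q_j)}$
Now, as $dt \to 0$
$P(T_k \in [t_{(j)}, t_{(j)} + dt) \mid Q_j) = P(T_k \in [t_{(j)}, t_{(j)} + dt) \mid T_k \ge t_{(j)}, z_k) \approx h(t_{(j)}|z_k)\, dt$.
Similarly
$\lim_{dt \to 0} P(T_i \notin [t_{(j)}, t_{(j)} + dt) \mid Q_j) = 1$.
It follows
$P(W_j = l \mid Q_j) = \frac{h(t_{(j)}|z_l)}{\sum_{k \in R_j} h(t_{(j)}|z_k)} = \frac{\exp(\beta^t z_l)}{\sum_{k \in R_j} \exp(\beta^t z_k)}$
and
$\prod_{j=1}^d P(W_j = w_j \mid Q_j) = \prod_{j=1}^d \frac{\exp(\beta^t z_{(j)})}{\sum_{k \in R_j} \exp(\beta^t z_k)}$.    (3.22)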
=
0
x(1)
(3.23)
x(n1) i=1
$P(V = v) = \prod_{i=1}^n \frac{\exp(\beta^t z_{(i)})}{\sum_{j \in R_i} \exp(\beta^t z_j)}$    (3.24)
which is the likelihood function (3.15). In obtaining this result, we have used
the fact that the risk set is Ri = R(x(i) ) = {vi , vi+1 , . . . , vn }, since there is no
censoring.
In the simple noncensored case, therefore, (3.15) is a legitimate likelihood
function arising from the probability function of the rank statistic. Under
suitable assumptions concerning the $z_i$'s, $L(\beta)$ behaves in the usual way, with
$\hat\beta$ being asymptotically normally distributed with mean $\beta$ and covariance matrix
$I^{-1}$, where I has the elements $I_{lm} = E(-\partial^2 \log L / \partial\beta_l\, \partial\beta_m)$. If the data are
subject to Type II censoring, an extension of the preceding arguments yields the
same result. For more general types of censoring the argument breaks down,
however. In general the rank statistic is in fact unknown, because censoring
makes it impossible to know the exact ordering of the actual lifetimes.
Finally, consider the relation to the profile likelihood: If the hazard rate h0
is entirely arbitrary, then inference could only be based on events where failures
actually occurred, because the hazard might in principle be zero at every other
time. Thus it suffices to estimate the baseline cumulative hazard function by a
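step function, say $H_0(t) = \sum_{i: t_{(i)} \le t} h_i$, where $h_i = h_0(t_{(i)}) > 0$ only at observed
failure times. Suppose there are no ties. We can write the likelihood function
(3.13) in the form
$L^*(\beta, h_1, \dots, h_d) = \prod_{i=1}^d h_i \exp(\beta^t z_{(i)}) \prod_{i=1}^n \exp\big(-H_0(t_i) \exp(\beta^t z_i)\big)$,
so that the log-likelihood is
$l^*(\beta, h_1, \dots, h_d) = \sum_{i=1}^d [\ln h_i + \beta^t z_{(i)}] - \sum_{i=1}^n H_0(t_i) \exp(\beta^t z_i)$
$= \sum_{i=1}^d [\ln h_i + \beta^t z_{(i)}] - \sum_{i=1}^n \exp(\beta^t z_i) \sum_{j: t_{(j)} \le t_i} h_j$
$= \sum_{i=1}^d [\ln h_i + \beta^t z_{(i)}] - \sum_{i=1}^d h_i \sum_{j \in R_i} \exp(\beta^t z_j)$.
For fixed $\beta$ this is maximized by
$\hat h_i = \frac{1}{\sum_{j \in R_i} \exp(\beta^t z_j)}$.    (3.25)
Inserting these values gives the profile log-likelihood
$l_p(\beta) = \max_{h_1, \dots, h_d} l^*(\beta, h_1, \dots, h_d) = \sum_{i=1}^d \ln \frac{\exp(\beta^t z_{(i)})}{\sum_{j \in R_i} \exp(\beta^t z_j)} - d$,    (3.26)
which differs from the logarithm of the partial likelihood
$\prod_{i=1}^d \frac{\exp(\beta^t z_{(i)})}{\sum_{j \in R_i} \exp(\beta^t z_j)}$
only by the constant $-d$.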
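3.3 Asymptotic Normality of $\hat\beta$
Let $\hat\beta$ be the solution of the likelihood equations (3.18). For the investigation
of the properties of $\hat\beta$ it is convenient to write L in the following form
$L(\beta) = \prod_{i=1}^n \Big[\frac{\exp(\beta^t z_i)}{\sum_{l=1}^n Y_l(t_i) \exp(\beta^t z_l)}\Big]^{\delta_i}$
with
$Y_i(t) = 1(t_i \ge t)$, that is, $Y_i(t) = 1 \iff i \in R(t)$.    (3.27)
The log-likelihood is
$l(\beta) = \sum_{i=1}^n \delta_i \Big[\beta^t z_i - \ln \sum_{l=1}^n Y_l(t_i) \exp(\beta^t z_l)\Big]$.    (3.28)
The score vector $U(\beta)$ and the information matrix take simple forms. Define
for any t > 0 the p vector
$\bar z(t; \beta) = \frac{\sum_{l=1}^n Y_l(t)\, z_l \exp(\beta^t z_l)}{\sum_{l=1}^n Y_l(t) \exp(\beta^t z_l)}$.
Then
$U(\beta) = \sum_{i=1}^n \delta_i\, [z_i - \bar z(t_i; \beta)]$.    (3.29)
For the matrix of the negative second derivatives $-\partial^2 l(\beta) / \partial\beta\, \partial\beta^t$
we obtain
$J(\beta) = \sum_{i=1}^n \delta_i \Big[\frac{\sum_{l=1}^n Y_l(t_i)\, z_l z_l^t \exp(\beta^t z_l)}{\sum_{l=1}^n Y_l(t_i) \exp(\beta^t z_l)} - \bar z(t_i; \beta)\, \bar z(t_i; \beta)^t\Big]$.    (3.30)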
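Under regularity conditions one can show that the score function (properly
standardized) converges in distribution to a normal distribution. Roughly
speaking, these conditions ensure that $n^{-1} J(\beta)$ converges in probability to a
positive definite matrix $\Sigma(\beta)$. For a detailed formulation of these conditions see
Martinussen and Scheike (2006).
Theorem 3.1 Suppose that (given the covariate z)
$n^{-1} J(\beta) \to \Sigma(\beta)$
in probability, where $\Sigma$ is a positive definite $p \times p$ matrix. Under regularity
conditions we have
$n^{-1/2}\, U(\beta) \xrightarrow{D} N_p(0, \Sigma(\beta))$    (3.31)
$n^{1/2}\, (\hat\beta - \beta) \xrightarrow{D} N_p(0, \Sigma(\beta)^{-1})$    (3.32)
Statement (3.32) follows from (3.31) by
$\sqrt{n}\,(\hat\beta - \beta) \doteq \sqrt{n}\, I(\beta)^{-1} U(\beta) \doteq \Sigma(\beta)^{-1}\, n^{-1/2}\, U(\beta) \xrightarrow{D} N_p(0, \Sigma(\beta)^{-1})$.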
Standard errors of $\hat\beta$ and confidence intervals for $\beta$ are based on the normal
approximation.
Estimates for the variances of $\hat\beta_s$, s = 1, . . . , p, are found on the diagonal of
$I^{-1}(\hat\beta)$; the standard errors, denoted by $se(\hat\beta_s)$ or se(coef) in computer packages,
are the corresponding square roots. Let $J(\hat\beta)^{rs}$ be the element (r, s) of the
matrix $J(\hat\beta)^{-1}$. Then
$se(\hat\beta_s) = \sqrt{J(\hat\beta)^{ss}}$.    (3.33)
3.3.2
(3.34)
(3.35)
( )t J()( )
2p
(3.36)
The third confidence region avoids the calculation of the estimate $\hat\beta$. From
statement (3.31) it follows that the quadratic form
$Q^S_n(\beta) = U(\beta)^t J(\beta)^{-1} U(\beta)$    (3.37)
(3.38)
Based on the relationship between confidence regions and tests we can immediately
formulate asymptotic $\alpha$-test procedures for testing the hypothesis
$H: \beta = \beta_0$ versus $K: \beta \neq \beta_0$    (3.39)
- Score test
$Q^S_n(\beta_0) > \chi^2_{p; 1-\alpha}$.
For testing the single parameter $\beta_s$, s = 1, . . . , p, we can use the analogue to the
confidence interval (3.33). The hypothesis $H: \beta_s = \beta_{0s}$ is rejected if $\beta_{0s}$ is not
covered by the confidence interval, i.e.
$\frac{|\hat\beta_s - \beta_{0s}|}{se(\hat\beta_s)} > z_{1-\alpha/2}$.
Note that the procedures in software packages carry out this test for the
hypotheses $\beta_{0s} = 0$, s = 1, . . . , p.
2
= 2.66 and
se(2 )
3
= 1.79.
se(3 )
versus
K : C = c0 .
(3.40)
versus
K : 1 = 10 .
The estimator for the hypothetical parameter $c_0$ based on the partial likelihood
method is $C\hat\beta$. The Wald statistic for testing (3.40) is given by
$Q^W_n(c_0) = (C\hat\beta - c_0)^t\, [C\, J(\hat\beta)^{-1} C^t]^{-1}\, (C\hat\beta - c_0)$.    (3.41)
If the hypothesis H in (3.40) is true the distribution of $Q^W_n(c_0)$ tends to a
$\chi^2$-distribution with m degrees of freedom as $n \to \infty$. Thus, we reject H if
$Q^W_n(c_0) > \chi^2_{m; 1-\alpha}$.    (3.42)
                   coef     exp(coef)   se(coef)     z        p
z1 (stage II)     0.1400      1.15      0.4625     0.303   7.6e-01
z2 (stage III)    0.6424      1.90      0.3561     1.804   7.1e-02
z3 (stage IV)     1.7060      5.51      0.4219     4.043   5.3e-05
z4 (age)          0.0190      1.02      0.0143     1.335   1.8e-01
Table 3.4: Results for the Data example 1.2 (without interaction)
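The z and p columns of Table 3.4 follow from the coefficients and their standard errors, and the same quantities give Wald confidence intervals for the relative risks. The following R sketch recomputes them from the rounded table entries; the object names are illustrative, and small deviations from the table are due to rounding of the inputs.

```r
coef <- c(z1 = 0.1400, z2 = 0.6424, z3 = 1.7060, z4 = 0.0190)
se   <- c(z1 = 0.4625, z2 = 0.3561, z3 = 0.4219, z4 = 0.0143)

# Wald statistic for H: beta_s = 0 and its two-sided p-value
z_stat  <- coef / se
p_value <- 2 * (1 - pnorm(abs(z_stat)))

# 95% confidence limits for the relative risks exp(beta_s)
q <- qnorm(0.975)
ci <- exp(cbind(lower = coef - q * se, upper = coef + q * se))

print(round(cbind(z = z_stat, p = p_value, ci), 3))
```

Computing the interval on the coefficient scale and exponentiating afterwards keeps the limits positive, which is appropriate for a relative risk.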
        II      III     IV
I      1.15    1.90    5.51
II             1.65    4.79
III                    2.90
Table 3.5: Relative risks for the stage comparisons
Let us derive estimates for the relative risk of dying for a stage II patient of age
$z_4$ compared to a stage I patient of the same age:
$\frac{h(x \mid \text{stage II and age} = z_4)}{h(x \mid \text{stage I and age} = z_4)} = \exp(\hat\beta_1) = 1.15$
Results for the other comparisons are given in Table 3.5.
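All pairwise stage comparisons can be obtained at once from the stage coefficients, since the age term cancels for patients of the same age. The following R sketch uses the rounded coefficients of Table 3.4, with stage I as the reference category (coefficient 0); the object names are illustrative.

```r
beta_stage <- c(I = 0, II = 0.1400, III = 0.6424, IV = 1.7060)

# rr[a, b] = relative risk of a stage-a patient compared to a stage-b
# patient of the same age: exp(beta_a - beta_b)
rr <- exp(outer(beta_stage, beta_stage, "-"))
print(round(rr, 2))
```

The entries below the diagonal of `rr` correspond to the comparisons of a higher stage with a lower stage; the entries above the diagonal are their reciprocals.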
Now, consider the model with all interactions:
coxph(Surv(time, delta) ~ z1*z4 + z2*z4 + z3*z4, data = larynx)
            coef      exp(coef)   se(coef)      z        p
z1       -8.08376     0.000309     3.6936    -2.1886   0.029
z2       -0.16404     0.848705     2.4742    -0.0663   0.950
z3        0.82526     2.282480     2.4229     0.3406   0.730
age      -0.00293     0.997073     0.0261    -0.1124   0.910
z1:age    0.12236     1.130165     0.0525     2.3295   0.020
z2:age    0.01203     1.012106     0.0375     0.3206   0.750
z3:age    0.01422     1.014325     0.0359     0.3959   0.690

          exp(coef)   se(coef)      z         p
z1        0.000559     3.4169    -2.192    0.02800
z2        1.868488     0.3558     1.757    0.07900
z3        5.859972     0.4238     4.172    0.00003
age       1.006048     0.0149     0.405    0.69000
z1:age    1.119998     0.0479     2.367    0.01800
on 5 df, p=0.000175
n= 90
[Figure: two panels plotted against age at diagnosis (40 to 80), each with curves for Stage I, Stage II, Stage III and Stage IV]
$H: \beta_6 = \beta_7 = 0$ versus $K: \beta_6 \neq 0$ or $\beta_7 \neq 0$
Figure 3.4: Relative risk of a patient at stage II with respect to a patient at stage I,
depending on age at diagnosis; Data example 1.2
3.4
We can use the proportional hazard approach to compare the survival distributions
of two groups, perhaps a treatment group and a control group. The
covariate z is used to identify which of the two groups an observation belongs
to: We set
z =
1
0
Let S1 and S2 be the survival functions corresponding to the two groups. One
then tests the hypotheses
H : S1 = S2
(3.43)
S1 = S2
(3.44)
(3.45)
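With $\delta_{1i} = \delta_i z_i$ (that is, $\delta_{1i} = 1$ if $t_i$ is a lifetime observed in the group with z = 1) and
$n_{1i} = \sum_{j=1}^n Y_j(t_i)\, z_j$, $n_{2i} = \sum_{j=1}^n Y_j(t_i)\, (1 - z_j)$,
the log-likelihood is
$l(\beta) = \sum_{i=1}^n \delta_i \Big[z_i \beta - \ln \sum_{j=1}^n Y_j(t_i)\, e^{\beta z_j}\Big] = \beta \sum_{i=1}^n \delta_{1i} - \sum_{i=1}^n \delta_i \ln(n_{1i} e^{\beta} + n_{2i})$.
Hence
$U(\beta) = \sum_{i=1}^n \delta_{1i} - \sum_{i=1}^n \frac{\delta_i\, n_{1i} e^{\beta}}{n_{1i} e^{\beta} + n_{2i}}$    (3.46)
and
$J(\beta) = \sum_{i=1}^n \frac{\delta_i\, n_{1i} n_{2i} e^{\beta}}{(n_{1i} e^{\beta} + n_{2i})^2}$.    (3.47)
The hypothesis is rejected if
$\frac{\Big|\sum_{i=1}^n \Big(\delta_{1i} - \frac{\delta_i n_{1i}}{n_{1i} + n_{2i}}\Big)\Big|}{\sqrt{\sum_{i=1}^n \frac{\delta_i n_{1i} n_{2i}}{(n_{1i} + n_{2i})^2}}} > u_{1-\alpha/2}$.    (3.48)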
Treatment A: 1 3 3 6 7 7 10 12 14 15 18 19 22 26 28+ 29 34 40 48+ 49+
Treatment B: 1 1 2 2 3 4 5 8 8 9 11 12 14 16 18 21 27+ 31 38+ 44
[Figure: estimated survival functions for Treatment A and Treatment B]
  coef    exp(coef)   se(coef)     z      p
-0.388     0.678       0.341     -1.14   0.25
on 1 df, p=0.255
n= 40
The commands
fit$wald.test; 1-pchisq(fit$wald.test,1)
fit$score; 1-pchisq(fit$score,1)
yield the values and the p-values of the test statistics of the Wald test and of the
score test:
QW = 1.30, p = 0.255
QS = 1.31, p = 0.252
With sqrt(fit$score) we get the test statistic of the log rank test; the
two-sided significance is 2*(1-pnorm(sqrt(fit$score)))=0.2512.
All test procedures give essentially identical results. There is no evidence of
a difference between the two distributions.
The expression of the score U(0) has a built-in structure which becomes evident
when we re-express (3.46) (for $\beta = 0$) in the following form
$U(0) = \sum_{i=1}^n \delta_{1i} - \sum_{i=1}^n \frac{\delta_i n_{1i}}{n_{1i} + n_{2i}} = \sum_{i=1}^n \Big(\delta_{1i} - \frac{\delta_i n_{1i}}{n_{1i} + n_{2i}}\Big)$.    (3.49)
To see this note that only times $t_i$ at which a death occurs ($\delta_i = 1$) contribute
to U(0) and J(0), and that if $S_1 = S_2$, then the conditional expectation of $\delta_{1i}$,
given $\delta_i = 1$ and the numbers $n_{1i}$, $n_{2i}$ at risk, is $\delta_i n_{1i} / (n_{1i} + n_{2i})$. This shows
directly that $E\,U(0) = 0$ under $H: \beta = 0$.
Tests of equality of three or more lifetime distributions are also readily obtained.
To compare m distributions $S_1, \dots, S_m$ we define a vector of m − 1 indicators
$z = (z_1, \dots, z_{m-1})^t$, where
$z_r = 1$ if the observation belongs to group r and $z_r = 0$ otherwise, r = 1, . . . , m − 1.
The hypothesis
$H: S_1 = S_2 = \dots = S_m$
is equivalent to
$H: \beta = 0$.
Exercise 3.3 Write down the score test for comparing three survival distributions.
3.5
$\hat H_0(t) = \sum_{t_{(i)} \le t} \frac{1}{\sum_{j \in R_i} \exp(\hat\beta^t z_j)}$    (3.50)
or, equivalently,
$\hat H_0(t) = \sum_{t_i \le t} \frac{\delta_i}{\sum_{l=1}^n Y_l(t_i) \exp(\hat\beta^t z_l)}$.    (3.51)
Note that in the case $\hat\beta = 0$ this estimator is just the Nelson-Aalen estimate.
The estimate $\hat H_0$ is a step function with jumps at the observed event times.
A simple way to estimate $S_0$ is to exploit the relationship $S_0(t) = \exp(-H_0(t))$
and define
$\hat S_0(t) = \exp(-\hat H_0(t))$.    (3.52)
When there are no covariates, or $\hat\beta = 0$, this does not give the Kaplan-Meier
estimator, but another estimate, sometimes referred to as the Fleming-Harrington
estimate.
To estimate the survival function for an individual with covariate $z^*$, one uses
the estimate
$\hat S(t|z^*) = \hat S_0(t)^{\exp(\hat\beta^t z^*)}$.    (3.53)
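The estimators (3.51) and (3.52) are straightforward to compute once an estimate of $\beta$ is available. The following R sketch does this for a single covariate; the function name `breslow_H0`, the small data set (the six observations of the risk-set example) and the covariate values are assumptions made for illustration.

```r
# Cumulative baseline hazard estimate (3.51) for a scalar covariate
breslow_H0 <- function(t, time, status, z, beta) {
  ev <- which(status == 1 & time <= t)    # observed events up to time t
  jumps <- vapply(ev, function(i) {
    1 / sum(exp(beta * z[time >= time[i]]))   # 1 / sum over the risk set
  }, numeric(1))
  sum(jumps)
}

time   <- c(7, 124, 88, 13, 2, 79)
status <- c(1, 1, 0, 1, 0, 1)
z      <- c(0, 1, 0, 1, 1, 0)             # hypothetical covariate values

# With beta = 0 the estimate reduces to the Nelson-Aalen estimate
H0 <- breslow_H0(100, time, status, z, beta = 0)
S0 <- exp(-H0)                            # baseline survival estimate (3.52)
print(round(c(H0 = H0, S0 = S0), 4))
```

For `beta = 0` the jumps are the reciprocals of the risk-set sizes at the event times 7, 13 and 79, which gives a simple way to check the function by hand.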
Figure 3.6: Estimated survival functions for a larynx cancer patient of age 60 at
diagnosis (Stage I to Stage IV)
3.6 Stratification
There are instances when the proportional hazards assumption is violated for
some covariate. In such cases it may be possible to stratify on that variable
and employ the proportional hazards model within each stratum for the other
covariates. Here the subjects in the jth stratum have an arbitrary baseline
function $h_{j0}$ and the effect of the explanatory variables on the hazard function
can be represented by a proportional hazards model
$h_j(x|z) = h_{j0}(x) \exp(\beta^t z)$, j = 1, . . . , s.    (3.54)
The log partial likelihood is
$l(\beta) = \sum_{j=1}^s l_j(\beta)$,
where $l_j$ is the log partial likelihood (see (3.16)) using only the observations for those
individuals in the jth stratum. The parameter $\beta$ is estimated by maximizing
the partial likelihood function as before:
Data example 3.7 (Remission) Gehan (1965) and others have discussed the
results of a clinical trial reported by Freireich et al (1963), in which the drug
6-mercaptopurine (6-MP) was compared to a placebo with respect to the ability
to maintain remission in acute leukemia patients. 7 The trial was conducted
by matching pairs of patients by remission status (complete or partial) and
randomized within the pair to either 6-MP or placebo maintenance therapy.
Patients were followed until their leukaemia returned (relapse) or until the end
of the study.
7 The
Table 3.8 gives remission times for two groups of 21 patients each, one group
given the placebo and the other the drug 6-MP.
Pair   Remission status   Placebo   Drug 6-MP   Relapse
  1           1               1         10         1
  2           2              22          7         1
  3           2               3         32         0
  4           2              12         23         1
  5           2               8         22         1
  6           1              17          6         1
  7           2               2         16         1
  8           2              11         34         0
  9           2               8         32         0
 10           2              12         25         0
 11           2               2         11         0
 12           1               5         20         0
 13           2               4         19         0
 14           2              15          6         1
 15           2               8         17         0
 16           1              23         35         0
 17           1               5          6         1
 18           2              11         13         1
 19           2               4          9         0
 20           2               1          6         0
 21           2               8         10         0
 coef    exp(coef)   se(coef)     z        p
-1.57     0.208       0.412     -3.81   0.00014
That is, the risk of relapse for patients given a placebo is exp(1.57) = 4.8 times
higher than for those given 6-MP.
Taking into account the remission status we fit the model
$h_j(x|z) = h_{j0}(x) \exp(\beta z)$,
where j = 1 for the pairs with complete remission and j = 2 for those with
partial remission. The procedure is
 coef    exp(coef)   se(coef)     z        p
-1.79     0.167       0.463     -3.87   0.00011
Here the relative risk of the placebo group compared to the 6-MP group is
exp(1.79) = 5.99.