Abstract
Survival analysis algorithms are often applied in the data mining process. Cox regression is one of the survival analysis tools
that has been used in many areas, and it can be used to analyze the failure times of crashed aircraft. Another survival
analysis tool is competing risks, where more than one cause of failure acts simultaneously. Lunn and McNeil
analysed competing risks in the survival model using Cox regression with censored data. The modified Lunn-McNeil
technique is a simplification of the Lunn-McNeil technique. The Kalbfleisch-Prentice technique involves fitting models
separately for each type of failure, treating other failure types as censored. To compare the two techniques (the modified
Lunn-McNeil and Kalbfleisch-Prentice), a simulation study was performed. Samples of various sizes and censoring
percentages were generated and fitted using both techniques. The study compared the inference of the
models using root mean square error (RMSE), power tests, and Schoenfeld residual analysis. The power tests in
this study were the likelihood ratio test, the Rao score test, and the Wald statistic. The Schoenfeld residual analysis was conducted to
check the proportionality of the model through its covariates. The estimated parameters were computed for the
cause-specific hazard situation. Results showed that the modified Lunn-McNeil technique was better than the
Kalbfleisch-Prentice technique based on the RMSE and the Schoenfeld residual analysis. However, the Kalbfleisch-Prentice
technique was better than the modified Lunn-McNeil technique based on the power tests.
Key words: Survival analysis, data mining, Cox model, the modified Lunn-McNeil, Kalbfleisch-Prentice.
1. Introduction
Competing risks is a survival model dealing with more than one possible cause of death or failure of a subject in the
population (David and Moeschberger, 1978). Cox and Oakes (1984) suggested that in competing risks interest is quite often
focussed on one type of failure on its own. According to Kalbfleisch and Prentice (1980), the failure of an individual may be
caused by one of several distinct types or causes. In general, study subjects may experience a variable number of failures,
each with its own type or cause. Each study subject has an underlying failure time T and a covariate vector z. T may
be subject to censoring. More generally, the covariate is a function Z = {z(u) : u ≥ 0}, and failure can occur
due to m distinct types or causes denoted by J ∈ {1, 2, ..., m}. This paper compares the cause-specific
hazards estimation method of Kalbfleisch and Prentice (1980) with the modified Lunn-McNeil technique (Lukman, 1999).
In this simulation study, two regression models for competing risks with censored data are compared. The first technique is
the Kalbfleisch-Prentice technique with the Cox model. The second technique is based on the modified Lunn-McNeil technique,
a simplification of the Lunn-McNeil technique due to Lukman (1999). Both techniques can be used
for any number of different failure types, assuming the risks are independent. In this study, samples of various sizes and
censoring proportions are generated and fitted using both models. The joint estimation of cause-specific hazard
parameters obtained from the data duplication approach of the modified Lunn-McNeil technique has an advantage over the
Kalbfleisch-Prentice technique and thus provides new insight into the analysis of survival data.
Data Mining and Knowledge Discovery: Theory, Tools, and Technology IV, Belur V. Dasarathy, Editor,
Proceedings of SPIE Vol. 4730 (2002) © 2002 SPIE · 0277-786X/02/$15.00 203
The cause-specific hazard for failure type j, modelled with the Cox model as a function of the covariate vector x_i pertinent
to each individual, is

$$h_j(t; x_i) = h_{0j}(t) \exp(\beta_j^T x_i), \quad j = 1, 2, \ldots, m. \qquad (1)$$

This is a straightforward extension of

$$\lambda(t, x_i) = \lambda_0(t) \exp(\beta^T x_i), \quad i = 1, \ldots, n.$$

However, both the underlying hazard h_0j(t) ≥ 0 and the vector of regression coefficients β_j are specific to each of the m
failure types. As shown by Kalbfleisch and Prentice (1980, p. 170), the parameter estimation is based upon the method of partial
likelihood. From (1), the conditional log likelihood function can be expressed as

$$L(\beta_j) = \sum_{k=1}^{n} \left[ \beta_j^T x_k - \ln \sum_{l \in R_k} \exp(\beta_j^T x_l) \right],$$

where R_k denotes the risk set at the k-th failure time of type j. The β_j's are estimated separately for each failure type by
considering failures of the remaining types as censored observations on each individual.
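The separate fitting described above can be illustrated with a minimal sketch of the cause-specific partial log likelihood. It assumes a single scalar covariate, no tied failure times, and a hypothetical coding in which cause 0 marks a censored subject; the function name and data values are illustrative only and are not taken from the study.

```python
import math

def cause_specific_loglik(beta, times, causes, x, cause):
    """Cox partial log likelihood for one failure type (scalar covariate).

    causes[i] is 0 for a censored subject (assumed coding), otherwise the
    failure type. Failures of any other type are treated as censored.
    Assumes there are no tied failure times.
    """
    loglik = 0.0
    for k, t_k in enumerate(times):
        if causes[k] != cause:
            continue  # censored, or failed from another cause: no term of its own
        # Risk set: every subject still under observation at time t_k.
        risk = [l for l, t_l in enumerate(times) if t_l >= t_k]
        denom = sum(math.exp(beta * x[l]) for l in risk)
        loglik += beta * x[k] - math.log(denom)
    return loglik

# Illustrative data: two failures of type 1, one of type 2, one censored.
times = [2.0, 3.0, 5.0, 7.0]
causes = [1, 2, 1, 0]
x = [1.0, 0.0, 1.0, 0.0]
ll_type1 = cause_specific_loglik(0.0, times, causes, x, cause=1)
ll_type2 = cause_specific_loglik(0.0, times, causes, x, cause=2)
```

At beta = 0 every subject in a risk set has equal weight, so each failure contributes minus the log of the risk set size. A subject that fails from another cause still appears in the risk sets until its own failure time, which is what "treated as censored" means here.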
The assumptions are the same as in the Lunn-McNeil technique, and the data entries are the same. Suppose that we have two
failure types, where failure type I and failure type II are denoted by δ and 1 − δ, with δ = 0 or 1. If subject i fails at
time t_i and the first failure type is δ_i (or 1 − δ_i), then the second failure type is 1 − δ_i (or δ_i). By providing a column for
the second failure type, two entries are made as follows.
Table 1. Data handling of the modified Lunn-McNeil technique
______________________________________________________________________________________
                                  Failure type          Covariates
______________________________________________________________________________________
Subj      Failure time   Status   I        II           I                    II
______________________________________________________________________________________
i         t_i            1        δ_i      1−δ_i        x_i, δ_i x_i         x_i, (1−δ_i) x_i
i (rep)   t_i            0        1−δ_i    δ_i          x_i, (1−δ_i) x_i     x_i, δ_i x_i
______________________________________________________________________________________
If subject i is censored, the entries are repeated as above, but the status for both failure types is equal to 0. The
augmentation of the second failure type in Table 1 is useful for seeking the joint estimation of parameters, as shown in
Table 4 of Lunn and McNeil (1995). Later, we regress the duplicated data on failure type (either one of the two failure
types) and its covariates.
The mechanism of the modification is as follows: the Cox model is run on the duplicated data set, with failure type δ
(or 1 − δ) included together with the covariates x and δx (or x and (1 − δ)x). Within the competing risks framework, Kay (1986)
gives, for a patient with covariate values x_2, ..., x_p, the estimated cause-specific hazard for cause j as follows
where x_1 is the binary treatment indicator and x_2, ..., x_p are the background covariates. Then (2) is run using the approach
shown in Table 1.
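The data layout of Table 1 can be sketched as a small helper that produces the two entries for one subject. This is an illustrative sketch under stated assumptions: the function name and dictionary keys are hypothetical, a scalar covariate is used, and for a censored subject the caller still supplies a value of δ, a choice the text does not specify.

```python
def duplicate_subject(t, delta, x, censored=False):
    """Return the two Table 1 entries for one subject.

    delta is 1 or 0 and codes failure type I versus II. For a censored
    subject both entries get status 0; the delta passed in that case is an
    arbitrary labelling choice (an assumption of this sketch).
    """
    first = {
        "time": t,
        "status": 0 if censored else 1,
        "type_I": delta,
        "type_II": 1 - delta,
        "x": x,
        "type_I_x": delta * x,         # covariate pair (x, delta * x)
        "type_II_x": (1 - delta) * x,  # covariate pair (x, (1 - delta) * x)
    }
    second = {
        "time": t,
        "status": 0,                   # the repeated entry is never a failure
        "type_I": 1 - delta,
        "type_II": delta,
        "x": x,
        "type_I_x": (1 - delta) * x,
        "type_II_x": delta * x,
    }
    return [first, second]

failed_rows = duplicate_subject(4.2, 1, 50.0)
censored_rows = duplicate_subject(6.0, 1, 35.0, censored=True)
```

Applying the helper to every subject doubles the data set, after which a single Cox fit on the stacked rows estimates the parameters for both failure types jointly.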
The regression model introduced by Cox (1972) specifies the hazard rate λ_j(t; x) for each individual in terms of a vector
of covariates x_i = (x_i1, ..., x_ip) specific to that individual and its vector of regression parameters β = (β_1, ..., β_p).
Equation (2) can then be written as follows

$$\lambda_j(t; x_i) = \lambda^*(t) \exp\left( \sum_{r=1}^{p} \beta_r x_{jr} \right) \qquad (3)$$
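The contribution of (3) to the partial likelihood corresponding to the j-th risk set R_j is as follows

$$\frac{\exp\left( \sum_{r=1}^{p} \beta_r x_{jr} \right)}{\sum_{l \in R_j} \exp\left( \sum_{r=1}^{p} \beta_r x_{lr} \right)} \qquad (4)$$

By introducing the censoring indicator ϕ_i (Noor Akma Ibrahim and Isa Daud, 1995), where ϕ_i = 1 if t_i is a failure time and
ϕ_i = 0 if t_i is censored, (4) can be written as

$$\frac{\exp\left( \varphi_i \sum_{r=1}^{p} \beta_r x_{jr} \right)}{\left[ \sum_{l \in R_j} \exp\left( \sum_{r=1}^{p} \beta_r x_{lr} \right) \right]^{\varphi_i}} \qquad (5)$$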
2. Simulation study
The first objective of this simulation study is to compare the mean, bias, and root mean square error (RMSE) obtained
from fitting the cause-specific proportional hazards model by both techniques, namely the cause-specific hazards
Kalbfleisch-Prentice technique and the cause-specific hazards technique based on Lunn-McNeil. In this simulation study the
covariates chosen are age and mismatch score.
In generating the failure times we impose two values for λ (the parameter of an exponential variate): λ = 1.25 for the first
type of failure and λ = 0.75 for the second. This gives proportionality between the two types of failure, so the curves of both
types of failure are parallel (in vertical distance). Data are simulated 1000 times for every sample size together
with the designated percentage of censoring. The observations to be censored were randomly chosen. The true (initial)
parameters for any failure type are β_1 = 1, β_2 = 0.06605 (for age), β_3 = 0.200 (for mismatch score), β_4 = 0.0654 (for age by any
failure type), and β_5 = 0.198 (for mismatch score by any failure type).
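A simplified sketch of this data generation step is given below. It is not the study's generator: it assumes that each subject has one latent exponential failure time per failure type with the earlier one observed, it ignores the covariates, and it keeps the observed time when an observation is censored. The function name and the coding of cause 0 for censoring are hypothetical.

```python
import random

def simulate_sample(n, censor_frac, rates=(1.25, 0.75), seed=None):
    """Simulate n subjects with two competing exponential failure types.

    Returns a list of [time, cause] pairs, where cause is 1 or 2 for the
    failure type and 0 for a censored observation (assumed coding).
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        # One latent exponential failure time per type; the earlier is observed.
        latent = [rng.expovariate(lam) for lam in rates]
        cause = 1 if latent[0] <= latent[1] else 2
        sample.append([min(latent), cause])
    # Censor a randomly chosen subset of the requested size.
    for i in rng.sample(range(n), round(censor_frac * n)):
        sample[i][1] = 0
    return sample

# One replicate with sample size 45 and 25% censoring.
replicate = simulate_sample(45, 0.25, seed=1)
n_censored = sum(1 for _, cause in replicate if cause == 0)
```

Calling the function 1000 times with different seeds for each combination of sample size and censoring percentage would mirror the replication scheme described above.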
In the RMSE computation we find that, in general, for every sample size and censoring percentage the modified
Lunn-McNeil technique gives smaller values of the mean, bias, and root mean square error (RMSE) of the estimated parameters
than the Kalbfleisch-Prentice technique.
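For reference, the three summary measures can be computed from the replicated estimates of one parameter as in the sketch below. The function name and the example estimates are illustrative and are not results from the study.

```python
import math

def summarize_estimates(estimates, true_value):
    """Mean, bias, and RMSE of replicated estimates of one parameter."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - true_value
    rmse = math.sqrt(sum((b - true_value) ** 2 for b in estimates) / n)
    return mean, bias, rmse

# Three made-up estimates of a parameter whose true value is 1.
mean, bias, rmse = summarize_estimates([0.9, 1.1, 1.2], true_value=1.0)
```

RMSE combines bias and variability in one number, so a technique can have a small bias and still a large RMSE if its estimates are widely spread.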
Bruning and Kintz (1997) define the power of a statistical test as the probability of rejecting the null
hypothesis when it is false.
In the simulation study of the power of the likelihood ratio test, the Rao score test, and the Wald statistic, we
set α = .05 and a target power of .80 for all three tests.
Statistical inference on β = (β_1, ..., β_K) is carried out using three tests that are based on the properties
of the likelihood function (Marubini and Valsecchi, 1995). Suppose the hypothesis to be tested is
H0 : β = β0
(a) The likelihood ratio (LR) test
The test statistic for H0 is based on the difference of the log-likelihood values. Miller (1981) wrote the
likelihood ratio test as follows

$$2 \left[ \log L(\hat{\beta}) - \log L(\beta^0) \right]$$

where β̂ is the maximum likelihood estimate.
(b) The Wald test
The Wald statistic is

$$(\hat{\beta} - \beta^0)^T I(\beta^0) (\hat{\beta} - \beta^0) \qquad (8)$$

which is asymptotically distributed as χ² with p degrees of freedom under H0, where I(β) is the Fisher information of
the entire sample, that is, the expected value of the sample information matrix, E(i(β)) = I(β), and

$$i(\beta) = -\frac{\partial^2}{\partial \beta^2} \log L(\beta)$$

is called the sample information matrix at β.
(c) Rao's score test
According to Miller (1981), Rao's score test statistic is as follows

$$\left[ \frac{\partial}{\partial \beta} \log L(\beta^0) \right]^T I^{-1}(\beta^0) \left[ \frac{\partial}{\partial \beta} \log L(\beta^0) \right] \qquad (9)$$

It is asymptotically distributed as χ² with p degrees of freedom under H0, where I(β0) is the Fisher information,
identical conclusions on the regression parameters (parameter estimates) (Marubini and Valsecchi, 1995). The powers of the
Wald, Rao score, and likelihood ratio statistics were used to test H0 : β = β0, where β = (β_1, ..., β_K) is the
vector of parameters with K = 1, 2, 3, ..., n.
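As a worked illustration of the likelihood ratio, Wald, and score statistics, the sketch below evaluates all three for a one-parameter exponential model, where each has a closed form. This is a stand-in chosen for simplicity, not the Cox model fitted in the study; the function names are hypothetical, and the p-value helper applies to one degree of freedom only.

```python
import math

def exp_loglik(lam, times):
    """Log likelihood of an exponential sample with rate lam."""
    return len(times) * math.log(lam) - lam * sum(times)

def three_tests(times, lam0):
    """LR, Wald, and score statistics for H0: lam = lam0 (scalar case)."""
    n, total = len(times), sum(times)
    lam_hat = n / total          # maximum likelihood estimate of the rate
    info0 = n / lam0 ** 2        # Fisher information evaluated at lam0
    score0 = n / lam0 - total    # derivative of the log likelihood at lam0
    lr = 2.0 * (exp_loglik(lam_hat, times) - exp_loglik(lam0, times))
    wald = (lam_hat - lam0) ** 2 * info0  # (estimate - null)^2 * information
    score = score0 ** 2 / info0           # score^2 / information
    return lr, wald, score

def chi2_1df_pvalue(stat):
    """Upper tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

lr, wald, score = three_tests([0.5, 0.5, 0.5, 0.5], lam0=1.0)
p_wald = chi2_1df_pvalue(wald)
```

The three statistics are asymptotically equivalent, but in a small sample such as this one they can differ noticeably, which is why their powers are compared separately.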
Table 3 gives the results of the power tests of the cause-specific hazards in competing risks with censored data per 1000
simulated data sets. The value of the power is written using 3 digits. For example, for the Wald statistic of the
Kalbfleisch-Prentice technique with sample size 45 and censoring percentage 25%, the power is written as 022. The 022 means
that a χ² value of the Wald statistic with p-value < .05 occurred in only 22 out of 1000 data sets. The interpretation is the
same for the likelihood ratio statistic and for the Rao score test. Over the 1000 data sets, the size of the test is set at
α = .05 and the type II error rate at β = 4 × α = .200, so the target power is 1 − β = .800. Accordingly, a test is regarded
as powerful if and only if it rejects H0 in at least 800 of the 1000 simulated data sets.
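The counting rule described above can be sketched as follows. The function names are hypothetical and the p-values are made-up placeholders chosen to reproduce the 022 example; they are not output from the study.

```python
def empirical_power(p_values, alpha=0.05):
    """Number and proportion of simulated data sets in which H0 is rejected."""
    rejections = sum(1 for p in p_values if p < alpha)
    return rejections, rejections / len(p_values)

def table_entry(rejections):
    """Three-digit entry as used for the power values, e.g. 22 -> '022'."""
    return f"{rejections:03d}"

# Placeholder p-values: 22 rejections among 1000 simulated data sets.
p_values = [0.01] * 22 + [0.50] * 978
rejections, power = empirical_power(p_values)
entry = table_entry(rejections)
is_powerful = rejections >= 800  # target power of .800 out of 1000 data sets
```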
From Table 3 it is clear that for sample size 45 (censoring percentages of 25 and 50) and sample size 80 (censoring
percentages of 25, 50, and 75), the Kalbfleisch-Prentice technique is more powerful than the modified Lunn-McNeil technique
with respect to the likelihood ratio test and the Rao score test. The Wald statistic is weak in both techniques.
According to the Schoenfeld residual analysis, if the model holds, that is, if the proportional hazards assumption is met, the
residuals fluctuate randomly around zero and gradually converge to zero over time.
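A minimal sketch of how Schoenfeld residuals are commonly computed is given below: at each failure time, the residual is the covariate of the subject who failed minus the risk-weighted average of the covariate over those still under observation. The sketch assumes a scalar covariate, no tied failure times, and a supplied coefficient value; the function name and data are illustrative.

```python
import math

def schoenfeld_residuals(beta, times, events, z):
    """Schoenfeld residuals for a Cox model with one covariate.

    events[i] is 1 for an observed failure and 0 for censoring. Returns a
    list of (failure time, residual) pairs, one per observed failure.
    """
    residuals = []
    for i, t_i in enumerate(times):
        if not events[i]:
            continue  # residuals are defined only at failure times
        # Subjects still under observation when subject i fails.
        risk = [l for l, t_l in enumerate(times) if t_l >= t_i]
        weights = [math.exp(beta * z[l]) for l in risk]
        expected = sum(w * z[l] for w, l in zip(weights, risk)) / sum(weights)
        residuals.append((t_i, z[i] - expected))
    return residuals

res = schoenfeld_residuals(0.0, [1.0, 2.0, 3.0], [1, 1, 0], [1.0, 0.0, 1.0])
```

Plotting the residuals against failure time and checking for a trend is the usual way to judge whether they scatter randomly around zero.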
Suppose n individuals are indexed by i = 1, ..., n and that each has a p-vector of covariates z_i = (z_i1, ..., z_ip)^T. The
proportional hazards regression model specifies that the hazard function of the i-th individual is

$$\lambda_i(t) = \lambda_0(t) \exp(\beta^T z_i),$$

where β is a vector of p parameters and λ_0(t) is an arbitrary function.
Let D be the indices of the individuals who failed and let R_i be the indices of those under observation when the i-th
individual fails. Using partial (Cox, 1975) or marginal (Kalbfleisch and Prentice, 1980, p. 71) likelihood arguments, one
3. Conclusion
Using cause-specific hazards for parameter estimation in competing risks, the modified Lunn-McNeil technique is better
than the Kalbfleisch-Prentice technique with respect to the RMSE values and the results of the Schoenfeld residual analysis.
However, the Kalbfleisch-Prentice technique is better than the modified Lunn-McNeil technique with respect to the power of
the tests.
References
Bruning, J. L. and Kintz, B. L. Computational Handbook of Statistics. Fourth Edition. New York, Longman, 1997.
Cox, D. R. “Regression Models and Life Tables (with discussion)”. J. R. Statist. Soc. B 34, 187-220, 1972.
Cox, D. R. “Partial Likelihood”. Biometrika 62, 269-276, 1975.
Crowley, J. and Hu, M. “Covariance Analysis of Heart Transplant Survival Data”. J. Amer. Statist. Assoc. 72, 27-36, 1977.
David, H. A. and Moeschberger, M. L. The Theory of Competing Risks. London, Griffin, 1978.
Farewell, V. T. “An Application of Cox's Proportional Hazard Model to Multiple Infection Data”. Applied Statistics 28,
136-143, 1979.
Kalbfleisch, J. and Prentice, R. The Statistical Analysis of Failure Time Data. New York, Wiley, 1980.
Kay, R. “Treatment Effects in Competing Risks Analysis of Prostate Cancer Data”. Biometrics 42, 203-211, 1986.
Kuk, A. Y. C. “A Semiparametric Mixture Model for the Analysis of Competing Risks Data”. Australian Journal of
Statistics 34, 169-180, 1992.
Larson, M. G. and Dinse, G. E. “A Mixture Model for the Regression Analysis of Competing Risks Data”. Applied Statistics
34, 201-211, 1985.
Lukman, I. A Simulation Study on Competing Risks with Censored Data Using Cox Model. Master Thesis. Universiti Putra
Malaysia, Serdang, Malaysia. Unpublished, 1999.
Lunn, M. and McNeil, D. “Applying Cox Regression to Competing Risks”. Biometrics 51, 524-532, 1995.
Marubini, E. and Valsecchi, M. G. Analysing Survival Data from Clinical Trials and Observational Studies. Chichester, John
Wiley & Sons, 1995.
Miller, R. Survival Analysis. New York, Wiley, 1981.
Noor Akma Ibrahim and Isa Daud. “Estimating Parameters of Proportional Hazards Model with Censored Data Using
SAS”. Proceedings of the Annual SAS User's Group Malaysia, pp. 19-20, 1995.
Iingl@yahoo.com; phone 603-89466742; Fax 603-89438109; Department of Environmental Science, Universiti Putra Malaysia, Serdang-
UPM,Selangor D.E. Malaysia 43400; nakma@fsas.upm.edu.my; phone 603-89466847; Fax 603-89437958; http://www.upm.edu.my;
Department of Mathematics, Serdang UPM, Selangor, D.E. Malaysia 43400;