
Survival Analysis: Kaplan-Meier estimate

“Master's Program in Statistics (Maestría en Estadística)”
“Universidad Nacional de San Antonio Abad del Cusco”

PhD. Alfredo Valencia-Toledo*1

*Research Group in Economic Analysis (RGEA), Universidade de Vigo

November 24, 2017

¹ alfredo.valencia@unsaac.edu.pe; www.alfredo.valencia.webs.uvigo.es
Overview

Introduction to Kaplan-Meier methods


• Different representations.
• Comparing survival curves: the log-rank test.
• Using R.
Estimation of Survival
If there are no censored observations in a sample of size n, the most natural estimator of the survival function is the empirical estimator, given by

$$\hat{S}(t) = \frac{1}{n}\sum_{i=1}^{n} I(t_i > t),$$

that is, the proportion of observations with failure times greater than t.


Alternative methods are needed to incorporate censoring (observations whose recorded time is a censoring time rather than an event time).
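A minimal base-R sketch of the empirical estimator for uncensored data. The function name `emp_surv` and the ten failure times (in years) are hypothetical, chosen so that 8 of the 10 failures occur by t = 1.

```r
# Empirical survival estimator for uncensored data:
#   S_hat(t) = (1/n) * sum_i I(t_i > t), evaluated at each t in `t_eval`.
# `emp_surv` is a hypothetical helper, not a survival-package function.
emp_surv <- function(times, t_eval) {
  vapply(t_eval, function(t) mean(times > t), numeric(1))
}

# Hypothetical uncensored failure times (years) for 10 patients
times <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1.5, 2.0)
emp_surv(times, c(0.5, 1, 1.8))   # [1] 0.5 0.2 0.1
```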
Introduction to Kaplan-Meier

• Non-parametric estimate of the survival function.
• Commonly used to describe the survival of one or more populations.
• Commonly used to compare two or more populations.
• Intuitive graphical presentation.
Kaplan-Meier estimate

Kaplan and Meier (1958) obtained a nonparametric estimate of the survival function, called the product-limit estimator, which generalizes the empirical estimator to censored data:

$$\hat{S}(t) = \prod_{j:\, t_j \le t}\left(1 - \frac{d_j}{n_j}\right)$$
Kaplan-Meier estimate
Observed event times: k distinct event times $t_1 < t_2 < \cdots < t_k$.

At each event time $t_j$:
• $n_j$ is the number of individuals at risk at time $t_j$. It consists of the original sample minus all those who have been censored or have had the event before $t_j$.
• $d_j$ is the number of individuals who have the event at time $t_j$.

$$\hat{S}(t) = \prod_{j:\, t_j \le t}\left(1 - \frac{d_j}{n_j}\right)$$

• $d_j/n_j$ = proportion that failed at the event time $t_j$.
• $1 - d_j/n_j$ = proportion surviving the event time $t_j$.

$\hat{S}(t)$ is the estimated survival probability at time t, i.e., an estimate of $P(T > t)$.
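To make the product-limit formula concrete, here is a minimal base-R sketch; the function name `km_table` is hypothetical and not part of the survival package. It computes $n_j$, $d_j$ and $\hat S(t_j)$ at each distinct event time, counting subjects censored exactly at $t_j$ as still at risk at $t_j$. It is applied to the "Maintained" AML times of Example 1, where it should reproduce the hand calculation given there.

```r
# Kaplan-Meier (product-limit) estimate at each distinct event time.
#   times  : observed times Y_i = min(T_i, C_i)
#   status : 1 = event observed, 0 = right-censored
# `km_table` is a hypothetical helper written for illustration.
km_table <- function(times, status) {
  tj <- sort(unique(times[status == 1]))                                   # t_1 < ... < t_k
  nj <- vapply(tj, function(t) sum(times >= t), numeric(1))                # at risk at t_j
  dj <- vapply(tj, function(t) sum(times == t & status == 1), numeric(1))  # events at t_j
  data.frame(time = tj, n.risk = nj, n.event = dj,
             surv = cumprod(1 - dj / nj))                                  # product over t_j <= t
}

# Example 1 data (AML, "Maintained" group); status 0 marks the "+" times
time   <- c(9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161)
status <- c(1,  1,  0,  1,  1,  0,  1,  1,  0,  1,   0)
res <- km_table(time, status)
res
```

Note that the estimate is only evaluated at event times; between them (and at censored times) it stays constant.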
Kaplan-Meier estimate
When there are no censored data, the Kaplan-Meier (K-M) estimator reduces to the empirical estimator (simple and intuitive).
• For example, if you are following 10 patients and 8 of them die by the end of the first year, then your best estimate is S(1 year) = 2/10 = 20%.

When there are censored data, K-M provides an estimate of S(t) that takes censoring into account.
• The K-M estimate remains unchanged at censored times: a censored subject only leaves the risk set, which reduces $n_j$ at later event times.
A different representation based on Kaplan-Meier Weights (KMW)
Assume that $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$.
Let $Y_{(1)} \le \dots \le Y_{(n)}$ denote the ordered $Y_i$'s and $\delta_{[i]}$ the concomitant of $Y_{(i)}$ (the censoring indicator attached to it). Then

$$\hat{S}(t) = 1 - \sum_{i=1}^{n} W_i\, I\big(Y_{(i)} \le t\big), \qquad W_i = \frac{\delta_{[i]}}{n-i+1}\prod_{j=1}^{i-1}\left(1 - \frac{\delta_{[j]}}{n-j+1}\right),$$

where ties within the censored or within the uncensored times are ordered arbitrarily, and ties between uncensored and censored times are treated as if the former precede the latter.
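A base-R sketch of the weight representation; the function name `km_weights` is hypothetical. It orders the data by time (placing events before censored observations at tied times, as stated above), computes $W_i$, and checks that $1 - \sum_i W_i I(Y_{(i)} \le t)$ gives the product-limit value 81/440 at t = 48 for the Example 1 "Maintained" data. Because the largest time (161+) is censored, the weights sum to less than 1.

```r
# Kaplan-Meier weights:
#   W_i = delta_[i] / (n - i + 1) * prod_{j < i} (1 - delta_[j] / (n - j + 1))
# `km_weights` is a hypothetical helper written for illustration.
km_weights <- function(times, status) {
  o <- order(times, -status)          # at tied times, events precede censored
  y <- times[o]; d <- status[o]; n <- length(y)
  i <- seq_len(n)
  factor_i  <- 1 - d / (n - i + 1)
  prev_prod <- c(1, cumprod(factor_i))[i]   # product over j = 1, ..., i - 1
  data.frame(time = y, status = d, W = d / (n - i + 1) * prev_prod)
}

time   <- c(9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161)
status <- c(1,  1,  0,  1,  1,  0,  1,  1,  0,  1,   0)
w <- km_weights(time, status)

1 - sum(w$W[w$time <= 48])   # S_hat(48) = 81/440, about 0.184
sum(w$W)                     # less than 1: the largest time is censored
```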
An example of right-censored data
[Figure: follow-up timelines of Subjects A to E, from the beginning of the study to the end of study or follow-up. 1. Subject E has the event (X) at 4 months.]
Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%.]
• The probability of surviving up to 4 months is 100% = 5/5.
• Subject E has the event at 4 months: the Kaplan-Meier weight is W1 = 1/5, and the probability of surviving beyond 4 months is 4/5.
Survival Data
[Figure: follow-up timelines of Subjects A to E, from the beginning of the study to the end of study or follow-up. 1. Subject E has the event (X) at 4 months. 2. Subject A drops out (O) after 6 months. 3. Subject C has the event (X) at 7 months.]
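Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%.]
• Subject A drops out at 6 months: its Kaplan-Meier weight W2 is zero, and the curve does not drop.
• Subject C has the event at 7 months, when 3 subjects (B, C, D) are at risk: survival becomes 4/5 * (1 - 1/3) = 8/15, so the curve drops by the weight W3 = 4/5 - 8/15 = 4/15.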
Survival Data
[Figure: follow-up timelines of Subjects A to E, from the beginning of the study to the end of study or follow-up. 1. Subject E has the event (X) at 4 months. 2. Subject A drops out (O) after 6 months. 3. Subject C has the event (X) at 7 months. 4. Subjects B and D survive (O) for the whole study period.]
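Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%.]
• Kaplan-Meier weights: their sum is not always equal to 1.
• The Kaplan-Meier estimate of the distribution function, $1 - \hat{S}(t)$, is not a proper distribution function (it never reaches 1) if the largest observed time is censored.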
Greenwood estimator

$$\hat{\sigma}^2\big(\hat{S}(t)\big) = \widehat{\mathrm{var}}\big(\hat{S}(t)\big) = \hat{S}(t)^2 \sum_{i:\, t_i \le t} \frac{d_i}{n_i\,(n_i - d_i)}$$
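A base-R sketch of the Greenwood formula; the function name `greenwood` is hypothetical. On the Example 1 "Maintained" data its standard errors should match the `std.err` column printed by `summary(fit)` in Example 1 to about three decimals.

```r
# Greenwood variance of the Kaplan-Meier estimate:
#   var_hat(S_hat(t)) = S_hat(t)^2 * sum_{t_i <= t} d_i / (n_i * (n_i - d_i))
# Caveat: if n_i == d_i (everyone still at risk fails) the term is infinite;
# this sketch does not handle that case.
greenwood <- function(times, status) {
  ti <- sort(unique(times[status == 1]))                                   # event times
  ni <- vapply(ti, function(t) sum(times >= t), numeric(1))                # at risk
  di <- vapply(ti, function(t) sum(times == t & status == 1), numeric(1))  # events
  surv  <- cumprod(1 - di / ni)
  var_s <- surv^2 * cumsum(di / (ni * (ni - di)))
  data.frame(time = ti, surv = surv, std.err = sqrt(var_s))
}

# Example 1 data (AML, "Maintained" group); status 0 marks the "+" times
time   <- c(9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161)
status <- c(1,  1,  0,  1,  1,  0,  1,  1,  0,  1,   0)
g <- greenwood(time, status)
g
```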
Confidence intervals for S(t)
We can use the Greenwood variance estimate to derive a pointwise confidence interval at every time point t. Let $\hat{\sigma}$ denote the Greenwood standard deviation. Then a $100(1-\alpha)\%$ confidence interval for the survival function is computed as

$$\hat{S}(t) \pm z_{1-\alpha/2}\,\hat{\sigma}$$

However, these confidence limits may fall outside the interval (0, 1)!
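To see the problem numerically, the sketch below (hypothetical helper `plain_ci`) builds the plain 95% interval at the first event time of Example 1, where $\hat S(9) = 10/11$ and the Greenwood standard deviation is $\hat S(9)\sqrt{1/(11 \cdot 10)} \approx 0.0867$. The upper limit comes out at about 1.079. Truncating to [0, 1] is shown only as a crude patch; intervals built on a transformed scale (the `conf.type` options of survfit) are the usual remedy.

```r
# Plain ("linear") pointwise interval: S_hat(t) +/- z_{1 - alpha/2} * sigma_hat
# `plain_ci` is a hypothetical helper written for illustration.
plain_ci <- function(surv, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  cbind(lower = surv - z * se, upper = surv + z * se)
}

# Example 1 ("Maintained"), first event time t = 9: n = 11, d = 1
s9  <- 10 / 11
se9 <- s9 * sqrt(1 / (11 * 10))   # Greenwood standard deviation
ci  <- plain_ci(s9, se9)
ci                                # upper limit is above 1
pmin(pmax(ci, 0), 1)              # crude fix: truncate the limits to [0, 1]
```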
Confidence intervals for S(t)

Example 1. Acute Myelogenous Leukemia survival data
Survival in patients with Acute Myelogenous Leukemia:
time: survival or censoring time
status: censoring status
x: maintenance chemotherapy given? (factor)

R> library(survival)
R> ?aml
R> str(aml)
R> aml
Acute Myelogenous Leukemia survival data
Example 1. Acute Myelogenous Leukemia survival data
Survival in patients with Acute Myelogenous Leukemia

Consider only those patients for whom x is “Maintained”.

The data:
9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+

where + denotes right-censored observations.

R> aml2<-aml[aml$x=="Maintained",]
or, keeping only the time and status columns:
R> aml3<-aml[aml$x=="Maintained",c(1,2)]
Example 1. Acute Myelogenous Leukemia survival data
time  status  nj  dj  survival
9     1       11  1   1-1/11 = 10/11 = 0.909
13    1       10  1   10/11*(1-1/10) = 9/11 = 0.818
18    1       8   1   9/11*(1-1/8) = 63/88 = 0.716
23    1       7   1   63/88*(1-1/7) = 54/88 = 0.614
31    1       5   1   54/88*(1-1/5) = 27/55 = 0.491
34    1       4   1   27/55*(1-1/4) = 81/220 = 0.368
48    1       2   1   81/220*(1-1/2) = 81/440 = 0.184

$$\hat{S}(t) = \prod_{j:\, t_j \le t}\left(1 - \frac{d_j}{n_j}\right)$$

The first censored time is 13, so the empirical estimator equals the K-M estimator before t = 13.
Be careful: $d_j$ may be greater than 1 (tied event times)!
Example 1: Using R
Use survfit to compute the survival curve for censored data:

R> ?survfit
R> fit<-survfit(Surv(time,status)~1,data=aml2)
R> summary(fit)

time n.risk n.event survival std.err lower 95% CI upper 95% CI
   9     11       1    0.909  0.0867       0.7541        1.000
  13     10       1    0.818  0.1163       0.6192        1.000
  18      8       1    0.716  0.1397       0.4884        1.000
  23      7       1    0.614  0.1526       0.3769        0.999
  31      5       1    0.491  0.1642       0.2549        0.946
  34      4       1    0.368  0.1627       0.1549        0.875
  48      2       1    0.184  0.1535       0.0359        0.944
Example 1: Using R
The survfit object (see ?survfit.object) contains specific information about the survival curve:

R> ?survfit.object
R> summary(fit)$time # survival (event) times
R> summary(fit)$surv # estimates of survival
R> summary(fit,times=10)$surv

Use the plot function to obtain the graphical output:

R> plot(fit, xlab="Time",ylab="Survival")

See ?plot.survfit for the available plotting options:

R> ?plot.survfit
Example 1: Using R
Use attributes to see which components the object contains:

R> class(fit)
R> attributes(fit)
$names
[1] "n" "time" "n.risk" "n.event" "n.censor" "surv"
[7] "type" "std.err" "upper" "lower" "conf.type" "conf.int"
[13] "call"

$class
[1] "survfit"

Example 1: Using R
K-M with confidence intervals:
R> plot(fit, xlab="time", ylab="Survival")

Or without them:
R> plot(fit, xlab="time", ylab="Survival", conf.int=FALSE, main="Kaplan-Meier Estimator")

What about the empirical estimator? Treat every observation as an event:

R> fit2<-survfit(Surv(time,rep(1,length(time)))~1,data=aml2)
R> plot(fit2, xlab="time", ylab="Survival", conf.int=FALSE, main="Empirical Estimator")
Corresponding Kaplan-Meier Curve
Example 2: time-to-conception for subfertile women
The event of interest is conception. "Failure" here is a good thing! Women "survived" until they conceived.
Time: time-to-conception (in months).
38 women were treated for infertility with laparoscopy and hydrotubation.
All women were followed for up to 2 years to describe time-to-conception.
Example from: BMJ, Dec 1998; 317: 1572 – 1580 (Published 5 December 1998). http://www.bmj.com/content/317/7172/1572.full
The data
1,1,1,1,1,1,2,2,2,2,2,2+,3,3,3,3+,4,4,4,4+,6,6,7+,
7+,8+,8+,9,9,9,9+,9+,9+,10,11+,13,16,24+,24+
where + denotes right-censored observations.

There are 25 events and 13 right-censored observations.
Ordered distinct event times:
$t_1 = 1 < t_2 = 2 < t_3 = 3 < t_4 = 4 < t_5 = 6 < t_6 = 9 < t_7 = 10 < t_8 = 13 < t_9 = 16$.

What are the corresponding $d_j$ and $n_j$?
The data
1,1,1,1,1,1, 2,2,2,2,2,2+, 3,3,3,3+, 4,4,4,4+, 6,6,7+,7+, 8+,8+,9,9,9, 9+,9+,9+, 10,11+,13, 16,24+,24+

time   ni   di   ci   survival
1      38   6    0
2      32   5    1
3      26   3    1
4      22   3    1
6      18   2    0
7      16   0    2
8      14   0    2
9      12   3    3
10     6    1    0
11     5    0    1
13     4    1    0
16     3    1    0
24     2    0    2
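A base-R sketch that rebuilds the table above, including the survival column, from the `time` and `status` vectors used later in this example (status 0 marks the "+" times). Rows with $d_i = 0$ leave the survival value unchanged, and the value at time 7 should equal the 0.4825175 returned by the `KM` function in this example.

```r
# Conception data: time in months, status 1 = conception, 0 = censored
time   <- c(1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3, 4,4,4,4, 6,6, 7,7, 8,8,
            9,9,9,9,9,9, 10, 11, 13, 16, 24,24)
status <- c(1,1,1,1,1,1, 1,1,1,1,1,0, 1,1,1,0, 1,1,1,0, 1,1, 0,0, 0,0,
            1,1,1,0,0,0, 1, 0, 1, 1, 0,0)

tt <- sort(unique(time))   # all distinct observed times (events and censorings)
tab <- data.frame(
  time = tt,
  ni = vapply(tt, function(t) sum(time >= t), numeric(1)),               # at risk
  di = vapply(tt, function(t) sum(time == t & status == 1), numeric(1)), # events
  ci = vapply(tt, function(t) sum(time == t & status == 0), numeric(1))  # censored
)
tab$survival <- cumprod(1 - tab$di / tab$ni)   # factor is 1 when di = 0
tab
```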
Kaplan-Meier Curve (R)

Survival is a step function estimated at the 9 event times.

It only decreases at event times!
Using the R package
Use read.table to import the dataset (e.g., from a portable disk F:, or first use "change dir…" to set the working directory):
R> conception<-read.table("F:\\conception.txt",header=TRUE,dec=".")

Use survfit to compute the survival curve for censored data:
R> KM<-survfit(Surv(time,event)~1,data=conception)
R> summary(KM)

Use the plot function to plot R objects:
R> plot(KM, xlab="time", ylab="Survival")
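Using the R package: building your own functions
KM<-function(Stime,status,predict.time){
  # drop records with a missing time or status
  times <- Stime
  bad <- is.na(times) | is.na(status)
  times <- times[!bad]
  status <- status[!bad]
  if (sum(bad) > 0)
    cat(paste("\n", sum(bad), "records with missing values dropped. \n"))
  # sort the data by observed time
  ooo <- order(times)
  times <- times[ooo]
  status <- status[ooo]
  s0 <- 1
  # distinct observed times (already in increasing order) up to predict.time
  unique.t0 <- unique(times)
  n.times <- sum(unique.t0 <= predict.time)
  # product-limit: multiply by (1 - d/n) at each distinct time;
  # censoring-only times have d = 0 and leave s0 unchanged.
  # seq_len() avoids looping over 1:0 when n.times is 0.
  for (j in seq_len(n.times)) {
    n <- sum(times >= unique.t0[j])                    # number at risk
    d <- sum((times == unique.t0[j]) & (status == 1))  # number of events
    if (n > 0)
      s0 <- s0 * (1 - d/n)
  }
  return(s0)
}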
Using the R package: building your own functions
R> time<-c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,4,4,4,4,6,6,7,7,8,8,
9,9,9,9,9,9,10,11,13,16,24,24)
R> status<-c(1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,0,1,1,0,0,0,0,
1,1,1,0,0,0,1,0,1,1,0,0)
R> cbind(time,status)

R> KM(time,status,7)
[1] 0.4825175
R> system.time(KM(time,status,7))[3]
elapsed
0
Why use R
• The survival plot provides confidence bands.
• Estimators other than Kaplan-Meier are available (e.g., Fleming-Harrington).
• …
survfit(formula, data, weights, subset, na.action, newdata,
individual=F, conf.int=.95, se.fit=T,
type=c("kaplan-meier","fleming-harrington", "fh2"),
error=c("greenwood","tsiatis"),
conf.type=c("log","log-log","plain","none"),
conf.lower=c("usual", "peto", "modified"))
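As an illustration of the Fleming-Harrington alternative: our understanding is that `type="fleming-harrington"` estimates survival as $\exp(-\hat H(t))$, where $\hat H(t) = \sum_{t_j \le t} d_j/n_j$ is the Nelson-Aalen cumulative hazard. The base-R sketch below compares it with Kaplan-Meier on the Example 1 "Maintained" data (there are no tied event times there, so tie corrections such as `fh2` make no difference). Since $e^{-x} \ge 1 - x$, this estimate is never below the K-M estimate.

```r
# Kaplan-Meier vs. exp(-Nelson-Aalen) survival at each distinct event time
time   <- c(9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161)
status <- c(1,  1,  0,  1,  1,  0,  1,  1,  0,  1,   0)
tj <- sort(unique(time[status == 1]))
nj <- vapply(tj, function(t) sum(time >= t), numeric(1))
dj <- vapply(tj, function(t) sum(time == t & status == 1), numeric(1))

cmp <- data.frame(time = tj,
                  km = cumprod(1 - dj / nj),      # product-limit
                  fh = exp(-cumsum(dj / nj)))     # exp(-cumulative hazard)
round(cmp, 4)
```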
Pointwise confidence bands
conf.type: One of "none", "plain", "log" (the default), or "log-log".

R> fit<-survfit(Surv(time,status)~1,data=aml, conf.type="none")

R> plot(fit)

R> fit<-survfit(Surv(time,status)~1,data=aml, conf.type="log-log")

R> plot(fit)

R> fit<-survfit(Surv(time,status)~1,data=aml,conf.type="log-log",
type="fh2", conf.lower="peto",error="tsiatis")

R> plot(fit)
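A base-R sketch of what the "log" and "log-log" options compute, under the assumption that they are the usual delta-method intervals for $\log \hat S(t)$ and $\log(-\log \hat S(t))$ built from the Greenwood variance. It uses the Example 1 "Maintained" data rather than the full `aml` data above; the "log" limits should agree with the 95% CI columns of `summary(fit)` shown for that example (e.g. 0.7541 and 1.000 at t = 9).

```r
# Example 1 data (AML, "Maintained" group)
time   <- c(9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161)
status <- c(1,  1,  0,  1,  1,  0,  1,  1,  0,  1,   0)
tj <- sort(unique(time[status == 1]))
nj <- vapply(tj, function(t) sum(time >= t), numeric(1))
dj <- vapply(tj, function(t) sum(time == t & status == 1), numeric(1))
surv <- cumprod(1 - dj / nj)
# standard error of log S_hat (Greenwood sd divided by S_hat)
se_H <- sqrt(cumsum(dj / (nj * (nj - dj))))
z <- qnorm(0.975)

# "log": symmetric interval for log S_hat, back-transformed; capped at 1
log_lower <- exp(log(surv) - z * se_H)
log_upper <- pmin(exp(log(surv) + z * se_H), 1)

# "log-log": symmetric interval for log(-log S_hat), back-transformed,
# i.e. S_hat^exp(+/- z * se_H / |log S_hat|); stays inside (0, 1)
a <- exp(z * se_H / abs(log(surv)))
ll_lower <- surv^a
ll_upper <- surv^(1 / a)

round(cbind(time = tj, surv, log_lower, log_upper, ll_lower, ll_upper), 4)
```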
Comparing 2 groups: back to Example 1
Are these two curves different?

Use the log-rank test to test the null hypothesis of no difference between the survival curves of the two (or more) groups.
Comparing two groups: another example
Are these two curves different?
• Big drops at the end of the curve indicate that few patients are left.
• The curves are misleading to the eye: the apparent convergence by the end of the study is due to 5 observations that survived fairly long, and to 2 events in the treatment group (upper curve) when the sample size was small.
Example 1: Using R
Use survfit to compute the survival curves for censored data:

R> fit<-survfit(Surv(time,status)~x,data=aml)

R> plot(fit,lty=1:2,col=1:2,xlab="Time",ylab="Survival")

R> legend(c(75,150), c(0.8,1), legend=c("maintenance",
"nonmaintenance"), lty=1:2, col=1:2)
The corresponding plot

[Figure: Kaplan-Meier curves for the maintenance and nonmaintenance groups; x-axis: Time (0 to 150), y-axis: Survival (0.0 to 1.0).]
Hypothesis test to check for differences between groups
• Use the survdiff function:
R> survdiff(Surv(time,status)~x,data=aml)

                 N Observed Expected (O-E)^2/E (O-E)^2/V
x=Maintained    11        7    10.69      1.27      3.40
x=Nonmaintained 12       11     7.31      1.86      3.40

Chisq= 3.4 on 1 degrees of freedom, p= 0.0653

R> survdiff(Surv(time,status)~x,data=aml, rho=1)

With rho = 0 (the default) this is the log-rank or Mantel-Haenszel test, and with rho = 1 it is equivalent to the Peto & Peto modification of the Gehan-Wilcoxon test.
Hypothesis test to check for differences between groups
Test                      Chi-Square   df   p-value
Log Rank (Mantel-Cox)     3.4          1    0.0653
Gehan-Wilcoxon            2.8          1    0.0955

Test of equality of survival distributions for the different levels of maintenance chemotherapy given: a chi-square test (with 1 df for 2 groups) of the overall difference between the two groups. The difference between the two groups does not appear to be statistically significant (p-value > 0.05).

• The log-rank test (sometimes called the Mantel-Cox test) is a nonparametric test, appropriate when the data are right-skewed and censored.
• It is the most powerful test for differences that fit the proportional hazards model, so it works well as a set-up for a subsequent Cox regression.
• It gives equal weight to early and late failures.
• The Gehan-Wilcoxon test weights each stratum (event time) by its size, i.e., the number at risk, so it is more sensitive to differences at earlier time points.
Advantages and Limitations of Kaplan-Meier
• Commonly used to describe survival.
• Commonly used to compare two study populations.
• Intuitive graphical presentation.

Limitations:
• Mainly descriptive.
• Doesn't control for covariates.
• Can't accommodate time-dependent variables.
