Survival Analysis: Kaplan-Meier estimate
PhD. Alfredo Valencia-Toledo
Maestría en Estadística, Universidad Nacional de San Antonio Abad del Cusco
Research Group in Economic Analysis (RGEA), Universidade de Vigo
alfredo.valencia@unsaac.edu.pe
www.alfredo.valencia.webs.uvigo.es
November 24, 2017
Overview
Estimation of Survival
If there are no censored observations in a sample of
size n, the most natural estimator for survival
is the empirical estimator, given by
\[ \hat{S}(t) = \frac{1}{n} \sum_{i=1}^{n} I(t_i > t) \]
Kaplan-Meier estimate
Observed event times: k distinct event times $t_1 < \dots < t_j < \dots < t_k$.
When there are no censored data, the Kaplan-Meier
(K-M) estimator is the empirical estimator (simple and
intuitive).
For example, if you are following 10 patients and 8 of
them die by the end of the first year, then your best
estimate of S(1 year) is 2/10 = 20%.
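The 10-patient example can be checked numerically. The sketch below is only an illustration: the individual death times are hypothetical (the example gives counts only), and the helper names emp_surv and km_no_censoring are not from the slides. It evaluates the empirical estimator and the product-limit form with every observation treated as an event, and both give 0.2 at one year.

```r
# Hypothetical death times in years for 10 patients (illustrative values only):
# 8 die during the first year, the remaining 2 die later.
time <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 0.9, 1.5, 2.0)

# Empirical estimator: proportion of patients whose time exceeds t
emp_surv <- function(t, time) mean(time > t)

# Product-limit (K-M) form when every observation is an event
km_no_censoring <- function(t, time) {
  s <- 1
  for (tj in sort(unique(time[time <= t]))) {
    n_j <- sum(time >= tj)   # at risk just before tj
    d_j <- sum(time == tj)   # deaths at tj
    s <- s * (1 - d_j / n_j)
  }
  s
}

s_emp <- emp_surv(1, time)
s_km  <- km_no_censoring(1, time)
print(c(empirical = s_emp, kaplan_meier = s_km))
```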
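Different representation based on
Kaplan-Meier Weights (KMW)
Assume that $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$.
Let $Y_{(1)} \le \dots \le Y_{(n)}$ denote the ordered $Y_i$'s and $\delta_{[i]}$ the concomitant of $Y_{(i)}$.
\[ \hat{S}(t) = 1 - \sum_{i=1}^{n} W_i \, I(Y_{(i)} \le t) \]
\[ W_i = \frac{\delta_{[i]}}{n-i+1} \prod_{j=1}^{i-1} \left[ 1 - \frac{\delta_{[j]}}{n-j+1} \right] \]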
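A minimal sketch of the weights representation in R. The function names km_weights and km_from_weights are hypothetical, the four-observation sample is made up for illustration, and placing events before censored observations at tied times is an assumed convention.

```r
# Kaplan-Meier weights W_i for the ordered observations Y_(1) <= ... <= Y_(n)
km_weights <- function(y, delta) {
  n   <- length(y)
  ord <- order(y, -delta)            # sort by time; at ties, events before censored (assumed convention)
  y   <- y[ord]
  d   <- delta[ord]                  # concomitants delta_[i]
  haz <- d / (n - seq_len(n) + 1)    # delta_[i] / (n - i + 1)
  # product over j < i of (1 - delta_[j] / (n - j + 1))
  surv_before <- c(1, cumprod(1 - haz))[seq_len(n)]
  data.frame(y = y, delta = d, w = haz * surv_before)
}

# S(t) = 1 - sum of the weights of ordered observations with Y_(i) <= t
km_from_weights <- function(t, y, delta) {
  kw <- km_weights(y, delta)
  1 - sum(kw$w[kw$y <= t])
}

# Hypothetical sample: events at 2, 5 and 7; censoring at 3
y     <- c(2, 3, 5, 7)
delta <- c(1, 0, 1, 1)
kw <- km_weights(y, delta)
s5 <- km_from_weights(5, y, delta)
print(kw)
print(s5)
```

In this sample the largest time is an event, so the weights sum to 1; the censored observation receives weight 0.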
Survival Data
Five subjects (A to E) are followed over the study period (X = event, O = censored):
1. Subject E with event at 4 months.
2. Subject A drops out after 6 months.
3. Subject C with event at 7 months.
4. Subjects B and D survive for the whole study period.
Kaplan-Meier estimate:
Probability of surviving up to 4 months is 100% = 5/5.
After the event at 4 months (Kaplan-Meier weight W1 = 1/5), the probability of surviving is 4/5.
After the event of subject C at 7 months, survival = 8/15.
Kaplan-Meier weights: their sum is not always equal to 1. Kaplan-Meier is not a proper distribution function if the largest observed time is censored.
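The five-subject example can be reproduced with a few lines of R. This is a sketch under one assumption: the study is taken to end at 12 months for subjects B and D, a value not given in the slides (only the ordering of the times matters here).

```r
# Subjects ordered by observed time: E, A, C, then B and D.
# The 12-month end of study for B and D is an assumed value.
subject <- c("E", "A", "C", "B", "D")
y       <- c(4, 6, 7, 12, 12)
delta   <- c(1, 0, 1, 0, 0)       # 1 = event, 0 = censored

n   <- length(y)
i   <- seq_len(n)
haz <- delta / (n - i + 1)                 # delta_[i] / (n - i + 1)
w   <- haz * c(1, cumprod(1 - haz))[i]     # Kaplan-Meier weights W_i
surv <- 1 - cumsum(w)                      # K-M estimate just after each ordered time

print(data.frame(subject, y, delta, w, surv))
total_weight <- sum(w)
print(total_weight)
```

The weights are 1/5 for subject E and 4/15 for subject C, so survival after 7 months is 8/15. The total weight is 7/15, below 1, because the largest observed times are censored.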
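Greenwood estimator
\[ \hat{\sigma}^2\big(\hat{S}(t)\big) = \widehat{\mathrm{var}}\big(\hat{S}(t)\big) = \hat{S}(t)^2 \sum_{i:\, t_i \le t} \frac{d_i}{n_i (n_i - d_i)} \]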
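A hand computation of the Greenwood formula, as a sketch in base R. It uses the 11 maintained-group observations from the AML example in this deck, typed in directly; the variable names are illustrative.

```r
# Maintained group of the AML example (status 1 = event, 0 = censored)
time   <- c(9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161)
status <- c(1,  1,  0,  1,  1,  0,  1,  1,  0,  1,   0)

t_j <- sort(unique(time[status == 1]))                        # distinct event times
n_j <- sapply(t_j, function(t) sum(time >= t))                # number at risk
d_j <- sapply(t_j, function(t) sum(time == t & status == 1))  # number of events

surv <- cumprod(1 - d_j / n_j)                                # Kaplan-Meier estimate
# Greenwood: S(t)^2 * sum over event times up to t of d / (n * (n - d)).
# Note: the term is undefined when n_j == d_j (not the case in these data).
var_greenwood <- surv^2 * cumsum(d_j / (n_j * (n_j - d_j)))
se_greenwood  <- sqrt(var_greenwood)

greenwood <- data.frame(time = t_j, n.risk = n_j, n.event = d_j,
                        surv = surv, std.err = se_greenwood)
print(round(greenwood, 4))
```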
Confidence intervals for S(t)
We can use the Greenwood variance estimate
to derive a pointwise confidence interval at each time point t. Let σ
denote Greenwood's standard deviation, i.e. the square root of the
Greenwood variance estimate. A confidence interval for the survival
function at time t is then computed from the estimate of S(t) and σ.
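The interval formula itself does not appear in the text. As an assumption about what is intended, the simplest choice is the plain (linear) pointwise interval, with bounds truncated to [0, 1]:

```latex
\[
\hat{S}(t) \pm z_{1-\alpha/2}\, \hat{\sigma}\big(\hat{S}(t)\big),
\qquad
\hat{\sigma}\big(\hat{S}(t)\big) = \sqrt{\widehat{\mathrm{var}}\big(\hat{S}(t)\big)}
\]
% Worked case (95%, z = 1.96) with \hat{S} = 0.909 and \hat{\sigma} = 0.0867:
\[
0.909 \pm 1.96 \times 0.0867 = (0.739,\ 1.079) \;\rightarrow\; (0.739,\ 1)
\]
```

The worked case uses the first event time of the maintained group in the AML example (one event among 11 patients at risk).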
Example 1. Acute Myelogenous
Leukemia survival data
Survival in patients with Acute Myelogenous Leukemia
R> library(survival)
R> ?aml
R> str(aml)
R> aml
The data for the maintained group (+ denotes a right-censored time):
9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+
R> aml2<-aml[aml$x=="Maintained",]
or, keeping only the time and status columns,
R> aml3<-aml[aml$x=="Maintained",c(1,2)]
Example 1. Acute Myelogenous
Leukemia survival data
time  status  nj  dj  survival
9     1       11  1   1-1/11 = 10/11 = 0.909
13    1       10  1   10/11*(1-1/10) = 9/11 = 0.818
18    1       8   1   9/11*(1-1/8) = 63/88 = 0.716
23    1       7   1   63/88*(1-1/7) = 54/88 = 0.614
31    1       5   1   54/88*(1-1/5) = 27/55 = 0.491
34    1       4   1   27/55*(1-1/4) = 81/220 = 0.368
48    1       2   1   81/220*(1-1/2) = 81/440 = 0.184
\[ \hat{S}(t) = \prod_{j:\, t_j \le t} \left( 1 - \frac{d_j}{n_j} \right) \]
The first censored time is 13, so the empirical estimator is equal to K-M before t=13.
Be careful: dj may be >1!
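The product-limit table can be built by hand in base R. The helper km_table below is a hypothetical name, and the second data set is a made-up sample with tied event times, included only to show a case where dj is greater than 1.

```r
# Product-limit table: distinct event times, number at risk, events, survival
km_table <- function(time, status) {
  t_j <- sort(unique(time[status == 1]))                        # distinct event times
  n_j <- sapply(t_j, function(t) sum(time >= t))                # at risk just before t_j
  d_j <- sapply(t_j, function(t) sum(time == t & status == 1))  # events at t_j
  data.frame(time = t_j, nj = n_j, dj = d_j,
             survival = cumprod(1 - d_j / n_j))
}

# Maintained group of the AML example (status 1 = event, 0 = censored)
aml_time   <- c(9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161)
aml_status <- c(1,  1,  0,  1,  1,  0,  1,  1,  0,  1,   0)
tab_aml <- km_table(aml_time, aml_status)
print(round(tab_aml, 3))

# Hypothetical sample with two events tied at time 1 (dj = 2)
tied_time   <- c(1, 1, 2, 2, 3)
tied_status <- c(1, 1, 1, 0, 1)
tab_tied <- km_table(tied_time, tied_status)
print(tab_tied)
```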
Example 1: Using R
Use survfit to compute the survival curve for censored data:
R> ?survfit
R> fit<-survfit(Surv(time,status)~1,data=aml2)
R> summary(fit)
Use survfit.object to obtain specific information about the
survival curve:
R> ?survfit.object
R> summary(fit)$time # survival (event) times
R> summary(fit)$surv # estimates of survival
R> summary(fit,times=10)$surv # survival estimate at t=10
R> class(fit)
R> attributes(fit)
$names
[1] "n" "time" "n.risk" "n.event" "n.censor" "surv"
[7] "type" "std.err" "upper" "lower" "conf.type" "conf.int"
[13] "call"
$class
[1] "survfit"
K-M with confidence intervals
R> plot(fit, xlab="time", ylab="Survival")
Or without…
R> plot(fit, xlab="time", ylab="Survival", conf.int=FALSE,
main="Kaplan-Meier Estimator")
R> fit2<-survfit(Surv(time,rep(1,length(time)))~1,data=aml2)
R> plot(fit2, xlab="time", ylab="Survival", conf.int=FALSE,
main="Empirical Estimator")
[Figure: corresponding Kaplan-Meier curve]
Example 2: time-to-conception for
subfertile women
The event of interest is conception, so "failure" here is a
good thing: women "survive" until they conceive.
Time: time-to-conception (in months).
38 women were treated for infertility with laparoscopy
and hydrotubation.
All women were followed for up to 2 years to describe
time-to-conception.
Example from: BMJ, Dec 1998; 317: 1572 – 1580 (Published 5
December 1998). http://www.bmj.com/content/317/7172/1572.full
The data
1,1,1,1,1,1,2,2,2,2,2,2+,3,3,3,3+,4,4,4,4+,6,6,7+,
7+,8+,8+,9,9,9,9+,9+,9+,10,11+,13,16,24+,24+
where + denotes right-censored observations.
25 events
13 (right) censored observations.
Event times (ordered):
t1=1< t2=2< t3=3< t4=4< t5=6< t6=9< t7=10< t8=13< t9=16.
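The counts above can be recovered directly from the "+" notation. The following sketch parses the listed observations into time and status vectors in base R; the variable names are illustrative.

```r
# Observations as listed above; "+" marks a right-censored time
raw <- paste0("1,1,1,1,1,1,2,2,2,2,2,2+,3,3,3,3+,4,4,4,4+,6,6,7+,",
              "7+,8+,8+,9,9,9,9+,9+,9+,10,11+,13,16,24+,24+")
obs <- strsplit(raw, ",")[[1]]

status <- ifelse(grepl("+", obs, fixed = TRUE), 0, 1)   # 0 = censored, 1 = event
time   <- as.numeric(sub("+", "", obs, fixed = TRUE))   # strip the "+" marker

n_events    <- sum(status == 1)
n_censored  <- sum(status == 0)
event_times <- sort(unique(time[status == 1]))

print(c(n = length(time), events = n_events, censored = n_censored))
print(event_times)
```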
Using the R package
Use read.table to import the dataset (e.g. from a
portable drive F:, or use "Change dir…" first):
R> conception<-read.table("F:\\conception.txt",header=TRUE,dec=".")
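Using the R package: building
your own functions
# The opening lines of KM are missing from the source; the header and the
# initialisation below are an assumed reconstruction (t0 is a hypothetical name).
KM <- function(times, status, t0) {
  unique.t0 <- unique(times[times <= t0])   # distinct observed times up to t0
  n.times <- length(unique.t0)
  s0 <- 1
  for (j in seq_len(n.times)) {
    n <- sum(times >= unique.t0[j])                     # number at risk
    d <- sum((times == unique.t0[j]) & (status == 1))   # events at this time
    if (n > 0)
      s0 <- s0 * (1 - d/n)
  }
  return(s0)
}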
R> time<-c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,4,4,4,4,6,6,7,7,8,8,
9,9,9,9,9,9,10,11,13,16,24,24)
R> status<-c(1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,0,1,1,0,0,0,0,
1,1,1,0,0,0,1,0,1,1,0,0)
R> cbind(time,status)
R> KM(time,status,7)
[1] 0.4825175
R> system.time(KM(time,status,7))[3]
elapsed
0
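KM(time,status,7) gives the estimate at a single time point. One possible extension, sketched here with base R's stepfun (an addition, not part of the original slides), builds the whole curve as a right-continuous step function that can be evaluated at several times at once. The name km_step is hypothetical.

```r
time <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,4,4,4,4,6,6,7,7,8,8,
          9,9,9,9,9,9,10,11,13,16,24,24)
status <- c(1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,0,1,1,0,0,0,0,
            1,1,1,0,0,0,1,0,1,1,0,0)

t_j <- sort(unique(time[status == 1]))                        # distinct event times
n_j <- sapply(t_j, function(t) sum(time >= t))                # number at risk
d_j <- sapply(t_j, function(t) sum(time == t & status == 1))  # number of events
surv <- cumprod(1 - d_j / n_j)

# Step function equal to 1 before the first event time and to surv[j]
# on [t_j, t_{j+1}); stepfun is right-continuous by default.
km_step <- stepfun(t_j, c(1, surv))

est <- km_step(c(0.5, 1, 7, 24))
print(est)
```

The third value agrees with the 0.4825175 returned by KM(time,status,7).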
Why use R
The survival plot provides confidence bands.
Estimators other than Kaplan-Meier are
available (e.g. Fleming-Harrington)
…
survfit(formula, data, weights, subset, na.action, newdata,
individual=F, conf.int=.95, se.fit=T,
type=c("kaplan-meier","fleming-harrington", "fh2"),
error=c("greenwood","tsiatis"),
conf.type=c("log","log-log","plain","none"),
conf.lower=c("usual", "peto", "modified"))
Pointwise confidence bands
conf.type: One of "none", "plain", "log" (the default), or "log-log".
R> plot(fit)
R> fit<-survfit(Surv(time,status)~1,data=aml,conf.type="log-log",
type="fh2", conf.lower="peto",error="tsiatis")
R> plot(fit)
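To see how the three conf.type options differ, the sketch below computes the 95% pointwise interval by hand at a single time point: the first event time (t = 9) of the maintained AML group, where one of 11 patients has an event. The formulas are the standard plain, log and log-log transformations written out in base R as an illustration, not code taken from the survival package.

```r
# K-M estimate and Greenwood standard error at t = 9 (1 event, 11 at risk)
s  <- 10 / 11
se <- s * sqrt(1 / (11 * 10))
z  <- qnorm(0.975)                     # 95% two-sided

clip <- function(x) pmin(pmax(x, 0), 1)   # keep bounds inside [0, 1]

# plain: S +/- z * se
ci_plain <- clip(s + c(-1, 1) * z * se)
# log: interval built on log S, where se(log S) = se / S
ci_log <- clip(s * exp(c(-1, 1) * z * se / s))
# log-log: interval built on log(-log S), where the se is se / (S * |log S|)
se_loglog <- se / (s * abs(log(s)))
ci_loglog <- s^exp(c(1, -1) * z * se_loglog)

ci <- rbind(plain = ci_plain, log = ci_log, "log-log" = ci_loglog)
colnames(ci) <- c("lower", "upper")
print(round(ci, 3))
```

Only the log-log interval stays strictly inside (0, 1) without truncation, which is one reason it is sometimes preferred near the ends of the curve.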
Comparing 2 groups: back to
Example 1
Are these two curves different?
Misleading to the eye: apparent convergence by the end of the study. This is due to 5 observations who survived fairly long, and 2 events in the treatment group (upper curve) when the sample size was small.
Big drops at the end of the curve indicate few patients left.
Example 1: Using R
Use survfit to compute the survival curve for
censored data:
R> fit<-survfit(Surv(time,status)~x,data=aml)
R> plot(fit,lty=1:2,col=1:2,xlab="Time",ylab="Survival")
The corresponding plot
[Figure: Kaplan-Meier curves for the maintenance and nonmaintenance groups; x-axis: Time (0 to 150), y-axis: Survival (0.0 to 1.0)]
Hypothesis test to check for
differences between groups
Use the survdiff function:
R> survdiff(Surv(time,status)~x,data=aml)
The difference between the two groups appears not to be statistically significant (p-value>0.05).
Gehan-Wilcoxon weights strata by their size, so it is more sensitive to differences at earlier time points.
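To make the test concrete, here is a hand-rolled sketch of the two-group statistic in base R. The function weighted_logrank is a hypothetical helper, not part of the survival package, and the aml observations are typed in by hand so that the block runs on its own. With unit weights it is the log-rank test; with weights equal to the number at risk it is the Gehan-Wilcoxon version.

```r
# aml data: 11 maintained, then 12 nonmaintained (status 1 = event, 0 = censored)
time <- c(9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161,
          5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45)
status <- c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0,
            1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1)
group <- rep(c("Maintained", "Nonmaintained"), times = c(11, 12))

weighted_logrank <- function(time, status, group, weights = c("logrank", "gehan")) {
  weights <- match.arg(weights)
  g1  <- group == unique(group)[1]                   # indicator of the first group
  t_j <- sort(unique(time[status == 1]))             # distinct event times, both groups
  n   <- sapply(t_j, function(t) sum(time >= t))     # at risk, both groups
  n1  <- sapply(t_j, function(t) sum(time >= t & g1))
  d   <- sapply(t_j, function(t) sum(time == t & status == 1))
  d1  <- sapply(t_j, function(t) sum(time == t & status == 1 & g1))
  e1  <- d * n1 / n                                  # expected events in group 1
  # hypergeometric variance; set to 0 when only one subject is at risk
  v   <- ifelse(n > 1, d * n1 * (n - n1) * (n - d) / (n^2 * (n - 1)), 0)
  w   <- if (weights == "logrank") rep(1, length(t_j)) else n
  chisq <- sum(w * (d1 - e1))^2 / sum(w^2 * v)
  list(observed = sum(d1), expected = sum(e1), chisq = chisq,
       p.value = pchisq(chisq, df = 1, lower.tail = FALSE))
}

lr <- weighted_logrank(time, status, group)                     # log-rank
gw <- weighted_logrank(time, status, group, weights = "gehan")  # Gehan-Wilcoxon
print(unlist(lr))
print(unlist(gw))
```

For the log-rank version, the maintained group has 7 observed events against about 10.69 expected, and the p-value is above 0.05, in line with the conclusion above.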
Advantages and Limitations of Kaplan-Meier
Commonly used to describe survivorship.
Commonly used to compare two study populations.
Intuitive graphical presentation.
Limitations:
Mainly descriptive.
Doesn’t control for covariates.
Can’t accommodate time-dependent variables.