Survival Analysis: Kaplan-Meier estimate
“Maestría en Estadística”
“Universidad Nacional de San Antonio Abad del Cusco”
PhD. Alfredo Valencia-Toledo*1
*Research Group in Economic Analysis (RGEA), Universidade de Vigo
November 24, 2017
1
alfredo.valencia@unsaac.edu.pe
www.alfredo.valencia.webs.uvigo.es
Overview
• Introduction to Kaplan-Meier methods.
• Different representations.
• Comparing survival curves: the log-rank test.
• Using R.
Estimation of Survival
If there are no censored observations in a sample of size n, the most natural estimator of survival is the empirical estimator, given by
$$\hat{S}(t) = \frac{1}{n}\sum_{i=1}^{n} I(t_i > t),$$
the proportion of observations with failure times > t.

Alternative methods are needed to incorporate censoring, where the recorded time for some subjects is a censoring time rather than an event time.
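As a minimal sketch of the empirical estimator, the R function below returns the proportion of failure times that exceed t. The function name `emp_surv` and the five failure times are hypothetical and serve only as an illustration; the sketch assumes that no observation is censored.

```r
# Empirical survival estimator for a sample WITHOUT censoring:
# the proportion of observed failure times strictly greater than t.
emp_surv <- function(times, t) {
  mean(times > t)
}

# Hypothetical failure times (in months), for illustration only
failure_times <- c(2, 5, 7, 9, 12)

emp_surv(failure_times, 6)   # 3 of the 5 failure times exceed 6
emp_surv(failure_times, 0)   # every subject is still event-free at time 0
```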
Introduction to Kaplan-Meier
• Non-parametric estimate of the survival function.
• Commonly used to describe the survival of one or more populations.
• Commonly used to compare two or more populations.
• Intuitive graphical presentation.
Kaplan-Meier estimate
Kaplan and Meier (1958) obtained a nonparametric estimate of the survival function, called the product-limit estimator, which is the generalization of the empirical estimator for censored data:
$$\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right)$$
Let $t_1 < \dots < t_j < \dots < t_k$ be the k distinct observed event times. At each event time $t_j$ there are $n_j$ individuals at risk.
• $n_j$ is the number of individuals at risk at time $t_j$. It consists of the original sample minus all those who have been censored or had the event before $t_j$.
• $d_j$ is the number who have the event at time $t_j$.
• $d_j/n_j$ = proportion that failed at the event time $t_j$.
• $1 - d_j/n_j$ = proportion surviving the event time $t_j$.
$$\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right)$$
$\hat{S}(t)$ represents the estimated survival probability at time t: P(T > t).
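The definitions of $n_j$ and $d_j$ translate almost directly into code. The sketch below is one possible base-R implementation, not the routine used by any package; the function name `km_estimate` and the six observations are hypothetical, and status is assumed to be coded 1 for an event and 0 for a censored time.

```r
# Product-limit (Kaplan-Meier) estimate at each distinct event time.
# time: observed times; status: 1 = event, 0 = censored (assumed coding).
km_estimate <- function(time, status) {
  event_times <- sort(unique(time[status == 1]))
  # n_j: subjects whose observed time is at least t_j (still at risk)
  n_risk <- sapply(event_times, function(tj) sum(time >= tj))
  # d_j: subjects who have the event exactly at t_j
  n_event <- sapply(event_times, function(tj) sum(time == tj & status == 1))
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event,
             surv = cumprod(1 - n_event / n_risk))
}

# Hypothetical sample: two of the six times (3 and 8) are censored
toy_time   <- c(2, 3, 3, 5, 8, 8)
toy_status <- c(1, 0, 1, 1, 0, 1)

km_toy <- km_estimate(toy_time, toy_status)
km_toy
```

The censored subjects never contribute to `n.event`, but they stay in `n.risk` until their censoring time, which is how the estimator uses the partial information they carry.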
• When there are no censored data, the Kaplan-Meier (K-M) estimator is the empirical estimator (simple and intuitive). For example, if you are following 10 patients and 8 of them die by the end of the first year, then your best estimate of S(1 year) is 20%.
• When there are censored data, K-M provides an estimate of S(t) that takes censoring into account.
• K-M remains unchanged at censored times.
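The first claim can be checked with a short derivation. It assumes no censoring, so that the risk set shrinks only through events, $n_{j+1} = n_j - d_j$ with $n_1 = n$; the index $m$ (the last event time not exceeding $t$) is notation introduced here for convenience.

```latex
\hat{S}(t)
  = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right)
  = \prod_{j=1}^{m} \frac{n_j - d_j}{n_j}
  = \prod_{j=1}^{m} \frac{n_{j+1}}{n_j}
  = \frac{n_{m+1}}{n_1}
  = \frac{1}{n} \sum_{i=1}^{n} I(t_i > t)
```

The product telescopes, and $n_{m+1}$ is exactly the number of subjects whose failure time exceeds $t$. In the 10-patient example this gives $(10 - 8)/10 = 0.2$, whatever the times of the 8 deaths within the first year.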
Different representation based on
Kaplan-Meier Weights (KMW)
Assume that $Y_i = \min(T_i, C_i)$ and $\delta_i = I(T_i \le C_i)$.
Let $Y_{(1)} \le \dots \le Y_{(n)}$ denote the ordered $Y_i$'s and $\delta_{[i]}$ the concomitant of $Y_{(i)}$. Then
$$\hat{S}(t) = 1 - \sum_{i=1}^{n} W_i \, I(Y_{(i)} \le t), \qquad W_i = \frac{\delta_{[i]}}{n-i+1} \prod_{j=1}^{i-1} \left(1 - \frac{\delta_{[j]}}{n-j+1}\right),$$
where ties within the censored or within the uncensored times are ordered arbitrarily, and ties between uncensored and censored times are treated as if the former precede the latter.
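A small sketch may help to see how the weights behave. The function `km_weights` below is a hypothetical helper written for this illustration. The data are the five subjects A to E of the example in these slides (E has the event at 4 months, A drops out at 6, C has the event at 7, B and D complete the study); the end-of-study time of 12 months is an assumption, since the slides do not state it.

```r
# Kaplan-Meier weights W_i for right-censored data.
# y: observed times min(T, C); delta: 1 = event, 0 = censored.
km_weights <- function(y, delta) {
  n <- length(y)
  # Order by time; at tied times, events (delta = 1) precede censored ones
  ord <- order(y, -delta)
  d <- delta[ord]
  # Factors 1 - delta_[j] / (n - j + 1) for j = 1, ..., n
  factors <- 1 - d / (n - seq_len(n) + 1)
  # Product over j < i (the empty product for i = 1 equals 1)
  lagged_prod <- c(1, cumprod(factors)[-n])
  w <- d / (n - seq_len(n) + 1) * lagged_prod
  list(time = y[ord], weight = w)
}

# Subjects A-E; the end-of-study time (12 months) is an assumption
y     <- c(A = 6, B = 12, C = 7, D = 12, E = 4)
delta <- c(A = 0, B = 0,  C = 1, D = 0,  E = 1)

kmw <- km_weights(y, delta)
kmw$weight                            # weights in time order: E, A, C, B, D
1 - sum(kmw$weight[kmw$time <= 7])    # estimated S(7)
sum(kmw$weight)                       # below 1: the largest times are censored
```

Censored observations receive weight zero, and the weight they would have carried is passed on to later event times through the product term.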
An example of right-censored data
[Figure: follow-up timelines for Subjects A to E, from the beginning of the study to the end of the study or follow-up; X marks an event.]
1. Subject E has the event at 4 months.
Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%.]
• The probability of surviving to 4 months is 100% = 5/5.
• Subject E has the event at 4 months (Kaplan-Meier weight W1 = 1/5); the probability of surviving drops to 4/5.
Survival Data
[Figure: follow-up timelines for Subjects A to E; X marks an event, O a censored observation.]
1. Subject E has the event at 4 months.
2. Subject A drops out after 6 months.
3. Subject C has the event at 7 months.
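Curve
[Figure: Kaplan-Meier curve starting at 100%.]
• Subject A drops out at 6 months: the Kaplan-Meier weight W2 is null and the curve does not drop.
• Subject C has the event at 7 months (weight W3), with 3 subjects still at risk: survival = 4/5 × (1 − 1/3) = 8/15.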
Survival Data
[Figure: follow-up timelines for Subjects A to E; X marks an event, O a censored observation.]
1. Subject E has the event at 4 months.
2. Subject A drops out after 6 months.
3. Subject C has the event at 7 months.
4. Subjects B and D survive for the whole study period.
Curve
[Figure: Kaplan-Meier curve starting at 100%, with its weights.]
The Kaplan-Meier weights do not always sum to 1: the Kaplan-Meier estimator is not a proper distribution function if the largest observed time is censored.
Greenwood estimator
$$\hat{\sigma}^2\big(\hat{S}(t)\big) = \widehat{\operatorname{var}}\big(\hat{S}(t)\big) = \hat{S}(t)^2 \sum_{i:\, t_i \le t} \frac{d_i}{n_i \,(n_i - d_i)}$$
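Greenwood's formula is a cumulative sum scaled by the squared survival estimate, so it can be sketched in a few lines of base R. The helper name `greenwood_se` is hypothetical; the risk-set counts are those of the "Maintained" AML group analysed in Example 1 of these slides, so the result can be compared with the `std.err` column that `summary(fit)` reports there.

```r
# Greenwood's formula evaluated at each event time.
# n_risk[j] = number at risk, n_event[j] = number of events at event time j.
greenwood_se <- function(n_risk, n_event) {
  surv <- cumprod(1 - n_event / n_risk)
  var_surv <- surv^2 * cumsum(n_event / (n_risk * (n_risk - n_event)))
  data.frame(surv = surv, std.err = sqrt(var_surv))
}

# Risk-set counts of the "Maintained" AML group (Example 1 of these slides)
n_risk  <- c(11, 10, 8, 7, 5, 4, 2)
n_event <- c(1, 1, 1, 1, 1, 1, 1)

gw <- greenwood_se(n_risk, n_event)
round(gw, 4)
```

The sketch does not handle the case in which everybody still at risk has the event (`n_risk == n_event`), where the last term of the sum is undefined.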
Confidence intervals for S(t)
We can use Greenwood's variance estimate to derive a pointwise confidence interval at each time point t. Let $\hat{\sigma}$ denote Greenwood's standard deviation. Confidence intervals for the survival function are then computed as follows:
$$\hat{S}(t) \pm z_{1-\alpha/2}\,\hat{\sigma}$$
However, these confidence limits may fall outside the (0,1) interval!
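The overshoot is easy to reproduce. The sketch below uses the first event time of the "Maintained" AML group from Example 1 (1 event among 11 subjects at risk). The interval on the log scale is shown as one assumed remedy; it appears to correspond to what `conf.type = "log"` produces in `survfit`, since it reproduces the lower limit printed in the Example 1 output.

```r
# Pointwise 95% limits for S(t) from an estimate s_hat and its
# Greenwood standard error se.
z <- qnorm(0.975)

# First event time of the "Maintained" AML group: 1 event among 11 at risk
s_hat <- 1 - 1 / 11
se    <- s_hat * sqrt(1 / (11 * (11 - 1)))

# Plain interval: S +/- z * se (the upper limit can exceed 1)
plain <- c(lower = s_hat - z * se, upper = s_hat + z * se)

# Interval built on the log scale and truncated at 1
# (an assumed alternative, not taken from the slide above)
log_ci <- c(lower = exp(log(s_hat) - z * se / s_hat),
            upper = min(1, exp(log(s_hat) + z * se / s_hat)))

round(plain, 4)
round(log_ci, 4)
```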
Example 1. Acute Myelogenous
Leukemia survival data
Survival in patients with Acute Myelogenous Leukemia.
• time: survival or censoring time
• status: censoring status
• x: maintenance chemotherapy given? (factor)
R> library(survival)
R> ?aml
R> str(aml)
R> aml
Survival in patients with Acute Myelogenous Leukemia
Consider only those patients for whom x is "Maintained".
The data:
9, 13, 13+, 18, 23, 28+, 31, 34, 45+, 48, 161+
where + denotes right-censored observations.
R> aml2<-aml[aml$x=="Maintained",]
or just
R> aml3<-aml[aml$x=="Maintained",c(1,2)]
time  status  nj  dj  survival
9     1       11  1   1-1/11 = 10/11 = 0.909
13    1       10  1   10/11*(1-1/10) = 9/11 = 0.818
18    1       8   1   9/11*(1-1/8) = 63/88 = 0.716
23    1       7   1   63/88*(1-1/7) = 54/88 = 0.614
31    1       5   1   54/88*(1-1/5) = 27/55 = 0.491
34    1       4   1   27/55*(1-1/4) = 81/220 = 0.368
48    1       2   1   81/220*(1-1/2) = 81/440 = 0.184
$$\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right)$$
The first censored time is 13, so the empirical estimator is equal to K-M before t=13.
Be careful: dj may be >1!
Example 1: Using R
Use survfit to compute the survival curve for censored data:
R> ?survfit
R> fit<-survfit(Surv(time,status)~1,data=aml2)
R> summary(fit)
time n.risk n.event survival std.err lower 95% CI upper 95% CI
   9     11       1    0.909  0.0867       0.7541        1.000
  13     10       1    0.818  0.1163       0.6192        1.000
  18      8       1    0.716  0.1397       0.4884        1.000
  23      7       1    0.614  0.1526       0.3769        0.999
  31      5       1    0.491  0.1642       0.2549        0.946
  34      4       1    0.368  0.1627       0.1549        0.875
  48      2       1    0.184  0.1535       0.0359        0.944
Example 1: Using R
Use survfit.object to obtain specific information about the
survival curve:
R> ?survfit.object
R> summary(fit)$time # survival (event) times
R> summary(fit)$surv # estimates of survival
R> summary(fit,times=10)$surv # estimate of survival at t=10
Use the plot function to obtain the graphical output:
R> plot(fit, xlab="Time",ylab="Survival")
See plot.survfit for the options that control the plot:
R> ?plot.survfit
Example 1: Using R
Use attributes to know what can be obtained from the object:
R> class(fit)
R> attributes(fit)
$names
[1] "n" "time" "n.risk" "n.event" "n.censor" "surv"
[7] "type" "std.err" "upper" "lower" "conf.type" "conf.int"
[13] "call"
$class
[1] "survfit"
Example 1: Using R
K-M with confidence intervals
R> plot(fit, xlab="time",ylab="Survival")
Or without…
R> plot(fit, xlab="time",ylab="Survival", conf.int=FALSE, main="Kaplan-Meier Estimator")
What about the empirical estimator? Treat every observed time as an event:
R> fit2<-survfit(Surv(time,rep(1,length(time)))~1,data=aml2)
R> plot(fit2,xlab="time",ylab="Survival", conf.int=FALSE, main="Empirical Estimator")
Example 2: time-to-conception for
subfertile women
The event of interest is conception. "Failure" here is a good thing, and women "survive" until they conceive.
Time: time-to-conception (in months).
38 women were treated for infertility with laparoscopy and hydrotubation.
All women were followed for up to 2 years to describe time-to-conception.
Example from: BMJ 1998; 317: 1572-1580 (published 5 December 1998). http://www.bmj.com/content/317/7172/1572.full
The data
1,1,1,1,1,1,2,2,2,2,2,2+,3,3,3,3+,4,4,4,4+,6,6,7+,
7+,8+,8+,9,9,9,9+,9+,9+,10,11+,13,16,24+,24+
where + denotes right-censored observations.
25 events
13 (right) censored observations.
Event times (ordered):
t1=1 < t2=2 < t3=3 < t4=4 < t5=6 < t6=9 < t7=10 < t8=13 < t9=16.
What about dk and nk?
The data
1,1,1,1,1,1,2,2,2,2,2,2+,3,3,3,3+,4,4,4,4+,6,6,7+,7+,8+,8+,9,9,9,9+,9+,9+,10,11+,13,16,24+,24+

time  ni  di  ci  survival
1     38  6   0
2     32  5   1
3     26  3   1
4     22  3   1
6     18  2   0
7     16  0   2
8     14  0   2
9     12  3   3
10    6   1   0
11    5   0   1
13    4   1   0
16    3   1   0
24    2   0   2
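The columns ni, di and ci can be rebuilt from the raw data, which also fills in the survival column. The following is a base-R sketch under the assumption that status is coded 1 for conception and 0 for a censored time (the same coding used later in these slides); the object names `t_obs` and `risk_table` are chosen here for illustration.

```r
# Time-to-conception data; status: 1 = conception (event), 0 = censored (+)
time <- c(1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3, 4,4,4,4, 6,6, 7,7, 8,8,
          9,9,9,9,9,9, 10, 11, 13, 16, 24,24)
status <- c(1,1,1,1,1,1, 1,1,1,1,1,0, 1,1,1,0, 1,1,1,0, 1,1, 0,0, 0,0,
            1,1,1,0,0,0, 1, 0, 1, 1, 0,0)

t_obs <- sort(unique(time))
risk_table <- data.frame(
  time = t_obs,
  ni = sapply(t_obs, function(tj) sum(time >= tj)),                # at risk
  di = sapply(t_obs, function(tj) sum(time == tj & status == 1)),  # events
  ci = sapply(t_obs, function(tj) sum(time == tj & status == 0))   # censored
)
# Product-limit estimate; rows with di = 0 leave the curve unchanged
risk_table$survival <- cumprod(1 - risk_table$di / risk_table$ni)
risk_table
```

Each ni equals the previous ni minus the previous di and ci, which is a quick consistency check on a hand-built table.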
Kaplan-Meier Curve (R)
[Figure: Kaplan-Meier curve for time-to-conception.]
Survival is a step function, estimated at 9 event times.
It only decreases at event times!
Using the R package
Use read.table to import the dataset (e.g. from a portable disk F:, or change the working directory first):
R> conception<-read.table("F:\\conception.txt",header=TRUE,dec=".")
Use survfit to compute the survival curve for censored data:
R> KM<-survfit(Surv(time,event)~1,data=conception)
R> summary(KM)
Use the plot function to plot the fitted object:
R> plot(KM, xlab="time", ylab="Survival")
Using the R package: building
your own functions
KM<-function(Stime,status,predict.time){
  # Kaplan-Meier estimate of S(predict.time); status: 1 = event, 0 = censored
  times <- Stime
  # drop records with a missing time or status
  bad <- is.na(times) | is.na(status)
  times <- times[!bad]
  status <- status[!bad]
  if (sum(bad) > 0)
    cat(paste("\n", sum(bad), "records with missing values dropped. \n"))
  # sort the observations by time
  ooo <- order(times)
  times <- times[ooo]
  status <- status[ooo]
  s0 <- 1
  unique.t0 <- sort(unique(times))
  n.times <- sum(unique.t0<=predict.time)
  # seq_len() gives an empty loop when predict.time precedes every observed time
  for (j in seq_len(n.times)) {
    n <- sum(times >= unique.t0[j])                     # number at risk
    d <- sum((times == unique.t0[j]) & (status == 1))   # number of events
    if (n > 0)
      s0 <- s0 * (1 - d/n)
  }
  return(s0)
}
Using the R package: building
your own functions
R> time<-c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,4,4,4,4,6,6,7,7,8,8,
9,9,9,9,9,9,10,11,13,16,24,24)
R> status<-c(1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,0,1,1,0,0,0,0,
1,1,1,0,0,0,1,0,1,1,0,0)
R> cbind(time,status)
R> KM(time,status,7)
[1] 0.4825175
R> system.time(KM(time,status,7))[3]
elapsed
0
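As a cross-check, the same value can be obtained from `survfit`. This sketch assumes the survival package is installed and evaluates the fitted curve at t=7 through the `times` argument of `summary`; the object names `fit_check` and `s7` are chosen here for illustration.

```r
library(survival)

# Same time-to-conception data as above (status: 1 = event, 0 = censored)
time <- c(1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,4,4,4,4,6,6,7,7,8,8,
          9,9,9,9,9,9,10,11,13,16,24,24)
status <- c(1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,0,1,1,1,0,1,1,0,0,0,0,
            1,1,1,0,0,0,1,0,1,1,0,0)

fit_check <- survfit(Surv(time, status) ~ 1)
# Kaplan-Meier estimate of S(7), to compare with KM(time,status,7)
s7 <- summary(fit_check, times = 7)$surv
s7
```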
Why use R
• The survival plot provides confidence bands.
• Methods other than Kaplan-Meier are available (e.g. Fleming-Harrington).
• …
survfit(formula, data, weights, subset, na.action, newdata, individual=F, conf.int=.95, se.fit=T, type=c("kaplan-meier","fleming-harrington", "fh2"), error=c("greenwood","tsiatis"), conf.type=c("log","log-log","plain","none"), conf.lower=c("usual", "peto", "modified"))
Pointwise confidence bands
conf.type: One of "none", "plain", "log" (the default), or "log-log".
R> fit<-survfit(Surv(time,status)~1,data=aml, conf.type="none")
R> plot(fit)
R> fit<-survfit(Surv(time,status)~1,data=aml, conf.type="log-log")
R> plot(fit)
R> fit<-survfit(Surv(time,status)~1,data=aml,conf.type="log-log", type="fh2", conf.lower="peto",error="tsiatis")
R> plot(fit)
Comparing 2 groups: back to
Example 1
[Figure: survival curves of the two groups in Example 1.]
Are these two curves different?
Use the log-rank test to test the null hypothesis of no difference between the survival curves of the two (or more) groups.
Comparing two groups: another
example
[Figure: survival curves of two groups.]
Are these two curves different?
• Big drops at the end of the curve indicate few patients left.
• Misleading to the eye: apparent convergence by the end of the study. This is due to 5 subjects who survived fairly long, and 2 events in the treatment group (upper curve) when the sample size was small.
Example 1: Using R
Use survfit to compute the survival curve for
censored data:
R> fit<-survfit(Surv(time,status)~x,data=aml)
R> plot(fit,lty=1:2,col=1:2,xlab="Time",ylab="Survival")
R> legend(c(75,150), c(0.8,1), legend=c("maintenance", "nonmaintenance"), lty=1:2, col=1:2)
The corresponding plot
[Figure: Kaplan-Meier curves of Survival (0.0 to 1.0) against Time (0 to 150) for the maintenance and nonmaintenance groups.]
Hypothesis test to check for
differences between groups
• Use the survdiff function:
R> survdiff(Surv(time,status)~x,data=aml)
                 N Observed Expected (O-E)^2/E (O-E)^2/V
x=Maintained    11        7    10.69      1.27      3.40
x=Nonmaintained 12       11     7.31      1.86      3.40
Chisq= 3.4 on 1 degrees of freedom, p= 0.0653
R> survdiff(Surv(time,status)~x,data=aml, rho=1)
With rho = 0 (the default) this is the log-rank or Mantel-Haenszel test, and with rho = 1 it is equivalent to the Peto & Peto modification of the Gehan-Wilcoxon test.
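The Observed and Expected columns and the chi-square statistic can be reproduced by hand, which shows what the log-rank test actually compares. The sketch below is a base-R illustration for two groups, not the code used inside `survdiff`; the data are typed in from the `aml` data set of the survival package (the slides list only the "Maintained" times, so the "Nonmaintained" values are taken from that data set).

```r
# AML data as in survival::aml (status: 1 = event, 0 = censored)
time <- c(9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161,
          5, 5, 8, 8, 12, 16, 23, 27, 30, 33, 43, 45)
status <- c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0,
            1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1)
group <- rep(c("Maintained", "Nonmaintained"), times = c(11, 12))

event_times <- sort(unique(time[status == 1]))
O <- 0; E <- 0; V <- 0   # observed, expected, variance for "Maintained"
for (tj in event_times) {
  n  <- sum(time >= tj)                               # at risk, both groups
  n1 <- sum(time >= tj & group == "Maintained")       # at risk, Maintained
  d  <- sum(time == tj & status == 1)                 # events, both groups
  d1 <- sum(time == tj & status == 1 & group == "Maintained")
  O <- O + d1
  E <- E + d * n1 / n          # events expected under no group difference
  # Hypergeometric variance; skipped when only one subject is at risk
  if (n > 1) V <- V + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
}
chisq <- (O - E)^2 / V
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
c(Observed = O, Expected = E, Chisq = chisq, p = p_value)
```

At every event time the expected count is the share of the risk set that belongs to the group, so a group that keeps many subjects at risk while having few events ends up with Observed well below Expected.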
Hypothesis test to check for
differences between groups
The log-rank test (sometimes called the Mantel-Cox test) is a nonparametric test, appropriate to use when the data are right-skewed and censored.
• This test is the one with the most power to detect differences that fit the proportional hazards model, so it works well as a set-up for subsequent Cox regression.
• It gives equal weight to early and late failures.
The Gehan-Wilcoxon test weights strata by their size, so it is more sensitive to differences at earlier time points.

Test of equality of survival distributions for the different levels of maintenance chemotherapy given:
                       Chi-Square  df  p-value
Log Rank (Mantel-Cox)  3.4         1   0.0653
Gehan-Wilcoxon         2.8         1   0.0955
Chi-square test (with 1 df for 2 groups) of the (overall) difference between the two groups.
The difference between the two groups appears not to be statistically significant (p-value > 0.05).
Advantages and Limitations of Kaplan-Meier
Advantages:
• Commonly used to describe survival.
• Commonly used to compare two study populations.
• Intuitive graphical presentation.
Limitations:
• Mainly descriptive.
• Doesn't control for covariates.
• Can't accommodate time-dependent variables.
