Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Outcome 1
A Outcome 2
C
Outcome 3
1 Outcome 4
Outcome 5
2
Outcome 6
B
Outcome 7
Figure 1 Decision trees are graphical models for describing sequential decision problems.
Benign
458/241
Nuclei = 1,2,3,4,5,? Nuclei = 10,6,7,8 Unshape < 2.5 Unshape > = 2.5
Benign Malignant
417/12 41/229
Clump < 5.5 Clump > = 5.5 Nuclei = 1,2,? Nuclei = 10,3,4,5,6,7,8,9
Benign Malignant
7/0 6/23
The first split of the tree (at the root node) is on construction is completed, terminal nodes are
the variable “unsize,” measuring uniformity of cell assigned class labels by majority voting (the class
size. All patients having values less than 2.5 for label with the largest frequency). Each patient in a
this variable are assigned to the left node (the left given terminal node is assigned the predicted class
daughter node); otherwise they are assigned to the label for that terminal node. For example, the left-
right node (right daughter node). The left and right most terminal node in Figure 2 is assigned the class
daughter nodes are then split (in this case, on the label “benign” because 416 of the 421 cases in the
variable “unshape” for the right daughter node node have that label. Looking at Figure 2, one can
and on the variable “nuclei” for the left daughter see that voting heavily favors one class over the
node), and patients are assigned to subgroups other for all terminal nodes, showing that the deci-
defined by these splits. These nodes are then split, sion tree is accurately classifying the data. However,
and the process is repeated recursively in a proce- it is important to assess accuracy using external
dure called recursive partitioning. When the tree data sets or by using cross-validation as well.
326 Decision Tree: Introduction
focuses on understanding how time-to-event var- without occurrence of a symptom and then
ies in terms of different variables that might be released from a hospital. Such a patient is said to
collected for a patient. Time-to-event can be time be right censored because the time-to-event must
to death from a certain disease, time until recur- exceed 2 weeks, but the exact event time is
rence (for cancer), time until first occurrence of a unknown. Another example of right censoring
symptom, or simple all-cause mortality. occurs when patients enter a study at different
The analysis of time-to-event data is often com- times and the study is predetermined to end by a
plicated by the presence of censoring. Generally certain time. Then, all patients who do not experi-
speaking, this means that the event times for some ence an event within the study period are right
individuals in a study are not observed exactly censored.
and are only known to fall within certain time Decision trees can be used to analyze right-cen-
intervals. Right censoring is one of the most com- sored survival data. Such trees are referred to as
mon types of censoring encountered. This occurs survival trees. Survival trees can be constructed
when the event of interest is observed only if it using recursive partitioning. The measure of impu-
occurs prior to some prespecified time. For exam- rity plays a key role, as in CART, and this can be
ple, a patient might be monitored for 2 weeks defined in many ways. One popular approach is to
NKI70
p = .007
>0
≤0 3
TSP
p = .298
≤1 >1
.8 .8 .8
.6 .6 .6
.4 .4 .4
.2 .2 .2
0 0 0
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20