Tree-based Methods
max_leaf_nodes: maximum number of leaf nodes in the tree
min_samples_leaf: minimum number of samples required at a leaf node
max_depth: maximum depth of the tree
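These growth-limiting parameters match the ones exposed by scikit-learn's decision tree; a minimal sketch of how they might be set (the dataset and the chosen values are illustrative, not prescriptive):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: limit tree growth to reduce overfitting.
#   max_depth        - cap on how deep the tree may grow
#   min_samples_leaf - each leaf must hold at least this many samples
#   max_leaf_nodes   - grow best-first, capped at this many leaves
clf = DecisionTreeClassifier(max_depth=3,
                             min_samples_leaf=5,
                             max_leaf_nodes=8,
                             random_state=0)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```

Tightening any of these values yields a smaller, simpler tree at the cost of some training accuracy.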
E(s) = -Σ pi log2(pi)
where pi is the probability of samples belonging to a given class.
Entropy
Consider a variable with two possible outcomes, say 'yes' and 'no'.
Case 1: The samples are completely homogeneous, i.e., all 'yes' or all 'no'.
Then the entropy when all samples are 'yes' is:
E(s) = -p(yes)log2 p(yes)
= -1 log2 1
= 0
Similarly, when all samples are 'no' the entropy is:
E(s) = -p(no)log2 p(no)
= -1 log2 1
= 0
Case 2: The samples are equally divided, i.e., an equal number of 'yes' and 'no'.
Then the entropy is:
E(s) = -p(yes) log2 p(yes) - p(no) log2 p(no)
= -0.5 log2 0.5 + (- 0.5 log2 0.5)
= -0.5 (-1) + (-0.5) (-1)
= 0.5 + 0.5
= 1
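The two cases above can be checked numerically with a small helper (the `entropy` function below is my own sketch, not part of any library):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: E(s) = -sum(p_i * log2(p_i))."""
    # Skip zero probabilities, since lim p->0 of p*log2(p) is 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

# Case 1: completely homogeneous samples (all 'yes' or all 'no')
print(entropy([1.0]))        # 0.0
# Case 2: equally divided samples
print(entropy([0.5, 0.5]))   # 1.0
```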
The attribute with the highest information gain is selected for the first split.
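Information gain is the parent node's entropy minus the size-weighted entropy of the child nodes produced by a split; a minimal sketch (the labels and the candidate split are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return sum(-c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus size-weighted average entropy of children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted

parent = ['yes'] * 5 + ['no'] * 5            # entropy = 1 (Case 2 above)
left = ['yes'] * 4 + ['no']                  # mostly 'yes'
right = ['yes'] + ['no'] * 4                 # mostly 'no'
print(information_gain(parent, [left, right]))
```

A perfectly separating split (all 'yes' in one child, all 'no' in the other) would yield a gain of 1 here, the maximum possible for a balanced binary parent.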
[Figure: Top row: true linear boundary; bottom row: true non-linear boundary. Left column: linear model; right column: tree-based model.]
Advantages and Disadvantages of Trees