
Decision Stump Classifier

Pi19404
July 30, 2013

Contents

0.1 Introduction
0.2 Decision Tree Stump
    0.2.1 Entropy
    0.2.2 Selecting Optimal Threshold
    0.2.3 Decision Stump Classifier
0.3 Implementation
0.4 Code

Decision Stump Classifier


0.1 Introduction
In this article we will look at the Decision Stump Classifier, which is used as a weak classifier in feature-selection algorithms like AdaBoost.

0.2 Decision Tree Stump


A decision tree stump makes a decision based on a single input feature/attribute. For continuous features a threshold value θ is selected; the stump contains two leaves, one for values of the feature below the threshold and one for values above it. The decision tree stump classifies instances into one of two categories, say {−1, 1}. The aim is to learn a classification rule such that a feature x can be classified into the positive or negative class based on the threshold function sign(x − θ). The value of θ is chosen optimally according to some predefined criterion specified by an error/objective function. The objective function used in the present implementation is the average entropy, which is commonly used in classical decision tree algorithms.
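The threshold rule can be sketched in a few lines; the function name `stump_predict` and the example values are illustrative assumptions, not part of the implementation that follows:

```python
import numpy as np

def stump_predict(x, theta):
    """Classify scalar features into {-1, +1} by thresholding at theta."""
    x = np.asarray(x, dtype=float)
    # sign(x - theta): features below the threshold map to -1,
    # features at or above it map to +1
    return np.where(x < theta, -1, 1)

print(stump_predict([0.2, 0.7, 1.5], theta=1.0))  # -> [-1 -1  1]
```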

0.2.1 Entropy
Entropy is a measure of the impurity/uncertainty in the training set. For class labels i ∈ {−1, 1} the entropy is given by

H(U) = − Σ_{i ∈ {−1,1}} P(i) log P(i)    (1)
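As a quick numerical check of equation (1), the entropy of a label set can be computed directly; the helper `entropy` and the base-2 logarithm are illustrative choices, not taken from the implementation below:

```python
import numpy as np

def entropy(labels):
    """H(U) = -sum_i P(i) log2 P(i) over the class proportions."""
    labels = np.asarray(labels)
    # count occurrences of each class and normalize to proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(entropy([1, 1, 1, 1]))    # pure set -> 0.0
print(entropy([1, 1, -1, -1]))  # maximally mixed -> 1.0
```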

where P(i) is the normalized ratio of the number of instances that are in class i.

If P(i) = 1 or P(i) = 0, then there is no uncertainty in the assignment and the entropy is zero. If P(i) = 0.5, then there is maximum uncertainty in the assignment. If the data is dominated by one class the entropy will be low; if it is not dominated by one class the entropy will be high. Thus entropy provides us with a measure of the purity of the data set.

Let us assume that we have chosen a threshold θ. This divides the data set into two parts, one for values X < θ and the other for X > θ. If the rule is good, each of the subsets will be dominated by data belonging to one class and will therefore have a low entropy measure. The average entropy after the split will be low if the classification rule is good. The average entropy is given by
H_a(θ) = P_left H_left(U) + P_right H_right(U)    (2)

where P_left is the normalized ratio of the total samples that belong to the left split of the data, satisfying X < θ, and P_right is the normalized ratio of the total samples that belong to the right split, satisfying X > θ.
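Equation (2) can be sketched as a weighted sum of the two subset entropies; `average_entropy` and `entropy` are illustrative helper names, not part of the implementation below:

```python
import numpy as np

def entropy(labels):
    # class proportions of the label array
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def average_entropy(X, y, theta):
    """H_a(theta) = P_left*H_left + P_right*H_right for the split X < theta."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    left = X < theta
    H = 0.0
    for mask in (left, ~left):
        if mask.any():
            # weight each subset's entropy by its fraction of the samples
            H += mask.mean() * entropy(y[mask])
    return H

# a good split separates the classes and drives the average entropy to zero
print(average_entropy([1, 2, 8, 9], [-1, -1, 1, 1], theta=5.0))  # -> 0.0
```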

0.2.2 Selecting Optimal Threshold


We need to find the value of θ that provides the minimum average entropy. To do this, the attribute-class pairs (X_i, y_i) ∈ (X, y) are sorted according to attribute value.


The threshold is selected as the midpoint of adjacent attribute values:

θ = X_sorted(i) + (X_sorted(i+1) − X_sorted(i)) · 0.5    (3)

H_a(θ) = P_left H_left(U) + P_right H_right(U)    (4)

The average entropy is calculated for each candidate value of θ, and the one that provides the minimum entropy is selected as the optimal value of θ.
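The search over midpoints can be written as a brute-force loop; this is a simplified sketch, not the sorted incremental-count scheme of the implementation below, and the helper names are assumptions:

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def best_threshold(X, y):
    """Try the midpoint of every pair of adjacent sorted values and
    keep the one with minimum average entropy (equation (4))."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    order = np.argsort(X)
    Xs, ys = X[order], y[order]
    best_t, best_h = None, np.inf
    for i in range(len(Xs) - 1):
        if Xs[i] == Xs[i + 1]:
            continue  # no split between equal attribute values
        t = Xs[i] + (Xs[i + 1] - Xs[i]) * 0.5  # equation (3)
        left = Xs < t
        # weighted average entropy of the two subsets
        h = left.mean() * entropy(ys[left]) + (~left).mean() * entropy(ys[~left])
        if h < best_h:
            best_t, best_h = t, h
    return best_t, best_h

t, h = best_threshold([1, 2, 8, 9], [-1, -1, 1, 1])
print(t, h)  # -> 5.0 0.0
```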

0.2.3 Decision Stump Classifier


Given the value of θ, the data set is divided into two parts, X < θ and X > θ. Since the average entropy is minimal, each subset is dominated by data belonging to a specific class. Each decision rule is associated with the dominant class label of its subset: let the rule X < θ be associated with class label y_L and the rule X > θ with y_R, such that y_L, y_R ∈ {−1, 1}.

Let C denote the set of classes:

y_L = argmax_{c ∈ C} count(y ∈ c)  where X < θ    (5)
y_R = argmax_{c ∈ C} count(y ∈ c)  where X > θ    (6)

Thus we are simply required to find the dominant class for each of the subsets X < θ and X > θ. Given an unknown feature, we decide its class label as follows: if the feature/attribute satisfies the condition X < θ we assign it class label y_L, and if X > θ we assign it class label y_R.
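Equations (5) and (6) amount to a majority vote in each subset. The helper below is an illustrative sketch (the right subset is taken as X ≥ θ so that every sample is covered):

```python
import numpy as np

def split_labels(X, y, theta):
    """Return (y_L, y_R): the dominant class on each side of theta."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)

    def dominant(labels):
        # argmax over the per-class counts, as in equations (5)-(6)
        classes, counts = np.unique(labels, return_counts=True)
        return int(classes[counts.argmax()])

    return dominant(y[X < theta]), dominant(y[X >= theta])

print(split_labels([1, 2, 3, 8, 9], [-1, -1, 1, 1, 1], theta=5.0))  # -> (-1, 1)
```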

0.3 Implementation
The code is written in Python. The sklearn package is used, in particular, to read data sets from a database etc. The code is also inspired by the sklearn decision tree module.


A class DecisionStumpLearner contains the methods to learn the decision rule and the prediction function. A class DecisionStump holds the information about the decision rule.

#data structure to hold basic information
#for decision tree stump classification
class DecisionStump:
    def __init__(self,threshold,left,right):
        #split point
        self.threshold=threshold
        #class associated with the left and right leaves
        self.left=left
        self.right=right
Since a large number of stumps corresponding to Haar features will be required, a separate data structure DecisionStump containing the bare minimum information is used, storing only the parameters required for classification. The class DecisionStumpLearner does not store any data and contains just the methods for training and prediction of the decision tree stump classifier.

class DecisionStumpLearner:
    #initialization function
    __init__(self,X,y,sample_mask)
    #function that resets the model parameters and computation
    reset(self)
    #free allocated arrays
    free(self)
    #update the positive and negative sample counts
    update_index(self,a,b,X,y,s_x,n_total_samples,sample_mask)
    #find the next unique sorted sample index
    find_next_sample(self,sample_index,X,s_x)
    #compute the entropy based on the current split point
    eval(self)
    #main function to be called for training
    fit(self,X,y,sample_mask=None)


    #main function to be called for prediction
    predict_class(self,X,classifier=None)

#---------------------------------------------
#function that computes the model parameters
#for the decision tree stump
def fit(self,X,y,sample_mask):
    X = np.asarray(X)
    y = np.asarray(y)
    #initialize the decision tree stump classifier
    #this method initializes the variables required
    #for training and prediction
    self.__init__(X,y,sample_mask)
    #sort the input attribute values and
    #store the sorted indexes in array s_x
    s_x=np.argsort(X.T).astype(np.int32).T
    a=0
    best_t=0
    best_error=np.inf
    while True:
        #find the next unique,valid sample
        b=self.find_next_sample(a,X,s_x,self.n_total_samples)
        if b==-1:
            break
        #update the count of positive and negative classes to
        #the left of the point with index b based on the count
        #at the point with index a
        if not self.update_index(a,b,X,y,s_x,self.n_total_samples,sample_mask):
            a=b
            continue
        #evaluate the average entropy at point b
        error=self.eval()
        #check if the entropy is lower than the previous best
        if error < best_error:
            #value of the attribute at the points with indexes a,b
            X_a=X[s_x[a]]
            X_b=X[s_x[b]]
            #compute the new split point
            t=X_a+(X_b-X_a)/2
            if t==X_b:
                t=X_a
            #update the best observed split point
            best_t=t
            best_error=error
        #analyse attributes from point b onwards;
        #we require statistics up to point b
        a=b
    #the final split point and entropy obtained
    self.best_t=best_t
    self.best_error=best_error
    #compute the class labels based on the best split
    for idx in range(0,self.n_total_samples):
        j=s_x[idx]
        if sample_mask[j]==0:
            continue
        if X[j] > best_t:
            break
        y_idx=int(y[j])
        #update the label counts of the classes based on the split
        self.lable_countl[y_idx]+=1
        self.lable_countr[y_idx]-=1
    #compute the dominant class in the left/right leaf
    #after the split
    self.left=self.lable_countl.argmax()+self.miny
    self.right=self.lable_countr.argmax()+self.miny
    #create a decision stump and store the information
    classifier=DecisionStump(float(best_t),int(self.left),int(self.right))
    self.threshold=self.best_t
    return classifier

#---------------------------------------------
#function to perform prediction

def predict_class(self,X,classifier=None):
    n_samples = X.shape[0]
    #if no classifier is supplied use the learner itself
    if classifier is None:
        classifier=self
    #allocate the array for the output class labels
    y = np.zeros((n_samples))
    for i in range(0,n_samples):
        #apply the decision tree classification rule
        if X[i] <= classifier.threshold:
            #assign the class labels based on the rule
            y[i]=classifier.left
        else:
            y[i]=classifier.right
    return y

#---------------------------------------------
#function which computes the entropy of the data
#based on the current split point
def eval(self):
    #array containing the number of points belonging
    #to each class to the left of the split point
    lable_countl=self.lable_countl
    #array containing the number of points belonging
    #to each class to the right of the split point
    lable_countr=self.lable_countr
    #total number of points to the left of the split point
    n_left=self.n_left
    #total number of points to the right of the split point
    n_right=self.n_right
    #number of classes
    n_classes=self.n_classes
    n_samples=self.n_samples
    #entropy of the points to the left/right of the split point
    H_left=0
    H_right=0
    #for all the classes
    for c in range(0,n_classes):
        if lable_countl[c]>0:
            #entropy contribution of points to the left of the split point
            H_left -= (lable_countl[c]/n_left)*log(lable_countl[c]/n_left)
        if lable_countr[c]>0:
            #entropy contribution of points to the right of the split point
            H_right -= (lable_countr[c]/n_right)*log(lable_countr[c]/n_right)
    #weight the left and right entropy by the number of samples
    e1=(n_left/n_samples)*H_left
    e2=(n_right/n_samples)*H_right
    #compute the total average entropy
    total=e1+e2
    return total

0.4 Code
The code for the training utility can be found at https://github.com/pi19404/m19404/tree/master/objdetect/HaarCascade/DecisionStump1.py and the testing script at https://github.com/pi19404/m19404/tree/master/objdetect/HaarCascade/testDecisionStump.py.

