Pi19404
July 30, 2013
Contents
Decision Stump Classifier
0.1 Introduction
0.2 Decision Tree Stump
    0.2.1 Entropy
    0.2.2 Selecting Optimal Threshold
    0.2.3 Decision Stump Classifier
0.3 Implementation
0.4 Code
0.2.1 Entropy
Entropy is a measure of the impurity/uncertainty in the training set. The entropy is given by
H(U) = -\sum_i P(i) \log_2 P(i)        (1)
where P(i) is the normalized ratio of the number of instances that belong to class i. If P(i) = 1 or P(i) = 0, there is no uncertainty in the assignment and the entropy is zero. If P(i) = 0.5, there is maximum uncertainty in the assignment. If the data is dominated by one class, the entropy will be low; if the data is not dominated by one class, the entropy will be high. Thus entropy provides us with a measure of the purity of the data set.
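As a quick illustration (the function name and the label values below are my own and not from the original text), the following Python snippet evaluates H(U) for a pure set and for a maximally mixed set of labels:

import numpy as np

def entropy(labels):
    # H(U) = -sum_i P(i) log2 P(i), where P(i) is the fraction of samples in class i
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # clamp -0.0 to 0.0 for readability
    return max(0.0, float(-(p * np.log2(p)).sum()))

print(entropy([0, 0, 0, 0]))   # pure set: prints 0.0
print(entropy([0, 0, 1, 1]))   # maximally mixed set: prints 1.0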
Let us assume that we have chosen a threshold τ. This divides the data set into two parts: one for the values X < τ and the other for X > τ. If the rule is good, each of the subsets will be dominated by data belonging to one class and will therefore have a low entropy measure, so the average entropy after the split will be low. The average entropy is given by

H_a(\tau) = P_{left} H_{left}(U) + P_{right} H_{right}(U)        (2)

where P_{left} is the normalized ratio of the total samples that belong to the left split of the data, i.e. those that satisfy the criterion X < τ, and P_{right} is the normalized ratio of the total samples that belong to the right split of the data, i.e. those that satisfy the criterion X > τ.
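As a quick numeric check (the numbers here are purely illustrative and not from the original text): suppose a threshold splits 10 samples into a left subset of 4 samples that all belong to class 0 and a right subset of 6 samples with 3 samples of each class. Then H_{left}(U) = 0, H_{right}(U) = 1, P_{left} = 0.4, P_{right} = 0.6, and the average entropy is H_a(\tau) = 0.4 \cdot 0 + 0.6 \cdot 1 = 0.6.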
0.2.2 Selecting Optimal Threshold

Given a training set (X, y), the threshold τ is selected as the midpoint of adjacent sorted attribute values:
\tau = X_{sorted}(i) + (X_{sorted}(i+1) - X_{sorted}(i)) \cdot 0.5        (3)

H_a(\tau) = P_{left} H_{left}(U) + P_{right} H_{right}(U)        (4)
The average entropy is calculated for each candidate value of τ, and the value that gives the minimum average entropy is selected as the optimal threshold.
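The following standalone Python sketch illustrates this search over candidate thresholds on a toy one-dimensional dataset. The helper names (entropy, average_entropy, best_threshold) and the example data are illustrative assumptions and are not part of the original code.

import numpy as np

def entropy(labels):
    # H(U) = -sum_i P(i) log2 P(i) over the classes present in `labels`
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def average_entropy(X, y, tau):
    # weighted entropy of the two subsets produced by the split X < tau
    left = y[X < tau]
    right = y[X >= tau]
    n = len(y)
    return (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)

def best_threshold(X, y):
    # candidate thresholds are midpoints of adjacent sorted attribute values
    Xs = np.sort(np.unique(X))
    candidates = Xs[:-1] + 0.5 * (Xs[1:] - Xs[:-1])
    # pick the candidate with minimum average entropy
    return min(candidates, key=lambda t: average_entropy(X, y, t))

# toy 1-D example: class 0 clusters near 1, class 1 clusters near 4
X = np.array([0.5, 1.0, 1.5, 3.5, 4.0, 4.5])
y = np.array([0, 0, 0, 1, 1, 1])
print(best_threshold(X, y))   # expected to lie between 1.5 and 3.5 (here 2.5)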
0.2.3 Decision Stump Classifier

y_L = \arg\max_c \, \mathrm{count}(y = c \mid X < \tau)
y_R = \arg\max_c \, \mathrm{count}(y = c \mid X > \tau)
Thus we are simply required to find the dominant class for each of the subsets X < τ and X > τ. Given an unknown feature, we are required to decide its class label: if the feature/attribute satisfies the condition X < τ, we assign it the class label y_L, and if X > τ, we assign it the class label y_R.
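Below is a minimal, self-contained sketch of this rule. The function names, the toy data, and the threshold value are illustrative assumptions; the threshold is taken as already found (for example by a search like the one sketched above).

import numpy as np

def dominant_classes(X, y, tau):
    # y_L / y_R are the most frequent class labels in the left/right subsets
    left, right = y[X < tau], y[X >= tau]
    y_L = np.bincount(left).argmax()
    y_R = np.bincount(right).argmax()
    return y_L, y_R

def classify(x, tau, y_L, y_R):
    # the decision stump rule: class y_L if x < tau, otherwise y_R
    return y_L if x < tau else y_R

X = np.array([0.5, 1.0, 1.5, 3.5, 4.0, 4.5])
y = np.array([0, 0, 0, 1, 1, 1])
y_L, y_R = dominant_classes(X, y, 2.5)
print(classify(0.7, 2.5, y_L, y_R), classify(4.2, 2.5, y_L, y_R))   # prints: 0 1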
0.3 Implementation
The code is written in Python. The sklearn package is used in this particular application for utility tasks such as reading the dataset. The code is also inspired by the sklearn decision tree module.
A class DecisionStumpLearner contains the methods that learn the decision rule and the prediction function. The class DecisionStump contains the information about the decision rule.
# data structure to hold basic information
# for decision tree stump classification
class DecisionStump:
    def init(self, threshold, left, right):
        # split point
        self.threshold = threshold
        # class associated with the left and right branches
        self.left = left
        self.right = right
Since we will require a large amount of memory corresponding to the Haar features, a separate data structure DecisionStump containing the bare minimum information is used to store only the parameters required for classification. The class DecisionStumpLearner does not store any data and contains just the methods for prediction and training of the decision tree stump classifier.
class DecisionStumpLearner:
    # initialization function
    init(self, X, y, sample_mask)
    # function that resets the model parameters and computation
    reset(self)
    # free allocated arrays
    free(self)
    # update the positive and negative sample counts
    update_index(self, a, b, X, y, s_x, n_total_samples, sample_mask)
    # find the next unique sorted sample index
    find_next_sample(self, sample_index, X, s_x)
    # compute the entropy based on the current split point
    eval(self)
    # main function to be called for training
    fit(self, X, y, sample_mask=None)
    # main function to be called for prediction
    predict_class(self, X, classifier=None)

#---------------------------------------------
# function that computes the model parameters
# for the decision tree stump
def fit(self, X, y, sample_mask):
    X = np.asarray(X)
    y = np.asarray(y)

    # initialize the decision tree stump classifier;
    # this method will initialize the variables required
    # for training and prediction
    self.init(X, y, sample_mask)

    # sort the input attribute values and
    # store the sorted indices in the array s_x
    s_x = np.argsort(X.T).astype(np.int32).T

    # start from the first sorted sample with no best split yet
    a = 0
    best_t = 0
    best_error = np.inf

    while True:
        # find the next unique, valid sample
        b = self.find_next_sample(a, X, s_x, self.n_total_samples)
        if b == -1:
            break

        # update the count of positive and negative classes to the
        # left of the point corresponding to index b, based on the
        # count at the point corresponding to index a
        if not self.update_index(a, b, X, y, s_x, self.n_total_samples, sample_mask):
            a = b
            continue

        # evaluate the average entropy at point b
        error = self.eval()

        # check if the entropy is lower than the previous best
        if error < best_error:
            # value of the attribute at the points with indexes a, b
            X_a = X[s_x[a]]
            X_b = X[s_x[b]]
            # compute the new split point
            t = X_a + (X_b - X_a) / 2
            if t == X_b:
                t = X_a
            # update the best observed split point
            best_t = t
            best_error = error

        # analyse the attribute from point b onwards;
        # we require the statistics up to point b
        a = b

    # the final split point and entropy obtained
    self.best_t = best_t
    self.best_error = best_error

    # compute the class labels based on the best split
    a = 0
    b = best_t
    for idx in range(0, self.n_total_samples):
        j = s_x[idx]
        if sample_mask[j] == 0:
            continue
        if X[j] > best_t:
            break
        y_idx = int(y[j])
        # update the label counts of the classes based on the split
        self.lable_countl[y_idx] += 1
        self.lable_countr[y_idx] -= 1

    # compute the dominant class in the left/right branch
    # after the split
    self.left = self.lable_countl.argmax() + self.miny
    self.right = self.lable_countr.argmax() + self.miny

    # create a decision stump and store the information
    classifier = DecisionStump()
    classifier.init(float(best_t), int(self.left), int(self.right))
    self.threshold = self.best_t

    return classifier

#---------------------------------------------
# function to perform prediction
def predict_class(self, X, classifier=None):
    n_samples = X.shape[0]
    # if no classifier is supplied, use the learner's own parameters
    if classifier is None:
        classifier = self
    # allocate the array for the output class labels
    y = np.zeros((n_samples))
    for i in range(0, n_samples):
        # apply the decision tree classification rule
        if X[i] <= classifier.threshold:
            # assign the class labels based on the rule
            y[i] = classifier.left
        else:
            y[i] = classifier.right
    return y
#---------------------------------------------
# function which computes the entropy of the data
# based on the current split point
def eval(self):
    # array containing the number of points belonging
    # to class 0 or 1 to the left of the split point
    lable_countl = self.lable_countl
    # array containing the number of points belonging
    # to class 0 or 1 to the right of the split point
    lable_countr = self.lable_countr
    # total number of points to the left of the split point
    n_left = self.n_left
    # total number of points to the right of the split point
    n_right = self.n_right
    # number of classes
    n_classes = self.n_classes
    n_samples = self.n_samples

    # entropy of the points to the left/right of the split point
    H_left = 0
    H_right = 0
    # for all the classes
    for c in range(0, n_classes):
        if lable_countl[c] > 0:
            # entropy of the points to the left of the split point
            H_left -= (lable_countl[c] / n_left) * log(lable_countl[c] / n_left)
        if lable_countr[c] > 0:
            # entropy of the points to the right of the split point
            H_right -= (lable_countr[c] / n_right) * log(lable_countr[c] / n_right)

    # weight the left and right entropy by the number of samples
    e1 = (n_left / n_samples) * H_left
    e2 = (n_right / n_samples) * H_right
    # compute the total average entropy
    total = e1 + e2
    return total
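The following usage sketch shows how the pieces above are intended to fit together on a toy dataset. It assumes that DecisionStumpLearner is fully defined (including its constructor and the init, find_next_sample, update_index and eval methods, only some of which are reproduced in this excerpt), so treat it as an illustration of the intended call sequence rather than as runnable output of the code shown here.

import numpy as np

# toy one-dimensional training data: class 0 near 1.0, class 1 near 4.0
X = np.array([0.5, 1.0, 1.5, 3.5, 4.0, 4.5])
y = np.array([0, 0, 0, 1, 1, 1])

# constructor signature assumed; it is not shown in this excerpt
learner = DecisionStumpLearner()
mask = np.ones(len(y), dtype=np.int32)

# train the decision stump and obtain the lightweight DecisionStump object
stump = learner.fit(X, y, sample_mask=mask)

# predict labels for unseen attribute values using the learned rule
print(learner.predict_class(np.array([0.7, 4.2]), classifier=stump))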
0.4 Code
The code for the training utility can be found at https://github.com/pi19404/m19404/tree/master/objdetect/HaarCascade/DecisionStump1.py and the testing script at https://github.com/pi19404/m19404/tree/master/objdetect/HaarCascade/testDecisionStump.py.