A Bose
BIMTECH
November 2009
Discriminant Analysis
Table of Contents
What is Discriminant Analysis
Discriminant Analysis
Calculations for Discriminant Analysis
Analysis of DA Calculations
Hypothesis Testing of equality of factor effects
What is Discriminant Analysis
Discriminant analysis undertakes the same task as Multiple Linear Regression (MLR): predicting an outcome. However, MLR is limited to cases where the dependent variable (Y) is an interval variable, so that the regression equation produces estimated mean numerical Y values for given weighted combinations of X values. Many interesting variables are categorical, such as:
1. political party voting intention
2. migrant / non-migrant status
3. making a profit or not
4. holding a particular credit card
5. owning, renting or paying a mortgage for a house
6. employed / unemployed
7. satisfied versus dissatisfied employees
8. which customers are likely to buy a product or not buy
9. whether a person is a credit risk
Discriminant analysis is used when:
- the dependent variable is categorical, with the predictor IVs at interval level, such as age, income, perceptions, and years of education
- there are more than two DV categories, unlike binary logistic regression, which is limited to a dichotomous dependent variable
Discriminant Analysis
Discriminant analysis involves determining a linear equation, like regression, that predicts which group a case belongs to. The form of the equation or function is:

$$D = v_1 X_1 + v_2 X_2 + v_3 X_3 + \dots + v_k X_k + c$$

where
D = the discriminant function (score)
v = the discriminant coefficient vector (weights vector)
X = the vector of the respondent's scores for the variables
k = the number of predictor variables
c = a constant
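As a minimal sketch of how the function above is evaluated for one case, the snippet below computes D as a weighted sum plus a constant. The function name `discriminant_score` and the example weights and scores are hypothetical and chosen only to show the arithmetic; they are not taken from the worked example.

```python
def discriminant_score(weights, values, constant=0.0):
    """Return D = v1*X1 + v2*X2 + ... + vk*Xk + c for a single case."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    return sum(v * x for v, x in zip(weights, values)) + constant


# Hypothetical weights (v) and scores (X); illustrative numbers only.
example_weights = [0.5, -0.25]
example_values = [4.0, 2.0]
d = discriminant_score(example_weights, example_values, constant=1.0)
print(d)
```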
A discriminant score is a weighted linear combination (sum) of the discriminating variables.
Assumptions:
- the observations are a random sample
- each predictor variable is normally distributed
An example of Discriminant Analysis (from Panneerselvam, p. 424)
The Director of a management school wants to carry out a discriminant analysis of the effect of two factors, namely the yearly spending (in Rs. lakhs) on infrastructure of the school (X1) and the yearly spending on interface events of the school (X2), on the grading of the school by an inspection team. Based on the data, the committee has awarded one of two grades (Below or Above) for each year, as shown in the table that follows.
(a) Design the discriminant function, Y = aX1 + bX2.
(b) Compute the discriminant ratio, K, and identify the variable which is more important in relation to the other variable.
(c) Validate the discriminant function using the given data by forming groups based on the critical discriminant score.
(d) Test whether the group means are equal in importance at a significance level of 0.05.
The hypotheses for this example are:
H0: The group means are equal in importance
H1: The group means are not equal in importance

Design of the discriminant function, Y = aX1 + bX2

Year   Grade   Infrastructure (X1)
1      Below    3
2      Below    4
3      Above   10
4      Below    5
5      Below    6
6      Above   11
7      Below    7
8      Above   12
9      Below    8
10     Below    9
11     Above   13
12     Above   14
Table for Group G1, Grade = Below
Year       Infrastructure (X1)
1           3
2           4
4           5
5           6
7           7
9           8
10          9
G1 Total   42
G1 Mean     6

Table for Group G2, Grade = Above
Year       Infrastructure (X1)
3          10
6          11
8          12
11         13
12         14
G2 Total   60
G2 Mean    12

Grand Mean: X̄1 = 8.5, X̄2 = 5.417
Calculations for Discriminant Analysis
For group G2, the squares of X2 are 49, 16, 25, 36 and 64, totalling 190.
Sum of squares and cross-products (within groups)
Group   Σ(X1 - X̄1)²   Σ(X2 - X̄2)²   Σ(X1 - X̄1)(X2 - X̄2)
Below   28            8             7
Above   10            10            4
Total   38            18            11
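The sums of squares and cross-products can be reproduced with a short sketch like the one below. The X1 values come from the group tables. The X2 values are an assumption: they do not appear in the tables as they survive here and have been back-computed from the discriminant scores listed in the score table, so treat them as illustrative. The helper names are hypothetical.

```python
# X1 (infrastructure) is taken from the group tables.
# X2 (interface events) is an ASSUMPTION: back-computed from the listed
# discriminant scores, not read from the data table.
below = {"x1": [3, 4, 5, 6, 7, 8, 9], "x2": [4, 5, 4, 6, 4, 7, 5]}
above = {"x1": [10, 11, 12, 13, 14], "x2": [7, 4, 5, 6, 8]}


def mean(values):
    return sum(values) / len(values)


def within_group_sums(group):
    """Return (SS of X1, SS of X2, cross-product) about the group's own means."""
    m1, m2 = mean(group["x1"]), mean(group["x2"])
    ss1 = sum((x - m1) ** 2 for x in group["x1"])
    ss2 = sum((x - m2) ** 2 for x in group["x2"])
    cp = sum((x1 - m1) * (x2 - m2) for x1, x2 in zip(group["x1"], group["x2"]))
    return ss1, ss2, cp


below_sums = within_group_sums(below)
above_sums = within_group_sums(above)
# Pool the two groups by adding the corresponding entries.
total_sums = tuple(b + a for b, a in zip(below_sums, above_sums))

print("Below:", below_sums)
print("Above:", above_sums)
print("Total:", total_sums)
```

The pooled totals are the coefficients that enter the simultaneous equations for a and b.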
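The coefficients a and b are obtained from the simultaneous equations

$$a \sum (X_1 - \bar{X}_1)^2 + b \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) = \bar{X}_{1(G2)} - \bar{X}_{1(G1)}$$
$$a \sum (X_1 - \bar{X}_1)(X_2 - \bar{X}_2) + b \sum (X_2 - \bar{X}_2)^2 = \bar{X}_{2(G2)} - \bar{X}_{2(G1)}$$

where the sums are the pooled within-group totals. Substituting, we have

$$38a + 11b = 12 - 6 = 6$$
$$11a + 18b = 6 - 5 = 1$$

From these simultaneous equations, a = 0.17229 and b = -0.04973. Hence, the discriminant function is:

$$Y = 0.17229 X_1 - 0.04973 X_2$$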
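The two simultaneous equations can be solved by hand or, as a rough check, with Cramer's rule as sketched below. The function name `solve_2x2` is hypothetical; the coefficients 38, 11, 18 and the right-hand sides 6 and 1 are the ones stated in the equations.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# 38a + 11b = 6 and 11a + 18b = 1
a, b = solve_2x2(38, 11, 6, 11, 18, 1)
print(round(a, 5), round(b, 5))
```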
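Mean discriminant scores:

$$\bar{Y}_{\text{Below}} = 0.17229\,\bar{X}_1 - 0.04973\,\bar{X}_2 = 0.17229 \times 6 - 0.04973 \times 5 = 0.78509$$
$$\bar{Y}_{\text{Above}} = 0.17229\,\bar{X}_1 - 0.04973\,\bar{X}_2 = 0.17229 \times 12 - 0.04973 \times 6 = 1.7691$$
$$\bar{Y}_{\text{Grand mean}} = 0.17229\,\bar{X}_1 - 0.04973\,\bar{X}_2 = 0.17229 \times 8.5 - 0.04973 \times 5.417 = 1.19509$$

The last value is known as the Critical Discriminant Score.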
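The three mean scores can be checked with a few lines, as in the sketch below. The function name `score` is hypothetical. One assumption is worth flagging: the grand mean of X2 is entered as 65/12 rather than the rounded 5.417, since the unrounded value is what reproduces the quoted 1.19509.

```python
A, B = 0.17229, -0.04973  # coefficients of the discriminant function


def score(x1, x2):
    """Discriminant score Y = a*X1 + b*X2."""
    return A * x1 + B * x2


below_mean_score = score(6, 5)      # group G1 means of X1 and X2
above_mean_score = score(12, 6)     # group G2 means of X1 and X2
# Assumption: 65/12 is the unrounded grand mean of X2 (shown as 5.417).
critical_score = score(8.5, 65 / 12)

print(round(below_mean_score, 5))
print(round(above_mean_score, 5))
print(round(critical_score, 5))
```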
Discriminant function: Y = 0.17229 X1 - 0.04973 X2

Below (Group G1)
Data set (j)   Year   Discriminant score (S1j)   (S1j - S̄1)²
1              1      0.31795                    0.218220
2              2      0.44051                    0.118735
3              4      0.66253                    0.015021
4              5      0.73536                    0.002473
5              7      1.00711                    0.049293
6              9      1.03021                    0.060084
7              10     1.30196                    0.267155
Total                 5.49563                    0.730981
Mean (S̄1)             0.78509

Above (Group G2)
Data set (j)   Year   Discriminant score (S2j)   (S2j - S̄2)²
1              3      1.37479                    0.155480
2              6      1.69627                    0.005304
3              8      1.81883                    0.002473
4              11     1.94139                    0.029684
5              12     2.01422                    0.060084
Total                 8.84550                    0.253025
Mean (S̄2)             1.7691

Grand Total of discriminant scores = 14.34113
Grand Mean of discriminant scores (S̄) = 1.195094
Analysis of DA Calculations
The variability between groups (VBG), the sum of squares between groups:

$$V_{BG} = n_1(\bar{S}_1 - \bar{S})^2 + n_2(\bar{S}_2 - \bar{S})^2 = 7(0.78509 - 1.195094)^2 + 5(1.7691 - 1.195094)^2 = 2.824137$$

The variability within groups (VWG), the sum of squares within groups:

$$V_{WG} = \sum_{j=1}^{7} (S_{1j} - \bar{S}_1)^2 + \sum_{j=1}^{5} (S_{2j} - \bar{S}_2)^2 = 0.730981 + 0.253025 = 0.984006$$

The discriminant ratio, K:

$$K = \frac{V_{BG}}{V_{WG}} = \frac{2.824137}{0.984006} = 2.87$$
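A minimal sketch of the same calculation, starting from the individual discriminant scores listed in the score table, is given below. Variable names are hypothetical, and small differences in the last decimal place against the hand calculation are to be expected from rounding.

```python
# Discriminant scores copied from the score table.
below_scores = [0.31795, 0.44051, 0.66253, 0.73536, 1.00711, 1.03021, 1.30196]
above_scores = [1.37479, 1.69627, 1.81883, 1.94139, 2.01422]


def mean(values):
    return sum(values) / len(values)


def sum_sq_dev(values, centre):
    """Sum of squared deviations of the values from a given centre."""
    return sum((v - centre) ** 2 for v in values)


s1_bar = mean(below_scores)
s2_bar = mean(above_scores)
s_bar = mean(below_scores + above_scores)  # grand mean of all 12 scores

# Between-group variability: group sizes times squared mean differences.
vbg = (len(below_scores) * (s1_bar - s_bar) ** 2
       + len(above_scores) * (s2_bar - s_bar) ** 2)
# Within-group variability: squared deviations about each group's own mean.
vwg = sum_sq_dev(below_scores, s1_bar) + sum_sq_dev(above_scores, s2_bar)
k = vbg / vwg

print(round(vbg, 6), round(vwg, 6), round(k, 2))
```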
Validation based on the Critical Discriminant Score (1.19509)
- If the discriminant score of a data set is < 1.19509, include that data set in the group corresponding to the Below category.
- If the discriminant score of a data set is > 1.19509, include that data set in the group corresponding to the Above category.

Classification of Data Sets based on the Critical Discriminant Score
Year   Original Classification   Revised Classification   Status
1      Below                     Below                    Unchanged
2      Below                     Below                    Unchanged
3      Above                     Above                    Unchanged
4      Below                     Below                    Unchanged
5      Below                     Below                    Unchanged
6      Above                     Above                    Unchanged
7      Below                     Below                    Unchanged
8      Above                     Above                    Unchanged
9      Below                     Below                    Unchanged
10     Below                     Above                    Changed
11     Above                     Above                    Unchanged
12     Above                     Above                    Unchanged

Direction for including a future data set
If the values of the predictor variables X1 and X2 are known for a future year, its discriminant score can be obtained from the discriminant function. That year can then be included in the appropriate group using the rule stated above.
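The validation rule can be sketched as follows, using the discriminant scores from the score table and the critical score of 1.19509. The names are hypothetical. The rule as stated does not say what to do with a score exactly equal to the critical value; placing such a case in Below is an assumption made here.

```python
CRITICAL_SCORE = 1.19509

# (year, original grade, discriminant score) from the score table
records = [
    (1, "Below", 0.31795), (2, "Below", 0.44051), (3, "Above", 1.37479),
    (4, "Below", 0.66253), (5, "Below", 0.73536), (6, "Above", 1.69627),
    (7, "Below", 1.00711), (8, "Above", 1.81883), (9, "Below", 1.03021),
    (10, "Below", 1.30196), (11, "Above", 1.94139), (12, "Above", 2.01422),
]


def classify(score, critical=CRITICAL_SCORE):
    """Assign 'Above' when the score exceeds the critical score.

    Assumption: a score exactly equal to the critical value goes to 'Below'.
    """
    return "Above" if score > critical else "Below"


revised = {year: classify(score) for year, _, score in records}
changed_years = [year for year, grade, _ in records if revised[year] != grade]

print(changed_years)
```

Only the years whose revised group differs from the original grade are collected in `changed_years`.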
Hypothesis Testing for Equality of Effect of the Two Factors
H0: The factors X1 (Infrastructure) and X2 (Interface events) are equal in importance
H1: The factors X1 (Infrastructure) and X2 (Interface events) are not equal in importance

The formula to compute F is shown below:

$$F = \frac{n_1 n_2 (n_1 + n_2 - m - 1)}{m (n_1 + n_2)(n_1 + n_2 - 2)} D^2$$

where m is the number of predictor variables (in this case, 2) and

$$D^2 = (n_1 + n_2 - 2)\{a[\bar{X}_{1(G2)} - \bar{X}_{1(G1)}] + b[\bar{X}_{2(G2)} - \bar{X}_{2(G1)}]\} = (7 + 5 - 2)(0.17229 \times 6 - 0.04973 \times 1) = 9.8401$$

Hence

$$F = \frac{7 \times 5 \times (7 + 5 - 2 - 1)}{2 \times (7 + 5)(7 + 5 - 2)} \times 9.8401 = 12.915$$
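The arithmetic for D² and F can be checked with the sketch below. Variable names are hypothetical. The critical value 4.26 is the tabulated F0.05,(2,9) quoted in these slides and is not computed here.

```python
n1, n2 = 7, 5                # group sizes (Below, Above)
m = 2                        # number of predictor variables
a, b = 0.17229, -0.04973     # discriminant coefficients
diff_x1 = 12 - 6             # mean of X1 in G2 minus mean of X1 in G1
diff_x2 = 6 - 5              # mean of X2 in G2 minus mean of X2 in G1

d_squared = (n1 + n2 - 2) * (a * diff_x1 + b * diff_x2)
f_ratio = (n1 * n2 * (n1 + n2 - m - 1)) / (m * (n1 + n2) * (n1 + n2 - 2)) * d_squared

F_CRITICAL = 4.26            # tabulated value for (2, 9) d.f. at the 0.05 level
reject_h0 = f_ratio > F_CRITICAL

print(round(d_squared, 4), round(f_ratio, 3), reject_h0)
```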
The degrees of freedom for the F ratio are (m, n1 + n2 - m - 1) = (2, 9), where m = 2 is the number of factors.
The table value is F0.05,(2,9) = 4.26.
Since F observed = 12.915 > F critical = 4.26, we reject H0: the factors X1 (Infrastructure) and X2 (Interface events) are not equal in importance.
Based on H1 and the discriminant function, the variable X1 (annual spending on infrastructure) is more important than X2 (annual spending on interface events), since its coefficient is larger in magnitude (0.17229 versus 0.04973).
Problem 1 (from Panneerselvam, p. 481)
The performance standard of employees, as a function of their age (X1) and family size (X2), is classified into Above Average and Below Average. The data on 10 different employees in a company are presented below:

Employee   Standard   X1   X2
1          Below      43   3
2          Below      24   4
3          Above      30   6
4          Below      55   3
5          Below      56   5
6          Above      41   3
7          Below      37   3
8          Above      22   4
9          Below      38   6
10         Below      59   4

(a) Design the discriminant function, Y = aX1 + bX2.
(b) Compute the discriminant ratio, K, and identify the variable which is more important in relation to the other variable.
(c) Validate the discriminant function using the given data by forming groups based on the critical discriminant score.
(d) Test whether the group means are equal in importance at a significance level of 0.05.
Problem 2 (from Panneerselvam, p. 482)
The potential customers of a computer company rate the product of the company as good or bad based on the time to respond to breakdown calls (X1) and the percentage discount on product price (X2). The ratings by customers are presented below:

Customer   Rating   X1 (hrs)   X2 (%)
1          Good     24          5
2          Good     12          8
3          Bad      36          4
4          Bad      12          0
5          Bad      36          3
6          Good     36         10
7          Good     24          3
8          Bad      48          4
9          Bad      96          5
10         Good     36         12

(a) Design the discriminant function, Y = aX1 + bX2.
(b) Compute the discriminant ratio, K, and identify the variable which is more important in relation to the other variable.
(c) Validate the discriminant function using the given data by forming groups based on the critical discriminant score.
(d) Test whether the group means are equal in importance at a significance level of 0.05.