Discriminant Analysis
Introduction
This chapter introduces another extension of regression in which the DV may have more than
two categories at a categorical level and the IV’s are scale data.
DA is used when:
• the dependent is categorical with the predictor IV’s at interval level such as age, income,
attitudes, perceptions, and years of education, although dummy variables can be used
as predictors as in multiple regression. Logistic regression IV’s can be of any level of
measurement.
• there are more than two DV categories, unlike logistic regression, which is limited to a
dichotomous dependent variable.
D = v1X1 + v2X2 + v3X3 + ... + viXi + a
This function is similar to a regression equation. The v’s are unstandardized discriminant
coefficients, analogous to the b’s in a regression equation, and are chosen to maximize the
distance between the group means on the discriminant score. Standardized discriminant
coefficients can also be used, like beta weights in regression; good predictors tend to have
large weights. What you want this function to do is maximize the distance between the
categories, i.e. come up with an equation that has strong discriminatory power between
groups. After using an existing set of data to calculate the discriminant function and
classify cases, any new cases can then be classified. The number of discriminant functions
is one less than the number of groups, so there is only one function for the basic two-group
discriminant analysis.
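For readers who want to reproduce this logic outside SPSS, here is a minimal sketch of fitting a two-group discriminant function, assuming Python with NumPy and scikit-learn is available; the data, group means and weights are synthetic, invented for illustration only.

```python
# Hedged sketch: estimate the weights v and constant a of a discriminant
# function from data, then classify a new case. All values are synthetic.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two groups measured on three interval-level predictors
group_a = rng.normal(loc=[0.0, 0.0, 0.0], scale=1.0, size=(50, 3))
group_b = rng.normal(loc=[2.0, 1.5, 1.0], scale=1.0, size=(50, 3))
X = np.vstack([group_a, group_b])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)                    # estimates the v weights and the constant

print(lda.score(X, y))           # proportion of cases correctly classified
new_case = [[1.8, 1.2, 0.9]]     # a new case can now be classified
print(lda.predict(new_case))
```

With only one function in the two-group case, `lda.transform(X)` would return each case's discriminant score.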
The use of DA rests on several assumptions:
• cases in the initial classification are correctly allocated to their dependent
categories;
• there must be at least two groups or categories, with each case belonging to only one
group so that the groups are mutually exclusive and collectively exhaustive (all cases
can be placed in a group);
• each group or category must be well defined, clearly differentiated from any other
group(s) and natural. Putting a median split on an attitude scale is not a natural way to
form groups. Partitioning quantitative variables is only justifiable if there are easily
identifiable gaps at the points of division, for instance three groups taking the three
available levels of housing loan amounts;
• the groups or categories should be defined before collecting the data;
• the attribute(s) used to separate the groups should discriminate quite clearly between
the groups so that group or category overlap is clearly non-existent or minimal;
• group sizes of the dependent should not be grossly different and should be at least five
times the number of independent variables.
DA serves several purposes:
• To investigate differences between groups on the basis of the attributes of the cases,
indicating which attributes contribute most to group separation. The descriptive tech-
nique successively identifies the linear combination of attributes known as canonical
discriminant functions (equations) which contribute maximally to group separation.
• Predictive DA addresses the question of how to assign new cases to groups. The DA
function uses a person’s scores on the predictor variables to predict the category to
which the individual belongs.
• To determine the most parsimonious way to distinguish between groups.
• To classify cases into groups. Statistical significance tests using chi square enable you
to see how well the function separates the groups.
• To test theory whether cases are classified as predicted.
The aim of the statistical analysis in DA is to combine (weight) the variable scores in
some way so that a single new composite variable, the discriminant score, is produced.
One way of thinking about this is in terms of a food recipe, where changing the proportions
(weights) of the ingredients will change the characteristics of the finished cake. Hopefully
the weighted combinations of ingredients will produce two different types of cake.
Similarly, at the end of the DA process, it is hoped that each group will have a normal
distribution of discriminant scores. The degree of overlap between the discriminant score
distributions can then be used as a measure of the success of the technique, so that, like the
different types of cake mix, we have two different types of groups (Fig. 25.1).
The top two distributions in Figure 25.1 overlap too much and do not discriminate too
well compared to the bottom set. Misclassification will be minimal in the lower pair,
whereas many will be misclassified in the top pair.
Standardizing the variables ensures that scale differences between the variables are
eliminated. When all variables are standardized, absolute weights (i.e. ignore the sign) can
be used to rank variables in terms of their discriminating power, the largest weight being
associated with the most powerful discriminating variable. Variables with large weights are
those which contribute most to differentiating the groups.
As with most other multivariate methods, it is possible to present a pictorial explanation
of the technique. The following example uses a very simple data set, two groups and two
variables. If scattergraphs are plotted for scores against the two variables, distributions like
those in Figure 25.2 are obtained.
Clearly, the two groups can be separated by these two variables, but there is a large
amount of overlap on each single axis (although the y variable is the ‘better’ discriminator).
It is possible to construct a new axis which passes through the two group centroids
(‘means’), such that the groups do not overlap on the new axis. This new axis represents a
new variable which is a linear combination of x and y, i.e. it is a discriminant function
(Fig. 25.3). Obviously, with more than two groups or variables this graphical method
becomes impossible.
In a two-group situation, predicted membership is calculated by first producing a D score
for each case using the discriminant function. Cases with D values smaller than the cut-off
value are then classified as belonging to one group, while those with larger values are
classified into the other group. SPSS will save the predicted group membership and
D scores as new variables.
The group centroid is the mean value of the discriminant score for a given category of
the dependent variable. There are as many centroids as there are groups or categories. The
cut-off is the mean of the two centroids. If the discriminant score of the function is less than
or equal to the cut-off the case is classed as 0, whereas if it is above, it is classed as 1.
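Using the centroid values from this chapter's worked example (1.125 for non-smokers, −1.598 for smokers), the cut-off rule can be sketched as follows; the four D scores below are invented for illustration.

```python
# Sketch: classify cases from their discriminant scores using the midpoint
# of the two group centroids as the cut-off.
import numpy as np

centroid_nonsmoker = 1.125     # group centroids from the worked example
centroid_smoker = -1.598
cutoff = (centroid_nonsmoker + centroid_smoker) / 2   # mean of the centroids

d_scores = np.array([2.1, 0.3, -0.5, -2.2])   # hypothetical D scores
# A score above the cut-off goes to the group with the higher centroid
labels = np.where(d_scores > cutoff, "non-smoker", "smoker")
print(round(cutoff, 4))   # -0.2365
print(labels)
```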
3 Click Define Range button and enter the lowest and highest code for your groups (here
it is 1 and 2) (Fig. 25.5).
4 Click Continue.
5 Select your predictors (IV’s) and enter into Independents box (Fig. 25.6) and select
Enter Independents Together. If you planned a stepwise analysis you would at this
point select Use Stepwise Method and not the previous instruction.
6 Click on Statistics button and select Means, Univariate Anovas, Box’s M, Unstandardized
and Within-Groups Correlation (Fig. 25.7).
7 Continue >> Classify. Select Compute From Group Sizes, Summary Table, Leave
One Out Classification, Within Groups, and all Plots (Fig. 25.8).
8 Continue >> Save and select Predicted Group Membership and Discriminant Scores
(Fig. 25.9).
9 OK.
The Group Statistics table (Table 25.1) reports the mean, standard deviation and valid N
(unweighted and weighted) for each predictor within the smoke and no smoke groups,
allowing inspection of group means and standard deviations. For example, mean differences
between self-concept scores and
anxiety scores depicted in Table 25.1 suggest that these may be good discriminators as the
separations are large. Table 25.2 provides strong statistical evidence of significant differ-
ences between means of smoke and no smoke groups for all IV’s with self-concept and
anxiety producing very high value F’s. The Pooled Within-Group Matrices (Table 25.3)
also supports use of these IV’s as intercorrelations are low.
Log Determinants
Smoke or not Rank Log determinant
non-smoker 5 17.631
smoker 5 18.058
Pooled within-groups 5 18.212
The ranks and natural logarithms of determinants printed are those of the group covariance matrices.
Test Results
Box’s M      176.474
Approx. F     11.615
df1           15
df2           600825.3
Sig.          .000
In our example (Table 25.6) a canonical correlation of .802 suggests the model explains
64.32% of the variation in the grouping variable, i.e. whether a respondent smokes or not.
Eigenvalues
Function   Eigenvalue   % of variance   Cumulative %   Canonical correlation
1          1.806        100.0           100.0          .802a
a. First 1 canonical discriminant functions were used in the analysis.
Wilks’ lambda
Wilks’ lambda indicates the significance of the discriminant function. This table (Table 25.7)
indicates a highly significant function (p < .001) and provides the proportion of total
variability not explained, i.e. it is the converse of the squared canonical correlation. So we
have 35.6% unexplained.
Wilks’ Lambda
Test of function(s) Wilks’ Lambda Chi-square df Sig.
1 .356 447.227 5 .000
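The relationship between lambda and the canonical correlation can be checked directly from the reported values; a quick arithmetic sketch:

```python
# Wilks' lambda is the proportion of variability NOT explained, i.e. the
# complement of the squared canonical correlation reported earlier.
canonical_r = 0.802
wilks_lambda = 1 - canonical_r ** 2
print(round(wilks_lambda, 3))   # about .357, matching the printed .356 after rounding
```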
Standardized Canonical Discriminant Function Coefficients
                                          Function 1
age                                        .212
self concept score                         .763
anxiety score                             −.614
days absent last year                     −.073
total anti-smoking policies subtest B      .378

Structure Matrix
                                          Function 1
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant
functions. Variables ordered by absolute size of correlation within function.
Canonical Discriminant Function Coefficients
                                          Function 1
age                                        .024
self concept score                         .080
anxiety score                             −.100
days absent last year                     −.012
total anti-smoking policies subtest B      .134
(Constant)                                −4.543
Unstandardized coefficients.
The discriminant function coefficients b, or their standardized form beta, indicate the
partial contribution of each variable to the discriminant function, controlling for all other
variables in the equation. They can be used to assess each IV’s unique contribution to the
discriminant function and therefore provide information on the relative importance of each
variable. If there are any dummy variables, as in regression, individual beta weights cannot
be used and dummy variables must be assessed as a group through hierarchical DA, running
the analysis first without the dummy variables and then with them. The difference in squared
canonical correlation indicates the explanatory effect of the set of dummy variables.
Group centroids table
A further way of interpreting discriminant analysis results is to describe each group in
terms of its profile, using the group means of the predictor variables. These group means
are called centroids. These are displayed in the Group Centroids table (Table 25.11). In our
example, non-smokers have a mean of 1.125 while smokers produce a mean of –1.598.
Cases with scores near to a centroid are predicted as belonging to that group.
smoke or not   Function 1
non-smoker      1.125
smoker         −1.598
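Putting the pieces together, the unstandardized coefficients and centroids reported above can be applied to a new respondent; the predictor values below are hypothetical, invented for illustration.

```python
# Sketch: compute D from the chapter's unstandardized coefficients, then
# assign the case to the nearest group centroid. Predictor order: age,
# self-concept, anxiety, days absent, anti-smoking policies subtest B.
import numpy as np

coefs = np.array([0.024, 0.080, -0.100, -0.012, 0.134])
constant = -4.543
case = np.array([35, 40, 12, 3, 20])   # hypothetical predictor scores

d = float(coefs @ case + constant)
centroids = {"non-smoker": 1.125, "smoker": -1.598}
group = min(centroids, key=lambda g: abs(d - centroids[g]))
print(round(d, 3), group)   # D = 0.941 -> non-smoker
```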
Classification table
Finally, there is the classification phase. The classification table, also called a confusion
table, is simply a table in which the rows are the observed categories of the dependent and
the columns are the predicted categories. When prediction is perfect all cases will lie on the
diagonal. The percentage of cases on the diagonal is the percentage of correct classifica-
tions. The cross validated set of data is a more honest presentation of the power of the
discriminant function than that provided by the original classifications and often produces
a poorer outcome. The cross validation is often termed a ‘jack-knife’ classification, in that
it successively classifies all cases but one to develop a discriminant function and then
categorizes the case that was left out. This process is repeated with each case left out in
turn. This cross validation produces a more reliable function. The argument behind it is that
the case being predicted should not itself be used as part of the categorization process.
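The jack-knife procedure described above is leave-one-out cross-validation; it can be sketched with scikit-learn (assumed available), again on synthetic data.

```python
# Each case is classified by a discriminant function estimated from all the
# remaining cases; the mean of the per-case results is the cross-validated
# hit ratio.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)),    # synthetic group 0
               rng.normal(2.0, 1.0, (40, 2))])   # synthetic group 1
y = np.array([0] * 40 + [1] * 40)

scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print(scores.mean())   # cross-validated proportion correctly classified
```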
The classification results (Table 25.12) reveal that 91.8% of respondents were classified
correctly into ‘smoke’ or ‘do not smoke’ groups. This overall predictive accuracy of the
discriminant function is called the ‘hit ratio’. Non-smokers were classified with slightly
better accuracy (92.6%) than smokers (90.6%). What is an acceptable hit ratio? You must
compare the calculated hit ratio with what you could achieve by chance. If two samples are
equal in size then you have a 50/50 chance anyway. Most researchers would accept a hit
ratio that is 25% larger than that due to chance.
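With unequal groups, chance accuracy is usually taken as the proportional chance criterion; a sketch with hypothetical group sizes (the 244/194 split is invented, chosen only to sum to the example's 438 cases):

```python
# Proportional chance criterion: the accuracy expected from classifying
# cases at random in proportion to the observed group sizes.
n_nonsmoker, n_smoker = 244, 194        # hypothetical group sizes
total = n_nonsmoker + n_smoker
p0, p1 = n_nonsmoker / total, n_smoker / total
chance = p0 ** 2 + p1 ** 2

hit_ratio = 0.918                        # the chapter's reported hit ratio
print(round(chance, 3))
print(hit_ratio > 1.25 * chance)         # clears the 25%-above-chance bar
```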
Table 25.12 Classification Results: rows give observed group membership (smoke or not),
columns the predicted group membership (non-smoker, smoker) and totals.
Saved variables
As a result of asking the analysis to save the new groupings, two new variables can now be
found at the end of your data file. dis_1 is the predicted grouping based on the discriminant
analysis coded 1 and 2, while dis1_1 are the D scores by which the cases were coded into
their categories. The average D scores for each group are of course the group centroids
reported earlier. These scores and groups can also be used in further analyses.
Figure 25.10 Histograms showing the distribution of discriminant scores for smokers
and non-smokers.
New cases
Mahalanobis distance (obtained from the Method dialogue box) is used to analyse cases,
as it measures the distance between a case and the centroid for each group of the dependent.
So a new case or cases can be compared with an existing set of cases. A new case will have
one distance for each group and therefore can be classified as belonging to the group for
which its distance is smallest. Mahalanobis distance is measured in terms of SD from the
centroid, therefore a case that is more than 1.96 Mahalanobis distance units from the
centroid has a less than 5% chance of belonging to that group.
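This nearest-centroid rule can be sketched in Python with NumPy (data synthetic, group names invented):

```python
# Assign a new case to the group whose centroid is the smallest Mahalanobis
# distance away, using a pooled within-groups covariance matrix.
import numpy as np

rng = np.random.default_rng(2)
g0 = rng.normal([0.0, 0.0], 1.0, (60, 2))   # synthetic group 0 cases
g1 = rng.normal([3.0, 1.0], 1.0, (60, 2))   # synthetic group 1 cases

# Pooled within-groups covariance and its inverse
pooled = ((len(g0) - 1) * np.cov(g0.T) + (len(g1) - 1) * np.cov(g1.T)) \
         / (len(g0) + len(g1) - 2)
inv_pooled = np.linalg.inv(pooled)

def mahalanobis(x, centroid):
    diff = x - centroid
    return float(np.sqrt(diff @ inv_pooled @ diff))

new_case = np.array([2.5, 0.8])
dists = {name: mahalanobis(new_case, grp.mean(axis=0))
         for name, grp in [("group 0", g0), ("group 1", g1)]}
print(min(dists, key=dists.get))   # the group with the smallest distance
```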
Figure 25.11 Box plots illustrating the distribution of discriminant scores for the
two groups.
‘A discriminant analysis was conducted to predict whether or not a respondent smoked, with
age, self-concept score, days absent last year, anxiety score, and attitude to anti-smoking
workplace policy as predictors. Significant mean differences
were observed for all the predictors on the DV. While the log determinants were quite
similar, Box’s M indicated that the assumption of equality of covariance matrices was
violated. However, given the large sample, this problem is not regarded as serious. The
discriminant function revealed a significant association between groups and all predictors,
accounting for 64.32% of between group variability, although closer analysis of the struc-
ture matrix revealed only two significant predictors, namely self-concept score (.706) and
anxiety score (–.527) with age and absence poor predictors. The cross validated
classification showed that overall 91.8% were correctly classified’.
Stepwise discriminant analysis
In a stepwise DA, predictors are entered or removed one at a time according to their
discriminating power; the criterion for adding or removing a variable is typically a critical
significance level for ‘F to remove’.
To undertake this example, please access SPSS Ch 25 Data File A. It is the same file we
used above. On this occasion we will enter the same predictor variables one step at a time
to see which combinations are the best set of predictors, or whether all of them are retained.
Only one of the SPSS screen shots will be displayed, as the others are the same as those
used above.
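SPSS's stepwise method adds and removes predictors using Wilks' lambda and the 'F to remove' test. As a rough analogue only (a different criterion: cross-validated accuracy), scikit-learn's SequentialFeatureSelector can be sketched on synthetic data with two informative predictors and three unrelated ones:

```python
# Forward selection sketch: at each step, add the predictor that most
# improves the LDA's cross-validated classification accuracy.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector

rng = np.random.default_rng(3)
n = 200
informative = rng.normal(0.0, 1.0, (n, 2))   # predictors related to the DV
noise = rng.normal(0.0, 1.0, (n, 3))         # unrelated predictors
y = (informative @ [1.5, 1.5] + rng.normal(0.0, 0.3, n) > 0).astype(int)
X = np.hstack([informative, noise])

sfs = SequentialFeatureSelector(LinearDiscriminantAnalysis(),
                                n_features_to_select=2, direction="forward")
sfs.fit(X, y)
print(sfs.get_support())   # which predictors were retained
```

With a clear signal, the two informative predictors should be the ones retained while the noise predictors are dropped.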
Figure 25.12 Discriminant analysis dialogue box selected for stepwise method.
Wilks’ Lambda
Step   No. of variables   Lambda   df1   df2   df3   Exact F   df1   df2       Sig.
1      1                  .526     1     1     436   392.672   1     436.000   .000
2      2                  .406     2     1     436   317.583   2     435.000   .000
3      3                  .368     3     1     436   248.478   3     434.000   .000
4      4                  .358     4     1     436   194.468   4     433.000   .000
SPSS Activity. Please access SPSS Chapter 25 Data File B on the Web page and
conduct both a normal DA and a stepwise DA using all the variables in both
analyses. Discuss your results in class. The dependent or grouping variable is
whether the workplace is seen as a beneficial or unpleasant environment. The
predictors are mean opinion scale scores on dimensions of workplace perceptions.
Review questions
Qu. 25.1
The technique used to develop an equation for predicting the value of a qualitative DV
based on a set of IV’s that are interval and categorical is:
Qu. 25.2
The number of correctly classified cases in discriminant analysis is given by:
Qu. 25.3
If there are more than 2 DV categories:
Qu. 25.4
Why would you use discriminant analysis rather than regression analysis?
Check your answers in the information above.
Now access the Web page for Chapter 25 and check your answers to the above
questions. You should also attempt the SPSS activity located there.
Further reading
Agresti, A. 1996. An Introduction to Categorical Data Analysis. John Wiley and Sons.