No analysis is done for any subfile group for which the number of non-empty
groups is less than 2 or the number of cases or sum of weights fails to exceed the
number of non-empty groups. An analysis may be stopped if no variables are
selected during variable selection or the eigenanalysis fails.
Notation
The following notation is used throughout this chapter unless otherwise stated:
g       Number of groups
p       Number of variables
q       Number of variables selected
$X_{ijk}$   Value of variable i for case k in group j
$f_{jk}$    Case weight for case k in group j
$m_j$   Number of cases in group j
$n_j$   Sum of case weights in group j
$n$     Total sum of case weights over all groups
Basic Statistics
Mean
$$\bar X_{ij} = \frac{\displaystyle\sum_{k=1}^{m_j} f_{jk} X_{ijk}}{n_j} \qquad \text{(variable $i$ in group $j$)}$$

$$\bar X_{i\cdot} = \frac{\displaystyle\sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ijk}}{n} \qquad \text{(variable $i$)}$$
Variances
$$S_{ij}^2 = \frac{\displaystyle\sum_{k=1}^{m_j} f_{jk} X_{ijk}^2 - n_j \bar X_{ij}^2}{n_j - 1} \qquad \text{(variable $i$ in group $j$)}$$

$$S_{i\cdot}^2 = \frac{\displaystyle\sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ijk}^2 - n \bar X_{i\cdot}^2}{n - 1} \qquad \text{(variable $i$)}$$
Within-Groups Sums of Squares and Cross-Products

$$w_{il} = \sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ijk} X_{ljk} - \sum_{j=1}^{g}\frac{\left(\sum_{k=1}^{m_j} f_{jk} X_{ijk}\right)\left(\sum_{k=1}^{m_j} f_{jk} X_{ljk}\right)}{n_j} \qquad i, l = 1, \ldots, p$$

Total Sums of Squares and Cross-Products

$$t_{il} = \sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ijk} X_{ljk} - \frac{\left(\sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ijk}\right)\left(\sum_{j=1}^{g}\sum_{k=1}^{m_j} f_{jk} X_{ljk}\right)}{n}$$

Within-Groups Covariance Matrix

$$C = \frac{W}{n-g} \qquad (n > g)$$
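As an illustrative sketch (not SPSS code), the weighted cross-product matrices above can be computed in NumPy; the data, group labels, and case weights here are hypothetical.

```python
import numpy as np

# Hypothetical data: 4 cases, 2 variables, 2 groups, unit case weights.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 1, 1])           # group membership
f = np.array([1.0, 1.0, 1.0, 1.0])   # case weights f_jk

n = f.sum()                          # total sum of case weights
groups = np.unique(y)
g = len(groups)

# Total sums of squares and cross-products t_il
T = (f[:, None] * X).T @ X - np.outer(f @ X, f @ X) / n

# Within-groups sums of squares and cross-products w_il
W = np.zeros_like(T)
for j in groups:
    Xj, fj = X[y == j], f[y == j]
    W += (fj[:, None] * Xj).T @ Xj - np.outer(fj @ Xj, fj @ Xj) / fj.sum()

C = W / (n - g)                      # pooled within-groups covariance, n > g
```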
Individual Group Covariance Matrices

$$c_{il}^{(j)} = \frac{\displaystyle\sum_{k=1}^{m_j} f_{jk} X_{ijk} X_{ljk} - \bar X_{ij}\bar X_{lj}\, n_j}{n_j - 1}$$

Total Covariance Matrix

$$T' = \frac{T}{n-1}$$

Univariate F and Λ for Variable i

$$F_i = \frac{\left(t_{ii} - w_{ii}\right)(n-g)}{w_{ii}\,(g-1)}$$

with $g-1$ and $n-g$ degrees of freedom

$$\Lambda_i = \frac{w_{ii}}{t_{ii}}$$
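With $w_{ii}$ and $t_{ii}$ in hand, the univariate statistics follow directly; a tiny sketch with hypothetical values:

```python
# Univariate F and Wilks' lambda for one variable, from the diagonal
# entries of W and T (hypothetical values; df are g-1 and n-g).
w_ii, t_ii = 1.0, 5.0   # within- and total sums of squares for the variable
n, g = 4, 2             # total sum of case weights, number of groups

F_i = (t_ii - w_ii) * (n - g) / (w_ii * (g - 1))
lambda_i = w_ii / t_ii
```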
Method = Direct
For direct variable selection, variables are considered for inclusion in the order in
which they are written on the ANALYSIS = list. A variable is included in the
analysis if, when it is included, no variable in the analysis will have a tolerance less
than the specified tolerance limit (default = 0.001).
Method = Stepwise
• The order of entry of eligible variables with the same even inclusion level is determined by their order on the ANALYSIS = specification.
• The order of entry of eligible variables with the same odd level of inclusion is determined by their value on the entry criterion. The variable with the “best” value for the criterion statistic is entered first.
A variable with an odd inclusion number is ineligible for entry if:
• Its F-to-enter is less than the F-to-enter value, or
• If probability criteria are used, the significance level associated with its F-to-enter exceeds the probability to enter.
A variable with an even inclusion number is ineligible for entry if the first condition above is met.
When q variables are in the analysis, W is partitioned as

$$W = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{bmatrix}$$

where $W_{11}$ is $q \times q$. At this stage, the matrix $W^*$ is defined by

$$W^* = \begin{bmatrix} -W_{11}^{-1} & W_{11}^{-1}W_{12} \\ W_{21}W_{11}^{-1} & W_{22} - W_{21}W_{11}^{-1}W_{12} \end{bmatrix} = \begin{bmatrix} W_{11}^* & W_{12}^* \\ W_{21}^* & W_{22}^* \end{bmatrix}$$

In addition, when stepwise variable selection is used, $T$ is replaced by the matrix $T^*$, defined similarly.
Tolerance
$$\mathrm{TOL}_i = \begin{cases} 0 & \text{if } w_{ii} = 0 \\[1ex] w_{ii}^* / w_{ii} & \text{if variable $i$ is not in the analysis and } w_{ii} \neq 0 \\[1ex] -1 / \left(w_{ii}^*\, w_{ii}\right) & \text{if variable $i$ is in the analysis and } w_{ii} \neq 0 \end{cases}$$
If a variable’s tolerance is less than or equal to the specified tolerance limit, or its
inclusion in the analysis would reduce the tolerance of another variable in the
equation to or below the limit, the following statistics are not computed for it or
any set including it.
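The $W^*$ blocks and the tolerance above can be produced with a symmetric sweep operator; the following is a minimal sketch under that interpretation (the `sweep` helper and the example matrix are hypothetical, not part of SPSS).

```python
import numpy as np

def sweep(A, i):
    """Sweep pivot i of a symmetric matrix, producing the W* pattern above."""
    A = np.asarray(A, dtype=float)
    d = A[i, i]
    out = A - np.outer(A[:, i], A[i, :]) / d   # A_kl - A_ki * A_il / A_ii
    out[i, :] = A[i, :] / d
    out[:, i] = A[:, i] / d
    out[i, i] = -1.0 / d
    return out

W = np.array([[4.0, 2.0], [2.0, 3.0]])   # hypothetical W, variable 0 entered
W_star = sweep(W, 0)

# Tolerance of variable 1, which is not yet in the analysis: w*_11 / w_11
tol_1 = W_star[1, 1] / W[1, 1]
```

Sweeping each entered variable in turn yields the same blocks as the partitioned formula for $W^*$.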
F-to-Remove
$$F_i = \frac{\left(w_{ii}^* - t_{ii}^*\right)\left(n - q - g + 1\right)}{t_{ii}^*\,(g-1)}$$

with degrees of freedom $g-1$ and $n-q-g+1$.
F-to-Enter
$$F_i = \frac{\left(t_{ii}^* - w_{ii}^*\right)\left(n - q - g\right)}{w_{ii}^*\,(g-1)}$$

with degrees of freedom $g-1$ and $n-q-g$.
Wilks’ Lambda for Testing the Equality of Group Means

$$\Lambda = \frac{\left|W_{11}\right|}{\left|T_{11}\right|}$$

The approximate F test for Λ (the “overall F”), also known as Rao’s R (Tatsuoka, 1971), is

$$F = \frac{\left(1 - \Lambda^{1/s}\right)\left(rs + 1 - qh/2\right)}{\Lambda^{1/s}\,qh}$$

where

$$s = \begin{cases} \sqrt{\dfrac{q^2h^2 - 4}{q^2 + h^2 - 5}} & \text{if } q^2 + h^2 \neq 5 \\[2ex] 1 & \text{otherwise} \end{cases}$$

$$r = n - 1 - (q+g)/2$$

$$h = g - 1$$
Rao’s V (Lawley-Hotelling Trace)

$$V = -(n-g)\sum_{i=1}^{q}\sum_{l=1}^{q} w_{il}^*\left(t_{il} - w_{il}\right)$$

The Squared Mahalanobis Distance between Groups a and b

$$D_{ab}^2 = -(n-g)\sum_{i=1}^{q}\sum_{l=1}^{q} w_{il}^*\left(\bar X_{ia} - \bar X_{ib}\right)\left(\bar X_{la} - \bar X_{lb}\right)$$
The F Value for Testing the Equality of Means of Groups a and b

$$F_{ab} = \frac{\left(n - q - g + 1\right) n_a n_b\, D_{ab}^2}{q\,(n-g)\left(n_a + n_b\right)}$$

The Sum of Unexplained Variations

$$R = \sum_{a=1}^{g-1}\sum_{b=a+1}^{g} \frac{4}{4 + D_{ab}^2}$$
Classification Functions
Once a set of q variables has been selected, the classification functions (also known
as Fisher’s linear discriminant functions) can be computed using
$$b_{ij} = (n-g)\sum_{l=1}^{q} w_{il}^*\, \bar X_{lj} \qquad i = 1, 2, \ldots, q;\; j = 1, 2, \ldots, g$$

$$a_j = \log p_j - \frac{1}{2}\sum_{i=1}^{q} b_{ij}\, \bar X_{ij} \qquad j = 1, 2, \ldots, g$$

where $p_j$ is the prior probability for group j.
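A small sketch of the classification-function formulas above, taking the $w_{il}^*$ as the elements of the inverse within-groups block $W_{11}^{-1}$; all numeric inputs are hypothetical.

```python
import numpy as np

# Hypothetical inputs for the classification-function formulas.
n, g = 10, 2
Winv = np.array([[0.5, 0.0], [0.0, 0.25]])  # inverse of W11 (stands in for w*_il)
Xbar = np.array([[1.0, 3.0],                # q x g: mean of variable i in group j
                 [2.0, 1.0]])
priors = np.array([0.5, 0.5])               # prior probabilities p_j

B = (n - g) * Winv @ Xbar                   # coefficients b_ij
a = np.log(priors) - 0.5 * (B * Xbar).sum(axis=0)   # constants a_j

# Classify a case x into the group with the largest score a_j + sum_i b_ij x_i
x = np.array([1.0, 2.0])
scores = a + x @ B
group = int(scores.argmax())
```

Here the case equals the group-0 mean, so it is assigned to group 0.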
The canonical discriminant functions are obtained from the eigensystem

$$(T - W)V = \lambda W V$$

Using the decomposition

$$W = LU$$

the problem is reduced to the equivalent system

$$\left(L^{-1}(T - W)U^{-1} - \lambda I\right)(UV) = 0$$

$$V = U^{-1}(UV)$$

For each of the eigenvalues, which are ordered in descending magnitude, the following statistics are calculated:

Percentage of Between-Groups Variance

$$\frac{100\,\lambda_k}{\displaystyle\sum_{k=1}^{m} \lambda_k}$$
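The eigenanalysis can be sketched in NumPy by reducing the problem through a Cholesky factor of W (so that $U = L^T$); the example matrices are hypothetical.

```python
import numpy as np

# Hypothetical W and T for two variables (W must be positive definite).
W = np.array([[1.0, -0.5], [-0.5, 1.0]])
T = np.array([[5.0, 3.0], [3.0, 5.0]])

# Reduce (T - W)V = lambda * W V to a symmetric ordinary eigenproblem.
L = np.linalg.cholesky(W)                 # W = L L^T
Linv = np.linalg.inv(L)
A = Linv @ (T - W) @ Linv.T               # symmetric
lam, Y = np.linalg.eigh(A)                # eigenvalues in ascending order
lam, Y = lam[::-1], Y[:, ::-1]            # reorder to descending magnitude
V = Linv.T @ Y                            # back-transform the eigenvectors

pct = 100.0 * lam / lam.sum()             # percentage of variance per function
```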
Canonical Correlation
$$\sqrt{\frac{\lambda_k}{1 + \lambda_k}}$$
Wilks’ Lambda
Testing the significance of all the discriminating functions after the first k:
$$\Lambda_k = \prod_{i=k+1}^{m} \frac{1}{1 + \lambda_i} \qquad k = 0, 1, \ldots, m-1$$

$$\chi^2 = -\left(n - \frac{q+g}{2} - 1\right)\ln\Lambda_k ,$$

which is distributed as a $\chi^2$ with $(q-k)(g-k-1)$ degrees of freedom.
The standardized canonical discriminant coefficient matrix D is

$$D = S_{11} V$$

where

$$S = \mathrm{diag}\left(\sqrt{w_{11}}, \sqrt{w_{22}}, \ldots, \sqrt{w_{pp}}\right)$$

$S_{11}$ = partition containing the first q rows and columns of S

V = matrix of eigenvectors such that

$$V' W_{11} V = I$$

The correlations between the canonical discriminant functions and the discriminating variables are

$$R = S_{11}^{-1} W_{11} V$$
If some variables were not selected for inclusion in the analysis ($q < p$), the eigenvectors are implicitly extended with zeros to include the nonselected variables in the correlation matrix. Variables for which $w_{ii} = 0$ are excluded from S and W for this calculation; p then represents the number of variables with non-zero within-groups variance.
The unstandardized coefficients are obtained from

$$B = \sqrt{n-g}\; S_{11}^{-1} D$$
The associated constants are:
$$a_k = -\sum_{i=1}^{q} b_{ik}\, \bar X_{i\cdot}$$
The group centroids are the canonical discriminant functions evaluated at the group
means:
$$f_{kj} = a_k + \sum_{i=1}^{q} b_{ik}\, \bar X_{ij}$$

Box’s M is used to test the equality of the group covariance matrices:

$$M = (n-g)\log\left|C'\right| - \sum_{j=1}^{g}\left(n_j - 1\right)\log\left|C^{(j)}\right|$$
where

C′ = pooled within-groups covariance matrix excluding groups with singular covariance matrices

Determinants of C′ and $C^{(j)}$ are obtained from the Cholesky decomposition. If any diagonal element of the decomposition is less than $10^{-11}$, the matrix is considered singular and excluded from the analysis.
$$\log\left|C^{(j)}\right| = 2\sum_{i=1}^{p}\log l_{ii} - p\log\left(n_j - 1\right)$$

Similarly,

$$\log\left|C'\right| = 2\sum_{i=1}^{p}\log l_{ii} - p\log\left(n' - g\right)$$

where

$$\left(n' - g\right)C' = L'L$$

n′ = sum of weights of cases in all groups with nonsingular covariance matrices
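A quick sketch of the log-determinant computation via the Cholesky factor (the SSCP matrix and $n_j$ are hypothetical; NumPy returns the lower-triangular factor, whose diagonal matches $l_{ii}$):

```python
import numpy as np

# log|C(j)| from the Cholesky factor of the group SSCP (n_j - 1) * C(j).
nj = 11
S = np.array([[4.0, 2.0], [2.0, 3.0]])    # hypothetical (n_j - 1) * C(j)
p = S.shape[0]

L = np.linalg.cholesky(S)                 # S = L L^T, diagonal l_ii > 0
logdet_Cj = 2.0 * np.log(np.diag(L)).sum() - p * np.log(nj - 1)
```

Working in logs this way avoids underflow when determinants are near the $10^{-11}$ singularity threshold.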
The significance level is obtained from the F distribution with t1 and t2 degrees of
freedom using (Cooley and Lohnes, 1971):
$$F = \begin{cases} \dfrac{M}{b} & \text{if } e_2 > e_1^2 \\[2ex] \dfrac{t_2 M}{t_1\,(b - M)} & \text{if } e_2 < e_1^2 \end{cases}$$
where

$$e_1 = \left(\sum_{j=1}^{g}\frac{1}{n_j - 1} - \frac{1}{n-g}\right)\frac{2p^2 + 3p - 1}{6\,(g-1)(p+1)}$$

$$e_2 = \left(\sum_{j=1}^{g}\frac{1}{\left(n_j - 1\right)^2} - \frac{1}{(n-g)^2}\right)\frac{(p-1)(p+2)}{6\,(g-1)}$$

$$t_1 = (g-1)\,p\,(p+1)/2$$

$$t_2 = \frac{t_1 + 2}{\left|e_2 - e_1^2\right|}$$

$$b = \begin{cases} \dfrac{t_1}{1 - e_1 - t_1/t_2} & \text{if } e_2 > e_1^2 \\[2ex] \dfrac{t_2}{1 - e_1 + 2/t_2} & \text{if } e_2 < e_1^2 \end{cases}$$
If $e_2$ and $e_1^2$ are nearly equal (within a relative tolerance of 0.0001), the program uses Bartlett’s $\chi^2$ statistic rather than the F statistic:

$$\chi^2 = M\left(1 - e_1\right)$$

with $t_1$ degrees of freedom.
For testing the group covariance matrices of the canonical discriminant functions, the procedure is similar. The covariance matrices $C^{(j)}$ and C′ are replaced by $D_j$ and D′, where

$$D_j = B' C^{(j)} B$$

$$D' = \frac{(n-g)\,I_m - \sum_j \left(n_j - 1\right) D_j}{n' - g}$$

with the summation taken over the groups with singular covariance matrices.
Classification
The basic procedure for classifying a case is as follows:
• If X is the $1 \times q$ vector of discriminating variables for the case, the $1 \times m$ vector of canonical discriminant function values is

$$f = XB + a$$

• The squared distance from the case to the centroid of group j is

$$\chi_j^2 = \left(f - \bar f_j\right)D_j^{-1}\left(f - \bar f_j\right)'$$

where $D_j$ is the covariance matrix of canonical discriminant functions for group j and $\bar f_j$ is the group centroid vector. If the case is a member of group j, $\chi_j^2$ has a $\chi^2$ distribution with m degrees of freedom. $P(X \mid G_j)$ is the significance level of such a $\chi_j^2$.

• The posterior probability of membership in group j is

$$P\left(G_j \mid X\right) = \frac{p_j\left|D_j\right|^{-1/2} e^{-\chi_j^2/2}}{\displaystyle\sum_{j=1}^{g} p_j\left|D_j\right|^{-1/2} e^{-\chi_j^2/2}}$$

where $p_j$ is the prior probability for group j. A case is classified into the group for which $P(G_j \mid X)$ is highest.
Computationally, the discriminant scores

$$g_j = \log p_j - \frac{1}{2}\left(\log\left|D_j\right| + \chi_j^2\right)$$

are used, and

$$P\left(G_j \mid X\right) = \begin{cases} \dfrac{\exp\left(g_j - \max_j g_j\right)}{\displaystyle\sum_{j=1}^{g} \exp\left(g_j - \max_j g_j\right)} & \text{if } g_j - \max_j g_j > -46 \\[2ex] 0 & \text{otherwise} \end{cases}$$
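The max-shift in this formula is the standard guard against exponent underflow; a sketch with hypothetical scores:

```python
import numpy as np

# Hypothetical discriminant scores g_j for three groups.
gscores = np.array([-3.0, -5.0, -60.0])

shifted = gscores - gscores.max()            # largest score becomes 0
expo = np.where(shifted > -46.0, np.exp(shifted), 0.0)
posterior = expo / expo.sum()                # P(G_j | X)
```

Groups whose shifted score falls below the -46 cutoff receive an exact zero rather than a denormal value.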
If individual group covariances are not used in classification, the pooled within-
groups covariance matrix of the discriminant functions (an identity matrix) is
substituted for D j in the above calculation, resulting in considerable simplification.
When classification is based on only a subset of the canonical discriminant functions, $D_j^{-1}$ is replaced by the corresponding partition

$$\begin{bmatrix} D_{j11}^{-1} & 0 \\ 0 & 0 \end{bmatrix}$$
Cross-Validation
The following notation is used in this section:
$\mathbf{X}_{jk} = \left(X_{1jk}, \ldots, X_{qjk}\right)^T$, the vector of selected variables for case k in group j

$\mathbf{M}_j$ = sample mean of the jth group,

$$\mathbf{M}_j = \frac{1}{n_j}\sum_{k=1}^{m_j} f_{jk}\,\mathbf{X}_{jk}$$

$\mathbf{M}_{jk}$ = sample mean of the jth group excluding the point $\mathbf{X}_{jk}$,

$$\mathbf{M}_{jk} = \frac{1}{n_j - f_{jk}}\sum_{\substack{l=1 \\ l \neq k}}^{m_j} f_{jl}\,\mathbf{X}_{jl}$$

$\Sigma$ = pooled within-groups covariance matrix; $\Sigma_{jk}$ = pooled within-groups covariance matrix excluding the point $\mathbf{X}_{jk}$, whose inverse is computed as

$$\Sigma_{jk}^{-1} = \frac{n - g - f_{jk}}{n - g}\left(\Sigma^{-1} + \frac{n_j\,\Sigma^{-1}\left(\mathbf{X}_{jk} - \mathbf{M}_j\right)\left(\mathbf{X}_{jk} - \mathbf{M}_j\right)^T \Sigma^{-1}}{\left(n_j - f_{jk}\right)(n-g) - n_j\left(\mathbf{X}_{jk} - \mathbf{M}_j\right)^T \Sigma^{-1}\left(\mathbf{X}_{jk} - \mathbf{M}_j\right)}\right)$$

$$d^2\left(\mathbf{a}, \mathbf{b}\right) = \left(\mathbf{a} - \mathbf{b}\right)^T \Sigma_{jk}^{-1}\left(\mathbf{a} - \mathbf{b}\right)$$
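The updating formula for $\Sigma_{jk}^{-1}$ is a Sherman-Morrison rank-one adjustment; the sketch below checks it against direct recomputation on tiny hypothetical data with all case weights equal to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((12, 2))      # hypothetical cases x variables
y = np.repeat([0, 1], 6)              # two groups of six cases
n, g = 12, 2

def pooled_cov(X, y, df):
    """Pooled within-groups covariance with the given degrees of freedom."""
    S = np.zeros((X.shape[1], X.shape[1]))
    for j in np.unique(y):
        D = X[y == j] - X[y == j].mean(axis=0)
        S += D.T @ D
    return S / df

Sigma = pooled_cov(X, y, n - g)
Si = np.linalg.inv(Sigma)

# Remove case k = 0 of group j = 0 (f_jk = 1, n_j = 6) via the update formula.
nj = 6
d = X[0] - X[y == 0].mean(axis=0)
num = nj * np.outer(Si @ d, Si @ d)
den = (nj - 1) * (n - g) - nj * (d @ Si @ d)
Si_jk = (n - g - 1) / (n - g) * (Si + num / den)

# Direct recomputation without that case.
Si_direct = np.linalg.inv(pooled_cov(X[1:], y[1:], n - 1 - g))
```

Avoiding a fresh matrix inversion per held-out case is precisely why the procedure uses this update.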
During cross-validation, SPSS loops over all cases in the data set. Each case, say $\mathbf{X}_{jk}$, is extracted once and treated as test data. The remaining cases are treated as a new data set. The estimated error rate is the ratio of the sum of misclassified case weights to the sum of all case weights. To reduce computation time, the linear discriminant method is used instead of the canonical discriminant method; the theoretical solution is exactly the same for both methods.
Rotations
Varimax rotations may be performed on either the matrix of canonical discriminant
function coefficients or on that of the correlation between the canonical
discriminant functions and the discrimination variables (the structure matrix). The
actual algorithm for the rotation is described in FACTOR.
For the Kaiser normalization,

$$h_i^2 = \begin{cases} 1 + \dfrac{1}{w_{ii}\,w_{ii}^*} & \text{(squared multiple correlation) if coefficients are rotated} \\[2ex] \displaystyle\sum_{k=1}^{m} r_{ik}^2 & \text{if correlations are rotated} \end{cases}$$

where the $r_{ik}$ are the elements of

$$R = S_{11}^{-1} W_{11} V$$

The rotated coefficient and structure matrices are

$$D_R = DK \qquad\text{and}\qquad R_R = RK$$

where K is the orthogonal rotation matrix.
For the unrotated functions,

$$V'(T - W)V = \Lambda = \mathrm{diag}\left(\lambda_1, \lambda_2, \ldots, \lambda_m\right)$$

where the $\lambda_k$ are the eigenvalues. The corresponding matrix for the rotated functions,

$$V_R'\,(T - W)\,V_R ,$$

is not diagonal, meaning the rotated functions, unlike the unrotated ones, are correlated for the original sample, although their within-groups covariance matrix is the identity. The diagonals of the above matrix may still be interpreted as the
between-groups variances of the functions. They are the numerators for the
proportions of variance printed with the transformation matrix. The denominator is
their sum. After rotation, the columns of the transformation are exchanged, if
necessary, so that the diagonals of the matrix above are in descending order.
References
Anderson, T. W. 1958. Introduction to multivariate statistical analysis. New York:
John Wiley & Sons, Inc.
Cooley, W. W., and Lohnes, P. R. 1971. Multivariate data analysis. New York:
John Wiley & Sons, Inc.
Dixon, W. J., ed. 1973. BMD Biomedical computer programs. Los Angeles:
University of California Press.
Tatsuoka, M. M. 1971. Multivariate analysis. New York: John Wiley & Sons, Inc.