
Structural Equation Modelling for small samples

Michel Tenenhaus
HEC School of Management (GRECHEC),
1 rue de la Libération, Jouy-en-Josas, France [tenenhaus@hec.fr]

Abstract
Two complementary schools have come to the fore in the field of Structural Equation Modelling
(SEM): covariance-based SEM and component-based SEM.
The first approach developed around Karl Jöreskog. It can be considered as a generalisation of both
principal component analysis and factor analysis to the case of several data tables connected by
causal links.
The second approach developed around Herman Wold under the name "PLS" (Partial Least
Squares). More recently Hwang and Takane (2004) have proposed a new method named
Generalized Structural Component Analysis. This second approach is a generalisation of principal
component analysis (PCA) to the case of several data tables connected by causal links.
Covariance-based SEM is usually used with an objective of model validation and needs a large
sample (what counts as large varies from one author to another: more than 100 subjects, and
preferably more than 200, are often mentioned). Component-based SEM is mainly used for score
computation and can be carried out on very small samples. A study based on 6 subjects has been
published by Tenenhaus, Pagès, Ambroisine & Guinot (2005) and will be used in this paper.
In 1996, Roderick McDonald published a paper in which he showed how to carry out a PCA using
the ULS (Unweighted Least Squares) criterion in the covariance-based SEM approach. He
concluded from this that he could in fact use the covariance-based SEM approach to obtain results
similar to those of the PLS approach, but with a precise optimisation criterion in place of an
algorithm with poorly known properties.
In this research, we will explore the use of ULS-SEM and PLS on small samples. First experiments
have already shown that score computation and bootstrap validation are quite insensitive to the
choice of the method. We will also study the very important contribution of these methods to multi-block analysis.
Key words: Multi-block analysis, PLS path modelling, Structural Equation Modelling, Unweighted
Least Squares
Introduction
Compared to covariance-based SEM, PLS suffers from several handicaps: (1) PLS path modelling
software is much less widely distributed than covariance-based SEM software, (2) the PLS
algorithm is more a heuristic than an algorithm with well-known properties, and (3) the
possibility of imposing value or equality constraints on path coefficients, easily managed in
covariance-based SEM, does not exist in PLS. Of course, PLS also has some advantages over
covariance-based SEM (that is why PLS exists), and we can list some of them: systematic
convergence of the algorithm due to its simplicity, the possibility of handling data with a small
number of individuals and a large number of variables, the practical meaning of the latent variable
estimates, and a general framework for multi-block analysis.

It is often mentioned that PLS is to covariance-based SEM as PCA is to factor analysis. But the
situation changed considerably when Roderick McDonald showed in his seminal 1996 paper that one
could easily carry out a PCA with covariance-based SEM software by using the ULS (Unweighted
Least Squares) criterion and setting the measurement error variances to zero. Furthermore, the
estimation of the latent variables proposed by McDonald is similar to using PLS Mode A with the
SEM scheme (i.e. using the theoretical latent variables as inner LV estimates). Thus, it became
possible to use covariance-based SEM software to mimic PLS.
In the first section of this paper, we recall how to use the ULS criterion for covariance-based
SEM, together with the PLS way of estimating latent variables, in order to mimic PLS path
modelling. The second section is devoted to showing how to carry out a PCA with covariance-based
SEM software and to commenting on the interest of this approach for taking parameter constraints
into account and for bootstrapping. Multi-block analysis is presented in the third section as a
confirmatory factor analysis.
We have used AMOS 6.0 (Arbuckle, 2005) and XLSTAT-PLSPM, a module of the XLSTAT
software (XLSTAT, 2007), on practical examples to illustrate the paper. Listing the pluses and
minuses of ULS-SEM and PLS finally concludes the paper.
I. Using ULS and PLS estimation methods for structural equation modelling

We describe in this section the use of the ULS estimation method for the SEM parameter
estimates and that of the PLS estimation method for computing the LV values.
We first recall the structural equation model, following Bollen (1989). A structural
equation model consists of two models: the latent variable model and the measurement model.
The latent variable model
Let η be a column vector consisting of m endogenous (dependent) centred latent variables, and ξ a
column vector consisting of k exogenous (independent) centred latent variables. The structural
model connecting the vector η to the vectors η and ξ is written as

η = Bη + Γξ + ζ    (1)

where B is a zero-diagonal m × m matrix of regression coefficients, Γ an m × k matrix of regression
coefficients and ζ a centred random vector of dimension m.
The measurement model
Each latent (unobservable) variable is described by a set of manifest (observable) variables. The
column vector y_j of the centred manifest variables linked to the dependent latent variable η_j can be
written as a function of η_j through a simple regression with the usual hypotheses:

y_j = Λ_{y_j} η_j + ε_j    (2)

The column vector y, obtained by concatenation of the y_j's, is written as

y = Λ_y η + ε    (3)

where Λ_y = ⊕_{j=1}^{m} Λ_{y_j} is the direct sum of Λ_{y_1}, …, Λ_{y_m} and ε is a column vector obtained by
concatenation of the ε_j's. It may be recalled that the direct sum of a set of matrices A_1, A_2, …, A_m
is a block-diagonal matrix in which the blocks of the diagonal are formed by the matrices A_1, A_2, …, A_m.
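To make the direct-sum construction concrete, here is a minimal numpy sketch (the loading values in `L1` and `L2` are made up for illustration):

```python
import numpy as np

def direct_sum(*blocks):
    """Block-diagonal ('direct sum') arrangement of matrices A1, ..., Am."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out

# Lambda_y for two LVs measured by 3 and 2 MVs: a 5 x 2 block-diagonal matrix
L1 = np.array([[1.0], [0.8], [0.6]])   # loadings of block 1 (3 MVs, 1 LV)
L2 = np.array([[1.0], [0.9]])          # loadings of block 2 (2 MVs, 1 LV)
Lam_y = direct_sum(L1, L2)
print(Lam_y.shape)
```

Off-block entries are zero, so each manifest variable loads only on its own latent variable.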

Similarly, the column vector x of the centred manifest variables linked to the independent latent
variables is written as a function of ξ:

x = Λ_x ξ + δ    (4)

Adding the usual hypothesis that the matrix I − B is non-singular, equation (1) can also be written as

η = (I − B)⁻¹ (Γξ + ζ)    (5)

and consequently (3) becomes

y = Λ_y (I − B)⁻¹ (Γξ + ζ) + ε    (6)
Factorisation of the manifest variable covariance matrix

Let Φ = Cov(ξ) = E(ξξ′), Ψ = Cov(ζ) = E(ζζ′), Θ_ε = Cov(ε) = E(εε′) and Θ_δ = Cov(δ) = E(δδ′).
Suppose that the random vectors ξ, ζ, ε and δ are independent of each other and that the covariance
matrices Ψ, Θ_ε, Θ_δ of the error terms are diagonal. Then we get:

Σ_xx = Λ_x Φ Λ_x′ + Θ_δ,
Σ_yy = Λ_y (I − B)⁻¹ (ΓΦΓ′ + Ψ) [(I − B)′]⁻¹ Λ_y′ + Θ_ε,
Σ_xy = Λ_x Φ Γ′ [(I − B)′]⁻¹ Λ_y′

from which we finally obtain:

        | Σ_yy   Σ_yx |   | Λ_y (I − B)⁻¹ (ΓΦΓ′ + Ψ) [(I − B)′]⁻¹ Λ_y′ + Θ_ε    Λ_y (I − B)⁻¹ ΓΦ Λ_x′ |
Σ   =   |             | = |                                                                           |    (7)
        | Σ_xy   Σ_xx |   | Λ_x Φ Γ′ [(I − B)′]⁻¹ Λ_y′                           Λ_x Φ Λ_x′ + Θ_δ     |

Let θ = {Λ_x, Λ_y, B, Γ, Φ, Ψ, Θ_ε, Θ_δ} be the set of parameters of the model and Σ(θ) the matrix (7).
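The factorisation can be checked numerically by simulation; the sketch below uses a hypothetical two-LV model (one exogenous, one endogenous, B = 0) with made-up parameter values and compares the implied Σ of equation (7) with the empirical covariance matrix of a large simulated sample:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# Model: xi -> eta (gamma = .8); each LV has two indicators
gamma = 0.8
lam_x = np.array([1.0, 0.7])       # loadings of x on xi
lam_y = np.array([1.0, 0.9])       # loadings of y on eta
phi, psi = 1.0, 1.0 - gamma**2     # Var(xi), Var(zeta) -> Var(eta) = 1
th_d = np.array([0.3, 0.4])        # Var(delta)
th_e = np.array([0.2, 0.5])        # Var(eps)

xi = rng.normal(0, np.sqrt(phi), N)
eta = gamma * xi + rng.normal(0, np.sqrt(psi), N)
x = np.outer(xi, lam_x) + rng.normal(0, np.sqrt(th_d), (N, 2))
y = np.outer(eta, lam_y) + rng.normal(0, np.sqrt(th_e), (N, 2))

# Implied covariance blocks from equation (7), with B = 0
Sxx = phi * np.outer(lam_x, lam_x) + np.diag(th_d)
Syy = (gamma**2 * phi + psi) * np.outer(lam_y, lam_y) + np.diag(th_e)
Sxy = gamma * phi * np.outer(lam_x, lam_y)

emp = np.cov(np.hstack([x, y]).T)
implied = np.block([[Sxx, Sxy], [Sxy.T, Syy]])
print(np.abs(emp - implied).max())   # close to 0 for large N
```

The maximal deviation between the empirical and implied covariance matrices shrinks as the sample grows, as expected from (7).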
Model estimation using the ULS method

Let S be the empirical covariance matrix of the MVs. The object is to seek the set of parameters
θ̂ = {Λ̂_x, Λ̂_y, B̂, Γ̂, Φ̂, Ψ̂, Θ̂_ε, Θ̂_δ} minimizing the criterion

‖S − Σ(θ)‖²    (8)

The aim is therefore to seek a factorisation of the empirical covariance matrix S as a function of the
parameters of the structural model. In SEM software, the estimations Θ̂_ε = Ĉov(ε) and Θ̂_δ = Ĉov(δ)
of the covariance matrices of the residual terms are computed in such a way that the diagonal of
the reconstruction error matrix E = S − Σ(θ̂) is null, even when this yields negative variance
estimates (Heywood case).

Let us denote by σ̂_ii the i-th diagonal term of Σ(Λ̂_x, Λ̂_y, B̂, Γ̂, Φ̂, Ψ̂, 0, 0) and by θ̂_ii the i-th
diagonal term of the estimated error covariance matrix (Θ̂_δ or Θ̂_ε, depending on the MV). From the
formula

s_ii = σ̂_ii + θ̂_ii    (9)

we may conclude that σ̂_ii is the part of the variance s_ii of the i-th MV explained by its LV (except in
a Heywood case), and θ̂_ii is the estimate of the variance of the measurement error relative to this
MV. As all the error terms e_ii = s_ii − (σ̂_ii + θ̂_ii) are null, this method is not oriented towards the
search for parameters explaining the MV variances. It is in fact oriented towards the reconstruction
of the covariances between the MVs, variances excluded.
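For a single-factor block, the ULS criterion (8) can be sketched with a simple alternating scheme in the spirit of principal-axis factoring; this is an illustration in numpy, not the algorithm actually used by SEM software. Note how the diagonal of the reconstruction error matrix E is null by construction:

```python
import numpy as np

def uls_one_factor(S, n_iter=500):
    """ULS fit of a one-factor model S ~ lam lam' + Theta (Theta diagonal).
    Alternating updates: given lam, setting Theta_ii = s_ii - lam_i**2 makes
    the diagonal of E = S - Sigma(theta) exactly null (it may go negative:
    Heywood case); given Theta, lam is the dominant eigenvector of S - Theta
    scaled by the square root of its eigenvalue."""
    lam = np.sqrt(np.diag(S)) * 0.7          # crude start
    for _ in range(n_iter):
        theta = np.diag(S) - lam**2
        Sr = S - np.diag(theta)
        w, V = np.linalg.eigh(Sr)            # eigenvalues in ascending order
        lam = np.sqrt(max(w[-1], 0)) * V[:, -1]
    return lam, np.diag(S) - lam**2

# Covariance matrix generated by an exact one-factor model (made-up values)
true_lam = np.array([0.9, 0.8, 0.7, 0.6])
S = np.outer(true_lam, true_lam) + np.diag([0.19, 0.36, 0.51, 0.64])
lam, theta = uls_one_factor(S)
E = S - (np.outer(lam, lam) + np.diag(theta))
print(np.abs(np.diag(E)).max())   # diagonal of E is null by construction
```

On this exact one-factor matrix the scheme recovers the generating loadings (up to a global sign) and the error variances.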
The McDonald approach for parameter estimation

In his 1996 paper, McDonald proposes to estimate the model parameters subject to the constraints
that all the θ_ii are null. The object is to seek the parameters Λ̂_x, Λ̂_y, B̂, Γ̂, Φ̂, Ψ̂ minimizing the
criterion

‖S − Σ(Λ_x, Λ_y, B, Γ, Φ, Ψ, 0, 0)‖²    (10)

The estimations of the variances of the residual terms ε and δ are integrated in the diagonal terms of
the reconstruction error matrix E = S − Σ(Λ̂_x, Λ̂_y, B̂, Γ̂, Φ̂, Ψ̂, 0, 0). This method is therefore
oriented towards the reconstruction of the full MV covariance matrix, variances included. In a
second step, final estimations Θ̂_ε and Θ̂_δ of the variances of the residual terms ε and δ are
obtained by using formula (9) again.
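For a single block, the constrained criterion (10) has a closed-form solution through the dominant eigenpairs of S; a minimal numpy sketch, assuming S is positive semi-definite (the matrix below is a made-up correlation-type matrix):

```python
import numpy as np

def mcdonald_uls(S, r=1):
    """ULS fit with the error variances constrained to 0 (McDonald, 1996).
    For a single block, minimising ||S - Lam Lam'||^2 over n x r loadings is
    solved by the r dominant eigenpairs of S (Eckart-Young); the error
    variances are then recovered in a second step from formula (9)."""
    w, V = np.linalg.eigh(S)                   # eigenvalues in ascending order
    lam = V[:, -r:] * np.sqrt(w[-r:])          # loadings
    theta = np.diag(S) - (lam ** 2).sum(axis=1)  # second-step error variances
    return lam, theta

S = np.array([[1.00, 0.63, 0.72],
              [0.63, 1.00, 0.56],
              [0.72, 0.56, 1.00]])
lam, theta = mcdonald_uls(S)
# The diagonal of S is reproduced once theta is added back, as in formula (9)
print(np.diag(np.outer(lam[:, 0], lam[:, 0])) + theta)
```

This closed form is also the link with PCA developed in the next section: the constrained ULS loadings are rescaled principal components of S.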
Goodness of Fit

The quality of the fit can be measured by the GFI (Goodness of Fit Index) criterion of Jöreskog &
Sörbom, defined by the formula

GFI = 1 − ‖S − Σ(Λ̂_x, Λ̂_y, B̂, Γ̂, Φ̂, Ψ̂, Θ̂_ε, Θ̂_δ)‖² / ‖S‖²    (11)

i.e. the proportion of ‖S‖² explained by the model. By convention, the model under study is
acceptable when the GFI is greater than 0.90.

The quantity ‖S − Σ(Λ̂_x, Λ̂_y, B̂, Γ̂, Φ̂, Ψ̂, Θ̂_ε, Θ̂_δ)‖² can be deduced from the CMIN criterion
given in AMOS:

CMIN = (N − 1) ‖S − Σ(Λ̂_x, Λ̂_y, B̂, Γ̂, Φ̂, Ψ̂, Θ̂_ε, Θ̂_δ)‖²    (12)

where N is the number of cases.
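Both quantities are straightforward to compute; a small numpy sketch (the CMIN value 105.613 and N = 6 used in the example are those of Table 2.4, while the 2 x 2 matrix S is made up):

```python
import numpy as np

def gfi(S, Sigma_hat):
    """Equation (11): proportion of ||S||^2 explained by the model."""
    return 1.0 - np.sum((S - Sigma_hat) ** 2) / np.sum(S ** 2)

def sq_residual_from_cmin(cmin, N):
    """Invert equation (12): ||S - Sigma(theta_hat)||^2 = CMIN / (N - 1)."""
    return cmin / (N - 1)

S = np.array([[1.0, 0.8], [0.8, 1.0]])
print(gfi(S, S))                         # a perfect fit gives GFI = 1
print(sq_residual_from_cmin(105.613, 6))
```

The second function recovers the squared residual norm from the CMIN reported by AMOS.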

In practical applications of the McDonald approach, the difference between the GFI given by AMOS
and the exact GFI computed with formula (11) will be small:

GFI = 1 − ‖S − Σ(Λ̂_x, Λ̂_y, B̂, Γ̂, Φ̂, Ψ̂, Θ̂_ε, Θ̂_δ)‖² / ‖S‖²
    = 1 − [‖S − Σ(Λ̂_x, Λ̂_y, B̂, Γ̂, Φ̂, Ψ̂, 0, 0)‖² − Σ_i θ̂_ii²] / ‖S‖²    (13)

Using the McDonald approach, the GFI given by AMOS is equal to

GFI = 1 − ‖S − Σ(Λ̂_x, Λ̂_y, B̂, Γ̂, Φ̂, Ψ̂, 0, 0)‖² / ‖S‖²    (14)

and Σ_i θ̂_ii² / ‖S‖² is usually small. Furthermore, the exact GFI will always be larger than the GFI
given by AMOS.
Evaluation of the latent variables

After having estimated the parameters of the model, we now turn to the problem of evaluating the
latent variables. Three approaches can be distinguished: the traditional SEM approach, the
"McDonald" approach, and the "Fornell" approach. As is usual in the PLS approach, we now
designate any manifest variable by the letter x and any latent variable by the letter ξ, regardless
of whether they are of the dependent or independent type. The total number of latent variables is
n = k + m and the number of manifest variables related to the latent variable ξ_j is p_j.
The traditional SEM approach
To construct an estimation ξ̂_j of ξ_j, one proceeds by multiple regression of ξ_j on the whole set of
the centred manifest variables x_11 − x̄_11, …, x_{np_n} − x̄_{np_n}. In other words, if one denotes by Σ̂_xx the
implied (i.e. predicted by the structural model) covariance matrix of the manifest variables,
and by σ̂_{xξ_j} the vector of the implied covariances between the manifest variables x and the latent
variable ξ_j, one obtains an expression of ξ̂_j as a function of the whole set of manifest variables:

ξ̂_j = X Σ̂_xx⁻¹ σ̂_{xξ_j}    (15)

where X = [x_11 − x̄_11, …, x_{np_n} − x̄_{np_n}]. This method is not really usable, as it is more natural to
estimate a latent variable solely as a function of its own manifest variables.

The "McDonald" approach for LV evaluation


Let x_{j1}, …, x_{jp_j} be the manifest variables relative to the latent variable ξ_j. McDonald (1996)
proposes evaluating the latent variable ξ_j with the aid of the formula

ξ̂_j ∝ Σ_k ŵ_jk (x_jk − x̄_jk)    (16)

where ŵ_jk = Ĉov(x_jk, ξ_j) is the implied covariance between the MV x_jk and the LV ξ_j, and where
∝ means that the left term is the standardized version of the right term.
The regression coefficient π_jk of the latent variable ξ_j in the regression of the manifest variable x_jk
on the latent variable ξ_j is estimated by

π̂_jk = Ĉov(x_jk, ξ_j) / V̂ar(ξ_j)    (17)

From this, we deduce that formula (16) can also be written as

ξ̂_j ∝ Σ_k π̂_jk (x_jk − x̄_jk)    (18)

The McDonald approach thus amounts to estimating the latent variable ξ_j with the aid of the first
PLS component computed in the PLS regression of the latent variable ξ_j on the manifest variables
x_{j1}, …, x_{jp_j}. This approach can therefore enter into the PLS framework. In the usual PLS approach
(Wold (1985), Tenenhaus, Esposito Vinzi, Chatelin & Lauro (2005)), under Mode A, the outer weights are
obtained by simple regression of each variable x_jk on the inner estimate z_j of the latent variable ξ_j. It
is necessary to calculate the inner estimate z_j of ξ_j explicitly to obtain these weights. Three
procedures are proposed in PLS software: the centroid, factorial and structural schemes.
Covariance-based SEM software, on the other hand, directly gives the weights (loadings) that for
each x_jk represent an estimation of the regression coefficient of the "theoretical" latent variable ξ_j in
the regression of x_jk on ξ_j. Consequently, instead of the regression coefficient on the inner estimate
z_j, the estimated regression coefficient on the "theoretical" latent variable ξ_j can be used. We
proposed this procedure for calculating the weights, based simply on the outputs of a covariance-based
SEM software, in Tenenhaus, Esposito Vinzi, Chatelin & Lauro (2005). We called it the
"LISREL" scheme and, without knowing it, had found the choice of weights proposed by McDonald.
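Formula (18) reduces to a standardized weighted sum of the centred MVs of the block; a minimal sketch on a hypothetical 6 x 3 block (both the data and the weights are made-up values in the style of Table 2.1):

```python
import numpy as np

def mcdonald_score(X, weights):
    """Standardised LV score from formula (18): the weighted sum of the
    centred MVs of the block, rescaled to unit variance."""
    Xc = X - X.mean(axis=0)
    s = Xc @ np.asarray(weights)
    return s / s.std()               # population std, as the MVs are scaled on n

# Hypothetical 6 x 3 block of ratings (one row per product)
X = np.array([[3.4, 2.4, 1.8],
              [3.3, 2.3, 2.0],
              [3.3, 2.6, 2.1],
              [2.9, 3.3, 2.6],
              [2.8, 3.1, 2.6],
              [2.6, 3.2, 3.0]])
score = mcdonald_score(X, [1.0, -0.9, -0.97])
print(score.round(2))
```

The resulting score is centred with unit variance, matching the standardization convention of (16) and (18).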
The "Fornell" approach
When all the coefficients π̂_jk relative to a latent variable ξ_j have the same sign and the manifest
variables are of a comparable order of magnitude, Fornell proposes building a score taking into
account the level of the manifest variables x_jk:

ξ̂_j = Σ_k π̂_jk x_jk / Σ_k π̂_jk    (19)

This approach is standard in customer satisfaction studies.
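A sketch of formula (19) on hypothetical ratings (same-sign weights, comparable scales):

```python
import numpy as np

def fornell_score(X, weights):
    """Fornell-type score, formula (19): keeps the scale of the raw MVs,
    so the score can be read on the original rating scale."""
    w = np.asarray(weights, dtype=float)
    return X @ w / w.sum()

X = np.array([[3.4, 2.9, 3.1],      # hypothetical ratings on a 1-5 scale
              [2.6, 2.5, 2.8]])
print(fornell_score(X, [1.0, 0.9, 0.8]))
```

Unlike the standardized score of (18), this weighted average stays on the 1-5 scale of the ratings.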

Example 1

The following data have been collected by Jérôme Pagès (ENSAR-INSFA, Rennes). Six orange
juices were selected from the best-known brands in France. Three products can be stored at
room temperature (Joker, Pampryl and Tropicana, all at room temperature (r.t.)) and three others
have to be stored in refrigerated conditions (Fruivita, Pampryl and Tropicana, all refrigerated (refr.)).
Table 1 provides an extract of the data. The first nine variables correspond to the physico-chemical
data, the following seven to sensorial assessments and the last 96 variables represent marks of
appreciation of the product given by students at ENSA, Rennes. These data have already been
used in Tenenhaus, Pagès, Ambroisine & Guinot (2005) to illustrate the use of PLS regression and
PLS path modeling on very small samples. In that paper, we showed how to select a group of
homogeneous judges with respect to their preferences. Only this homogeneous group of judges will be
used in the present paper.
Table 1: Extract from the orange juice data file
                          Pampryl  Tropicana  Fruivita   Joker  Tropicana  Pampryl
                            r.t.      r.t.      refr.     r.t.    refr.     refr.
________________________________________________________________________________
Glucose                    25.32     17.33     23.65     32.42    22.70     27.16
Fructose                   27.36     20.00     25.65     34.54    25.32     29.48
Saccharose                 36.45     44.15     52.12     22.92    45.80     38.94
Sweetening power           89.95     82.55    102.22     90.71    94.87     96.51
pH before processing        3.59      3.89      3.85      3.60     3.82      3.68
pH after centrifugation     3.55      3.84      3.81      3.58     3.78      3.66
Titer                      13.98     11.14     11.51     15.75    11.80     12.21
Citric acid                  .84       .67       .69       .95      .71       .74
Vitamin C                  43.44     32.70     37.00     36.60    39.50     27.00
Smell intensity             2.82      2.76      2.83      2.76     3.20      3.07
Odor typicity               2.53      2.82      2.88      2.59     3.02      2.73
Pulp                        1.66      1.91      4.00      1.66     3.69      3.34
Taste intensity             3.46      3.23      3.45      3.37     3.12      3.54
Acidity                     3.15      2.55      2.42      3.05     2.33      3.31
Bitterness                  2.97      2.08      1.76      2.56     1.97      2.63
Sweetness                   2.60      3.32      3.38      2.80     3.34      2.90
Judge 1                     2.00      2.00      3.00      2.00     4.00      3.00
Judge 2                     1.00      3.00      3.00      2.00     4.00      1.00
Judge 3                     2.00      3.00      4.00      2.00     3.00      1.00
  .
  .
  .
Judge 96                    3.00      3.00      4.00      2.00     4.00      1.00

PLS regression makes it possible to link the block Y comprising the hedonic data related to the
homogeneous block of judges to the block X comprising the physico-chemical and sensorial data. One
may, however, wish to take into account the fact that there are actually two blocks of variables
explaining the block Y of hedonic data: block X1 comprising the physico-chemical data and block X2
comprising the sensorial data.
Let us assume that the sensorial variables depend on the physico-chemical variables and that the
hedonic variables in turn depend on the physico-chemical variables and the sensorial variables. We
can then construct the arrow diagram shown in Figure 1. The ULS-SEM approach described above
and the PLS approach proposed by Herman Wold (Wold, 1985; Tenenhaus, Esposito Vinzi, Chatelin
& Lauro, 2005) allow this type of model to be studied.
In the arrow diagram shown in Figure 1, we assume that each block of manifest variables is
summarized by a latent variable. The relationship between the manifest variables (observable) and
the latent variables (non-observable) may be formative, i.e. the function of the latent variable is to
summarize the manifest variables of the block. This relationship may also be reflective: each
manifest variable is then a reflection of a latent variable existing a priori, a theoretical concept one
tries to outline with measures. The formative mode does not require the blocks to be one-dimensional,
while this is compulsory for the reflective mode. Here, we are closer to a formative
mode for the physico-chemical and sensorial blocks and, by construction, to a reflective mode for the
hedonic block. The two modes are indicated by the direction of the arrows in Figure 1.
With regard to the PLS algorithm, it is recommended that the method of calculating the outer estimates
of the latent variables be selected depending on the type of relationship between the manifest
variables and their latent variables: Mode A for the reflective type and Mode B for the formative
type (Wold, 1985). The low number of products has obliged us to use Mode A to calculate the outer
estimates of the latent variables (although the mode of relationship between the manifest and latent
variables is formative for the physico-chemical and sensorial blocks).
The ULS-SEM approach presented here is clearly oriented towards the reflective mode. Therefore
this orange juice example will be analyzed with the ULS-SEM and PLS approaches using the reflective
mode for the three blocks. Concerning the physico-chemical and sensorial blocks, the direction of
the arrows connecting the MVs to their LVs shown in Figure 1 should thus be reversed.
Figure 1: Theoretical model of relationships between the hedonic, physico-chemical and sensorial
data
(The diagram connects the physico-chemical manifest variables (Glucose to Vitamin C) and the
sensorial manifest variables (Smell intensity to Sweetness) to their respective blocks, and the
hedonic block to the judges (Judge 1 to Judge 96).)

1. Use of ULS-SEM

We now use the ULS-SEM approach on the orange juice data. Following McDonald, the
measurement error variances are set to 0. The results are given in Figure 2 and in Table 2.
All manifest variables have been standardized. The value 1 has been given to the path coefficients
related to the manifest variables pH before centrifugation, Sweetness and Judge 2.

Figure 2: AMOS 6.0 software output for the orange juice data
(The AMOS path diagram shows the three latent variables PHYSICO-CHEMICAL, SENSORIAL
and HEDONIC with their manifest variables; all measurement error variances are fixed at .00 and
the displayed loadings correspond to the estimates listed in Table 2.1.)
Table 2: Outputs of AMOS 6.0


Table 2.1: Regression Weights (non-significant weights marked with *):

Parameter                                     Estimate  Lower (90%)  Upper (90%)      P
SENSORIAL <-- PHYSICO-CHEMICAL                    .784         .642        1.149   .010
HEDONIC <-- PHYSICO-CHEMICAL *                    .216        -.522         .672   .412
HEDONIC <-- SENSORIAL                             .643         .216        1.653   .064
Glucose <-- PHYSICO-CHEMICAL                     -.765       -1.228        -.292   .010
Fructose <-- PHYSICO-CHEMICAL                    -.764       -1.224        -.287   .010
Saccharose <-- PHYSICO-CHEMICAL                   .890         .438        1.353   .010
Sweetening power <-- PHYSICO-CHEMICAL *           .219        -.699         .894   .799
pH before centrifug. <-- PHYSICO-CHEMICAL        1.000        1.000        1.000    ...
pH after centrifug. <-- PHYSICO-CHEMICAL          .998         .926        1.057   .010
Titer <-- PHYSICO-CHEMICAL                       -.869       -1.221        -.371   .020
Citric acid <-- PHYSICO-CHEMICAL                 -.877       -1.225        -.402   .020
Vitamin C <-- PHYSICO-CHEMICAL *                 -.064        -.703        1.044   .818
Smell intensity <-- SENSORIAL *                   .244        -.503        1.028   .737
Odor typicity <-- SENSORIAL                       .935         .676        1.278   .010
Pulp <-- SENSORIAL *                              .657        -.154        1.057   .169
Taste intensity <-- SENSORIAL *                  -.565       -1.416         .012   .112
Acidity <-- SENSORIAL                            -.946       -1.483        -.719   .010
Bitterness <-- SENSORIAL                         -.974       -1.154        -.774   .010
Sweetness <-- SENSORIAL                          1.000        1.000        1.000    ...
Judge2 <-- HEDONIC                               1.000        1.000        1.000    ...
Judge3 <-- HEDONIC                                .956         .471        1.703   .010
Judge6 <-- HEDONIC                                .879         .178        1.803   .018
Judge11 <-- HEDONIC                              1.051         .490        1.984   .010
Judge12 <-- HEDONIC                               .975         .482        1.480   .010
Judge25 <-- HEDONIC                              1.045         .562        2.342   .010
Judge30 <-- HEDONIC                               .758         .000        2.226   .070
Judge31 <-- HEDONIC                               .747         .000        1.173   .061
Judge35 <-- HEDONIC                              1.063         .639        1.483   .010
Judge48 <-- HEDONIC                               .896         .397        1.264   .018
Judge52 <-- HEDONIC                              1.060         .546        2.015   .010
Judge55 <-- HEDONIC                               .937         .485        1.354   .018
Judge59 <-- HEDONIC *                             .593        -.236        1.418   .286
Judge60 <-- HEDONIC                              1.069         .669        1.989   .010
Judge63 <-- HEDONIC                               .747         .000        1.173   .061
Judge68 <-- HEDONIC                               .855         .000        2.165   .068
Judge77 <-- HEDONIC *                             .575        -.222        1.927   .316
Judge79 <-- HEDONIC                               .975         .482        1.480   .010
Judge84 <-- HEDONIC                              1.120         .594        1.994   .020
Judge86 <-- HEDONIC                               .809         .026        1.675   .050
Judge91 <-- HEDONIC                              1.058         .541        1.772   .010
Judge92 <-- HEDONIC *                             .702        -.075        1.900   .180
Judge96 <-- HEDONIC                               .821         .225        1.562   .040

Comments:

1) Confidence intervals are computed by bootstrapping on 200 bootstrap samples.
2) The 5% percentile of the empirical distribution of the 200 bootstrap estimates is given in column
Lower and the 95% percentile in column Upper.
3) P is a p-value in the following sense: the smallest confidence interval containing the value 0 is
obtained by using a confidence interval of order 1-P.
4) The regression weights for judges 59, 77 and 92 are not significant. These judges appear to be the
least correlated with the first principal component of the hedonic data, as can be seen in Figure 3.
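The percentile intervals and the P of comment 3 can be sketched as follows (the 200 bootstrap estimates are simulated here for illustration):

```python
import numpy as np

def percentile_ci_and_p(boot, level=0.90):
    """Percentile bootstrap CI and the 'P' of Table 2.1: the smallest alpha
    such that the (1 - alpha) percentile interval still contains 0."""
    boot = np.asarray(boot)
    lo, hi = np.percentile(boot, [100 * (1 - level) / 2,
                                  100 * (1 + level) / 2])
    p = 2 * min(np.mean(boot <= 0), np.mean(boot >= 0))
    return lo, hi, p

rng = np.random.default_rng(2)
boot = rng.normal(0.8, 0.3, 200)    # simulated bootstrap estimates of a weight
lo, hi, p = percentile_ci_and_p(boot)
print(round(lo, 2), round(hi, 2), round(p, 3))
```

A weight is declared significant when the interval [lo, hi] does not contain 0, i.e. when P is below the chosen alpha.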


Table 2.2: Variances

Parameter          Estimate   Lower   Upper      P
PHYSICO-CHEMICAL       .921    .429   1.120   .020
d1                     .298    .028    .364   .010
d2                     .034    .000    .044   .177

Table 2.3: Squared Multiple Correlations

Parameter   Estimate   Lower   Upper      P
SENSORIAL       .655    .529    .951   .010
HEDONIC         .946    .919   1.000   .020

Table 2.4: Model Fit Summary

Model           NPAR      CMIN    RMR    GFI   AGFI   PGFI
Default model     42   105.613   .175   .904   .898   .855

Comment: The GFI is equal to .904 and suggests that the model is acceptable.

Figure 3: Loading plot for the PCA of judges

The main objective of component-based SEM is the construction of scores. Following the
McDonald approach, we use the path coefficients given in Figure 2 and in Table 2.1. We obtain the
following constructs:

For the Physico-chemical block:

Score(Physico-chemical) ∝ -.765*Glucose - .764*Fructose + .890*Saccharose
+ .219*(Sweetening power) + 1*(pH before centrifugation) + .998*(pH after
centrifugation) - .869*Titer - .877*(Citric acid) - .064*(Vitamin C)

where all the variables (score and manifest variables) are standardized.

For the Sensorial block:

Score(Sensorial) ∝ .244*(Smell intensity) + .935*(Odor typicity) +
.657*Pulp - .565*(Taste intensity) - .946*Acidity - .974*Bitterness +
1*Sweetness

with the same standardization as for the previous block.

For the Hedonic block:

Score(Hedonic) ∝ 1*Judge2 + .956*Judge3 + … + .821*Judge96

with the same standardization as for the previous blocks.


The latent variable scores are given in Table 2.5 and their correlations in Table 2.6. The correlations
between these scores and the manifest variables are given in Table 2.7.
Table 2.5: ULS-SEM latent variable scores

                  Physico-chemical   Sensorial   Hedonic
Pampryl r.t.                 -0.72       -1.26     -1.10
Tropicana r.t.                1.05        0.43      0.66
Fruivita refr.                0.81        0.87      1.17
Joker r.t.                   -1.54       -0.77     -0.84
Tropicana refr.               0.56        1.27      0.85
Pampryl refr.                -0.16       -0.53     -0.74

Table 2.6: ULS-SEM latent variable score correlation matrix

                   Physico-chemical   Sensorial   Hedonic
Physico-chemical              1.000        .810      .867
Sensorial                      .810       1.000      .961
Hedonic                        .867        .961     1.000

Table 2.7: Correlations between the ULS-SEM LV scores and the MVs

                            Physico-chemical   Sensorial   Hedonic
Glucose                               -0.898      -0.585    -0.673
Fructose                              -0.898      -0.575    -0.673
Saccharose                             0.926       0.755     0.817
Sweetening power                       0.078       0.288     0.242
pH before centrifugation               0.950       0.896     0.947
pH after centrifugation                0.939       0.904     0.946
Titer                                 -0.973      -0.735    -0.765
Citric acid                           -0.977      -0.740    -0.774
Vitamin C                             -0.195      -0.040    -0.001
Smell intensity                        0.229       0.410     0.174
Odor typicity                          0.806       0.976     0.893
Pulp                                   0.558       0.704     0.625
Taste intensity                       -0.401      -0.646    -0.552
Acidity                               -0.745      -0.927    -0.950
Bitterness                            -0.775      -0.951    -0.976
Sweetness                              0.871       0.967     0.979
Judge2                                 0.640       0.928     0.887
Judge3                                 0.647       0.756     0.877
Judge6                                 0.656       0.662     0.794
Judge11                                0.872       0.785     0.919
Judge12                                0.718       0.929     0.823
Judge25                                0.971       0.817     0.864
Judge30                                0.742       0.518     0.637
Judge31                                0.343       0.693     0.712
Judge35                                0.771       0.936     0.926
Judge48                                0.460       0.837     0.834
Judge52                                0.791       0.840     0.944
Judge55                                0.504       0.878     0.863
Judge59                                0.534       0.592     0.458
Judge60                                0.870       0.854     0.924
Judge63                                0.343       0.693     0.712
Judge68                                0.909       0.670     0.666
Judge77                                0.734       0.473     0.396
Judge79                                0.718       0.929     0.823
Judge84                                0.953       0.934     0.941
Judge86                                0.453       0.685     0.762
Judge91                                0.827       0.845     0.927
Judge92                                0.724       0.419     0.595
Judge96                                0.554       0.679     0.744

The estimate of the hedonic score, shown in Table 2.5, enables us to rank the products by order of
preference:

Fruivita refr. > Tropicana refr. > Tropicana r.t. > Pampryl refr. > Joker r.t. > Pampryl r.t.

Using the significant regression weights of Table 2.1 and the correlations given in Table 2.7, we may
conclude that the physico-chemical score is correlated negatively with the glucose, fructose, titer and
citric acid characteristics and positively with the saccharose, pH before centrifugation and pH after
centrifugation characteristics. The sensorial score is correlated positively with odor typicity and
sweetness and negatively with acidity and bitterness.
The hedonic score related to the homogeneous group of judges is correlated positively with the
physico-chemical (.867) and sensorial (.961) scores. Consequently, this group of judges likes
products with odor typicity and sweetness (Fruivita refr., Tropicana r.t., Tropicana refr.) and rejects
products with an acidic and bitter nature (Joker r.t., Pampryl refr., Pampryl r.t.). This result is
verified in Table 3.
Table 3: Sensorial characteristics of the products ranked according to the hedonic score

                                odor                           hedonic
Product          sweetness   typicity   acidity  bitterness      score
______________________________________________________________________
Fruivita refr.         3.4       2.88      2.42        1.76       1.17
Tropicana refr.        3.3       3.02      2.33        1.97       0.85
Tropicana r.t.         3.3       2.82      2.55        2.08       0.66
----------------------------------------------------------------------
Pampryl refr.          2.9       2.73      3.31        2.63      -0.74
Joker r.t.             2.8       2.59      3.05        2.56      -0.84
Pampryl r.t.           2.6       2.53      3.15        2.97      -1.10

2. Use of PLS Path modeling

For estimating the parameters of the model, we have used the module XLSTAT-PLSPM of the
XLSTAT software (XLSTAT, 2007). The variables have all been standardized. To calculate the
inner estimates of the latent variables, we have used the centroid scheme recommended by Herman
Wold (1985).
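A minimal sketch of the Mode A / centroid-scheme iteration on simulated data may help fix ideas; this illustrates the structure of Wold's algorithm, not the XLSTAT-PLSPM implementation (the blocks, the path structure and all values below are made up):

```python
import numpy as np

def plspm_mode_a(blocks, adjacency, n_iter=100):
    """Sketch of PLS path modelling: Mode A outer estimation with the
    centroid inner scheme (Wold, 1985). `blocks` is a list of n x p_j
    standardised arrays, `adjacency` a symmetric 0/1 matrix of inner links."""
    std = lambda v: (v - v.mean()) / v.std()
    scores = [std(X.sum(axis=1)) for X in blocks]          # arbitrary start
    for _ in range(n_iter):
        C = np.corrcoef(np.column_stack(scores), rowvar=False)
        # inner estimation, centroid scheme: sign-weighted sum of neighbours
        inner = [std(sum(np.sign(C[j, i]) * scores[i]
                         for i in range(len(blocks)) if adjacency[j, i]))
                 for j in range(len(blocks))]
        # outer estimation, Mode A: weight of each MV = cov(MV, inner estimate)
        scores = [std(X @ (X.T @ z)) for X, z in zip(blocks, inner)]
    return scores

# Toy data: one common factor drives three connected blocks
rng = np.random.default_rng(3)
f = rng.normal(size=20)
def make_block(p):
    X = np.outer(f, np.ones(p)) + 0.5 * rng.normal(size=(20, p))
    return (X - X.mean(axis=0)) / X.std(axis=0)
blocks = [make_block(3), make_block(4), make_block(3)]
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])            # inner links
s1, s2, s3 = plspm_mode_a(blocks, A)
print(round(abs(np.corrcoef(s1, s2)[0, 1]), 2))
```

The sketch uses a fixed number of iterations for simplicity; real software iterates until the outer weights stabilise.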
Table 4 contains the output of this modelling of the orange juice data with comments. Figure 4
includes the regression coefficients between the latent variables of the model shown in Figure 1 and
the correlation coefficients between the manifest and latent variables.
Coefficient validation

Although the various methods used on these orange juice data give robust and stable results
(the same items appear to be significant in Tenenhaus, Pagès, Ambroisine & Guinot (2005) and in the
present paper), we may think that a bootstrap validation carried out on only 6 cases cannot be very
reliable. One reason is the following: in this example, the data structure comes from the opposition
between the two groups {Fruivita refr., Tropicana r.t., Tropicana refr.} on one side and {Pampryl
refr., Joker r.t., Pampryl r.t.} on the other side. If one of these groups of products is not selected in
the bootstrap sampling, then the correlations between the latent variables disappear.
Perhaps non-representative samples should be eliminated.
The bootstrap has been based on 200 samples, and 90% confidence intervals have been requested.
Results of the bootstrap validation for the inner model are shown in Table 3.7. The confidence intervals
indicate which regression coefficients are significant. We can also look at the usual Student t statistics
related to the regression coefficients. By convention, a coefficient is significant if the absolute value
of t is larger than 2. In this specific example, both methods give the same results. The relationship
between the hedonic data and the physico-chemical data is not significant (t = 1.522), while that
between the hedonic data and the sensorial data is (t = 3.546). There is also a significant connection
between the physico-chemical and the sensorial data (t = 2.864).


Figure 4: XLSTAT-PLSPM software output for the orange juice data


However, the strong correlation between the hedonic data and the physico-chemical data suggests
that a PLS regression of the hedonic score on the physico-chemical and sensorial scores should be
carried out. This PLS regression (with one component) leads to the following equation:

Hedonic score = 0.49*(Physico-chemical score) + 0.53*(Sensorial score)

with an R² = 0.948, to be compared with the R² = 0.960 of the model shown in Figure 4.

Bootstrap validation for the PLS regression yields the same significant regression coefficients as for
OLS regression (see Table 4). If the PLS regression is validated by jack-knife on the observed latent
variables, both coefficients are now significant (Table 4 and Figure 5).
Figure 5: XLSTAT-PLSPM software output: Validation of the PLS regression of the hedonic score
on the physico-chemical and sensorial scores
(The chart displays the standardized coefficients of the Sensorial and Physico-chemical scores in
the regression of the hedonic score, with their 95% confidence intervals.)

Table 3: XLSTAT-PLSPM outputs for the orange juice example

Table 3.1: Block dimensionality

Latent variable    Dimensions   Critical value   Eigenvalues
Physico-chemical            9            1.800    6.213   1.410   1.046   0.317   0.013
Sensorial                   7            1.400    4.744   1.333   0.820   0.084   0.019
Hedonic                    23            4.600   14.655   3.663   2.199   1.837   0.646

Comment: The critical value is equal to the average eigenvalue. In this example the number of
eigenvalues is equal to 5, as the number of observations (6) is smaller than the number of variables
and the variables are centred. Each block can be considered as unidimensional.
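The critical value can be reproduced as follows; a sketch on a simulated 6 x 7 block standing in for the sensorial one (all data values are made up):

```python
import numpy as np

def block_dimensionality(X):
    """Eigenvalues of the PCA of a standardised block and their average
    (the 'critical value' of Table 3.1). With n < p and centred variables,
    only n - 1 eigenvalues are non-zero, so the average equals p / (n - 1)."""
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    eig = np.sort(np.linalg.eigvalsh(Xs.T @ Xs / n))[::-1][:min(n - 1, p)]
    return eig, eig.sum() / len(eig)

# A 6 x 7 block driven by one common factor, like the sensorial block
rng = np.random.default_rng(4)
f = rng.normal(size=6)
X = np.outer(f, np.ones(7)) + 0.4 * rng.normal(size=(6, 7))
eig, crit = block_dimensionality(X)
print(eig.round(3), round(crit, 3))   # critical value ~= 7/5 = 1.4
```

A block is judged unidimensional when only its first eigenvalue exceeds the critical value.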


Table 3.2: Checking block dimensionality (largest correlation per row marked with *)

Variables/Factors correlations (Physico-chemical):

                              F1       F2       F3       F4       F5
Glucose                    0.914*   0.388   -0.057    0.109   -0.013
Fructose                   0.913*   0.378   -0.083    0.121   -0.050
Saccharose                -0.912*   0.261    0.286   -0.127    0.049
Sweetening power          -0.035    0.947*   0.319   -0.020    0.017
pH before centrifugation  -0.945*   0.019   -0.026    0.325    0.006
pH after centrifugation   -0.933*   0.071   -0.069    0.346   -0.006
Titer                      0.974*  -0.144    0.070    0.150    0.062
Citric acid                0.978*  -0.136    0.049    0.144    0.052
Vitamin C                  0.212   -0.328    0.916*   0.080   -0.035

Variables/Factors correlations (Sensorial):

                | F1     | F2     | F3     | F4     | F5
Smell intensity | 0.460  | 0.754  | -0.468 | 0.008  | 0.004
Odor typicity   | 0.985  | 0.134  | -0.058 | 0.077  | 0.041
Pulp            | 0.722  | 0.617  | 0.298  | -0.096 | -0.031
Taste intensity | -0.650 | 0.429  | 0.626  | 0.005  | 0.048
Acidity         | -0.913 | 0.348  | -0.021 | 0.205  | -0.057
Bitterness      | -0.935 | 0.188  | -0.285 | -0.028 | 0.093
Sweetness       | 0.955  | -0.159 | 0.187  | 0.161  | 0.048

Variables/Factors correlations (Hedonic):

        | F1    | F2     | F3     | F4     | F5
judge2  | 0.894 | -0.218 | 0.307  | -0.203 | 0.132
judge3  | 0.890 | -0.318 | -0.310 | -0.018 | 0.104
judge6  | 0.798 | -0.039 | -0.166 | 0.522  | -0.247
judge11 | 0.919 | 0.051  | -0.177 | 0.278  | 0.212
judge12 | 0.814 | 0.221  | 0.213  | -0.429 | -0.243
judge25 | 0.849 | 0.422  | -0.177 | -0.166 | 0.205
judge30 | 0.625 | 0.399  | -0.631 | -0.058 | -0.221
judge31 | 0.733 | -0.565 | 0.321  | 0.179  | 0.090
judge35 | 0.925 | 0.035  | 0.329  | 0.179  | -0.060
judge48 | 0.852 | -0.474 | 0.207  | -0.039 | -0.074
judge52 | 0.948 | -0.049 | -0.286 | 0.068  | -0.115
judge55 | 0.878 | -0.398 | 0.131  | -0.164 | -0.166
judge59 | 0.438 | 0.475  | 0.740  | 0.176  | -0.062
judge60 | 0.922 | 0.051  | -0.149 | -0.154 | 0.317
judge63 | 0.733 | -0.565 | 0.321  | 0.179  | 0.090
judge68 | 0.638 | 0.763  | -0.063 | -0.072 | -0.041
judge77 | 0.363 | 0.862  | 0.235  | -0.169 | 0.203
judge79 | 0.814 | 0.221  | 0.213  | -0.429 | -0.243
judge84 | 0.928 | 0.352  | 0.115  | 0.032  | 0.012
judge86 | 0.778 | -0.410 | -0.372 | -0.229 | -0.187
judge91 | 0.927 | 0.043  | 0.069  | 0.364  | 0.043
judge92 | 0.585 | 0.348  | -0.282 | 0.676  | 0.000
judge96 | 0.755 | -0.285 | -0.331 | -0.430 | 0.231


Table 3.3: Model validation


Goodness of fit index:

            | GoF   | GoF (Bootstrap) | Standard error | Critical ratio (CR)
Absolute    | 0.731 | 0.732           | 0.049          | 14.943
Relative    | 0.823 | 0.801           | 0.048          | 17.146
Outer model | 0.911 | 0.852           | 0.039          | 23.286
Inner model | 0.903 | 0.940           | 0.048          | 18.815

            | Lower bound (90%) | Upper bound (90%) | Minimum | 1st Quartile | Median | 3rd Quartile | Maximum
Absolute    | 0.645 | 0.799 | 0.522 | 0.711 | 0.738 | 0.762 | 0.821
Relative    | 0.707 | 0.847 | 0.525 | 0.790 | 0.811 | 0.824 | 0.855
Outer model | 0.784 | 0.893 | 0.707 | 0.816 | 0.863 | 0.865 | 0.911
Inner model | 0.885 | 0.999 | 0.669 | 0.921 | 0.941 | 0.966 | 1.000

Comment: Number of bootstrap samples = 200. Level of the confidence intervals: 90%

Absolute Goodness-of-Fit:

GoF = sqrt{ [ (1/J) Σ_j Σ_h Cor²(x_jh, ξ_j) ] × [ (1 / number of endogenous LVs) Σ_{j: ξ_j endogenous} R²(ξ_j; the ξ_i explaining ξ_j) ] }

where J = Σ_j p_j is the total number of manifest variables.

Relative Goodness-of-Fit:

GoF = sqrt{ [ (1/J) Σ_j (1/λ_j) Σ_{h=1}^{p_j} Cor²(x_jh, ξ_j) ] × [ (1 / number of endogenous LVs) Σ_{j: ξ_j endogenous} (1/ρ_j²) R²(ξ_j; the ξ_i explaining ξ_j) ] }

The first bracketed term corresponds to the "Outer model" rows of the table and the second one to the "Inner model" rows, where:
- λ_j is the first eigenvalue computed from the PCA of block j;
- ρ_j is the first canonical correlation between the dependent block j and the concatenation of all the blocks i explaining the dependent block j.
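As a small numeric illustration (a sketch, not XLSTAT code), the absolute GoF reported in Table 3.3 can be recovered from the average communality and the average R² reported in Table 3.9:

```python
import numpy as np

# Sketch: the absolute GoF is the square root of the product of the
# average MV communality and the average R2 of the endogenous LVs.
# The values below are the ones reported in Table 3.9.
mean_r2 = np.mean([0.672, 0.960])     # R2 of Sensorial and Hedonic
weighted_mean_communality = 0.654     # mean communality weighted by block sizes
gof_absolute = float(np.sqrt(weighted_mean_communality * mean_r2))
# gof_absolute is close to the 0.731 reported in Table 3.3
```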


Table 3.4: Latent Variable validation


Cross-loadings (Monofactorial manifest variables):

                         | Physico-chemical | Sensorial | Hedonic
Glucose                  | -0.889 | -0.584 | -0.689
Fructose                 | -0.889 | -0.574 | -0.689
Saccharose               | 0.931  | 0.758  | 0.832
Sweetening power         | 0.099  | 0.294  | 0.242
pH before centrifugation | 0.952  | 0.896  | 0.955
pH after centrifugation  | 0.942  | 0.905  | 0.954
Titer                    | -0.972 | -0.738 | -0.789
Citric acid.             | -0.977 | -0.743 | -0.798
Vitamin C                | -0.194 | -0.045 | -0.023
Smell intensity          | 0.236  | 0.411  | 0.199
Odor typicity            | 0.814  | 0.977  | 0.904
Pulp                     | 0.574  | 0.709  | 0.637
Taste intensity          | -0.397 | -0.639 | -0.549
Acidity                  | -0.751 | -0.925 | -0.942
Bitterness               | -0.784 | -0.952 | -0.972
Sweetness                | 0.877  | 0.968  | 0.982
judge2                   | 0.646  | 0.925  | 0.880
judge3                   | 0.654  | 0.755  | 0.860
judge6                   | 0.665  | 0.667  | 0.787
judge11                  | 0.873  | 0.785  | 0.916
judge12                  | 0.729  | 0.930  | 0.834
judge25                  | 0.972  | 0.817  | 0.879
judge30                  | 0.750  | 0.524  | 0.648
judge31                  | 0.349  | 0.690  | 0.689
judge35                  | 0.777  | 0.936  | 0.926
judge48                  | 0.470  | 0.835  | 0.815
judge52                  | 0.801  | 0.843  | 0.938
judge55                  | 0.517  | 0.876  | 0.847
judge59                  | 0.533  | 0.593  | 0.479
judge60                  | 0.872  | 0.853  | 0.924
judge63                  | 0.349  | 0.690  | 0.689
judge68                  | 0.910  | 0.673  | 0.695
judge77                  | 0.727  | 0.474  | 0.432
judge79                  | 0.729  | 0.930  | 0.834
judge84                  | 0.957  | 0.935  | 0.953
judge86                  | 0.467  | 0.685  | 0.742
judge91                  | 0.831  | 0.846  | 0.925
judge92                  | 0.724  | 0.424  | 0.602
judge96                  | 0.559  | 0.677  | 0.731

Comment:
- Sweetening power and Vitamin C are not correlated to their block.
- Smell intensity is not correlated to its own block.
- Judges 59 and 77 are weakly correlated to their block.


Table 3.5: Latent Variable weights (non significant weights are in bold)

Latent variable  | Manifest variables        | Outer weight | Outer weight (Bootstrap) | Standard error | Critical ratio (CR) | Lower bound (90%) | Upper bound (90%)
Physico-chemical | Glucose                   | -0.124 | -0.113 | 0.050 | -2.491 | -0.147 | -0.057
                 | Fructose                  | -0.123 | -0.111 | 0.049 | -2.498 | -0.144 | -0.057
                 | Saccharose                | 0.154  | 0.140  | 0.041 | 3.720  | 0.096  | 0.180
                 | Sweetening power          | 0.052  | 0.038  | 0.093 | 0.560  | -0.115 | 0.151
                 | pH before centrifugation  | 0.180  | 0.159  | 0.021 | 8.460  | 0.121  | 0.184
                 | pH after centrifugation   | 0.180  | 0.159  | 0.020 | 9.073  | 0.121  | 0.185
                 | Titer                     | -0.148 | -0.137 | 0.017 | -8.808 | -0.169 | -0.111
                 | Citric acid.              | -0.150 | -0.139 | 0.016 | -9.127 | -0.171 | -0.113
                 | Vitamin C                 | -0.007 | -0.010 | 0.085 | -0.077 | -0.126 | 0.130
Sensorial        | Smell intensity           | 0.052  | 0.033  | 0.099 | 0.527  | -0.136 | 0.166
                 | Odor typicity             | 0.206  | 0.190  | 0.039 | 5.255  | 0.143  | 0.241
                 | Pulp                      | 0.145  | 0.114  | 0.082 | 1.759  | -0.065 | 0.207
                 | Taste intensity           | -0.113 | -0.107 | 0.080 | -1.406 | -0.196 | 0.069
                 | Acidity                   | -0.203 | -0.190 | 0.045 | -4.472 | -0.227 | -0.143
                 | Bitterness                | -0.210 | -0.197 | 0.034 | -6.136 | -0.240 | -0.143
                 | Sweetness                 | 0.223  | 0.208  | 0.036 | 6.148  | 0.143  | 0.267
Hedonic          | judge2                    | 0.059  | 0.056  | 0.012 | 4.740  | 0.043  | 0.064
                 | judge3                    | 0.053  | 0.051  | 0.015 | 3.435  | 0.034  | 0.065
                 | judge6                    | 0.050  | 0.047  | 0.018 | 2.749  | 0.023  | 0.071
                 | judge11                   | 0.062  | 0.059  | 0.019 | 3.268  | 0.048  | 0.080
                 | judge12                   | 0.062  | 0.058  | 0.013 | 4.605  | 0.037  | 0.083
                 | judge25                   | 0.067  | 0.063  | 0.013 | 5.247  | 0.051  | 0.078
                 | judge30                   | 0.048  | 0.040  | 0.020 | 2.364  | 0.000  | 0.065
                 | judge31                   | 0.039  | 0.036  | 0.023 | 1.695  | 0.000  | 0.066
                 | judge35                   | 0.064  | 0.062  | 0.012 | 5.530  | 0.050  | 0.075
                 | judge48                   | 0.049  | 0.047  | 0.016 | 3.047  | 0.020  | 0.064
                 | judge52                   | 0.062  | 0.061  | 0.012 | 5.320  | 0.049  | 0.082
                 | judge55                   | 0.052  | 0.049  | 0.014 | 3.717  | 0.029  | 0.061
                 | judge59                   | 0.042  | 0.036  | 0.026 | 1.642  | -0.021 | 0.066
                 | judge60                   | 0.065  | 0.060  | 0.016 | 4.065  | 0.047  | 0.074
                 | judge63                   | 0.039  | 0.036  | 0.023 | 1.695  | 0.000  | 0.066
                 | judge68                   | 0.059  | 0.051  | 0.018 | 3.305  | 0.000  | 0.072
                 | judge77                   | 0.045  | 0.037  | 0.027 | 1.649  | -0.021 | 0.065
                 | judge79                   | 0.062  | 0.058  | 0.013 | 4.605  | 0.037  | 0.083
                 | judge84                   | 0.071  | 0.066  | 0.013 | 5.420  | 0.055  | 0.084
                 | judge86                   | 0.043  | 0.039  | 0.020 | 2.168  | -0.007 | 0.057
                 | judge91                   | 0.063  | 0.059  | 0.018 | 3.499  | 0.050  | 0.074
                 | judge92                   | 0.043  | 0.038  | 0.028 | 1.554  | -0.007 | 0.081
                 | judge96                   | 0.046  | 0.044  | 0.020 | 2.345  | 0.013  | 0.066

Comment:
- Sweetening power and Vitamin C are not significant in block physico-chemical.
- Smell intensity, Pulp and Taste intensity are not significant in block sensorial.
- Judges 59, 77, 86 and 92 are not significant in block hedonic.


Table 3.6: Correlations between MV and LV


Correlations:

Latent variable  | Manifest variables        | Standardized loadings | Communalities | Redundancies | Standardized loadings (Bootstrap) | Standard error
Physico-chemical | Glucose                   | -0.889 | 0.790 |       | -0.850 | 0.226
                 | Fructose                  | -0.889 | 0.790 |       | -0.847 | 0.227
                 | Saccharose                | 0.931  | 0.867 |       | 0.876  | 0.268
                 | Sweetening power          | 0.099  | 0.010 |       | 0.101  | 0.591
                 | pH before centrifugation  | 0.952  | 0.906 |       | 0.968  | 0.078
                 | pH after centrifugation   | 0.942  | 0.887 |       | 0.964  | 0.073
                 | Titer                     | -0.972 | 0.946 |       | -0.950 | 0.066
                 | Citric acid.              | -0.977 | 0.954 |       | -0.956 | 0.063
                 | Vitamin C                 | -0.194 | 0.038 |       | -0.203 | 0.435
Sensorial        | Smell intensity           | 0.411  | 0.169 | 0.113 | 0.285  | 0.497
                 | Odor typicity             | 0.977  | 0.954 | 0.641 | 0.940  | 0.110
                 | Pulp                      | 0.709  | 0.503 | 0.338 | 0.612  | 0.382
                 | Taste intensity           | -0.639 | 0.408 | 0.274 | -0.589 | 0.404
                 | Acidity                   | -0.925 | 0.856 | 0.575 | -0.915 | 0.206
                 | Bitterness                | -0.952 | 0.907 | 0.609 | -0.949 | 0.087
                 | Sweetness                 | 0.968  | 0.936 | 0.629 | 0.967  | 0.069
Hedonic          | judge2                    | 0.880  | 0.774 | 0.743 | 0.859  | 0.182
                 | judge3                    | 0.860  | 0.740 | 0.710 | 0.828  | 0.236
                 | judge6                    | 0.787  | 0.619 | 0.594 | 0.736  | 0.254
                 | judge11                   | 0.916  | 0.840 | 0.806 | 0.885  | 0.230
                 | judge12                   | 0.834  | 0.695 | 0.667 | 0.852  | 0.115
                 | judge25                   | 0.879  | 0.773 | 0.742 | 0.896  | 0.159
                 | judge30                   | 0.648  | 0.419 | 0.403 | 0.570  | 0.284
                 | judge31                   | 0.689  | 0.475 | 0.456 | 0.619  | 0.340
                 | judge35                   | 0.926  | 0.858 | 0.824 | 0.923  | 0.147
                 | judge48                   | 0.815  | 0.664 | 0.638 | 0.785  | 0.233
                 | judge52                   | 0.938  | 0.879 | 0.844 | 0.936  | 0.130
                 | judge55                   | 0.847  | 0.717 | 0.688 | 0.809  | 0.215
                 | judge59                   | 0.479  | 0.230 | 0.221 | 0.447  | 0.406
                 | judge60                   | 0.924  | 0.853 | 0.820 | 0.893  | 0.214
                 | judge63                   | 0.689  | 0.475 | 0.456 | 0.619  | 0.340
                 | judge68                   | 0.695  | 0.483 | 0.464 | 0.685  | 0.258
                 | judge77                   | 0.432  | 0.186 | 0.179 | 0.426  | 0.404
                 | judge79                   | 0.834  | 0.695 | 0.667 | 0.852  | 0.115
                 | judge84                   | 0.953  | 0.909 | 0.873 | 0.943  | 0.137
                 | judge86                   | 0.742  | 0.551 | 0.529 | 0.670  | 0.308
                 | judge91                   | 0.925  | 0.856 | 0.822 | 0.895  | 0.215
                 | judge92                   | 0.602  | 0.363 | 0.348 | 0.531  | 0.411
                 | judge96                   | 0.731  | 0.535 | 0.514 | 0.709  | 0.281

Comments:
- Standardized loading = correlation
- Communality = squared correlation
- Redundancy = Communality * R²(Dep. LV; Explanatory related LVs)


Table 3.6: Correlations between MV and LV (continued)


Correlations:

Latent variable  | Manifest variables        | Critical ratio (CR) | Lower bound (90%) | Upper bound (90%)
Physico-chemical | Glucose                   | -3.938  | -0.995 | -0.569
                 | Fructose                  | -3.913  | -0.998 | -0.589
                 | Saccharose                | 3.476   | 0.729  | 0.996
                 | Sweetening power          | 0.167   | -0.860 | 0.950
                 | pH before centrifugation  | 12.199  | 0.925  | 1.000
                 | pH after centrifugation   | 12.966  | 0.899  | 1.000
                 | Titer                     | -14.729 | -1.000 | -0.860
                 | Citric acid.              | -15.413 | -1.000 | -0.872
                 | Vitamin C                 | -0.445  | -0.970 | 0.656
Sensorial        | Smell intensity           | 0.827   | -0.684 | 0.930
                 | Odor typicity             | 8.912   | 0.763  | 0.999
                 | Pulp                      | 1.856   | -0.174 | 0.998
                 | Taste intensity           | -1.580  | -0.998 | 0.203
                 | Acidity                   | -4.495  | -1.000 | -0.752
                 | Bitterness                | -10.967 | -1.000 | -0.897
                 | Sweetness                 | 13.953  | 0.940  | 1.000
Hedonic          | judge2                    | 4.832   | 0.639  | 0.981
                 | judge3                    | 3.642   | 0.469  | 0.999
                 | judge6                    | 3.102   | 0.347  | 0.982
                 | judge11                   | 3.993   | 0.773  | 0.994
                 | judge12                   | 7.258   | 0.648  | 0.999
                 | judge25                   | 5.541   | 0.742  | 0.997
                 | judge30                   | 2.277   | 0.000  | 0.936
                 | judge31                   | 2.030   | 0.000  | 0.991
                 | judge35                   | 6.293   | 0.825  | 0.997
                 | judge48                   | 3.493   | 0.408  | 0.997
                 | judge52                   | 7.234   | 0.858  | 0.998
                 | judge55                   | 3.942   | 0.425  | 0.996
                 | judge59                   | 1.179   | -0.426 | 0.948
                 | judge60                   | 4.315   | 0.716  | 0.997
                 | judge63                   | 2.030   | 0.000  | 0.991
                 | judge68                   | 2.694   | 0.000  | 0.997
                 | judge77                   | 1.069   | -0.426 | 0.896
                 | judge79                   | 7.258   | 0.648  | 0.999
                 | judge84                   | 6.934   | 0.911  | 0.998
                 | judge86                   | 2.411   | -0.093 | 0.992
                 | judge91                   | 4.301   | 0.783  | 0.986
                 | judge92                   | 1.465   | -0.192 | 0.982
                 | judge96                   | 2.602   | 0.173  | 0.997

Comment (identical with the comments for the weights):
- Sweetening power and Vitamin C are not significant in block physico-chemical.
- Smell intensity, Pulp and Taste intensity are not significant in block sensorial.
- Judges 59, 77, 86 and 92 are not significant in block hedonic.


Table 3.7: Inner model


R²(Sensorial):

R²    | R² (Bootstrap) | Standard error | Critical ratio (CR) | Lower bound (90%) | Upper bound (90%)
0.672 | 0.791          | 0.157          | 4.276               | 0.588             | 0.999

Path coefficients (Sensorial):

Latent variable  | Value | Standard error | t     | Pr > |t| | Value (Bootstrap) | Standard error (Bootstrap) | Critical ratio (CR) | Lower bound (90%) | Upper bound (90%)
Physico-chemical | 0.820 | 0.286          | 2.864 | 0.046    | 0.835             | 0.308                      | 2.660               | 0.757             | 0.994

Comment:

The usual Student t test and the bootstrap approach give the same results here.

R²(Hedonic):

R²    | R² (Bootstrap) | Standard error | Critical ratio (CR) | Lower bound (90%) | Upper bound (90%)
0.960 | 0.986          | 0.017          | 58.017              | 0.947             | 1.000

Path coefficients (Hedonic):

Latent variable  | Value | Standard error | t     | Pr > |t|
Physico-chemical | 0.306 | 0.201          | 1.522 | 0.225
Sensorial        | 0.713 | 0.201          | 3.546 | 0.038

Path coefficients (Hedonic):

Latent variable  | Value (Bootstrap) | Standard error (Bootstrap) | Critical ratio (CR) | Lower bound (90%) | Upper bound (90%)
Physico-chemical | 0.331             | 0.698                      | 0.438               | -0.642            | 1.000
Sensorial        | 0.651             | 0.674                      | 1.058               | 0.000             | 1.397

Comment:

The usual Student t test and the bootstrap approach give the same results here. But the non-significance of the physico-chemical score can also be due to a multicollinearity problem. PLS regression can be used for estimating the structural regression equations; it is presented in Table 4.


Table 3.8: Impact and contribution of the variables to Hedonic


Impact and contribution of the variables to Hedonic:

                               | Sensorial | Physico-chemical
Correlation                    | 0.964     | 0.891
Path coefficient               | 0.713     | 0.306
Correlation * path coefficient | 0.688     | 0.273
Contribution to R² (%)         | 71.612    | 28.388
Cumulative %                   | 71.612    | 100.000

[Bar chart "Impact and contribution of the variables to Hedonic": path coefficients of Sensorial and Physico-chemical (left axis) with the cumulative contribution to R² in % (right axis).]

Comment:

R²(Y; X1, ..., Xk) = Σ_j Cor(Y, Xj) × βj

When all the terms Cor(Y, Xj) × βj are positive, it makes sense to compute the relative contribution of each explanatory variable Xj to the R square.
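A minimal sketch of this decomposition, using the correlations and path coefficients reported in Table 3.8 (not a recomputation from the raw data):

```python
import numpy as np

# Sketch of the decomposition R2(Y; X1, ..., Xk) = sum_j Cor(Y, Xj) * beta_j,
# with the values reported for the Hedonic equation.
cor = np.array([0.964, 0.891])    # Cor(Hedonic, Sensorial), Cor(Hedonic, Physico-chemical)
beta = np.array([0.713, 0.306])   # corresponding path coefficients
terms = cor * beta                # both positive, so the contributions make sense
r2 = float(terms.sum())           # close to the reported R2 = 0.960
contribution_pct = 100 * terms / r2   # close to 71.6% and 28.4%
```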


Table 3.9: Model assessment

Latent variable  | Type       | Mean  | R²    | Adjusted R² | Mean Communalities (AVE) | Mean Redundancies
Physico-chemical | Exogenous  | 0.000 |       |             | 0.687                    |
Sensorial        | Endogenous | 0.000 | 0.672 | 0.672       | 0.676                    | 0.454
Hedonic          | Endogenous | 0.000 | 0.960 | 0.950       | 0.634                    | 0.609
Mean             |            |       | 0.816 |             | 0.654 (weighted)         | 0.532

Comments:

- The weighted mean takes into account the number of MVs in each block.
- (Absolute GoF)² = (Mean R²) × (Weighted mean communalities)

Table 3.10: Correlation between the latent variables


Correlations (Latent variable):

                 | Physico-chemical | Sensorial | Hedonic
Physico-chemical | 1.000            | 0.820     | 0.891
Sensorial        | 0.820            | 1.000     | 0.964
Hedonic          | 0.891            | 0.964     | 1.000

Table 3.11: Direct, indirect and total effects


Direct effects (Latent variable):

(rows = dependent LV, columns = explanatory LV)
          | Physico-chemical | Sensorial
Sensorial | 0.820            |
Hedonic   | 0.306            | 0.713

Comment:

Sensorial = .820*Physico-chemical
Hedonic = .306*Physico-chemical + .713*Sensorial


Table 3.11: Direct, indirect and total effects (continued)


Indirect effects (Latent variable):

          | Physico-chemical | Sensorial
Sensorial | 0.000            |
Hedonic   | 0.585            | 0.000

Comment:

Hedonic = .306*Physico-chemical + .713*.820*Physico-chemical
Indirect effect of Physico-chemical on Hedonic = .713*.820 = 0.585

Total effects (Latent variable):

          | Physico-chemical | Sensorial
Sensorial | 0.820            |
Hedonic   | 0.891            | 0.713

Comment:

Hedonic = .306*Physico-chemical + .713*.820*Physico-chemical
        = .891*Physico-chemical
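The direct, indirect and total effects above can be sketched with a small matrix computation (the path coefficients are the ones reported in Table 3.11; the matrix formulation itself is a standard path-analysis identity, not taken from the paper):

```python
import numpy as np

# With the LVs ordered (Physico-chemical, Sensorial, Hedonic), put the
# direct effects in a matrix B with B[i, j] = direct effect of LV j on
# LV i. Total effects are B + B^2 + B^3 + ... = (I - B)^-1 - I, and
# indirect effects are total minus direct.
B = np.array([[0.0,   0.0,   0.0],
              [0.820, 0.0,   0.0],
              [0.306, 0.713, 0.0]])
total = np.linalg.inv(np.eye(3) - B) - np.eye(3)
indirect = total - B
# indirect[2, 0] = indirect effect of Physico-chemical on Hedonic,
# i.e. 0.713 * 0.820 = 0.585; total[2, 0] is then 0.891.
```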

Table 3.12: Discriminant validity


Discriminant validity (Squared correlations < AVE):

                         | Physico-chemical | Sensorial | Hedonic
Physico-chemical         | 1                | 0.672     | 0.793
Sensorial                | 0.672            | 1         | 0.930
Hedonic                  | 0.793            | 0.930     | 1
Mean Communalities (AVE) | 0.687            | 0.676     | 0.634

Comment: Due to the non-significant MVs, the AVE of each of the three LVs is smaller than some of its squared correlations with the other LVs, so the discriminant validity criterion is not met.
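The check can be sketched in a few lines, using the correlations of Table 3.10 and the AVEs of Table 3.9:

```python
import numpy as np

# Sketch of the discriminant-validity check of Table 3.12: an LV passes
# when its AVE exceeds its squared correlations with the other LVs.
cor = np.array([[1.000, 0.820, 0.891],
                [0.820, 1.000, 0.964],
                [0.891, 0.964, 1.000]])   # Table 3.10
ave = np.array([0.687, 0.676, 0.634])     # Table 3.9
sq = cor ** 2
passes = [all(sq[i, j] < ave[i] for j in range(3) if j != i)
          for i in range(3)]
# none of the three LVs passes the criterion here
```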


Table 3.13: Latent variable score


Summary statistics / Latent variable scores:

Variable         | Observations | Minimum | Maximum | Mean  | Std. Deviation
Physico-chemical | 6            | -1.680  | 1.120   | 0.000 | 1.000
Sensorial        | 6            | -1.381  | 1.378   | 0.000 | 1.000
Hedonic          | 6            | -1.203  | 1.253   | 0.000 | 1.000

Latent variable scores:

                | Physico-chemical | Sensorial | Hedonic
pampryl r. t.   | -0.810           | -1.381    | -1.203
tropicana r. t. | 1.120            | 0.462     | 0.742
fruvita refr.   | 0.917            | 0.964     | 1.253
joker r. t.     | -1.680           | -0.852    | -0.991
tropicana refr. | 0.630            | 1.378     | 0.946
pampryl refr.   | -0.176           | -0.570    | -0.747

Table 4: PLS regression of Hedonic score on Physico-chemical and sensorial scores


Goodness of fit statistics (Variable Hedonic):

R² = 0.948

Bootstrap validation
Path coefficients (Hedonic):

Latent variable  | Value | Value (Bootstrap) | Standard error (Bootstrap) | Critical ratio (CR) | Lower bound (90%) | Upper bound (90%)
Physico-chemical | 0.490 | 0.267             | 0.408                      | 1.201               | -0.422            | 0.893
Sensorial        | 0.531 | 0.744             | 0.402                      | 1.320               | 0.103             | 1.397

Jack-knife validation on the observed latent variables

Standardized coefficients (Variable Hedonic):

Variable         | Coefficient | Std. deviation | Lower bound (95%) | Upper bound (95%)
Physico-chemical | 0.490       | 0.021          | 0.449             | 0.531
Sensorial        | 0.531       | 0.022          | 0.488             | 0.573


3. Comparison between PLS, ULS-SEM and PCA

Comparison between weights

When we compare the weight confidence intervals computed with PLS (Table 3.5) with those coming from ULS-SEM (Table 2.1), we find that both methods yield the same non-significant weights, with only one exception for Judge 86 (non-significant for PLS and significant for ULS-SEM). These weights are compared in Figure 6.
Figure 6: Comparison between the PLS and ULS-SEM weights


Comparison between PLS and ULS-SEM scores

The scores coming from PLS and ULS-SEM are compared in Figure 7. They are highly correlated.
This confirms our previous findings and a general remark of Noonan and Wold (1982) on the fact
that the final outer LV estimates depend very little on the selected scheme of calculation of the inner
LV estimates.
Figure 7: Comparison between the PLS and ULS-SEM scores


Comparison between the PLS and ULS-SEM scores and the block principal components

The correlations of the PLS and ULS-SEM scores with the block principal components are given in Table 5.

Table 5: Correlation between the PLS and ULS-SEM scores and the block principal components

                        | ULS-SEM scores | PLS scores
Physico-chemical 1st PC | .999           | .997
Sensorial 1st PC        | .998           | .998
Hedonic 1st PC          | .999           | .997

We may conclude that ULS-SEM, PLS and principal component analysis give practically the same scores on this orange juice example.

II. Exploratory factor analysis, ULS-SEM and PCA

If the structural model is limited to one standardized latent variable (or common factor) ξ described by a vector x composed of p centred manifest variables, one gets the decomposition

(20)  x = λx ξ + δ

It is usual to add the following hypotheses:

(21)  E(δ) = 0,  Θδ = Cov(δ) = E(δδ') is diagonal,  Cov(ξ, δ) = 0

Under these hypotheses, the covariance matrix of the random vector x is written as

(22)  Σ = E(xx') = λx λx' + Θδ

The parameters λx and Θδ in model (22) can now be estimated using the ULS method. This means searching for the parameters λ̂x and Θ̂δ minimizing the criterion

(23)  || S − (λ̂x λ̂x' + Θ̂δ) ||²

where S is the matrix of empirical covariances. To remove the indetermination on the global sign of the vector λ̂x (if λ̂x is a solution, then −λ̂x is also a solution), the solution can be chosen so that the sum of its coordinates is positive. This is the option chosen in AMOS 6.0.

The advantage of the ULS method over the other more frequently used GLS (Generalized Least Squares) or ML (Maximum Likelihood) methods lies in its ability to function with a singular covariance matrix S, particularly in situations where the number of observations is less than the number of variables.

The quality of the fit is measured by the GFI, written here as

(24)  GFI = 1 − || S − (λ̂x λ̂x' + Θ̂δ) ||² / || S ||²

Principal component analysis (PCA) is found again if one imposes the additional condition

(25)  Θδ = 0

In this case, one seeks to minimise the criterion

(26)  || S − λ̂x λ̂x' ||²

The vector λ̂x is now equal to √λ1 u1, where u1 is the normed eigenvector of the covariance matrix S associated with the largest eigenvalue λ1.

For each MV xj, the explained variance (or communality) is therefore σ̂jj = λ1 u1j². The residual variance (or specificity) θj is then estimated by θ̂j = sjj − λ1 u1j².

The quality of the fit can still be measured by the GFI:

(27)  GFI = 1 − [ || S − λ̂x λ̂x' ||² − Σj (sjj − σ̂jj)² ] / || S ||²

The square of the norm of S is equal to the sum of the squared eigenvalues λh² of S. In PCA, || S − λ̂x λ̂x' ||² is equal to the sum of the squares of the p−1 last eigenvalues of S. Consequently, in PCA one obtains

(28)  GFI = [ λ1² + Σj (sjj − λ1 u1j²)² ] / Σh λh²

Moreover, SEM software allows the computation of confidence intervals for the parameters by bootstrapping. It also allows criterion (26) to be minimised while imposing value constraints or equality constraints on the coordinates of the vector λx. Criterion (27) can still be used to measure the quality of the model.
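Formula (28) can be sketched numerically on the car correlation matrix of Table 6 (standardized variables, so sjj = 1); this is an illustrative recomputation, not the AMOS output:

```python
import numpy as np

# Sketch of the PCA fit measured by the GFI of formula (28), applied to
# the car correlation matrix of Table 6.
R = np.array([
    [1.000, 0.954, 0.885, 0.692, 0.706, 0.664],
    [0.954, 1.000, 0.934, 0.529, 0.730, 0.527],
    [0.885, 0.934, 1.000, 0.466, 0.619, 0.578],
    [0.692, 0.529, 0.466, 1.000, 0.477, 0.795],
    [0.706, 0.730, 0.619, 0.477, 1.000, 0.591],
    [0.664, 0.527, 0.578, 0.795, 0.591, 1.000]])
eigval, eigvec = np.linalg.eigh(R)           # ascending order
l1, u1 = eigval[-1], eigvec[:, -1]           # largest eigenvalue, eigenvector
communalities = l1 * u1 ** 2                 # lambda1 * u1j^2
specificities = 1.0 - communalities          # sjj = 1 for standardized data
gfi = float((l1 ** 2 + (specificities ** 2).sum()) / (eigval ** 2).sum())
# gfi is close to the .978 reported for the car data
```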


Link between ULS-SEM, Factor Analysis, PLS and Principal Component Analysis

A central point in PLS path modelling concerns the relation between the MVs related to one LV and this LV.

Reflective mode

The reflective mode is common to PLS and SEM. In this mode, each MV is related to its LV by a simple regression:

(29)  xj = λj ξ + δj

This model corresponds to the usual one-dimension factor analysis (FA) model. Minimization of criterion (23) allows the estimation of the parameters of this model. As the diagonal terms of the residual matrix S − (λ̂x λ̂x' + Θ̂δ) are automatically null, the coefficients λj are computed with the objective of reconstructing the off-diagonal terms of the covariance matrix. The average variance extracted (AVE), defined by Σj σ̂jj / Σj sjj, measures the summary power of the LV. It is not the first objective in this approach; it is an a posteriori value of the model.

In a one-block situation, it is natural to estimate the LV using the first principal component of the MVs. The minimization of criterion (26) yields this solution. Furthermore, the diagonal terms of the residual S − λ̂x λ̂x' are now taken into account in the minimization: the coefficients λj are computed with the objective of reconstructing the whole covariance matrix, diagonal included. The AVE still measures the summary power of the LV, but it is now part of the objective of the approach. Consequently, in the ULS-SEM context, PCA can be obtained by considering the FA model (22) and then cancelling, in a first step, the residual measurement variances.

In PLS path modelling software, the one-block situation has been implemented. In this situation, the outer estimate of the block LV is also taken as the inner estimate. Therefore, Mode A yields the following fixed-point equation:

(30)  ξ̂ ∝ Σj Cov(xj, ξ̂) xj

The PLS algorithm will converge to the first principal component of the block of MVs, which is the solution of equation (30).
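The one-block Mode A iteration can be sketched as follows (simulated data; a simplified power-iteration view of the algorithm, not a full PLS path modelling implementation):

```python
import numpy as np

# Sketch of the fixed point of equation (30): alternate the LV estimate
# and the covariance-based outer weights. The iteration is a power
# iteration on the covariance matrix, so it converges to the first
# principal component of the block.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)       # standardized MVs
y = X[:, 0].copy()                             # arbitrary starting LV estimate
for _ in range(200):
    w = X.T @ y / len(y)                       # Mode A weights: Cov(xj, LV)
    y = X @ w
    y = y / y.std()                            # keep the LV standardized
eigval, eigvec = np.linalg.eigh(np.cov(X, rowvar=False))
pc1 = X @ eigvec[:, -1]                        # first principal component
corr = abs(float(np.corrcoef(y, pc1)[0, 1]))   # essentially 1
```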
Formative mode

The formative mode is easy to implement in PLS. In this mode, each LV is related to its MVs by a multiple regression:

(31)  ξ = Σj ωj xj + ε

But in a one-block situation, it is an indeterminate problem.


Conclusion

The residual sum of squares (RESS), defined by Σ_{i<j} (sij − σ̂ij)², is smaller for FA than for PCA. On the other hand, the AVE is larger for PCA than for FA.
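This can be sketched on the car correlation matrix of Table 6. The FA loadings below are obtained by simple principal-axis iteration, which only approximates the ULS fit used in the paper (it is not the AMOS procedure):

```python
import numpy as np

# Sketch comparing RESS (off-diagonal residual sum of squares) for a
# one-factor FA and for PCA on the car correlation matrix of Table 6.
R = np.array([
    [1.000, 0.954, 0.885, 0.692, 0.706, 0.664],
    [0.954, 1.000, 0.934, 0.529, 0.730, 0.527],
    [0.885, 0.934, 1.000, 0.466, 0.619, 0.578],
    [0.692, 0.529, 0.466, 1.000, 0.477, 0.795],
    [0.706, 0.730, 0.619, 0.477, 1.000, 0.591],
    [0.664, 0.527, 0.578, 0.795, 0.591, 1.000]])
p = R.shape[0]

def first_loading(M):
    """sqrt(lambda1) * u1 for a symmetric matrix M."""
    val, vec = np.linalg.eigh(M)
    return np.sqrt(max(val[-1], 0.0)) * vec[:, -1]

lam_pca = first_loading(R)       # PCA fits the whole matrix, diagonal included
h2 = np.ones(p)                  # FA: iterate communalities on the diagonal
for _ in range(500):
    R_adj = R.copy()
    np.fill_diagonal(R_adj, h2)
    lam_fa = first_loading(R_adj)
    h2 = lam_fa ** 2

def ress(lam):
    resid = R - np.outer(lam, lam)
    return float((resid[np.triu_indices(p, k=1)] ** 2).sum())

ress_fa, ress_pca = ress(lam_fa), ress(lam_pca)  # FA fits the off-diagonal better
```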
Example 2

We use data on the cubic capacity, power output, speed, weight, width and length of 24 car models in production in 2004 given in Tenenhaus (2007). We compare FA and PCA on these data with respect to the RESS and AVE criteria. The analyses are carried out on standardized variables. The correlation matrix is given in Table 6.

Table 6: Car example: Correlation matrix

         | Capacity | Power | Speed | Weight | Width | Length
Capacity | 1        |       |       |        |       |
Power    | 0.954    | 1     |       |        |       |
Speed    | 0.885    | 0.934 | 1     |        |       |
Weight   | 0.692    | 0.529 | 0.466 | 1      |       |
Width    | 0.706    | 0.730 | 0.619 | 0.477  | 1     |
Length   | 0.664    | 0.527 | 0.578 | 0.795  | 0.591 | 1

The path models for one-dimension FA and PCA are given in Figure 8. The common factor is denoted F1. The implied covariance matrices and the residual matrices produced by AMOS are given in Table 7.
[Path diagrams: the six manifest variables (Capacity, Power, Speed, Weight, Width, Length) all load on the single factor F1; in the FA model the residual variances e1-e6 are estimated, whereas in the PCA model they are fixed to .00.]

Figure 8: Path models for FA and PCA

33

e6

Table 7: Car example: Implied covariance matrices and residuals produced by AMOS

FA, implied correlations:

         | Capacity | Power | Speed | Weight | Width | Length
Capacity | 1        |       |       |        |       |
Power    | .918     | 1     |       |        |       |
Speed    | .860     | .804  | 1     |        |       |
Weight   | .678     | .633  | .593  | 1      |       |
Width    | .737     | .689  | .645  | .508   | 1     |
Length   | .722     | .674  | .632  | .498   | .541  | 1

FA, residuals:

         | Capacity | Power  | Speed  | Weight | Width  | Length
Capacity | 0        |        |        |        |        |
Power    | 0.036    | 0      |        |        |        |
Speed    | 0.025    | 0.130  | 0      |        |        |
Weight   | 0.014    | -0.104 | -0.127 | 0      |        |
Width    | -0.031   | 0.041  | -0.026 | -0.031 | 0      |
Length   | -0.058   | -0.147 | -0.054 | 0.297  | 0.050  | 0

PCA, implied correlations:

         | Capacity | Power | Speed | Weight | Width | Length
Capacity | .926     |       |       |        |       |
Power    | .889     | .853  |       |        |       |
Speed    | .853     | .818  | .785  |        |       |
Weight   | .738     | .699  | .671  | .573   |       |
Width    | .771     | .740  | .710  | .606   | .642  |
Length   | .765     | .734  | .705  | .602   | .637  | .632

PCA, residuals:

         | Capacity | Power  | Speed  | Weight | Width  | Length
Capacity | 0.074    |        |        |        |        |
Power    | 0.065    | 0.147  |        |        |        |
Speed    | 0.032    | 0.116  | 0.215  |        |        |
Weight   | -0.046   | -0.170 | -0.205 | 0.427  |        |
Width    | -0.065   | -0.010 | -0.091 | -0.129 | 0.358  |
Length   | -0.101   | -0.207 | -0.127 | 0.193  | -0.046 | 0.368

The comparison between the FA and PCA results is shown in Table 8.

Table 8: Comparison between FA and PCA approaches

    | RESS | AVE  | GFI
FA  | .169 | .690 | .983
PCA | .230 | .735 | .978

For PCA, the GFI produced by AMOS has to be modified according to formula (27). The usual PCA of the standardized data results in the following eigenvalues: 4.4113, .8534, .4357, .2359, .0514 and .0124. The quality of the approximation of S by λ1 u1 u1' + Θ̂δ is therefore measured by the following value of the GFI:

(32)  GFI = [ λ1² + Σj (sjj − λ1 u1j²)² ] / Σh λh² = (19.459 + .519) / 20.436 = .978

We then used AMOS 6.0 to carry out a first-order PCA of these standardized data under the hypothesis of equal weights for the engine variables "cubic capacity, power, speed" and, similarly, equal weights for the passenger compartment variables "weight, width, length". Figure 9 shows the results of this estimation and Table 9 the 90% bootstrap confidence intervals. The bootstrap intervals contain values greater than 1 because the bootstrap samples no longer consist of standardized variables.
[Path diagram: Capacity, Power and Speed load on F1 with a common coefficient .92; Weight, Width and Length load on F1 with a common coefficient .78; all residual variances are fixed to .00.]

Figure 9: PCA under constraints on the "Auto 2004" data (AMOS 6.0 output)
Table 9: PCA under constraints for the "Auto 2004" data (AMOS 6.0 output)
Estimation and bootstrap confidence interval for the coordinates of λx

Parameter     | Estimate | Inf (90%) | Sup (90%)
Capacity - F1 | .924     | .542      | 1.195
Power - F1    | .924     | .542      | 1.195
Speed - F1    | .924     | .542      | 1.195
Weight - F1   | .784     | .555      | 1.003
Width - F1    | .784     | .555      | 1.003
Length - F1   | .784     | .555      | 1.003

The GFI for the model with constraints has the following value provided by AMOS:

(33)  GFI* = 1 − || S − λ̂x λ̂x' ||² / || S ||² = .9505

Using the modified formula yields:

(34)  GFI = GFI* + Σj (sjj − λ̂xj²)² / || S ||² = .9505 + .509/20.436 = .975

The very slight reduction of the GFI (.975 vs .978) means that one can accept the model with constraints. In this example, we obtain the component as the "McDonald" estimation of the factor ξ, calculated as follows:

ξ̂ = .924*(capacity* + power* + speed*) + .784*(weight* + length* + width*)

where the asterisk means that the variable is standardized.

III. Confirmatory factor analysis, ULS-SEM and analysis of multi-block data

We assume now that the random column vector x breaks down into J blocks of random vectors xj = (xj1, ..., xjpj)'. A specific model with one standardized latent variable (and the usual hypotheses) is constructed for each block xj:

(35)  xj = λj ξj + δj,  j = 1, ..., J

This model is similar to model (4), with Λx = diag(λ1, ..., λJ). For each block j we have

(36)  Σ_xj = λj λj' + Θ_δj

and for two blocks j and k we get

(37)  Σ_xjxk = φjk λj λk'

where φjk = Cor(ξj, ξk). Decomposition (7) thus becomes

(38)  Σ = Λx Φ Λx' + Θδ

The parameters λ1, ..., λJ, Φ and Θδ in model (38) can now be estimated by using the ULS method. This means seeking the parameters λ̂1, ..., λ̂J, Φ̂ and Θ̂δ minimizing the criterion

(39)  || S − (Λ̂x Φ̂ Λ̂x' + Θ̂δ) ||²

Adding constraint (25) gives a new criterion to be minimized:

(40)  || S − Λ̂x Φ̂ Λ̂x' ||²

This results in a new factorisation of the covariance matrix allowing an estimation to be made of
both the loadings and also the correlations between the factors. The quality of the fit is still
measured by the GFI criterion.

Example 3
In this example we are going to study data about wine tasting described in detail in Pagès, Asselin, Morlat & Robichet (1987).

Description of the data

A collection of 21 red wines of Bourgueil, Chinon and Saumur appellations is described by a set of 27 taste variables divided into 4 blocks:

X1 = Smell at rest
Rest1 = smell intensity at rest, Rest2 = aromatic quality at rest, Rest3 = fruity note at rest, Rest4 =
floral note at rest, Rest5 = spicy note at rest

X2 = View
View1 = visual intensity, View2 = shading (from orange to purple), View3 = surface impression

X3 = Smell after shaking


Shaking1 = smell intensity, Shaking2 = smell quality, Shaking3 = fruity note, Shaking4 = floral
note, Shaking5 = spicy note, Shaking6 = vegetable note, Shaking7 = phenolic note, Shaking8 =
aromatic intensity in mouth, Shaking9 = aromatic persistence in mouth, Shaking10 = aromatic
quality in mouth

X4 = Tasting
Tasting1 = intensity of attack, Tasting2 = acidity, Tasting3 = astringency, Tasting4 = alcohol,
Tasting5 = balance (acidity, astringency, alcohol), Tasting6 = mellowness, Tasting7 = bitterness,
Tasting8 = ending intensity in mouth, Tasting9 = harmony
These data have already been analysed using PLS in Tenenhaus & Esposito Vinzi (2005) and in
Tenenhaus & Hanafi (2007). We present here the ULS-SEM solution on the standardized variables
with cancellation of the residual measurement variances. First of all, we present the PCA for each
separate block in Table 10.


Table 10: Principal component analysis of each block for the "Wine" data

Smell at rest:
                         | Component 1 | Component 2
Smell intensity at rest  | .741        | .551
Aromatic quality at rest | .915        | -.144
Fruity note at rest      | .854        | -.191
Floral note at rest      | .345        | -.537
Spicy note at rest       | .077        | .933

View:
                                | Component 1 | Component 2
Visual intensity                | .986        | -.146
Shading (from orange to purple) | .983        | -.163
Surface impression              | .947        | .320

Smell after shaking:
                              | Component 1 | Component 2
Smell intensity               | .472        | .743
Smell quality                 | .881        | -.180
Fruity note                   | .819        | -.176
Floral note                   | .328        | -.500
Spicy note                    | .089        | .746
Vegetable note                | -.635       | .593
Phenolic note                 | .370        | .633
Aromatic intensity in mouth   | .895        | .277
Aromatic persistence in mouth | .888        | .307
Aromatic quality in mouth     | .882        | -.372

Tasting:
                                        | Component 1 | Component 2
Intensity of attack                     | .937        | .082
Acidity                                 | -.257       | .691
Astringency                             | .775        | .427
Alcohol                                 | .774        | .378
Balance (acidity, astringency, alcohol) | .844        | -.423
Mellowness                              | .901        | -.380
Bitterness                              | .377        | .760
Ending intensity in mouth               | .967        | .117
Harmony                                 | .958        | -.233

Use of ULS-SEM for the analysis of multi-block data

All the variables are standardized: S = R. The correlation matrix R is now approximated using criterion (40), with the aid of the following factorisation formula:

R̂(λ1, ..., λ4, Φ) =

  | λ1  0   0   0  |   | 1    φ12  φ13  φ14 |   | λ1'  0    0    0   |
  | 0   λ2  0   0  | × | φ21  1    φ23  φ24 | × | 0    λ2'  0    0   |
  | 0   0   λ3  0  |   | φ31  φ32  1    φ34 |   | 0    0    λ3'  0   |
  | 0   0   0   λ4 |   | φ41  φ42  φ43  1   |   | 0    0    0    λ4' |

  | λ1λ1'      φ12 λ1λ2'  φ13 λ1λ3'  φ14 λ1λ4' |
= | φ21 λ2λ1'  λ2λ2'      φ23 λ2λ3'  φ24 λ2λ4' |
  | φ31 λ3λ1'  φ32 λ3λ2'  λ3λ3'      φ34 λ3λ4' |
  | φ41 λ4λ1'  φ42 λ4λ2'  φ43 λ4λ3'  λ4λ4'     |
In this way, confirmatory factor analysis (or perhaps rather confirmatory PCA), in the context
described here, allows the best first order reconstruction of the intra- and inter-block correlations.
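The structure of factorisation (38) can be sketched with two toy blocks (the loadings and LV correlation below are illustrative values, not the wine-data estimates):

```python
import numpy as np

# Sketch of factorisation (38): rebuild the implied correlation matrix
# from block loading vectors and the LV correlation matrix Phi.
lam1 = np.array([0.9, 0.8, 0.7])   # loadings of block 1 (3 MVs)
lam2 = np.array([0.85, 0.75])      # loadings of block 2 (2 MVs)
Phi = np.array([[1.0, 0.6],
                [0.6, 1.0]])       # correlation between the two LVs
Lambda = np.zeros((5, 2))          # block-diagonal loading matrix
Lambda[:3, 0] = lam1
Lambda[3:, 1] = lam2
R_implied = Lambda @ Phi @ Lambda.T
# intra-block part: lam_j lam_j'; inter-block part: phi_jk * lam_j lam_k'
```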
First analysis

Using AMOS 6.0, we obtained Table 11 and the diagram shown in Figure 10. Confirmatory factor analysis of the four blocks echoes, in essence, the results of the first principal components of the separate PCAs of each block. We have put the significant loadings in bold in Table 11. They correspond well to the strongest variable-PC1 correlations given in Table 10. There are two exceptions: Smell intensity at rest in block 1 and Astringency in block 4. It should be noted that these variables have fairly high correlations with the second principal components. The GFI is less than 0.9. This is due to the existence of second dimensions in blocks 1, 3 and 4.
Second analysis

In order to better identify the first dimension of the phenomenon under study, it is usual in confirmatory factor analysis to "purify" the scales: the analysis is repeated, omitting the non-significant variables. This is where Table 12 and Figure 11 come from. All the correlations between the manifest variables and the latent variable of the corresponding block, and the correlations between the latent variables, are now strongly positive. All the correlations are significant. The first dimension of the phenomenon under study has therefore been perfectly identified. The GFI of 0.983 is excellent and clearly shows the unidimensionality of the selected variables.


Table 11: Confirmatory factor analysis of the "Wine" data (AMOS 6.0 output)
(Significant coefficient in bold, non-significant in italic)

Parameter                                    | Estimate | Inf (95%) | Sup (95%) | P
Smell intensity at rest <--- Rest 1          | 0.623    | -0.768    | 0.993     | 0.170
Aromatic quality at rest <--- Rest 1         | 0.944    | 0.176     | 1.19      | 0.018
Fruity note at rest <--- Rest 1              | 0.808    | 0.162     | 1.128     | 0.010
Floral note at rest <--- Rest 1              | 0.529    | -0.416    | 0.935     | 0.108
Spicy note at rest <--- Rest 1               | -0.003   | -1.064    | 0.716     | ...
Visual intensity <--- View                   | 0.951    | 0.424     | 1.319     | 0.046
Shading <--- View                            | 0.931    | 0.486     | 1.256     | 0.045
Surface impression <--- View                 | 1.028    | 0.266     | 1.375     | 0.036
Smell intensity <--- Shaking 1               | 0.614    | -0.612    | 1.083     | 0.183
Smell quality <--- Shaking 1                 | 0.828    | 0.347     | 1.062     | 0.017
Fruity note <--- Shaking 1                   | 0.752    | 0.146     | 1.042     | 0.018
Floral note <--- Shaking 1                   | 0.240    | -0.658    | 0.849     | 0.353
Spicy note <--- Shaking 1                    | 0.264    | -0.655    | 0.864     | 0.253
Vegetable note <--- Shaking 1                | -0.570   | -0.999    | 0.114     | 0.096
Phenolic note <--- Shaking 1                 | 0.392    | -0.219    | 0.741     | 0.200
Aromatic intensity in mouth <--- Shaking 1   | 0.928    | 0.176     | 1.237     | 0.041
Aromatic persistence in mouth <--- Shaking 1 | 0.955    | 0.105     | 1.299     | 0.024
Aromatic quality in mouth <--- Shaking 1     | 0.801    | 0.175     | 1.040     | 0.020
Intensity of attack <--- Tasting 1           | 0.897    | 0.105     | 1.334     | 0.016
Acidity <--- Tasting 1                       | -0.208   | -1.012    | 0.546     | 0.595
Astringency <--- Tasting 1                   | 0.808    | -0.276    | 1.155     | 0.076
Alcohol <--- Tasting 1                       | 0.809    | 0.104     | 1.251     | 0.040
Balance <--- Tasting 1                       | 0.841    | 0.129     | 1.168     | 0.026
Mellowness <--- Tasting 1                    | 0.893    | 0.224     | 1.207     | 0.021
Bitterness <--- Tasting 1                    | 0.373    | -0.789    | 0.851     | 0.326
Ending intensity in mouth <--- Tasting 1     | 0.969    | 0.195     | 1.336     | 0.018
Harmony <--- Tasting 1                       | 0.958    | 0.295     | 1.312     | 0.014

Parameter                 | Estimate | Inf (95%) | Sup (95%) | P
Rest 1 <--> View          | 0.724    | 0.069     | 0.866     | 0.027
Rest 1 <--> Shaking 1     | 0.866    | 0.725     | 0.960     | 0.010
Rest 1 <--> Tasting 1     | 0.736    | 0.578     | 0.863     | 0.010
View <--> Shaking 1       | 0.827    | 0.218     | 0.950     | 0.026
View <--> Tasting 1       | 0.887    | 0.201     | 0.962     | 0.020
Shaking 1 <--> Tasting 1  | 0.916    | 0.764     | 0.968     | 0.010

GFI = .849

40

[Path diagram omitted: the four correlated first-order factors (Rest 1, View, Shaking 1, Tasting 1) with their manifest variables and loadings; all residual variances are fixed to 0.]

Figure 10: Confirmatory factor analysis of the "Wine" data (AMOS 6.0 output)


Table 12: Confirmatory factor analysis of the "Wine" data on
the significant variables (p-value < .05) of Table 11 (AMOS 6.0 output)

Parameter                                         Estimate  Inf (95%)  Sup (95%)     P
Aromatic quality at rest       <--- Rest 1           0.992      0.649      1.197  0.01
Fruity note at rest            <--- Rest 1           0.885      0.510      1.181  0.01
Visual intensity               <--- View             0.942      0.539      1.311  0.01
Shading                        <--- View             0.924      0.623      1.243  0.01
Surface impression             <--- View             1.039      0.536      1.386  0.01
Smell quality                  <--- Shaking 1        0.885      0.569      1.090  0.01
Fruity note                    <--- Shaking 1        0.822      0.417      1.076  0.01
Aromatic intensity in mouth    <--- Shaking 1        0.917      0.385      1.246  0.01
Aromatic persistence in mouth  <--- Shaking 1        0.923      0.363      1.288  0.01
Aromatic quality in mouth      <--- Shaking 1        0.854      0.581      1.065  0.01
Intensity of attack            <--- Tasting 1        0.891      0.346      1.337  0.01
Alcohol                        <--- Tasting 1        0.795      0.247      1.232  0.02
Balance                        <--- Tasting 1        0.885      0.432      1.173  0.01
Mellowness                     <--- Tasting 1        0.930      0.496      1.222  0.01
Ending intensity in mouth      <--- Tasting 1        0.960      0.438      1.327  0.01
Harmony                        <--- Tasting 1        0.975      0.497      1.323  0.01

Parameter                        Estimate  Inf (95%)  Sup (95%)     P
Rest 1     <--> View                0.645      0.303      0.806  0.01
Rest 1     <--> Shaking 1           0.870      0.694      0.948  0.01
Rest 1     <--> Tasting 1           0.690      0.302      0.842  0.01
View       <--> Tasting 1           0.837      0.411      0.948  0.01
View       <--> Shaking 1           0.792      0.329      0.921  0.01
Shaking 1  <--> Tasting 1           0.897      0.752      0.957  0.01

GFI = .983

[Path diagram omitted: the purified four-factor model (Rest 1, View, Shaking 1, Tasting 1) restricted to the significant manifest variables; all residual variances are fixed to 0.]

Figure 11: Confirmatory factor analysis of the "Wine" data on
the significant variables of Table 9 (AMOS 6.0 output)

Third analysis

As the four LVs appearing in Figure 11 are highly correlated, it is natural to summarize them
through a second order confirmatory factor analysis. This yields Figure 12. The regression
coefficient of one MV of each block has been set to 1. The second order LV Score 1 is similar to
the standardized first principal component of the first order LVs, as the error variances have been
set to zero. The first order LVs are evaluated by using the McDonald approach. For example,
using the path coefficients shown in Figure 12, we get:

Score(Rest 1) ∝ 1 × rest2* + .88 × rest3*
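The McDonald score of a first-order LV is simply the loading-weighted sum of the standardized MVs of its block, rescaled to unit variance. A sketch with invented data for 6 wines; only the loadings 1.00 and .88 are taken from Figure 12:

```python
import numpy as np

def lv_score(X, loadings):
    """McDonald-style LV estimate: loading-weighted sum of the
    standardized MVs, itself standardized."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # MV -> MV*
    s = Xs @ loadings                          # weighted sum
    return (s - s.mean()) / s.std()            # standardized score

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))                    # toy rest2, rest3 columns
rest1_score = lv_score(X, np.array([1.00, 0.88]))
```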

view1

view2

view3

1.00

1.12 .98

1.00

.88

rest2

rest3

e7

e6

e5

.00
1

.00

d1

Rest 1

.00

.00

e3

e2

.00

.00

.00

.00

View

.83

d2

.84

e12

shaking2
1.00

1.04
.00

.96

shaking3

e11

Score 1

.82
Shaking 1

.00

1.08
1
shaking8

e22

1.09

.00

1.00

.00

d3

e23

.94

shaking9
.00

e24

shaking10
1

.00

1.00

1
Tasting 1

d4

tasting9
.99

.91
.95

tasting8

.00
e26

.91

.82

.00
e27

tasting1
tasting6

.00
e19

tasting5
1
.00

tasting4
1
.00

.00
e14

e15

e16

Figure 12: Second order confirmatory factor analysis of the "Wine" data on
the significant variables of Table 9 (AMOS 6.0 output)

44

In the same way, the second order LV (Score 1) can be computed as a weighted sum of all the
MVs. The regression coefficient of Score 1 in the regression of an MV on Score 1 is equal to
the product of the path coefficients related to the link between this MV and its LV and to the link
between the LV and Score 1. For example:

Cov(rest2, Score 1) = Cov(λ₁₂ Rest 1 + ε₁₂, Score 1) = λ₁₂ Cov(Rest 1, Score 1) = λ₁₂ × Cov(Rest 1, Score 1) / Var(Score 1)

as the latent variable Score 1 is standardized. This leads to:

Score 1 ∝ .83 (1 × rest2* + .88 × rest3*) + ⋯ + .94 (.91 × tasting1* + ⋯ + 1 × tasting9*)

But this formula has a severe drawback: it gives more weight to a block containing many variables
than to a block with few variables. From a pragmatic point of view, we prefer to compute a
weighted sum of the first order standardized LV estimates, using the path coefficients relating the
first order LVs to the second order LV. In fact, these weights reflect the quality of the
approximation of the second order LV by the first order LVs. This leads to what is called here
Global score (1):

Global score (1) ∝ .83 × Score(Rest 1) + ⋯ + .94 × Score(Tasting 1)
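A sketch of a global score computed this way, on invented first-order scores for 6 wines; only the end coefficients .83 and .94 appear in the text, and the two middle path coefficients are placeholders:

```python
import numpy as np

def standardize(v):
    return (v - v.mean()) / v.std()

# Invented standardized first-order scores for 6 wines, built around a
# common component so that they are positively correlated, like the blocks.
rng = np.random.default_rng(1)
base = rng.normal(size=6)
scores = {name: standardize(base + 0.5 * rng.normal(size=6))
          for name in ["Rest 1", "View", "Shaking 1", "Tasting 1"]}

# Second-order path coefficients: .83 and .94 come from the text; the
# values used for View and Shaking 1 are placeholders for this sketch.
paths = {"Rest 1": .83, "View": .85, "Shaking 1": .90, "Tasting 1": .94}

global_score = standardize(sum(paths[k] * scores[k] for k in scores))
```

Because each first-order score is standardized before weighting, a block contributes through its path coefficient only, regardless of how many MVs it contains.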
The correlation table between these scores is given in Table 13. All the computed first order LVs
are positively correlated with one another and very well summarized by the computed second order LV.
Table 13: Correlation between scores related to the first dimensions of the wine data

                 Rest 1  View 1  Shaking 1  Tasting 1  Global score 1
Rest 1           1        .671      .687       .546        .802
View 1            .671   1          .794       .838        .921
Shaking 1         .687    .794     1           .897        .942
Tasting 1         .546    .838      .897      1            .920
Global score 1    .802    .921      .942       .920       1

Fourth analysis

To identify the second dimension of the phenomenon under study, we construct a new
confirmatory PCA model for the manifest variables not taken into account in the second analysis.
The non-significant variables were eliminated iteratively, as before. Figure 13 and Table 14 result
from this analysis. All the correlations between the manifest and latent variables of the
corresponding block, and the correlations between the latent variables, are now strongly positive.
All the correlations are significant. The second dimension of the phenomenon studied has therefore
been identified. The value of the GFI is 0.919, so this second dimension can be accepted.


[Path diagram omitted: three correlated second-dimension factors (Rest 2, Shaking 2, Tasting 2) with their manifest variables; all residual variances are fixed to 0.]

Figure 13: Confirmatory factor analysis of the "Wine" data
on the variables of Table 8 (AMOS 6.0 output)
Table 14: Confirmatory factor analysis of the "Wine" data on the non-significant variables of Table
9 (AMOS 6.0 output). Results after iterative elimination of non-significant variables.

Parameter                                   Estimate  Inf (95%)  Sup (95%)     P
Smell intensity at rest   <--- Rest 2          0.957      0.545      1.228  0.01
Spicy note at rest        <--- Rest 2          0.778     -0.202      1.196  0.07
Smell intensity           <--- Shaking 2       0.940      0.475      1.324  0.01
Spicy note                <--- Shaking 2       0.781      0.192      1.145  0.01
Phenolic note             <--- Shaking 2       0.603      0.054      1.113  0.04
Astringency               <--- Tasting 2       0.912      0.580      1.217  0.01
Bitterness                <--- Tasting 2       0.849      0.042      1.345  0.02

Parameter                       Estimate  Inf (95%)  Sup (95%)     P
Shaking 2  <--> Tasting 2          0.791      0.490      0.918  0.01
Shaking 2  <--> Rest 2             0.754      0.487      0.903  0.01
Tasting 2  <--> Rest 2             0.775      0.468      0.903  0.01

GFI = .919


Fifth analysis

The three LVs appearing in Figure 13 being highly correlated, they are summarized, as above,
through a second order confirmatory factor analysis. This yields Figure 14. Scores related to the
second dimension are computed in the same way as those related to the first dimension. The
correlation table for these scores is given in Table 15. Comments are the same as for Table 13.
[Path diagram omitted: second order factor Score 2 summarizing Rest 2, Shaking 2 and Tasting 2; all residual variances are fixed to 0.]

Figure 14: Second order confirmatory factor analysis of the "Wine" data on
the variables of Table 12 (AMOS 6.0 output)
Table 15: Correlation between scores related to the second dimensions of the wine data

                 Rest 2  Shaking 2  Tasting 2  Global score 2
Rest 2           1          .758       .776        .908
Shaking 2         .758     1           .793        .933
Tasting 2         .776      .793      1            .925
Global score 2    .908      .933       .925       1

Remarks:

1. The first dimension consists of variables all positively correlated with the global quality grade
(available elsewhere). These correlations are given in Table 16. The second dimension, on the other
hand, consists of variables essentially uncorrelated with the global quality grade.
2. One may wish to obtain orthogonal components in each block. It would then be necessary to
use the deflation process, i.e. to construct a new analysis on the residuals of the regression of each
original block Xj on its first computed latent variable LVj.
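The deflation step mentioned in remark 2 is an ordinary least squares residualisation. A sketch with invented data (6 wines, one 3-variable block; the weights defining the toy LV are ours):

```python
import numpy as np

def deflate(X, lv):
    """Residuals of the regression of each column of X on lv;
    the deflated block is orthogonal to lv."""
    lv = lv.reshape(-1, 1)
    beta = (lv.T @ X) / (lv.T @ lv)   # one regression slope per column
    return X - lv @ beta

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 3))
X = X - X.mean(axis=0)                 # centred block
lv = X @ np.array([0.5, 0.3, 0.2])     # a first computed LV (toy weights)
X2 = deflate(X, lv)
print(np.abs(lv @ X2).max() < 1e-10)   # True: second dimension orthogonal
```

A second confirmatory analysis run on X2 then produces components uncorrelated with the first one.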
Table 16: Correlation between the variables related to the two dimensions
and the global quality grade

Variables related to dimension 1           Global quality
Aromatic quality at rest                        0.62
Fruity note at rest                             0.50
Visual intensity                                0.54
Shading (from orange to purple)                 0.51
Surface impression                              0.67
Smell quality                                   0.76
Aromatic intensity in mouth                     0.61
Aromatic persistence in mouth                   0.68
Aromatic quality in mouth                       0.85
Intensity of attack                             0.77
Alcohol                                         0.52
Balance (acidity, astringency, alcohol)         0.95
Mellowness                                      0.92
Ending intensity in mouth                       0.80
Harmony                                         0.88
Global score 1                                  0.73

Variables related to dimension 2           Global quality
Smell intensity at rest                         0.04
Spicy note at rest                             -0.31
Smell intensity after shaking                   0.17
Spicy note after shaking                       -0.08
Phenolic note                                   0.09
Astringency                                     0.41
Bitterness                                      0.05
Global score 2                                  0.08

Graphical displays

Using Global scores (1) and (2), we obtain three graphical displays. The variables are described
by their correlations with Global scores (1) and (2). The individuals are visualized with these two
global scores, using appellation and soil markers. These graphical displays are given in Figures 15,
16 and 17. Figures 16 and 17 show clearly that soil is a much better predictor of wine quality than
appellation. All the wines produced on a reference soil are positive on Score 1. The reader
interested in wine can even detect that the two Saumur wines 1DAM and 2DAM are the best wines from
this sample. I can testify that I drank outstanding Saumur-Champigny produced at Dampierre-sur-Loire.

Figure 15: Graphical display of the variables

Figure 16: Graphical display of the wines with appellation markers


Figure 17: Graphical display of the wines with soil markers


IV. Comparison between the ULS-SEM and PLS approaches

The die is not cast and the ULS-SEM approach is not uniformly more powerful than the PLS
approach. We have set out the "pluses" and "minuses" of each approach in Table 16.
V. Conclusion

Roderick McDonald has built a bridge between the SEM and PLS approaches by making use of
three ideas: (1) using the ULS method, (2) setting the variances of the residual terms of the
measurement model to 0, and (3) estimating the latent variables by using the loadings of the MVs
on their LVs. The McDonald approach has some very promising implications. Using SEM
software such as AMOS 6.0 makes it possible to get back to PCA, to the analysis of multi-block data
and to a "data analysis" approach to SEM completely similar to the PLS approach. We have
illustrated this process with three examples corresponding to these different themes. We have listed
the advantages and disadvantages of the two approaches. We end this paper with a wish: that this
ULS-SEM approach be included in PLS-SEM software. The user would then have access to a very
comprehensive toolbox for a "data analysis" approach to structural equation modelling.
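McDonald's first two ideas can be checked numerically: with the residual variances of a one-factor block set to 0, the ULS criterion ||S - λλ'||²_F is minimized by the best rank-one approximation of S, i.e. by the first principal component. A toy check (the matrix S below is ours):

```python
import numpy as np

S = np.array([[1.0, 0.7, 0.6],   # toy correlation matrix
              [0.7, 1.0, 0.5],
              [0.6, 0.5, 1.0]])

# Best rank-one approximation of a PSD matrix: its leading eigenpair.
vals, vecs = np.linalg.eigh(S)            # eigenvalues in ascending order
lam = np.sqrt(vals[-1]) * vecs[:, -1]     # ULS loadings = scaled first PC
uls_loss = np.sum((S - np.outer(lam, lam))**2)
# The minimal loss equals the sum of squares of the remaining eigenvalues.
```

This is exactly why the ULS-SEM analyses above reproduce, block by block, the first principal components of the separate PCAs.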


Table 16: Comparison between the ULS-SEM and PLS approaches

The "pluses":

ULS-SEM:
- Global criterion well identified
- Use of SEM software
- Parameters can be subject to constraints
- Use of bootstrapping on all the model parameters
- Better measurement of the quality of the theoretical model
- Non-recursive models allowed

PLS:
- No identification problem
- Systematic convergence of the PLS algorithm
- General framework for multi-block data analysis
- Robust method for small-size samples
- Possibility of several LVs per block exists in PLS-Graph software
- Explicit calculation of LVs integrated in PLS software
- Easy handling of missing data

The "minuses":

ULS-SEM:
- Possible difficulty in model identification
- Possible non-convergence of the algorithm
- Explicit calculation of LVs is outside the SEM software
- Missing data are not permitted

PLS:
- Algorithm is often closer to a heuristic than to the optimisation of a global criterion
- It is impossible to impose constraints on the parameters
- Measurement of the quality of the inner model is underestimated
- Measurement of the quality of the outer model is overestimated
- Non-recursive models prohibited

References

Arbuckle, J.L. (2005): AMOS 6.0. AMOS Development Corporation, Spring House, PA.
Bollen, K.A. (1989): Structural Equations with Latent Variables, John Wiley & Sons.
Chin, W.W. (2001): PLS-Graph User's Guide, C.T. Bauer College of Business, University of Houston,
USA.
Hwang, H. & Takane, Y. (2004): Generalized structured component analysis, Psychometrika, 69 (1), 81-99.
McDonald, R.P. (1996): Path analysis with composite variables, Multivariate Behavioral Research,
31 (2), 239-270.
Noonan, R. & Wold, H. (1982): PLS path modeling with indirectly observed variables: a comparison
of alternative estimates for the latent variable. In: Jöreskog, K.G., Wold, H. (Eds.), Systems under
Indirect Observation. North-Holland, Amsterdam, pp. 75-94.
Pagès, J., Asselin, C., Morlat, R., Robichet, J. (1987): Analyse factorielle multiple dans le traitement de
données sensorielles : application à des vins rouges de la vallée de la Loire, Sciences des aliments,
7, 549-571.
Tenenhaus, M. (2007): Statistique : Méthodes pour décrire, expliquer et prévoir, Dunod, Paris.
Tenenhaus, M., Esposito Vinzi, V., Chatelin, Y.-M., Lauro, C. (2005): PLS path modeling,
Computational Statistics & Data Analysis, 48, 159-205.
Tenenhaus, M. & Esposito Vinzi, V. (2005): PLS regression, PLS path modeling and generalized
Procrustean analysis: a combined approach for multiblock analysis, Journal of Chemometrics, 19,
145-153.
Tenenhaus, M. & Hanafi, M. (2007): A bridge between PLS path modelling and multi-block data
analysis, in Handbook of Partial Least Squares (PLS): Concepts, Methods and Applications (V.
Esposito Vinzi, W. Chin, J. Henseler, H. Wang, Eds), Volume II in the series of the Handbooks of
Computational Statistics, Springer, in press.
Tenenhaus, M., Pagès, J., Ambroisine, L. & Guinot, C. (2005): PLS methodology to study
relationships between hedonic judgements and product characteristics, Food Quality and Preference,
16 (4), 315-325.
XLSTAT (2007): XLSTAT-PLSPM module, XLSTAT software, Addinsoft, Paris.
