Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. In other words, it is possible, for example, that variations in three or four observed variables mainly reflect the variations in fewer unobserved variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modeled as linear combinations of the potential factors, plus "error" terms. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset. Computationally, this technique is equivalent to a low-rank approximation of the matrix of observed variables.
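As a rough illustration of this computational equivalence (a sketch under assumed data, not the full factor model), a truncated singular value decomposition gives the best low-rank approximation of a standardized data matrix:

```python
import numpy as np

# Minimal sketch: approximate a standardized data matrix with a rank-k
# truncated SVD, the low-rank approximation mentioned above. The data
# matrix X (observations x variables) and the choice of k are illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))          # 100 observations, 6 variables
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each variable

k = 2                                      # number of factors to keep
U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_approx = U[:, :k] * s[:k] @ Vt[:k, :]    # best rank-k approximation of X

# Frobenius-norm error of the rank-k approximation
print(np.linalg.norm(X - X_approx))
```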

Factor analysis is a method for investigating whether a number of variables of interest are linearly related to a smaller number of unobservable factors. In the special vocabulary of factor analysis, the parameters of these linear functions are referred to as loadings. Under certain conditions (A1 and A2 in the text), the theoretical variance of each variable and the covariance of each pair of variables can be expressed in terms of the loadings and the variance of the error terms.
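Concretely, writing the model as below and assuming (as A1/A2-type conditions typically do) that the factors have unit variance and are uncorrelated with each other and with the error terms, the implied variances and covariances are:

$$Y_i = \beta_{i1} F_1 + \beta_{i2} F_2 + \cdots + \beta_{ik} F_k + e_i$$

$$\operatorname{Var}(Y_i) = \beta_{i1}^2 + \cdots + \beta_{ik}^2 + \operatorname{Var}(e_i), \qquad \operatorname{Cov}(Y_i, Y_j) = \beta_{i1}\beta_{j1} + \cdots + \beta_{ik}\beta_{jk}$$

(The notation $\beta_{ij}$ for the loading of variable $Y_i$ on factor $F_j$ is ours; the text's exact statement of A1 and A2 may differ in detail.)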

Factor analysis usually proceeds in two stages. In the first, a set of loadings is calculated that yields theoretical variances and covariances matching the observed ones as closely as possible according to a certain criterion. These loadings, however, may not agree with prior expectations, or may not lend themselves to a reasonable interpretation. Thus, in the second stage, the first loadings are "rotated" in an effort to arrive at another set of loadings that fit the observed variances and covariances equally well, but are more consistent with prior expectations or more easily interpreted.

The communality of a variable is the part of its variance that is explained by the common factors. The specific variance is the part of the variance of the variable that is not accounted for by the common factors.
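In the notation of the factor model above (the symbols $h_i^2$ and $\psi_i$ are our choice, since the text does not fix any), the communality of $Y_i$ is the sum of its squared loadings, and the specific variance is the remainder:

$$h_i^2 = \beta_{i1}^2 + \cdots + \beta_{ik}^2, \qquad \psi_i = \operatorname{Var}(Y_i) - h_i^2$$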

• There exist an infinite number of sets of loadings yielding the same theoretical variances and covariances.
• A method widely used for determining a first set of loadings is the principal component method. This method seeks values of the loadings that bring the estimate of the total communality as close as possible to the total of the observed variances (both extraction and rotation are sketched in the code after this list).
• When the variables are not measured in the same units, it is customary to standardize them prior to subjecting them to the principal component method, so that all have mean equal to zero and variance equal to one.
• The varimax rotation method encourages the detection of factors each of which is related to few variables. It discourages the detection of factors influencing all variables.
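A minimal sketch of this pipeline, assuming illustrative data (the matrix X, the choice k = 2, and the use of scikit-learn's FactorAnalysis with rotation="varimax" are ours, not the text's):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))       # illustrative data: 200 obs, 5 variables
Z = StandardScaler().fit_transform(X)   # standardize: mean 0, variance 1

# Principal component method: loadings are eigenvectors of the correlation
# matrix, scaled by the square roots of their eigenvalues.
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)    # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # sort descending
k = 2                                   # number of common factors
loadings = eigvecs[:, order[:k]] * np.sqrt(eigvals[order[:k]])

# Estimated communality of each variable: row sums of squared loadings.
print("communalities:", (loadings ** 2).sum(axis=1))

# Varimax-rotated factor solution for comparison.
fa = FactorAnalysis(n_components=k, rotation="varimax").fit(Z)
print("rotated loadings:\n", fa.components_.T)
```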
Factor analysis is a method for investigating whether a number of variables of interest Y1, Y2, ..., Yl are linearly related to a smaller number of unobservable factors F1, F2, ..., Fk. The fact that the factors are not observable disqualifies regression and other methods previously examined. We shall see, however, that under certain conditions the hypothesized factor model has certain implications, and these implications in turn can be tested against the observations. Exactly what these conditions and implications are, and how the model can be tested, must be explained with some care.

What is the Correlation Coefficient?

The correlation coefficient, a concept from statistics, is a measure of how well trends in the predicted values follow trends in past actual values. It is a measure of how well the predicted values from a forecast model "fit" the real-life data.
In this forecasting context, the correlation coefficient is a number between 0 and 1 (in general, correlations range from -1 to 1). If there is no relationship between the predicted values and the actual values, the correlation coefficient is 0 or very low (the predicted values are no better than random numbers). As the strength of the relationship between the predicted values and the actual values increases, so does the correlation coefficient. A perfect fit gives a coefficient of 1.0. Thus, the higher the correlation coefficient, the better the fit.
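As a quick sketch (the numbers are made up for illustration), NumPy's corrcoef gives this fit measure directly:

```python
import numpy as np

# Illustrative data: actual observations and a forecast model's predictions.
actual    = np.array([10.0, 12.0, 13.5, 15.0, 14.0, 16.5])
predicted = np.array([10.5, 11.8, 13.0, 15.2, 14.5, 16.0])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the correlation between the two series.
r = np.corrcoef(actual, predicted)[0, 1]
print(f"correlation coefficient: {r:.3f}")  # close to 1.0 for a good fit
```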
Introduction to Multiple Regression (1 of 3)

In multiple regression, more than one variable is used to predict the criterion. For
example, a college admissions officer wishing to predict the future grades of college
applicants might use three variables (High School GPA, SAT, and Quality of letters of
recommendation) to predict college GPA. The applicants with the highest predicted
college GPA would be admitted. The prediction method would be developed based on
students already attending college and then used on subsequent classes. Predicted scores
from multiple regression are linear combinations of the predictor variables. Therefore, the
general form of a prediction equation from multiple regression is:

Y' = b1X1 + b2X2 + ... + bkXk + A


where Y' is the predicted score, X1 is the score on the first predictor variable, X2 is the
score on the second, etc. The Y intercept is A. The regression coefficients (b1, b2, etc.) are
analogous to the slope in simple regression.
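A minimal sketch of the admissions example with ordinary least squares, assuming made-up data (the names hs_gpa, sat, letters, and college_gpa are illustrative, not from the text):

```python
import numpy as np

# Illustrative training data: predictors for students already in college.
hs_gpa  = np.array([3.2, 3.8, 2.9, 3.5, 3.9, 3.1])      # high school GPA
sat     = np.array([1150, 1400, 1000, 1250, 1450, 1100])
letters = np.array([4.0, 4.5, 3.0, 4.0, 5.0, 3.5])      # letter quality rating
college_gpa = np.array([2.9, 3.7, 2.5, 3.2, 3.8, 2.8])  # criterion

# Design matrix with a column of ones for the intercept A.
X = np.column_stack([hs_gpa, sat, letters, np.ones_like(hs_gpa)])

# Least-squares estimates of the regression coefficients b1, b2, b3 and A.
coef, *_ = np.linalg.lstsq(X, college_gpa, rcond=None)
b1, b2, b3, A = coef

# Predicted score Y' for a new applicant.
y_pred = b1 * 3.6 + b2 * 1300 + b3 * 4.5 + A
print(f"predicted college GPA: {y_pred:.2f}")
```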
