
BAD 6243: Applied Univariate Statistics

Correlation

Professor Laku Chidambaram


Price College of Business
University of Oklahoma
Correlation
• Is a measure of association between two (usually)
interval/ratio level variables
• The correlation coefficient, r, refers to the strength
(or magnitude) of the association in the sample; it
serves as an estimate for the population parameter, ρ

• When computing r, three outcomes are possible:
– positive correlation (both scores move together in the
same direction);
– negative correlation (one score moves in one direction,
while the other moves in the opposite direction);
– no correlation (no systematic movement)

Scatter Plots of Correlations

[Figure: four scatter plots, each with Variable 1 on the horizontal axis and Variable 2 on the vertical axis]
An Example
        Wage (W)   W - Mean   Sq Dev of W   Exp (E)   E - Mean   Sq Dev of E   Cross Prod
           2         -3           9            8        -5          25           15
           3         -2           4           10        -3           9            6
           3         -2           4            8        -5          25           10
           4         -1           1           12        -1           1            1
           5          0           0           14         1           1            0
           6          1           1           14         1           1            1
           6          1           1           16         3           9            3
           6          1           1           16         3           9            3
          10          5          25           19         6          36           30
Mean       5.00       0.00        5.11        13.00      0.00       12.89         7.67
Sum       45.00       0.00       46.00       117.00      0.00      116.00        69.00

variance of W   5.75     SD of W             2.40
variance of E  14.50     SD of E             3.81
cov of W & E    8.63     SD of W x SD of E   9.13

coeff of corr (r) = cov of W & E / (SD of W x SD of E) = 0.94
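The arithmetic in this table can be retraced with a short script. What follows is a minimal sketch in plain Python, assuming (as the slide's figures imply) that variances and the covariance divide by n - 1; the function name `pearson_steps` and the dictionary keys are illustrative choices, not part of the course material.

```python
from math import sqrt

# Data from the example table: wage (W) and experience (E) for 9 cases.
wage = [2, 3, 3, 4, 5, 6, 6, 6, 10]
exp = [8, 10, 8, 12, 14, 14, 16, 16, 19]

def pearson_steps(x, y):
    """Return the intermediate quantities used to compute Pearson's r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dev_x = [xi - mean_x for xi in x]          # the "W - Mean" column
    dev_y = [yi - mean_y for yi in y]          # the "E - Mean" column
    ss_x = sum(d * d for d in dev_x)           # sum of squared deviations of x
    ss_y = sum(d * d for d in dev_y)           # sum of squared deviations of y
    cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))  # sum of cross products
    var_x = ss_x / (n - 1)                     # sample variance (n - 1 divisor)
    var_y = ss_y / (n - 1)
    cov_xy = cross / (n - 1)                   # sample covariance
    r = cov_xy / (sqrt(var_x) * sqrt(var_y))   # covariance over product of SDs
    return {"ss_x": ss_x, "ss_y": ss_y, "cross": cross,
            "var_x": var_x, "var_y": var_y, "cov": cov_xy, "r": r}

result = pearson_steps(wage, exp)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With the table's data, the sums of squares (46 and 116), the cross-product sum (69), the variances (5.75 and 14.50) and the covariance (8.625) should match the slide, and r should round to 0.94.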

Results of the Analysis
Correlations

                                               VAR00001   VAR00002
VAR00001   Pearson Correlation                  1          .945**
           Sig. (2-tailed)                      .          .000
           Sum of Squares and Cross-products    46.000     69.000
           Covariance                           5.750      8.625
           N                                    9          9
VAR00002   Pearson Correlation                  .945**     1
           Sig. (2-tailed)                      .000       .
           Sum of Squares and Cross-products    69.000     116.000
           Covariance                           8.625      14.500
           N                                    9          9

**. Correlation is significant at the 0.01 level (2-tailed).

A Graphical Representation
[Figure: scatter plot of VAR00001 (vertical axis) against VAR00002 (horizontal axis)]

Significance of Correlation
• Is r significantly different from zero?
– H0: ρxy = 0
– Ha: ρxy ≠ 0
• One-tailed vs. two-tailed tests
• Significance level is affected by sample size, so
even “small” correlation coefficients can be
significant, if sample sizes are large
• However, despite statistically significant
correlations, their practical relevance may be
difficult to determine
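The slides do not show how the significance test is computed. One standard route, sketched here as an assumption rather than as the course's own procedure, converts r to a t statistic with n - 2 degrees of freedom, t = r * sqrt(n - 2) / sqrt(1 - r^2), and compares it with a critical value from a t table. The function name `t_statistic` and the small-r cases are illustrative.

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# The wage/experience example: r = .9446 with n = 9 (df = 7).
t_example = t_statistic(0.9446, 9)

# Illustrative (hypothetical) cases showing the effect of sample size:
# the same "small" r = 0.10 with a large and a small sample.
t_large_n = t_statistic(0.10, 1000)   # df = 998
t_small_n = t_statistic(0.10, 20)     # df = 18

print(f"example: t = {t_example:.2f}")
print(f"r = 0.10, n = 1000: t = {t_large_n:.2f}")
print(f"r = 0.10, n = 20:   t = {t_small_n:.2f}")
```

For reference, standard t tables give a two-tailed critical value of about 3.50 at the 0.01 level for df = 7, about 2.10 at the 0.05 level for df = 18, and about 1.96 at the 0.05 level for very large df. On that basis r = 0.10 would be significant with n = 1000 but not with n = 20, even though it explains only 1% of the variance in both cases.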
Coefficient of Determination
• R² refers to the shared variance between
variables and is obtained by squaring the
correlation coefficient, r
[Figure: diagram with regions labeled "Variance of y" and "Variance of x"]
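Applied to the wage/experience example, the squaring step looks like this. It is a small sketch that reuses the sums of squares and cross products reported earlier (46, 116 and 69); the variable names are illustrative.

```python
from math import sqrt

# Sums taken from the wage/experience example.
ss_w = 46.0     # sum of squared deviations of wage
ss_e = 116.0    # sum of squared deviations of experience
cross = 69.0    # sum of cross products

r = cross / sqrt(ss_w * ss_e)   # same r as covariance / (SD of W x SD of E)
r_squared = r ** 2              # coefficient of determination

print(f"r = {r:.4f}, r squared = {r_squared:.4f}")
```

Read this way, roughly 89% of the variance in one variable is shared with the other in this sample.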

Some Cautionary Notes
• Problem of outliers
• Correlation is not causation
• It only summarizes linear relationships
• The “third variable” problem
• Restriction in range of values
• Use of extreme groups
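The point that r only summarizes linear relationships can be illustrated with made-up data. In the sketch below (hypothetical values, not from the course), y is completely determined by x through y = x^2, yet Pearson's r is zero because the relationship is not linear; a linear transform of the same x gives r = 1.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical data: x is symmetric about zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y_curved = [v * v for v in x]       # perfect, but nonlinear, dependence
y_linear = [2 * v + 1 for v in x]   # perfect linear dependence

r_curved = pearson_r(x, y_curved)
r_linear = pearson_r(x, y_linear)
print(f"curved: r = {r_curved:.3f}, linear: r = {r_linear:.3f}")
```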

Impact of Outliers
[Figure: scatter plot of VAR00001 (vertical axis, -100 to 400) against VAR00002 (horizontal axis, 6 to 20)]

Pearson’s correlation coefficient, r = -0.473
Spearman’s rho = 0.429
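The data behind this slide are not given, so its values cannot be reproduced here. The sketch below shows the same effect with the wage/experience data plus one hypothetical outlier (experience 7, wage 300), which is an assumption made purely for illustration. Spearman's rho is computed as Pearson's r on average ranks; the helper names are illustrative.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / sqrt(ss_x * ss_y)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1   # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

exp = [8, 10, 8, 12, 14, 14, 16, 16, 19]
wage = [2, 3, 3, 4, 5, 6, 6, 6, 10]

# Hypothetical outlier appended for illustration only.
exp_out = exp + [7]
wage_out = wage + [300]

r_clean = pearson_r(exp, wage)
r_out = pearson_r(exp_out, wage_out)
rho_out = spearman_rho(exp_out, wage_out)
print(f"Pearson without outlier: {r_clean:.3f}")
print(f"Pearson with outlier:    {r_out:.3f}")
print(f"Spearman with outlier:   {rho_out:.3f}")
```

A single extreme point is enough to flip the sign of Pearson's r in this constructed case, while the rank-based coefficient stays positive, which is the pattern the slide's two coefficients show.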

What to Use?
• Pearson’s Product Moment Correlation:
– Interval/Ratio level variables
– Interval/Ratio and dichotomous variables
• Spearman’s Rho:
– Non-normal distributions
– Ordinal variables
• Kendall’s Tau:
– Non-normal distributions
– Ordinal and dichotomous variables
– Small sample with numerous tied ranks
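Pearson's r and Spearman's rho both rest on deviation scores (raw or ranked), whereas Kendall's tau counts concordant and discordant pairs. A minimal sketch of the tau-b variant, which adjusts the denominator for ties, is given below; it is a direct O(n^2) pair count written for clarity, and the function name and sample data are illustrative assumptions.

```python
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b: (concordant - discordant) pairs, adjusted for ties."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue                # tied on both variables: not counted
            elif dx == 0:
                ties_x_only += 1        # tied on x only
            elif dy == 0:
                ties_y_only += 1        # tied on y only
            elif dx * dy > 0:
                concordant += 1         # pair ordered the same way on x and y
            else:
                discordant += 1         # pair ordered in opposite ways
    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    return (concordant - discordant) / denom

# Hypothetical ordinal ratings from two judges.
judge_a = [1, 2, 3]
judge_b = [1, 3, 2]
tau = kendall_tau_b(judge_a, judge_b)
print(f"tau-b = {tau:.3f}")
```

Of the three pairs in this tiny example, two are concordant and one is discordant, so tau-b is 1/3.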

Partial Correlation
• A partial correlation refers to the correlation
between two variables, with the influence of a
third variable removed from both
• For instance, suppose that in our example you come
to suspect that wage rates are influenced not by
overall work experience per se, but by relevant
work experience (say, in a similar job)
• So, we examine if this proposition is true by
looking at the partial correlation

Results of Analysis
- - - PARTIAL CORRELATION COEFFICIENTS - - -

Zero Order Partials

             VAR00001    VAR00002    VAR00003
VAR00001     1.0000      .9446       .9953
             (   0)      (   7)      (   7)
             P= .        P= .000     P= .000
VAR00002     .9446       1.0000      .9402
             (   7)      (   0)      (   7)
             P= .000     P= .        P= .000
VAR00003     .9953       .9402       1.0000
             (   7)      (   7)      (   0)
             P= .000     P= .000     P= .

(Coefficient / (D.F.) / 2-tailed Significance)

- - - PARTIAL CORRELATION COEFFICIENTS - - -

Controlling for.. VAR00003

             VAR00001    VAR00002
VAR00001     1.0000      .2682
             (   0)      (   6)
             P= .        P= .521
VAR00002     .2682       1.0000
             (   6)      (   0)
             P= .521     P= .

(Coefficient / (D.F.) / 2-tailed Significance)
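The first-order partial correlation can be recovered from the three zero-order correlations with the standard formula r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The sketch below applies it to the coefficients printed above; the function name is illustrative, and because the inputs are rounded to four decimals the result only approximates the reported .2682.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y with z partialled out of both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Zero-order correlations from the output above.
r_12 = 0.9446   # VAR00001 with VAR00002
r_13 = 0.9953   # VAR00001 with VAR00003
r_23 = 0.9402   # VAR00002 with VAR00003

r_12_given_3 = partial_corr(r_12, r_13, r_23)
print(f"partial r (controlling for VAR00003) = {r_12_given_3:.3f}")
```

The result comes out at roughly 0.27, close to the .2682 in the output. The degrees of freedom drop from 7 to 6 because one more variable is controlled for.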
