
mhsagujar@neu.edu.ph
Correlation Analysis
Is IQ related to educational attainment?
I think savings is related to weight.

• A correlation is a relationship or
association between two variables.
• A correlation coefficient is a
numerical measure of the linear
relationship between two variables.
• Some variables are related in such a way that if you know one of them, the
other can be estimated. Think of the relationship suggested by the boxed
question below.
What do you think is the relationship
between the number of hours spent in
studying (variable 1) and the grades
received (variable 2)?

• As the number of hours spent in studying increases, what happens to the grade?
A direct or positive relationship between
two variables implies that an increase in
value of one of the variables corresponds to
an increase in value of the other variable.

[Figure: scatter plot of a perfect positive correlation; all points lie on a straight line rising to the right, r = 1]
A direct or positive relationship of some degree between two variables
implies that an increase in value of one of the variables tends to correspond
to an increase in value of the other variable.

[Figure: scatter plot of some positive correlation; the points trend upward to the right without lying on one line, 0 < r < 1. Example: IQ and academic grades]
• In reality, you can seldom find variables with a perfect positive
correlation. More often, you will come across variables with only some
degree of positive relationship.
• In a perfect positive correlation, all the points lie on a straight line
that rises to the right. Now what do you notice in the “some positive
correlation” plot? Can the points be contained in one straight line? If not,
describe the general direction of the points.
What do you think is the relationship
between the number of absences in class
(variable 1) and the grades received
(variable 2)?

[Figure: “Playing hooky” and “Getting low grades”]
• As the number of absences increases, what do you think will happen to the
grades received? Do they have the same relationship as the number of hours
spent in studying and the grades received? If not, how are they related?
An inverse or negative relationship between
two variables implies that an increase in
value of one of the variables corresponds to
a decrease in value of the other variable.

[Figure: scatter plot of a perfect negative correlation; all points lie on a straight line falling to the right, r = -1]
• Again, a perfect relationship of this type is rare. In real life, you
usually find only some degree of negative relationship.
An inverse or negative relationship of some degree between two variables
implies that an increase in value of one of the variables tends to correspond
to a decrease in value of the other variable.

[Figure: scatter plot of some negative correlation; the points trend downward to the right without lying on one line, -1 < r < 0]
• Analyze the relationship between the variables in each pair.
a) shoe size and IQ
b) waistline and GPA
c) number of books in the library and the number of points made in a
basketball game
• Yes! There are many pairs of variables that have no correlation at all.
In such cases, there is a zero correlation.
A zero relationship exists between two
variables if an increase in value of one of
the variables is not accompanied by either
an increase or a decrease in value of the
other variable.
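The three kinds of relationship described so far can be checked numerically. The sketch below uses made-up toy data (not taken from these slides) and the sums form of the Pearson formula; the function name `pearson_r` is only an illustrative choice.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the raw sums: n, sum X, sum Y, sum X^2, sum Y^2, sum XY."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, chosen only to illustrate the three cases.
x = [1, 2, 3, 4, 5]
y_direct = [3, 5, 7, 9, 11]    # rises by 2 for every unit increase in x
y_inverse = [11, 9, 7, 5, 3]   # falls by 2 for every unit increase in x
y_none = [2, 1, 3, 1, 2]       # no upward or downward trend

r_direct = pearson_r(x, y_direct)
r_inverse = pearson_r(x, y_inverse)
r_none = pearson_r(x, y_none)
print(r_direct, r_inverse, r_none)
```

A perfectly rising line gives r = 1, a perfectly falling line gives r = -1, and the trendless data gives r = 0.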
[Figure: scatter plot of zero correlation; the points show no upward or downward trend, r = 0. Example: intelligence and gender]
• To determine the degree of relationship between two variables, the Pearson
product-moment correlation coefficient formula, or simply the Pearson “r”
formula, will be used. The formula and the degrees of relationship are given
in the boxes below.
The Pearson product-moment correlation coefficient, or simply Pearson r:

$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$
A correlation coefficient is the
magnitude or the degree of relationship
between two variables.
Value of r          Interpretation
±0.80 to ±0.99      high correlation
±0.60 to ±0.79      moderately high correlation
±0.40 to ±0.59      moderate correlation
±0.20 to ±0.39      low correlation
±0.01 to ±0.19      negligible correlation
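As a rough sketch, the interpretation ranges above can be turned into a small lookup. The function name `describe_correlation` is hypothetical, and two choices are assumptions not stated in the table: values that fall between the listed ranges (for example 0.795) are assigned by the lower bound of each range, and |r| = 1 and r = 0 are labelled "perfect correlation" and "zero correlation".

```python
def describe_correlation(r):
    """Return the descriptive label for a correlation coefficient r."""
    size = abs(r)  # the sign gives the direction, the size gives the degree
    if size > 1:
        raise ValueError("r must lie between -1 and 1")
    if size == 1:
        return "perfect correlation"      # assumption: not listed in the table
    if size >= 0.80:
        return "high correlation"
    if size >= 0.60:
        return "moderately high correlation"
    if size >= 0.40:
        return "moderate correlation"
    if size >= 0.20:
        return "low correlation"
    if size > 0:
        return "negligible correlation"
    return "zero correlation"             # assumption: not listed in the table

print(describe_correlation(0.89))
print(describe_correlation(-0.45))
```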
• For manual computation, you may
refer to the formula. However, it will
be easier if you have the required
calculator with LR/stat1/stat2/statxy
mode.
• Do the computation using the
example on the number of hours
spent in studying and the grades
received.
Hours spent in studying (x):   2   2   2   3   3   4   5   5   6   6
Grades received (y):          57  63  70  72  69  75  73  84  82  89

• Set your calculator to LR mode or its equivalent. Clear the stat memory by
pressing Shift AC thrice or its equivalent. Then enter the data.
X       Y       X²      Y²       XY
2       57      4       3249     114
2       63      4       3969     126
2       70      4       4900     140
3       72      9       5184     216
3       69      9       4761     207
4       75      16      5625     300
5       73      25      5329     365
5       84      25      7056     420
6       82      36      6724     492
6       89      36      7921     534
Total:
38      734     168     54718    2914
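$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$

$$ r = \frac{10(2914) - (38)(734)}{\sqrt{\left[10(168) - (38)^2\right]\left[10(54718) - (734)^2\right]}} = \frac{1248}{\sqrt{(236)(8424)}} $$

$$ r \approx 0.885114 \approx 0.89 $$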
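The longhand computation can be reproduced in a few lines. This is a minimal sketch using only the data from the example; the variable names are illustrative.

```python
from math import sqrt

hours = [2, 2, 2, 3, 3, 4, 5, 5, 6, 6]             # X
grades = [57, 63, 70, 72, 69, 75, 73, 84, 82, 89]  # Y

n = len(hours)
sum_x = sum(hours)
sum_y = sum(grades)
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in grades)
sum_xy = sum(x * y for x, y in zip(hours, grades))

# Numerator and denominator of the Pearson r formula.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(round(r, 2))
```

The five sums printed on the first line should agree with the totals row of the table, which is a quick way to catch a data-entry slip before trusting r.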
Enter each pair as X, then the data-separator key (";" or "," on some models), then Y, then M+. That is:

2 ; 57 M+
2 ; 63 M+
2 ; 70 M+
3 ; 72 M+
3 ; 69 M+
4 ; 75 M+
5 ; 73 M+
5 ; 84 M+
6 ; 82 M+
6 ; 89 M+

• To get the correlation coefficient, press SHIFT or RCL then “r”; a value
of about 0.885114 will be displayed. To two decimal places, r = 0.89, which
is interpreted as high correlation.
• Another important and interesting statistic that can be obtained from the
correlation coefficient (r) is the coefficient of determination, r². It
tells us how much of the variation in Y (grades) is due to or can be
attributed to X (number of hours spent in studying). Thus, if you square r
(about 0.885114), you get r² ≈ 0.783427.
• This value is interpreted as follows:

“Seventy-eight percent (78%) of the variation in grades received (Y) is due
to or can be attributed to the variation in the number of hours spent in
studying (X), and the remaining 22% (100% - 78%) is due to other factors
such as IQ, teacher, etc.”
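One way to see where the 78% and 22% come from is to fit the least-squares line of grades on hours and split the total variation in grades into an explained part and a leftover part. The sketch below assumes that standard decomposition; the variable names are illustrative.

```python
hours = [2, 2, 2, 3, 3, 4, 5, 5, 6, 6]
grades = [57, 63, 70, 72, 69, 75, 73, 84, 82, 89]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(grades) / n

# Least-squares line: predicted grade = intercept + slope * hours
sxx = sum((x - mean_x) ** 2 for x in hours)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, grades))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Total variation in grades, and the part the line leaves unexplained.
ss_total = sum((y - mean_y) ** 2 for y in grades)
ss_residual = sum((y - (intercept + slope * x)) ** 2
                  for x, y in zip(hours, grades))

r_squared = 1 - ss_residual / ss_total
explained_pct = round(100 * r_squared)
print(round(r_squared, 6), explained_pct, 100 - explained_pct)
```

The value of `r_squared` computed this way equals the square of Pearson r for the same data.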
• Again, Microsoft Excel is helpful in getting “r”. Below are the truncated
printout and the scatter diagram for the number of hours spent in studying
and the grades received, as generated by MS Excel. Some parts of the
printout have been deleted.
No. of Hours    Grades
2               57
2               63
2               70
3               72
3               69
4               75
5               73
5               84
6               82
6               89

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.885114397
R Square             0.783427495
Adjusted R Square    0.756355932
Standard Error       4.775466966
Observations         10
• Notice that Excel gives “r” as “Multiple R” and “r²” as “R Square”.
1. Go to “Excel”
2. “Tools” then “Data Analysis”
3. “Regression Analysis”
4. Input “Y” (grades) and “X” (hours)
5. Check output range, then press “OK”
6. then you are done!
• So in getting “r” and “r²”, you have three choices:
1. compute by longhand,
2. use your calculator with LR mode or its equivalent, or
3. use Excel.
[Figure: Scatter Diagram of Y against X for the hours-and-grades example]
Testing the significance of correlation
• After learning how to get and interpret the value of “r”, your next task
is to determine whether the correlation between the variables is significant
and not just due to chance. This is called testing the significance of the
correlation.
• There are several ways to test if “r” is significant. One can use the
t-test for the correlation coefficient, with df = n - 2 and the formula:

$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \quad\text{or}\quad t = r\sqrt{\frac{n-2}{1-r^2}} $$

For the example, with r ≈ 0.885114 and n = 10:

$$ t = \frac{0.885114\sqrt{10-2}}{\sqrt{1-(0.885114)^2}} \approx \frac{2.50348}{0.46537} \approx 5.3795 $$
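The t statistic can be checked with a short function. This is a minimal sketch; the function name `t_for_correlation` is illustrative, and r is entered already rounded to six decimal places.

```python
from math import sqrt

def t_for_correlation(r, n):
    """t statistic for testing a correlation coefficient, with df = n - 2."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r = 0.885114   # from the hours-and-grades example, rounded
n = 10
t = t_for_correlation(r, n)
df = n - 2
print(round(t, 4), df)
```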
Approaches in Hypothesis Testing
A. Critical value approach
5-step solution
1. H0: ______________
Ha: ______________
2. α = _____; Cri-value = _____
3. Decision rule: Reject H0 if
|Comp-value| ≥ Cri-value
4. Decision:
5. Conclusion:
5-step solution (Let ρ be the population correlation)
1. H0: ρ = 0; There is no correlation between the number of hours spent in
studying and the grades received. (ρ, rho, is the symbol for the population
correlation.)
Ha: ρ ≠ 0; There is a correlation between the number of hours spent in
studying and the grades received.

2. α = 0.05; t comp = 5.3795 and cv = 2.306 (df = 8)

3. Decision rule: Reject H0 if |t comp| ≥ 2.306

4. Decision: Reject H0, because 5.3795 > 2.306

5. Conclusion: There is a significant correlation between the number of
hours spent in studying and the grades received. Hence, as the number of
hours spent in studying increases, the grades received also increase.
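The decision rule of the critical value approach reduces to one comparison. The sketch below takes the critical value 2.306 from the solution above as given, rather than computing it; the function name is hypothetical.

```python
def critical_value_decision(t_computed, critical_value):
    """Two-tailed decision: reject H0 when |t| reaches the critical value."""
    if abs(t_computed) >= critical_value:
        return "Reject H0"
    return "Do not reject H0"

t_computed = 5.3795     # computed t for the hours-and-grades example
critical_value = 2.306  # tabular value for alpha = 0.05 (two-tailed), df = 8
decision = critical_value_decision(t_computed, critical_value)
print(decision)
```

Taking the absolute value lets the same rule handle negative correlations, where the computed t is negative.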
The T-test or F-test for slope method
• Use the example on the number of hours
spent in studying and grades received.
Hours spent in studying (x):   2   2   2   3   3   4   5   5   6   6
Grades received (y):          57  63  70  72  69  75  73  84  82  89

• In the regression statistics generated by Microsoft Excel, you have
r = 0.89. The lower portion of the regression printout is reproduced below.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.885114
R Square             0.783427
Adjusted R Square    0.756356
Standard Error       4.775467
Observations         10

ANOVA
Source of Variation   df   SS        MS        F          Significance F
Regression            1    659.959   659.959   28.93913   0.000662
Residual              8    182.440   22.8050
Total                 9    842.4

                Coefficients   Standard Error   t Stat    P-value    Lower 95%
Intercept       53.305         4.02916          13.2298   1.02E-06   44.01382
No. of Hours    5.2881         0.98301          5.37951   0.000662   3.021299
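The main entries of the printout can be reproduced from the raw data with the usual simple-regression formulas. This is a sketch under that assumption, not a description of how Excel computes them internally; the variable names are illustrative.

```python
from math import sqrt

hours = [2, 2, 2, 3, 3, 4, 5, 5, 6, 6]
grades = [57, 63, 70, 72, 69, 75, 73, 84, 82, 89]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(grades) / n
sxx = sum((x - mean_x) ** 2 for x in hours)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, grades))

# Coefficients of the fitted line.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# ANOVA sums of squares, mean squares, and F.
ss_total = sum((y - mean_y) ** 2 for y in grades)
ss_residual = sum((y - (intercept + slope * x)) ** 2
                  for x, y in zip(hours, grades))
ss_regression = ss_total - ss_residual
ms_regression = ss_regression / 1        # df for regression = 1
ms_residual = ss_residual / (n - 2)      # df for residual = n - 2
f_stat = ms_regression / ms_residual

# Standard error of the estimate and the t statistic for the slope.
standard_error = sqrt(ms_residual)
se_slope = standard_error / sqrt(sxx)
t_slope = slope / se_slope

print(round(intercept, 3), round(slope, 4))
print(round(f_stat, 5), round(t_slope, 5))
```

In simple regression the t statistic for the slope equals the t statistic for r, and its square equals F, which is why the slope's P-value and Significance F are the same number.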
B. p-value approach

5-step solution
1. H0: ______________
Ha: ______________
2. α = _____; p-value = _____
3. Decision rule: Reject H0 if
p-value ≤ α
4. Decision:
5. Conclusion:
5-step solution (Let ρ be the population correlation)
1. H0: ρ = 0; There is no correlation between the number of hours spent in
studying and the grades received. (ρ, rho, is the symbol for the population
correlation.)
Ha: ρ ≠ 0; There is a correlation between the number of hours spent in
studying and the grades received.
2. α = 0.05; r comp = 0.89; p-value for slope = 0.000662 (found in the
printout)
3. Decision rule: Reject H0 if the p-value for slope (0.000662) ≤ α (0.05).
4. Decision: Reject H0, because the p-value for slope (0.000662) < α (0.05).
5. Conclusion: There is a significant correlation between the number of
hours spent in studying and the grades received. Hence, as the number of
hours spent in studying increases, the grades received also increase.
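The p-value in the printout can be approximated without statistical software by integrating the Student t density numerically. The sketch below assumes a two-tailed test and uses Simpson's rule; the function names and the number of steps are illustrative choices, not part of the original solution.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the interval [0, |t|]."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    area = total * h / 3                   # P(0 <= T <= |t|)
    return 1 - 2 * area

alpha = 0.05
p_value = two_tailed_p_value(5.37951, 8)   # t Stat and df from the printout
decision = "Reject H0" if p_value <= alpha else "Do not reject H0"
print(round(p_value, 6), decision)
```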
