
CORRELATION

Learning Objectives
Definition of Correlation
Need of Correlation
Computing Correlation
Correlation Coefficient (Pearson's correlation)
3 Characteristics of Relationship
Properties of Correlation
Coefficient of Determination
Steps of a Hypothesis test with Correlation
Spearman Rank Correlation Coefficient (rs)
Definition of Correlation

Correlation is a statistical technique used to determine the degree to which two variables are related.
Measure of association
A single number that describes the degree of relationship between two variables.
Strong Correlation
X and Y are MORE CORRELATED.

Weak Correlation
X and Y are LESS CORRELATED.

Need of Correlation
Statistical Measures
Pearson Correlation Coefficient

Computing Correlation

Requires the standard deviations of both variables and their covariance.
Correlation Coefficient
Statistic showing the degree of relation between two
variables.
It is also called Pearson's correlation or product moment
correlation coefficient.
It measures the nature and strength of the relationship between two quantitative variables.
The sign of r denotes the nature of association
while the value of r denotes the strength of association.
Correlation Coefficient Formula

$$ \operatorname{Correl}(X,Y) = \frac{\operatorname{Cov}(X,Y)}{\operatorname{SD}(X)\,\operatorname{SD}(Y)} $$

Figure (scale of r from +1 to -1): very strong positive association, moderate positive association, weak positive association, no linear association, negligible negative association, moderate negative association, very strong negative association.
3 Characteristics of Relationship
Form
Linear
Curvilinear
No pattern
Direction (Based on sign)
Positive
Negative
Strength (Based on Numerical Value)
Closer to 1
Closer to 0
Linear Relationship: as X increases, Y increases (positive), or as X increases, Y decreases (negative).
Curvilinear Relationship
No Linear Relationship
No Relationship: zero correlation
Direction (Based on sign)
Sign of the CORRELATION COEFFICIENT (+ or -)
POSITIVE: X and Y tend to change in the SAME DIRECTION
Higher X, higher Y
Lower X, lower Y

NEGATIVE: X and Y tend to change in the OPPOSITE DIRECTION
Higher X, lower Y
Lower X, higher Y
POSITIVE DIRECTION
Time Studying in Hours (X)   Quiz Grade (y)
0    65
0    60
1    70
2    75
3    80
3    85
4    90
4    90
5    95
5    98
6    99
6    100
Strong positive relationship between studying and quiz grades (r = .989)
NEGATIVE DIRECTION
Hours of Cardio per Week (X)   % Body Fat (y)
0    22
0    20
0    19.2
2    18
2    18.1
3    17.7
3    16
4    15
6    14
8    13
10   11
12   10
Strong negative relationship between cardio exercise and body fat (r = -.968)
Strength (Based on Numerical Value)
Absolute value of the correlation coefficient, |r|, which ranges from 0 to 1
Closer to 1
More Consistent Relationship
Data Points in Scatter-Plot are More Linear
Closer to 0
Less Consistent Relationship
Data Points in Scatter-Plot are Less Linear and More Scattered
r²     r      Effect Size / Correlation Strength
.01    .10    Small / Weak
.09    .30    Medium / Moderate
.25    .50    Large / Strong
.50    .70    Very Large / Very Strong
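The table lists benchmarks rather than explicit ranges. The following is a minimal Python sketch that assumes |r| should be labelled with the closest benchmark in the r column. That reading is an assumption, though it agrees with how this section labels its own examples. The names `BENCHMARKS` and `strength_label` are hypothetical.

```python
# Benchmarks from the r column of the table above: (|r|, label).
BENCHMARKS = [
    (0.10, "Small / Weak"),
    (0.30, "Medium / Moderate"),
    (0.50, "Large / Strong"),
    (0.70, "Very Large / Very Strong"),
]


def strength_label(r):
    """Label |r| with the closest benchmark (an assumed reading of the table)."""
    size = abs(r)
    _, label = min(BENCHMARKS, key=lambda b: abs(b[0] - size))
    return label


for r in (0.989, -0.968, 0.283, -0.085):
    print(r, strength_label(r))
```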
STRONGEST STRENGTH
Daily Calories (X)   Weight (y)
1000   100
1500   150
2000   200
2500   250
3000   300
3500   350
4000   400
4500   450
5000   500
Perfect (strongest possible) positive linear relationship between calories and weight (r = 1).
MODERATE STRENGTH
# Siblings (X)   # Children (y)
1   1
1   1
1   2
1   2
1   3
2   1
2   1
2   4
2   5
2   5
3   5
4   1
4   2
5   5
5   3
Moderate positive relationship between number of siblings and number of children (r = .283)
WEAK STRENGTH
Anatomy and Physiology Grade (X)   Percentage of Passing Stations on OSCE (y)
60   60
63   85
65   40
70   90
71   60
73   80
74   55
79   100
79   79
80   70
81   50
82   79
83   82
85   40
87   85
90   40
91   100
95   75
97   30
Weak negative correlation between final grade in anatomy and physiology and percentage of passing stations on the objective structured clinical examination (r = -.085)
Properties of Correlation
- Measure of linear association
- Categorical data
- Related to sample size
- Sensitive to OUTLIERS
Coefficient of Determination

The square of the correlation coefficient (r²): the proportion of common variation of two variables.

This measures the strength or magnitude of the relationship.

Example: r² = 67%
67% of the variation in x is related to the variation in y.
Example: Rising Hills Manufacturing
Rising Hills Manufacturing wishes to study the relationship between the number of workers and the number of tables produced in its plant.
To do so, it obtained 10 samples, each one hour in length, from the production floor.

X = number of workers
y = number of tables produced

How would you describe this relationship?
What correlation are you expecting?

Correlation Calculation
How would you describe this relationship?
r = .989
STRONG, POSITIVE linear relationship

Steps of a Hypothesis Test with Correlation
Step 1. State the hypotheses
Step 2. Determine the critical region
Step 3. Graph the scatter plot and calculate the test statistic
Step 4. Make a decision and state the conclusion
Step 1. State the Hypotheses
(ρ is the population correlation coefficient.)

Two-tailed
Ho: ρ = 0   H1: ρ ≠ 0

One-tailed (positive)
Ho: ρ ≤ 0   H1: ρ > 0

One-tailed (negative)
Ho: ρ ≥ 0   H1: ρ < 0
Step 2. Determine the Critical Region

Find rCrit in the Critical Values for Pearson Correlation table.

rCrit is based on:
- Alpha level
- One- or two-tailed test
- df = n - 2
where n = number of study participants (pairs of scores)
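Critical values of r follow from the t distribution with df = n - 2, through the relation rCrit = t_crit / sqrt(t_crit² + df). The following is a minimal Python sketch of that conversion. It assumes you read t_crit from a standard t table for the same alpha and number of tails. The function name `r_critical` is hypothetical.

```python
import math


def r_critical(t_crit, df):
    """Convert a critical t value (same alpha, same tails) into a critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


n = 12          # pairs of scores, e.g. the study-time example
df = n - 2      # degrees of freedom for Pearson's r
t_crit = 2.228  # two-tailed, alpha = .05, df = 10 (from a t table)
print(f"rCrit = {r_critical(t_crit, df):.3f}")  # rCrit = 0.576
```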
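Step 3. Graph the Scatter Plot and Calculate the Test Statistic

$$ r = \frac{\sum (X-\bar{X})(Y-\bar{Y})}{\sqrt{\sum (X-\bar{X})^2 \, \sum (Y-\bar{Y})^2}} $$

$$ r = \frac{\operatorname{Cov}(X,Y)}{\operatorname{SD}(X)\,\operatorname{SD}(Y)} $$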
State 4. Make decision and State Conclusion
If the observed (calculated) r falls in the critical region:
- REJECT Ho and the Correlation is significant (i.e., the
relationship between X and Y is not likely due to chance and you
would expect to see a relationship between X and Y in the
population)

If the observed r does not fall in the critical region:
- FAIL TO REJECT Ho and the correlation is not significant (i.e., the observed relationship between X and Y could plausibly be due to chance, so the sample does not provide evidence of a relationship between X and Y in the population).
If the correlation is "high" then you reject the null hypothesis, in favour of the
alternative hypothesis.

If the correlation is "low" then you don't reject the null hypothesis (but you
don't accept it either)
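The following is a minimal Python sketch of the Step 4 decision rule for the three forms of Ho from Step 1. It assumes rCrit has already been looked up for the chosen alpha, tails and df. The function `decide` and its `tail` values are hypothetical. The usage example uses two critical values for two-tailed alpha = .05: 0.576 for df = 10 (study-time example, n = 12) and 0.514 for df = 13 (siblings example, n = 15).

```python
def decide(r, r_crit, tail="two"):
    """Compare an observed r with rCrit; tail is 'two', 'positive' or 'negative'."""
    if tail == "two":
        significant = abs(r) >= r_crit
    elif tail == "positive":
        significant = r >= r_crit
    elif tail == "negative":
        significant = r <= -r_crit
    else:
        raise ValueError("tail must be 'two', 'positive' or 'negative'")
    return "Reject Ho" if significant else "Fail to reject Ho"


print(decide(0.989, 0.576))  # Reject Ho
print(decide(0.283, 0.514))  # Fail to reject Ho
```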
Spearman Rank Correlation
Coefficient (rs)
It is a non-parametric measure of correlation.
This procedure uses the two sets of ranks that can be assigned to the sample values of x and y.
Spearman Rank correlation coefficient could be computed
in the following cases:
- Both variables are quantitative.
- Both variables are qualitative ordinal.
- One variable is quantitative and the other is qualitative
ordinal.
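Formula for Spearman Rank Correlation Coefficient (rs)

$$ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$

where d is the difference between the two ranks in each pair and n is the number of pairs.

The value of rs denotes the magnitude and nature of association, with the same interpretation as Pearson's r.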
Example:

1. In your third column, rank the data in your first column from 1 to n (the number of data points you have). Give the lowest number a rank of 1, the next lowest a rank of 2, and so on.

2. In your fourth column, do the same as in step 1, but rank the second column instead.
If two (or more) pieces of data in one column are the same, find the mean of the ranks they would have received if ranked normally, then assign this mean rank to each of them.

In the example at right, there are two 5s that would otherwise have ranks of 2 and 3. Since there are two 5s, take the mean of their ranks. The mean of 2 and 3 is 2.5, so assign the rank 2.5 to both 5s.
3. In the "d" column, calculate the difference between the two numbers in each pair of ranks. For example, if one is ranked 1 and the other 3, the difference is 2. (The sign doesn't matter, since the next step squares this number.)
4. Square each of the numbers in the "d" column and write these values in the "d²" column.

5. Add up all the values in the "d²" column. This sum is Σd².

Σd² = 6
6. Insert this value into the simplified Spearman's Rank Correlation Coefficient formula, replace n with the number of pairs of data you have, and calculate the answer.
7. Interpret your result. It can vary between -1 and 1.
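The following is a minimal Python sketch of the steps above. Ties receive the mean of their ranks, and rs uses the simplified formula 1 - 6Σd² / (n(n² - 1)). The function names are hypothetical and the sample data are made up. When there are ties, the simplified formula is only an approximation; the exact value is Pearson's r computed on the ranks.

```python
def average_ranks(values):
    """Rank from 1 (lowest) to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2  # positions i..j would get ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Simplified formula: rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


print(average_ranks([3, 5, 5, 9]))  # [1.0, 2.5, 2.5, 4.0]
print(f"{spearman_rs([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]):.2f}")  # 0.80
```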
Pearson Correlation vs Spearman Rank

Pearson Correlation
- Pearson correlation is the "true" correlation between variables: a measure of their tendency to rise and fall together.
- The Pearson correlation evaluates the linear relationship between two continuous variables.

Spearman Rank
- Spearman correlation means you rank the data (1st through nth) and then take the correlation of the ranks instead of the actual data.
- The Spearman correlation coefficient is based on the ranked values for each variable rather than the raw data.

If you think the relationship is linear, Pearson is better.
If the relationship is monotonic but not linear, or the data are ordinal, Spearman is better.
