
Investigating the Relationship between Two or More Variables
(Correlation)

Dr. Tarek Tawfik

12/07/21 Dr Tarek Amin 1


The Relationship Between Variables
When investigating their relationship, variables can be categorized into two main types:

I- Dependent variable: A dependent variable is explained or affected by an independent variable.
Example: age and height.

II- Independent variables: Two variables are independent if the pattern of variation in the scores for one variable is not related to, or associated with, variation in the scores for the other variable.
Example: the level of education in Ecuador and infant mortality in Mali.



Techniques used to Analyze the Relationship between Two Variables

Method: Tabular and graphical methods
These present data in a way that reveals a possible relationship between two variables.
Examples:
- Bivariate table (2x2 table) for categorical data (nominal/ordinal data).
- Scatter plot for interval/ratio data.

Method: Numerical methods
Mathematical operations used to quantify, in a single number, the strength and direction of a relationship (measures of association).
Examples:
- Lambda, Cramer's V (nominal).
- Gamma, Somers' d, Kendall's tau-b/c (ordinal with few values).
- Spearman's rank-order correlation coefficient (ordinal scales with many values).
- Pearson's product moment correlation (interval/ratio).
- Regression.

Collectively, these techniques are called bivariate descriptive statistics.
Correlation
Correlational techniques are used to study relationships. They may be used in exploratory studies, in which one needs to determine whether relationships exist, and in hypothesis testing about a particular relationship.



Pearson Correlation
Numeric (interval/ratio)
The Pearson product moment correlation coefficient (r for a sample, rho for the population) is the usual method by which the relation between two variables is quantified.
Type of data required:
- Interval/ratio, sometimes ordinal data.
- At least two measures on each subject at the interval/ratio level.
Assumptions:
Certain assumptions must be made if we are to generalize beyond the sample statistics, that is, if we are to make inferences about the population itself:
- The sample must be representative of the population.
- The variables that are being correlated must be normally distributed.
- The relationship between the variables must be LINEAR.



Directions of Correlations on Scatter Plot

[Figure: four scatter plots showing positive correlation, negative correlation, no correlation, and a non-linear (curvilinear) relationship.]



Correlation Coefficient
- The correlation coefficient r allows us to state mathematically the relationship that exists between two variables.
- The correlation coefficient may range from +1.00 through 0.00 to -1.00.
- A value of +1.00 indicates a perfect positive relationship, 0.00 indicates no linear relationship, and -1.00 indicates a perfect negative relationship.
- The correlation coefficient also tells us the type of relationship that exists, that is, whether it is positive or negative.

- The relationship between job satisfaction and job turnover has been shown to be negative; an inverse relationship exists between them. When one variable increases, the other decreases.
- Likewise, those with higher grades have lower dropout rates, which is also a negative (inverse) relationship.
- In a positive relationship, an increase in the score of one variable is accompanied by an increase in the other.
Relationships Measured with the Correlation Coefficient
When using the formula with z-scores, r is the average of the cross-products of the z-scores:

$$ r = \frac{\sum z_X z_Y}{n} $$

Where:
$z_X$ = the z-score of variable X
$z_Y$ = the z-score of variable Y
$n$ = number of observations
Five subjects took a quiz X, on which the scores ranged from 6 to 10, and an examination Y, on which the scores ranged from 82 to 98.
Calculate r and determine the pattern of correlation.



Formula for calculating the correlation coefficient r:

$$ r = \frac{\sum z_X z_Y}{n} $$
A perfect positive relationship between two variables

Subject   X (quiz)   Y (examination)   zX      zY      zXzY
1         6          82                -1.42   -1.42   2.0
2         7          86                -0.71   -0.71   0.5
3         8          90                 0.00    0.00   0.0
4         9          94                 0.71    0.71   0.5
5         10         98                 1.42    1.42   2.0

Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = 5.00

r = ∑zXzY / n = 5.00 / 5 = +1.00
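The worked table above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names `z_scores` and `pearson_r` are illustrative, and the standard deviation divides by n (population SD), which is what the SD values 1.41 and 5.66 in the table imply.

```python
from math import sqrt

def z_scores(values):
    """Standardize values using the population SD (divide by n), as in the table."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """Pearson r as the mean of the cross-products of the z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

quiz = [6, 7, 8, 9, 10]        # X scores from the table
exam = [82, 86, 90, 94, 98]    # Y scores from the table

r = pearson_r(quiz, exam)
print(round(r, 2))
```

Dividing by n - 1 in the SD instead would require dividing the sum of cross-products by n - 1 as well; the two conventions give the same r as long as they are used consistently.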
[Figure: Positive Correlation. Scatter plot of Y score (axis 80 to 100) against X score (axis 0 to 15).]
Perfect negative relationship

Subject   X    Y    zX      zY      zXzY
1         6    98   -1.42    1.42   -2.0
2         7    94   -0.71    0.71   -0.5
3         8    90    0.00    0.00    0.0
4         9    86    0.71   -0.71   -0.5
5         10   82    1.42   -1.42   -2.0

Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = -5.00

r = ∑zXzY / n = -5.00 / 5 = -1.00
[Figure: Negative Correlation. Scatter plot of Y score (axis 80 to 100) against X score (axis 0 to 15).]
No relationship

Subject   X    Y    zX      zY      zXzY
1         6    94   -1.42    0.71   -1.0
2         7    82   -0.71   -1.42    1.0
3         8    90    0.00    0.00    0.0
4         9    98    0.71    1.42    1.0
5         10   86    1.42   -0.71   -1.0

Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = 0.00

r = ∑zXzY / n = 0.00 / 5 = 0.00
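The three worked tables can also be checked with the deviation-score form of Pearson's r, which is algebraically equivalent to the mean of the z-score cross-products but skips the standardization step. The sketch below is illustrative only; the function name `pearson_r_dev` and the dictionary labels are assumptions, while the X and Y scores are taken from the tables.

```python
from math import sqrt

def pearson_r_dev(x, y):
    """Pearson r from sums of deviation products (equivalent to the z-score form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [6, 7, 8, 9, 10]  # quiz scores, the same in all three tables
y_by_pattern = {
    "perfect positive": [82, 86, 90, 94, 98],
    "perfect negative": [98, 94, 90, 86, 82],
    "no relationship": [94, 82, 90, 98, 86],
}

results = {name: round(pearson_r_dev(x, y), 2) for name, y in y_by_pattern.items()}
for name, r in results.items():
    print(f"{name}: r = {r:+.2f}")
```

Only the pairing of the Y scores with the X scores changes between the three tables; the means and standard deviations are identical, which is why r alone captures the difference in pattern.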



[Figure: No Correlation. Scatter plot of Y score (axis 80 to 100) against X score (axis 0 to 15).]
Kass et al., 1991
Five variables were included. Smoking history is ordinal, scored from 0 to 2 (0 = never, 1 = quit, 2 = still smoking). Depressed state of mind is also ordinal, ranging from 1 (rarely) to 4 (routinely). Overall state of health is a 10-point rating (1 = very ill to 10 = very healthy). Quality of life in the past 6 months is a 6-point scale (1 = very dissatisfied to 6 = extremely happy).

The total score on the Inventory of Positive Psychological Attitude (IPPA) ranges from 30 to 210.
• Correlation coefficients were calculated to draw the following conclusions regarding smoking behavior and the quality of life among the included sample (a significance level of 0.05, that is, 95% confidence, was selected).



Pearson r, two-tailed significance (Sig.), and number of cases (N) for each pair of variables:

                            Smoking    Depressed       Overall state   Quality
                            history    state of mind   of health       of life
Depressed state of mind
  Pearson r                 .227*
  Sig. (2-tailed)           .000
  N                         442
Overall state of health
  Pearson r                 -.200*     -.409*
  Sig. (2-tailed)           .000       .000
  N                         441        444
Quality of life
  Pearson r                 -.102*     -.513**         .437**
  Sig. (2-tailed)           .033       .000            .000
  N                         440        443             420
Total IPPA score
  Pearson r                 -.147**    -.674**         .457**          .599**
  Sig. (2-tailed)           .000       .000            .000            .000
  N                         418        421             420             419
- Because the means and standard deviations of any two variables are different, we cannot directly compare the two sets of scores.
- However, we can transform them from the ordinary absolute figures to z-scores with a mean of 0 and an SD of 1.
- The correlation is the mean of the cross-products of the z-scores for each pair of values, a measure of how much each pair of observations (scores) varies together.



Strength of the Correlation Coefficient
How large should r be for it to be useful?
In decision making, r should be at least 0.95, while for questions concerning human behavior an r of 0.5 is fair.
The strengths of r are as follows:
0.00 - 0.25   Little if any
0.26 - 0.49   Low
0.50 - 0.69   Moderate
0.70 - 0.89   High
0.90 - 1.00   Very high

The direction of the relationship does not affect the strength of the relationship: a correlation of -.90 is just as high, or just as strong, as one of +.90.
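The strength bands above translate directly into a small lookup. This is a sketch under one stated assumption: the slide lists bands to two decimals (for example 0.25 then 0.26), so values that fall between two listed bounds are assigned here to the higher band. The function name `strength_of_r` is illustrative.

```python
def strength_of_r(r):
    """Return the verbal strength label for r, ignoring its sign."""
    size = abs(r)  # direction does not affect strength
    if size > 1:
        raise ValueError("r must lie between -1 and +1")
    if size <= 0.25:
        return "little if any"
    if size < 0.50:   # assumption: values between 0.25 and 0.26 count as low
        return "low"
    if size < 0.70:
        return "moderate"
    if size < 0.90:
        return "high"
    return "very high"

for r in (-0.90, 0.90, 0.20, -0.674):
    print(r, strength_of_r(r))
```

Because the function works on the absolute value, -0.90 and +0.90 receive the same label, matching the point that direction and strength are separate properties.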
Significance of the Correlation
The level of statistical significance is greatly affected by the sample size n.
If r is based on a sample of 1,000, there is a much greater likelihood that it represents the r of the population (minimal random variation) than if it were based on a sample of 10.
With a two-tailed test and a sample of 100, r = 0.20 is statistically significant at the 0.05 level, but with a sample of 10, the correlation must be high (0.632 or more) to be significant.
With large sample sizes, rs that are described as demonstrating "little if any" relationship can be statistically significant.
Statistical significance implies that r is unlikely to have occurred by chance alone: the relationship in the population differs from zero.
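The two thresholds quoted above (r = 0.20 at n = 100, and 0.632 at n = 10) can be recovered from the usual t test for a correlation, t = r·sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. The slides do not show this test, so the sketch below is an illustration rather than the author's method; the critical t values are hard-coded from a standard two-tailed 0.05 t table, and the names `t_from_r`, `critical_r` and `T_CRIT_05` are hypothetical.

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for testing whether a sample correlation differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def critical_r(t_crit, n):
    """Smallest |r| that reaches the given critical t with n observations."""
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)

# Assumption: two-tailed critical t values at the 0.05 level, from a standard t table.
T_CRIT_05 = {8: 2.306, 98: 1.984}   # keyed by degrees of freedom (n - 2)

r_needed_n10 = critical_r(T_CRIT_05[8], 10)
r_needed_n100 = critical_r(T_CRIT_05[98], 100)

print(round(r_needed_n10, 3))    # threshold for a sample of 10
print(round(r_needed_n100, 3))   # threshold for a sample of 100
print(t_from_r(0.20, 100) > T_CRIT_05[98])
```

The required r shrinks as n grows because the degrees of freedom appear in the denominator of `critical_r`, which is why weak correlations become significant in large samples.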



The following table is SPSS output describing the correlations between age, education in years, smoking history, satisfaction with current weight, and overall state of health for a randomly selected sample of subjects.

                                   Subject's   Education   Smoking   Satisfaction with
                                   age         in years    history   current weight
Education in years
  Pearson Correlation              .022
  Sig. (2-tailed)                  .649
  N                                419
Smoking history
  Pearson Correlation              .143**      -.108*
  Sig. (2-tailed)                  .003        .026
  N                                432         423
Satisfaction with current weight
  Pearson Correlation              -.077       .033        -.009
  Sig. (2-tailed)                  .109        .493        .849
  N                                432         424         440
Overall state of health
  Pearson Correlation              -.126**     .149**      -.200*    .370*
  Sig. (2-tailed)                  .009        .000        .000      .000
  N                                433         425         441       443

* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).
Spearman's rank-order correlation coefficient
- Used when ordinal data have a wide range of possible scores and collapsing such data is not possible (more than 5 categories are included).
- Used where we have two ordinal scales with a large number of values, or one ordinal and one interval/ratio scale.
In these cases, Spearman's rho, or Spearman's rank-order correlation coefficient, is indicated.
Example
A physiotherapist uses a new treatment on a group of patients and is interested in whether their ages affect their ability to respond to treatment.
Each patient is given a mobility score out of 15, according to his or her ability to perform certain tasks.



Age and mobility score with rankings

Patient   Age   Ranking on age   Mobility   Ranking on mobility
1         23    1                14         15
2         25    2                15         16
3         28    3                12         13
4         30    4                8          5
5         35    5                13         14
6         37    6                10         10
7         38    7                11         12
8         39    8                8          5
9         40    9                10         10
10        41    10               9          7.5
11        45    11               10         10
12        50    12               9          7.5
13        52    13               7          3
14        55    14               8          5
15        60    15               4          1
16        62    16               6          2
Calculating the difference in rank for each person (D = ranking on mobility - ranking on age)

Patient   Ranking on age   Ranking on mobility   Rank difference D   D²
1         1                15                    15 - 1 = 14         196
2         2                16                    16 - 2 = 14         196
3         3                13                    13 - 3 = 10         100
4         4                5                     1                   1
5         5                14                    9                   81
6         6                10                    4                   16
7         7                12                    5                   25
8         8                5                     -3                  9
9         9                10                    1                   1
10        10               7.5                   -2.5                6.25
11        11               10                    -1                  1
12        12               7.5                   -4.5                20.25
13        13               3                     -10                 100
14        14               5                     -9                  81
15        15               1                     -14                 196
16        16               2                     -14                 196

∑D² = 1225.5
Calculating Spearman's rho

$$ r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} = 1 - \frac{6(1225.5)}{16(16 \times 16 - 1)} = -0.8 $$
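The ranking, the rank differences, and the final coefficient can be reproduced from the raw ages and mobility scores. The following is a minimal sketch, not the author's code: `average_ranks` and `spearman_rho` are illustrative names, tied scores receive the mean of the ranks they occupy (as in the table, where three scores of 8 share rank 5), and the simplified formula is applied as on the slide even though ties are present.

```python
def average_ranks(values):
    """Rank from lowest (1) to highest; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Return (rho, sum of squared rank differences) using 1 - 6*sum(D^2)/(n(n^2-1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2

age = [23, 25, 28, 30, 35, 37, 38, 39, 40, 41, 45, 50, 52, 55, 60, 62]
mobility = [14, 15, 12, 8, 13, 10, 11, 8, 10, 9, 10, 9, 7, 8, 4, 6]

rho, sum_d2 = spearman_rho(age, mobility)
print(sum_d2, round(rho, 2))
```

The negative sign indicates that mobility scores tend to fall as age rises in this group of patients.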



Thank you

