CORRELATION - ITS INTERPRETATION AND IMPORTANCE
Structure
16.1 Introduction
16.2 Objectives
16.3 The Concept of Correlation
16.4 Co-efficient of Correlation
16.5 Maximum Range of values of Co-efficient of Correlation
16.6 Types of Correlation
16.6.1 Positive, Negative and Zero Correlation
16.6.2 Linear and Curvilinear Correlation
16.7 Methods of Computing Co-efficient of Correlation (Ungrouped Data)
16.7.1 Rank Difference Co-efficient of Correlation
16.7.2 Pearson's Product Moment Co-efficient of Correlation
16.7.3 Pearson's Product Moment Co-efficient of Correlation (when grouped data are given)
16.8 Interpretation of the Co-efficient of Correlation
16.9 Misinterpretation of the Co-efficient of Correlation
16.1 INTRODUCTION
In the foregoing units we discussed the statistical measures that we use for a single
variable, i.e. distributions relating to one quantitative variable. Now we shall study the
problem of describing the degree of simultaneous variation of two variables.
The data in which we secure measures of one variable for each individual is called a univariate
distribution. If we have pairs of measures on two variables for each individual, the joint
presentation of the two sets of scores is called a bivariate distribution.
We come across a number of situations involving the study of two or more variables. For
example, consider the scores of five students in mathematics and physics as under :
Students                1    2    3    4    5
Scores in Maths (X)     40   17   29   36   25
Scores in Physics (Y)   38   16   30   32   24
Here each student has values on two variables X and Y i.e. the scores in Mathematics and
Physics respectively; hence the distribution is called bivariate distribution.
Similarly, distributions involving more than two variables are called multivariate
distributions. In the present unit we will deal with bivariate distributions. In a bivariate
distribution, the pairs of scores made by the same set of individuals on two variables are given.
Statistical Techniques of Analysis
16.2 OBJECTIVES
After reading this unit, you will be able to:
define correlation;
define co-efficient of correlation;
calculate the co-efficient of correlation according to the nature of scores and their
distribution;
[Fig. 16.1 : Scatter Diagrams Showing Varying Degree of Relationship between X and Y. Panel labels: Positive correlation (0 < r < +1); Negative correlation (-1 < r < 0); Zero correlation; Perfect Positive Correlation; Negative Correlation.]
Consider another situation. At first, as one variable increases, the second variable increases
proportionately up to some point; after that, with a further increase in the first variable, the
second variable starts decreasing. The graphical representation of the two variables will be a
curved line. Such a relationship between the two variables is termed curvilinear correlation.
[Figure: scatter diagrams of linear correlation and curvilinear correlation.]
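A small numerical sketch may help show why the linear/curvilinear distinction matters. Pearson's product moment co-efficient (listed under 16.7.2) reflects only the linear part of a relationship, so a strongly curvilinear pattern can still yield a co-efficient of zero. The scores below are hypothetical, and `pearson_r` is an illustrative helper written for this example, not part of the unit.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from raw scores (raw-score formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical scores: Y rises with X up to X = 4, then falls symmetrically.
x = [1, 2, 3, 4, 5, 6, 7]
y = [1, 4, 7, 10, 7, 4, 1]

r_curved = pearson_r(x, y)          # whole curve: rising and falling branches cancel
r_rising = pearson_r(x[:4], y[:4])  # rising branch only: a perfect straight line

print(r_curved, r_rising)
```

On the whole curve the co-efficient is zero even though Y depends entirely on X; on the rising branch alone it is +1. A scatter diagram should therefore be inspected before a linear co-efficient is interpreted.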
16.7.1 Rank Difference Co-efficient of Correlation (ρ)
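When the observations or measurements of the bivariate variable are based on the ordinal scale,
in the form of ranks, the rank difference co-efficient of correlation is computed by using the
following formula:
$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$
where D is the difference between the two ranks of an individual and N is the number of individuals.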
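The formula can be sketched in a few lines of Python. The function name `rank_difference_rho` and the two sets of ranks are hypothetical, chosen only to illustrate the arithmetic; the ranks are assumed to contain no ties.

```python
def rank_difference_rho(ranks_1, ranks_2):
    """Spearman's rho = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(ranks_1)
    # D is the difference between the two ranks of each individual.
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Hypothetical ranks given by two judges to five individuals.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 3, 5, 4]

rho = rank_difference_rho(judge_1, judge_2)
print(round(rho, 2))
```

Here the squared differences are 1, 1, 0, 1, 1, so the sum of D² is 4 and ρ = 1 - 24/120 = 0.8.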
The value of the co-efficient of correlation is +.83. This shows a high degree of agreement between
the two judges.
Example 2
The following data give the scores of 5 students on tests in Hindi and English respectively.
Compute the correlation between the two series of test scores by Rank Difference Method.
Table 16.2
The value of co-efficient of correlation between scores in Hindi and English is positive and
moderate.
Step 2 : In columns 2 and 3 write the scores of each student or individual in test I and test II.
Step 3 : Take the first set of scores (column 2) and assign a rank of 1 to the highest score,
which is 9, a rank of 2 to the next highest score, which is 8, and so on, till the lowest
score gets a rank equal to N, which is 5.
Step 4 : Take the second set of scores (column 3) and assign the rank 1 to the highest score. In the
second set the highest score is 10; hence it obtains rank 1 (see column 5). The next
highest score, that of student B, is 8; hence his rank is 2. The rank of student C is 3, the
rank of E is 4, and the rank of D is 5.
Step 6 : Check the sum of the differences recorded in column 6. It is always zero.
Step 7 : Square each difference of ranks in column 6 and record it in column 7. Get the
sum ΣD².
Step 8 : Put the values of N and ΣD² in the formula of Spearman's co-efficient of correlation.
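The steps above can be sketched in Python as follows. The scores are hypothetical (they are not the data of Table 16.2), the helper `rank_descending` is an illustrative name, and the sketch assumes there are no tied scores.

```python
def rank_descending(scores):
    """Assign rank 1 to the highest score, rank N to the lowest (no ties assumed)."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

# Hypothetical scores of five students (A to E) on two tests.
test_1 = [9, 8, 5, 7, 6]
test_2 = [10, 8, 7, 4, 6]

r1 = rank_descending(test_1)           # ranks on test I
r2 = rank_descending(test_2)           # ranks on test II
d = [a - b for a, b in zip(r1, r2)]    # rank differences D = R1 - R2

assert sum(d) == 0                     # Step 6: the differences always sum to zero

sum_d2 = sum(x * x for x in d)         # Step 7: sum of squared differences
n = len(d)
rho = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))   # Step 8: substitute in the formula
print(r1, r2, sum_d2, round(rho, 2))
```

For these scores the sum of D² is 8, giving ρ = 1 - 48/120 = 0.6, a moderate positive correlation.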
Example 3 : The following data give the scores of 10 students on two trials of a test, Trial I
and Trial II, with a gap of 2 weeks. Compute the correlation between the scores of the
two trials by the rank difference method :
Table 16.3
Students   Trial I   Trial II   Rank on Trial I   Rank on Trial II   Difference R1-R2   D²
           (X)       (Y)        (R1)              (R2)               (D)
A          10        16         6.5               5.5                1.0                1.00
The same procedure has been followed in respect of the scores on Trial II. In this case, ties occur at
three places. Students C and F have the same score and hence each obtains the average of the two
rank positions they occupy. Students A and B occupy rank positions 5 and 6; hence each is assigned
the rank (5 + 6)/2 = 5.5. Similarly, students G and J have each been assigned the rank (7 + 8)/2 = 7.5.
If a value is repeated more than twice, the same procedure can be followed to assign the
ranks. For example, if three students get a score of 10, at the 5th, 6th and 7th ranks, each one of
them will be assigned a rank of (5 + 6 + 7)/3 = 6.
The rest of the steps of the procedure followed for the calculation of ρ (rho) are the same as explained
in example 2.
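The averaging rule for tied scores can be sketched as below. The helper name `average_ranks` and the scores are hypothetical; the list is built so that three students with a score of 10 occupy the 5th, 6th and 7th positions, as in the illustration above.

```python
def average_ranks(scores):
    """Rank 1 for the highest score; tied scores share the mean of their positions."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        first = ordered.index(s) + 1     # first rank position held by this score
        count = ordered.count(s)         # number of individuals sharing the score
        # Mean of the positions first, first + 1, ..., first + count - 1.
        ranks.append(first + (count - 1) / 2)
    return ranks

# Hypothetical scores: three students tie at 10 in positions 5, 6 and 7.
scores = [20, 18, 17, 15, 10, 10, 10, 8]
ranks = average_ranks(scores)
print(ranks)
```

Each of the three tied students receives the rank (5 + 6 + 7)/3 = 6, and the next student receives rank 8, so the ranks still add up to the same total as they would without ties.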
Spearman's Rank Order Coefficient of Correlation is quicker and easier to compute. It is
an acceptable method if the data are available only in ordinal form, or if the number of paired scores is
more than 5 and not greater than 30 with no ties or only a few ties in ranks.
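$$r = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{\{N \sum X^2 - (\sum X)^2\}\{N \sum Y^2 - (\sum Y)^2\}}}$$
where N is the number of paired scores, ΣX and ΣY are the sums of the X and Y scores, ΣX² and ΣY²
are the sums of their squares, and ΣXY is the sum of the products of the paired scores.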
Example 4
The scores given below were obtained on an Intelligence Test and Algebra Test by 10 students
of class VIII. Compute Pearson's Coefficient of Correlation.
Table 16.4
Students   Scores on Intelligence Test (X)   Scores on Algebra Test (Y)   X²   Y²   XY
The steps in computing 'r' from ungrouped scores may be outlined thus:
Step 1 : Find the sums of the scores of the X and Y variables, i.e. ΣX and ΣY.
Step 2 : Square each score of the X variable and find their sum, i.e. ΣX² (Col. 4).
Step 3 : Square each score of the Y variable and find their sum, i.e. ΣY² (Col. 5).
Step 4 : Multiply the X scores and Y scores in the same rows, enter these products
in the column XY, i.e. Col. 6, and get the sum of XY, i.e. ΣXY.
Step 5 : Put all the values of N, ΣX, ΣY, ΣX², ΣY² and ΣXY in the formula, and simplify.
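The five steps can be followed in Python as a rough sketch. Since the scores of Table 16.4 are not available here, the sketch uses the Maths and Physics scores of the five students from the Introduction; the second Maths score is taken to be 17, which is an assumption about a misprinted entry.

```python
import math

# Scores from the Introduction (second Maths score assumed to be 17).
maths = [40, 17, 29, 36, 25]      # X
physics = [38, 16, 30, 32, 24]    # Y
n = len(maths)

sum_x = sum(maths)                                   # Step 1: sum of X
sum_y = sum(physics)                                 # Step 1: sum of Y
sum_x2 = sum(x * x for x in maths)                   # Step 2: sum of X squared
sum_y2 = sum(y * y for y in physics)                 # Step 3: sum of Y squared
sum_xy = sum(x * y for x, y in zip(maths, physics))  # Step 4: sum of XY

# Step 5: substitute in the raw-score formula and simplify.
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy, round(r, 2))
```

With these scores ΣX = 147, ΣY = 140, ΣX² = 4651, ΣY² = 4200 and ΣXY = 4414, so r = 1490/√(1646 × 1400), which is about +.98: students who score high in Maths also tend to score high in Physics.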
Y \ X       125-134  135-144  145-154  155-164  165-174  175-184  Total fy
65-69                                                                0
60-64                                                                1
55-59                                                                5
50-54                                                                7
45-49                                                                4
40-44                                                                4
Total fx       2        5        6        4        1        2       20
When the tallying is completed, we write the number of cases, or the cell frequency, in each cell. Next we
total the cell frequencies in the rows separately, recording each total in the last column
under the heading fy. When this column is filled, we have the total frequency distribution for
test Y. We also sum the cell frequencies in all the columns, writing the sums in the bottom row
headed fx. When completed, this row gives us the total frequency distribution for test X. We
can check the summing of the cell frequencies by adding up the last row and the last column.
Their sums should, of course, both be equal to N, in this case 20.
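The tallying and the check on the totals can be sketched as follows. The eight pairs of scores are hypothetical, and `class_interval` is an illustrative helper; the class limits (width 10 starting at 125 for X, width 5 starting at 40 for Y) follow the intervals used in the scatter diagram.

```python
from collections import Counter

def class_interval(score, lowest, width):
    """Return the (lower, upper) limits of the class interval containing score."""
    low = lowest + ((score - lowest) // width) * width
    return (low, low + width - 1)

# Hypothetical pairs of scores (X, Y) for eight students.
pairs = [(128, 47), (131, 52), (140, 44), (142, 56),
         (150, 53), (158, 61), (166, 58), (150, 51)]

# Tally each pair into its cell of the scatter diagram.
cells = Counter(
    (class_interval(x, 125, 10), class_interval(y, 40, 5)) for x, y in pairs
)

fx = Counter()   # column totals: frequency distribution of test X
fy = Counter()   # row totals: frequency distribution of test Y
for (x_int, y_int), f in cells.items():
    fx[x_int] += f
    fy[y_int] += f

n = len(pairs)
# Check: the fx row and the fy column must both add up to N.
assert sum(fx.values()) == sum(fy.values()) == n
print(dict(fx), dict(fy))
```

If the two totals disagree with N, a pair has been tallied in the wrong place or missed, and the tallying should be repeated before any further computation.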
Phase II
The product-moment 'r' is computed from the bivariate frequency distribution by the formula
Find the products fy' and fx' by multiplying the frequencies by the corresponding deviations y'
and x', and get their sums [see column 10 and row 11].
Step 4 : Find fy'² and fx'² and get their sums, i.e. Σfy'² and Σfx'² [see column 11
and row 12].
Step 5 : In column 12 and row 13 the fy' and fx' are simply repeated for checking purposes.
For this, each cell frequency is multiplied by the corresponding deviation of
scores from the assumed mean, i.e. x' and y'.
To fill column 13 and row 14, i.e. to find the products fx'y', the deviations
corresponding to each cell, row-wise and column-wise, are multiplied together and then
multiplied by the cell frequency. See the table: in the second row there is only a single
frequency, viz. 1. Corresponding to this cell, the column deviation x' is 0 and
the row deviation y' is +3, so the product of the two deviations is
0 (+3 x 0 = 0). Put this product in the upper right-hand corner of the cell and multiply
it by the cell frequency. The result is again zero, which is recorded in the
lower left-hand corner of the cell, separated by a curved line. Likewise all the product
values are calculated and arranged in the lower corner of each cell.
Find the sum of the products fx'y' of each cell (the values shown in the brackets of the
cells), column-wise and row-wise.
Step 8 : Find the sums of the last column and the last row and check them. You will find that both
sums are equal. If the sums are not equal, then there is a mistake in the calculation of
fx'y'; hence re-check the calculations.
Step 9 : After getting all the values, put these in the formula and solve.
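Thus, in the given scattergram
$$r = \frac{N \sum fx'y' - \sum fx' \sum fy'}{\sqrt{\{N \sum fx'^2 - (\sum fx')^2\}\{N \sum fy'^2 - (\sum fy')^2\}}}$$
$$= \frac{20 \times 2 - (3)(2)}{\sqrt{\{20 \times 39 - (3)^2\}\{20 \times 57 - (2)^2\}}}$$
$$= \frac{34}{27.767 \times 33.70}$$
$$= \frac{34}{935.8756}$$
r = +.036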
The correlation between the two achievement tests is nearly zero.
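The substitution in the worked computation can be re-checked with a short Python sketch. The function name `grouped_r` is illustrative. The assignment of the sums (Σfx' = 3, Σfx'² = 39, Σfy' = 2, Σfy'² = 57, Σfx'y' = 2) is read from the substituted figures, and the deviations x' used in the cross-check assume that the assumed mean of X lies in the interval 145-154.

```python
import math

def grouped_r(n, sum_fxy, sum_fx, sum_fy, sum_fx2, sum_fy2):
    """Product moment r from a scatter diagram, using deviations from assumed means."""
    numerator = n * sum_fxy - sum_fx * sum_fy
    denominator = math.sqrt(
        (n * sum_fx2 - sum_fx ** 2) * (n * sum_fy2 - sum_fy ** 2)
    )
    return numerator / denominator

r = grouped_r(n=20, sum_fxy=2, sum_fx=3, sum_fy=2, sum_fx2=39, sum_fy2=57)

# Cross-check of the X sums from the fx row of the scatter diagram.
fx = [2, 5, 6, 4, 1, 2]
x_dev = [-2, -1, 0, 1, 2, 3]    # assumption: assumed mean in the 145-154 interval
sum_fx = sum(f * d for f, d in zip(fx, x_dev))
sum_fx2 = sum(f * d * d for f, d in zip(fx, x_dev))

print(round(r, 3), sum_fx, sum_fx2)
```

The sketch gives r of about +.036, and the fx row reproduces Σfx' = 3 and Σfx'² = 39, in agreement with the figures substituted above.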
16.8 INTERPRETATION OF THE CO-EFFICIENT OF CORRELATION
Merely computing a correlation has no significance unless we determine
how large the coefficient must be in order to be significant, and what the correlation tells us
about the data. What do we mean by the obtained value of the coefficient of correlation?
To answer this, the coefficient of correlation is generally interpreted through a verbal description.
The rule of thumb for interpreting the size of a correlation coefficient is presented below :-
16.11 IMPORTANCE AND USE OF CORRELATION IN EDUCATIONAL MEASUREMENT AND EVALUATION
Correlation is one of the most widely used analytic procedures in the field of Educational
Measurement and Evaluation. It not only describes the relationship of paired variables, but it
is also useful in:
• prediction of one variable (the dependent variable) on the basis of the other variable (the
independent variable).
• determining the reliability and validity of the test or the question paper.
• determining the role of various correlates to a certain ability.
• the factor analysis technique for determining the factor loadings of the underlying variables
in human abilities.
4. The following judgements of the two judges were obtained for five individuals on their
musical ability. Compute Spearman's Correlation and determine the extent of agreement
of the two judges.
Individuals   A   B   C   D   E
Judge I       2   3   5   4   1
Judge II      1   5   4   3   2
5. For the following pairs of values, complete the calculation of coefficient of correlation
by rank difference method.
Subjects           A    B    C    D    E    F    G    H
Marks in Maths     13   18   15   10   16   12   14   18
Marks in Physics   26   22   24   21   22   29   25   23
6. Calculate Rank Difference Correlation from the following achievement scores of the 10
students.
Students            1    2    3    4    5    6    7    8    9    10
Marks in Sanskrit   15   17   21   23   13   17   19   23   25   30
Marks in Hindi      26   25   24   20   22   23   30   25   21   19
7. Find the Pearson's Coefficient of Correlation between the two sets of scores given below:
1. i) Correlation is the association or relationship between the variation of two variables.
ii) Coefficient of correlation is a number that tells us the extent to which the two given
variables are related or the change in one variable is accompanied by a change in the
other variable.
iii) A coefficient of correlation can vary from a value of +1.00 to -1.00, through zero.
2. If an increase or decrease in one variable is accompanied by a corresponding increase or decrease
in the other variable, the two variables are said to be positively correlated. But if an
increase in one variable is accompanied by a corresponding decrease in the other variable, or
vice-versa, the two variables are said to be negatively correlated.
3. The correlation of +.45 found between shoe sizes and intelligence is low positive
correlation. However there is no logical base for this relationship, so the coefficient of
correlation found between shoe sizes and intelligence of the students of Class IX does not
indicate any cause and effect relationship.
16.16 SUGGESTED READINGS
Agarwal, Y.P. (1990), Statistical Methods-Concepts, Applications and Computation, Sterling
Publishers Pvt. Ltd., New Delhi.
Ferguson, G.A. (1974), Statistical Analysis in Psychology and Education, McGraw Hill
Book Co., New York.
Garrett, H.E. & Woodworth, R.S. (1969), Statistics in Psychology and Education, Vakils, Feffer
& Simons Pvt. Ltd., Bombay.
Guilford, J.P. & Benjamin F. (1973), Fundamental Statistics in Psychology and Education,
McGraw Hill Book Co., New York.