Unit 12 Simple Correlation & Regression
Structure
12.1 Introduction
Objectives
12.2 Correlation
12.2.1 Causation and Correlation
12.2.2 Types of Correlation
12.3 Measures of Correlation
12.3.1 Scatter Diagram
12.3.2 Karl Pearson’s Correlation Coefficient
12.3.3 Properties of Karl Pearson’s Correlation Coefficient
12.3.4 Factors Influencing the Size of Correlation Coefficient
12.4 Problems
12.5 Probable Error
12.6 Spearman’s Rank Correlation Coefficient
12.7 Partial Correlation
12.8 Multiple Correlation
12.9 Regression
12.9.1 Regression Analysis
12.9.2 Regression Lines
12.9.3 About Regression Coefficient
12.9.4 Differences Between Correlation Coefficient and Regression Coefficient
12.9.5 Examples
12.10 Standard Error Of Estimate
12.11 Multiple Regression Analysis
12.12 Reliability of Estimates
12.13 Application of Multiple Regression
Self Assessment Questions
12.14 Summary
Terminal Questions
Answer to SAQ’s and TQ’s
Sikkim Manipal University 180
Statistics For Management Unit 12
12.1 Introduction
Both correlation and regression are used to measure the strength of relationships between
variables.
The following statistical tools measure the relationship between the variables analyzed in social science research.
1. Correlation
a. Simple correlation – here the relationship between two variables is studied.
b. Partial correlation – here the relationship between any two variables is studied, keeping all others constant.
c. Multiple correlation – here the relationships among three or more variables are studied simultaneously.
2. Regression
a. Simple regression
b. Multiple regression
3. Association of Attributes
Correlation measures the direction (positive or negative) and degree (up to perfect) of the relationship between two variables. Regression analysis considers the relationship between variables and estimates the value of one variable from the value of another. Association of attributes attempts to ascertain the extent of association between two attributes (qualitative variables).
Learning Objectives
In this unit students will learn about
1. Simple, partial & multiple correlation
2. Parametric and non parametric measures of correlation
3. The method of estimating unknown values from known values through regression equations
12.2 Correlation
When two or more variables move in sympathy with each other, they are said to be correlated. If
both variables move in the same direction then they are said to be positively correlated. If the
variables move in opposite direction then they are said to be negatively correlated. If they move
haphazardly then there is no correlation between them.
Correlation analysis deals with
1) Measuring the relationship between variables.
2) Testing the relationship for its significance.
3) Giving confidence interval for population correlation measure.
12.2.1 Causation and Correlation
The correlation between two variables may be due to the following causes:
i) A small sample size – correlation may be present in the sample but not in the population.
ii) A third factor – correlation between the yields of rice and tea may be due to a third factor, rainfall.
12.2.2 Types of Correlation
Types of correlation are given below
a. Positive or Negative
b. Simple, Partial and Multiple
c. Linear and Nonlinear
Positive correlation: Both the variables (X and Y) will vary in the same direction. If variable X
increases, variable Y also will increase; if variable X decreases, variable Y also will decrease.
Negative correlation: The given variables vary in opposite directions. If one variable increases, the
other variable decreases.
Simple, partial and multiple correlation: In simple correlation, the relationship between two variables is studied. In partial and multiple correlation, three or more variables are involved. In multiple correlation, three or more variables are studied simultaneously. In partial correlation, more than two variables are involved, but the effect of all but two is held constant and the relationship between the remaining two variables is studied.
Linear and nonlinear correlation: The distinction depends upon the constancy of the ratio of change between the variables. In linear correlation, the percentage change in one variable is equal to the percentage change in the other; this is not so in nonlinear correlation.
12.3 Measures of correlation
i) Scatter Diagram.
ii) Karl Pearson’s correlation coefficient.
iii) Spearman’s Rank correlation coefficient.
12.3.1 Scatter Diagram
The ordered pairs of observed values are plotted on the x-y plane as dots; the scatter diagram is therefore also known as a dot diagram. It is a diagrammatic representation of the relationship.
If the dots lie exactly on a straight line that runs from left bottom to right top, then the variables are
said to be perfectly positively correlated (fig. i).
If the dots lie close to a straight line that runs from left bottom to right top, then the variables are
said to be positively correlated (fig.ii).
If the dots lie exactly on a straight line that runs from left top to right bottom then the variables are
said to be perfectly negatively correlated (fig iii).
If the dots lie very close to a straight line that runs from left top to right bottom then the variables
are said to be negatively correlated (fig iv).
If the dots lie all over the graph paper then the variables have zero correlation (fig v).
(Figures i–v: scatter diagrams illustrating, in order, perfect positive, positive, perfect negative, negative and zero correlation.)
The scatter diagram tells us the direction in which the variables are related, but it does not give any quantitative measure for comparison between sets of data.
12.3.2 Karl Pearson’s Correlation Coefficient
It is defined as

$$r = \frac{\sum xy}{N\sigma_x\sigma_y} \qquad \text{(A)}$$

where x = X − X̄ and y = Y − Ȳ,
$$\sigma_x^2 = \frac{\sum (X - \bar{X})^2}{N}, \qquad \sigma_y^2 = \frac{\sum (Y - \bar{Y})^2}{N},$$

N is the number of paired observations, and Σxy / N is called the covariance of x and y. The other forms of this formula are

$$r = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}} \qquad \text{(B)}$$

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} \qquad \text{(C)}$$

$$r = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}} \qquad \text{(D)}$$

where dx and dy are deviations from assumed means. For all practical purposes we can conveniently use form (D); whenever summary information is given, choose the proper form from (A) to (C).
12.3.3 Properties of Karl Pearson’s Correlation Coefficient
§ Its value always lies between −1 and +1.
§ It is not affected by change of origin or change of scale.
§ It is a relative measure (it does not have any unit attached to it).
12.3.4 Factors influencing The size of Correlation Coefficient
The size of r is very much dependent upon the variability of measured values in the correlation
sample. The greater the variability, the higher will be the correlation, everything else being equal.
The size of r is altered when researchers select extreme groups of subjects in order to compare
these groups with respect to certain behaviors. Selecting extreme groups on one variable
increases the size of r over what would be obtained with more random sampling.
Combining two groups which differ in their mean values on one of the variables is not likely to
faithfully represent the true situation as far as the correlation is concerned.
Addition of an extreme case (and conversely dropping of an extreme case) can lead to changes
in the amount of correlation. Dropping of such a case leads to reduction in the correlation while
the converse is also true. (Source: Aggarwal.Y.P, Statistical Methods, Sterling Publishers Pvt
Ltd., New Delhi, 1998, p.131).
12.4 Problems
Example 1: Find Karl Pearson’s Correlation Coefficient, given
X 20 16 12 8 4
Y 22 14 4 12 8
Applying form (C) with N = 5, ΣX = 60, ΣY = 60, ΣXY = 840, ΣX² = 880 and ΣY² = 904:

$$r = \frac{5(840) - (60)(60)}{\sqrt{5(880) - (60)^2}\,\sqrt{5(904) - (60)^2}} = \frac{600}{\sqrt{800}\,\sqrt{920}} = 0.70$$
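Form (C) is easy to check numerically. A minimal Python sketch (the function name is our own) that reproduces Example 1:

```python
def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient, form (C):
    r = [N*SumXY - SumX*SumY] / sqrt(N*SumX^2 - (SumX)^2) / sqrt(N*SumY^2 - (SumY)^2)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    num = n * sxy - sx * sy
    den = ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5
    return num / den

# Data from Example 1
X = [20, 16, 12, 8, 4]
Y = [22, 14, 4, 12, 8]
print(round(pearson_r(X, Y), 2))  # 0.7
```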
Example 2: Calculate Karl Pearson Coefficient of Correlation from the following data:
Year                  1985  1986  1987  1988  1989  1990  1991  1992
Index of Production    100   102   104   107   105   112   103    99
Number of unemployed    15    12    13    11    12    12    19    26
Solution:
X̄ = 104, Ȳ = 15. Taking x = X − X̄ and y = Y − Ȳ gives Σxy = −92, Σx² = 120 and Σy² = 184.

$$r = \frac{\sum xy}{\sqrt{(\sum x^2)(\sum y^2)}} = \frac{-92}{\sqrt{120 \times 184}} = -0.619$$

Therefore the correlation between production and the number of unemployed is negative.
Example 3: Calculate Correlation Coefficient from the following data:
X 50 60 58 47 49 33 65 43 46 68
Y 48 65 50 48 55 58 63 48 50 70
Solution:
Using form (D) for r,

$$r = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}}$$

and substituting values we get r = 0.611.
Example 4: In bivariate data on x and y, the variance of x = 49, the variance of y = 9 and the covariance of x and y = 17.5. Find the coefficient of correlation between x and y.
Solution: We know

$$r = \frac{\sum xy}{N\sigma_x\sigma_y} = \frac{\operatorname{cov}(x, y)}{\sigma_x\sigma_y}$$

Given cov(x, y) = Σxy/N = 17.5, σx = √49 = 7 and σy = √9 = 3:

r = 17.5 / (7 × 3) = 0.833
Example 5: Ten observations on weight (x) and height (y) of a particular age group gave the
following data:
Σx = 56, Σy = 138, Σx² = 1357, Σy² = 2136, Σxy = 836
Find r.
Solution: We know

$$r = \frac{N\sum xy - (\sum x)(\sum y)}{\sqrt{N\sum x^2 - (\sum x)^2}\,\sqrt{N\sum y^2 - (\sum y)^2}} = \frac{10(836) - (56)(138)}{\sqrt{10(1357) - 56^2}\,\sqrt{10(2136) - 138^2}} = \frac{632}{\sqrt{10434}\,\sqrt{2316}} \approx 0.13$$
12.5 Probable Error
It measures the extent to which correlation coefficient is dependable. It is an old measure of
testing the reliability of “r”. It is given by
P.E. = 0.6745 × (1 − r²) / √n
Where “r” is measured from sample of size n.
It is used to
i) interpret the value of r
a) If r < P.E., then r is not at all significant.
b) If r > 6 P.E., then r is highly significant.
c) If P.E. < r < 6 P.E., we cannot say anything about the significance of r.
ii) Construct confidence limits within which the population correlation coefficient ρ is expected to lie.
Conditions under which P.E can be used.
1. Samples should be drawn from a normal population.
2. The value of “r” must be determined from sample values.
3. Samples must have been selected at random
Example 6
If r = 0.6 and N = 64, a) interpret r; b) find the limits within which the population correlation coefficient is expected to lie.
Solution:

P.E. = 0.6745 × (1 − 0.6²)/√64 = 0.6745 × 0.64/8 = 0.054

a) 6 P.E. = 6 × 0.054 = 0.324
Since r (0.6) > 6 P.E.,
r is highly significant.
b) Limits for the population correlation coefficient
= 0.6 ± 0.054
= 0.546 to 0.654
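Example 6 can be reproduced with a short Python sketch (the function name is ours):

```python
def probable_error(r, n):
    # P.E. = 0.6745 * (1 - r^2) / sqrt(n)
    return 0.6745 * (1 - r ** 2) / n ** 0.5

r, n = 0.6, 64
pe = probable_error(r, n)
print(round(pe, 3))                        # 0.054
print(r > 6 * pe)                          # True -> r is highly significant
print(round(r - pe, 3), round(r + pe, 3))  # limits: 0.546 0.654
```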
12.6 Spearman’s Rank Correlation Coefficient
Karl Pearson’s correlation coefficient assumes that
i) samples are drawn from a normal population;
ii) the variables under study are affected by a large number of independent causes so as to form a normal distribution.
When we do not know the shape of the population distribution, or when the data is of a qualitative type, Spearman’s rank correlation coefficient is used to measure the relationship.
It is defined as

$$r = 1 - \frac{6\sum D^2}{N^3 - N}$$

where D is the difference between the two ranks assigned to an observation and N is the number of pairs.
Example 7: In a singing competition, two judges assigned the following ranks for 7 candidates.
Find Spearman’s rank correlation coefficient.
Competitor 1 2 3 4 5 6 7
Judge I 5 6 4 3 2 7 1
Judge II 6 4 5 1 2 7 3
Solution:
Competitor R1 (Judge 1) R2 (Judge 2) D = R1 – R2 D 2
1 5 6 1 1
2 6 4 2 4
3 4 5 1 1
4 3 1 2 4
5 2 2 0 0
6 7 7 0 0
7 1 3 2 4
ΣD² = 14

r = 1 − (6 × 14)/(7(7² − 1)) = 1 − 84/336 = 0.75
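A quick Python check of Example 7 using the rank formula directly (function name ours; note that the D² column above sums to 14):

```python
def spearman_r(rank1, rank2):
    # r = 1 - 6*Sum(D^2) / (N^3 - N), valid when there are no tied ranks
    n = len(rank1)
    d2 = sum((a - b) ** 2 for a, b in zip(rank1, rank2))
    return 1 - 6 * d2 / (n ** 3 - n)

judge1 = [5, 6, 4, 3, 2, 7, 1]
judge2 = [6, 4, 5, 1, 2, 7, 3]
print(round(spearman_r(judge1, judge2), 2))  # 0.75
```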
Example 8: Rank Difference Coefficient of Correlation (Case of No Ties)
Student   Score on   Score on   Rank on    Rank on    Difference        Difference
          Test I (X) Test II (Y) Test I (R1) Test II (R2) between ranks (D) squared (D²)
A         16         8          2          5          −3                9
B         14         14         3          3           0                0
C         18         12         1          4          −3                9
D         10         16         4          2           2                4
E         2          20         5          1           4                16
N = 5                                                          ΣD² = 38
Applying the rank correlation formula we get

r = 1 − 6(38)/(5³ − 5) = 1 − 228/120 = −0.9

The relationship between the scores on Test I and Test II is very high and inverse.
Example 9: The sales statistics of 6 sales representatives in two different localities are given below. Find whether there is a relationship between the buying habits of the people in the two localities.
Representative 1 2 3 4 5 6
Locality I 70 40 65 110 60 20
Locality II 70 30 80 100 90 20
Solution: Ranking the sales in each locality:

Representative  R1 (Locality I)  R2 (Locality II)  D = R1 − R2   D²
1               2                4                 −2             4
2               5                5                  0             0
3               3                3                  0             0
4               1                1                  0             0
5               4                2                  2             4
6               6                6                  0             0
                                             ΣD = 0    ΣD² = 8

r = 1 − (6 × 8)/(6(6² − 1)) = 1 − 48/210 = 0.7714

There is a high positive correlation between the buying habits of the people in the two localities.
When Ranks are Repeated
Example 10
Find rank correlation coefficient for the following data.
Student A B C D E F G H I J
Student           A   B   C   D   E   F   G   H   I   J
Score on Test I   20  30  22  28  32  40  20  16  14  18
Score on Test II  32  32  48  36  44  48  28  20  24  28
Student   X    Y    R1    R2    D = R1 − R2   D²
A         20   32   6.5   5.5    1.0          1.00
B         30   32   3     5.5   −2.5          6.25
C         22   48   5     1.5    3.5         12.25
D         28   36   4     4      0            0
E         32   44   2     3     −1.0          1.00
F         40   48   1     1.5   −0.5          0.25
G         20   28   6.5   7.5   −1.0          1.00
H         16   20   9     10    −1.0          1.00
I         14   24   10    9      1.0          1.00
J         18   28   8     7.5    0.5          0.25
N = 10                                ΣD² = 24
When ranks are tied, a correction of (m³ − m)/12 is added to ΣD² for each group of m tied ranks. Here there are four groups of two tied values (two 20s on Test I; two 32s, two 48s and two 28s on Test II), each contributing 0.5:

r = 1 − 6[24 + 0.5 + 0.5 + 0.5 + 0.5]/(10³ − 10) = 1 − 156/990 = 0.842
Testing of Correlation
The t-test is used to test the significance of a correlation coefficient. Consider the height and weight of a random sample of six adults:
Height (cm) 170 175 176 178 183 185
Weight (Kg) 57 64 70 76 71 82
It is reasonable to assume that these variables are normally distributed, so Karl Pearson’s correlation coefficient is the appropriate measure of the degree of association between height and weight. For these data, r = 0.875.
Hypothesis test for Pearson’s population correlation coefficient
H0: ρ = 0 (no correlation between the variables in the population)
H1: ρ > 0 (positive correlation in the population: increasing height is associated with increasing weight)
A 5% significance level is taken.
$$t = r\sqrt{\frac{n-2}{1-r^2}} = 0.875\sqrt{\frac{6-2}{1-0.875^2}} = 3.61$$

The table value at the 5% significance level with 6 − 2 = 4 degrees of freedom is 2.132.
Since the calculated value is more than the table value, the null hypothesis is rejected: there is a
significant positive correlation between height and weight.
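The whole test can be reproduced in a short Python sketch (variable names ours): it computes r from the raw data and then the t statistic.

```python
# Height/weight sample from the text
height = [170, 175, 176, 178, 183, 185]
weight = [57, 64, 70, 76, 71, 82]

n = len(height)
mh, mw = sum(height) / n, sum(weight) / n
sxy = sum((h - mh) * (w - mw) for h, w in zip(height, weight))
sxx = sum((h - mh) ** 2 for h in height)
syy = sum((w - mw) ** 2 for w in weight)

r = sxy / (sxx * syy) ** 0.5             # Pearson's r
t = r * ((n - 2) / (1 - r ** 2)) ** 0.5  # t = r * sqrt((n-2)/(1-r^2))

print(round(r, 3))  # 0.874
print(round(t, 2))  # 3.61, which exceeds the table value 2.132
```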
12.7. Partial Correlation
Partial correlation is used in situations where three or more variables are involved, for example age, height and weight. The correlation between height and weight can be computed keeping age constant, since age may be an important factor influencing the strength of the relationship between height and weight. Partial correlation keeps the effect of age constant: the effect of one variable is partialled out from the correlation between the other two. This statistical technique is known as partial correlation.
Correlation between variables x and y is denoted as rxy
Partial correlation is denoted by the symbol r12.3: the correlation between variables 1 and 2,
keeping the 3rd variable constant.
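The standard formula for r12.3 (not printed in this unit) is (r12 − r13·r23) / √((1 − r13²)(1 − r23²)). A Python sketch (function name ours; the illustrative values are borrowed from Example 11 in the next section):

```python
def partial_r(r12, r13, r23):
    # r12.3: correlation of variables 1 and 2 with variable 3 held constant
    return (r12 - r13 * r23) / (((1 - r13 ** 2) * (1 - r23 ** 2)) ** 0.5)

print(round(partial_r(0.98, 0.44, 0.54), 3))  # 0.982
```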
Sikkim Manipal University 192
Statistics For Management Unit 12
12.8. Multiple Correlation
Three or more variables are involved in multiple correlation. The dependent variable is denoted by X1 and the other variables by X2, X3, etc. Gupta S.P. has expressed that “the coefficient of multiple linear correlation is represented by R1 and it is common to add subscripts designating the variables involved. Thus R1.234 would represent the coefficient of multiple linear correlation between X1 on the one hand and X2, X3 and X4 on the other. The subscript of the dependent variable is always to the left of the point.”
The coefficients of multiple correlation in terms of r12, r13 and r23 can be expressed as:

$$R_{1.23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}}$$

$$R_{2.13} = \sqrt{\frac{r_{12}^2 + r_{23}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{13}^2}}$$

$$R_{3.12} = \sqrt{\frac{r_{13}^2 + r_{23}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{12}^2}}$$
The coefficient of multiple correlation R1.23 is the same as R1.32. A coefficient of multiple correlation is always positive in sign and lies between 0 and +1. If it is 1, the correlation is perfect; if it is 0, there is no linear relationship between the variables.
The coefficient of multiple determination can be obtained by squaring R1.23. An alternative formula for computing R1.23 is:

$$R_{1.23} = \sqrt{r_{12}^2 + r_{13.2}^2(1 - r_{12}^2)} \qquad\text{or}\qquad R_{1.23}^2 = r_{12}^2 + r_{13.2}^2(1 - r_{12}^2)$$

Similar alternative formulas for R1.24 and R1.34 can be written.
The following formula can be used to determine a multiple correlation coefficient with three
independent variables:

$$R_{1.234} = \sqrt{1 - (1 - r_{14}^2)(1 - r_{13.4}^2)(1 - r_{12.34}^2)}$$
Multiple correlation analysis measures the relationship between the given variables: the degree of association between one variable taken as the dependent variable and a group of other variables taken as the independent variables.
Example 11: The following zero order correlation coefficients are given
r12 = 0.98; r13 = 0.44 r23 = 0.54
Calculate multiple correlation coefficient treating first variable as dependent and second and third
variables as independent. (source: Gupta S.P, Statistical Method)
Solution:
The first variable is dependent; the second and third variables are independent.
Using the formula for the multiple correlation coefficient R1.23, we get:

$$R_{1.23} = \sqrt{\frac{0.98^2 + 0.44^2 - 2(0.98)(0.44)(0.54)}{1 - 0.54^2}} = \sqrt{\frac{0.6883}{0.7084}} = 0.986$$
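A small Python sketch of the R1.23 formula (function name ours), checked against Example 11’s values:

```python
def multiple_r(r12, r13, r23):
    # R1.23 = sqrt((r12^2 + r13^2 - 2*r12*r13*r23) / (1 - r23^2))
    num = r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23
    return (num / (1 - r23 ** 2)) ** 0.5

print(round(multiple_r(0.98, 0.44, 0.54), 3))  # 0.986
```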
State whether the following are True/False
1. Scatter diagram does not give us a quantitative measure of correlation coefficient.
2. Correlation studies estimate the values of one variable from the knowledge of the other.
3. Correlation coefficient is an absolute measure.
4. Correlation coefficient is a geometric mean between regression coefficients.
5. The regression lines pass through (X̄, Ȳ).
6. byx = r.
7. The higher the angle between the regression lines, the lower is the correlation
coefficient.
8. The correlation between height and weight can be studied keeping age constant.
12.9. Regression
Regression is defined as, “the measure of the average relationship between two or more
variables in terms of the original units of the data.”
Correlation analysis attempts to study the relationship between the two variables x and y. Regression analysis attempts to predict the average value of one variable for a given value of the other; in regression, the dependence of one variable on the other is quantified. Example: there are two variables x and y, where y depends on x. The dependence is expressed in the form of an equation.
12.9.1 Regression Analysis
Regression analysis is used to estimate the values of the dependent variable from the values of the independent variables. It is also used to get a measure of the error involved in using the regression line as a basis for estimation. Further, the regression coefficients are used to calculate the correlation coefficient: the product of the two regression coefficients is the square of the correlation coefficient that prevails between the given two variables.
12.9.2 Regression Lines
For a set of paired observations there exist two straight lines. The line drawn such that the sum of vertical deviations is zero and the sum of their squares is minimum is called the regression line of y on x; it is used to estimate y-values for given x-values. The line drawn such that the sum of horizontal deviations is zero and the sum of their squares is minimum is called the regression line of x on y; it is used to estimate x-values for given y-values. The smaller the angle between these lines, the higher is the correlation between the variables. The regression lines always intersect at (X̄, Ȳ).
The regression lines have equation,
(i) The regression equation of y on x is given by
Y − Ȳ = byx (X − X̄)
(ii) The regression equation of x on y is given by
X − X̄ = bxy (Y − Ȳ)
where

$$b_{yx} = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{N\sum d_x^2 - (\sum d_x)^2} \qquad\text{and}\qquad b_{xy} = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{N\sum d_y^2 - (\sum d_y)^2}$$

The regression equations found by the above conditions are said to be fitted by the method of least
squares. byx and bxy are called regression coefficients.
12.9.3 About Regression coefficient
· byx · bxy = r², so r = ±√(byx · bxy)
· byx · bxy ≤ 1
· If byx is negative, then bxy is also negative and r is negative.
· They can also be expressed as byx = r(σy/σx) and bxy = r(σx/σy).
· A regression coefficient is an absolute measure (it has units attached to it).
12.9.4 Differences Between Correlation Coefficient And Regression Coefficient
Correlation Coefficient                        | Regression Coefficient
rxy = ryx                                      | byx ≠ bxy
−1 ≤ r ≤ +1                                    | byx can be greater than one, but then bxy must
                                               | be less than one so that byx · bxy ≤ 1
It has no units attached to it                 | It has units attached to it
Nonsense correlation can exist                 | There is no such nonsense regression
It is not based on a cause-and-effect          | It is based on a cause-and-effect relationship
relationship                                   |
It indirectly helps in estimation              | It is meant for estimation
12.9.5 Examples
Example 11: Find the regression equations from the following data
Age of Husband 18 19 20 21 22 23 24 25 26 27
Age of Wife 17 17 18 18 19 19 19 20 21 22
And hence calculate correlation coefficient.
Solution:
X̄ = 225/10 = 22.5,  Ȳ = 190/10 = 19
Taking deviations dx = X − 23 and dy = Y − 19 gives Σdx = −5, Σdy = 0, Σdxdy = 43, Σdx² = 85 and Σdy² = 24.
The regression equation of Y on X is
Y − Ȳ = byx (X − X̄)
byx = [10(43) − (−5)(0)] / [10(85) − (−5)²] = 430/825 = 0.521
⇒ Y − 19 = 0.521 (X − 22.5)
⇒ Y = 0.521X + 7.2775
The regression equation of X on Y is
bxy = [10(43) − (−5)(0)] / [10(24) − (0)²] = 430/240 = 1.792
⇒ X − 22.5 = 1.792 (Y − 19)
⇒ X = 1.792Y − 11.548
Hence r = √(byx · bxy) = √(0.521 × 1.792) = 0.966.
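The same computation from the raw data, as a Python sketch (variable names ours):

```python
husband = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
wife    = [17, 17, 18, 18, 19, 19, 19, 20, 21, 22]

n = len(husband)
mx = sum(husband) / n          # 22.5
my = sum(wife) / n             # 19.0
sxy = sum((x - mx) * (y - my) for x, y in zip(husband, wife))
sxx = sum((x - mx) ** 2 for x in husband)
syy = sum((y - my) ** 2 for y in wife)

byx = sxy / sxx                # slope of the Y-on-X line
bxy = sxy / syy                # slope of the X-on-Y line
r = (byx * bxy) ** 0.5         # r = sqrt(byx * bxy)

print(round(byx, 3), round(bxy, 3), round(r, 3))  # 0.521 1.792 0.966
```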
Example 12: In a correlation study we have the following data.
                          Series X   Series Y
Mean                        65         67
S.D.                        2.5        3.5
Correlation coefficient r = 0.8
Find the two regression equations.
Solution:
The regression equation of Y on X is
Y − Ȳ = r (σy/σx) (X − X̄)
Y − 67 = (0.8)(3.5/2.5)(X − 65)
⇒ Y − 67 = 1.12 (X − 65)
⇒ Y = 1.12X − 5.8
The regression equation of X on Y is
X − X̄ = r (σx/σy) (Y − Ȳ)
X − 65 = (0.8)(2.5/3.5)(Y − 67)
⇒ X − 65 = 0.571 (Y − 67)
⇒ X = 0.571Y + 26.71
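A Python sketch (names ours) that reproduces Example 12 directly from the summary figures:

```python
def regression_lines(mx, my, sx, sy, r):
    """Return (byx, ayx, bxy, axy) so that Y = ayx + byx*X and X = axy + bxy*Y."""
    byx = r * sy / sx          # byx = r * sigma_y / sigma_x
    bxy = r * sx / sy          # bxy = r * sigma_x / sigma_y
    return byx, my - byx * mx, bxy, mx - bxy * my

byx, ayx, bxy, axy = regression_lines(65, 67, 2.5, 3.5, 0.8)
print(round(byx, 2), round(ayx, 2))   # 1.12 -5.8
print(round(bxy, 4), round(axy, 2))   # 0.5714 26.71
```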
12.10. Standard Error of Estimate
The standard error of estimate helps to measure the accuracy of the estimated figures in regression analysis. If the value of the standard error of estimate is small, the estimates provided by the regression equation are better and closer to the actual values. If the standard error of estimate is zero, there is no variation about the line and the correlation is perfect. “The standard error of estimate is used to ascertain how good and representative the regression line is as a description of the average relationship between two series.”
The standard error of regression of X values from Xc is:
$$S_{x.y} = \sqrt{\frac{\sum X^2 - a\sum X - b\sum XY}{N}} = \sqrt{\frac{\sum (X - X_c)^2}{N}}$$
Example 13: The following results were worked out from scores in Statistics and Mathematics in
a certain examination.
Scores in Statistics (X) Scores in Mathematics (Y)
Mean 40 48
Standard Deviation 10 15
Karl Pearson’s correlation coefficient between X and Y is +0.42.
Find the regression lines of X on Y and Y on X. Use the regression lines to find the value of Y when X = 50 and the value of X when Y = 30.
Solution:
Given X̄ = 40, Ȳ = 48, σx = 10, σy = 15, r = 0.42.
The regression line of X on Y is
(X − X̄) = r (σx/σy) (Y − Ȳ) ………….(1)
The regression line of Y on X is
(Y − Ȳ) = r (σy/σx) (X − X̄) ………….(2)
Therefore, substituting the values we get the respective equations as:
X = 0.28Y + 26.56 ……….(3) and
Y = 0.63X + 22.80 …………(4)
Therefore, when Y = 30, X = 0.28(30) + 26.56 = 34.96 using equation (3), and
when X = 50, Y = 0.63(50) + 22.80 = 54.3 using equation (4).
2. From the following data obtain the two regression equations
X 12 4 20 8 16
Y 18 22 10 16 14
Estimate Y for X = 15 and estimate X for Y = 20
Solution
X̄ = (12 + 4 + 20 + 8 + 16)/5 = 12 = mean of X
Ȳ = (18 + 22 + 10 + 16 + 14)/5 = 16 = mean of Y

X    Y    X − X̄   Y − Ȳ   (X − X̄)²  (Y − Ȳ)²  (X − X̄)(Y − Ȳ)
12   18    0       2        0         4          0
4    22   −8       6       64        36        −48
20   10    8      −6       64        36        −48
8    16   −4       0       16         0          0
16   14    4      −2       16         4         −8
Total                     160        80       −104
byx = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² = −104/160 = −0.65
and
bxy = Σ(X − X̄)(Y − Ȳ) / Σ(Y − Ȳ)² = −104/80 = −1.3
The regression equation of X on Y is given by
(X − X̄) = bxy (Y − Ȳ)
X − 12 = −1.3 (Y − 16)
Therefore X = 32.8 − 1.3Y
When Y = 20, X = 32.8 − 1.3(20) = 6.8
The regression equation of Y on X is given by
(Y − Ȳ) = byx (X − X̄)
Y − 16 = −0.65 (X − 12)
Therefore Y = 23.8 − 0.65X
When X = 15, Y = 23.8 − 0.65(15) = 14.05
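A Python sketch (names ours) reproducing this problem end to end:

```python
X = [12, 4, 20, 8, 16]
Y = [18, 22, 10, 16, 14]

n = len(X)
mx, my = sum(X) / n, sum(Y) / n        # means: 12, 16
sxy = sum((x - mx) * (y - my) for x, y in zip(X, Y))   # -104
sxx = sum((x - mx) ** 2 for x in X)    # 160
syy = sum((y - my) ** 2 for y in Y)    # 80

byx = sxy / sxx                        # -0.65
bxy = sxy / syy                        # -1.3

# Y on X: Y = my + byx*(X - mx);  X on Y: X = mx + bxy*(Y - my)
print(round(my + byx * (15 - mx), 2))  # estimate of Y at X = 15 -> 14.05
print(round(mx + bxy * (20 - my), 2))  # estimate of X at Y = 20 -> 6.8
```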
12.11 Multiple Regression Analysis
Multiple Regression Analysis is an extension of two variable regression analysis. In this analysis,
two or more independent variables are used to estimate the values of a dependent variable,
instead of one independent variable.
The objectives of multiple regression analysis are:
· To derive an equation which provides estimates of the dependent variable from
values of two or more independent variables.
· To obtain the measure of the error involved in using the regression equation as a
basis of estimation.
· To obtain a measure of the proportion of variance in the dependent variable
accounted for or explained by the independent variables.
The multiple regression equation describes the average relationship between the given variables, and this relationship is used to estimate the dependent variable. A regression equation is the equation for estimating a dependent variable.
Example 14: Estimating the dependent variable X1 from the independent variables X2, X3, …
is known as the regression equation of X1 on X2, X3, …
The regression equation, when three variables are involved, is given below:
X1.23 = a1.23 + b12.3 X2 + b13.2 X3
where
X1.23 = estimated value of the dependent variable;
X2 and X3 = independent variables;
a1.23 = the constant (the intercept made by the regression plane): it gives the value
of the dependent variable when all the independent variables assume a
value equal to zero;
b12.3 and b13.2 = partial regression coefficients or net regression coefficients;
b12.3 measures the amount by which a unit change in X2 is expected to affect
X1 when X3 is held constant.
Deviations Taken From Actual Means
Writing x1 = X1 − X̄1, x2 = X2 − X̄2 and x3 = X3 − X̄3, the equation becomes
x1.23 = b12.3 x2 + b13.2 x3
b12.3 and b13.2 can be obtained by solving the following normal equations:
Σx1x2 = b12.3 Σx2² + b13.2 Σx2x3
Σx1x3 = b12.3 Σx2x3 + b13.2 Σx3²
The partial regression coefficient can also be written as b12.3 = r12.3 (σ1.23 / σ2.13). In terms of the simple correlation coefficients and the standard deviations S1, S2, S3, the regression equation of X1 on X2 and X3 is:

$$(X_1 - \bar{X}_1) = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2}\,\frac{S_1}{S_2}\,(X_2 - \bar{X}_2) + \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2}\,\frac{S_1}{S_3}\,(X_3 - \bar{X}_3)$$

and the regression equation of X3 on X2 and X1 is:

$$(X_3 - \bar{X}_3) = \frac{r_{23} - r_{13}r_{12}}{1 - r_{12}^2}\,\frac{S_3}{S_2}\,(X_2 - \bar{X}_2) + \frac{r_{13} - r_{23}r_{12}}{1 - r_{12}^2}\,\frac{S_3}{S_1}\,(X_1 - \bar{X}_1)$$
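The normal equations above can be solved directly. A Python sketch (names ours) on small hypothetical data constructed to satisfy X1 = 4 + 0.5·X2 + 1.5·X3 exactly, so the fitted coefficients are known in advance:

```python
def fit_x1_on_x2_x3(x1, x2, x3):
    """Solve the two normal equations (deviation form) for b12.3 and b13.2,
    then recover the intercept a1.23."""
    n = len(x1)
    m1, m2, m3 = sum(x1) / n, sum(x2) / n, sum(x3) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    d3 = [v - m3 for v in x3]
    s22 = sum(v * v for v in d2)
    s33 = sum(v * v for v in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    s12 = sum(a * b for a, b in zip(d1, d2))
    s13 = sum(a * b for a, b in zip(d1, d3))
    # Cramer's rule on: s12 = b2*s22 + b3*s23 ;  s13 = b2*s23 + b3*s33
    det = s22 * s33 - s23 ** 2
    b12_3 = (s12 * s33 - s13 * s23) / det
    b13_2 = (s13 * s22 - s12 * s23) / det
    a = m1 - b12_3 * m2 - b13_2 * m3
    return a, b12_3, b13_2

# Hypothetical data with an exact linear relationship
X2 = [1, 2, 3, 4, 5, 6]
X3 = [2, 1, 4, 3, 6, 5]
X1 = [4 + 0.5 * a + 1.5 * b for a, b in zip(X2, X3)]
print(fit_x1_on_x2_x3(X1, X2, X3))  # (4.0, 0.5, 1.5)
```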
12.12 Reliability of Estimates
Reliability of estimates tests whether an estimated value obtained by applying the regression equation is close to the actual observed value. The standard error of estimate is used to measure this closeness: it is an average of the deviations of the actual values of the dependent variable from the values estimated by the regression equation. Determining the accuracy of estimates from the multiple regression in this way is known as assessing the reliability of estimates, and the measure is also known as the standard error of estimate.
Standard error of estimate of X1 on X2 and X3 is given below:
$$S_{1.23} = \sqrt{\frac{\sum (X_1 - X_{1\,est})^2}{N - 3}}$$

where
S1.23 = standard error of estimate of X1 on X2 and X3
X1 est = estimated value of X1 as calculated from the regression equation.
12.13 Application of Multiple Regression
Multiple regression can be applied to test how factors such as export elasticity, import elasticity
and structural change (the contribution of the manufacturing sector to GDP) influence
employment, with employment as the dependent variable. Researchers can likewise apply multiple regression
in their own work where appropriate.
Self Assessment Questions
a. Correlation coefficient is a geometric mean between regression coefficients
b. The regression lines pass through (X,Y )
c. byx = r · (S.D. of X / S.D. of Y)
d. The higher the angle between regression coefficients, the lower is the correlation
coefficient.
12.14 Summary
In this unit we studied the concept of correlation and regression and the different types of
correlation and regression. We saw how regression helps us to study unknown variables with the
help of known variables. It also establishes reliability measure for estimated values.
Terminal Questions
1. Test the significance of the correlation for: i) n = 10, r = 0.4 ii) n = 10, r = 0.9
iii) n = 100, r = 0.4 iv) n = 100, r = 0.9.
2. The following table gives marks obtained by 10 students in commerce and statistics.
Calculate the rank correlation
Marks in Statistics 35 90 70 40 95 45 60 85 80 50
Marks in Commerce 45 70 65 30 90 40 50 75 85 60
3. Calculate the Spearman’s rank correlation coefficient between the series A and B given
below:
Series A 57 59 62 63 64 65 55 58 57
Series B 113 117 126 126 130 129 111 116 112
4. Obtain the two lines of regression and estimate the blood pressure when age is 50 years.
Age (X) in yrs 56 42 72 39 63 47 52 49 40 42 68 60
B P (Y) 127 112 140 118 129 116 130 125 115 120 135 133
5. The following results were worked out from scores in statistics and mathematics in a
certain examination.
Scores in Statistics (X) Scores in Mathematics (Y)
Mean 39.5 47.5
Standard Deviation 10.8 17.8
Karl Pearson’s correlation coefficient between X and Y = 0.42. Find both the regression lines.
Use these lines to estimate the value of Y when X = 50 and the value of X when Y = 30.
Answers To Self Assessment Questions
Reference 13.10
1) True 2) False 3) False 4)True
Reference 13.18
1) True 2) True 3) False 4)True
Answers To Terminal Questions
1. i) Non significant ii, iii, iv) Highly significant
2. 0.903
3. 0.967
4. X = 1.18Y − 94.9
Y = 87.2 + 0.72X; estimated blood pressure at age 50 ≈ 123
5. X = 27.62 + 0.25Y
Y = 20.24 + 0.69X