
Chapter Seventeen

Correlation & Regression Analysis

Naresh K. Malhotra
Marketing Research-an applied orientation, 4th ed.
Product moment correlation
Product moment correlation, r, is a statistic used to summarize the strength of association
between two metric (interval or ratio) variables, say X and Y. It is also known as the Pearson
correlation coefficient, simple correlation, bivariate correlation, or simply the correlation
coefficient. It was proposed by Karl Pearson.
Ex: How strongly are sales related to advertising expenditures?

Formula:
$$ r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$

The value of r varies between -1 and +1:

1. r = 0 means there is no linear relationship between X and Y.
2. r = +1 means there is a perfect (the strongest possible) positive linear relationship between X and Y.
3. r = -1 means there is a perfect (the strongest possible) negative linear relationship between X and Y.
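As a rough illustration of the formula for r, the sketch below computes it in plain Python. The function name `pearson_r` and the five data points are hypothetical, chosen only to keep the arithmetic easy to follow; they are not taken from the textbook.

```python
import math

def pearson_r(x, y):
    """Product moment correlation between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the two sums of squares.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: advertising spend (X) and sales (Y) in arbitrary units.
advertising = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson_r(advertising, sales)
print(round(r, 4))  # 0.7746
```

With these made-up numbers r is about 0.77, a fairly strong positive linear association.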
Regression Analysis
Regression analysis is a powerful and flexible procedure for analyzing associative relationships between a
metric dependent variable and one or more independent variables. It is concerned with the nature and degree
of association between variables and does not imply or assume any causality. It is used in the following ways:

1. Determine whether the independent variables explain a significant variation in the dependent variable: whether a relationship exists.
2. Determine how much of the variation in the dependent variable can be explained by the independent variables: strength of the relationship.
3. Determine the structure or form of the relationship: the mathematical equation relating the independent and dependent variables.
4. Predict the values of the dependent variable.
5. Control for other independent variables when evaluating the contributions of a specific variable or set of variables.
Bivariate Regression
Bivariate regression is a procedure for deriving a mathematical relationship
in the form of an equation between a single metric dependent or criterion
variable and a single metric independent or predictor variable.

Ex: Can the variation in market share be accounted for by the size of the
sales force?

Equation:

$$ Y = \beta_0 + \beta_1 X $$
Bivariate Regression’s process

It is a nine-step process-
1. Plot the scatter diagram
2. Formulate the general model
3. Estimate the parameters
4. Estimate the standardized regression coefficient
5. Test for significance
6. Determine the strength & significance of association
7. Check prediction accuracy
8. Examine the residuals
9. Cross validate the model


Step I A scatter diagram, or scattergram, is a plot of the values of two
variables for all the cases or observations. It shows the form of the
relationship between the variables. The dependent variable is plotted on
the vertical axis and the independent variable on the horizontal axis.
If the points cluster around a straight line, the relationship is
described as linear. The most commonly used technique for fitting a
straight line to a scattergram is the least-squares procedure. The
technique determines the best-fitting line by minimizing the sum of the
squared vertical distances of all the points from the line. The
best-fitting line is called the regression line. Any point that does not
fall on the regression line is not fully accounted for. The vertical
distance from the point to the line is the error, $e_j$.

Step II In the bivariate regression model, the general form of a
straight line is:
$$ Y = \beta_0 + \beta_1 X $$
Where,
Y = dependent or criterion variable
X = independent or predictor variable
$\beta_0$ = intercept of the line
$\beta_1$ = slope of the line
But in marketing research, the basic regression model will be-

$$ Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n + e_i $$

Step III In most cases, $\beta_0$ and $\beta_1$ are unknown and are estimated from the sample
observations using the equation:
$$ \hat{Y}_i = a + b X_i $$
where $\hat{Y}_i$ is the estimated or predicted value of $Y_i$. The values of a and b are found by the following
formulas:

Number One:
$$ b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} $$
Number Two:
$$ a = \bar{Y} - b\bar{X} $$
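The two forms of the slope formula and the intercept formula can be checked numerically. The following is a minimal sketch in plain Python; the function name `fit_line` and the data are hypothetical and serve only to show that the deviation form and the computational form of b give the same value.

```python
def fit_line(x, y):
    """Least-squares estimates for Y-hat = a + b * X.

    Returns (a, b_deviation_form, b_computational_form).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Deviation form: sum of cross-deviations over sum of squared deviations.
    b_dev = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    # Computational form: raw sums corrected by n times the means.
    b_comp = ((sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar)
              / (sum(xi ** 2 for xi in x) - n * x_bar ** 2))
    # Intercept: the fitted line passes through (x_bar, y_bar).
    a = y_bar - b_dev * x_bar
    return a, b_dev, b_comp

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b_dev, b_comp = fit_line(x, y)
print(round(a, 2), round(b_dev, 2), round(b_comp, 2))  # 2.2 0.6 0.6
```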
Step IV Standardization is the process by which the raw data are transformed into new
variables that have a mean of 0 and a variance of 1. When the data are
standardized, the intercept assumes a value of 0. The term beta coefficient or beta
weight is used to denote the standardized regression coefficient. In bivariate
regression it is the same whichever variable is treated as dependent, and it equals
the simple correlation:

$$ B_{yx} = B_{xy} = r_{xy} $$
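To see why the standardized coefficient equals the correlation, one can standardize both variables and refit the line. The sketch below does this with the standard library; the helper names `standardize` and `slope_intercept` and the data are hypothetical.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and (sample) variance 1."""
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - m) / s for v in values]

def slope_intercept(x, y):
    """Least-squares slope and intercept for the regression of y on x."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    return b, y_bar - b * x_bar

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
zx, zy = standardize(x), standardize(y)

beta_yx, intercept_yx = slope_intercept(zx, zy)  # Y regressed on X
beta_xy, intercept_xy = slope_intercept(zy, zx)  # X regressed on Y
print(round(beta_yx, 4), round(beta_xy, 4))  # 0.7746 0.7746
```

Both slopes match the correlation of about 0.7746 for these data, and both intercepts are zero up to floating-point rounding.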

Step V The statistical significance of the linear relationship between X and Y may be
tested by examining the hypotheses:
$$ H_0: \beta_1 = 0 $$
$$ H_1: \beta_1 \neq 0 $$
The null hypothesis implies that there is no linear relationship between X and Y.
The alternative hypothesis is that there is a relationship, positive or negative,
between X and Y. Typically, a two-tailed test is done. A t statistic with n - 2
degrees of freedom can be used, where-

$$ t = \frac{b}{SE_b} $$
Here $SE_b$ denotes the standard deviation of b and is called the standard
error. When the absolute value of the calculated t is larger than the critical
value, the null hypothesis is rejected, which means that there is a significant
linear relationship between the dependent and independent variables.
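A small numerical sketch of the t test follows. The text does not give a formula for the standard error, so the code assumes the usual textbook expression $SE_b = \sqrt{SS_{res}/(n-2)} / \sqrt{\sum (X_i - \bar{X})^2}$. The function name, the data, and the 5% significance level are all illustrative assumptions.

```python
import math

def slope_t_statistic(x, y):
    """Return (b, SE_b, t) for the bivariate regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Assumed standard formula (not stated in the text above).
    se_b = math.sqrt(ss_res / (n - 2)) / math.sqrt(s_xx)
    return b, se_b, b / se_b

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, se_b, t = slope_t_statistic(x, y)

# Two-tailed critical value for alpha = 0.05 with n - 2 = 3 degrees of
# freedom, as listed in a standard t table.
T_CRITICAL = 3.182
reject_h0 = abs(t) > T_CRITICAL
print(round(t, 3), reject_h0)  # 2.121 False
```

With only five made-up observations the slope is not significant at the 5% level, even though the correlation looks fairly high.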

Step VI Here the strength of association is measured by the coefficient of
determination, $r^2$. In bivariate regression, $r^2$ is the square of the
simple correlation coefficient obtained by correlating the two
variables. The coefficient $r^2$ varies between 0 and 1. The value of
$r^2$ is calculated by-
$$ r^2 = \frac{SS_{reg}}{SS_y} = \frac{SS_y - SS_{res}}{SS_y} $$
Where,
$$ SS_y = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 $$
$$ SS_{reg} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 $$
$$ SS_{res} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 $$
Another equivalent test for examining the significance of the linear
relationship between X and Y is the test for the significance of the
coefficient of determination. The hypotheses are-

$$ H_0: R^2 = 0 $$
$$ H_1: R^2 > 0 $$

Here an F statistic with (c - 1) and (n - c) degrees of freedom is used, and
the calculated value is compared with the critical value. If the calculated
value is larger than the critical value, the null hypothesis is rejected,
meaning that there is a significant relationship between the dependent and
independent variables.
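The decomposition of the total variation and the resulting test statistic can be sketched as follows. The data and the function name are hypothetical. For bivariate regression the code assumes 1 and n - 2 degrees of freedom for F, which is the standard result for a model with one predictor.

```python
def sums_of_squares(x, y):
    """Return (SS_y, SS_reg, SS_res) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    y_hat = [a + b * xi for xi in x]
    ss_y = sum((yi - y_bar) ** 2 for yi in y)                # total
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained
    return ss_y, ss_reg, ss_res

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ss_y, ss_reg, ss_res = sums_of_squares(x, y)

r2 = ss_reg / ss_y
n = len(x)
# Assumed degrees of freedom for one predictor: 1 and n - 2.
f_stat = (ss_reg / 1) / (ss_res / (n - 2))
print(round(r2, 3), round(f_stat, 3))  # 0.6 4.5
```

For these numbers $SS_y = SS_{reg} + SS_{res}$ (6 = 3.6 + 2.4), so both forms of the $r^2$ formula give 0.6.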
Step VII To estimate the accuracy of predicted values, $\hat{Y}$, it is useful to calculate the
standard error of estimate,
$$ SEE = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}} $$
Two cases of prediction may arise. The researcher may want to predict the
mean value of Y for all the cases with a given value of X, say $X_0$, or predict the
value of Y for a single case. Here the predicted value is

$$ \hat{Y} = a + bX_0 $$
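A brief sketch of the standard error of estimate and a point prediction is given below. The data, the function name `fit_with_see`, and the chosen value $X_0 = 6$ are hypothetical.

```python
import math

def fit_with_see(x, y):
    """Return (a, b, SEE) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of estimate: residual variation per degree of freedom.
    see = math.sqrt(ss_res / (n - 2))
    return a, b, see

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b, see = fit_with_see(x, y)

x0 = 6                 # hypothetical new value of X
y_hat_0 = a + b * x0   # predicted value of Y at X0
print(round(see, 4), round(y_hat_0, 2))  # 0.8944 5.8
```

The point prediction is the same in both cases described above; the two cases differ only in how much uncertainty surrounds it.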

Step VIII Later

Step IX Later
Multiple Regression
Multiple regression involves a single dependent variable and two or more
independent variables. Ex: Can variation in sales be explained in terms of
variation in advertising expenditures, prices and level of distribution? The
general form of the multiple regression model:
$$ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_k X_k + e $$

which is estimated by the following equation:

$$ \hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \cdots + b_k X_k $$
Multiple Regression Process
The steps involved in conducting multiple regression analysis are similar to those
for bivariate regression analysis. The discussion focuses on-

Partial Regression Coefficients: The interpretation of the partial regression
coefficient, $b_1$, is that it represents the expected change in Y when $X_1$ is
changed by one unit but $X_2$ is held constant or otherwise controlled.
Likewise, $b_2$ represents the expected change in Y for a unit change in $X_2$
when $X_1$ is held constant. Thus calling $b_1$ and $b_2$ partial regression
coefficients is appropriate. In other words, if $X_1$ and $X_2$ are each changed
by one unit, the expected change in Y would be $(b_1 + b_2)$. Multiple
regression cannot be solved if-
1. Sample size, n, is smaller than or equal to the number of independent variables, k
2. One independent variable is perfectly correlated with another
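For the special case of two independent variables, the least-squares coefficients have a closed form, which makes both the interpretation and the two failure conditions easy to demonstrate. The sketch below is an illustration under that assumption. The function name `fit_two_predictors` is hypothetical, and the data are constructed so that Y = 1 + 2·X1 + 3·X2 holds exactly, so the recovered coefficients can be checked by eye.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares (a, b1, b2) for Y-hat = a + b1*X1 + b2*X2."""
    n, k = len(y), 2
    if n <= k:
        raise ValueError("sample size must exceed the number of predictors")
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Sums of squares and cross-products of deviations from the means.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if abs(det) < 1e-12:
        # X1 and X2 perfectly correlated: no unique solution exists.
        raise ValueError("predictors are perfectly correlated")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

# Constructed data: Y = 1 + 2*X1 + 3*X2 exactly (illustration only).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [9, 8, 19, 18, 29, 28]
a, b1, b2 = fit_two_predictors(x1, x2, y)
print(round(a, 6), round(b1, 6), round(b2, 6))  # 1.0 2.0 3.0

# Raising X1 by one unit with X2 held constant changes Y-hat by b1;
# raising both by one unit changes it by b1 + b2.
change_both = (a + b1 * 4 + b2 * 5) - (a + b1 * 3 + b2 * 4)
print(round(change_both, 6))  # 5.0
```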
Strength of association: The strength of association is measured by the square
of the multiple correlation coefficient, $R^2$, which is also called the
coefficient of multiple determination, where-
$$ R^2 = \frac{SS_{reg}}{SS_y} $$
The multiple correlation coefficient, R, can also be viewed as the
simple correlation coefficient, r, between Y and $\hat{Y}$. Several
characteristics of $R^2$ are-
1. The coefficient of multiple determination, $R^2$, cannot be less than the highest bivariate $r^2$ of any individual independent variable with the dependent variable.
2. $R^2$ will be larger when the correlations between the independent variables are low.
3. If the independent variables are statistically independent (uncorrelated), then $R^2$ will be the sum of the bivariate $r^2$ of each independent variable with the dependent variable.
4. $R^2$ cannot decrease as more independent variables are added to the regression equation.
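The characteristics listed above can be checked on a small example. The sketch below uses two predictors that are constructed to be exactly uncorrelated, so that $R^2$ should equal the sum of the two bivariate $r^2$ values. All names and numbers are hypothetical, and the two-predictor closed form is used only to keep the code short.

```python
def cross_dev(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def bivariate_r2(x, y):
    """Squared simple correlation between x and y."""
    return cross_dev(x, y) ** 2 / (cross_dev(x, x) * cross_dev(y, y))

def multiple_r2(x1, x2, y):
    """R^2 = SS_reg / SS_y for the regression of y on x1 and x2."""
    n = len(y)
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    a = my - b1 * m1 - b2 * m2
    y_hat = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
    ss_reg = sum((yh - my) ** 2 for yh in y_hat)
    return ss_reg / cross_dev(y, y)

# Hypothetical data; x1 and x2 are uncorrelated by construction.
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
y = [1, 3, 4, 8]

big_r2 = multiple_r2(x1, x2, y)
r2_1, r2_2 = bivariate_r2(x1, y), bivariate_r2(x2, y)
print(round(big_r2, 4), round(r2_1 + r2_2, 4))  # 0.9615 0.9615
```

Here $R^2$ is also larger than either bivariate $r^2$ alone (about 0.615 and 0.346), in line with characteristics 1 and 4.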
Step IX Examination of residuals: A residual is the difference between the
observed value of $Y_i$ and the value predicted by the regression equation,
$\hat{Y}_i$. Plotting the residuals against the independent variables provides
evidence of the appropriateness or inappropriateness of using a
linear model. Again, the plot should result in a random pattern.
The residuals should fall randomly, with relatively equal
dispersion about 0. They should not display any
tendency to be either positive or negative.
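As a minimal sketch, residuals can be computed and paired with the independent variable before plotting. The example uses a single predictor and hypothetical data to stay short; the resulting (X, residual) pairs could be handed to any plotting tool.

```python
def residuals(x, y):
    """Observed minus predicted values for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = residuals(x, y)

# Pairs to plot: independent variable on the horizontal axis,
# residual on the vertical axis.
points = list(zip(x, [round(e, 2) for e in res]))
print(points)  # [(1, -0.8), (2, 0.6), (3, 1.0), (4, -0.6), (5, -0.2)]

# Least-squares residuals sum to zero (up to rounding), so they are
# centred on 0; the plot shows whether their spread looks random.
print(abs(sum(res)) < 1e-9)  # True
```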
Step X Significance testing: Significance testing involves testing the
significance of the overall regression equation as well as specific partial
regression coefficients. The null hypothesis for the overall test is that the
coefficient of multiple determination in the population, $R^2$, is zero:
$$ H_0: R^2 = 0 $$
This is equivalent to the following null hypothesis:
$$ H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0 $$
The overall test can be conducted by using an F statistic where-

$$ F = \frac{SS_{reg}/k}{SS_{res}/(n - k - 1)} = \frac{R^2/k}{(1 - R^2)/(n - k - 1)} $$

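The two forms of the overall F statistic are algebraically equivalent, which the sketch below checks numerically. The function names and the sums of squares are hypothetical values chosen for easy arithmetic, not results from any real study.

```python
def f_from_sums_of_squares(ss_reg, ss_res, n, k):
    """Overall F: explained mean square over residual mean square."""
    return (ss_reg / k) / (ss_res / (n - k - 1))

def f_from_r2(r2, n, k):
    """The same F statistic written in terms of R^2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical values: n = 23 cases, k = 2 independent variables.
ss_reg, ss_res = 90.0, 30.0
n, k = 23, 2
r2 = ss_reg / (ss_reg + ss_res)  # R^2 = SS_reg / SS_y = 0.75

f1 = f_from_sums_of_squares(ss_reg, ss_res, n, k)
f2 = f_from_r2(r2, n, k)
print(round(f1, 6), round(f2, 6))  # 30.0 30.0
```

The calculated F would then be compared with the critical value for k and n - k - 1 degrees of freedom (here 2 and 20).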