
Multiple Regression

The Multiple Regression Model and the


Least Squares Point Estimate
• Simple linear regression used one independent variable to explain the dependent variable
◦ Some relationships are too complex to be described using a single independent variable
• Multiple regression uses two or more independent variables to describe the dependent variable
◦ This allows multiple regression models to handle more complex situations
◦ There is no limit to the number of independent variables a model can use
• Multiple regression has only one dependent variable

15-2
The Multiple Regression Model
• The linear regression model relating y to x1, x2,…, xk is
y = β0 + β1x1 + β2x2 + … + βkxk + ε
• µy = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2,…, xk
• β0, β1, β2,…, βk are the unknown regression parameters relating the mean value of y to x1, x2,…, xk
• ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2,…, xk
15-3
The Least Squares Estimates and Point
Estimation and Prediction
• Estimation/prediction equation:
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02,…, x0k
• It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02,…, x0k
• b0, b1, b2,…, bk are the least squares point estimates of the parameters β0, β1, β2,…, βk
• x01, x02,…, x0k are specified values of the independent predictor variables x1, x2,…, xk
• We will use software to find the least squares point estimates (a minimal sketch follows this slide)
15-4
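To make the least squares computation concrete, here is a minimal NumPy sketch; the data values are illustrative placeholders, not the Tasty Sub Shop data.

import numpy as np

# Illustrative placeholder data: n = 5 observations, k = 2 independent variables
y = np.array([12.1, 15.3, 14.0, 18.2, 20.5])
X = np.array([[2.0, 1.1],
              [3.1, 1.9],
              [2.8, 1.5],
              [4.0, 2.2],
              [4.5, 2.8]])

X1 = np.column_stack([np.ones(len(y)), X])  # prepend a column of 1s for the intercept b0
b, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least squares point estimates b0, b1, b2
y_hat = X1 @ b                              # point estimates / point predictions ŷ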
LO15-1
EXAMPLE 15.1 The Tasty Sub Shop
Case

Figure 15.4 (Part) 15-5


LO15-2: Explain the
assumptions behind
multiple regression and
calculate the standard
error.
15.2 Model Assumptions and the
Standard Error
• The model is
y = β0 + β1x1 + β2x2 + … + βkxk + ε
• The assumptions for multiple regression are stated about the model error terms, the ε's

15-6
LO15-2

The Regression Model Assumptions Continued

1. Mean of Zero Assumption
The mean of the error terms is equal to 0
2. Constant Variance Assumption
The variance of the error terms, σ2, is the same for every combination of values of x1, x2,…, xk
3. Normality Assumption
The error terms follow a normal distribution for every combination of values of x1, x2,…, xk
4. Independence Assumption
The values of the error terms are statistically independent of each other

15-7
LO15-2

Sum of Squares
• Sum of squared errors
SSE = Σei2 = Σ(yi - ŷi)2
• Mean squared error
◦ Point estimate of the residual variance σ2
◦ This formula is slightly different from simple regression
s2 = MSE = SSE / (n - (k+1))
• Standard error
◦ Point estimate of the residual standard deviation σ
◦ This formula too is slightly different from simple regression
s = √MSE = √(SSE / (n - (k+1)))

15-8
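As a check on these formulas, a short NumPy sketch (simulated placeholder data, not from the text):

import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # column of 1s, then x1, x2
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)

e = y - X1 @ b                 # residuals ei = yi - ŷi
SSE = np.sum(e**2)             # sum of squared errors
MSE = SSE / (n - (k + 1))      # s², the point estimate of σ²
s = np.sqrt(MSE)               # standard error, the point estimate of σ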
LO15-3: Calculate and
interpret the multiple
and adjusted multiple
coefficients of
determination.
15.3 R2 and Adjusted R2
1. Total variation is given by the formula
Σ(yi - ȳ)2
2. Explained variation is given by the formula
Σ(ŷi - ȳ)2
3. Unexplained variation is given by the formula
Σ(yi - ŷi)2
4. Total variation is the sum of explained and
unexplained variation

This section can be covered anytime after reading Section 15.1
15-9
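A short NumPy sketch of the three variations listed in 1–3 above and their relationship (simulated placeholder data):

import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ b

total = np.sum((y - y.mean())**2)           # 1. total variation
explained = np.sum((y_hat - y.mean())**2)   # 2. explained variation
unexplained = np.sum((y - y_hat)**2)        # 3. unexplained variation
# 4. total ≈ explained + unexplained (up to floating point rounding)
R2 = explained / total                      # multiple coefficient of determination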
LO15-3

R2 and Adjusted R2 Continued


5. The multiple coefficient of determination is
the ratio of explained variation to total
variation
6. R2 is the proportion of the total variation
that is explained by the overall regression
model
7. Multiple correlation coefficient R is the
square root of R2

15-10
LO15-3

Multiple Correlation Coefficient R


• The multiple correlation coefficient R is just the square root of R2
• With simple linear regression, r would take on the sign of b1
• There are multiple bi's with multiple regression
• For this reason, R is always positive
• To interpret the direction of the relationship between the x's and y, you must look to the sign of the appropriate bi coefficient

15-11
LO15-3

The Adjusted R2
• Adding an independent variable to multiple regression will raise R2
• R2 will rise slightly even if the new variable has no relationship to y
• The adjusted R2 corrects this tendency in R2
• As a result, it gives a better estimate of the importance of the independent variables
• The adjusted multiple coefficient of determination is
adjusted R2 = (R2 - k/(n-1)) × ((n-1)/(n-(k+1)))
15-12
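The adjusted R² formula translates directly into code; this tiny sketch uses made-up values for R², n, and k:

# Adjusted R² from R², n observations, and k independent variables (sketch)
def adjusted_r2(R2, n, k):
    return (R2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))

print(adjusted_r2(0.90, 30, 2))   # ≈ 0.8926: slightly below R² = 0.90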
LO15-4: Test the significance of a multiple regression model by using an F test.
15.4 The Overall F Test
• To test
H0: β1 = β2 = … = βk = 0 versus
Ha: At least one of β1, β2,…, βk ≠ 0
• The test statistic is
F(model) = (Explained variation / k) / (Unexplained variation / [n - (k+1)])
• Reject H0 in favor of Ha if F(model) > Fα or p-value < α
• Fα is based on k numerator and n-(k+1) denominator degrees of freedom (a computational sketch follows this slide)

15-13
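A sketch of the F test, with the p-value taken from the F distribution via SciPy (simulated placeholder data):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ b

explained = np.sum((y_hat - y.mean())**2)
unexplained = np.sum((y - y_hat)**2)
F = (explained / k) / (unexplained / (n - (k + 1)))   # F(model)
p_value = stats.f.sf(F, k, n - (k + 1))               # area to the right of F
# Reject H0 if p_value < α (e.g., α = 0.05)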
LO15-5: Test the
significance of a single
independent variable.
15.5 Testing the Significance of an
Independent Variable
• A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and y
• To test significance, we use the null hypothesis H0: βj = 0
• versus the alternative hypothesis Ha: βj ≠ 0

15-14
LO15-5
Testing Significance of an Independent
Variable #2
Alternative      Reject H0 if     p-value
Ha: βj > 0       t > tα           Area under t distribution to the right of t
Ha: βj < 0       t < -tα          Area under t distribution to the left of t
Ha: βj ≠ 0       |t| > tα/2 *     Twice the area under t distribution to the right of |t|

* That is, t > tα/2 or t < -tα/2

15-15
LO15-5
Testing Significance of an Independent
Variable #3
• Test statistic
t = bj / sbj
• 100(1-α)% confidence interval for βj:
[bj ± tα/2 sbj]
• tα, tα/2, and p-values are based on n-(k+1) degrees of freedom (see the sketch below)

15-16
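A sketch of the t statistics for all coefficients at once; the standard errors sbj come from the diagonal of s²(XᵀX)⁻¹ (simulated placeholder data):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)

df = n - (k + 1)
s = np.sqrt(np.sum((y - X1 @ b)**2) / df)             # standard error s
s_b = s * np.sqrt(np.diag(np.linalg.inv(X1.T @ X1)))  # sbj for b0, b1, …, bk
t = b / s_b                                           # t = bj / sbj
p = 2 * stats.t.sf(np.abs(t), df)                     # two-sided p-values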
LO15-5
Testing Significance of an Independent
Variable #4
• It is customary to test the significance of every independent variable in a regression model
• If we can reject H0: βj = 0 at the 0.05 level of significance, we have strong evidence that the independent variable xj is significantly related to y
• If we can reject H0: βj = 0 at the 0.01 level of significance, we have very strong evidence that the independent variable xj is significantly related to y
• The smaller the significance level α at which H0 can be rejected, the stronger the evidence that xj is significantly related to y
15-17
LO15-5
A Confidence Interval for the
Regression Parameter βj
• If the regression assumptions hold, a 100(1-α)% confidence interval for βj is [bj ± tα/2 sbj]
• tα/2 is based on n - (k+1) degrees of freedom (see the sketch below)

15-18
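Given bj, sbj, and the degrees of freedom, the interval is two lines of code; the numbers below are illustrative placeholders:

import numpy as np
from scipy import stats

b = np.array([5.02, 1.98, -1.05])    # placeholder least squares estimates
s_b = np.array([0.09, 0.10, 0.08])   # placeholder standard errors sbj
df = 27                              # n - (k+1) with n = 30, k = 2

t_crit = stats.t.ppf(0.975, df)      # tα/2 for a 95% interval
lower, upper = b - t_crit * s_b, b + t_crit * s_b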
LO15-6: Find and interpret a confidence interval for a mean value and a prediction interval for an individual value.
15.6 Confidence and Prediction Intervals
• The point on the regression line corresponding to particular values x01, x02,…, x0k of the independent variables is
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
• It is unlikely that this value will equal the mean value of y for these x values
• Therefore, we need to place bounds on how far the predicted value might be from the actual value
• We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y

15-19
LO15-6

Distance Value
• Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value
• With simple regression, we were able to calculate the distance value fairly easily
• However, for multiple regression, calculating the distance value requires matrix algebra
• For that reason, we use software (though a matrix sketch follows this slide)

15-20
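In matrix terms the distance value at a point x0 = (1, x01, …, x0k) is x0ᵀ(XᵀX)⁻¹x0, where X has rows (1, xi1, …, xik); this is the standard identification, sketched below with placeholder data:

import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # rows (1, xi1, xi2)

x0 = np.array([1.0, 0.5, -0.3])   # 1 followed by the specified values x01, x02
distance_value = x0 @ np.linalg.inv(X1.T @ X1) @ x0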
LO15-6
A Confidence Interval for a Mean
Value of y
• Assume the regression assumptions hold
• The formula for a 100(1-α)% confidence interval for the mean value of y is
[ŷ ± tα/2 s √(Distance value)]
• This is based on n-(k+1) degrees of freedom

15-21
LO15-6
A Prediction Interval for an Individual
Value of y
• Assume the regression assumptions hold
• The formula for a 100(1-α)% prediction interval for an individual value of y is
[ŷ ± tα/2 s √(1 + Distance value)]
• This is based on n-(k+1) degrees of freedom (both intervals are sketched after this slide)

15-22
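Putting the pieces together, a sketch of both intervals at a point x0 (simulated placeholder data, not from the text):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)

df = n - (k + 1)
s = np.sqrt(np.sum((y - X1 @ b)**2) / df)        # standard error
x0 = np.array([1.0, 0.5, -0.3])
dist = x0 @ np.linalg.inv(X1.T @ X1) @ x0        # distance value
y0_hat = x0 @ b                                  # point estimate/prediction ŷ
t_crit = stats.t.ppf(0.975, df)                  # 95% intervals

ci = (y0_hat - t_crit * s * np.sqrt(dist),       # confidence interval for mean y
      y0_hat + t_crit * s * np.sqrt(dist))
pi = (y0_hat - t_crit * s * np.sqrt(1 + dist),   # prediction interval for individual y
      y0_hat + t_crit * s * np.sqrt(1 + dist))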
15.7 The Sales Representative Case:
Evaluating Employee Performance
y  = Yearly sales of the company's product
x1 = Number of months the representative has been employed
x2 = Sales of products in the sales territory
x3 = Dollar advertising expenditure in the territory
x4 = Weighted average of the company's market share in the territory for the previous four years
x5 = Change in the company's market share in the territory over the previous four years

15-23
Excel Output of a Regression Analysis of
the Sales Representative Performance Data

Figure 15.11 (a) 15-24


LO15-10: Use residual analysis to check the assumptions of multiple regression.
15.11 Residual Analysis in Multiple Regression
• For an observed value yi, the residual is
ei = yi - ŷi = yi - (b0 + b1xi1 + … + bkxik)
• If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance σ2

15-25
LO15-10

Residual Plots
• Residuals versus each independent variable
• Residuals versus predicted y's
• Residuals in time order (if the response is a time series)

Figures 15.32 and 15.33 15-26
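A minimal Matplotlib sketch of the three plots listed above (simulated placeholder data, not the textbook's Figures 15.32 and 15.33):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ b
e = y - y_hat                                   # residuals

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(X1[:, 1], e)                    # residuals vs the independent variable x1
axes[0].set(xlabel="x1", ylabel="residual")
axes[1].scatter(y_hat, e)                       # residuals vs predicted y
axes[1].set(xlabel="predicted y", ylabel="residual")
axes[2].plot(e, marker="o")                     # residuals in observation (time) order
axes[2].set(xlabel="observation order", ylabel="residual")
for ax in axes:
    ax.axhline(0, color="gray", linewidth=1)    # reference line at zero
plt.tight_layout()
plt.show()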
