
Multiple Regression

The Multiple Regression Model and the


Least Squares Point Estimate
• Simple linear regression used one independent variable to explain the dependent variable
◦ Some relationships are too complex to be described using a single independent variable
• Multiple regression uses two or more independent variables to describe the dependent variable
◦ This allows multiple regression models to handle more complex situations
◦ There is no limit to the number of independent variables a model can use
• Multiple regression has only one dependent variable

15-2
The Multiple Regression Model
• The linear regression model relating y to x1, x2,…, xk is
y = β0 + β1x1 + β2x2 + … + βkxk + ε
• µy = β0 + β1x1 + β2x2 + … + βkxk is the mean value of the dependent variable y when the values of the independent variables are x1, x2,…, xk
• β0, β1, β2,…, βk are the unknown regression parameters relating the mean value of y to x1, x2,…, xk
• ε is an error term that describes the effects on y of all factors other than the independent variables x1, x2,…, xk
15-3
The Least Squares Estimates and Point
Estimation and Prediction
• Estimation/prediction equation:
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
is the point estimate of the mean value of the dependent variable when the values of the independent variables are x01, x02,…, x0k
• It is also the point prediction of an individual value of the dependent variable when the values of the independent variables are x01, x02,…, x0k
• b0, b1, b2,…, bk are the least squares point estimates of the parameters β0, β1, β2,…, βk
• x01, x02,…, x0k are specified values of the independent predictor variables x1, x2,…, xk
• We will use software to find the least squares point estimates (a minimal sketch follows this slide)
15-4
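To make the least squares computation concrete, here is a minimal NumPy sketch; the data values are illustrative placeholders, not the Tasty Sub Shop data.

import numpy as np

# Illustrative placeholder data: n = 5 observations, k = 2 independent variables
y = np.array([12.1, 15.3, 14.0, 18.2, 20.5])
X = np.array([[2.0, 1.1],
              [3.1, 1.9],
              [2.8, 1.5],
              [4.0, 2.2],
              [4.5, 2.8]])

X1 = np.column_stack([np.ones(len(y)), X])  # prepend a column of 1s for the intercept b0
b, *_ = np.linalg.lstsq(X1, y, rcond=None)  # least squares point estimates b0, b1, b2
y_hat = X1 @ b                              # point estimates / point predictions ŷ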
LO15-1
EXAMPLE 15.1 The Tasty Sub Shop
Case

Figure 15.4 (Part) 15-5


LO15-2: Explain the
assumptions behind
multiple regression and
calculate the standard
error.
15.2 Model Assumptions and the
Standard Error
• The model is
y = β0 + β1x1 + β2x2 + … + βkxk + ε
• The assumptions for multiple regression are stated about the model error terms, the ε's

15-6
LO15-2

The Regression Model Assumptions Continued

1. Mean of Zero Assumption
The mean of the error terms is equal to 0
2. Constant Variance Assumption
The variance of the error terms, σ2, is the same for every combination of values of x1, x2,…, xk
3. Normality Assumption
The error terms follow a normal distribution for every combination of values of x1, x2,…, xk
4. Independence Assumption
The values of the error terms are statistically independent of each other

15-7
LO15-2

Sum of Squares
• Sum of squared errors
SSE = Σei2 = Σ(yi - ŷi)2
• Mean squared error
◦ Point estimate of the residual variance σ2
◦ This formula is slightly different from simple regression
s2 = MSE = SSE / (n - (k+1))
• Standard error
◦ Point estimate of the residual standard deviation σ
◦ This formula too is slightly different from simple regression
s = √MSE = √(SSE / (n - (k+1)))

15-8
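As a check on these formulas, a short NumPy sketch (simulated placeholder data, not from the text):

import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # column of 1s, then x1, x2
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)

e = y - X1 @ b                 # residuals ei = yi - ŷi
SSE = np.sum(e**2)             # sum of squared errors
MSE = SSE / (n - (k + 1))      # s², the point estimate of σ²
s = np.sqrt(MSE)               # standard error, the point estimate of σ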
LO15-3: Calculate and
interpret the multiple
and adjusted multiple
coefficients of
determination.
15.3 R2 and Adjusted R2
1. Total variation is given by the formula
Σ(yi - ȳ)2
2. Explained variation is given by the formula
Σ(ŷi - ȳ)2
3. Unexplained variation is given by the formula
Σ(yi - ŷi)2
4. Total variation is the sum of explained and
unexplained variation

This section can be covered anytime after reading Section 15.1
15-9
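A short NumPy sketch of the three variations listed in 1–3 above and their relationship (simulated placeholder data):

import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ b

total = np.sum((y - y.mean())**2)           # 1. total variation
explained = np.sum((y_hat - y.mean())**2)   # 2. explained variation
unexplained = np.sum((y - y_hat)**2)        # 3. unexplained variation
# 4. total ≈ explained + unexplained (up to floating point rounding)
R2 = explained / total                      # multiple coefficient of determination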
LO15-3

R2 and Adjusted R2 Continued


5. The multiple coefficient of determination is
the ratio of explained variation to total
variation
6. R2 is the proportion of the total variation
that is explained by the overall regression
model
7. Multiple correlation coefficient R is the
square root of R2

15-10
LO15-3

Multiple Correlation Coefficient R


• The multiple correlation coefficient R is just the square root of R2
• With simple linear regression, r would take on the sign of b1
• There are multiple bi's with multiple regression
• For this reason, R is always positive
• To interpret the direction of the relationship between the x's and y, you must look to the sign of the appropriate bi coefficient

15-11
LO15-3

The Adjusted R2
• Adding an independent variable to multiple regression will raise R2
• R2 will rise slightly even if the new variable has no relationship to y
• The adjusted R2 corrects this tendency in R2
• As a result, it gives a better estimate of the importance of the independent variables
• The adjusted multiple coefficient of determination is
adjusted R2 = (R2 - k/(n-1)) × ((n-1)/(n-(k+1)))
15-12
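The adjusted R² formula translates directly into code; this tiny sketch uses made-up values for R², n, and k:

# Adjusted R² from R², n observations, and k independent variables (sketch)
def adjusted_r2(R2, n, k):
    return (R2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))

print(adjusted_r2(0.90, 30, 2))   # ≈ 0.8926: slightly below R² = 0.90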
LO15-4: Test the significance of a multiple regression model by using an F test.
15.4 The Overall F Test
• To test
H0: β1 = β2 = … = βk = 0 versus
Ha: At least one of β1, β2,…, βk ≠ 0
• The test statistic is
F(model) = (Explained variation / k) / (Unexplained variation / [n - (k+1)])
• Reject H0 in favor of Ha if F(model) > Fα or p-value < α
• Fα is based on k numerator and n-(k+1) denominator degrees of freedom (a computational sketch follows this slide)

15-13
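A sketch of the F test, with the p-value taken from the F distribution via SciPy (simulated placeholder data):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ b

explained = np.sum((y_hat - y.mean())**2)
unexplained = np.sum((y - y_hat)**2)
F = (explained / k) / (unexplained / (n - (k + 1)))   # F(model)
p_value = stats.f.sf(F, k, n - (k + 1))               # area to the right of F
# Reject H0 if p_value < α (e.g., α = 0.05)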
LO15-5: Test the
significance of a single
independent variable.
15.5 Testing the Significance of an
Independent Variable
• A variable in a multiple regression model is not likely to be useful unless there is a significant relationship between it and y
• To test significance, we use the null hypothesis H0: βj = 0
• versus the alternative hypothesis Ha: βj ≠ 0

15-14
LO15-5
Testing Significance of an Independent
Variable #2
Alternative      Reject H0 if     p-value
Ha: βj > 0       t > tα           Area under t distribution to the right of t
Ha: βj < 0       t < -tα          Area under t distribution to the left of t
Ha: βj ≠ 0       |t| > tα/2 *     Twice the area under t distribution to the right of |t|

* That is, t > tα/2 or t < -tα/2

15-15
LO15-5
Testing Significance of an Independent
Variable #3
• Test statistic
t = bj / sbj
• 100(1-α)% confidence interval for βj:
[bj ± tα/2 sbj]
• tα, tα/2, and p-values are based on n-(k+1) degrees of freedom (see the sketch below)

15-16
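A sketch of the t statistics for all coefficients at once; the standard errors sbj come from the diagonal of s²(XᵀX)⁻¹ (simulated placeholder data):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)

df = n - (k + 1)
s = np.sqrt(np.sum((y - X1 @ b)**2) / df)             # standard error s
s_b = s * np.sqrt(np.diag(np.linalg.inv(X1.T @ X1)))  # sbj for b0, b1, …, bk
t = b / s_b                                           # t = bj / sbj
p = 2 * stats.t.sf(np.abs(t), df)                     # two-sided p-values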
LO15-5
Testing Significance of an Independent
Variable #4
• It is customary to test the significance of every independent variable in a regression model
• If we can reject H0: βj = 0 at the 0.05 level of significance, we have strong evidence that the independent variable xj is significantly related to y
• If we can reject H0: βj = 0 at the 0.01 level of significance, we have very strong evidence that the independent variable xj is significantly related to y
• The smaller the significance level α at which H0 can be rejected, the stronger the evidence that xj is significantly related to y
15-17
LO15-5
A Confidence Interval for the
Regression Parameter βj
• If the regression assumptions hold, a 100(1-α)% confidence interval for βj is [bj ± tα/2 sbj]
• tα/2 is based on n - (k+1) degrees of freedom (see the sketch below)

15-18
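Given bj, sbj, and the degrees of freedom, the interval is two lines of code; the numbers below are illustrative placeholders:

import numpy as np
from scipy import stats

b = np.array([5.02, 1.98, -1.05])    # placeholder least squares estimates
s_b = np.array([0.09, 0.10, 0.08])   # placeholder standard errors sbj
df = 27                              # n - (k+1) with n = 30, k = 2

t_crit = stats.t.ppf(0.975, df)      # tα/2 for a 95% interval
lower, upper = b - t_crit * s_b, b + t_crit * s_b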
LO15-6: Find and interpret a confidence interval for a mean value and a prediction interval for an individual value.
15.6 Confidence and Prediction Intervals
• The point on the regression line corresponding to particular values x01, x02,…, x0k of the independent variables is
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
• It is unlikely that this value will equal the mean value of y for these x values
• Therefore, we need to place bounds on how far the predicted value might be from the actual value
• We can do this by calculating a confidence interval for the mean value of y and a prediction interval for an individual value of y

15-19
LO15-6

Distance Value
• Both the confidence interval for the mean value of y and the prediction interval for an individual value of y employ a quantity called the distance value
• With simple regression, we were able to calculate the distance value fairly easily
• However, for multiple regression, calculating the distance value requires matrix algebra
• For that reason, we use software (though a matrix sketch follows this slide)

15-20
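In matrix terms the distance value at a point x0 = (1, x01, …, x0k) is x0ᵀ(XᵀX)⁻¹x0, where X has rows (1, xi1, …, xik); this is the standard identification, sketched below with placeholder data:

import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # rows (1, xi1, xi2)

x0 = np.array([1.0, 0.5, -0.3])   # 1 followed by the specified values x01, x02
distance_value = x0 @ np.linalg.inv(X1.T @ X1) @ x0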
LO15-6
A Confidence Interval for a Mean
Value of y
• Assume the regression assumptions hold
• The formula for a 100(1-α)% confidence interval for the mean value of y is
[ŷ ± tα/2 s √(Distance value)]
• This is based on n-(k+1) degrees of freedom

15-21
LO15-6
A Prediction Interval for an Individual
Value of y
• Assume the regression assumptions hold
• The formula for a 100(1-α)% prediction interval for an individual value of y is
[ŷ ± tα/2 s √(1 + Distance value)]
• This is based on n-(k+1) degrees of freedom (both intervals are sketched after this slide)

15-22
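Putting the pieces together, a sketch of both intervals at a point x0 (simulated placeholder data, not from the text):

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)

df = n - (k + 1)
s = np.sqrt(np.sum((y - X1 @ b)**2) / df)        # standard error
x0 = np.array([1.0, 0.5, -0.3])
dist = x0 @ np.linalg.inv(X1.T @ X1) @ x0        # distance value
y0_hat = x0 @ b                                  # point estimate/prediction ŷ
t_crit = stats.t.ppf(0.975, df)                  # 95% intervals

ci = (y0_hat - t_crit * s * np.sqrt(dist),       # confidence interval for mean y
      y0_hat + t_crit * s * np.sqrt(dist))
pi = (y0_hat - t_crit * s * np.sqrt(1 + dist),   # prediction interval for individual y
      y0_hat + t_crit * s * np.sqrt(1 + dist))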
15.7 The Sales Representative Case:
Evaluating Employee Performance
y  = Yearly sales of the company's product
x1 = Number of months the representative has been employed
x2 = Sales of products in the sales territory
x3 = Dollar advertising expenditure in the territory
x4 = Weighted average of the company's market share in the territory for the previous four years
x5 = Change in the company's market share in the territory over the previous four years

15-23
Excel Output of a Regression Analysis of
the Sales Representative Performance Data

Figure 15.11 (a) 15-24


LO15-10: Use residual analysis to check the assumptions of multiple regression.
15.11 Residual Analysis in Multiple Regression
• For an observed value yi, the residual is
ei = yi - ŷi = yi - (b0 + b1xi1 + … + bkxik)
• If the regression assumptions hold, the residuals should look like a random sample from a normal distribution with mean 0 and variance σ2

15-25
LO15-10

Residual Plots
• Residuals versus each independent variable
• Residuals versus predicted y's
• Residuals in time order (if the response is a time series)

Figures 15.32 and 15.33 15-26
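A minimal Matplotlib sketch of the three plots listed above (simulated placeholder data, not the textbook's Figures 15.32 and 15.33):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, k = 30, 2
X1 = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X1 @ np.array([5.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ b
e = y - y_hat                                   # residuals

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].scatter(X1[:, 1], e)                    # residuals vs the independent variable x1
axes[0].set(xlabel="x1", ylabel="residual")
axes[1].scatter(y_hat, e)                       # residuals vs predicted y
axes[1].set(xlabel="predicted y", ylabel="residual")
axes[2].plot(e, marker="o")                     # residuals in observation (time) order
axes[2].set(xlabel="observation order", ylabel="residual")
for ax in axes:
    ax.axhline(0, color="gray", linewidth=1)    # reference line at zero
plt.tight_layout()
plt.show()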
