
Inference for Regression Equations

In a beginning course in statistics, the computational formulas for inference in regression settings are most often simply given to the students. Some attempt is made to illustrate why the components of each formula make sense, but the derivations are "beyond the scope of the course." For advanced students, however, these formulas become applications of the expected value theorems studied earlier in the year. To derive the regression inference equations, students must remember that $\operatorname{Var}(kX) = k^2 \operatorname{Var}(X)$ and that, when $X$ and $Y$ are independent, $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$. Finally, for the mean $\bar{X}$ of $n$ independent observations,
$$\operatorname{Var}\left(\bar{X}\right) = \frac{\operatorname{Var}(X)}{n}.$$
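These facts are easy to spot-check numerically. The following is a minimal simulation sketch (not part of the original derivation; NumPy is assumed, and the distributions, constants, and sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 3.0, 1_000_000

X = rng.normal(loc=2.0, scale=1.5, size=n)   # Var(X) = 1.5^2 = 2.25
Y = rng.normal(loc=-1.0, scale=2.0, size=n)  # independent of X, Var(Y) = 4.0

print(np.var(k * X), k**2 * np.var(X))       # Var(kX) = k^2 Var(X)
print(np.var(X + Y), np.var(X) + np.var(Y))  # Var(X+Y) = Var(X) + Var(Y)

# Var(X-bar) = Var(X)/n: spread of the mean of 25 independent draws
sample_means = rng.normal(2.0, 1.5, size=(100_000, 25)).mean(axis=1)
print(np.var(sample_means), 1.5**2 / 25)
```

Each printed pair should agree up to simulation error.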

In addition, the Modeling Assumptions for Regression are:

1. There is a normally distributed subpopulation of responses for each value of the explanatory variable. These subpopulations all have a common variance. So, $y \mid x \sim N\left(\mu_{y|x},\, \sigma_e\right)$.

2. The means of the subpopulations fall on a straight-line function of the explanatory variable. This means that $\mu_{y|x} = \alpha + \beta x$ and that $\hat{y} = a + bx$ estimates the mean response for a given value of the explanatory variable. Another way to describe this is to say that $Y = \alpha + \beta X + \varepsilon$ with $\varepsilon \sim N(0, \sigma_e)$.

[Figure: graphical representation of the regression assumptions.]

3. The selection of an observation from any of the subpopulations is independent of the selection of any other observation. The values of the explanatory variable are assumed to be fixed. This fixed (and known) value for the independent variable is essential for developing the formulae.
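To make these assumptions concrete, here is a small simulation sketch of data generated according to the model $Y = \alpha + \beta X + \varepsilon$ (the parameter values and design are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma_e = 1.0, 2.5, 0.8          # arbitrary illustrative parameters

x = np.repeat(np.arange(1.0, 6.0), 50)        # fixed, known x values
eps = rng.normal(0.0, sigma_e, size=x.size)   # eps ~ N(0, sigma_e), independent draws
y = alpha + beta * x + eps                    # Y = alpha + beta*X + eps

# Each subpopulation (fixed x) is normal around mu_{y|x} with a common spread.
for xv in np.unique(x):
    sub = y[x == xv]
    print(xv, sub.mean(), sub.std(ddof=1))    # mean ~ alpha + beta*xv, sd ~ sigma_e
```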

The key to understanding the various standard errors for regression is to realize that the variation of interest comes from the distribution of $y$ around $\mu_{y|x}$. This is $\varepsilon \sim N(0, \sigma_e)$.

From our initial work on regression, we saw that $\hat{y} = a + bx$ and $\hat{y} = \bar{y} + b(x - \bar{x})$. Now, if we let $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$, then
$$b = \frac{\sum_i X_i Y_i}{\sum_i X_i^2}.$$
All of the regression equations originate with this computational formula for $b$.

To see that this is true, consider $\hat{y} = \bar{y} + b(x - \bar{x})$. In this form, we have a one-variable problem. Since we know all the individual values of $x$ and $y$ and, consequently, the means $\bar{x}$ and $\bar{y}$, we can use first-semester calculus to solve for $b$. Define
$$S = \sum_{i=1}^{n} (y_i - \hat{y})^2 = \sum_{i=1}^{n} \left(y_i - \left(\bar{y} + b(x_i - \bar{x})\right)\right)^2.$$
Now, let $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$, so
$$S = \sum_{i=1}^{n} (Y_i - bX_i)^2.$$
Find the value of $b$ that minimizes $S$:
$$\frac{dS}{db} = \sum_{i=1}^{n} -2(Y_i - bX_i)(X_i).$$
If $\frac{dS}{db} = 0$, then $-\sum_{i=1}^{n} X_i Y_i + b\sum_{i=1}^{n} X_i^2 = 0$, so $b\sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{n} X_i Y_i$. Solving for $b$, we find
$$b = \frac{\sum_i X_i Y_i}{\sum_i X_i^2}.$$
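A quick numerical check of this formula, as a sketch (the data values are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = x - x.mean()                    # X_i = x_i - x-bar
Y = y - y.mean()                    # Y_i = y_i - y-bar

b = (X * Y).sum() / (X**2).sum()    # b = sum(X_i Y_i) / sum(X_i^2)
a = y.mean() - b * x.mean()         # from y-hat = y-bar + b(x - x-bar)

print(b, a)
print(np.polyfit(x, y, 1))          # least-squares [slope, intercept]; should agree
```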

The Standard Error for the Slope

To compute a confidence interval for $\beta$, we need to determine the variance of $b$, using the expected value theorems.

Since $b = \dfrac{\sum_i X_i Y_i}{\sum_i X_i^2}$, we compute $\operatorname{Var}(b) = \operatorname{Var}\!\left(\dfrac{\sum_i X_i Y_i}{\sum_i X_i^2}\right)$. Since the values of $X$ are assumed to be fixed, $\sum_i X_i^2$ in the denominator is a constant. So,
$$\operatorname{Var}\!\left(\frac{\sum_i X_i Y_i}{\sum_i X_i^2}\right) = \left(\frac{1}{\sum_i X_i^2}\right)^{\!2} \operatorname{Var}\!\left(\sum_i X_i Y_i\right),$$
and $\operatorname{Var}\!\left(\sum_i X_i Y_i\right) = \operatorname{Var}\left(X_1 Y_1 + X_2 Y_2 + \cdots + X_n Y_n\right)$. The $X$'s are constants, and we are interested in the variation of $Y$ for the given $X$, which is the common variance $\sigma_e^2$. So,
$$\operatorname{Var}\left(X_1 Y_1 + X_2 Y_2 + \cdots + X_n Y_n\right) = X_1^2 \operatorname{Var}(Y_1) + X_2^2 \operatorname{Var}(Y_2) + \cdots + X_n^2 \operatorname{Var}(Y_n) = \sigma_e^2 \sum_i X_i^2.$$
Putting it all together, we find
$$\operatorname{Var}(b) = \left(\frac{1}{\sum_i X_i^2}\right)^{\!2} \sigma_e^2 \sum_i X_i^2 = \frac{\sigma_e^2}{\sum_i X_i^2}.$$
This is often written as
$$\operatorname{Var}(b) = \frac{\sigma_e^2}{\sum_i (x_i - \bar{x})^2}.$$
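This result can also be checked by simulation: hold the $x$'s fixed, regenerate the errors many times, and look at the spread of the resulting slopes. A sketch (parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, sigma_e = 1.0, 2.5, 0.8
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # fixed design, as the model assumes
X = x - x.mean()                          # X_i = x_i - x-bar

# Regenerate the errors 100,000 times, holding the x's fixed.
y = alpha + beta * x + rng.normal(0.0, sigma_e, size=(100_000, x.size))
slopes = (y - y.mean(axis=1, keepdims=True)) @ X / (X**2).sum()

print(np.var(slopes), sigma_e**2 / (X**2).sum())   # should be close
```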

So, the standard error for the slope in regression can be estimated by
$$s_{\hat{b}} = \frac{s_e}{\sqrt{\sum_i (x_i - \bar{x})^2}} \quad \text{or} \quad s_{\hat{b}} = \frac{s_e}{\sqrt{n-1}\; s_x}.$$
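In practice $s_e$ is the residual standard error (computed with an $n - 2$ denominator, since two parameters were estimated). A sketch comparing the formula to the slope standard error reported by scipy.stats.linregress (data made up for illustration):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

res = stats.linregress(x, y)
resid = y - (res.intercept + res.slope * x)
s_e = np.sqrt((resid**2).sum() / (len(x) - 2))   # residual standard error

s_b = s_e / np.sqrt(((x - x.mean())**2).sum())
print(s_b, res.stderr)                           # the two should agree
```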

The Standard Error for $\hat{y}$, the Predicted Mean

Confidence intervals for a predicted mean can now be obtained. The standard error can be determined by computing $\operatorname{Var}(\hat{y})$. We know that $\hat{y} = \bar{y} + b(x - \bar{x})$, so, as before, using the expected value theorems, we find
$$\operatorname{Var}(\hat{y}) = \operatorname{Var}\bigl(\bar{y} + b(x - \bar{x})\bigr) = \operatorname{Var}(\bar{y}) + (x - \bar{x})^2 \operatorname{Var}(b),$$
with $\operatorname{Var}(b) = \dfrac{\sigma_e^2}{\sum_i (x_i - \bar{x})^2}$ and
$$\operatorname{Var}(\bar{y}) = \operatorname{Var}\!\left(\frac{\sum_i y_i}{n}\right) = \left(\frac{1}{n}\right)^{\!2} \operatorname{Var}\!\left(\sum_i y_i\right) = \frac{n \sigma_e^2}{n^2} = \frac{\sigma_e^2}{n}.$$
So,
$$\operatorname{Var}(\hat{y}) = \operatorname{Var}\bigl(\bar{y} + b(x - \bar{x})\bigr) = \frac{\sigma_e^2}{n} + \frac{\sigma_e^2 (x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}.$$

The standard error for predicting a mean response for a given value of $x$ can be estimated by
$$s_{\hat{y}} = s_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}.$$
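Continuing the same illustrative data, a sketch of this estimate at a chosen value of $x$ (here $x = 3.5$, an arbitrary choice):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

Sxx = ((x - x.mean())**2).sum()
b = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
a = y.mean() - b * x.mean()
s_e = np.sqrt(((y - (a + b * x))**2).sum() / (n - 2))

x0 = 3.5                                               # arbitrary prediction point
s_yhat = s_e * np.sqrt(1.0/n + (x0 - x.mean())**2 / Sxx)
print(a + b * x0, s_yhat)                              # predicted mean and its SE
```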

The Standard Error for the Intercept

The variance of the intercept $a$ can be estimated using the previous formula for the standard error for $\hat{y}$. Since $\hat{y} = a + bx$, the variance of $a$ is the variance of $\hat{y}$ when $x = 0$. So,
$$s_a = s_e \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}.$$
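Setting $x = 0$ in the previous sketch reproduces $s_a$. Recent versions of scipy.stats.linregress also report this quantity directly as intercept_stderr (treat the availability of that attribute as an assumption of this sketch):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

res = stats.linregress(x, y)
s_e = np.sqrt(((y - (res.intercept + res.slope * x))**2).sum() / (n - 2))

s_a = s_e * np.sqrt(1.0/n + x.mean()**2 / ((x - x.mean())**2).sum())
print(s_a, res.intercept_stderr)   # should agree
```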

The Standard Error for a Predicted Value

Finally, to predict a y-value, $y_p$, for a given $x$, we need to consider two independent errors. We know that $y$ is normally distributed around $\mu_{y|x}$, so $y \mid x \sim N\left(\mu_{y|x},\, \sigma_e\right)$. Given $\mu_{y|x}$, we can estimate our error in predicting $y$. But, as we have just seen, there is also variation in our predictions of $\mu_{y|x}$. First, we predict $\hat{y}$, taking into account its own variation, and then we use that prediction in predicting $y$. So
$$\operatorname{Var}(y_p) = \operatorname{Var}(\hat{y}) + \operatorname{Var}(\varepsilon) = \left(\frac{\sigma_e^2}{n} + \frac{\sigma_e^2 (x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right) + \sigma_e^2 = \sigma_e^2 \left(1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}\right).$$

The standard error for this prediction can be estimated with
$$s_{y_p} = s_e \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}.$$
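A sketch computing this prediction standard error and a rough 95% prediction interval, using a t critical value with $n - 2$ degrees of freedom (same made-up data as before):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

Sxx = ((x - x.mean())**2).sum()
b = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
a = y.mean() - b * x.mean()
s_e = np.sqrt(((y - (a + b * x))**2).sum() / (n - 2))

x0 = 3.5                                               # arbitrary new x
s_yp = s_e * np.sqrt(1.0 + 1.0/n + (x0 - x.mean())**2 / Sxx)

t = stats.t.ppf(0.975, df=n - 2)
yhat = a + b * x0
print(yhat - t * s_yp, yhat + t * s_yp)                # ~95% prediction interval
```

Note that $s_{y_p}$ is always larger than $s_{\hat{y}}$ at the same $x$, since a single observation carries the extra $\sigma_e^2$ term.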

Now we have all the equations found in the texts.

Standard error for the Slope: $s_{\hat{b}} = \dfrac{s_e}{\sqrt{n-1}\; s_x}$

Standard error for the Predicted Mean: $s_{\hat{y}} = s_e \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

Standard error for the Intercept: $s_a = s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}$

Standard error for a Predicted Value: $s_{y_p} = s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}$

Reference: Kennedy, John B. and Adam M. Neville, Basic Statistical Methods for Engineers and Scientists, 3rd ed., Harper and Row, 1986.