
Brief Notes #11

Linear Regression

(a) Simple Linear Regression

• Model: Y = β0 + β1 g(X) + ε, with ε ~ (0, σ²)

⇒ Y = β0 + β1 X + ε  (taking g(X) = X)

• Data: (Xi, Yi), with i = 1, … , n

Yi = β0 + β1 Xi + εi

[Figure: scatter plot with the fitted line Ŷ = β̂0 + β̂1X, showing at a point xi the decomposition of the deviation (yi − ȳ) into the explained part (ŷi − ȳ) plus the residual (yi − ŷi).]

• Least Squares Estimation of (β0, β1)

Find (β̂0, β̂1) such that

$$\sum_i (Y_i - \hat{Y}_i)^2 = \min, \qquad \text{where } \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i .$$
• Solution

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = \bar{Y} - \frac{S_{XY}}{S_{XX}}\,\bar{X}, \qquad \hat{\beta}_1 = \frac{S_{XY}}{S_{XX}},$$

where

$$\bar{X} = \frac{1}{n}\sum_i X_i, \qquad \bar{Y} = \frac{1}{n}\sum_i Y_i,$$

$$S_{XX} = \sum_i (X_i - \bar{X})^2, \qquad S_{XY} = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}).$$
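These closed-form estimates are easy to compute directly; a minimal numerical sketch (the function name fit_simple_ols and the variable names are illustrative, not from the notes):

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form least squares for Y = beta0 + beta1*X + eps."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    s_xx = np.sum((x - x.mean()) ** 2)              # S_XX
    s_xy = np.sum((x - x.mean()) * (y - y.mean()))  # S_XY
    b1 = s_xy / s_xx                                # beta1_hat = S_XY / S_XX
    b0 = y.mean() - b1 * x.mean()                   # beta0_hat = Ybar - beta1_hat * Xbar
    return b0, b1
```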

• Properties of $\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix}$ for εi ~ iid N(0, σ²)

$$\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix} \sim N\left( \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix},\; \sigma^2 \begin{bmatrix} \dfrac{1}{n} + \dfrac{\bar{X}^2}{S_{XX}} & -\dfrac{\bar{X}}{S_{XX}} \\[6pt] -\dfrac{\bar{X}}{S_{XX}} & \dfrac{1}{S_{XX}} \end{bmatrix} \right)$$
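Reading off the diagonal gives the variances behind the standard errors used later (a small step the notes leave implicit):

$$\operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{XX}}, \qquad \operatorname{Var}(\hat{\beta}_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{X}^2}{S_{XX}}\right);$$

replacing σ² by its estimate MSe yields the quantity $\sqrt{MS_e/S_{XX}}$ that appears in the t-statistic for the slope below.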

• Properties of Residuals, $e_i = Y_i - \hat{Y}_i$

- $\sum_i e_i = \sum_i e_i X_i = \sum_i e_i \hat{Y}_i = 0$

- $SS_e = \sum_i (Y_i - \hat{Y}_i)^2$ = the residual sum of squares.

$$\frac{SS_e}{\sigma^2} \sim \chi^2_{n-2}, \qquad E[SS_e] = \sigma^2 (n-2)$$

$$\Rightarrow \hat{\sigma}^2 = SS_e/(n-2) = MS_e \ \text{(mean square error)}$$

$$\hat{\sigma} = \sqrt{MS_e} = \text{“standard error of regression”}$$
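Continuing the sketch above, these residual quantities follow directly from the fit (again, names are illustrative):

```python
import numpy as np

def residual_summary(x, y, b0, b1):
    """Residuals, SS_e, and the standard error of regression sqrt(MS_e)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    e = y - (b0 + b1 * x)              # e_i = Y_i - Yhat_i
    ss_e = np.sum(e ** 2)              # residual sum of squares
    mse = ss_e / (len(y) - 2)          # sigma^2 hat = SS_e / (n - 2)
    return e, ss_e, np.sqrt(mse)       # sqrt(MS_e) = standard error of regression
```

For example, e, ss_e, s = residual_summary(x, y, *fit_simple_ols(x, y)) chains it with the earlier sketch.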

• Significance of Regression

Let $S_{YY} = \sum_i (Y_i - \bar{Y})^2$ = total sum of squares.

Property: SYY = SSe + SSR,


where $SS_e = \sum_i (Y_i - \hat{Y}_i)^2$ = residual sum of squares,

$SS_R = \sum_i (\hat{Y}_i - \bar{Y})^2$ = sum of squares explained by the regression.

Also SSe and SSR are statistically independent.


Notice: if β1 = 0, then $\dfrac{SS_R}{\sigma^2} \sim \chi^2_1$.

Definition:

$$R^2 = \frac{SS_R}{S_{YY}} = 1 - \frac{SS_e}{S_{YY}},$$

the coefficient of determination of the regression.
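As a quick numerical check of the decomposition and of R² (a sketch; names are illustrative):

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = SS_R / S_YY = 1 - SS_e / S_YY for fitted values y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    s_yy = np.sum((y - y.mean()) ** 2)      # total sum of squares
    ss_r = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
    ss_e = np.sum((y - y_hat) ** 2)         # residual sum of squares
    # S_YY = SS_R + SS_e holds for a least-squares fit with intercept:
    assert np.isclose(s_yy, ss_r + ss_e)
    return ss_r / s_yy
```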

• Hypothesis Testing for the Slope β1

1. H0: β1 = β10 against H1: β1 ≠ β10 (t-test)

Property: $t(\beta_1) = \dfrac{\hat{\beta}_1 - \beta_1}{\sqrt{MS_e / S_{XX}}} \sim t_{n-2}$

⇒ Accept H0 at significance level α if $|t(\beta_{10})| < t_{n-2,\,\alpha/2}$ (a numerical sketch of both this test and the F-test follows the equivalence remark below).

2. H0: β1 = 0 against H1: β1 ≠ 0 (F-test)

From distributional properties and independence of SSR and SSe, and under H0,

$$F = \frac{SS_R / 1}{SS_e / (n-2)} \sim F_{1,\,n-2}$$

⇒ Accept H0 if $F < F_{1,\,n-2,\,\alpha}$

Notice that for H0: β1 = 0, the t-test and the F-test are equivalent.
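To see the equivalence, note that $\hat{Y}_i - \bar{Y} = \hat{\beta}_1 (X_i - \bar{X})$, so $SS_R = \hat{\beta}_1^2 S_{XX}$ and

$$t(0)^2 = \frac{\hat{\beta}_1^2}{MS_e/S_{XX}} = \frac{SS_R/1}{SS_e/(n-2)} = F,$$

while $t^2_{n-2,\,\alpha/2} = F_{1,\,n-2,\,\alpha}$, so the two acceptance regions coincide. A sketch of both tests (function and variable names are illustrative, not from the notes):

```python
import numpy as np
from scipy import stats

def slope_tests(x, y, alpha=0.05):
    """t-test and F-test of H0: beta1 = 0; t**2 equals F, so both agree."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    s_xx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / s_xx
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x
    mse = np.sum((y - y_hat) ** 2) / (n - 2)        # SS_e / (n - 2)
    t = b1 / np.sqrt(mse / s_xx)                    # t(0)
    F = np.sum((y_hat - y.mean()) ** 2) / mse       # (SS_R / 1) / MS_e
    accept_t = abs(t) < stats.t.ppf(1 - alpha / 2, n - 2)
    accept_F = F < stats.f.ppf(1 - alpha, 1, n - 2)
    return t, F, accept_t, accept_F                 # accept_t == accept_F
```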

(b) Multiple Linear Regression

• Model: $Y = \beta_0 + \sum_{j=1}^{k} \beta_j g_j(X) + \varepsilon$, with ε ~ (0, σ²)

⇒ $Y = \beta_0 + \sum_{j=1}^{k} \beta_j X_j + \varepsilon$

• Data: (Yi, Xi), with i = 1, … , n

$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ij} + \varepsilon_i, \qquad i = 1, \ldots, n$$

Let
$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \quad \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}, \quad H = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{nk} \end{bmatrix}$$

⇒ $\mathbf{Y} = H\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

• Least Squares Estimation

$$e_i = Y_i - \hat{Y}_i = Y_i - \hat{\beta}_0 - \sum_{j=1}^{k} \hat{\beta}_j X_{ij}$$

$$\mathbf{e} = \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix} = \mathbf{Y} - H\hat{\boldsymbol{\beta}}$$

$$SS_e = \sum_i e_i^2 = \mathbf{e}^T \mathbf{e} = (\mathbf{Y} - H\hat{\boldsymbol{\beta}})^T (\mathbf{Y} - H\hat{\boldsymbol{\beta}}) = \mathbf{Y}^T\mathbf{Y} - 2\mathbf{Y}^T H \hat{\boldsymbol{\beta}} + \hat{\boldsymbol{\beta}}^T H^T H \hat{\boldsymbol{\beta}}$$

$$\frac{dSS_e(\boldsymbol{\beta})}{d\boldsymbol{\beta}} = 0 \quad \Rightarrow \quad \hat{\boldsymbol{\beta}} = (H^T H)^{-1} H^T \mathbf{Y}$$
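In code, the normal equations are one line; a minimal sketch (fit_multiple_ols and its arguments are illustrative names):

```python
import numpy as np

def fit_multiple_ols(X, y):
    """beta_hat = (H^T H)^{-1} H^T Y, with H = [1, X] built from an
    n-by-k regressor matrix X (no intercept column)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    H = np.column_stack([np.ones(len(y)), X])   # design matrix with intercept
    # Solve H^T H beta = H^T y rather than inverting H^T H explicitly;
    # np.linalg.lstsq(H, y) is the numerically safer equivalent.
    return np.linalg.solve(H.T @ H, H.T @ y)
```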

• Properties of β̂ (if εi ~ iid N(0, σ²))

$$\hat{\boldsymbol{\beta}} \sim N\left(\boldsymbol{\beta},\; \sigma^2 (H^T H)^{-1}\right)$$

• Properties of Residuals

$$\frac{SS_e}{\sigma^2} \sim \chi^2_{n-k-1} \quad \Rightarrow \quad \hat{\sigma}^2 = \frac{SS_e}{n-k-1} = MS_e$$

SYY = SSe + SSR

$$R^2 = 1 - \frac{SS_e}{S_{YY}}$$

• Hypothesis Testing

Let $\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{bmatrix}$, where β1 has τ1 components and β2 has τ2 = k − τ1 components. We want to test H0: β2 = 0 against H1: β2 ≠ 0 (at least one component of β2 is non-zero).

The procedure is as follows:

- Fit the complete regression model and calculate SSR and SSe;

- Fit the reduced model with β2 = 0, and calculate $SS_{R_1}$;

- Let $SS_{2|1} = SS_R - SS_{R_1}$ = extra sum of squares due to β2 when β1 is in the regression.

- Distributional property of $SS_{2|1}$. Under H0,

$$\frac{SS_{2|1}}{\sigma^2} \sim \chi^2_{\tau_2}$$

Also, SS2|1 and SSe are independent. Therefore,


$$F = \frac{SS_{2|1} / \tau_2}{SS_e / (n-k-1)} \sim F_{\tau_2,\, n-k-1}$$

⇒ Accept H0 if $F < F_{\tau_2,\, n-k-1,\, \alpha}$
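Putting the procedure together in code, under the assumption that the caller supplies the full and reduced regressor matrices (all names here are illustrative):

```python
import numpy as np
from scipy import stats

def extra_ss_f_test(X_full, X_reduced, y, alpha=0.05):
    """Extra-sum-of-squares F-test of H0: beta2 = 0.
    X_full is n-by-k (all regressors); X_reduced keeps only the
    tau1 columns retained under H0."""
    X_full, X_reduced = np.asarray(X_full, float), np.asarray(X_reduced, float)
    y = np.asarray(y, float)

    def fit(X):
        """(SS_R, SS_e) for the model with intercept plus the columns of X."""
        H = np.column_stack([np.ones(len(y)), X])
        y_hat = H @ np.linalg.solve(H.T @ H, H.T @ y)
        return np.sum((y_hat - y.mean()) ** 2), np.sum((y - y_hat) ** 2)

    ss_r_full, ss_e = fit(X_full)
    ss_r_reduced, _ = fit(X_reduced)
    tau2 = X_full.shape[1] - X_reduced.shape[1]       # components of beta2
    dof = len(y) - X_full.shape[1] - 1                # n - k - 1
    F = ((ss_r_full - ss_r_reduced) / tau2) / (ss_e / dof)
    return F, F < stats.f.ppf(1 - alpha, tau2, dof)   # True -> accept H0
```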
