Simple Linear Regression

Given a random sample $\{(x_i, y_i): i = 1, \dots, n\}$, the OLS estimators of $\beta_1$ and $\beta_0$ are:

$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$$

Fitted value: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$. Residual: $\hat u_i = y_i - \hat y_i$.

To assess the goodness-of-fit of our model we decomposed the total sum of squares SST:

$$\text{SST} = \text{SSE} + \text{SSR}, \qquad \text{SST} = \sum_{i=1}^n (y_i - \bar y)^2, \quad \text{SSE} = \sum_{i=1}^n (\hat y_i - \bar y)^2, \quad \text{SSR} = \sum_{i=1}^n \hat u_i^2$$
We then computed the fraction of the total sum of squares (SST) that is explained by the model, and denoted this the R-squared of the regression: $R^2 = \text{SSE}/\text{SST} = 1 - \text{SSR}/\text{SST}$. One can show this is the squared correlation between $y$ and the fitted values.
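A quick numerical sanity check of the two $R^2$ formulas and of the squared-correlation claim, on simulated data (numpy only; the variable names and the data-generating process are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(size=100)     # made-up true model plus noise

# OLS slope and intercept from the closed-form expressions above
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum(u_hat ** 2)

print(sse / sst, 1 - ssr / sst)              # two equivalent R^2 formulas
print(np.corrcoef(y, y_hat)[0, 1] ** 2)      # squared correlation: same value
```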
To assess whether the distribution of the estimators is concentrated around the true parameters, we added an additional assumption, homoskedasticity: $\text{Var}(u \mid x) = \sigma^2$. Under it:

$$\text{Var}(\hat\beta_1 \mid x) = \frac{\sigma^2}{\text{SST}_x}, \qquad \text{Var}(\hat\beta_0 \mid x) = \frac{\sigma^2 \, n^{-1} \sum_{i=1}^n x_i^2}{\text{SST}_x}, \qquad \text{SST}_x = \sum_{i=1}^n (x_i - \bar x)^2$$
These are conditional variances: we treat the $x$'s as known constants. Problem: we don't know $\sigma^2$. We estimate it from the residuals with $\hat\sigma^2 = \text{SSR}/(n-2)$.
Estimated standard errors (se) of the parameters: the square roots of the estimated variances, substituting the unknown $\sigma^2$ by its estimate $\hat\sigma^2$. This implies

$$\text{se}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\text{SST}_x}}, \qquad \text{se}(\hat\beta_0) = \hat\sigma \sqrt{\frac{n^{-1} \sum_{i=1}^n x_i^2}{\text{SST}_x}}$$
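A minimal sketch of these estimates on simulated data (the data-generating process is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 + 1.5 * x + rng.normal(size=100)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
ssr = np.sum((y - b0 - b1 * x) ** 2)

n = len(y)
sigma2_hat = ssr / (n - 2)                   # estimate of sigma^2
sst_x = np.sum((x - x.mean()) ** 2)
se_b1 = np.sqrt(sigma2_hat / sst_x)          # standard error of the slope
print(sigma2_hat, se_b1)
```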
Multiple Regression Model
OLS Estimators
Let us minimise the residual sum of squares (SSR) given above. Taking derivatives with respect to each $\hat\beta_j$ gives the first order conditions:

$$\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_k x_{ik}) = 0$$
$$\sum_{i=1}^n x_{ij}\,(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_k x_{ik}) = 0, \qquad j = 1, \dots, k$$

We are imposing zero correlation between the regressors and the residuals in the sample; in the population, there is likewise zero correlation between the error term and the regressors.
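These sample orthogonality conditions can be verified numerically; a sketch, assuming a simulated design with a constant and two regressors:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS via least squares
u_hat = y - X @ beta_hat

# First order conditions: each regressor is orthogonal to the residuals
print(X.T @ u_hat)   # numerically ~ [0, 0, 0]
```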
Fitted value: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \dots + \hat\beta_k x_{ik}$. Residual: $\hat u_i = y_i - \hat y_i$.

As before, $R^2 = \text{SSE}/\text{SST} = 1 - \text{SSR}/\text{SST}$, and it equals the squared correlation between $y$ and the fitted values:

$$R^2 = \frac{\left(\sum_{i=1}^n (y_i - \bar y)(\hat y_i - \bar{\hat y})\right)^2}{\left(\sum_{i=1}^n (y_i - \bar y)^2\right)\left(\sum_{i=1}^n (\hat y_i - \bar{\hat y})^2\right)}$$
$R^2$ can never decrease when another independent variable is added to a regression, and it usually increases.
Because $R^2$ usually increases with the number of independent variables, it is not a good measure for comparing models.
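A small simulation illustrating the point: adding a pure-noise regressor (here `noise`, an illustrative name) still nudges $R^2$ up:

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (X includes a constant column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    return 1 - np.sum(u**2) / np.sum((y - y.mean())**2)

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)                    # irrelevant regressor

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise])
print(r_squared(X_small, y), r_squared(X_big, y))  # second is never smaller
```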
In matrix form, the first order conditions are

$$X'(y - X\hat\beta) = 0$$

or

$$X'X\hat\beta = X'y \quad \text{(the normal equations)}$$

which leads to

$$\hat\beta = (X'X)^{-1}X'y$$
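A numerical check that this closed-form solution of the normal equations matches a generic least-squares solver (simulated data):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

# Closed form from the normal equations: beta = (X'X)^(-1) X'y
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_closed, beta_lstsq))   # True
```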
OLS Estimator

$$\hat\beta = (X'X)^{-1}X'y$$

where $y$ is the $n \times 1$ vector of observations on the dependent variable, $X$ is the $n \times (k+1)$ matrix of regressors (whose first column is a column of ones), and $\hat\beta$ is the $(k+1) \times 1$ vector of coefficient estimates.
Assumption MLR.3 (No Perfect Collinearity) implies that $X$ has full column rank ($= k+1$). This means no column of $X$ is a linear combination of the other columns, which in turn implies that $(X'X)^{-1}$ exists.
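A sketch of what perfect collinearity does in practice, using a regressor that is an exact multiple of another (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
x1 = rng.normal(size=n)
x2 = 3 * x1                                   # perfectly collinear with x1
X = np.column_stack([np.ones(n), x1, x2])

print(np.linalg.matrix_rank(X))               # 2, not full column rank (3)
# X'X is singular, so (X'X)^(-1) does not exist:
print(np.linalg.det(X.T @ X))                 # ~ 0 (up to rounding)
```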
Substituting $y = X\beta + u$ gives $\hat\beta = \beta + (X'X)^{-1}X'u$. So, conditional on $X$, using the fact that $E(u \mid X) = 0$ and that the sample is random, we can conclude that:

$$E(\hat\beta \mid X) = \beta \quad \text{(OLS is unbiased)}$$
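Unbiasedness can be illustrated by Monte Carlo: holding $X$ fixed and redrawing the errors, the average of the estimates approaches $\beta$ (a sketch with an assumed design and error distribution):

```python
import numpy as np

rng = np.random.default_rng(11)
n, reps = 100, 2000
beta_true = np.array([1.0, 0.5, -0.3])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed design

estimates = np.empty((reps, 3))
for r in range(reps):
    y = X @ beta_true + rng.normal(size=n)    # new error draw each replication
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print(estimates.mean(axis=0))                 # ~ beta_true (unbiasedness)
```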
Since $X'X$ is symmetric (equal to its transpose), its inverse is also symmetric. Random sampling and homoskedasticity imply $\text{Var}(u \mid X) = \sigma^2 I_n$. Therefore:

$$\text{Var}(\hat\beta \mid X) = (X'X)^{-1}X'\,\text{Var}(u \mid X)\,X(X'X)^{-1} = \sigma^2 (X'X)^{-1}$$
As before, $\sigma^2$ is estimated from the residuals:

$$\hat\sigma^2 = \frac{\text{SSR}}{n - k - 1} = \frac{\text{SSR}}{df}$$

where $df = n - k - 1$ is the degrees of freedom (observations minus estimated parameters).
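Putting the pieces together, a sketch that estimates $\hat\sigma^2$ and the standard errors from $\hat\sigma^2 (X'X)^{-1}$ on simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
df = n - k - 1
sigma2_hat = np.sum(u_hat**2) / df                 # SSR / df

var_beta = sigma2_hat * np.linalg.inv(X.T @ X)     # estimated Var(beta_hat | X)
print(np.sqrt(np.diag(var_beta)))                  # standard errors
```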
Partialling out: consider the fitted regression

$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$$

One can show that

$$\hat\beta_1 = \frac{\sum_{i=1}^n \hat r_{i1}\, y_i}{\sum_{i=1}^n \hat r_{i1}^2}$$

where the $\hat r_{i1}$ are the residuals from the regression $\hat x_1 = \hat\gamma_0 + \hat\gamma_2 x_2$, i.e. the part of $x_1$ not explained by $x_2$. This is the simple-regression formula with $\hat r_{i1}$ instead of the original regressor $x_1$ (note that the average of the residuals is always 0). Therefore, the estimated effect of $x_1$ on $y$ equals the (simple regression) estimated effect of the part of $x_1$ that is not explained by $x_2$.
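A numerical illustration of the partialling-out result on simulated data (all names and coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)                 # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full regression: y on constant, x1, x2
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Partialling out: residuals of x1 on constant and x2
Z = np.column_stack([np.ones(n), x2])
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ x1)
r1 = x1 - Z @ gamma_hat

# Simple regression of y on r1 (r1 has mean zero, no intercept needed)
b1_partial = np.sum(r1 * y) / np.sum(r1**2)
print(beta_hat[1], b1_partial)                     # identical up to rounding
```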
The estimated coefficient on Education is the same as the coefficient from the previous regression (which controls for experience)!
Compare the simple regression of $y$ on $x_1$ alone with the multiple regression on $x_1$ and $x_2$:

$$\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1, \qquad \hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$$

If we run the regression $\hat x_1 = \hat\gamma_0 + \hat\gamma_2 x_2$ and it turns out that $\hat\gamma_2 = 0$, this means that $x_1$ and $x_2$ are uncorrelated. Then the residuals of this equation are simply the deviations of $x_1$ from its mean, $\hat r_{i1} = x_{i1} - \bar x_1$ (check the formula for $\hat\beta_1$ in the two interpretations), which implies that $\tilde\beta_1 = \hat\beta_1$. If $\hat\beta_2 = 0$, it will also be the case that $\tilde\beta_1 = \hat\beta_1$ (check the first order conditions). However, in general $\tilde\beta_1 \neq \hat\beta_1$.
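A simulation of the general case, where $x_1$ and $x_2$ are correlated and the two estimates differ (illustrative data-generating process):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 500
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Simple regression of y on x1 alone (tilde estimate)
b1_tilde = np.sum((x1 - x1.mean()) * (y - y.mean())) / np.sum((x1 - x1.mean())**2)

# Multiple regression of y on x1 and x2 (hat estimates)
X = np.column_stack([np.ones(n), x1, x2])
b_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(b1_tilde, b_hat[1])   # differ, because x1 and x2 are correlated
```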
If we include irrelevant variables in our model, their estimated effects will be zero or close to zero, and the remaining parameter estimates will remain unchanged (check the first order conditions!).
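A quick simulation of this claim, with an irrelevant regressor (`x_irr`, an illustrative name) whose true coefficient is zero:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
x1 = rng.normal(size=n)
x_irr = rng.normal(size=n)                     # irrelevant: no effect on y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_a = np.column_stack([np.ones(n), x1])
X_b = np.column_stack([np.ones(n), x1, x_irr])
print(np.linalg.lstsq(X_a, y, rcond=None)[0])  # [~1, ~2]
print(np.linalg.lstsq(X_b, y, rcond=None)[0])  # [~1, ~2, ~0]
```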
So, always more variables? Even if they are irrelevant (or almost irrelevant) and therefore do not induce bias in the other estimators? No!
Why? The variances of the estimators can become large! One can show, under MLR.1 to MLR.5, that:

$$\text{Var}(\hat\beta_j \mid X) = \frac{\sigma^2}{\text{SST}_j \,(1 - R_j^2)}, \qquad j = 1, 2, \dots, k$$

where $\text{SST}_j = \sum_{i=1}^n (x_{ij} - \bar x_j)^2$ and $R_j^2$ is the coefficient of determination from regressing $x_j$ on all the other regressors; it tells us how much the other regressors explain $x_j$.
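A sketch computing this variance formula directly for a nearly collinear pair of regressors, assuming (for illustration only) a known $\sigma^2 = 1$:

```python
import numpy as np

def r_squared(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta
    return 1 - np.sum(u**2) / np.sum((y - y.mean())**2)

rng = np.random.default_rng(8)
n = 300
x2 = rng.normal(size=n)
x1 = 0.95 * x2 + 0.05 * rng.normal(size=n)    # x1 almost collinear with x2

# R_1^2 from regressing x1 on the other regressor (plus a constant)
Z = np.column_stack([np.ones(n), x2])
r1_sq = r_squared(Z, x1)

sst1 = np.sum((x1 - x1.mean())**2)
sigma2 = 1.0                                   # assumed error variance
print(sigma2 / (sst1 * (1 - r1_sq)))           # Var(beta_1_hat): large when R_1^2 ~ 1
print(1 / (1 - r1_sq))                         # variance inflation factor
```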