
REGRESSION CHECKLIST

The following points are useful in both time series regressions and regular regressions.
Where appropriate, each point also describes the time series version.

Sample Size Rule should be checked at the outset: n/k ≥ 15, where n is the number of
observations and k is the number of X variables. Often this rule is not satisfied, in which
case your results may not be reliable.
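As a quick illustration, the rule can be checked in a few lines of Python. This is a minimal sketch that assumes k counts the X variables; `meets_sample_size_rule` is a hypothetical helper name.

```python
def meets_sample_size_rule(n: int, k: int, min_ratio: float = 15.0) -> bool:
    """Hypothetical helper: check the n/k >= 15 rule of thumb.

    n: number of observations; k: number of X variables (assumed meaning of k).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return n / k >= min_ratio


print(meets_sample_size_rule(120, 6))  # 120/6 = 20, so True
print(meets_sample_size_rule(60, 5))   # 60/5 = 12, so False
```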

Scatter Plots will help you determine whether the relationship between Y and each of the
(continuous) X variables is linear, and will help you detect outliers. In time series, a time
plot of the dependent variable will help you look for seasonality, cycles, outliers, and trends.

Correlation Tables will tell you which of the X variables correlate most strongly with Y.
They will also help you identify multicollinearity, the problem that arises when two or
more X variables are highly correlated with one another.

Transformations can now be attempted, if needed. You can have (a) quadratic
regression; (b) a log transform of one or more of the (continuous) X variables; (c) a log
transform of the Y variable, if the histogram of Y is skewed; and (d) a log transform of
both Y and all the continuous X variables. This last one is called a Multiplicative
Regression Equation.
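The transformed columns are easy to build before rerunning the regression. The sketch below assumes natural logs and strictly positive data (the log of zero or a negative number is undefined); `transformed_columns` is a hypothetical helper. It also shows why the log-log version suits a multiplicative relationship: Y = 3 * X^2 becomes a straight line with slope 2 on the log scale.

```python
import math


def transformed_columns(y, x):
    """Hypothetical helper: build transformed columns for one X (natural logs assumed)."""
    if min(x) <= 0 or min(y) <= 0:
        raise ValueError("log transforms need strictly positive data")
    return {
        "x": list(x),
        "x_squared": [v * v for v in x],     # quadratic regression: use x and x_squared together
        "log_x": [math.log(v) for v in x],   # log of an X variable
        "log_y": [math.log(v) for v in y],   # log of Y (for right-skewed Y)
    }                                        # multiplicative model: regress log_y on log_x


# A multiplicative relationship Y = 3 * X**2 is linear on the log-log scale
x = [1.0, 2.0, 4.0, 8.0]
y = [3 * v ** 2 for v in x]
cols = transformed_columns(y, x)
slope = (cols["log_y"][-1] - cols["log_y"][0]) / (cols["log_x"][-1] - cols["log_x"][0])
print(round(slope, 6))  # 2.0
```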

Interaction and dummy variables can also be conjectured and created at this stage.

Run a regression at this point.

Multiple R is the absolute value of the correlation between Y and X in a simple linear
regression. It is the correlation between Y and Y-hat in a multiple regression.

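Adjusted R-squared is R-squared (the proportion, or percentage, of variation in Y that is
explained by the model) adjusted for the number of X variables. Unlike R-squared, it does
not automatically increase when another X variable is added, so it is the better measure
for comparing models with different numbers of X variables.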

Standard Error of the Regression (or Estimate) is in the same units as the Y variable.
You can compare the standard errors of two models for the same Y variable, provided Y
has not been log-transformed. More importantly, you can compare the standard error of
the regression to the standard deviation of the Y variable. The smaller the standard error,
the better the model.

The p-values for the individual regression coefficients (excluding the constant) are
checked to see which ones are significant (conventionally, p < 0.05).

Interpret each of the unstandardized regression coefficients from the output at this stage.

Standardizing the coefficients of the continuous X variables is useful to measure the
relative impact of each X on Y.
Why should you do this? Suppose Y is measured in dollars. X1 is measured in
thousands of dollars, X2 is measured in cents, and X3 is measured in inches. How do
you measure the relative impact of X1, X2 and X3 on Y, given that they are all in
different units? The answer is through Standardized Regression Coefficients.

How to do it?

1. Run a multiple regression and save the regression coefficients, excluding the
intercept term, in a spreadsheet. Call these b1, b2, b3, and so on. For the sake of
discussion, let us restrict ourselves to two X variables, resulting in two slopes, b1
and b2.
2. Using the STDEV function in Excel, calculate the standard deviation for the
column of data corresponding to the dependent variable, Y. Call this Sy. Repeat
this calculation for each of the independent variables in your model. Call these
Sx1 and Sx2.
3. Next, calculate each of the two standardized coefficients as follows: b1S = (b1 times
Sx1) divided by Sy, and b2S = (b2 times Sx2) divided by Sy. If you have more than
two independent variables, do the same for those as well.

What can you take away?

Suppose b1S is larger than b2S in absolute value. If X1 increases by one standard
deviation, keeping X2 fixed, it leads to a larger change in the expected value of Y than
does an increase of one standard deviation in X2, keeping X1 fixed. That is, X1 has a
larger relative impact on Y than X2.
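The three steps above, plus the ranking by size, can be sketched in Python. The slopes and data are made up, `standardized_coefficients` is a hypothetical helper, and `statistics.stdev` is used because, like Excel's STDEV, it computes the sample standard deviation.

```python
from statistics import stdev


def standardized_coefficients(b, x_columns, y):
    """Hypothetical helper: b_jS = b_j * S_xj / S_y, using sample standard deviations."""
    s_y = stdev(y)
    return [bj * stdev(xj) / s_y for bj, xj in zip(b, x_columns)]


# Made-up slopes (intercept excluded) and data columns
b = [2.5, 0.004]                             # b1, b2
x1 = [12.0, 15.0, 9.0, 20.0, 14.0]           # e.g. thousands of dollars
x2 = [850.0, 1200.0, 400.0, 1500.0, 900.0]   # e.g. cents
y = [40.0, 52.0, 30.0, 66.0, 47.0]           # dollars

b_std = standardized_coefficients(b, [x1, x2], y)
# Rank by absolute size: the largest |b_jS| has the largest relative impact on Y
ranking = sorted(range(len(b_std)), key=lambda j: abs(b_std[j]), reverse=True)
for j in ranking:
    print(f"X{j + 1}: standardized coefficient {b_std[j]:+.3f}")
```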

Residual plot analysis is used to see whether the residuals are random, to detect outliers,
and to check that roughly 95% of the residuals are within two standard errors of zero.
StatTools will automatically create this plot if you request it. This is the plot of the
residuals versus the Y-hat (fitted) values.

Forecasting the Y variable using your final model is useful. First, obtain a point
forecast of Y using your final model. In addition, you must obtain a forecast interval;
StatTools does this automatically. Recall that a regression model should be used to
forecast only for values of the X variables that are within the range of your data. In a
time series regression, you can predict beyond the end of your time interval, but these
predictions become increasingly unreliable the further into the future you forecast.

Autocorrelation and Lagged variables are unique to time series regressions.
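Two basic tools, sketched under stated assumptions: a lagged copy of a series (for use as an X variable), and the lag-1 autocorrelation of the residuals, a common first check for autocorrelation. The helper names are hypothetical and the series are made up; note that the first observation has no lagged value and would be dropped before running the regression.

```python
from statistics import fmean


def lag(series, periods=1):
    """Hypothetical helper: series shifted back `periods` steps; missing lags are None."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    return ([None] * periods + list(series))[:len(series)]


def lag1_autocorrelation(residuals):
    """Hypothetical helper: sample lag-1 autocorrelation of a residual series."""
    m = fmean(residuals)
    num = sum((a - m) * (b - m) for a, b in zip(residuals[1:], residuals[:-1]))
    den = sum((a - m) ** 2 for a in residuals)
    return num / den


sales = [100, 104, 109, 115, 118, 124]  # made-up series
print(list(zip(sales, lag(sales))))     # (Sales_t, Sales_{t-1}) pairs

# Made-up residuals that stay positive or negative for runs of periods
residuals = [1.2, 0.9, 0.5, -0.3, -0.8, -1.1, -0.6, 0.2, 0.7, 1.0]
print(round(lag1_autocorrelation(residuals), 3))  # well above 0: positive autocorrelation
```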

Interpretation of regression coefficients in log regressions

(1) If the data Y are skewed right, take the log of Y, and interpret the coefficients
with care! Consider Example 11.3: log(Salary) = 3.58 + 0.0188 Years of
Experience - 0.1616 Female. For each extra year of experience, salary increases by
approximately 1.88%. Compared to a male, a female, on average, earns approximately
16.16% less. In this case, multiply each regression coefficient by 100 to read it as a
percentage change.
(2) If the scatter plot of Y against X is concave downward, take the log of X only.
Consider Example 11.4: Cost = -63,993 + 16,654 log(Units). If units increase by
1%, cost increases by approximately $166.54. In this case, divide the particular
coefficient by 100.
(3) If the time plot of Y looks exponential, or if you believe the regression is
multiplicative, take the log of both X and Y. In this case, all the slope coefficients
are elasticities. Consider the following equation: log(Sales) = 4.675 - 1.185 log(Price)
+ 2.183 log(Income) - 0.19 log(Interest). For a 1% increase in price, sales decline by
about 1.185%; for a 1% increase in income, sales increase by about 2.183%; and for a
1% increase in the interest rate, sales decrease by about 0.19%. Here you simply read
off the coefficients as percentages.
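Reading the coefficients this way in cases (1)-(3) relies on approximations that are accurate for small changes. Assuming natural logs, the exact effects are 100(e^b - 1)% per unit of X in case (1), b * ln(1.01) for a 1% rise in X in case (2), and 100(1.01^b - 1)% for a 1% rise in X in case (3). For the Female dummy in Example 11.3, the exact figure is about 14.9% less rather than 16.16%, and in Example 11.4 a 1% rise in units adds about $165.71 rather than $166.54. The sketch below compares both readings; the helper names are hypothetical.

```python
import math

# Natural logs assumed throughout; helper names are hypothetical.


def pct_change_log_y(b):
    """ln(Y) on X: % change in Y per one-unit rise in X -> (quick 100*b, exact 100*(e^b - 1))."""
    return 100 * b, 100 * (math.exp(b) - 1)


def change_log_x(b, pct=1.0):
    """Y on ln(X): change in Y when X rises by pct% -> (quick b*pct/100, exact b*ln(1 + pct/100))."""
    return b * pct / 100, b * math.log(1 + pct / 100)


def pct_change_log_log(b, pct=1.0):
    """ln(Y) on ln(X): % change in Y when X rises by pct% -> (quick b*pct, exact 100*((1 + pct/100)^b - 1))."""
    return b * pct, 100 * ((1 + pct / 100) ** b - 1)


female = pct_change_log_y(-0.1616)   # case (1): Example 11.3, Female dummy
units = change_log_x(16654)          # case (2): Example 11.4, log(Units)
price = pct_change_log_log(-1.185)   # case (3): sales equation, log(Price)
for label, (quick, exact) in [("Female (% salary)", female),
                              ("Units +1% ($ cost)", units),
                              ("Price +1% (% sales)", price)]:
    print(f"{label}: quick {quick:+.2f}, exact {exact:+.2f}")
```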
