
REGRESSION ANALYSIS SUMMARY NOTES

I. The data analysis add-in under Excel:

Excel provides useful statistical tools to carry out regression analysis, but you have to
check that the Data Analysis (utilitaire d'analyse) add-in (complément) is enabled. (Click on
the Data tab.)

If data analysis is not enabled, do the following (EXCEL 2010):

1- Go to File and select Options

2- Select Add-ins on the left-hand side

3- Check that Excel Add-ins is selected (on the right-hand side) and click on GO

4- Select Analysis ToolPak and click on OK

NB: if you have a different version of Excel, the steps are quite similar: search for the
add-ins and select Analysis ToolPak; it may be helpful to look up related tutorial videos.

If Data Analysis is enabled, it will appear within the Data tab. For regression analysis under Excel,
you need the following tools within the Data Analysis add-in:

- Correlation
- Regression

II. Main steps of regression analysis:

1. Identification of the dependent variable and the independent variable(s):

The dependent variable Y is the variable you want to explain/predict; it is also called the
explained variable. Examples: total cost, overhead cost.

The independent variables X are the potential cost drivers of the dependent variable and
are also called explanatory variables. Regression analysis aims to find the independent
variable(s) which explain(s) the dependent variable. In other words, the best estimators of
the explained variable Y are those cost drivers which have the strongest correlation with it.
Potential cost drivers could be, for instance, direct labor hours, physical output produced, or the
number of machine hours.

NB: The objective of regression analysis is to find the factors which best explain and help
to predict the item of cost which constitutes the dependent variable. In other words, we look
for the cost drivers whose change explains the change in the cost. That is why it is essential
to adjust cost data for inflation. As a matter of fact, inflation causes monetary amounts to
change, but that change is due to the variation of the price index and is not caused by
cost drivers. Therefore, in order to isolate the change in the cost which may be due to cost
drivers, data expressed in monetary units must be adjusted for inflation before regression
analysis is undertaken.
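
For readers who prefer to script this step rather than do it by hand in Excel, a minimal Python/pandas sketch of the inflation adjustment is given below; the cost figures, the price-index values and the column names are all hypothetical.

```python
# Minimal sketch: deflating nominal cost data before regression.
# All figures (costs, price index) are hypothetical.
import pandas as pd

data = pd.DataFrame({
    "nominal_cost": [10_000, 10_600, 11_500, 12_100],  # monetary amounts as recorded
    "price_index":  [100.0, 104.0, 109.0, 113.0],      # general price index per period
})

# Restate every cost in base-period prices (index = 100), so that the remaining
# variation can be attributed to cost drivers rather than to inflation.
base_index = 100.0
data["real_cost"] = data["nominal_cost"] * base_index / data["price_index"]
print(data)
```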

2. Analysis of the coefficients of correlation in the correlation matrix:

The correlation matrix aims to determine the correlation coefficients between any pair
of variables. All the variables are included in the matrix: dependent and independent
variables.
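
As an illustration, here is a minimal Python/pandas sketch that produces the same kind of correlation matrix as Excel's Correlation tool; the variable names and values are hypothetical.

```python
# Minimal sketch of a correlation matrix (dependent and independent variables together).
# Variable names and values are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "total_cost":    [52_000, 58_500, 61_000, 69_500, 74_000, 80_500],  # dependent variable Y
    "labor_hours":   [1_000, 1_150, 1_220, 1_400, 1_480, 1_620],        # candidate driver X1
    "machine_hours": [380, 400, 455, 470, 520, 560],                    # candidate driver X2
})

# Pearson correlation coefficient between every pair of variables.
print(df.corr().round(2))
```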

The interpretation of the strength of the correlation coefficient is different depending on
whether the coefficient is calculated between the dependent variable and an independent
variable or between two independent variables.

Coefficient of correlation between the dependent variable and an independent variable:

A high coefficient of correlation between the dependent variable and an independent
variable suggests that the considered independent variable is likely to be a good estimator of
the cost. Regression analysis is however required to support this observation.

A weak coefficient of correlation between the dependent variable and an independent
variable suggests that the considered independent variable is likely to be a poor estimator of
the cost. Regression analysis is however required to support this observation.

Coefficient of correlation between two independent variables:

The study of the correlation coefficients between independent variables aims to detect a
multicollinearity problem, which matters for the multiple regression. As a matter of fact,
multicollinearity exists when the correlation coefficient between two X (independent)
variables is 70% or more (in absolute value). Such a high correlation between two independent
variables means that, if they are included in the same (multiple) regression equation, it is
impossible to separate their impact on the cost.
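
A small sketch of how such a multicollinearity check could be automated is shown below, using the 70% rule of thumb from these notes; the independent variables and their values are hypothetical.

```python
# Minimal sketch: flag pairs of independent variables whose correlation is 70% or more
# (in absolute value), i.e. a potential multicollinearity problem. Data are hypothetical.
import pandas as pd

x_vars = pd.DataFrame({
    "labor_hours":   [1_000, 1_150, 1_220, 1_400, 1_480, 1_620],
    "machine_hours": [380, 400, 455, 470, 520, 560],
    "units_output":  [900, 980, 1_050, 1_200, 1_260, 1_400],
})

corr = x_vars.corr()
cols = corr.columns
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        r = corr.iloc[i, j]
        if abs(r) >= 0.70:  # rule of thumb used in these notes
            print(f"Possible multicollinearity: {cols[i]} / {cols[j]} (r = {r:.2f})")
```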

3. Setting out and interpreting simple regressions:

For each independent variable X, write down the simple regression equation:

Y=a+bX

a is the estimated fixed cost component (intercept)

b is the estimated unit variable cost component (coefficient)

The analysis of the simple regression involves two steps: the study of the overall
significance of the model and the study of the specific significance of the coefficients of the
model.
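
As an illustration of what Excel's Regression tool computes, here is a minimal Python sketch of a simple regression using the statsmodels library; the data and variable names are hypothetical.

```python
# Minimal sketch of a simple regression Y = a + bX. Data are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "total_cost":  [52_000, 58_500, 61_000, 69_500, 74_000, 80_500],  # Y
    "labor_hours": [1_000, 1_150, 1_220, 1_400, 1_480, 1_620],        # X
})

X = sm.add_constant(df["labor_hours"])          # adds the intercept term a
model = sm.OLS(df["total_cost"], X).fit()

print(model.params)     # 'const' = estimated fixed cost a, 'labor_hours' = unit variable cost b
print(model.summary())  # R square, F statistic and its significance, p-values of the coefficients
```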

Overall significance of the model:

R square, also called the coefficient of determination, is a reliability test of the
regression: is the model reliable? Is there any significant association between X and Y?

R square is expressed as a percentage and measures the percentage of the change
in the dependent variable which is explained by the change in the independent variable. The
maximum value is 1 (or 100%) and the minimum value is 0. If R square is equal to 80%, for instance,
then we say that the change in X explains 80% of the change in Y. The remaining 20% pertains
to omitted variables and to the error term.

The significance of R square is subject to testing. Two approaches are possible:

The Fisher statistic: the F statistic measures the share of the variability of Y explained
by the model divided by the share of the variability left in the error term. Rule of thumb: F
must be higher than 2 to say that R square is high.
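
For reference, the standard formulas behind these verbal definitions are given below, writing SSR for the explained (regression) sum of squares, SSE for the residual sum of squares, SST = SSR + SSE for the total sum of squares, k for the number of independent variables and n for the number of observations:

```latex
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST},
\qquad
F = \frac{SSR / k}{SSE / (n - k - 1)} = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
```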

The p-value approach: we set out the null hypothesis that there is no association
between X and Y (the same results could be observed by chance). If the p-value (significance of F)
is less than 5%, we reject the null hypothesis: an overall significant association between X and Y
therefore exists.

Thus:

- If R square is low and significance of F is higher than 5% (or F is lower than 2): the
overall association between the variability of X and Y is poor. No need to carry on with the
investigation of the significance of coefficients.

- If R square is high and significance of F is less than 5% (or F is higher than 2): the
variability of X explains a high percentage of the variability of Y. In other words, the probability
of obtaining the same observations by chance is negligible.

In this case, check the significance of the coefficients and interpret the expected
response of Y to a change in X.

Significance of the coefficients of the simple regression:

Don't interpret coefficients with p-values higher than 5%.

For coefficients with a p-value less than 5%, the coefficient measures the expected
impact on the dependent variable caused by a change of one unit in the independent
variable.

Example: total cost = 1000 + 50 DLH (where DLH = direct labor hours)

Interpretation: The OLS coefficient estimate is 50. So, if DLH changes by one unit, total
cost changes by 50.

Note that, at a 95% confidence level, Excel also displays a range of possible expected
values for the coefficient (the confidence interval). Suppose for instance that we have the following
results at a 95% confidence level: lower limit 40; upper limit 70; p-value << 5%.

Interpretation: at a 95% confidence level, if DLH changes by 1 unit, the expected
total cost change will be (or the true impact on the total cost will be) between 40 and 70
units.
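
To show where such limits come from, here is a minimal statsmodels sketch that prints the 95% confidence interval of each coefficient, analogous to the lower/upper limits in Excel's output; the data are hypothetical.

```python
# Minimal sketch: 95% confidence intervals for the regression coefficients.
# Data are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "total_cost": [52_000, 58_500, 61_000, 69_500, 74_000, 80_500],
    "DLH":        [1_000, 1_150, 1_220, 1_400, 1_480, 1_620],
})

model = sm.OLS(df["total_cost"], sm.add_constant(df["DLH"])).fit()

# One row per coefficient ('const' and 'DLH'): lower and upper limit of the 95% interval.
print(model.conf_int(alpha=0.05))
```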

Choosing the best simple regression model:

Choose the best simple regression with respect to the following criteria:

- Highest R square
- Confidence in the parameters of the model: coefficient estimates with a p-value less
than 5%, and absence of statistical problems such as an unexpected sign of a variable
(example: a negative intercept in a cost function)

4. Setting out and interpreting a multiple regression:

Multiple regression aims to explain the change in the dependent variable by taking into
account more than one single variable. The cost would be therefore determined by many
factors which, together, would cause it to change.

The analysis steps are similar to simple regression steps: study of the overall significance
of the regression and then study of the significance of the coefficients of the regression.

BUT, pay attention to the following:

- If R square is high (the significance of F is less than 5%) and the different coefficients are
significant, but multicollinearity exists: the overall association measured by the
coefficient of determination is significant, but the coefficient estimates are
biased/unreliable. In other words, the model as a whole is reliable and can be used to
predict the dependent variable. However, due to multicollinearity, it is impossible to
separate the relative impact (measured by the coefficients) of each X variable on Y.

- Avoid interpreting a multiple regression in the same manner as a simple regression:

Example: Y = 100 + 2X1 + 5X2

For a change of 1 unit in X1, the expected value of Y will change by 2 units, provided X2 is
held constant.

- If two (significant) independent variables have opposite signs in a multiple regression
equation, they act as substitutes or have opposite effects on the dependent variable.
- Compare the multiple regression to the best simple regression and choose the model
that provides the better estimation of Y (compare R square, and check for the absence of
statistical problems such as poor statistical significance of coefficients or the existence of
multicollinearity).
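
A minimal statsmodels sketch of a multiple regression with two cost drivers, covering the checks listed above, is given below; variable names and data are hypothetical.

```python
# Minimal sketch of a multiple regression with two cost drivers. Data are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "total_cost":    [52_000, 58_500, 61_000, 69_500, 74_000, 80_500],
    "labor_hours":   [1_000, 1_150, 1_220, 1_400, 1_480, 1_620],
    "machine_hours": [380, 400, 455, 470, 520, 560],
})

X = sm.add_constant(df[["labor_hours", "machine_hours"]])
multi = sm.OLS(df["total_cost"], X).fit()

print(multi.rsquared, multi.f_pvalue)  # overall significance: compare with the best simple regression
print(multi.params)                    # each coefficient: effect of that driver, the other held constant
print(multi.pvalues)                   # coefficient-level significance
print(df[["labor_hours", "machine_hours"]].corr())  # check for multicollinearity before trusting the coefficients
```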
