
Indian Institute of Management Bangalore

QUANTITATIVE METHODS II
Quiz 1
Saturday, November 1, 2008

Time : 120 minutes

Total No. of Pages : 11 Name ________________________

Total No. of Questions: 3 Roll No. ________________________

Total marks: 40 Section: _______________________

Instructions

1. This is an Open Book Exam. You are allowed to carry text books and class room notes. Laptops and past
question papers are not allowed.
2. Answer all questions only in the space provided following the question.
3. Show all work and give adequate explanations to get credit.
4. You may use the backside of the last page for rough work only if needed. Do NOT attach any rough
work/sheets.
5. Encircle or underline your final answer for each part.
6. No clarifications will be made during the exam.
7. Use the text book for critical t, F and chi-square values if required; use α = 0.05.

Question I

NLNN Kumar, who has just completed a short-term course at the International Institute of Management and
Accounting (IIMA), is trying out his newly acquired knowledge on data he received from his company.
He works at NS Software Services (popularly known as NS3) and received the data from the
software testing division of the company. He presumed that the number of bugs per 10 kilolines of
code can be explained by two explanatory variables, namely the months of experience of the team
(calculated as the average number of months of total experience for the team) and the size of the team
(number of team members). He fitted the regression in Excel and got the following output.
Summary Statistics:
n = 10, ΣX1i = 150, ΣX2i = 831, ΣX1iX2i = 13990, ΣX1i² = 2580, ΣX2i² = 76117

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.981583
R Square            0.963504
Adjusted R Square
Standard Error
Observations        10

ANOVA
             df   SS         MS         F    Significance F
Regression   2    70.7897    35.39485
Residual     7    2.681371   0.383053
Total        9    73.47107

1.1 Calculate the adjusted R2 for this model (2 points). Explain how adjusted R2 differs from R2 and
what is its significance?
\[
\text{Adjusted } R^2 = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)} = 1 - \frac{2.6813/7}{73.4710/9} = 1 - \frac{0.3830}{8.1634} = 0.953077
\]
Alternative approach:
\[
1 - (1 - R^2)\,\frac{n-1}{n-(k+1)} = 1 - (1 - 0.963504)\,\frac{9}{7} = 0.953077
\]
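Adjusted R2 corrects R2 for the number of explanatory variables in the model: R2 never decreases
when a variable is added, whereas adjusted R2 increases only if the added variable reduces the
residual mean square. When comparing models with different numbers of explanatory variables, one
should therefore use adjusted R2, since a change in R2 alone can be deceptive.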
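
The two formulas for adjusted R2 can be checked numerically with a short Python sketch. The function names are illustrative assumptions, not part of the original solution; the inputs are the SSE, SST and R2 values from the ANOVA table, with k = 2 explanatory variables.

```python
def adjusted_r2(sse, sst, n, k):
    """Adjusted R^2 = 1 - [SSE/(n-k-1)] / [SST/(n-1)]."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


def adjusted_r2_from_r2(r2, n, k):
    """Equivalent form: 1 - (1 - R^2) * (n-1) / (n-k-1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Values from the regression output of Question 1.
adj = adjusted_r2(sse=2.681371, sst=73.47107, n=10, k=2)
adj_alt = adjusted_r2_from_r2(r2=0.963504, n=10, k=2)
print(f"{adj:.6f} {adj_alt:.6f}")
```

Both routes should agree up to the rounding of the inputs.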

1.2 Test the following null hypothesis. Use α = 0.01. State the conclusion of the
hypothesis test clearly. (2 points)
H0: β1 = β2 = 0
H1: At least one βi is not equal to zero

The critical value at α = 0.01 is 9.55 (F0.01,2,7).

F statistic = MSR/MSE = 35.39485/0.383053 = 92.40197

Since F statistic is greater than the critical value we have enough evidence to reject the null
hypothesis and conclude that the overall model is significant.
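
As a check on the arithmetic, the overall F-test can be sketched in Python. The function name is an illustrative assumption; the critical value is not computed here but passed in as a number read from an F table (about 9.55 for 2 and 7 degrees of freedom at the 1% level).

```python
def overall_f_test(ssr, sse, k, n, f_critical):
    """Return (F statistic, reject H0?) for H0: all slopes are zero."""
    msr = ssr / k                # mean square due to regression
    mse = sse / (n - k - 1)      # mean square error
    f_stat = msr / mse
    return f_stat, f_stat > f_critical


f_stat, reject = overall_f_test(ssr=70.7897, sse=2.681371, k=2, n=10, f_critical=9.55)
print(f"F = {f_stat:.3f}, reject H0: {reject}")
```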

Kumar also got the following output regarding the regression coefficients and the corresponding
confidence intervals

                Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
Intercept       4.3036        2.2625          1.9022   0.0989   -1.0462    9.6535
Teamsize        0.8559        0.7481          1.1442   0.2902   -0.9130    2.6249
Months of Exp   -0.0849       0.1613          -0.5261  0.6151   -0.4664    0.2966

1.3 Test the null hypothesis that team size has no impact on the number of bugs, and also that months
of experience has no impact on the number of bugs. State your conclusion of each hypothesis test
clearly. Use α = 0.05. (2 points)

Hypothesis for team size: H0: β1 = 0, H1: β1 ≠ 0.

From the table, the t-statistic is 1.1442, whereas the corresponding two-sided critical value of t (with
n - k - 1 = 7 df) is 2.365. Since |t| is less than the critical value, we cannot reject the null
hypothesis that β1 = 0. This is confirmed by the corresponding p-value of 0.2902 (> 0.05).

Hypothesis for months of experience: H0: β2 = 0, H1: β2 ≠ 0.

From the table, the t-statistic is -0.5261, whereas the critical value of t is 2.365. Since
|t| = 0.5261 is less than the critical value, we cannot reject the null hypothesis that
β2 = 0. This is confirmed by the corresponding p-value of 0.6151 (> 0.05).

1.4 Comment on the results of Questions 1.1, 1.2 and 1.3, explaining any consistency or
inconsistency among your answers. (2 points)

There is an inconsistency. Since the adjusted R2 value is 0.9531, we would conclude that there is a
strong relationship between the response variable and the explanatory variables. This is confirmed by
the F-test, in which the overall model is found to be significant. However, using the t-tests, we
cannot reject either of the null hypotheses β1 = 0 and β2 = 0.

One of the reasons for this inconsistency could be the presence of multi-collinearity. When there
is multi-collinearity, the overall model may be significant even though none of the individual
regression coefficients is significant.

1.5 Calculate the Variance Inflation Factor (VIF) of the regression coefficients and comment on the
value obtained. (2 points)
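\[
VIF = \frac{1}{1 - R_i^2}
\]
where R_i^2 is the coefficient of determination obtained by regressing the i-th explanatory variable on
the other explanatory variables. Since we have only 2 explanatory variables here, R_i^2 = r^2, where r
is the correlation between the explanatory variables.

\[
r(X_1, X_2) = \frac{Cov(X_1, X_2)}{\sigma_{X_1}\sigma_{X_2}}
= \frac{n\sum X_{1i}X_{2i} - \sum X_{1i}\sum X_{2i}}{\sqrt{n\sum X_{1i}^2 - (\sum X_{1i})^2}\,\sqrt{n\sum X_{2i}^2 - (\sum X_{2i})^2}}
= \frac{15250}{\sqrt{3300 \times 70609}} = 0.9990
\]

Therefore,
\[
VIF = \frac{1}{1 - R_1^2} = \frac{1}{1 - 0.998081} = 521.04
\]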

The variance inflation factor is very high, and thus the explanatory variables are collinear. One of
them should be removed from the regression model.
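
The correlation and the VIF can be recomputed directly from the summary statistics given in the question. The following Python sketch is only a numerical check; the function name and argument names are illustrative assumptions.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_xx, sum_yy):
    """Pearson correlation computed from raw sums of the two variables."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


# Summary statistics from Question 1 (n = 10).
r = pearson_r_from_sums(n=10, sum_x=150, sum_y=831,
                        sum_xy=13990, sum_xx=2580, sum_yy=76117)

# With two explanatory variables, R_i^2 equals r^2 for both of them.
vif = 1.0 / (1.0 - r ** 2)
print(f"r = {r:.4f}, VIF = {vif:.2f}")
```

Because 1 - r^2 is very small, the VIF is sensitive to rounding: using r rounded to four decimals gives a noticeably different value than using the unrounded r.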

Question 2:

A regression analysis is performed on annual food expenditure (AFE), with annual income (AI) and
family size (FS) as explanatory variables. The following two models are used to establish the relationship
between annual food expenditure and annual income and family size.

Model 1: AFE = B0 + B1 AI

Model 2: AFE = β0 + β1 AI + β2 FS

Regression output for Model 1 is shown below:

SUMMARY OUTPUT (Model 1)

Regression Statistics

Multiple R

R Square

Adjusted R Square 0.77684

Standard Error 1.907748

Observations 20

ANOVA

Df SS MS F Significance F

Regression 1 244.3584 244.3584 67.1405 1.73 × 10^-7

Residual 18 65.5111 3.6395

Total 19 309.8695

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 1.043354 0.934366 1.116644 0.278831 -0.91968 3.006385

AI 0.149707 0.01827 8.1947

Model 2: Summary Output

Regression Statistics

Multiple R 0.894788

R Square

Adjusted R Square

Standard Error 1.906239

Observations 20

ANOVA

Df SS MS F Significance F

Regression 2 248.1558 124.0479

Residual 17 61.7137

Total 19 309.8695

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 0.396888 1.130483 0.351079 0.729843 -1.98822 2.781999

AI 0.140701 0.020301 6.930686 2.43E-06 0.097869 0.183532

FS 0.391196 0.385735

Table 1 shows the predicted value of AFE, distance measures (Mahalanobis Distance (Mah_D), Cook's
Distance (Cook_D)), leverage values and DFBeta values.

Table 1:

S.No AFE AI FS Predicted_Y Cook_D Leverage DFB0 DFB1 DFB2
1 5.2 28 3 5.5101 0.00107 0.04832 -0.02595 0.00078 -0.0099
2 5.1 26 3 5.2287 0.00021 0.05784 -0.01164 0.00036 -0.00442
3 5.6 32 2 5.6817 0.00005 0.02292 -0.01169 0.00008 0.0014
4 7.2 24 1 4.1649 0.17839 0.10186 0.7918 -0.00299 -0.17658
5 11.3 54 4 9.55951 0.0371 0.05656 -0.11727 -0.00051 0.08809
6 8.1 59 2 9.48062 0.02388 0.05855 -0.09611 -0.00339 0.0641
7 7.8 44 3 7.76131 0.00001 0.00479 0.00138 -0.00002 0.00056
8 5.8 30 2 5.4003 0.00132 0.02686 0.05971 -0.00048 -0.00608
9 5.1 40 1 6.41611 0.03423 0.10414 -0.27917 -0.00152 0.10024
10 18 82 3 13.10793 0.60735 0.13414 -0.39432 0.02313 -0.13265
11 4.9 42 3 7.47991 0.03918 0.00705 -0.1063 0.00186 -0.04264
12 11.8 58 4 10.12231 0.03436 0.05629 -0.13264 0.00036 0.07781
13 5.2 28 1 4.7277 0.00413 0.09698 0.11672 -0.00021 -0.02941
14 4.8 20 5 5.16688 0.01847 0.40093 0.02602 0.00338 -0.07899
15 7.9 42 3 7.47991 0.00104 0.00705 0.01731 -0.0003 0.00694
16 6.4 47 1 7.40102 0.02333 0.1234 -0.19511 -0.00215 0.08601
17 15.2 112 4 17.72015 0.90717 0.40778 0.98484 -0.02936 0.04395
18 13.7 85 5 14.31243 0.01769 0.22229 0.17584 -0.00195 -0.04792
19 5.1 31 2 5.541 0.00156 0.02477 -0.06449 0.00047 0.00715
20 2.9 26 2 4.8375 0.03617 0.03746 -0.31501 0.00329 0.02183

2.1 What percentage of variations in Annual food expenditure (AFE) is explained by Model 1? (2 points)

The percentage of variation explained is given by the coefficient of determination, R2.

R2 = SSR / SST = 244.3584 / 309.8695 = 0.7886, that is, 78.86% of the variation in AFE is explained by
Model 1.

(SSR = MSR × df; here the regression degrees of freedom is 1, thus SSR = MSR.)

2.2 In Model 1, calculate F statistic and comment what inference can be obtained using the F statistic value of
model 1. (2 points)

F = MSR/MSE, MSE = SSE/df

SSE = SST - SSR = 65.5111, MSE = 65.5111/18 = 3.6395

F = 244.3584 / 3.6395 = 67.1405

F-critical (F0.05,1,18) = 4.41. Since the F-statistic is greater than F-critical, we reject the null hypothesis
that all slopes (just one slope here) are zero and conclude that the model as a whole is significant.

Alternative approach for finding F: F = t2, where t = 0.149707/0.01827 = 8.1941, so F = 67.14
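
The Model 1 quantities used in 2.1 and 2.2 can be reproduced from the ANOVA table with a few lines of Python. This is a sketch for checking the arithmetic; the variable names are illustrative assumptions.

```python
# Model 1 ANOVA values (simple regression: k = 1 explanatory variable).
ssr, sst, n, k = 244.3584, 309.8695, 20, 1

sse = sst - ssr                      # residual sum of squares
r_squared = ssr / sst                # share of variation explained
mse = sse / (n - k - 1)              # residual mean square
f_stat = (ssr / k) / mse             # overall F statistic

# In simple regression the overall F equals the squared slope t-statistic.
t_slope = 0.149707 / 0.01827
print(f"R2 = {r_squared:.4f}, F = {f_stat:.2f}, t^2 = {t_slope ** 2:.2f}")
```

The small gap between F and t squared comes only from the rounding of the printed coefficient and standard error.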

2.3 In model 1, what is the 95% confidence interval for mean value of AFE when the annual income is 30?
(2 points)



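The 95% confidence interval for the mean response at x_i is given by:
\[
\hat{y}_i \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}
\]

\hat{y}_i = B0 + B1 AI = 1.043354 + 0.149707 × 30 = 5.5346

t_{α/2,n-2} = t_{0.025,18} = 2.1009

s_e = 1.9077

From the AI column of Table 1, \bar{x} = 910/20 = 45.5 and \sum_i (x_i - \bar{x})^2 = 10903, so
\[
\frac{(x_i - \bar{x})^2}{\sum_i (x_i - \bar{x})^2} = \frac{(30 - 45.5)^2}{10903} = 0.02204
\]
\[
\hat{y}_i \pm t_{\alpha/2,\,n-2}\; s_e \sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}
= 5.5346 \pm 2.1009 \times 1.9077 \times \sqrt{0.05 + 0.02204} = 5.5346 \pm 2.1009 \times 1.9077 \times 0.26839
\]

95% confidence interval is [4.4589, 6.6103]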

2.4 Comment whether model 2 (addition of new variable, family size) adds any value in predicting the annual
food expenditure compared to model 1. (4 points)

We can check whether the new variable adds any value using the partial F-test.

The test statistic is given by:

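\[
F_{partial} = \frac{(SSE_R - SSE_F)/r}{MSE_F}
\]
where SSE_R and SSE_F are the residual sums of squares of the reduced model (Model 1) and the full
model (Model 2), and r is the number of added variables (here r = 1).

SSE_R = 65.5111, SSE_F = 61.7137

MSE_F = 61.7137/17 = 3.6302

F_partial = (65.5111 - 61.7137)/3.6302 = 1.0460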


The corresponding critical F-value, with 1 and 17 degrees of freedom (17 being the residual df of
Model 2), is F0.05,1,17 = 4.4513.

Since the partial F-statistic is less than the critical F (or less than the suggested cutoff of 4), we cannot
reject the null hypothesis. That is, we conclude that the addition of the new variable (family size) does
not add any value.
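
The partial F statistic can be verified with a minimal Python sketch. The function name is an illustrative assumption; the inputs are the residual sums of squares from the two ANOVA tables and the residual degrees of freedom of Model 2.

```python
def partial_f(sse_reduced, sse_full, added_vars, df_full):
    """Partial F = [(SSE_R - SSE_F) / r] / MSE_F."""
    mse_full = sse_full / df_full
    return ((sse_reduced - sse_full) / added_vars) / mse_full


# Model 1 (reduced) versus Model 2 (full, 17 residual degrees of freedom).
f_partial = partial_f(sse_reduced=65.5111, sse_full=61.7137, added_vars=1, df_full=17)
print(f"partial F = {f_partial:.4f}")
```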

2.5 In Model 2, check whether the beta coefficient for FS is different from zero. (2 points)

Use a t-test to check this. The corresponding hypotheses are: H0: β2 = 0, H1: β2 ≠ 0.

The t-statistic is:
\[
t = \frac{\hat{\beta}_2}{S_e(\hat{\beta}_2)} = \frac{0.391196}{0.385735} = 1.0142
\]

The corresponding two-sided critical t value (t0.025,17) is 2.1098.

Since the t-statistic is less than the critical t value, we cannot reject the null hypothesis that β2 = 0.

2.6 Identify if there are any potential outliers in the model. (2 points)

Ans 1: Since Cook's distance for all cases is less than 1, we can conclude that there are no potential
outliers.

Alternative answer:

We can check the standardized residuals, given by:
\[
\text{Standardized Residual}_i = \frac{Y_i - \hat{Y}_i}{s_e} = \frac{Y_i - \hat{Y}_i}{1.9062}
\]

The standardized residual for sample point 10 is:
\[
\frac{18 - 13.1079}{1.9062} = 2.5663
\]

That is, the residual for sample point 10 is about 2.5 standard deviations away from the mean (zero).
Since this is below the 3-sigma level, we can conclude that there are no potential outliers. The
standardized residuals for all other sample points are much smaller than 2.5 in absolute value.
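
The standardized residuals for all 20 sample points can be computed from the AFE and predicted-value columns of Table 1. The Python sketch below is only a check; the function name and the 3-sigma flagging rule written into the code are the assumptions stated in the answer, not a general definition of an outlier.

```python
# (observed AFE, predicted AFE) for sample points 1..20, copied from Table 1.
rows = [
    (5.2, 5.5101), (5.1, 5.2287), (5.6, 5.6817), (7.2, 4.1649),
    (11.3, 9.55951), (8.1, 9.48062), (7.8, 7.76131), (5.8, 5.4003),
    (5.1, 6.41611), (18, 13.10793), (4.9, 7.47991), (11.8, 10.12231),
    (5.2, 4.7277), (4.8, 5.16688), (7.9, 7.47991), (6.4, 7.40102),
    (15.2, 17.72015), (13.7, 14.31243), (5.1, 5.541), (2.9, 4.8375),
]
S_E = 1.906239  # standard error of Model 2


def standardized_residuals(pairs, s_e):
    """Residual divided by the standard error of the regression."""
    return [(y - y_hat) / s_e for y, y_hat in pairs]


z = standardized_residuals(rows, S_E)
worst = max(range(len(z)), key=lambda i: abs(z[i]))   # zero-based index
flagged = [i + 1 for i, value in enumerate(z) if abs(value) > 3]
print(f"largest |z| at point {worst + 1}: {z[worst]:.4f}; beyond 3 sigma: {flagged}")
```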

2.7 Calculate the change in the predicted value of AFE for the sample point 1, when it is removed from the
sample in estimating the parameter values. (4 points)

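The corresponding DFBeta values (row 1 of Table 1) are: DFB0 = -0.02595, DFB1 = 0.00078, DFB2 = -0.0099

Taking each DFBeta as the full-sample coefficient minus the coefficient estimated without the point (the
convention used in 2.8), and with AI = 28 and FS = 3 for sample point 1, the corresponding change in the
predicted value (DFFit) is: -(DFB0 + DFB1 × 28 + DFB2 × 3) = 0.02595 - 0.00078 × 28 + 0.0099 × 3 = 0.03381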


2.8 Identify the sample point that has highest influence on the regression model 2. What will be the change in
the Beta co-efficient values if this sample point is removed from the sample? (2 points)

The sample point with the highest leverage value is point 17, with a leverage of 0.40778; it also has the
largest Cook's distance (0.90717).

If this point is removed, the corresponding changes in the beta coefficients are given by its DFBeta values:

DFB0 = 0.98484, DFB1 = -0.02936, DFB2 = 0.04395

Subtracting these from the Model 2 estimates, the new coefficient values would be:
β0 = 0.3969 - 0.9848 = -0.588, β1 = 0.1407 + 0.0294 = 0.1701 and β2 = 0.3912 - 0.0440 = 0.3472
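
The DFBeta arithmetic in 2.7 and 2.8 can be sketched in Python. The sketch assumes, as the new coefficient values above imply, that each DFBeta equals the full-sample coefficient minus the coefficient estimated without the point; the function names are illustrative.

```python
MODEL2_COEFS = [0.396888, 0.140701, 0.391196]   # intercept, AI, FS


def coefficients_without_point(coefs, dfbetas):
    """Coefficients after deleting a point: b(-i) = b - DFBeta (assumed convention)."""
    return [b - d for b, d in zip(coefs, dfbetas)]


def prediction_change(dfbetas, ai, fs):
    """Change in the predicted AFE at (AI, FS) once the point is deleted."""
    x = (1.0, ai, fs)
    return -sum(d * xi for d, xi in zip(dfbetas, x))


# Sample point 17 (highest leverage): DFB0, DFB1, DFB2 from Table 1.
new_coefs = coefficients_without_point(MODEL2_COEFS, [0.98484, -0.02936, 0.04395])

# Sample point 1 (AI = 28, FS = 3): change in its own predicted value.
change_point1 = prediction_change([-0.02595, 0.00078, -0.0099], ai=28, fs=3)
print([round(c, 4) for c in new_coefs], round(change_point1, 5))
```

If the software producing the table uses the opposite sign convention for DFBeta, both results change sign accordingly.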

Question 3:

The quarterly revenue of H&B, a catalog company, from 2005 till 2008 quarter 3 is shown in the
following table (Table 1), along with the ratio to moving average values calculated for the multiplicative
model Yt = Tt × St × Et (where Yt is the value at time t, Tt is the trend component in the value, St is the
seasonal component and Et is the error component). The regression output for the trend component is shown
in Table 2.

Table 1: Quarterly revenue data and ratio to moving average

S.No Year, Quarter Revenue (in Millions) Moving Average Ratio to Moving Average

1 2005,Q1 72

2 2005,Q2 68

3 2005,Q3 80 72.5 1.1034

4 2005,Q4 70 73.5 0.9523

5 2006,Q1 76 74 1.0270

6 2006,Q2 70 74.5 0.9395

7 2006,Q3 82 75.5 1.0860

8 2006,Q4 74 75 0.9866

9 2007,Q1 74 74 1

10 2007,Q2 66 74.5 0.8859

11 2007,Q3 84 76 1.1052

12 2007,Q4 80 76.5 1.0457

13 2008,Q1 76 78.5 0.9681

14 2008,Q2 74 78.5 0.9426

15 2008,Q3 84

Table 2: Regression output for Trend

SUMMARY OUTPUT

Regression Statistics

Multiple R 0.892401

R Square 0.79638

Adjusted R Square 0.776018

Standard Error 0.882523

Observations 12

ANOVA

df SS MS F

Regression 1 30.46154 30.46154 39.11111

Residual 10 7.788462 0.778846

Total 11 38.25

Coefficients Standard Error t Stat P-value

Intercept 71.32692 0.677061 105.3478 1.46E-16

T 0.461538 0.0738 6.253888 9.46E-05

3.1 Calculate the seasonality index for Quarters 1, 2, 3 and 4 (5 points)

Quarter  R-MA     R-MA     R-MA     AVE-R-MA   USI        SI
Q1       1.027    1        0.9681   0.998367   99.83667   99.48598
Q2       0.9395   0.8859   0.9426   0.922667   92.26667   91.94257
Q3       1.1034   1.086    1.1052   1.0982     109.82     109.4342
Q4       0.9523   0.9866   1.0457   0.994867   99.48667   99.13721
Sum                                            401.41     400
AVE                                            100.35

R-MA: Ratio to moving average; AVE-R-MA: Average value of the ratio to moving average

USI = Unadjusted seasonality index: AVE-R-MA × 100

SI = Seasonality index: USI × (400/401.41), or equivalently USI/100.35 × 100, so that the four indices sum to 400

The seasonality indices for Q1, Q2, Q3 and Q4 are 99.4860, 91.9426, 109.4342 and 99.1372 respectively.
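
The ratio-to-moving-average steps can be reproduced from the revenue column alone. The Python sketch below assumes, as in Table 1, that the 4-quarter moving average listed against S.No t covers S.No t-2 to t+1 and that the series starts in Q1; the function name is illustrative. Because it uses unrounded ratios, its indices differ from the tabulated ones in the third decimal place.

```python
from collections import defaultdict

# Quarterly revenue (in millions), S.No 1..15, from 2005 Q1 to 2008 Q3.
revenue = [72, 68, 80, 70, 76, 70, 82, 74, 74, 66, 84, 80, 76, 74, 84]


def seasonal_indices(series):
    """Seasonality index per quarter via the ratio-to-moving-average method."""
    ratios = defaultdict(list)
    for t in range(3, len(series)):            # t is the 1-based S.No, 3..14
        window = series[t - 3:t + 1]           # S.No t-2 .. t+1
        moving_average = sum(window) / 4
        quarter = (t - 1) % 4 + 1              # series assumed to start in Q1
        ratios[quarter].append(series[t - 1] / moving_average)
    # Unadjusted index: average ratio per quarter, times 100.
    usi = {q: 100 * sum(r) / len(r) for q, r in ratios.items()}
    # Rescale so that the four indices sum to 400.
    scale = 400 / sum(usi.values())
    return {q: usi[q] * scale for q in sorted(usi)}


si = seasonal_indices(revenue)
for quarter, index in si.items():
    print(f"Q{quarter}: {index:.2f}")
```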

3.2 Estimate the trend component for the revenue data shown in table 1 (2 points)

The trend equation is: Tt = 71.32692 + 0.461538 t, where t is the serial number (S.No) of the quarter

3.3 Forecast the revenue for 2008 quarter 4 and 2009 quarter 2. (3 points)

2008 Q4 is t = 16 and 2009 Q2 is t = 18

The trend values for t = 16 and t = 18 are:

T16 = 71.32692 + 0.461538 × 16 = 78.7115

T18 = 71.32692 + 0.461538 × 18 = 79.6346

The forecast is the trend value multiplied by the seasonality index of the quarter, expressed as a fraction (SI/100):

F16 = T16 × S4 = 78.7115 × 0.991372 = 78.0324

F18 = T18 × S2 = 79.6346 × 0.919426 = 73.2181
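
The forecast combines the fitted trend with the seasonality index of the target quarter. A minimal Python sketch of that calculation follows; the function names are illustrative assumptions, and the coefficients and indices are the values reported in Table 2 and in 3.1.

```python
def trend(t, intercept=71.32692, slope=0.461538):
    """Trend component T_t from the regression in Table 2."""
    return intercept + slope * t


def forecast(t, seasonal_index):
    """Multiplicative forecast: trend times seasonality index (given on a base of 100)."""
    return trend(t) * seasonal_index / 100


f16 = forecast(16, seasonal_index=99.13721)   # 2008 Q4 uses the Q4 index
f18 = forecast(18, seasonal_index=91.94257)   # 2009 Q2 uses the Q2 index
print(f"2008 Q4: {f16:.2f}, 2009 Q2: {f18:.2f}")
```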
