
A linear regression is performed with 3 independent variables.

The following (partial) ANOVA table is generated:

Source        DF    SS        MS          F
Regression     3    185202    61734       35.64061
Error         16     27714     1732.125
Total         19    212916

F crit = 3.2389 from F table with 3 numerator and 16 denominator df and alpha = .05

a) What are the missing values in the table above? (Fill in the missing values above.)

b) What proportion of the variation in the dependent variable is explained by variation in the independent variables? Calculate adjusted R-squared and interpret its meaning.
adjusted R-squared = 1 - (SSE/df error)/(SSTot/df total) = 1 - (27714/16)/(212916/19) = .845, so about 84.5% of the variation in the dependent variable is explained by variation in the independent variables, after adjusting for the number of predictors. (Unadjusted R-squared = SSR/SSTot = 185202/212916 = .870.)

c) Is this a "good model" for predicting the value of the dependent variable? Justify your answer.
Yes, because: 1) 84.5% of the variation is explained; 2) the overall F test is significant, with 35.64 > 3.24.

d) What are the null and alternative hypotheses for the overall test?
Ho: all beta coefficients are equal to 0
H1: at least one beta coefficient is NOT 0

e) What are the decision and conclusion regarding the overall test?
decision: reject the null because 35.64 > 3.24
conclusion: at the .05 level of significance, there is sufficient evidence to suggest that not all regression coefficients are zero (at least one is nonzero); so, "overall", this is a good model
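The missing table entries and the R-squared figures follow directly from the sums of squares and degrees of freedom. A minimal Python sketch of that arithmetic is shown here; the variable names are illustrative and not part of the original problem.

```python
# Values given in the partial ANOVA table.
ss_reg, ss_err = 185202.0, 27714.0
df_reg, df_err = 3, 16

# Totals are the sums of the regression and error rows.
ss_tot = ss_reg + ss_err
df_tot = df_reg + df_err

# Mean squares and the overall F statistic.
ms_reg = ss_reg / df_reg
ms_err = ss_err / df_err
f_stat = ms_reg / ms_err

# R-squared and adjusted R-squared.
r2 = ss_reg / ss_tot
adj_r2 = 1 - (ss_err / df_err) / (ss_tot / df_tot)

F_CRIT = 3.2389  # critical value quoted in the text (3 and 16 df, alpha = .05)
reject_null = f_stat > F_CRIT

print(f"MS regression = {ms_reg:.3f}, MS error = {ms_err:.3f}, F = {f_stat:.5f}")
print(f"R-squared = {r2:.3f}, adjusted R-squared = {adj_r2:.3f}")
print("Reject Ho" if reject_null else "Do not reject Ho")
```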


Two boats, the Prada (from Italy) and the Oracle (United States), are competing for a spot in the upcoming America's Cup race. They race over a part of the course several times, and their times, in minutes, are recorded (see tab Q2 data).

a) At the .05 significance level, can we conclude that there is a difference in variability for the times of the two boats?
Ho: variability of Prada = variability of Oracle
H1: variability of the two boats is NOT equal
decision: Levene's F = 4.795 with significance .041 < .05, so reject the null hypothesis of equal variability
conclusion: at the .05 level of significance, the variability of the 2 boats is NOT equal
Independent Samples Test (time)

Levene's Test for Equality of Variances
                               F        Sig.
Equal variances assumed        4.795    .041

t-test for Equality of Means
                               t        df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed        -3.541   20       .002              -2.70500          .76393                  -4.29852       -1.11148
Equal variances not assumed    -3.759   16.363   .002              -2.70500          .71958                  -4.22770       -1.18230

b) At the .05 significance level, can we conclude that there is a difference in their mean times?
Ho: mean of Oracle = mean of Prada
H1: mean times are NOT equal
decision: assuming unequal variances, reject the null hypothesis of equal means because t = -3.759 has a significance level of .002 < alpha = .05
conclusion: at the .05 level of significance, there is sufficient evidence to suggest that the mean times of Oracle and Prada are NOT equal

Prada 12.9 12.5 11 13.3 11.2 11.4 11.6 12.3 14.2 11.3

Oracle 14.1 14.1 14.2 17.4 15.8 16.7 16.1 13.3 13.4 13.6 10.8 19
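The t statistics, standard errors, and degrees of freedom reported in the Independent Samples Test output can be reproduced from the race times above. The following is a sketch using only the Python standard library; it covers the pooled and the unequal-variance (Welch) t-tests but not Levene's test or the p-values, and the variable names are illustrative.

```python
import math
from statistics import mean, variance

# Race times in minutes, copied from the Q2 data.
prada = [12.9, 12.5, 11.0, 13.3, 11.2, 11.4, 11.6, 12.3, 14.2, 11.3]
oracle = [14.1, 14.1, 14.2, 17.4, 15.8, 16.7, 16.1, 13.3, 13.4, 13.6, 10.8, 19.0]

n1, n2 = len(prada), len(oracle)
v1, v2 = variance(prada), variance(oracle)  # sample variances (n - 1 divisor)
diff = mean(prada) - mean(oracle)           # mean difference, Prada minus Oracle

# Equal variances assumed: pooled variance, df = n1 + n2 - 2.
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled
df_pooled = n1 + n2 - 2

# Equal variances not assumed: Welch standard error and Welch-Satterthwaite df.
a, b = v1 / n1, v2 / n2
se_welch = math.sqrt(a + b)
t_welch = diff / se_welch
df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"mean difference = {diff:.5f}")
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}, SE = {se_pooled:.5f}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.3f}, SE = {se_welch:.5f}")
```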

A consumer buying cooperative tested the effective heating area (in square feet) of 20 different electric space heaters with different wattages. The tab "Q3 Data" shows the results.

a) Compute the correlation between wattage and heating area. How would you describe this relationship?
correlation coeff. r = .939, a very strong positive correlation
r-squared = 0.881721

b) At the .01 level, is there sufficient evidence to suggest that the true population correlation coefficient is greater than 0?
Ho: population correlation, rho, is <= 0
H1: population correlation, rho, is > 0
t = r * sqrt(n-2)/sqrt(1 - r-squared)
t = .939 * sqrt(20 - 2)/sqrt(1 - .881721) = 11.58
compare to t with n - 2 = 20 - 2 = 18 df: critical t with 18 df and alpha = .01 (one-tailed) = 2.552, from table
t = 11.58 > 2.552, so reject the null

c) Develop a regression equation for heating area based on wattage.
yhat = -22.581 + .149 * wattage

d) What are the interpretations of the regression coefficients in the context of this problem?
intercept: when wattage = 0, the predicted area heated is -22.58 square feet (not physically meaningful, since 0 watts lies outside the range of the data)
slope: for every 1 watt increase in wattage, we can heat an additional .149 square feet

e) What is the estimate of heating area for a 1900 watt heater?
yhat(1900) = -22.581 + .149 * 1900 = 260.519, or about 260.5 square feet

f) Is each of the assumptions for simple linear regression satisfied?
normality - good, since the normal probability plot has points along the line
linearity - very good due to the strength of r = .939; also, the plot of wattage versus area appears very linear
independence - the plot of residuals versus predicted values shows no pattern from left to right
homoscedasticity - the size of the residuals stays about the same as the graph moves from left to right

g) Is this a good model for predicting heating area? Give at least 3 reasons for your answer.
YES:
1. All the assumptions for linear regression are valid
2. r-squared = .88 means most variation is explained by the independent variable (60% is considered good, so 88% is very good)
3. The overall F test is significant, indicating a good fit
4. r = .939 is a very strong positive correlation, a measure of the strength of the linear relationship between wattage and heating area
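The test statistic for the correlation in part b) can be checked with a few lines of Python. This is a sketch that plugs in the rounded r = .939 and n = 20 from the answer and compares the result to the table value 2.552 quoted there; the function name is illustrative. With these rounded inputs the statistic comes out at roughly 11.6.

```python
import math

def correlation_t(r, n):
    """t statistic for testing a correlation coefficient, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

r, n = 0.939, 20
t_stat = correlation_t(r, n)

T_CRIT = 2.552  # one-tailed critical value for 18 df at alpha = .01, as quoted in the text
reject_null = t_stat > T_CRIT

print(f"t = {t_stat:.2f} with {n - 2} df")
print("Reject Ho" if reject_null else "Do not reject Ho")
```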

Heater   Wattage (in watts)   Area (in square feet)
1        1500                 205
2        750                  70
3        1500                 199
4        1250                 151
5        1250                 181
6        1250                 217
7        1000                 94
8        2000                 298
9        1000                 135
10       1500                 211
11       1250                 116
12       500                  72
13       500                  82
14       1500                 206
15       2000                 245
16       1500                 219
17       750                  63
18       1500                 200
19       1250                 151
20       500                  44
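The correlation and the regression equation can be recomputed from the heater data with the usual least-squares formulas. The sketch below uses plain Python with illustrative names. Note that predicting with the unrounded coefficients gives about 260.3 square feet for a 1900 watt heater, slightly below the 260.5 obtained from the rounded equation.

```python
import math

# Q3 data: wattage (watts) and heating area (square feet) for heaters 1-20.
wattage = [1500, 750, 1500, 1250, 1250, 1250, 1000, 2000, 1000, 1500,
           1250, 500, 500, 1500, 2000, 1500, 750, 1500, 1250, 500]
area = [205, 70, 199, 151, 181, 217, 94, 298, 135, 211,
        116, 72, 82, 206, 245, 219, 63, 200, 151, 44]

n = len(wattage)
mean_x = sum(wattage) / n
mean_y = sum(area) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in wattage)
syy = sum((y - mean_y) ** 2 for y in area)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wattage, area))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)

def predict(watts):
    """Predicted heating area using the unrounded coefficients."""
    return intercept + slope * watts

rounded_prediction = -22.581 + 0.149 * 1900  # the rounded equation from part c)

print(f"r = {r:.3f}")
print(f"yhat = {intercept:.3f} + {slope:.3f} * wattage")
print(f"yhat(1900) = {predict(1900):.1f} (unrounded), {rounded_prediction:.3f} (rounded)")
```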

A mortgage company of a large bank is studying its recent loans. Of particular interest is how such factors as the value of the home (in thousands of dollars), education level of the head of the household, current monthly mortgage payment (in dollars), and gender of the head of the household (male = 1, female = 0) relate to the family's household income.

a) Determine the regression equation. What is your estimate of household income for a 40 year old female head of household with 14 years of education living in a $200K home with a $500 monthly mortgage?

b) Are these variables (or a subset of them) effective predictors of the household income? Justify your answer in at least 3 ways.

c) Determine the set of variables that are effective predictors using an alpha of .02.
The effective predictors are those with significant t statistics (significance level below .02).

d) Is there an interaction effect between home value and years of education on household income?
Create a new variable by multiplying home value and years of education, then include this new variable in the regression. If the t statistic for this new variable is significant, then there is an interaction effect. If not, there is no interaction effect.

e) Are the assumptions of multiple regression satisfied?
Check L, I, N, and E (linearity, independence, normality, and equal variance).

f) Would you consider removing any variables because of multicollinearity?
general guideline: want the correlations between independent variables to satisfy -.7 < r < .7
more specific test: want tolerance > .1, or equivalently VIF < 10
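The two multicollinearity checks in part f) are linked by simple formulas. For predictor j, tolerance is 1 minus the R-squared from regressing that predictor on the other predictors, and VIF is the reciprocal of tolerance. A small sketch, with hypothetical function names and made-up R-squared inputs used only to show the thresholds:

```python
def tolerance(r_squared_j):
    """Tolerance of predictor j: 1 - R-squared from regressing it on the other predictors."""
    return 1 - r_squared_j

def vif(r_squared_j):
    """Variance inflation factor: reciprocal of tolerance."""
    return 1 / tolerance(r_squared_j)

def pair_is_too_correlated(r, limit=0.7):
    """General guideline: flag a pair of predictors whose correlation is outside (-limit, limit)."""
    return abs(r) >= limit

def fails_vif_check(r_squared_j, vif_limit=10):
    """More specific test: flag a predictor whose VIF is at or above the limit."""
    return vif(r_squared_j) >= vif_limit

# Illustrative inputs only; these are not computed from the Q4 data.
for r2_j in (0.50, 0.95):
    print(f"R2_j = {r2_j:.2f}: tolerance = {tolerance(r2_j):.2f}, VIF = {vif(r2_j):.1f}")
```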

Income ($000s)   Value ($000s)   Years of Education   Age   Mortgage Payment   Gender
40.3             190             14                   53    230                1
39.6             121             15                   49    370                1
40.8             161             14                   44    397                1
40.3             161             14                   39    181                1
40.0             179             14                   53    378                0
38.1             99              14                   46    304                0
40.4             114             15                   42    285                1
40.7             202             14                   49    551                0
40.8             184             13                   37    370                0
37.1             90              14                   43    135                0
39.9             181             14                   48    332                1
40.4             143             15                   54    217                1
38.0             132             14                   44    490                0
39.0             127             14                   37    220                0
39.5             153             14                   50    270                1
40.6             145             14                   50    279                1
40.3             174             15                   52    329                1
40.1             177             15                   47    274                0
41.7             188             15                   49    433                1
40.1             153             15                   53    333                1
40.6             150             16                   58    148                0
40.4             173             13                   42    390                1
40.9             163             14                   46    142                1
40.1             150             15                   50    343                0
38.5             139             14                   45    373                0
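For part a), one way to obtain the regression equation without a statistics package is to solve the least-squares normal equations directly. The sketch below is an illustration under stated assumptions, not the method used to produce the original answers: it assumes the data columns line up row by row as income, value, education, age, mortgage payment, and gender, and the helper names (`solve`, `fit_ols`) are hypothetical. It prints the fitted coefficients and the estimate for the profile in part a) and makes no claim about which predictors are significant.

```python
# Each row: income ($000s), value ($000s), education (years), age, mortgage ($), gender (1 = male).
ROWS = [
    (40.3, 190, 14, 53, 230, 1), (39.6, 121, 15, 49, 370, 1), (40.8, 161, 14, 44, 397, 1),
    (40.3, 161, 14, 39, 181, 1), (40.0, 179, 14, 53, 378, 0), (38.1, 99, 14, 46, 304, 0),
    (40.4, 114, 15, 42, 285, 1), (40.7, 202, 14, 49, 551, 0), (40.8, 184, 13, 37, 370, 0),
    (37.1, 90, 14, 43, 135, 0), (39.9, 181, 14, 48, 332, 1), (40.4, 143, 15, 54, 217, 1),
    (38.0, 132, 14, 44, 490, 0), (39.0, 127, 14, 37, 220, 0), (39.5, 153, 14, 50, 270, 1),
    (40.6, 145, 14, 50, 279, 1), (40.3, 174, 15, 52, 329, 1), (40.1, 177, 15, 47, 274, 0),
    (41.7, 188, 15, 49, 433, 1), (40.1, 153, 15, 53, 333, 1), (40.6, 150, 16, 58, 148, 0),
    (40.4, 173, 13, 42, 390, 1), (40.9, 163, 14, 46, 142, 1), (40.1, 150, 15, 50, 343, 0),
    (38.5, 139, 14, 45, 373, 0),
]
NAMES = ["intercept", "value", "education", "age", "mortgage", "gender"]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(rows):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    y = [row[0] for row in rows]
    X = [[1.0] + [float(v) for v in row[1:]] for row in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

coef = fit_ols(ROWS)

# Profile from part a): $200K home, 14 years of education, age 40, $500 mortgage, female (0).
profile = [1.0, 200, 14, 40, 500, 0]
estimate = sum(b * x for b, x in zip(coef, profile))

for name, b in zip(NAMES, coef):
    print(f"{name:>10}: {b:.5f}")
print(f"estimated income ($000s): {estimate:.2f}")
```

For part d), the same function could be reused after appending a value-times-education column to each row, although the t statistic for that term would still have to come from a statistics package or from further calculation.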
