Regression Analysis of mtcars dataset

Arnulfo Perez
Saturday, February 21, 2015

Executive summary
For cars lighter than 2.61 (wt, in units of 1000 lbs), the fitted models predict better mileage for manual
transmission; for cars heavier than 3.32, they predict better mileage for automatic transmission.

Analysis
Let's start by testing the relation between weight and mileage (miles per gallon). There is an inverse correlation, but
there are some outliers or possibly a non-linear relation. Because the question asks to compare automatic and
manual transmissions, I proceed to segment the data by this attribute. The manual data show a clear inverse
relation between mileage and weight, except for four outliers that have mpg > 30. These points correspond to
4-cylinder manual transmission vehicles.
The automatic transmission data do not show a clear relation between mileage and weight, but they have
three outliers with weight > 5. These points correspond to 8-cylinder automatic transmission vehicles.
Therefore, I divide the data into three groups: those with mpg > 30, those with wt > 5, and the rest, which
are used to fit the linear regression models.
testData <- mtcars[(mtcars$mpg<=30)&(mtcars$wt<=5),]
testAutomatic <- testData[testData$am==0,]
testManual <- testData[testData$am==1,]
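As a quick check on the split above, the excluded rows can be listed directly. This is only a sketch: the name `excluded` is introduced here for illustration and does not appear in the original analysis.

```r
# Rows excluded from the fit: the complement of the filter used for testData.
excluded <- mtcars[(mtcars$mpg > 30) | (mtcars$wt > 5), c("mpg", "wt", "cyl", "am")]

# Order by transmission so the two outlier groups appear together.
excluded <- excluded[order(excluded$am, excluded$mpg), ]
print(excluded)

# Cross-tabulate cylinder count against transmission for the excluded rows.
print(table(cyl = excluded$cyl, am = excluded$am))
```

The table makes it easy to confirm that the high-mpg rows are manual (am = 1) and the heavy rows are automatic (am = 0).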
The linear models are generated using lm():
fitAutomatic <- lm(mpg ~ wt + cyl, data=testAutomatic)
fitManual <- lm(mpg ~ wt + cyl, data=testManual)
# Global test of model assumptions
library(gvlma)
gvAutomatic <- gvlma(fitAutomatic)
gvManual <- gvlma(fitManual)
gvAutomatic
##
## Call:
## lm(formula = mpg ~ wt + cyl, data = testAutomatic)
##
## Coefficients:
## (Intercept)           wt          cyl
##     28.0718       0.5855      -1.7722
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05
##
## Call:
##  gvlma(x = fitAutomatic)
##
##                      Value p-value                Decision
## Global Stat        1.19177  0.8795 Assumptions acceptable.
## Skewness           0.41090  0.5215 Assumptions acceptable.
## Kurtosis           0.40468  0.5247 Assumptions acceptable.
## Link Function      0.31453  0.5749 Assumptions acceptable.
## Heteroscedasticity 0.06166  0.8039 Assumptions acceptable.

gvManual
##
## Call:
## lm(formula = mpg ~ wt + cyl, data = testManual)
##
## Coefficients:
## (Intercept)           wt          cyl
##     40.6192      -5.9706      -0.6241
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05
##
## Call:
##  gvlma(x = fitManual)
##
##                      Value p-value                Decision
## Global Stat        2.23487  0.6927 Assumptions acceptable.
## Skewness           0.05182  0.8199 Assumptions acceptable.
## Kurtosis           0.46779  0.4940 Assumptions acceptable.
## Link Function      0.95039  0.3296 Assumptions acceptable.
## Heteroscedasticity 0.76488  0.3818 Assumptions acceptable.

According to the linear regression models, the expected difference in mileage between manual and automatic
transmission (manual minus automatic) is given by:
## (Intercept)          wt         cyl
##   12.547405   -6.556098    1.148119
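The report does not show the code behind these numbers. One way to obtain them, assuming the difference is simply the manual coefficients minus the automatic ones (which agrees with the two coefficient tables printed earlier), is sketched below. The name `coefDiff` is hypothetical.

```r
# Same subsets as in the report: drop high-mpg and heavy cars, then split by transmission.
testData <- mtcars[(mtcars$mpg <= 30) & (mtcars$wt <= 5), ]
fitAutomatic <- lm(mpg ~ wt + cyl, data = testData[testData$am == 0, ])
fitManual <- lm(mpg ~ wt + cyl, data = testData[testData$am == 1, ])

# Assumption: the difference is manual minus automatic, coefficient by coefficient.
coefDiff <- coef(fitManual) - coef(fitAutomatic)
print(coefDiff)
```

A positive value of the resulting linear expression in wt and cyl means the manual model predicts the higher mpg.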

Because the difference decreases with weight and increases with cylinder count, the model implies that for cars
lighter than 2.61 mileage is better for manual transmission whatever the cylinder count (4, 6, or 8), and for cars
heavier than 3.32 mileage is better for automatic transmission.
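The two weight thresholds can be recovered by setting the difference to zero and solving for wt at each cylinder count. The sketch below uses the coefficient differences exactly as printed in the report; `diffCoef` and `breakEvenWt` are names introduced here for illustration.

```r
# Coefficient differences (manual minus automatic) as printed in the report.
diffCoef <- c(intercept = 12.547405, wt = -6.556098, cyl = 1.148119)

# Weight at which the predicted mpg difference is zero for a given cylinder count:
# intercept + wt * w + cyl * c = 0  =>  w = -(intercept + cyl * c) / wt
breakEvenWt <- function(cyl) {
  unname(-(diffCoef["intercept"] + diffCoef["cyl"] * cyl) / diffCoef["wt"])
}

breakEven <- sapply(c(4, 6, 8), breakEvenWt)
names(breakEven) <- c("cyl4", "cyl6", "cyl8")
print(round(breakEven, 2))
```

The 4-cylinder break-even lands near 2.61 and the 8-cylinder one just under 3.32. Below the first, the manual model predicts more mpg for every cylinder count in the data; above the second, the automatic model does.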

References
Harold V. Henderson and Paul F. Velleman, Building Multiple Regression Models Interactively,
Biometrics, Vol. 37, No. 2 (Jun., 1981), pp. 391-411. Published by: International Biometric Society.
Stable URL: http://www.jstor.org/stable/2530428

Appendix

[Figure: plot of mpg and wt]
[Figure: QQ Auto Trans Plot, Studentized Residuals(fitAutomatic) against t Quantiles]
[Figure: QQ Manual Trans Plot, Studentized Residuals(fitManual) against t Quantiles]