
Part I

1. What is the partial effect of age on sleeping?

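$\partial \text{sleep}/\partial \text{age} = -8.697 + 2 \times 0.128\,\text{age} = 0 \;\Rightarrow\; \text{age}^* = 8.697/0.256 = 33.97 \approx 34$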

Holding everything else constant, on average sleep time decreases with age up to about 34 years of age (the partial effect is negative) and increases with age above 34 years (the partial effect is positive).
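The turning point can be checked numerically. This is a small R sketch that assumes the estimated coefficients are -8.697 on age and 0.128 on agesq (signs as implied by the derivative above); the ages 25 and 50 are arbitrary illustration points, not values from the assignment.

```r
# Assumed (sign-restored) estimates for the quadratic in age
b_age   <- -8.697
b_agesq <-  0.128

# d(sleep)/d(age) = b_age + 2 * b_agesq * age, in minutes/week per extra year
partial_effect <- function(age) b_age + 2 * b_agesq * age

# Turning point: age at which the partial effect equals zero
turning_point <- -b_age / (2 * b_agesq)
print(round(turning_point, 2))

# Partial effect at a few illustrative ages (negative before, positive after)
print(round(partial_effect(c(25, 34, 50)), 3))
```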

2. On average, how is being male vs female related to sleep time?

We considered the dummy variable male:
> 1 if male
> 0 if female

$\hat\beta_{male} = 87.75$. Holding everything else constant, a man sleeps on average 87.75 minutes/week more than a woman.

3. If someone increases their work time by 5 hours/week, by how many minutes is sleep time predicted to fall? Is this a large tradeoff?

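sleep and totwrk are measured in minutes (5 hours = 300 minutes).
$\hat\beta_5 = \partial \text{sleep}/\partial \text{totwrk} = -0.163$
Prediction: $\Delta\widehat{\text{sleep}} = \hat\beta_5 \, \Delta\text{totwrk} = -0.163 \times 300 = -48.9 \approx -50$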
Holding everything else constant, if someone increases their work time by 5 hours/week, they will on average sleep about 50 minutes/week less.
If we consider a week with five working days, it is not a large tradeoff: working 1 hour more per working day would reduce sleep time by only about 10 minutes per working day.
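The arithmetic behind the tradeoff, as a short R sketch. It assumes the totwrk coefficient is -0.163 and, as in the text, a week with five working days.

```r
b_totwrk     <- -0.163   # estimated coefficient on totwrk (minutes of sleep per minute of work)
delta_totwrk <- 5 * 60   # 5 extra hours per week, expressed in minutes

delta_sleep_week <- b_totwrk * delta_totwrk   # predicted change in sleep, minutes per week
delta_sleep_day  <- delta_sleep_week / 5      # assumption: 5 working days per week

print(delta_sleep_week)
print(delta_sleep_day)
```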

4. Discuss the sign and magnitude of the estimated coefficient of education. Can you reject the hypothesis that education is not related with sleep time?

$\hat\beta_3 = \partial \text{sleep}/\partial \text{educ} = -11.71 < 0$


More educated people tend to have busier working lives and therefore tend to spend less time sleeping.
Holding everything else constant, one more year of education is associated with a reduction of 11.71 minutes/week of sleep time, on average.

Individual significance test:
H0: $\beta_3 = 0$ (education is not related to sleep time)
H1: $\beta_3 \neq 0$ (education is related to sleep time)
$t_{3,obs} = -11.71/5.867 = -1.996 \approx -2$. Since the p-value is < 0.05, we reject H0: education is related to sleep time at the 5% significance level.
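A sketch of the same t test in R. The exact residual degrees of freedom are not reported here, so it assumes the sample is large enough for the standard normal to approximate the t distribution; the p-value is therefore approximate.

```r
# Coefficient and standard error of educ quoted in the answer
b_educ  <- -11.71
se_educ <- 5.867

# Observed t statistic for H0: beta_3 = 0
t_obs <- b_educ / se_educ

# Two-sided p-value, using the standard normal as a large-sample
# approximation to the t distribution (df not reported here)
p_value <- 2 * pnorm(-abs(t_obs))

print(round(t_obs, 3))
print(round(p_value, 4))
print(p_value < 0.05)
```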

5. Would you say that work time, gender, education, and age explain much of the variation in sleep time? What other factors might affect the time spent sleeping? Give at least two examples. Are these likely to be correlated with work time?

If we do the individual t tests for age and agesq, we can see that both p-values are > 0.1, so they are not individually significant in our model.
Work time, gender and education are individually significant, since their p-values are < 0.1 (at least), so they help explain the variation in sleep time in our model.
R² measures the fraction of the variation of the dependent variable around its mean that is explained by the model. Only 12.3% of the variation in sleep is explained by the regressors of this model, so we conclude that it does not explain much of the variation in sleep time. Furthermore, the R² of this model is slightly inflated by the inclusion of two irrelevant variables.

Other factors that might affect the time spent sleeping are the number of hours exposed to light at night in one's house and alcohol consumption.
The first one can be positively correlated with work time, because someone working from home is exposed to light (from the PC) while working. The second one can be negatively correlated with work time, because alcohol consumption tends to increase during leisure time and decrease during work time.

6. Suppose that you have information about the height of the individuals in your sample. Would you add height to your model? What would be the benefits and consequences of adding such a variable?

The height of the individuals does not affect sleep time, therefore it is an irrelevant variable. If we include this variable in our model, R² would increase (it can never decrease when a regressor is added), but adjusted R² would probably be lower than in the previous model. Including height could also inflate the variances of the other coefficient estimates (to the extent that height is correlated with the other regressors), so we might fail to reject H0 for a relevant variable and wrongly conclude that it is not significant.
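A small simulation illustrates the R² versus adjusted R² point. All data here are simulated and hypothetical: sleep depends only on totwrk, and height is pure noise by construction. The R² of the larger model can never be below that of the smaller one; adjusted R² falls exactly when the added variable's |t| is below 1, so whether it falls in a given draw depends on the random seed.

```r
# Hypothetical simulated data: sleep depends on work time only;
# height is generated independently, so it is irrelevant by construction.
set.seed(1)
n <- 200
totwrk <- rnorm(n, mean = 2100, sd = 900)   # minutes worked per week (hypothetical)
height <- rnorm(n, mean = 170, sd = 10)     # height in cm (hypothetical)
sleep  <- 3600 - 0.15 * totwrk + rnorm(n, sd = 400)

fit_small <- lm(sleep ~ totwrk)
fit_big   <- lm(sleep ~ totwrk + height)

# R-squared and adjusted R-squared of the two nested models
r2  <- c(small = summary(fit_small)$r.squared,
         big   = summary(fit_big)$r.squared)
adj <- c(small = summary(fit_small)$adj.r.squared,
         big   = summary(fit_big)$adj.r.squared)
print(rbind(R2 = r2, adjR2 = adj))

# t statistic of height: adjusted R-squared falls exactly when |t| < 1
t_height <- summary(fit_big)$coefficients["height", "t value"]
print(t_height)
```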

Part II

1. Which of the following variables, crsgpa, cumgpa, and tothrs, are statistically significant at the 5 percent level?
Hypotheses:
H0: $\beta_{crsgpa} = 0$
H0: $\beta_{cumgpa} = 0$
H0: $\beta_{tothrs} = 0$

H1: $\beta_{crsgpa} \neq 0$
H1: $\beta_{cumgpa} \neq 0$
H1: $\beta_{tothrs} \neq 0$

Critical region (region where H0 is rejected):
$CR = \,]-\infty, -1.96] \cup [1.96, +\infty[$, i.e. reject H0 when $|t_{obs}| \geq 1.96$

At the 5% level:
If we use the homoskedasticity-only standard errors:
$t_{crsgpa} = 0.900/0.175 = 5.14 \in CR$: reject H0, crsgpa is statistically significant at the 5% level.
$t_{cumgpa} = 0.193/0.064 = 3.02 \in CR$: reject H0, cumgpa is statistically significant at the 5% level.
$t_{tothrs} = 0.0014/0.0012 = 1.167 \notin CR$: do not reject H0, tothrs is not statistically significant at the 5% level.

If we use the heteroskedasticity-robust standard errors:
$t_{crsgpa} = 0.900/0.166 = 5.42 \in CR$: reject H0, crsgpa is statistically significant at the 5% level.
$t_{cumgpa} = 0.193/0.074 = 2.61 \in CR$: reject H0, cumgpa is statistically significant at the 5% level.
$t_{tothrs} = 0.0014/0.0012 = 1.167 \notin CR$: do not reject H0, tothrs is not statistically significant at the 5% level.
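The six t statistics can be reproduced with a few lines of R. The coefficients and both sets of standard errors are the ones quoted in this answer; the 1.96 cutoff is the normal critical value used for the critical region.

```r
# Coefficients and standard errors quoted above (homoskedastic / robust)
coefs     <- c(crsgpa = 0.900, cumgpa = 0.193, tothrs = 0.0014)
se_homo   <- c(crsgpa = 0.175, cumgpa = 0.064, tothrs = 0.0012)
se_robust <- c(crsgpa = 0.166, cumgpa = 0.074, tothrs = 0.0012)
crit <- 1.96   # two-sided 5% critical value

t_homo   <- coefs / se_homo
t_robust <- coefs / se_robust

results <- data.frame(t_homo        = round(t_homo, 3),
                      reject_homo   = abs(t_homo) >= crit,
                      t_robust      = round(t_robust, 3),
                      reject_robust = abs(t_robust) >= crit)
print(results)
```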

2. Does it matter which standard errors are used in (1)?
It does not appear to matter which standard errors are used in (1). Both crsgpa and cumgpa are statistically significant in our model whether we use homoskedasticity-only or heteroskedasticity-robust standard errors, and tothrs is not significant in either case (its standard error is 0.0012 under both). Therefore we conclude that the lack of significance of tothrs is not due to a heteroskedasticity problem.

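3. Test whether there is an in-season effect on term GPA, using both standard errors. Does the significance level at which the null can be rejected depend on the standard error used?
Hypotheses:
H0: $\beta_{season} = 0$

H1: $\beta_{season} \neq 0$
If we use the homoskedasticity-only standard errors:
$|t_{season,obs}| = 0.157/0.098 = 1.60$

If we use the heteroskedasticity-robust standard errors:
$|t_{season,obs}| = 0.157/0.080 = 1.9625 \approx 1.96$

The significance level at which the null can be rejected depends on the standard error used. With homoskedasticity-only standard errors the p-value lies between 10% and 20%, so H0 can only be rejected at a significance level above 10%. With heteroskedasticity-robust standard errors the p-value lies between 5% and 10%, so H0 is rejected at the 10% level but not at the 5% level (the robust t statistic sits right at the 5% cutoff, and the exact t critical value is slightly above 1.96, so the result is borderline).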
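To see why the robust result is borderline, the two-sided p-values can be computed in R. This sketch assumes 261 residual degrees of freedom (n = 269 observations and 8 estimated parameters, i.e. an intercept and seven regressors); that sample size is an assumption, not something stated in this answer, and a different df would move the robust p-value slightly around 5%. It uses the absolute value of the coefficient, since only |t| matters for a two-sided test.

```r
# |coefficient| of season and its two standard errors, as quoted above
b_season  <- 0.157
se_homo   <- 0.098
se_robust <- 0.080

# Assumption: n = 269 and 8 estimated parameters, so df = 261
df_resid <- 261

t_homo   <- b_season / se_homo
t_robust <- b_season / se_robust

# Two-sided p-values from the t distribution
p_homo   <- 2 * pt(-abs(t_homo),   df = df_resid)
p_robust <- 2 * pt(-abs(t_robust), df = df_resid)

print(round(c(t_homo = t_homo, t_robust = t_robust), 4))
print(round(c(p_homo = p_homo, p_robust = p_robust), 4))
```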

Part III

Setup
> setwd("//lambe/152115145/Desktop/Rstudio/homework4")
> library(data.table)
> library(ggplot2)
> library(stargazer)
> load("apple.RData")

Name my dataset and convert it into a data.table:
> dt.apple <- data.table(data)

1. Is there missing data in your dataset?

> summary(dt.apple)

There is no missing data: otherwise, the summary command would report the number of NA observations (NA's) in each variable.
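Besides reading the summary output, missing values can be counted directly. The data frame below is a hypothetical toy example (not the apple data) with one missing educ value; on the real data the same calls would be applied to dt.apple.

```r
# Hypothetical toy data (not the apple sample): one missing educ value
toy <- data.frame(educ   = c(12, 16, NA, 14),
                  ecolbs = c(0, 1.5, 2, 0))

# summary() adds an "NA's" entry for each column that has missing values
print(summary(toy))

# Count missing values per column, and check the whole data set at once
na_counts <- colSums(is.na(toy))
print(na_counts)
print(anyNA(toy))
```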

2. Report the summary statistics (mean, standard deviation, minimum, and maximum) of the variables in your dataset. Briefly comment on the characteristics of your sample.

> stargazer(dt.apple, type = "text")

All non-numerical variables are ignored by the stargazer command, so date and state are not included in this table (non-numerical values cannot have a mean, standard deviation, minimum or maximum).
As for the rest of the variables, all of them have 660 observations. For each variable:
id: It is just the personal identifier; it has no impact on the dependent variable, so its summary statistics have no useful meaning.


educ: We assume that reaching university requires educ > 12. Since $mean_{educ} = 14.382$ and $sd_{educ} = 2.274$, the sample is mostly composed of people who reached university.

regprc and ecoprc: The values of both variables were randomly assigned to each family, so they are not related to other observed factors. Regular and eco-labelled apples have the same minimum price, $min_{regprc} = min_{ecoprc} = 0.59$, while the maximum price is $max_{regprc} = 1.19$ for regular apples and $max_{ecoprc} = 1.59$ for eco-labelled apples. The mean price of eco-labelled apples is higher than that of regular apples: $mean_{regprc} = 0.883 < mean_{ecoprc} = 1.082$.

inseason: Since $min_{inseason} = 0$ and $max_{inseason} = 1$, it must be a dummy variable. Since $mean_{inseason} = 0.336 < 0.5$, most observations are 0, so most people were not interviewed in November.

hhsize: In our sample, the minimum and maximum household sizes are $min_{hhsize} = 1$ and $max_{hhsize} = 9$. Since $mean_{hhsize} = 2.941$ and $sd_{hhsize} = 1.526$, most households have between 1 and 4 members.

male: Since $min_{male} = 0$ and $max_{male} = 1$, it must be a dummy variable. Since $mean_{male} = 0.262 < 0.5$, most observations are 0, so our sample is mostly composed of women.

faminc: In our sample, the minimum and maximum family incomes are $min_{faminc} = 5$ and $max_{faminc} = 250$, so there is a big discrepancy in income. Since $mean_{faminc} = 53.409$ and $sd_{faminc} = 35.741$, most family incomes are between 17.668 and 89.15 thousand dollars.

age: In our sample, the minimum and maximum ages of the household head are $min_{age} = 19$ and $max_{age} = 88$. Since $mean_{age} = 44.523$ and $sd_{age} = 15.213$, most household heads are between 29.31 and 59.736 years old.

reglbs: The minimum and maximum consumption of regular apples are $min_{reglbs} = 0$ and $max_{reglbs} = 42$. There is a big discrepancy in the consumption of regular apples, which may be due to the random assignment of price pairs (for example, low prices for high family incomes and high prices for low family incomes). Since $mean_{reglbs} = 1.282$ and $sd_{reglbs} = 2.91$, most consumption of regular apples lies between 0 and 4.192 pounds.

ecolbs: The minimum and maximum consumption of eco-labelled apples are $min_{ecolbs} = 0$ and $max_{ecolbs} = 42$. There is a big discrepancy in the consumption of eco-labelled apples, which may be due to the random assignment of price pairs (for example, low prices for high family incomes and high prices for low family incomes). Since $mean_{ecolbs} = 1.474$ and $sd_{ecolbs} = 2.526$, most consumption of eco-labelled apples lies between 0 and 4 pounds.

numlt5, num5_17, num18_64 and numgt64: These variables split the members of each household by age group. In our sample, the 18 to 64 age group has the highest mean, which means that on average there are about 2 adults aged between 18 and 64 in each household.
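The ranges quoted above are mean ± one standard deviation, truncated below at the observed minimum. This R sketch recomputes them from the summary statistics reported in this answer; for skewed variables such as reglbs and ecolbs the interval is only a rough description of where most observations lie.

```r
# Summary statistics reported above (mean, standard deviation, minimum)
stats <- data.frame(
  variable = c("hhsize", "faminc", "age", "reglbs", "ecolbs"),
  mean     = c(2.941, 53.409, 44.523, 1.282, 1.474),
  sd       = c(1.526, 35.741, 15.213, 2.910, 2.526),
  min      = c(1, 5, 19, 0, 0)
)

# mean - sd, but never below the observed minimum; and mean + sd
stats$lower <- pmax(stats$mean - stats$sd, stats$min)
stats$upper <- stats$mean + stats$sd
print(stats)
```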

3. Create a correlation table. Identify any pairs of variables that exhibit a correlation you believe is worth mentioning. Comment on why you observe these values.

>cor(dt.apple[, list(educ, inseason, hhsize, male, faminc, age, reglbs, ecolbs, numlt5, num5_17, num18_64,
numgt64)])


The correlation of each variable with itself is equal to 1, because:
$corr(x, x) = \dfrac{cov(x, x)}{sd_x \cdot sd_x} = \dfrac{Var(x)}{sd_x^2} = 1$
As for the values highlighted in the correlation table:
$corr(educ, faminc) = 0.2971 > 0$: there is a positive, medium-low linear relationship between education and family income.
$corr(hhsize, age) = -0.32 < 0$: there is a negative, medium-low linear relationship between household size and age.
$corr(hhsize, numlt5) = 0.5 > 0$: there is a positive, medium linear relationship between household size and the number of household members younger than 5.
$corr(hhsize, num5\_17) = 0.684 > 0$: there is a positive, medium-high linear relationship between household size and the number of household members aged between 5 and 17.
$corr(hhsize, num18\_64) = 0.6 > 0$: there is a positive, medium linear relationship between household size and the number of household members aged between 18 and 64.
$corr(hhsize, numgt64) = -0.145 < 0$: there is a negative, low linear relationship between household size and the number of household members older than 64.
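The identity corr(x, x) = 1 and the definition corr(x, y) = cov(x, y)/(sd_x sd_y) can be checked in R on simulated data; x and y below are hypothetical, not columns of the apple dataset.

```r
set.seed(42)
x <- rnorm(50)                 # hypothetical variable
y <- 0.5 * x + rnorm(50)       # hypothetical variable correlated with x

# Correlation from its definition versus the built-in function
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

# Correlation of a variable with itself: Var(x) / sd(x)^2 = 1
r_self <- cor(x, x)

print(c(manual = r_manual, builtin = r_builtin, self = r_self))
```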

4. Plot the distribution of ecolbs. Comment on your plot.

> qplot(data = dt.apple
        , x = ecolbs
        , geom = "density")

Most people buy less than 4 pounds of eco-labelled apples, and almost 50% do not buy eco-labelled apples at all.

5. Build a scatterplot of the relationship between the price and quantity of eco-labelled apples, including a straight line depicting the relationship between these two variables.

> qplot(data = dt.apple
        , x = ecoprc
        , y = ecolbs
        , geom = c("point", "smooth")
        , method = lm)

6. Can you find any outliers in the sample? If so, make a decision of what to do with such observation(s) and carefully justify your choice.

>qplot(data=dt.apple
,x=ecoprc
,y=ecolbs
,geom='boxplot')

We suspect that the dots above the upper whisker are the outliers. However, to be sure, we compute the z-scores and list the observations with |z| ≥ 4:

> zscore <- (dt.apple$ecolbs - mean(dt.apple$ecolbs)) / sd(dt.apple$ecolbs)
> dt.apple[zscore <= -4 | zscore >= 4, list(ecolbs, zscore)]


Only the dots that correspond to 21 or 42 pounds of ecolbs in the boxplot are really outliers.
We should not remove these outliers, since they are not due to incorrectly entered or measured data (prices were assigned randomly) and they do not create a spurious association: in other words, the relationship between the price and quantity of eco-labelled apples is not driven by the outliers, as can be seen in the graph of question 5.
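The |z| ≥ 4 rule can be illustrated with the mean (1.474) and standard deviation (2.526) of ecolbs reported in question 2. The value 10 is a hypothetical comparison point, not a specific observation from the sample.

```r
# Mean and standard deviation of ecolbs reported in the summary statistics
mean_ecolbs <- 1.474
sd_ecolbs   <- 2.526

zscore <- function(x) (x - mean_ecolbs) / sd_ecolbs

# 21 and 42 are the values discussed above; 10 is a hypothetical comparison point
candidates <- c(10, 21, 42)
z    <- zscore(candidates)
flag <- abs(z) >= 4   # flag observations with |z| >= 4

print(data.frame(ecolbs = candidates, z = round(z, 2), outlier = flag))
```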

7. Estimate a model of the quantity of eco-labelled apples as a function of the price of regular and eco-labelled apples.

> lm.quantity <- lm(ecolbs ~ ecoprc + regprc, data = dt.apple)
> stargazer(lm.quantity, type = "text")

8. Write down the model estimated in (7) in equation form.
## check the order of the betas
> coefficients(lm.quantity)

$ecolbs = \beta_0 + \beta_1\, ecoprc + \beta_2\, regprc + u$

$\widehat{ecolbs} = 1.965 - 2.926\, ecoprc + 3.029\, regprc$

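9. What is the meaning of the intercept of your estimated model?

$\hat\beta_0 = 1.965$: when both regular and eco-labelled apples are free, the predicted consumption of eco-labelled apples is 1.965 pounds. This is an extrapolation, since no price in the sample is below 0.59, so this interpretation should be taken with caution.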

10. Interpret the signs, magnitude, and significance levels of the coefficients of ecoprc and regprc.

$\hat\beta_{ecoprc} = -2.926$. The sign: as the price of eco-labelled apples increases, their consumption decreases.
We estimate that if the price of eco-labelled apples increases by 1 unit, holding regprc constant, the consumption of eco-labelled apples decreases by 2.926 pounds.
Since the p-value is < 0.01, we reject the null hypothesis that the coefficient is zero: ecoprc is statistically significant at the 1% level.

$\hat\beta_{regprc} = 3.029$. The sign: as the price of regular apples increases, the consumption of eco-labelled apples increases.
We estimate that if the price of regular apples increases by 1 unit, holding ecoprc constant, the consumption of eco-labelled apples increases by 3.029 pounds.
Since the p-value is < 0.01, we reject the null hypothesis that the coefficient is zero: regprc is statistically significant at the 1% level.

11. What is your best guess for the amount of eco-labelled apples that a family presented with an ecoprc of 1.05 and a regprc of 0.98 would buy?

> eco.apple <- data.table(ecoprc = 1.05, regprc = 0.98)
> predict(lm.quantity, eco.apple)

My best guess for the amount of eco-labelled apples bought by a family presented with an ecoprc of 1.05 and a regprc of 0.98 is 1.86 pounds.
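The prediction can be reproduced by hand from the rounded coefficients of question 8; this R sketch just evaluates the fitted equation at the given prices.

```r
# Rounded coefficients from the estimated equation in question 8
b <- c(intercept = 1.965, ecoprc = -2.926, regprc = 3.029)

# Prices faced by the family in question 11 (1 multiplies the intercept)
x_new <- c(intercept = 1, ecoprc = 1.05, regprc = 0.98)

pred <- sum(b * x_new)
print(round(pred, 2))
```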

12. Compute and plot the regression's residuals. Comment on your plot.

> lm.quantity.res <- resid(lm.quantity)
> dt.apple[, residuals := lm.quantity.res]
> dt.apple[, num.id := .I]   # numeric row index (row.names() would give character ids)
> qplot(data = dt.apple
        , x = num.id
        , y = residuals
        , geom = "point") + theme_bw()

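The closer the dots are to 0, the better the model fits the data. Since the model includes an intercept, the OLS residuals average to zero, so they scatter around the horizontal line at 0.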
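Because the model includes an intercept, OLS residuals average exactly zero and are uncorrelated with each regressor in the sample. This sketch checks those properties on simulated, hypothetical data that only mimic the price ranges of the apple sample; it does not use the real dataset.

```r
# Hypothetical simulated data (not the apple sample)
set.seed(7)
n <- 100
ecoprc <- runif(n, 0.59, 1.59)
regprc <- runif(n, 0.59, 1.19)
ecolbs <- pmax(0, 2 - 3 * ecoprc + 3 * regprc + rnorm(n, sd = 2.5))

fit <- lm(ecolbs ~ ecoprc + regprc)
res <- resid(fit)

# Normal equations: residuals sum to zero and are orthogonal to each regressor
print(mean(res))
print(c(sum(res * ecoprc), sum(res * regprc)))
```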

13. Calculate the total sum of squares, explained sum of squares and residual sum of squares for your model. How well does this model explain the amount of eco-labelled apples?

> dt.apple[, ecolbs_hat := predict(lm.quantity)]
> SST <- dt.apple[, sum((ecolbs - mean(ecolbs))^2)]
> SSE <- dt.apple[, sum((ecolbs_hat - mean(ecolbs))^2)]
> SSR <- dt.apple[, sum((ecolbs - ecolbs_hat)^2)]

$SST = SSE + SSR$

This model explains $R^2 = SSE/SST = 0.0364$, i.e. only 3.64% of the variation of ecolbs around its mean value. It is a low R², so we should incorporate more relevant variables or reject the model.
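The decomposition SST = SSE + SSR and the formula R² = SSE/SST hold for any OLS regression with an intercept. This sketch verifies them on simulated, hypothetical data (not the apple sample) and compares the result with the R² reported by summary().

```r
# Hypothetical simulated data (not the apple sample)
set.seed(123)
n <- 100
ecoprc <- runif(n, 0.59, 1.59)
regprc <- runif(n, 0.59, 1.19)
ecolbs <- 2 - 3 * ecoprc + 3 * regprc + rnorm(n, sd = 2.5)

fit <- lm(ecolbs ~ ecoprc + regprc)
ecolbs_hat <- fitted(fit)

SST <- sum((ecolbs - mean(ecolbs))^2)       # total sum of squares
SSE <- sum((ecolbs_hat - mean(ecolbs))^2)   # explained sum of squares
SSR <- sum((ecolbs - ecolbs_hat)^2)         # residual sum of squares
R2  <- SSE / SST

print(c(SST = SST, SSE_plus_SSR = SSE + SSR, R2 = R2,
        R2_summary = summary(fit)$r.squared))
```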
