Homework4final PDF

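Part I
1. What is the partial effect of age on sleeping?
$\partial sleep / \partial age = -8.697 + 2 \times 0.128 \, age$, which equals zero at $age = 8.697 / (2 \times 0.128) = 33.97 \approx 34$.
Holding everything else constant, on average sleep time decreases with age up to about 34 years and increases with age above 34 years.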
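The turning point follows from setting the derivative with respect to age to zero. A minimal sketch in R, using only the two age coefficients reported in this answer (the object names are illustrative, not taken from the original work):

```r
# Coefficients on age and age^2 as reported in the answer
b_age   <- -8.697
b_agesq <-  0.128

# Partial effect of age on sleep (minutes/week per additional year of age)
partial_effect <- function(age) b_age + 2 * b_agesq * age

# Age at which the partial effect changes sign (the turning point)
turning_age <- -b_age / (2 * b_agesq)

print(round(turning_age, 2))
print(partial_effect(c(20, 34, 50)))
```

The partial effect is negative below the turning point and positive above it, which matches the interpretation given in the answer.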
2. On average, how is being male vs female related to sleep time?
We considered the dummy variable male:
male = 1 if male
male = 0 if female
$\partial sleep / \partial male = 87.75$. Holding everything else constant, a male sleeps on average 87.75 minutes/week more than a female.
3. If someone increases their work time by 5 hours/week, by how many minutes is sleep time predicted to fall? Is this a large tradeoff?
sleep and work time are measured in minutes (5 hours = 300 minutes).
$\hat{\beta}_{totwrk} = \partial sleep / \partial totwrk = -0.163$
Prediction: $\Delta sleep = \hat{\beta}_{totwrk} \, \Delta totwrk = -0.163 \times 300 = -48.9 \approx -50$
Holding everything else constant, if someone increases their work time by 5 hours/week, on average they will sleep about 50 minutes/week less.
If we consider that the week has five working days, it is not a large tradeoff: working 1 hour more per working day would reduce sleep time by about 10 minutes on a working day.
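The arithmetic behind this prediction can be checked in a few lines of R. This is only a sketch based on the coefficient quoted in the answer; the five-working-day split is the same assumption the answer makes.

```r
# Coefficient on weekly work time (minutes of sleep per minute of work)
b_totwrk <- -0.163

# Extra work time: 5 hours per week expressed in minutes
delta_totwrk <- 5 * 60

# Predicted change in weekly sleep, and the change per working day
delta_sleep_week <- b_totwrk * delta_totwrk
delta_sleep_day  <- delta_sleep_week / 5

print(c(week = delta_sleep_week, day = delta_sleep_day))
```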
4. Discuss the sign and magnitude of the estimated coefficient of education. Can you reject the hypothesis that education is not related with sleep time?
$\partial sleep / \partial educ = -11.71 < 0$
More educated people tend to have busier working lives and therefore tend to spend less time sleeping.
Holding everything else constant, one additional year of education leads, on average, to a reduction of 11.71 minutes/week in sleep time.
Individual significance test:
H0: $\beta_3 = 0$ (education is not related with sleep time)
H1: $\beta_3 \neq 0$ (education is related with sleep time)
$t_3^{obs} = -11.71/5.867 = -1.996 \approx -2$. Since the p-value is below 0.05, we reject H0, so education is related with sleep time at the 5% significance level (95% confidence).
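The test can be reproduced from the reported coefficient and standard error. The sketch below assumes the sample is large enough for the standard normal approximation to the t distribution, because the degrees of freedom are not stated in the answer.

```r
# Estimate and standard error for educ as reported in the answer
b_educ  <- -11.71
se_educ <-  5.867

# Observed t statistic under H0: beta_3 = 0
t_obs <- b_educ / se_educ

# Two-sided p-value (normal approximation, an assumption of this sketch)
p_value <- 2 * pnorm(-abs(t_obs))

# Decision at the 5% level
reject_5pct <- abs(t_obs) > qnorm(0.975)

print(c(t = t_obs, p = p_value))
print(reject_5pct)
```

The rejection is narrow: the statistic is only slightly beyond the 5% critical value, so the conclusion would not hold at the 1% level.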
5. Would you say that work time, gender, education, and age explain much of the variation in sleep time? What other factors might affect the time spent sleeping? Give at least two examples. Are these likely to be correlated with work time?
If we run the individual significance tests for age and agesq, both p-values are above 0.1, so they are not individually significant in our model.
Work time, gender and education are individually significant, since their p-values are below 0.1 (at least), so they help explain the variation in sleep time in our model.
$R^2$ measures how much of the variation of the dependent variable around its mean is explained by the model. Only 12.3% of the variation in sleep time is explained by the variables of this model, so we conclude that it does not explain much of the variation in sleep time. Furthermore, the $R^2$ of this model is slightly inflated by the addition of two irrelevant variables.
Other factors that might affect time spent sleeping are the number of hours exposed to light at night at home and alcohol consumption.
The first can be positively correlated with work time, because someone can be exposed to light while working from home (light from the PC). The second can be negatively correlated with work time, because alcohol consumption tends to increase during leisure time and decrease during work time.
6. Suppose that you have information about the height of the individuals in your sample. Would you add height to your model? What would be the benefits and consequences of adding such a variable?
The height of the individuals does not affect sleep time, therefore it is an irrelevant variable. If we include it in our model, $R^2$ would not fall (it can only rise when a regressor is added), but the adjusted $R^2$ would likely be lower than in the previous model. Including height would also inflate the variance of the estimated coefficients of the other variables in the equation, so we might wrongly conclude that a relevant variable is not significant.
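The effect of an irrelevant regressor on $R^2$ and adjusted $R^2$ can be illustrated with simulated data. The sketch below does not use the sleep data: the variables x, y and noise_var are hypothetical, and noise_var only plays the role that height would play.

```r
set.seed(1)

# Simulated data (hypothetical, for illustration only)
n         <- 200
x         <- rnorm(n)
noise_var <- rnorm(n)            # irrelevant regressor, unrelated to y
y         <- 1 + 2 * x + rnorm(n)

# Model without and with the irrelevant regressor
fit_small <- lm(y ~ x)
fit_big   <- lm(y ~ x + noise_var)

r2 <- c(small = summary(fit_small)$r.squared,
        big   = summary(fit_big)$r.squared)
adj_r2 <- c(small = summary(fit_small)$adj.r.squared,
            big   = summary(fit_big)$adj.r.squared)

print(r2)
print(adj_r2)
```

$R^2$ never falls when a regressor is added, whereas adjusted $R^2$ falls whenever the added variable has a t statistic below 1 in absolute value, which is the typical outcome for an irrelevant variable but is not guaranteed in every sample.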
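Part II
1. Which of the following variables crsgpa, cumgpa, and tothrs are statistically significant at the 5 percent level?
Hypotheses:
H0: $\beta_{crsgpa} = 0$
H0: $\beta_{cumgpa} = 0$
H0: $\beta_{tothrs} = 0$
H1: $\beta_{crsgpa} \neq 0$
H1: $\beta_{cumgpa} \neq 0$
H1: $\beta_{tothrs} \neq 0$
Critical region (region where H0 is rejected) for a 5% level:
$|t| \geq 1.96$, i.e. $CR = \, ]-\infty, -1.96] \cup [1.96, +\infty[$
If we use the homoscedastic standard errors:
$t_{crsgpa}^{obs} = 0.9/0.175 = 5.14 \in CR$: reject H0. It is statistically significant at the 5% level.
$t_{cumgpa}^{obs} = 0.193/0.064 = 3.02 \in CR$: reject H0. It is statistically significant at the 5% level.
$t_{tothrs}^{obs} = 0.0014/0.0012 = 1.167 \notin CR$: do not reject H0. It is not statistically significant at the 5% level.
If we use the heteroscedasticity-robust standard errors:
$t_{crsgpa}^{obs} = 0.9/0.166 = 5.42 \in CR$: reject H0. It is statistically significant at the 5% level.
$t_{cumgpa}^{obs} = 0.193/0.074 = 2.61 \in CR$: reject H0. It is statistically significant at the 5% level.
$t_{tothrs}^{obs} = 0.0014/0.0012 = 1.167 \notin CR$: do not reject H0. It is not statistically significant at the 5% level.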
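The six t statistics can be computed in one pass. This is a sketch that uses only the coefficients and standard errors quoted in this answer, with the large-sample critical value of 1.96; the object names are illustrative.

```r
# Coefficients and the two sets of standard errors quoted in the answer
coef_hat  <- c(crsgpa = 0.900, cumgpa = 0.193, tothrs = 0.0014)
se_usual  <- c(crsgpa = 0.175, cumgpa = 0.064, tothrs = 0.0012)
se_robust <- c(crsgpa = 0.166, cumgpa = 0.074, tothrs = 0.0012)

# t statistics under H0: coefficient = 0
t_usual  <- coef_hat / se_usual
t_robust <- coef_hat / se_robust

# Two-sided test at the 5% level with the large-sample critical value
crit          <- 1.96
reject_usual  <- abs(t_usual)  > crit
reject_robust <- abs(t_robust) > crit

print(round(rbind(t_usual, t_robust), 2))
print(rbind(reject_usual, reject_robust))
```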
2. Does it matter which standard errors are used in (1)?
It does not appear to matter which standard errors are used in (1). Both crsgpa and cumgpa are statistically significant in our model whether we use homoscedastic or heteroscedasticity-robust standard errors. As for tothrs, it is not significant under either set of standard errors, so we conclude that its lack of significance is not due to a heteroscedasticity problem.
3. Test whether there is an in-season effect on term GPA, using both standard errors. Does the significance level at which the null can be rejected depend on the standard error used?
Hypotheses:
H0: $\beta_{season} = 0$
H1: $\beta_{season} \neq 0$
If we use the homoscedastic standard errors:
$|t_{season}^{obs}| = 0.157/0.098 = 1.60$
If we use the heteroscedasticity-robust standard errors:
$|t_{season}^{obs}| = 0.157/0.08 = 1.96$
The significance level at which the null can be rejected depends on the standard error used. With homoscedastic standard errors, rejecting the null hypothesis requires a significance level $10\% < \alpha < 20\%$. With heteroscedasticity-robust standard errors the statistic sits right at the 5% critical value of 1.96, so the null is rejected for any $5\% < \alpha < 10\%$ and is borderline at exactly 5%.
Part III
# setup
>setwd("//lambe/152115145/Desktop/Rstudio/homework4")
>library(data.table)
>library(ggplot2)
>library(stargazer)
>load("apple.RData")
# name my dataset and convert it into a data.table
>dt.apple <- data.table(data)
1. Is there missing data in your dataset?
>summary(dt.apple)
There is no missing data; otherwise the summary command would report the number of NA observations for each variable.
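A more direct check than reading the summary output is to count missing values per column. The sketch below uses a small hypothetical data frame (toy), not the apple data, so that it runs on its own.

```r
# Hypothetical toy data with one missing value; this is not the apple data
toy <- data.frame(price = c(0.59, NA, 1.19),
                  lbs   = c(2, 0, 1))

# Number of NA values in each column
na_per_column <- colSums(is.na(toy))

# TRUE if there is at least one NA anywhere in the data
any_missing <- anyNA(toy)

print(na_per_column)
print(any_missing)
```

Since a data.table is also a data.frame, the same two calls should work on dt.apple; a result of all zeros would confirm that there are no missing values.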
2. Report the summary statistics (mean, standard deviation, minimum, and maximum) of the variables in your dataset. Briefly comment on the characteristics of your sample.
>stargazer(dt.apple,type="text")
All non-numerical variables are ignored by the stargazer command, so date and state are not included in this table (non-numerical values cannot have a mean, a standard deviation, a minimum or a maximum).
As for the rest of the variables, we can see that all of them have 660 observations. For each variable:
id: It is just the personal identifier. It has no impact on the dependent variable, so its summary statistics do not have a helpful meaning.
educ: We assume that reaching university requires educ > 12. Since mean(educ) = 14.382 and sd(educ) = 2.274, the sample is mostly composed of people who reached university.
regprc and ecoprc: The values of both variables were randomly assigned to each family, so they are not related to other observed factors. Regular and ecolabelled apples have the same minimum price, min(regprc) = min(ecoprc) = 0.59. The maximum price is max(regprc) = 1.19 for regular apples and max(ecoprc) = 1.59 for ecolabelled apples. The mean price of ecolabelled apples is higher than that of regular apples: mean(regprc) = 0.883 < mean(ecoprc) = 1.082.
inseason: Since min(inseason) = 0 and max(inseason) = 1, it must be a dummy variable. Since mean(inseason) = 0.336 < 0.5, most people answered 0, so they were not interviewed in November.
hhsize: In our sample, the minimum and maximum household sizes are min(hhsize) = 1 and max(hhsize) = 9. Since mean(hhsize) = 2.941 and sd(hhsize) = 1.526, the majority of households have between 1 and 4 members.
male: Since min(male) = 0 and max(male) = 1, it must be a dummy variable. Since mean(male) = 0.262 < 0.5, most people answered 0, so our sample is mostly composed of women.
faminc: In our sample, the minimum and maximum family incomes are min(faminc) = 5 and max(faminc) = 250, so income varies widely. Since mean(faminc) = 53.409 and sd(faminc) = 35.741, most family incomes are between [17.668, 89.15] thousand dollars.
age: In our sample, the minimum and maximum ages of the household leader are min(age) = 19 and max(age) = 88. Since mean(age) = 44.523 and sd(age) = 15.213, most household leaders are between [29.31, 59.736] years old.
reglbs: The minimum and maximum consumption of regular apples are min(reglbs) = 0 and max(reglbs) = 42. Consumption of regular apples varies widely, which can be due to the price pairs being randomly assigned (for example, low prices for high family incomes and high prices for low family incomes). Since mean(reglbs) = 1.282 and sd(reglbs) = 2.91, most consumption of regular apples is between [0, 4.192] pounds.
ecolbs: The minimum and maximum consumption of ecolabelled apples are min(ecolbs) = 0 and max(ecolbs) = 42. Consumption of ecolabelled apples varies widely, which can be due to the price pairs being randomly assigned (for example, low prices for high family incomes and high prices for low family incomes). Since mean(ecolbs) = 1.474 and sd(ecolbs) = 2.526, most consumption of ecolabelled apples is between [0, 4] pounds.
numlt5, num5_17, num18_64 and numgt64: These variables split the members of each household by age. In our sample, num18_64 has the highest mean, which means that on average there are about 2 adults aged between 18 and 64 in each household.
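The intervals quoted for hhsize, faminc and age are the mean plus or minus one standard deviation. A minimal sketch of that arithmetic, using only the statistics reported in this answer (the data frame name stats is illustrative):

```r
# Means and standard deviations reported in the summary table
stats <- data.frame(
  variable = c("hhsize", "faminc", "age"),
  mean     = c(2.941, 53.409, 44.523),
  sd       = c(1.526, 35.741, 15.213)
)

# One-standard-deviation band around the mean
stats$lower <- stats$mean - stats$sd
stats$upper <- stats$mean + stats$sd

print(stats)
```

This band is only a rough guide: it contains about 68% of the observations when a variable is approximately normal, which skewed variables such as income or pounds purchased need not be.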
3. Create a correlation table. Identify any pairs of variables that exhibit a correlation you believe is worth mentioning. Comment on why you observe these values.
>cor(dt.apple[, list(educ, inseason, hhsize, male, faminc, age, reglbs, ecolbs, numlt5, num5_17, num18_64,
numgt64)])
The correlation of each variable with itself is equal to 1. This is due to:
$corr(x, x) = cov(x, x)/(sd_x \cdot sd_x) = Var(x)/(sd_x)^2 = 1$
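The identity for the diagonal of the correlation table can be verified numerically. The vector x below holds arbitrary illustrative values, not data from the sample.

```r
# Arbitrary illustrative values
x <- c(2, 5, 7, 11, 13)

# Correlation of x with itself from its definition
manual <- cov(x, x) / (sd(x) * sd(x))

# Same quantity from the built-in function
builtin <- cor(x, x)

print(c(manual = manual, builtin = builtin))
```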
As for the values I highlighted:
corr(educ, faminc) = 0.2971 > 0: there is a positive correlation between education and family income, a medium-low linear relationship between educ and faminc.
corr(hhsize, age) = -0.32 < 0: there is a negative correlation between household size and age, a medium-low linear relationship between hhsize and age.
corr(hhsize, numlt5) = 0.5 > 0: there is a positive correlation between household size and the number of household members younger than 5, a medium linear relationship between hhsize and numlt5.
corr(hhsize, num5_17) = 0.684 > 0: there is a positive correlation between household size and the number of household members aged between 5 and 17, a medium-high linear relationship between hhsize and num5_17.
corr(hhsize, num18_64) = 0.6 > 0: there is a positive correlation between household size and the number of household members aged between 18 and 64, a medium linear relationship between hhsize and num18_64.
corr(hhsize, numgt64) = -0.145 < 0: there is a negative correlation between household size and the number of household members older than 64, a low linear relationship between hhsize and numgt64.
4. Plot the distribution of ecolbs. Comment on your plot.
>qplot(data=dt.apple
,x=ecolbs
,geom="density")
Most people buy less than 4 pounds of ecolabelled apples and almost 50% do not buy ecolabelled apples at all.
5. Build a scatter plot of the relationship between the price and quantity of ecolabelled apples, including a straight line depicting the relationship between these two variables.
>qplot(data=dt.apple
,x=ecoprc
,y=ecolbs
,geom=c("point","smooth")
,method=lm)
6. Can you find any outliers in the sample? If so, make a decision of what to do with such observation(s) and carefully justify your choice.
>qplot(data=dt.apple
,x=ecoprc
,y=ecolbs
,geom='boxplot')
We suspect that the dots above the line are the outliers. However, to be sure we should compute the z-scores:
>zscore <- (dt.apple$ecolbs - mean(dt.apple$ecolbs))/sd(dt.apple$ecolbs)
>dt.apple[zscore <= -4 | zscore > 4, list(ecolbs, zscore)]
Only the dots that correspond to 21 or 42 pounds of ecolbs in the boxplot are really outliers.
We should not remove the outliers, since they are not due to incorrectly entered or measured data (prices were assigned randomly) and they do not create a significant association. In other words, the relationship between price and quantity of ecolabelled apples is not created by the outliers, as can be seen in the graph of question 5.
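The z-score rule can be checked by hand with the mean and standard deviation of ecolbs reported in the summary statistics. In the sketch below the quantities 4 and 10 are hypothetical comparison points; only 21 and 42 come from the answer.

```r
# Mean and standard deviation of ecolbs from the summary statistics
ecolbs_mean <- 1.474
ecolbs_sd   <- 2.526

# z-score of a given quantity of ecolabelled apples
z <- function(x) (x - ecolbs_mean) / ecolbs_sd

# 4 and 10 are hypothetical comparison points; 21 and 42 are from the answer
candidates <- c(4, 10, 21, 42)
z_values   <- z(candidates)

# Quantities flagged by the |z| > 4 rule
flagged <- candidates[abs(z_values) > 4]

print(round(z_values, 2))
print(flagged)
```

Because ecolbs cannot be negative, the lower bound of the rule can never bind here: the smallest possible z-score, at zero pounds, is well above -4.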
7. Estimate a model of the quantity of ecolabelled apples as a function of the price of regular and ecolabelled apples.
>lm.quantity <- lm(ecolbs ~ ecoprc + regprc, data=dt.apple)
>stargazer(lm.quantity,type="text")
8. Write down the model estimated in (7) in equation form.
## check the order of the betas
>coefficients(lm.quantity)
$ecolbs = \beta_0 + \beta_1 \, ecoprc + \beta_2 \, regprc + u$
$\widehat{ecolbs} = 1.965 - 2.926 \, ecoprc + 3.029 \, regprc$
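9. What is the meaning of the intercept of your estimated model?
$\hat{\beta}_0 = 1.965$: when regular and ecolabelled apples are both free (ecoprc = regprc = 0), the predicted consumption of ecolabelled apples is 1.965 pounds. Since the lowest price in the sample is 0.59, this is an extrapolation outside the observed price range.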
10. Interpret the signs, magnitude, and significance levels of the coefficients of ecoprc and regprc.
$\hat{\beta}_{ecoprc} = -2.926$. The sign: as the price of ecolabelled apples increases, their consumption decreases.
We estimate that if the price of ecolabelled apples increases by 1 unit, the consumption of ecolabelled apples decreases by 2.926 pounds, holding regprc constant.
Since the p-value is below 0.01, we reject the null hypothesis that the coefficient is zero. ecoprc is statistically significant.
$\hat{\beta}_{regprc} = 3.029$. The sign: as the price of regular apples increases, the consumption of ecolabelled apples increases.
We estimate that if the price of regular apples increases by 1 unit, the consumption of ecolabelled apples increases by 3.029 pounds, holding ecoprc constant.
Since the p-value is below 0.01, we reject the null hypothesis that the coefficient is zero. regprc is statistically significant.
11. What is your best guess for the amount of ecolabelled apples that a family presented with an ecoprc of 1.05 and a regprc of 0.98 would buy?
>eco.apple <- data.table(ecoprc=1.05, regprc=0.98)
>predict(lm.quantity, eco.apple)
My best guess for the amount of ecolabelled apples bought by a family presented with an ecoprc of 1.05 units and a regprc of 0.98 units is 1.86 pounds.
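The same prediction can be reproduced by hand from the estimated coefficients, which is a useful check on the predict() output. The sketch below hard-codes the rounded coefficients reported for the estimated equation; the function name predict_ecolbs is illustrative.

```r
# Rounded coefficients of the estimated equation
b0       <-  1.965
b_ecoprc <- -2.926
b_regprc <-  3.029

# Fitted value for a given pair of prices
predict_ecolbs <- function(ecoprc, regprc) {
  b0 + b_ecoprc * ecoprc + b_regprc * regprc
}

pred <- predict_ecolbs(ecoprc = 1.05, regprc = 0.98)
print(round(pred, 2))
```

Small differences from the predict() result are possible because the coefficients used here are rounded to three decimals.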
12. Compute and plot the regression's residuals. Comment on your plot.
>lm.quantity.res <- resid(lm.quantity)
>dt.apple[, residuals := lm.quantity.res]
# numeric row index for the x axis
>dt.apple[, num.id := as.integer(row.names(dt.apple))]
>qplot(data=dt.apple
,x=num.id
,y=residuals
,geom="point")+theme_bw()
The closer the dots are to 0, the better the model explains the observed quantities.
13. Calculate the total sum of squares, explained sum of squares and residual sum of squares for your model. How well does this model explain the amount of ecolabelled apples?
>dt.apple[, ecolbs_hat := predict(lm.quantity)]
>SST <- dt.apple[, sum((ecolbs - mean(ecolbs))^2)]
>SSE <- dt.apple[, sum((ecolbs_hat - mean(ecolbs))^2)]
>SSR <- dt.apple[, sum((ecolbs - ecolbs_hat)^2)]
SST = SSE + SSR
This model explains $R^2 = SSE/SST = 0.0364$, i.e. about 3.6%, of the variation of ecolbs around its mean value. This is a low $R^2$, so we should incorporate more relevant variables or reject the model.
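The decomposition SST = SSE + SSR and its link to $R^2$ can be verified on any OLS fit that includes an intercept. The sketch below uses simulated data (x1, x2 and y are hypothetical, not the apple data) so that it runs on its own.

```r
set.seed(42)

# Simulated data (hypothetical, for illustration only)
n  <- 100
x1 <- runif(n)
x2 <- runif(n)
y  <- 1 - 2 * x1 + 3 * x2 + rnorm(n)

fit   <- lm(y ~ x1 + x2)
y_hat <- fitted(fit)

# Total, explained and residual sums of squares
SST <- sum((y - mean(y))^2)
SSE <- sum((y_hat - mean(y))^2)
SSR <- sum((y - y_hat)^2)

# R-squared as the explained share of the total variation
r2 <- SSE / SST

print(c(SST = SST, SSE = SSE, SSR = SSR, R2 = r2))
```

The value of r2 computed this way agrees with summary(fit)$r.squared, which gives a quick consistency check for the three sums of squares.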
