1. What is the partial effect of age on sleeping?
Holding everything else constant, on average sleep time decreases with age up to 34 years of age and increases above 34 years.
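A turning point at 34 years suggests that age enters the model quadratically. As a sketch, assuming the regression contains the terms $\beta_{age}\,age + \beta_{age^2}\,age^2$ (these coefficient names are assumptions and do not come from the original output), the partial effect and its turning point would be:

```latex
\frac{\partial\, sleep}{\partial\, age} = \beta_{age} + 2\,\beta_{age^2}\, age,
\qquad
age^{*} = -\frac{\beta_{age}}{2\,\beta_{age^2}} \approx 34
```

With $\beta_{age} < 0$ and $\beta_{age^2} > 0$ the effect is negative below $age^{*}$ and positive above it, which is the pattern described in the answer.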
2. On average, how is being male vs female related to sleep time?
We considered the dummy variable male: male = 1 if male, male = 0 if female.
$\Delta sleep / \Delta male = 87.75$. Holding everything else constant, a male sleeps 87.75 minutes/week more than a female, on average.
sleep and worktime are measured in minutes (5 hours = 300 minutes).
$\beta_5 = \Delta sleep / \Delta totwrk = -0.163$
Prediction: $\Delta sleep = \beta_5 \, \Delta totwrk = -0.163 \times 300 = -48.9 \approx -50$
Holding everything else constant, if someone increases their work time by 5 hours/week, on average they will sleep about 50 minutes/week less.
If we consider that the week has five working days, it is not a large tradeoff: if someone works 1 hour more per working day, this would reduce sleep time by about 10 minutes on a working day.
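The arithmetic behind this tradeoff can be checked with a few lines of R. This is only a sketch: the coefficient is taken as -0.163, the sign implied by the conclusion that sleep falls when work time rises, and the variable names are made up for illustration.

```r
# Coefficient on totwrk as used in the answer
# (minutes of sleep per extra minute of work; negative sign is assumed)
beta_totwrk <- -0.163

# 5 extra hours of work per week, expressed in minutes
delta_totwrk <- 5 * 60

# Predicted change in weekly sleep, in minutes
delta_sleep_week <- beta_totwrk * delta_totwrk

# Same change spread over five working days (the assumption made in the text)
delta_sleep_day <- delta_sleep_week / 5

print(c(week = delta_sleep_week, day = delta_sleep_day))
```

The weekly figure rounds to the 50 minutes quoted in the answer, and the daily figure to about 10 minutes.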
4. Discuss the sign and magnitude of the estimated coefficient of education. Can you reject the hypothesis that education is not related with sleep time?
Individual significance test:
H0: $\beta_3 = 0$ (education is not related with sleep time)
H1: $\beta_3 \neq 0$ (education is related with sleep time)
$t_{3,obs} = 11.71/5.867 = 1.996 \approx 2$. Since the p-value < 0.05 we reject H0, so education is related with sleep time at the 95% confidence level.
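The test statistic and its p-value can be reproduced from the two numbers quoted in the answer. The sketch below assumes a large-sample normal approximation for the critical value and the p-value; the original output may have used a t distribution instead.

```r
# Values quoted in the answer
beta_educ <- 11.71   # magnitude of the estimated coefficient on education
se_educ <- 5.867     # its standard error

# Observed t statistic
t_obs <- beta_educ / se_educ

# Two-sided 5% critical value and p-value under a normal approximation (assumption)
crit_5pct <- qnorm(0.975)
p_value <- 2 * pnorm(-abs(t_obs))

print(c(t_obs = t_obs, crit = crit_5pct, p_value = p_value))
```

The statistic only just exceeds the critical value, so the rejection at the 5% level is marginal.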
6. Suppose that you have information about the height of the individuals in your sample. Would you add height to your model? What would be the benefits and consequences of adding such a variable?
Part II
1. Which of the following variables crsgpa, cumgpa, and tothrs are statistically significant at the 5 percent level?
Hypotheses:
H0: $\beta_{crsgpa} = 0$, H1: $\beta_{crsgpa} \neq 0$
H0: $\beta_{cumgpa} = 0$, H1: $\beta_{cumgpa} \neq 0$
H0: $\beta_{tothrs} = 0$, H1: $\beta_{tothrs} \neq 0$
Critical region (region where H0 is rejected), for a 5% level:
$t_{CR} \approx 1.96$, CR: $]-\infty, -1.96] \cup [1.96, +\infty[$
If we use the homoscedastic errors:
$t_{crsgpa,obs} = 0.9/0.175 = 5.14 \in CR \Rightarrow$ reject H0. crsgpa is statistically significant at the 95% confidence level.
cumgpa is statistically significant at the 95% confidence level.
tothrs is not statistically significant at the 95% confidence level.
If we use the heteroscedastic errors:
$t_{crsgpa,obs} = 0.9/0.166 = 5.42 \in CR \Rightarrow$ reject H0. crsgpa is statistically significant at the 95% confidence level.
cumgpa is statistically significant at the 95% confidence level.
tothrs is not statistically significant at the 95% confidence level.
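The two crsgpa statistics can be recomputed side by side. This is a small sketch using only the coefficient and standard errors quoted above; the object names are illustrative.

```r
# Coefficient on crsgpa and its two standard errors, as quoted in the answer
beta_crsgpa <- 0.9
se <- c(homoscedastic = 0.175, heteroscedastic = 0.166)

# One t statistic per type of standard error
t_obs <- beta_crsgpa / se

# Two-sided 5% critical value used in the text
crit <- 1.96
reject <- abs(t_obs) > crit

print(round(t_obs, 2))
print(reject)
```

Both statistics fall well inside the critical region, so the choice of standard error does not change the decision for crsgpa.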
2. Does it matter which standard errors are used in (1)?
It does not appear to matter which standard errors are used in (1). Both crsgpa and cumgpa are statistically significant in our model whether we use homoscedastic or heteroscedastic errors. As for tothrs not being significant, we conclude that this is not due to a heteroscedasticity problem.
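The answer compares the two kinds of standard errors without showing how the robust ones are obtained. The following is a minimal sketch in base R on simulated data (not the GPA data), using the White HC0 formula, which may differ from the robust variant behind the original output.

```r
# Simulated data (NOT the GPA data set): the error variance grows with x
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 1 + 0.5 * x + rnorm(n, sd = 0.2 + 0.3 * x)

fit <- lm(y ~ x)
X <- model.matrix(fit)   # n x 2 design matrix (intercept, x)
e <- resid(fit)          # OLS residuals

# Usual (homoscedastic) standard errors: sqrt(diag(s^2 (X'X)^-1))
XtX_inv <- solve(crossprod(X))
se_homo <- sqrt(diag(sum(e^2) / (n - ncol(X)) * XtX_inv))

# White (HC0) robust standard errors: (X'X)^-1 X' diag(e^2) X (X'X)^-1
meat <- crossprod(X * e)   # each row of X scaled by its residual
se_robust <- sqrt(diag(XtX_inv %*% meat %*% XtX_inv))

# t statistics under each set of standard errors
t_homo <- coef(fit) / se_homo
t_robust <- coef(fit) / se_robust
print(cbind(se_homo, se_robust, t_homo, t_robust))
```

The coefficients are identical in both columns; only the standard errors, and therefore the t statistics, change.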
3. Test whether there is an in-season effect on term GPA, using both standard errors. Does the significance level at which the null can be rejected depend on the standard error used?
Hypotheses:
H0: $\beta_{season} = 0$
H1: $\beta_{season} \neq 0$
If we use the homoscedastic errors:
$t_{season,obs} = 0.157/0.098 = 1.6$
If we use the heteroscedastic errors:
$t_{season,obs} = 0.157/0.08 = 1.96$
The significance level at which the null can be rejected depends on the standard error used. To reject the null hypothesis with homoscedastic errors we would need $10\% < \alpha < 20\%$; with heteroscedastic errors we would need $5\% < \alpha < 10\%$.
Part III
Setup
> setwd("//lambe/152115145/Desktop/Rstudio/homework4")
> library(data.table)
> library(ggplot2)
> library(stargazer)
> load("apple.RData")
Name my dataset and convert it into a data.table
> dt.apple <- data.table(data)
1. Is there missing data in your dataset?
> summary(dt.apple)
There is no missing data; otherwise the summary command would report the number of NA observations in each variable.
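Reading NA counts off a long summary printout is easy to get wrong, so a direct count per column can serve as a cross-check. The sketch below uses a small made-up data frame (not the apple data) and base R only.

```r
# Toy data frame with one missing value in each column (NOT the apple data)
toy <- data.frame(a = c(1, NA, 3),
                  b = c("x", "y", NA))

# Number of missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Single TRUE/FALSE answer for the whole data set
has_missing <- anyNA(toy)
print(has_missing)
```

Applied to the data set in this document, the equivalent calls would be colSums(is.na(dt.apple)) and anyNA(dt.apple).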
2. Report the summary statistics (mean, standard deviation, minimum, and maximum) of the variables in your dataset. Briefly comment on the characteristics of your sample.
> stargazer(dt.apple, type = "text")
educ: We assume that reaching university requires educ > 12. Since $mean_{educ} = 14.382$ and $Sd_{educ} = 2.274$, the sample is mostly composed of people who reached university.
inseason: Since $min_{inseason} = 0$ and $max_{inseason} = 1$, it must be a dummy variable. Since $mean_{inseason} = 0.336 < 0.5$, most people answered 0, so they were not interviewed in November.
hhsize: In our sample, the minimum and maximum household sizes are $min_{hhsize} = 1$ and $max_{hhsize} = 9$. Since $mean_{hhsize} = 2.941$ and $Sd_{hhsize} = 1.526$, the majority of households have between 1 and 4 members.
male: Since $min_{male} = 0$ and $max_{male} = 1$, it must be a dummy variable. Since $mean_{male} = 0.262 < 0.5$, most people answered 0, so our sample is mostly composed of women.
faminc: In our sample, the minimum and maximum family incomes are $min_{faminc} = 5$ and $max_{faminc} = 250$. There is a large spread in income, with $mean_{faminc} = 53.409$ and $Sd_{faminc} = 35.741$.
age: In our sample, the minimum and maximum ages of the household leader are $min_{age} = 19$ and $max_{age} = 88$. Since $mean_{age} = 44.523$ and $Sd_{age} = 15.213$, most household leaders are between [29.31, 59.736] years old.
reglbs: The minimum and maximum consumption of regular apples are $min_{reglbs} = 0$ and $max_{reglbs} = 42$. There is a large spread in the consumption of regular apples, which may be because price pairs were randomly assigned (for example, low prices for high family incomes and high prices for low family incomes). Since $mean_{reglbs} = 1.282$ and $Sd_{reglbs} = 2.91$, most consumption of regular apples is between [0, 4.192] pounds.
numlt5, num5_17, num18_64 and numgt64: These variables split the members of each household by age. In our sample, people aged between 18 and 64 have the highest mean, which means that on average there are 2 adults aged between 18 and 64 in each household.
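The intervals quoted for age and reglbs are the mean plus or minus one standard deviation. A short sketch of that calculation, using only the summary statistics reported above (the object names are illustrative):

```r
# Summary statistics quoted in the text
age_mean <- 44.523
age_sd <- 15.213
reglbs_mean <- 1.282
reglbs_sd <- 2.91

# Mean plus or minus one standard deviation
age_interval <- c(age_mean - age_sd, age_mean + age_sd)

# Consumption cannot be negative, so the lower bound is truncated at 0
reglbs_interval <- c(max(0, reglbs_mean - reglbs_sd), reglbs_mean + reglbs_sd)

print(age_interval)
print(reglbs_interval)
```

Whether "most" observations fall inside such an interval depends on the shape of the distribution; for a roughly normal variable it covers about 68% of them, and for a strongly skewed variable such as reglbs the share can be quite different.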
> cor(dt.apple[, list(educ, inseason, hhsize, male, faminc, age, reglbs, ecolbs, numlt5, num5_17, num18_64, numgt64)])
The correlation of each variable with itself is equal to 1, because:
$corr(x, x) = cov(x, x)/(Sd_x \cdot Sd_x) = Var(x)/(Sd_x)^2 = 1$
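The identity can be confirmed numerically. A minimal sketch with an arbitrary simulated vector (any non-constant numeric vector would do):

```r
# Arbitrary simulated data, used only to check the identity
set.seed(42)
x <- rnorm(50)

# cov(x, x) is the variance of x
cov_xx <- cov(x, x)

# Correlation of x with itself, built from the formula above
r_manual <- cov_xx / (sd(x) * sd(x))

print(c(cov_xx = cov_xx, var_x = var(x), r_manual = r_manual, r_builtin = cor(x, x)))
```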
As for the values I highlighted:
corr(educ, faminc) = 0.2971 > 0: there is a positive correlation between education and family income, a medium-low linear relationship between educ and faminc.
corr(hhsize, age) = -0.32 < 0: there is a negative correlation between household size and age, a medium-low linear relationship between hhsize and age.
corr(hhsize, numlt5) = 0.5 > 0: there is a positive correlation between household size and the number of household members younger than 5, a medium linear relationship between hhsize and numlt5.
corr(hhsize, num5_17) = 0.684 > 0: there is a positive correlation between household size and the number of household members aged between 5 and 17, a medium-high linear relationship between hhsize and num5_17.
corr(hhsize, num18_64) = 0.6 > 0: there is a positive correlation between household size and the number of household members aged between 18 and 64, a medium linear relationship between hhsize and num18_64.
corr(hhsize, numgt64) = -0.145 < 0: there is a negative correlation between household size and the number of household members older than 64, a low linear relationship between hhsize and numgt64.
4. Plot the distribution of ecolbs. Comment on your plot.
> qplot(data = dt.apple
, x = ecolbs
, geom = "density")
Most people buy less than 4 pounds of eco-labelled apples, and almost 50% do not buy eco-labelled apples at all.
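Shares like these are hard to read precisely off a density plot, so they could also be computed directly. A sketch on a made-up vector of purchase quantities (not the apple data):

```r
# Hypothetical purchase quantities in pounds (NOT the apple data)
ecolbs_toy <- c(0, 0, 0, 1, 2, 0, 3.5, 0, 1.5, 6)

# Share of households that buy nothing
share_zero <- mean(ecolbs_toy == 0)

# Share of households that buy less than 4 pounds
share_below_4 <- mean(ecolbs_toy < 4)

print(c(share_zero = share_zero, share_below_4 = share_below_4))
```

With the data.table used in this document, the same quantities would be dt.apple[, mean(ecolbs == 0)] and dt.apple[, mean(ecolbs < 4)].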
5. Build a scatterplot of the relationship between the price and quantity of eco-labelled apples, including a straight line depicting the relationship between these two variables.
> qplot(data = dt.apple
, x = ecoprc
, y = ecolbs
, geom = c("point", "smooth")
, method = lm)
6. Can you find any outliers in the sample? If so, make a decision of what to do with such observation(s) and carefully justify your choice.
> qplot(data = dt.apple
, x = ecoprc
, y = ecolbs
, geom = 'boxplot')
We suspect that the dots above the line are the outliers. However, to be sure we should do this:
> zscore <- (dt.apple$ecolbs - mean(dt.apple$ecolbs)) / sd(dt.apple$ecolbs)
> dt.apple[zscore <= -4 | zscore > 4, list(ecolbs, zscore)]
Only the dots that correspond to 21 or 42 ecolbs in the boxplot are really outliers.
We should not remove the outliers, since they are not due to incorrectly entered or measured data (prices were assigned randomly) and they do not create a significant association. In other words, the relationship between price and quantity of eco-labelled apples is not created by the outliers, as you can see in the graph of question 5.
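The z-score rule used here can be illustrated on a small example. This sketch uses made-up quantities (not the apple data) and keeps the cut-off of 4 standard deviations from the text.

```r
# Toy quantities (NOT the apple data): thirty ordinary values plus one extreme value
x <- c(rep(c(0, 1, 2), times = 10), 42)

# Standardise: distance from the mean in units of standard deviations
z <- (x - mean(x)) / sd(x)

# Flag observations more than 4 standard deviations away from the mean
outlier_idx <- which(abs(z) > 4)

print(x[outlier_idx])
print(round(z[outlier_idx], 2))
```

Because extreme values inflate both the mean and the standard deviation, a z-score cut-off tends to flag fewer points than a boxplot does, which is consistent with only the largest quantities being flagged in the answer.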
7. Estimate a model of the quantity of eco-labelled apples as a function of the price of regular and eco-labelled apples.
> lm.quantity <- lm(ecolbs ~ ecoprc + regprc, data = dt.apple)
> stargazer(lm.quantity, type = "text")
8. Write down the model estimated in (7) in equation form.
## check the order of the Betas
> coefficients(lm.quantity)
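For the formula ecolbs ~ ecoprc + regprc, coefficients() returns the intercept first, then ecoprc, then regprc. As a sketch of the equation form, with the labels $\hat\beta_0$, $\hat\beta_1$, $\hat\beta_2$ chosen here for illustration and the estimated values to be read from the regression output:

```latex
\widehat{ecolbs} = \hat\beta_0 + \hat\beta_1\, ecoprc + \hat\beta_2\, regprc
```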
9. What is the meaning of the intercept of your estimated model?
$\hat\beta_0 = 1.965$: when regular and eco-labelled apples are both free (both prices equal to 0), the predicted consumption of eco-labelled apples is 1.965 pounds.
10. Interpret the signs, magnitude, and significance levels of the coefficients of ecoprc and regprc.
11. What is your best guess for the amount of eco-labelled apples that a family presented with an ecoprc of 1.05 and a regprc of 0.98 would buy?
> eco.apple <- data.table(ecoprc = 1.05, regprc = 0.98)
> predict(lm.quantity, eco.apple)
My best guess for the amount of eco-labelled apples bought by a family presented with an ecoprc of 1.05 units and a regprc of 0.98 units is 1.86 pounds.
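A prediction from predict() is just the fitted equation evaluated at the new prices. The sketch below shows the equivalence on simulated data (not the apple data), so the numbers it prints are unrelated to the 1.86 pounds above; the data-generating values are arbitrary.

```r
# Simulated data with the same variable names (NOT the apple data)
set.seed(7)
toy <- data.frame(ecoprc = runif(100, 0.5, 1.5),
                  regprc = runif(100, 0.5, 1.5))
toy$ecolbs <- 2 - 3 * toy$ecoprc + 3 * toy$regprc + rnorm(100)

fit <- lm(ecolbs ~ ecoprc + regprc, data = toy)

# Prediction through predict()
new_obs <- data.frame(ecoprc = 1.05, regprc = 0.98)
pred_auto <- predict(fit, new_obs)

# Same prediction written out from the coefficients
b <- coef(fit)
pred_manual <- b[["(Intercept)"]] + b[["ecoprc"]] * 1.05 + b[["regprc"]] * 0.98

print(c(predict = unname(pred_auto), manual = pred_manual))
```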
12. Compute and plot the regression's residuals. Comment on your plot.
> lm.quantity.res <- resid(lm.quantity)
> dt.apple[, residuals := lm.quantity.res]
> dt.apple[, num.id := as.numeric(row.names(dt.apple))]
> qplot(data = dt.apple
, x = num.id
, y = residuals
, geom = "point") + theme_bw()
The closer the dots are to 0, the better the model explains the corresponding observations.
13. Calculate the total sum of squares, explained sum of squares and residual sum of squares for your model. How well does this model explain the amount of eco-labelled apples?
> dt.apple[, ecolbs_hat := predict(lm.quantity)]
> SST <- dt.apple[, sum((ecolbs - mean(ecolbs))^2)]
> SSE <- dt.apple[, sum((ecolbs_hat - mean(ecolbs))^2)]
> SSR <- dt.apple[, sum((ecolbs - ecolbs_hat)^2)]
SST = SSE + SSR
This model explains $R^2 = SSE/SST = 0.0364$ of the variation of ecolbs around its mean value. It is a low $R^2$, so we should incorporate more relevant variables or reject the model.
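The decomposition SST = SSE + SSR and the resulting $R^2$ can be verified against the value reported by summary(). A sketch on simulated data (not the apple data), keeping this document's convention that SSE is the explained and SSR the residual sum of squares:

```r
# Simulated data (NOT the apple data)
set.seed(3)
x <- rnorm(80)
y <- 1 + 0.3 * x + rnorm(80)

fit <- lm(y ~ x)
y_hat <- fitted(fit)

SST <- sum((y - mean(y))^2)      # total sum of squares
SSE <- sum((y_hat - mean(y))^2)  # explained sum of squares
SSR <- sum((y - y_hat)^2)        # residual sum of squares

r2 <- SSE / SST
print(c(SST = SST, SSE = SSE, SSR = SSR, R2 = r2))
```

The identity holds exactly only for OLS models that include an intercept, which is the case for the model in question 7.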