Introduction to R: Outline
I. Data Description
II. Data Analysis
   i. Command functions
   ii. Hand-rolling
11/20/2007
Christenson & Powell: Intro to R
Data Analysis: Descriptive Stats
R has several built-in commands for describing data. The list() command can output all elements of an object.
Data Analysis: Descriptive Stats
The summary() command can be used to describe all variables contained within a data frame. The summary() command can also be used with individual variables.
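A minimal sketch of these descriptive commands. The data frame `states` and its values are hypothetical: the column names echo the variables used later in the slides, but the numbers are simulated for illustration only.

```r
# Hypothetical state-level data frame; values are simulated, not the slides' data
set.seed(42)
states <- data.frame(
  repvshr  = round(runif(22, 30, 70), 1),
  income   = round(runif(22, 30000, 50000)),
  presvote = round(runif(22, 35, 65), 1),
  pressup  = round(runif(22, 65, 95), 1)
)

# str() gives a compact overview of every variable in the data frame
str(states)

# summary() on the whole data frame: min, quartiles, mean and max for each column
summary(states)

# summary() on a single variable
summary(states$income)
```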
Data Analysis: Descriptive Stats
Simple plots can also provide familiarity with the data. The hist() command produces a histogram for any given data values.
Data Analysis: Descriptive Stats
Simple plots can also provide familiarity with the data. The plot() command can produce both univariate and bivariate plots for any given objects.
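A short sketch of hist() and plot() on simulated vectors. The vectors `income` and `pressup` here are hypothetical stand-ins for the slides' variables.

```r
# Hypothetical vectors, simulated for illustration only
set.seed(1)
income  <- rnorm(22, mean = 40000, sd = 5000)
pressup <- rnorm(22, mean = 80, sd = 8)

# Histogram of one variable; hist() also returns the bin counts (invisibly)
h <- hist(income, main = "Histogram of income", xlab = "income")

# Univariate plot: each value against its index
plot(income)

# Bivariate scatterplot
plot(income, pressup)
```

When plot() gets a single vector it draws values against their position (the "Index" axis seen in later slides); with two vectors it draws a scatterplot.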
Data Analysis: Regression
As mentioned above, one of the big perks of using R is flexibility. R comes with its own canned linear regression command: lm(y ~ x). However, we're going to use R to make our own OLS estimator. Then we will compare with the canned procedure, as well as Stata.
Data Analysis: Regression
First, let's take a look at our code for the hand-rolled OLS estimator.
The Holy Grail: $(X'X)^{-1}X'Y$
We need a single matrix of independent variables. The cbind() command takes the individual variable vectors and combines them into one x variable matrix. A 1 is included as the first element to account for the constant.
Data Analysis: Regression
With the x and y matrices complete, we can now manipulate them to produce coefficients. After performing the divine multiplication, we can observe the estimates by entering the object name (in this case b).
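The slides' own code did not survive extraction, so here is a minimal sketch of the hand-rolled estimator on simulated data. The variable names and the data-generating values are hypothetical.

```r
# Hypothetical data, simulated so the true coefficients are known
set.seed(123)
n <- 50
income   <- rnorm(n, 40, 5)
presvote <- rnorm(n, 50, 8)
y <- 10 + 0.5 * income + 0.3 * presvote + rnorm(n, sd = 2)

# Design matrix: a column of 1s for the constant, then the regressors
x <- cbind(1, income, presvote)

# The "Holy Grail": b = (X'X)^{-1} X'y
b <- solve(t(x) %*% x) %*% t(x) %*% y
b

# The canned procedure, for comparison
coef(lm(y ~ income + presvote))
```

Explicitly inverting $X'X$ mirrors the formula; `solve(crossprod(x), crossprod(x, y))` gives the same estimates with fewer operations and is a little more numerically stable.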
Data Analysis: Regression
To find the standard errors, we need to compute both the variance of the residuals, $\hat{\sigma}^2 = e'e/(n-k-1)$, and the matrix $(X'X)^{-1}$; their product, $\hat{\sigma}^2(X'X)^{-1}$, is the variance-covariance matrix of the coefficients. The square roots of the diagonal elements of this var-cov matrix give us the standard errors. Other test statistics can be easily computed. View the standard errors.
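A sketch of the standard-error step under the same assumptions as before (simulated, hypothetical data). It follows the formula literally and then checks the result against summary(lm()).

```r
set.seed(123)
n <- 50
income   <- rnorm(n, 40, 5)
presvote <- rnorm(n, 50, 8)
y <- 10 + 0.5 * income + 0.3 * presvote + rnorm(n, sd = 2)
x <- cbind(1, income, presvote)

xtx_inv <- solve(t(x) %*% x)               # (X'X)^{-1}
b <- xtx_inv %*% t(x) %*% y                # coefficient estimates

e <- y - x %*% b                           # residuals
sigma2 <- sum(e^2) / (nrow(x) - ncol(x))   # residual variance with n - (k + 1) df
vcov_b <- sigma2 * xtx_inv                 # var-cov matrix of the coefficients
se <- sqrt(diag(vcov_b))                   # standard errors
t_stat <- as.vector(b) / se                # t statistics
p_val <- 2 * pt(abs(t_stat), df = nrow(x) - ncol(x), lower.tail = FALSE)

cbind(estimate = as.vector(b), se, t_stat, p_val)

# Compare with the canned output
fit <- lm(y ~ income + presvote)
summary(fit)$coefficients
```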
Data Analysis: Regression
Time to Compare
Use the lm() command to estimate the model using R's canned procedure. As we can see, the estimates match (up to rounding).
Data Analysis: Regression
Time to Compare
We can also see how both the hand-rolled and canned OLS procedures stack up to Stata. Use the reg command to estimate the model. As we can see, the estimates once again match.
Other Useful Commands
lm - Linear Model
glm - Generalized Linear Model
lme - Mixed Effects (nlme package)
multinom - Multinomial Logit (nnet package)
anova - Analysis of Variance tables
optim - General Optimizer
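To give a feel for optim(), here is a hedged illustration (not from the slides) that recovers simple OLS coefficients by numerically minimizing the sum of squared residuals. The data are simulated, and `ssr`/`ssr_grad` are hypothetical helper names.

```r
set.seed(7)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)

# Objective: sum of squared residuals for intercept beta[1] and slope beta[2]
ssr <- function(beta) sum((y - beta[1] - beta[2] * x)^2)

# Analytic gradient of the objective, which helps BFGS converge tightly
ssr_grad <- function(beta) {
  r <- y - beta[1] - beta[2] * x
  c(-2 * sum(r), -2 * sum(r * x))
}

fit_optim <- optim(par = c(0, 0), fn = ssr, gr = ssr_grad, method = "BFGS")
fit_optim$par
fit_optim$convergence   # 0 indicates successful convergence

coef(lm(y ~ x))
```

For plain OLS, lm() is the right tool; optim() earns its keep when you write your own likelihood or loss function that no canned command covers.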
OLS Diagnostics in R
Post-estimation diagnostics are key to data analysis
We want to make sure we estimated the proper model. Besides, Irfan will hurt you if you neglect to do this.
OLS Diagnostics in R
What could be unjustifiably driving our results?
Outlier: an unusual observation
Leverage: the ability to change the slope of the regression line
Influence: the combined impact of strong leverage and outlier status
According to John Fox, influence = leverage × discrepancy (outlyingness).
OLS Diagnostics: Leverage
Recall our OLS model:
ols.model1 <- lm(formula = repvshr ~ income + presvote + pressup)
Our measure of leverage is the hat value, $h_i$.
It's just the predicted values written in terms of $h_{ij}$:
$\hat{Y}_j = h_{1j}Y_1 + h_{2j}Y_2 + \cdots + h_{nj}Y_n = \sum_{i=1}^{n} h_{ij}Y_i$
where $h_{ij}$ is the contribution of observation $Y_i$ to the fitted value $\hat{Y}_j$, and the hat value is $h_i = h_{ii}$. If $h_{ij}$ is large, then the $i$th observation has a significant impact on the $j$th fitted value. So, skipping the formulas, we know that the larger the hat value, the greater the leverage of that observation.
OLS Diagnostics: Leverage
Find the hat values:
hatvalues(ols.model1)
Calculate the average hat value, $(k+1)/n$, where x is the design matrix built with cbind() earlier:
avg.mod1 <- ncol(x)/nrow(x)
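A sketch that builds the hat matrix $H = X(X'X)^{-1}X'$ by hand and checks its diagonal against hatvalues(). The data are simulated; only the variable names echo the slides, and income is in (hypothetical) thousands to keep the matrix well scaled.

```r
set.seed(99)
n <- 22
income   <- rnorm(n, 40, 5)     # hypothetical, in thousands
presvote <- rnorm(n, 50, 8)
pressup  <- rnorm(n, 80, 6)
repvshr  <- 20 + 0.5 * income + 0.2 * presvote + rnorm(n, sd = 3)

fit <- lm(repvshr ~ income + presvote + pressup)
x <- cbind(1, income, presvote, pressup)   # design matrix, constant first

# Hat matrix: its diagonal holds the hat values h_i
H <- x %*% solve(t(x) %*% x) %*% t(x)
h <- diag(H)

# The hat values always average (k + 1) / n, because trace(H) = k + 1
avg_h <- ncol(x) / nrow(x)
mean(h)
```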
OLS Diagnostics: Leverage
But a picture is worth a hundred numbers? Graph the hat values with lines for the average, twice the average (large samples) and three times the average (small samples) hat values:
plot(hatvalues(ols.model1))
abline(h = 1*ncol(x)/nrow(x))
abline(h = 2*ncol(x)/nrow(x))
abline(h = 3*ncol(x)/nrow(x))
identify(hatvalues(ols.model1))
identify lets us select the data points in the new graph.
[Figure: hatvalues(ols.model1) plotted against Index, with horizontal lines at one, two and three times the average hat value]
State #2 is over twice the average. Nothing is above three times.
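identify() only works in an interactive graphics session. A non-interactive sketch that lists the same flagged observations with which(); the model and data are hypothetical stand-ins for ols.model1.

```r
set.seed(99)
n <- 22
income   <- rnorm(n, 40, 5)
presvote <- rnorm(n, 50, 8)
pressup  <- rnorm(n, 80, 6)
repvshr  <- 20 + 0.5 * income + 0.2 * presvote + rnorm(n, sd = 3)
fit <- lm(repvshr ~ income + presvote + pressup)

h <- hatvalues(fit)
avg_h <- length(coef(fit)) / nobs(fit)   # (k + 1) / n

# Observations above twice and three times the average hat value
which(h > 2 * avg_h)
which(h > 3 * avg_h)
```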
OLS Diagnostics: Outliers
Can we find any data points that are unusual for Y given the X's? Use studentized residuals:
$u_i^{*} = \frac{u_i}{s_{(-i)}\sqrt{1 - h_i}}$
where $s_{(-i)}$ is the residual standard error from the regression with observation $i$ deleted. We can see whether there is a significant change in the model. If their absolute values are larger than 2, then the corresponding observations are likely to be outliers.
rstudent(ols.model1)
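To see what rstudent() computes, here is a sketch that applies the formula literally, refitting the model once per deleted observation to get $s_{(-i)}$. The data frame `d` and its variables are hypothetical.

```r
set.seed(2)
n <- 22
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)

u <- resid(fit)
h <- hatvalues(fit)

# s_(-i): residual standard error of the model refit without observation i
s_minus_i <- sapply(seq_len(n), function(i)
  summary(lm(y ~ x1 + x2, data = d[-i, ]))$sigma)

u_star <- u / (s_minus_i * sqrt(1 - h))

# Candidate outliers: |u*| > 2
which(abs(u_star) > 2)
```

The refitting loop is only for transparency; rstudent() gets the same numbers from a closed-form update without refitting.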
OLS Diagnostics: Outliers
Again, let's plot them with lines for 2 and -2. States 2 and 3 appear to be outliers, or darn close. We should definitely take a look at what makes these states unusual.
Perhaps there is a mistake in data entry. Perhaps the model is misspecified in terms of functional form (forthcoming) or omitted vars. Maybe you can throw out your bad observation. If you must include the bad observation, try robust regression.
[Figure: rstudent(ols.model1) plotted against Index, with horizontal lines at 2 and -2]
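One option for robust regression is rlm() from the MASS package (a recommended package that ships with standard R installs). A minimal sketch on hypothetical data with one planted outlier at observation 2; this is an illustration, not the slides' procedure.

```r
library(MASS)   # provides rlm()

set.seed(3)
n <- 22
d <- data.frame(x = rnorm(n))
d$y <- 1 + 2 * d$x + rnorm(n, sd = 0.5)
d$y[2] <- d$y[2] + 8           # plant one gross outlier (hypothetical)

ols_fit    <- lm(y ~ x, data = d)
robust_fit <- rlm(y ~ x, data = d)   # M-estimation, Huber weights by default

rbind(OLS = coef(ols_fit), robust = coef(robust_fit))

# Final IWLS weights: values well below 1 mark down-weighted observations
round(robust_fit$w, 2)
```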
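OLS Diagnostics: Influence
[Figure: Cook's distance, cookd(ols.model1), plotted against Index]
Cook's distance combines the standardized residual and the hat value:
$D_i = \frac{u_i'^2}{k+1} \times \frac{h_i}{1 - h_i}$
where $u_i'$ is the standardized residual and $k$ is the number of regressors. In base R, cooks.distance(ols.model1) computes it directly.
States 2 and (maybe) 3 are in the trouble zone.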
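A sketch that computes $D_i$ from the formula and checks it against cooks.distance(). The model is a hypothetical stand-in; the cutoff $4/(n-k-1)$ is a common rule of thumb, not a hard threshold.

```r
set.seed(4)
n <- 22
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

h <- hatvalues(fit)
u_std <- rstandard(fit)          # standardized residuals u_i'
k <- length(coef(fit)) - 1       # number of regressors, excluding the constant

D <- (u_std^2 / (k + 1)) * (h / (1 - h))

# Rule of thumb: flag observations with D_i > 4 / (n - k - 1)
which(D > 4 / (n - k - 1))
```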
OLS Diagnostics: Influence
For a host of measures of influence, including dfbetas and dffits:
influence.measures(ols.model1)
dfbeta gives the influence of an observation on the coefficients, i.e., the change in an IV's coefficient caused by deleting a single observation. Simple commands for partial regression plots can be found on Fox's website.
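The deletion idea behind dfbeta can be checked directly: drop each observation, refit, and subtract. A sketch on a hypothetical data frame `d`; R's dfbeta() reports $b - b_{(-i)}$ for each observation.

```r
set.seed(5)
n <- 22
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)

# Row i: full-sample coefficients minus coefficients with observation i deleted
by_hand <- t(sapply(seq_len(n), function(i)
  coef(fit) - coef(lm(y ~ x1 + x2, data = d[-i, ]))))

head(by_hand, 3)
head(dfbeta(fit), 3)

# Flag observations with any unusual dfbetas, dffits, Cook's D or hat value
summary(influence.measures(fit))
```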
OLS Diagnostics: Normality
[Figure: normal quantile-comparison plot of Studentized Residuals(ols.model1) against norm Quantiles]
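The plotting command for this slide was not preserved. A base-R way to draw a comparable normal Q-Q plot of studentized residuals, sketched on a hypothetical model:

```r
set.seed(6)
n <- 22
x1 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)
fit <- lm(y ~ x1)

rs <- rstudent(fit)
qq <- qqnorm(rs, main = "Normal Q-Q plot of studentized residuals")
qqline(rs)   # reference line through the first and third quartiles
```

Points bending away from the line in the tails suggest heavier or lighter tails than the normal; a systematic curve suggests skew.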
OLS Diagnostics: Normality
A simple density plot of the studentized residuals helps to determine the nature of our data. The apparent deviation from the normal curve is not severe, but there certainly seems to be a slight negative skew.
[Figure: density.default(x = rstudent(ols.model1)); N = 22, Bandwidth = 0.4217]
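A sketch of the density plot with a standard normal curve overlaid for comparison; the model and data are hypothetical.

```r
set.seed(8)
n <- 22
x1 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)
fit <- lm(y ~ x1)

dens <- density(rstudent(fit))          # kernel density estimate
plot(dens, main = "Density of studentized residuals")
curve(dnorm(x), add = TRUE, lty = 2)    # dashed standard normal reference
```

The N and Bandwidth values printed under the x-axis come from the density object (dens$n, dens$bw).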
OLS Diagnostics: Error Variance
We can also easily look for heteroskedasticity. Plotting the residuals against the fitted values and the continuous independent variables lets us examine our statistical model for the presence of unbalanced error variance.
par(mfrow = c(2, 2))
plot(resid(ols.model1) ~ fitted.values(ols.model1))
plot(resid(ols.model1) ~ income)
plot(resid(ols.model1) ~ presvote)
plot(resid(ols.model1) ~ pressup)
[Figure: 2 x 2 panel of resid(ols.model1) plotted against fitted.values(ols.model1), income, presvote and pressup]
OLS Diagnostics: Error Variance
Formal tests for heteroskedasticity are available from the lmtest library:
library(lmtest)
bptest(ols.model1) will give you the Breusch-Pagan test stat
gqtest(ols.model1) will give you the Goldfeld-Quandt test stat
hmctest(ols.model1) will give you the Harrison-McCabe test stat
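To see what the Breusch-Pagan statistic measures, here is a by-hand sketch of the studentized (Koenker) version: regress the squared residuals on the regressors and use $nR^2$. To our understanding this matches bptest()'s default, but treat that as an assumption and check it with bptest(fit) if lmtest is installed. The data are simulated and heteroskedastic by construction.

```r
set.seed(10)
n <- 100
income <- runif(n, 30, 50)
# Error spread grows with income (heteroskedastic by construction)
y <- 5 + 0.8 * income + rnorm(n, sd = 0.1 * income)
fit <- lm(y ~ income)

# Auxiliary regression of squared residuals on the regressors
e2  <- resid(fit)^2
aux <- lm(e2 ~ income)

bp_stat <- n * summary(aux)$r.squared                  # n * R^2
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE) # df = number of regressors
c(statistic = bp_stat, p.value = bp_p)
```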
OLS Diagnostics: Collinearity
Finally, let's look out for collinearity. To get the variance inflation factors (vif() comes from the car package):
library(car)
vif(ols.model1)
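The VIF for regressor $j$ is $1/(1 - R_j^2)$, where $R_j^2$ comes from regressing $x_j$ on the other regressors. A by-hand sketch on hypothetical data where presvote is deliberately correlated with income; for models with only numeric predictors this should agree with car's vif().

```r
set.seed(11)
n <- 50
income   <- rnorm(n, 40, 5)
presvote <- 0.8 * income + rnorm(n, sd = 2)   # correlated with income on purpose
pressup  <- rnorm(n, 80, 6)
X <- data.frame(income, presvote, pressup)

# VIF_j = 1 / (1 - R_j^2) from the auxiliary regression of x_j on the others
vif_by_hand <- sapply(names(X), function(v) {
  aux_formula <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(aux_formula, data = X))$r.squared
  1 / (1 - r2)
})
round(vif_by_hand, 2)
```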
OLS Diagnostics: Shortcut
[Figure: four diagnostic panels: Residuals vs Fitted, Normal Q-Q (against Theoretical Quantiles), Scale-Location, and Cook's distance (against Obs. number)]
Now you have no excuse not to run some diagnostics! Btw, look at the high residuals in the rvf (residuals vs. fitted) plot for 14, 13 and 3, suggesting outliers.
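The slide's command did not survive extraction. Calling plot() on a fitted lm object produces panels like these; a sketch, assuming that is the shortcut meant here, on a hypothetical model:

```r
set.seed(12)
n <- 22
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# which = 1:4: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance
op <- par(mfrow = c(2, 2))
plot(fit, which = 1:4)
par(op)   # restore the previous graphics settings
```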
The Final Act: Loops and Functions
As was mentioned above, R's biggest asset is its flexibility. Loops and functions directly utilize this asset.
Loops can be implemented for a number of purposes, essentially whenever repeated actions are needed (e.g., simulations).
Functions allow us to create our own commands. This is especially useful when a canned procedure does not exist. We will create our own OLS function with the hand-rolled code used earlier.
Loops
for loops are the most common, and the only type of loop we will look at today. The first loop command at the right shows simple loop iteration.
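The loop code shown "at the right" did not survive extraction; this is a hypothetical stand-in showing simple iteration and accumulation.

```r
# Simple loop iteration: i takes each value of 1:5 in turn
for (i in 1:5) {
  print(i)
}

# Accumulating inside a loop: running total of 1 through 5
total <- 0
for (i in 1:5) {
  total <- total + i
}
total
```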
Loops
However, we can also see how loops can be a little more useful. The second example at right (although inefficient) calculates the mean of income. Note how the index accesses elements of the income vector.
Loops and Monte Carlo
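A sketch of both ideas, with a hypothetical income vector: the loop-based mean, then a small Monte Carlo loop that traces the sampling distribution of the OLS slope. The sample sizes and true coefficients are illustrative assumptions.

```r
set.seed(13)
income <- rnorm(22, 40, 5)   # hypothetical income vector

# Mean by looping over the index (inefficient, but shows indexing)
running_sum <- 0
for (i in seq_along(income)) {
  running_sum <- running_sum + income[i]
}
loop_mean <- running_sum / length(income)
c(loop = loop_mean, builtin = mean(income))

# Monte Carlo: re-draw the errors many times and store the OLS slope each time
reps <- 1000
slopes <- numeric(reps)      # preallocate the result vector
x <- rnorm(50)
for (r in seq_len(reps)) {
  y <- 1 + 2 * x + rnorm(50) # true slope is 2
  slopes[r] <- coef(lm(y ~ x))[2]
}
c(mean = mean(slopes), sd = sd(slopes))
```

Preallocating `slopes` with numeric(reps) avoids growing the vector on every pass, which is the usual source of slow loops in R.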
Functions
Now we will make our own linear regression function using our hand-rolled OLS code. Functions take arguments (the objects to be used) and a body (the commands that the function performs). The actual estimation procedure does not change. However, some changes are made.
Functions
First, we have to tell R that we are creating a function. We'll name it ols. This lets us generalize the procedure to multiple objects. Second, we have to tell the function what we want returned, or what we want the output to look like.
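The slides' function code was not preserved, so here is one possible version of `ols`, not the authors' own. It assumes `x` is a matrix of regressors without the constant (the function adds the column of 1s) and returns a coefficient table; the test data are simulated.

```r
# Hand-rolled OLS as a function: y is the response, x a matrix of regressors
ols <- function(y, x) {
  x <- cbind(constant = 1, as.matrix(x))     # add the constant column
  xtx_inv <- solve(t(x) %*% x)               # (X'X)^{-1}
  b <- as.vector(xtx_inv %*% t(x) %*% y)     # coefficients
  e <- y - as.vector(x %*% b)                # residuals
  sigma2 <- sum(e^2) / (nrow(x) - ncol(x))   # residual variance
  se <- sqrt(diag(sigma2 * xtx_inv))         # standard errors
  # What we want returned: a coefficient table
  out <- cbind(Estimate = b, `Std. Error` = se, `t value` = b / se)
  rownames(out) <- colnames(x)
  out
}

set.seed(14)
income   <- rnorm(40, 40, 5)
presvote <- rnorm(40, 50, 8)
y <- 10 + 0.5 * income + 0.3 * presvote + rnorm(40, sd = 2)

ols(y, cbind(income, presvote))
summary(lm(y ~ income + presvote))$coefficients[, 1:3]
```

The last value evaluated in the body (`out`) is what the function returns, which is how we tell R what the output should look like.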
Functions
OLS: Hand-rolled vs. Function
Functions
Implementing our new function ols, we get precisely the output that we asked for. We can check this against the results produced by the standard lm function.
Favorite Resources
Invaluable resources online
The R manuals http://cran.r-project.org/manuals.html
Fox's slides http://socserv.mcmaster.ca/jfox/Courses/Rcourse/index.html
Faraway's book http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf
Andersen's ICPSR lectures using R http://socserv.mcmaster.ca/andersen/icpsr.html
Arai's guide http://people.su.se/~ma/R_intro/
UCLA notes http://www.ats.ucla.edu/stat/SPLUS/default.htm
Keele's intro guide http://www.polisci.ohio-state.edu/faculty/lkeele/RIntro.pdf
Great R books
Verzani's book http://www.amazon.com/UsingIntroductoryStatisticsJohn Verzani/dp/1584884509
Maindonald and Braun's book http://www.amazon.com/DataAnalysisGraphicsUsingR/dp/0521813360