Sei sulla pagina 1di 8

Regression101Tutorial 1 SpiderFinancialCorp,2013

Tutorial:Regression101
Thisisthefirstentryinwhatwillbecomeanongoingseriesonregressionanalysisandmodeling.Inthis
tutorial,wewillstartwiththegeneraldefinitionortopologyofaregressionmodel,andthenuseNumXL
programtoconstructapreliminarymodel.Next,wewillcloselyexaminethedifferentoutputelements
inanattempttodevelopasolidunderstandingofregression,whichwillpavethewaytoamore
advancedtreatmentinfutureissues.
Inthistutorial,wewilluseasampledatasetgatheredfrom20differentsalespersons.Theregression
modelattemptstoexplainandpredictasalespersonsweeklysales(dependentvariable)usingtwo
explanatoryvariables:Intelligence(IQ)andextroversion.
DataPreparation
First,letsorganizeourinputdata.Althoughnotnecessary,itiscustomarytoplaceallindependent
variables(Xs)ontheleft,whereeachcolumnrepresentsasinglevariable.Intherightmostcolumn,we
placetheresponseorthedependentvariablevalues.

Inthisexample,wehave20observationsandtwoindependent(explanatory)variables.Theamountof
weeklysalesistheresponseordependentvariable.

Process
Nowwearereadytoconductourregressionanalysis.First,selectanemptycellinyourworksheet
whereyouwishtheoutputtobegenerated,thenlocateandclickontheregressioniconintheNumXL

Regression101Tutorial 2 SpiderFinancialCorp,2013

tab(ortoolbar).

NowtheRegressionWizardwillappear.

Selectthecellsrangefortheresponse/dependentvariablevalues(i.e.weeklysales).Selectthecells
rangefortheexplanatory(independent)variablesvalues.
Notes:
1. Thecellsrangeincludes(optional)theheading(Label)cell,whichwouldbeusedintheoutput
tableswhereitreferencesthosevariables.
2. Theexplanatoryvariables(i.e.X)arealreadygroupedbycolumns(eachcolumnrepresentsa
variable),sowedontneedtochangethat.
3. LeavetheVariableMaskfieldblankfornow.Wewillrevisitthisfieldinlaterentries.
4. Bydefault,theoutputcellsrangeissettothecurrentselectedcellinyourworksheet.
Finally,onceweselecttheXandYcellsrange,theoptions,ForecastandMissingValuestabswill
becomeavailable(enabled).
Next,selecttheOptionstab.

Regression101Tutorial 3 SpiderFinancialCorp,2013

Initially,thetabissettothefollowingvalues:
- Theregressionintercept/constantisleftblank.Thisindicatesthattheregressioninterceptwill
beestimatedbytheregression.Tosettheregressiontoafixedvalue(e.g.zero(0)),enterit
there.
- Thesignificancelevel(aka. o )issetto5%.
- Intheoutputsection,themostcommonregressionanalysisisselected.
- Forautomodeling,letsleaveitunchecked.Wewilldiscussthisfunctionalityinalaterissue.
Now,clickontheMissingValuestab.

Regression101Tutorial 4 SpiderFinancialCorp,2013

Inthistab,youcanselecttheapproachtohandlemissingvaluesinthedataset(XandY).Bydefault,any
missingvaluefoundinXorinYinanyobservationwouldexcludetheobservationfromtheanalysis.
Thistreatmentisagoodapproachforouranalysis,soletsleaveitunchanged.
Now,clickOktogeneratetheoutputtables.

Analysis
Letsnowexaminethedifferentoutputtablesmoreclosely.
1. RegressionStatistics
Inthistable,anumberofsummarystatisticsforthegoodnessoffitoftheregressionmodel,giventhe
sample,isdisplayed.
1. Thecoefficientofdetermination(Rsquare)describes
theratioofvariationinYdescribedbytheregression.
2. TheadjustedRsquareisanalterationofRsquareto
takeintoaccountthenumberofexplanatoryvariables.
3. Thestandarderror(o )istheregressionerror.In
otherwords,theerrorintheforecasthasastandard
deviationaround$332.
4. Loglikelihoodfunction(LLF),Akaikeinformation
criterion(AIC),andSchwartz/Bayesianinformationcriterion(SBIC)aredifferentprobabilistic
measuresforthegoodnessoffit.
5. Finally,Observationsisthenumberofnonmissingobservationsusedintheanalysis.

Regression101Tutorial 5 SpiderFinancialCorp,2013

2. ANOVA
Beforewecanseriouslyconsidertheregressionmodel,wemustanswerthefollowingquestion:
Istheregressionmodelstatisticallysignificantorastatisticaldataanomaly?
Theregressionmodelwehavehypothesizedis:

1 1, 2 2,
2

~i .i .d~ (0, )
i i i i i i
i
Y Y e X X e
e N
o | |
o
= + = + + +

Where:
-

i
Y istheestimatedvaluefortheithobservation.
-
i
e istheerrortermfortheithobservation.
-
i
e isassumedtobeindependentandidenticallydistributed(Gaussian).
-
2
o istheregressionvariance(standarderrorsquared).
-
1 2
, | | aretheregressioncoefficients.
- o istheinterceptortheconstantoftheregression.
Alternatively,thequestioncanbestatedasfollows:

1 2
1
: 0
: 0
1 k 2
o
k
H
H
| |
|
= =
- =
s s

Theanalysisofvariance(ANOVA)tableanswersthisquestion.

Inthefirstrowofthetable(i.e.Regression),wecomputethetestscore(FStat)andPValue,then
comparethemagainstthesignificancelevel(o ).Inourcase,theregressionmodelisstatisticallyvalid,
anditdoesexplainsomeofthevariationinvaluesofthedependentvariable(weeklysales).
Theremainingcalculationsinthetablearesimplytohelpustogettothispoint.Tobecomplete,we
describeditscomputation,butyoucanskipthattothenexttable.

Regression101Tutorial 6 SpiderFinancialCorp,2013

- df isthedegreesoffreedom.(Forregression,itisthenumberofexplanatoryvariables( p ).
Forthetotal,itisthenumberofnonmissingobservationsminusone ( 1) N ,andforresiduals,
itisthedifferencebetweenthetwo( 1 N p )).
- SumofSquare(SS):

( )
( )
( )
2
1
2
1
2
1

N
i
i
N
i
i
N
i i
i
SSR Y Y
SST Y Y
SSE Y Y
=
=
=
=
=
=

- MeanSquare(MS):

1
SSR
MSR
p
SSE
SSE
N p
=
=

- TestStatistics:

, 1
~ ()
p N p
MSR
F F
MSE

=

3. ResidualsDiagnosisTable
Onceweconfirmthattheregressionmodelexplainssomeofthevariationinthevaluesoftheresponse
variable(weeklysales),wecanexaminetheresidualstomakesurethattheunderlyingmodels
assumptionsaremet.
1 1, 2 2,
2

~i .i .d~ (0, )
i i i i i i
i
Y Y e X X e
e N
o | |
o
= + = + + +

Usingthestandardizedresiduals(i.e.
i
i
e
o
),weperformaseriesofstatisticalteststothemean,variance,
skew,excesskurtosisandfinally,thenormalityassumption.

Regression101Tutorial 7 SpiderFinancialCorp,2013

Inthisexample,thestandardizedresidualspassthetestswith95%confidence.
Note:thestandardized(akastudentized)residualsarecomputedusingthepredictionerror(
pred
S )
foreachobservation.
pred
S takesintoaccounttheerrorsinthevaluesoftheregressioncoefficient,in
additiontothegeneralregressionerror(RMSEor o ).
4. RegressionCoefficientsTable
Onceweestablishthattheregressionmodelissignificant,wecanlookcloserattheregression
coefficients.

Eachcoefficient(includingtheintercept)isshownonaseparaterow,andwecomputethefollowing
statistics:
- Value(i.e.
1
, ,... o | )
- Standarderrorinthecoefficientvalue.
- Testscore(Tstat)forthefollowinghypothesis:

: 0
: 0
o k
o k
H
H
|
|
=
=

- ThePValuesoftheteststatistics(usingStudentstdistribution)
- Upperandlowerlimitsoftheconfidenceintervalforthecoefficientvalue.
- Areject/acceptdecisionforthesignificanceofthecoefficientvalue.
Inourexample,onlytheextroversionvariableisfoundsignificantwhiletheinterceptandthe
Intelligencearenotfoundsignificant.

Regression101Tutorial 8 SpiderFinancialCorp,2013

Conclusion
Inthisexample,wefoundthattheregressionmodelisstatisticallysignificantinexplainingthevariation
inthevaluesoftheweeklysalesvariable,itsatisfiesthemodelsassumptions,butthevalueofoneor
moreregressioncoefficientisnotsignificantlydifferentfromzero.
Whatdowedonow?
Theremaybeanumberofreasonswhythisisthecase,includingpossiblemulticollinearitybetweenthe
variablesorsimplythatonevariableshouldnotbeincludedinthemodel.Asthenumberofexplanatory
variablesincreases,answeringsuchquestiongetsmoreinvolved,andweneedfurtheranalysis.
Wewillcoverthisparticularissueinaseparateentryofourseries.

Potrebbero piacerti anche