Esri International User Conference
San Diego, CA
Technical Workshops
Applied Spatial Statistics
Area Data Analysis IV
Patrick DeLuca, MA, GISP
November 3, 2015
Exploring 2nd Order Effects
Do attributes in neighboring zones show spatial dependency? i.e., do they covary?
Spatial Autocorrelation
Involves correlation between values of the same variable at different spatial locations
It is conceptually and empirically the two-dimensional equivalent of redundancy.
Patrick DeLuca, Applied Spatial Statistics, 11/3/2015
Spatial Autocorrelation
Null hypothesis (spatial independence):
Values observed at a location do not depend on values observed at neighbouring locations
The observed spatial pattern of values is as likely as any other spatial pattern
Alternative Hypotheses of SA
Positive Spatial Autocorrelation
Like values tend to cluster in space
Neighbours are similar
Negative Spatial Autocorrelation
Neighbours are dissimilar
Checkerboard pattern
Spatial Autocorrelation
Most statistics are based on the assumption that the values of observations in each sample are independent
If the observations, however, are spatially correlated in some way, the estimates obtained will be biased and overly precise.
Biased: the areas with a higher concentration of events will have a greater impact on the model estimate
Overestimated precision: since events tend to be concentrated, there are actually fewer independent observations than are being assumed
Indices of Spatial Correlation
Join Count Statistics
Moran's I
Getis-Ord General G
Local Approaches
Local Moran's I
Local Getis Statistic
Join Count Statistics
Applied to binary variables mapped as two colours (Black and White) such that a join, or edge, is classified as either WW (00), BB (11) or BW (10)
Interested in the number of occurrences of each possible join between neighbouring cells
Can show:
Positive spatial autocorrelation (clustering) if the number of BW joins is significantly lower than what we would expect by chance
Negative spatial autocorrelation (dispersion) if the number of BW joins is significantly higher than what we would expect by chance
Null spatial autocorrelation if the number of BW joins is the same as expected
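The join classification above can be sketched in Python as a minimal illustration. It assumes a regular grid with rook (edge-sharing) contiguity and cells coded 1 = black, 0 = white; `rook_joins` and `join_counts` are hypothetical helpers, not part of any GIS package. A 4 × 4 grid has 24 rook joins, the same total k = 24 used in the examples in this section.

```python
import numpy as np


def rook_joins(nrows, ncols):
    """List every rook (edge-sharing) join of an nrows x ncols grid once, as pairs of cell indices."""
    joins = []
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c                 # cells are numbered row by row
            if c + 1 < ncols:
                joins.append((i, i + 1))      # neighbour to the right
            if r + 1 < nrows:
                joins.append((i, i + ncols))  # neighbour below
    return joins


def join_counts(grid):
    """Count BB (1-1), WW (0-0) and BW (1-0) joins in a grid coded 1 = black, 0 = white."""
    grid = np.asarray(grid)
    cells = grid.ravel()
    counts = {"BB": 0, "WW": 0, "BW": 0}
    for i, j in rook_joins(*grid.shape):
        if cells[i] != cells[j]:
            counts["BW"] += 1
        elif cells[i] == 1:
            counts["BB"] += 1
        else:
            counts["WW"] += 1
    return counts


checkerboard = np.indices((4, 4)).sum(axis=0) % 2   # alternating black and white cells
halves = np.repeat([[1, 1, 0, 0]], 4, axis=0)       # left half black, right half white
print(join_counts(checkerboard))
print(join_counts(halves))
```

The checkerboard produces 24 BW joins and no like joins, while the two-halves pattern produces 10 BB, 10 WW and only 4 BW joins.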
Join Count Statistics
Join type   Dispersed   Random   Clustered
BB              0          5         10
WW              0          7         10
BW             24         12          4
TOTAL          24         24         24
The key is the BW count:
O_BW = E_BW: random
O_BW ≠ E_BW: not random
O_BW > E_BW: more dispersed
O_BW < E_BW: more clustered
Join Count Statistics
Free (or normal) sampling: used when you can determine the probability of an area being black or white
$$E(J_{BB}) = k\,p_B^2 = 6 \qquad E(J_{WW}) = k\,p_W^2 = 6 \qquad E(J_{BW}) = 2k\,p_B\,p_W = 12$$
k = total number of joins (= 24)
p_B = probability of being coded black (= 0.5)
p_W = probability of being coded white (= 0.5)
Join Count Statistics
Standard Deviations
Need to count m, the number of pairs of joins that share a common zone
Given by
$$m = \frac{1}{2}\sum_{i=1}^{n} k_i (k_i - 1) = 52$$
where k_i is the number of joins of zone i
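The standard deviations can be computed from k, m and the colour probabilities. The sketch below assumes the standard free-sampling variance expressions from the join-count literature, since the slide's own formulas are not reproduced in this text. With k = 24, m = 52 and p_B = p_W = 0.5 they give √11 ≈ 3.32 for BB and WW and √6 ≈ 2.45 for BW, which is consistent with the Z scores reported for this example. The breakdown of joins per cell assumes the 4 × 4 rook grid implied by k = 24.

```python
import math


def free_sampling_moments(k, m, p_b):
    """Mean and standard deviation of the BB, WW and BW join counts under free sampling.

    k   : total number of joins
    m   : 1/2 * sum_i k_i (k_i - 1), with k_i the number of joins of zone i
    p_b : probability that a zone is coded black (p_w = 1 - p_b)
    """
    p_w = 1.0 - p_b

    def like(p):
        # like joins: BB uses p = p_b, WW uses p = p_w
        mean = k * p**2
        var = k * p**2 + 2 * m * p**3 - (k + 2 * m) * p**4
        return mean, math.sqrt(var)

    bw_mean = 2 * k * p_b * p_w
    bw_var = 2 * (k + m) * p_b * p_w - 4 * (k + 2 * m) * (p_b * p_w) ** 2
    return {"BB": like(p_b), "WW": like(p_w), "BW": (bw_mean, math.sqrt(bw_var))}


# 4 x 4 rook grid: 4 corner cells (2 joins), 8 other edge cells (3), 4 interior cells (4)
ks = [2] * 4 + [3] * 8 + [4] * 4
k = sum(ks) // 2                                # each join is shared by two cells
m = sum(ki * (ki - 1) for ki in ks) // 2
moments = free_sampling_moments(k, m, 0.5)

observed = {"BB": 0, "WW": 0, "BW": 24}         # dispersed (checkerboard) pattern
for join_type, count in observed.items():
    mean, sd = moments[join_type]
    print(join_type, mean, round(sd, 3), round((count - mean) / sd, 2))
```

For the dispersed pattern this gives Z scores of about -1.81 for BB and WW and 4.90 for BW.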
Join Count Statistics
Z Scores
Join type   Dispersed   Random   Clustered
BB            -1.81      -0.30      1.20
WW            -1.81       0.30      1.20
BW             4.90       0.00     -3.27
Join Count Statistics
Join Count Statistics
Contiguity Matrix
Used Rook's Case: Total Joins = 214/2 = 107
Queen's Case, for comparison: Total Joins = 218/2 = 109
O'Sullivan and Unwin, 2003
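Halving the sum works because a symmetric binary contiguity matrix records every join twice, once in row i and once in row j. The sketch below builds rook and queen matrices for a hypothetical 3 × 3 grid (not the U.S. states map) with NumPy; `contiguity_matrix` is an illustrative helper.

```python
import numpy as np


def contiguity_matrix(nrows, ncols, queen=False):
    """Binary contiguity matrix W for a regular grid.

    Rook's case: cells sharing an edge; queen's case: cells sharing an edge or a corner.
    """
    n = nrows * ncols
    W = np.zeros((n, n), dtype=int)
    for r in range(nrows):
        for c in range(ncols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue
                    if not queen and dr != 0 and dc != 0:
                        continue  # diagonal neighbours only count in the queen's case
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrows and 0 <= cc < ncols:
                        W[r * ncols + c, rr * ncols + cc] = 1
    return W


for queen in (False, True):
    W = contiguity_matrix(3, 3, queen=queen)
    # each join appears twice in a symmetric W, so halve the sum of all entries
    print("queen" if queen else "rook", W.sum(), W.sum() // 2)
```

On the 3 × 3 grid the rook matrix sums to 24 (12 joins) and the queen matrix to 40 (20 joins), since the queen's case adds the diagonal neighbours.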
Join Count Statistics
Obama won
Romney won
59,647,121 votes, p(Obama) = 0.511
303 electoral votes, p(Obama)_e = 0.595
57,022,021 votes, p(Romney) = 0.489
206 electoral votes, p(Romney)_e = 0.405
k = 107
m = 421
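As a worked check, applying the free-sampling expectations (k p² for like joins, 2k p_O p_R for unlike joins) with the vote-based probabilities and k = 107 reproduces the estimated values in the vote-based results table:

$$E(J_{OO}) = k\,p_O^2 = 107 \times 0.511^2 \approx 27.94$$
$$E(J_{RR}) = k\,p_R^2 = 107 \times 0.489^2 \approx 25.586$$
$$E(J_{OR}) = 2k\,p_O\,p_R = 2 \times 107 \times 0.511 \times 0.489 \approx 53.474$$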
Join Count Statistics
Join Count Statistics
P based on votes
Join type        Measured   Estimated   Std. Dev.   Z Score
Obama-Obama        33.5       27.94       8.694      0.640
Romney-Romney      40         25.586      8.353      1.726
Obama-Romney       33.5       53.474      3.066     -6.514
P based on electoral votes
Join type        Measured   Estimated   Std. Dev.   Z Score
Obama-Obama        33.5       37.881      9.813     -0.446
Romney-Romney      40         17.551      6.925      3.242
Obama-Romney       33.5       51.569     15.592     -1.133
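Join Count Statistics
Non-free sampling (sampling without replacement):
$$E(J_{BW}) = \frac{2JBW}{N(N-1)}$$
$$\sigma_{BW}^2 = E(J_{BW}) + \frac{2J(2J-1)\,BW}{N(N-1)} + \frac{4\left[J(J-1) - 2J(2J-1)\right]B(B-1)W(W-1)}{N(N-1)(N-2)(N-3)} - E(J_{BW})^2$$
$$Z_b = \frac{O_{BW} - E(J_{BW})}{\sigma_{BW}} = \frac{33.5 - 53.5}{5.888} = -3.4$$
where J is the total number of joins, B and W are the numbers of black and white zones, and N = B + W.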
Join Count Statistics
Limitations
The join count statistic can only be used on binary data.
But a lot of data can be transformed into binary data, e.g., rainfall data can easily be converted into "wet" or "dry" regions by determining those areas above or below the mean.
The equations for determining the standard deviations are reasonably complex.
It is easy to make a mistake when implementing them.
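The rainfall transformation is a one-line threshold. This sketch assumes zones exactly at the mean are coded "dry"; the rainfall totals and the `to_wet_dry` name are hypothetical.

```python
import numpy as np


def to_wet_dry(rainfall):
    """Code each zone 1 ("wet") if its rainfall is above the mean, else 0 ("dry")."""
    rainfall = np.asarray(rainfall, dtype=float)
    return (rainfall > rainfall.mean()).astype(int)


rain_mm = [120, 95, 60, 40, 130, 80]   # hypothetical zone totals, mean = 87.5
print(to_wet_dry(rain_mm))
```

The resulting 0/1 codes can then be fed into a join count analysis.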
Moran's I
One of the oldest indicators of spatial autocorrelation (Moran, 1950); the de facto standard for determining spatial autocorrelation
Applied to zones or points with continuous variables associated with them.
Compares the value of the variable at each location with the values at all other locations, weighted by a spatial weights matrix W
Moran's I
Behaves in a manner similar to Pearson's correlation coefficient
$$I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (z_i - \bar{z})(z_j - \bar{z})}{\left( \sum_{i=1}^{n} (z_i - \bar{z})^2 \right) \left( \sum_{i} \sum_{j} w_{ij} \right)}, \qquad i \neq j$$
Values typically fall between -1 and +1 (the exact bounds depend on W)
Negative: checkerboard pattern
0: uncorrelated
Positive: clustered (no distinction between high and low values)
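A direct NumPy translation of the formula above, as a minimal sketch: `rook_weights` builds binary rook weights for a hypothetical grid, and the diagonal of W is zeroed to enforce i ≠ j. The test patterns are a 4 × 4 checkerboard and a grid split into two halves.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Binary rook-contiguity matrix for a regular grid (cells numbered row by row)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W


def morans_i(z, W):
    """Global Moran's I = n * sum_ij w_ij d_i d_j / (sum_i d_i^2 * sum_ij w_ij), with d = z - mean(z)."""
    z = np.asarray(z, dtype=float)
    W = np.array(W, dtype=float)        # copy so the caller's matrix is not modified
    np.fill_diagonal(W, 0.0)            # only pairs with i != j contribute
    d = z - z.mean()
    return len(z) * (d @ W @ d) / ((d @ d) * W.sum())


W = rook_weights(4, 4)
checkerboard = (np.indices((4, 4)).sum(axis=0) % 2).ravel()
halves = np.repeat([[1, 1, 0, 0]], 4, axis=0).ravel()
print(morans_i(checkerboard, W))
print(morans_i(halves, W))
```

The checkerboard gives I = -1 and the two-halves pattern gives I = 2/3, matching the negative and positive cases above.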
Moran's I
-1/(n - 1) is the expected value of I under the null hypothesis
Positive Spatial Autocorrelation
I > -1/(n - 1)
Spatial clustering of high and/or low values
Negative Spatial Autocorrelation
I < -1/(n - 1)
Checkerboard pattern
Moran's I
Assessing significance using the Normal approximation method
The null hypothesis states that the values represent one of many possible samples of values
If you randomly select values to distribute across your study area, most of the time it would produce a pattern and distribution of values that would not be markedly different from the observed pattern
Assuming that your data and its arrangement are one of many, many possible random samples
Moran's I
Assessing significance using the Normal approximation method
The empirical distribution can be compared to the theoretical distribution through a Z test
$$Z(I) = \frac{I - E(I)}{SE(I)}, \qquad SE(I) = \sqrt{\operatorname{Var}(I)}$$
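This section leaves Var(I) unspecified, so the sketch below uses the standard variance of I under the normality assumption, with E(I) = -1/(n - 1) and the usual weight sums S0 = Σ w_ij, S1 = ½ Σ (w_ij + w_ji)² and S2 = Σ_i (w_i. + w_.i)². Treat it as one common way to obtain SE(I), not necessarily the exact expression used in the workshop; the grid and helper names are hypothetical.

```python
import math

import numpy as np


def rook_weights(nrows, ncols):
    """Binary rook-contiguity matrix for a regular grid (cells numbered row by row)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W


def moran_normal_test(z, W):
    """Return Moran's I, E(I) and the Z score, using Var(I) under the normality assumption."""
    z = np.asarray(z, dtype=float)
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    n = len(z)
    d = z - z.mean()
    s0 = W.sum()
    I = n * (d @ W @ d) / ((d @ d) * s0)
    s1 = 0.5 * ((W + W.T) ** 2).sum()
    s2 = ((W.sum(axis=1) + W.sum(axis=0)) ** 2).sum()
    expected = -1.0 / (n - 1)
    var = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - expected**2
    return I, expected, (I - expected) / math.sqrt(var)


W = rook_weights(4, 4)
checkerboard = (np.indices((4, 4)).sum(axis=0) % 2).ravel()
I, expected, z_score = moran_normal_test(checkerboard, W)
print(round(I, 3), round(expected, 4), round(z_score, 2))
```

For the checkerboard the Z score is well below -1.96, so the dispersion is significant at the 5% level under this approximation.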
Moran's I
Moran's I
Suppose we have n values y_i relating to our study area
Then n! permutations of the map are possible, each corresponding to a different arrangement of the n data values
The value of I can be calculated for any of these permutations, so we can create an empirical distribution of possible values of I under random permutations of the n data values.
Plot the distribution of these permutations and compare our observed I to the distribution
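The permutation approach can be sketched with NumPy's random generator: shuffle the values across the zones, recompute I each time, and compare the observed I with the simulated distribution. The fixed seed, the 999 permutations and the one-sided pseudo p-value (R + 1)/(M + 1) are illustrative choices, where R counts simulated values at least as extreme as the observed I (in the direction of its departure from E(I)) and M is the number of permutations; the grid is hypothetical.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Binary rook-contiguity matrix for a regular grid (cells numbered row by row)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W


def morans_i(z, W):
    """Global Moran's I with the diagonal of W ignored."""
    z = np.asarray(z, dtype=float)
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    d = z - z.mean()
    return len(z) * (d @ W @ d) / ((d @ d) * W.sum())


def permutation_test(z, W, permutations=999, seed=0):
    """Observed I, the simulated I values, and a one-sided pseudo p-value."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    observed = morans_i(z, W)
    simulated = np.array([morans_i(rng.permutation(z), W) for _ in range(permutations)])
    expected = -1.0 / (len(z) - 1)
    if observed >= expected:                       # clustering: look at the upper tail
        extreme = np.sum(simulated >= observed)
    else:                                          # dispersion: look at the lower tail
        extreme = np.sum(simulated <= observed)
    return observed, simulated, (extreme + 1) / (permutations + 1)


W = rook_weights(4, 4)
halves = np.repeat([[1, 1, 0, 0]], 4, axis=0).ravel()
observed, simulated, p_value = permutation_test(halves, W)
print(round(observed, 3), p_value)
```

A histogram of `simulated` with a vertical line at `observed` gives the plot described above.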
Moran's I
Moran Scatterplots
Linear association between the value at i and the weighted average of its neighbours
Four quadrants
High-High, Low-Low = spatial clusters
High-Low, Low-High = spatial outliers
What can be read off this graph
Slope = Moran's I (when W is row-standardized)
Outliers
High leverage points
Spatial regimes
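The scatterplot coordinates can be computed directly. In this sketch the values are standardized, W is row-standardized so that the spatial lag is the average of the neighbours, and the least-squares slope of the lag on z then equals Moran's I computed with the same W. The grid, the two-halves pattern and the helper names are hypothetical.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Binary rook-contiguity matrix for a regular grid (cells numbered row by row)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W


def moran_scatterplot(values, W):
    """Standardized values z, spatial lag Wz, quadrant labels and the fitted slope."""
    x = np.asarray(values, dtype=float)
    z = (x - x.mean()) / x.std()
    W = np.array(W, dtype=float)
    W = W / W.sum(axis=1, keepdims=True)   # row-standardize (assumes no zone without neighbours)
    lag = W @ z                             # weighted average of the neighbours
    quadrant = np.where(z >= 0,
                        np.where(lag >= 0, "High-High", "High-Low"),
                        np.where(lag >= 0, "Low-High", "Low-Low"))
    slope = np.polyfit(z, lag, 1)[0]        # least-squares slope of lag on z
    return z, lag, quadrant, slope


W = rook_weights(4, 4)
halves = np.repeat([[1, 1, 0, 0]], 4, axis=0).ravel()
z, lag, quadrant, slope = moran_scatterplot(halves, W)
print(round(slope, 3), sorted(set(quadrant.tolist())))
```

Plotting `lag` against `z` gives the scatterplot; points far from the fitted line are candidates for the outliers and high-leverage points listed above.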
Correlograms
Using Moran's I to produce a correlogram
Use proximity matrix W_k, where k is the lag
Visualization
Spatial autocorrelation statistics for increasing lag
Interpretation
Identification of spatial process
Range of association
Possible indication of misspecified spatial weights
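A minimal correlogram sketch, assuming a regular grid where W_k links cells exactly k rook steps apart (k-th order rook contiguity, which on a full grid equals Manhattan distance k). This is one possible choice of proximity matrix; distance bands are another. The grid and helper names are hypothetical.

```python
import numpy as np


def lag_k_weights(nrows, ncols, k):
    """Binary W_k: w_ij = 1 when cells i and j are exactly k rook steps apart."""
    rows, cols = np.divmod(np.arange(nrows * ncols), ncols)
    dist = np.abs(rows[:, None] - rows[None, :]) + np.abs(cols[:, None] - cols[None, :])
    return (dist == k).astype(float)


def morans_i(z, W):
    """Global Moran's I with the diagonal of W ignored."""
    z = np.asarray(z, dtype=float)
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    d = z - z.mean()
    return len(z) * (d @ W @ d) / ((d @ d) * W.sum())


def correlogram(values, nrows, ncols, max_lag):
    """Moran's I for spatial lags 1..max_lag."""
    return [morans_i(values, lag_k_weights(nrows, ncols, k)) for k in range(1, max_lag + 1)]


checkerboard = (np.indices((4, 4)).sum(axis=0) % 2).ravel()
gradient = np.tile(np.arange(4), 4).astype(float)   # values increase from west to east
print(correlogram(checkerboard, 4, 4, 3))
print(correlogram(gradient, 4, 4, 3))
```

For the checkerboard the statistic alternates between -1 and +1 with lag, since cells an even number of steps apart always share a colour.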
Correlograms
Malczewski, 2009
Correlograms
[Figure: Correlogram of Respiratory Disease, Hamilton, 2008; y-axis: Moran's I; x-axis: Spatial Lag]
Correlograms
Malczewski, 2009
Correlograms
[Figure: Correlogram & Partial Correlogram of Respiratory Disease; series: Correlogram, Partial Correlogram; y-axis: Moran's I; x-axis: Spatial Lag]
Getis-Ord General G
Measures how concentrated the high or low values are for a given study area.
The null hypothesis: "there is no spatial clustering of the values".
Significant and positive Z scores indicate that high values cluster
Significant and negative Z scores indicate that low values cluster
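A sketch of the General G statistic itself, assuming positive values and binary weights: G = Σ_{i≠j} w_ij x_i x_j / Σ_{i≠j} x_i x_j, with expected value E(G) = S0 / (n(n - 1)), where S0 is the sum of the weights. The variance needed for the Z score is omitted to keep the example short; the grid and values are hypothetical.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Binary rook-contiguity matrix for a regular grid (cells numbered row by row)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W


def general_g(x, W):
    """Getis-Ord General G and its expected value under no spatial clustering."""
    x = np.asarray(x, dtype=float)
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    n = len(x)
    cross = np.outer(x, x)
    np.fill_diagonal(cross, 0.0)            # exclude i == j from both sums
    G = (W * cross).sum() / cross.sum()
    expected = W.sum() / (n * (n - 1))
    return G, expected


W = rook_weights(4, 4)
block = np.ones((4, 4))
block[:2, :2] = 10.0                        # high values clustered in one corner
print(general_g(block.ravel(), W))
print(general_g(np.full(16, 5.0), W))
```

With the high values clustered G exceeds E(G), while a constant surface gives G = E(G).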
Global and Local Autocorrelation
Global
One statistic to summarize the pattern
Informs if clustering exists in the data
Local
Location-specific statistics
Shows us where the clusters are located
LISA Statistics
Local Indicator of Spatial Association
Satisfies two requirements:
Indicates significant spatial clustering for each location
The sum of the LISAs is proportional to a global indicator of spatial association
Examples:
Local Moran's I
Local Getis-Ord Gi*
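The second requirement can be checked numerically with local Moran's I in its usual form, I_i = (d_i / m2) Σ_j w_ij d_j, with d = x - mean(x) and m2 = Σ d_i² / n. In this sketch (hypothetical grid, binary weights) the local values sum to S0 × I, where S0 is the sum of the weights and I the global Moran's I.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Binary rook-contiguity matrix for a regular grid (cells numbered row by row)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W


def morans_i(z, W):
    """Global Moran's I with the diagonal of W ignored."""
    z = np.asarray(z, dtype=float)
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    d = z - z.mean()
    return len(z) * (d @ W @ d) / ((d @ d) * W.sum())


def local_morans_i(x, W):
    """Local Moran's I_i = (d_i / m2) * sum_j w_ij d_j for every location i."""
    x = np.asarray(x, dtype=float)
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    d = x - x.mean()
    m2 = (d @ d) / len(x)
    return d * (W @ d) / m2


W = rook_weights(4, 4)
halves = np.repeat([[1, 1, 0, 0]], 4, axis=0).ravel()
local = local_morans_i(halves, W)
print(local.round(3))
print(local.sum(), W.sum() * morans_i(halves, W))   # the two totals agree
```

Each I_i still needs its own significance test (for example by conditional permutation) before it is read as a local cluster or outlier.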
LISA Statistics
Use:
Identify hot spots
Significant local clusters in the absence of global association
Significant local outliers
High surrounded by low and vice versa
Local deviations from the global pattern of spatial association
Local Moran
Local Moran
Interpretation
Assessing lack of spatial randomness
Suggests significant spatial structure
Suggests interesting locations
Does not explain them
Getis-Ord Gi*
The local sum is compared proportionally to the sum of all features
When the local sum is very different from the expected local sum, and that difference is too large to be the result of random chance, a statistically significant Z score results.
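A sketch of the Gi* z-score in its standard Getis-Ord form, assuming binary weights with each feature counted in its own neighbourhood (w_ii = 1). The numerator is the local sum minus its expected value X̄ Σ_j w_ij; the 4 × 4 grid with a block of high values in one corner is hypothetical.

```python
import math

import numpy as np


def rook_weights(nrows, ncols):
    """Binary rook-contiguity matrix for a regular grid (cells numbered row by row)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W


def gi_star(x, W):
    """Getis-Ord Gi* z-score for every feature."""
    x = np.asarray(x, dtype=float)
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 1.0)                    # the * in Gi*: include feature i itself
    n = len(x)
    x_bar = x.mean()
    s = math.sqrt((x**2).mean() - x_bar**2)
    w_sum = W.sum(axis=1)
    numerator = W @ x - x_bar * w_sum           # observed minus expected local sum
    denominator = s * np.sqrt((n * (W**2).sum(axis=1) - w_sum**2) / (n - 1))
    return numerator / denominator


W = rook_weights(4, 4)
block = np.ones((4, 4))
block[:2, :2] = 10.0
print(gi_star(block.ravel(), W).round(2).reshape(4, 4))
```

The corner of the high-value block gets a z-score above 1.96, while the opposite corner gets a negative value.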
Getis-Ord Gi*
Modeling Area Data
Consider a multiple regression equation
$$y_i = a + b_1 x_{1i} + b_2 x_{2i} + \dots + b_n x_{ni} + e_i$$
y_i = dependent variable
x_1, x_2 ... x_n = independent variables
a = constant (intercept)
b_1, b_2 ... b_n = regression coefficients
e_i = error term (residual, or the difference between the predicted and observed values of y_i)
Regression Analysis: Assumptions
No multicollinearity: there is no intercorrelation of the independent variables
Normality: the error terms (e_i) are normally distributed, and the mean of the error term is 0
Homoskedasticity (equal variance): the residuals are dispersed randomly throughout the range of the estimated dependent variable
Spatial independence: there is no spatial autocorrelation of the residuals
Example: Average Age of Death
What explains average age of death?
Variables that were statistically significant in a bivariate regression
Variable    t-test
Dwell       5.04
MedInc      4.41
LICO_All    3.97
NoEdu       5.31
Univ        5.53
DropOut     4.02
Seniors     6.47
Example: Cardiac Admissions
Analysis of Residuals
Multicollinearity: a condition number > 30 is problematic
Jarque-Bera tests the joint hypothesis of skewness = 0 and excess kurtosis = 0
Are the data consistent with having skewness and excess kurtosis equal to 0?
When p > 0.05, the data are consistent with 0 skewness and 0 excess kurtosis
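Both diagnostics are short to compute. This sketch assumes the Jarque-Bera statistic in its usual form, JB = n/6 (S² + (K - 3)²/4), whose p-value comes from a chi-square distribution with 2 degrees of freedom (for 2 df the survival function is exp(-JB/2)). For the condition number it uses one common definition, the ratio of the largest to the smallest singular value of X after scaling each column to unit length; the software behind the slides may compute it differently.

```python
import math

import numpy as np


def jarque_bera(residuals):
    """Jarque-Bera statistic and its chi-square (2 df) p-value."""
    e = np.asarray(residuals, dtype=float)
    n = len(e)
    d = e - e.mean()
    m2 = np.mean(d**2)
    skew = np.mean(d**3) / m2**1.5
    kurt = np.mean(d**4) / m2**2          # raw kurtosis; kurt - 3 is the excess kurtosis
    jb = n / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
    p_value = math.exp(-jb / 2.0)         # exact survival function of chi-square with 2 df
    return jb, p_value


def condition_number(X):
    """Largest / smallest singular value of X after scaling each column to unit length."""
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)
    sv = np.linalg.svd(Xs, compute_uv=False)
    return sv.max() / sv.min()


print(jarque_bera([-2, -1, 0, 1, 2]))
print(condition_number([[1, 1], [1, -1], [1, 1], [1, -1]]))   # orthogonal columns
```

Orthogonal explanatory variables give the minimum condition number of 1; values above 30 signal the multicollinearity problem described above.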
Analysis of Residuals
Breusch-Pagan
Tests the null hypothesis that the error variances are all equal vs. the alternative that the error variances are a multiplicative function of one or more variables
The alternative hypothesis states that the error variances increase or decrease as the predicted values of y increase or decrease
p < 0.05 indicates heteroskedasticity
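The studentized (Koenker-Bassett) version of this test can be sketched as n × R² from regressing the squared residuals on the explanatory variables, compared with a chi-square distribution whose degrees of freedom equal the number of regressors excluding the intercept. The residuals below are synthetic, with their spread growing with x; `koenker_bp` is a hypothetical helper.

```python
import numpy as np


def koenker_bp(residuals, X):
    """Koenker-Bassett statistic n * R^2 and its degrees of freedom.

    X must include an intercept column; squared residuals are regressed on X.
    """
    e2 = np.asarray(residuals, dtype=float) ** 2
    X = np.asarray(X, dtype=float)
    coef, *_ = np.linalg.lstsq(X, e2, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((e2 - fitted) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return len(e2) * r2, X.shape[1] - 1


x = np.arange(1, 101, dtype=float)
X = np.column_stack([np.ones_like(x), x])
resid = x * np.where(x % 2 == 0, 1.0, -1.0)   # spread grows with x
lm, df = koenker_bp(resid, X)
print(round(lm, 1), df)
```

Here the statistic is far above the 5% critical value of 3.84 for 1 degree of freedom, so equal error variances would be rejected.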
Analysis of Residuals
Analysis of Residuals
Map of residuals
Analysis of Residuals
Myocardial Infarction Example
OLS Output
Suggests non-normality
Suggests heteroskedasticity
Suggests spatial autocorrelation
Modeling Area Data
The model shows error autocorrelation
Two types of models are possible, based on the two primary types of spatial dependence:
Spatial Error Model
Spatial Lag Model
Modeling Spatial Dependence
Spatial error
Observations are interdependent through unmeasured variables that are correlated across space OR measurement error that is correlated with space
Arises because we cannot model all the facets of a geographical region that may influence all nearby locations
May also arise from boundaries that are not perfect measures
[Diagram: X_i, X_j, Y_i, Y_j]
Modeling Spatial Dependence
Spatial error
Theoretically possible to eliminate this type of spatial dependence with proper explanatory variables and correct boundaries of observations
Space matters only in the error process, not in the substantive portion of the model
Assumption of uncorrelated error terms is violated
Indicative of omitted (spatially correlated) covariates
Modeling Spatial Dependence
Spatial Lag
The dependent variable is affected by the values of the dependent variable in nearby places
E.g., land value in a CT (census tract) is a function of land value in nearby CTs, not just related to common unmeasured variables
Assumption of uncorrelated error terms is violated
Assumption of independent observations is violated
[Diagram: X_i, X_j, Y_i, Y_j]
Analysis of Residuals
LM Lag and Robust LM Lag
Pertain to the spatial lag model as the alternative
Robust: tests for lag dependence in the presence of unmodelled spatial error dependence
LM Error and Robust LM Error
Pertain to the spatial error model as the alternative
Robust: tests for error dependence in the presence of an omitted spatial lag
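Of these, the LM-Error statistic has a compact closed form, sketched below: LM = (e'We / s²)² / T, where e are the OLS residuals, s² = e'e / n and T = tr(W'W + WW), compared with a chi-square distribution with 1 degree of freedom. The LM-Lag and robust variants also need the fitted model matrix and are not shown. The row-standardized grid weights and the checkerboard residual pattern are hypothetical.

```python
import numpy as np


def rook_weights_row_std(nrows, ncols):
    """Row-standardized rook-contiguity weights for a regular grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W / W.sum(axis=1, keepdims=True)


def lm_error(residuals, W):
    """LM test statistic for spatial error dependence in OLS residuals."""
    e = np.asarray(residuals, dtype=float)
    W = np.asarray(W, dtype=float)
    s2 = (e @ e) / len(e)
    T = np.trace(W.T @ W + W @ W)
    return ((e @ W @ e) / s2) ** 2 / T


W = rook_weights_row_std(4, 4)
checker = (np.indices((4, 4)).sum(axis=0) % 2).ravel()
e = np.where(checker == 1, 1.0, -1.0)   # strongly (negatively) autocorrelated residuals
print(round(lm_error(e, W), 2))
```

A value above 3.84 rejects the null of no spatial error dependence at the 5% level.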
Analysis of Residuals
Spatial Lag Model Results
Regression Diagnostics
Regression Diagnostics
No obvious pattern in residuals
No funnel-like pattern, no increase/decrease, suggests homoskedasticity
Final Lag Model Specification
$$y = a + \rho W y + X\beta + \varepsilon$$
Myocardial ~ 0.86 + 0.69 W(Myocardial) + 0.31 (Jarman Score) + ε
Hypothetical Error Model Specification
$$y = a + X\beta + \lambda W u + \varepsilon$$
Myocardial ~ 75.99 + 0.25 (Jarman Score) + 0.71 W u + ε
where u is the spatially autocorrelated error term and λ (here 0.71) is its coefficient
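To see what each specification implies, both models can be simulated from their reduced forms: the lag model gives y = (I - ρW)⁻¹(a + Xβ + ε), and the error model gives y = a + Xβ + u with u = (I - λW)⁻¹ε. This is an illustrative sketch: the parameter values are borrowed from the two specifications above, but the covariate (a stand-in for the Jarman score), the errors and the 5 × 5 grid are simulated, so it is not a refit of the myocardial infarction data.

```python
import numpy as np


def rook_weights_row_std(nrows, ncols):
    """Row-standardized rook-contiguity weights for a hypothetical grid study area."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W / W.sum(axis=1, keepdims=True)


def simulate_lag(a, beta, rho, X, W, eps):
    """Lag model y = a + rho*W y + X beta + eps, solved as (I - rho W)^-1 (a + X beta + eps)."""
    n = W.shape[0]
    return np.linalg.solve(np.eye(n) - rho * W, a + X @ beta + eps)


def simulate_error(a, beta, lam, X, W, eps):
    """Error model y = a + X beta + u, where u = lam*W u + eps, i.e. u = (I - lam W)^-1 eps."""
    n = W.shape[0]
    u = np.linalg.solve(np.eye(n) - lam * W, eps)
    return a + X @ beta + u


rng = np.random.default_rng(42)
W = rook_weights_row_std(5, 5)
X = rng.normal(size=(25, 1))                 # simulated covariate
eps = rng.normal(scale=0.1, size=25)
y_lag = simulate_lag(0.86, np.array([0.31]), 0.69, X, W, eps)
y_err = simulate_error(75.99, np.array([0.25]), 0.71, X, W, eps)
```

In the lag model the neighbours' outcomes feed back into each y, whereas in the error model space enters only through the error term, which is the distinction drawn in the spatial error and spatial lag slides.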
Summary of Steps for Modeling
Exploration
Aspatial: examine the dependent variable for normality
Histogram
Boxplot
Normality statistics
Spatial
Compute the Moran scatterplot and Moran's I to search for evidence of the presence of spatial and/or aspatial outliers
Can also examine on a local level
Summary of Steps for Modeling
Compute OLS results
Using the OLS residuals, compute Moran's I
If significant autocorrelation is detected in the residuals, rerun the model as a spatial model and estimate the respective parameters
Continue with other regression diagnostics
Non-normality (Jarque-Bera test)
Heteroskedasticity (Breusch-Pagan, Koenker-Bassett)
Multicollinearity (condition number)
Moran's I for spatial dependence of residuals
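The first steps can be strung together in a short sketch: fit OLS with NumPy least squares, then test the residuals with Moran's I and a permutation pseudo p-value. The 6 × 6 grid, the covariate and the west-to-east trend in the error term are hypothetical, constructed so that the residuals carry a clear spatial pattern; `ols_residual_moran` is an illustrative helper.

```python
import numpy as np


def rook_weights(nrows, ncols):
    """Binary rook-contiguity matrix for a regular grid (cells numbered row by row)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W


def morans_i(z, W):
    """Global Moran's I with the diagonal of W ignored."""
    z = np.asarray(z, dtype=float)
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    d = z - z.mean()
    return len(z) * (d @ W @ d) / ((d @ d) * W.sum())


def ols_residual_moran(y, X, W, permutations=999, seed=0):
    """OLS coefficients, Moran's I of the residuals and a one-sided permutation p-value.

    X must include an intercept column.
    """
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    observed = morans_i(resid, W)
    rng = np.random.default_rng(seed)
    sims = np.array([morans_i(rng.permutation(resid), W) for _ in range(permutations)])
    expected = -1.0 / (len(y) - 1)
    if observed >= expected:
        extreme = np.sum(sims >= observed)
    else:
        extreme = np.sum(sims <= observed)
    return coef, observed, (extreme + 1) / (permutations + 1)


rows, cols = np.divmod(np.arange(36), 6)
x = (rows % 2).astype(float)                 # covariate alternates from row to row
y = 1.0 + 2.0 * x + (cols - 2.5)             # error term trends from west to east
X = np.column_stack([np.ones(36), x])
coef, I_resid, p = ols_residual_moran(y, X, rook_weights(6, 6))
print(coef.round(3), round(I_resid, 3), p)
```

The residual Moran's I is 0.8 with a small pseudo p-value, which by the steps above would call for rerunning the model as a spatial model.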
Summary of Steps for Modeling
Fit a spatial model only if warranted
Use theory, if possible, to decide which model to fit; if that is not possible, use the diagnostics