Social Networks
journal homepage: www.elsevier.com/locate/socnet
a Department of Sociology, Institute for Mathematical and Behavioral Sciences, University of California, Irvine, United States
b Department of Sociology, University of Massachusetts, Amherst, United States
c Department of Criminology, Law, and Society, University of California, Irvine, United States
d Department of Geography, University of Tennessee, Knoxville, United States
Article info
Keywords:
Spatially embedded networks
Spatial Bernoulli graphs
Geographical variability
Settlement patterns
Graph-level indices
Abstract
In this paper, we explore the potential implications of geographical variability for the structure of social
networks. Beginning with some basic simplifying assumptions, we derive a number of ways in which
local network structure should be expected to vary across a region whose population is unevenly distributed. To examine the manner in which these effects would be expected to manifest given realistic
population distributions, we then perform an exploratory simulation study that examines the features of
large-scale interpersonal networks generated using block-level data from the 2000 U.S. Census. Using a
stratified sample of micropolitan and metropolitan areas with populations ranging from approximately
1000 to 1,000,000 persons, we extrapolatively simulate network structure using spatial network models
calibrated to two fairly proximate social relations. From this sample of simulated networks, we examine
the effect of both within-location and between-location heterogeneity on a variety of structural properties. As we demonstrate, geographical variability produces large and distinctive features in the social
fabric that overlies it; at the same time, however, many aggregate network properties can be fairly well predicted from relatively simple spatial demographic variables. The impact of geographical variability is
thus predicted to depend substantially on the type of network property being assessed, and on the spatial
scale involved.
© 2011 Published by Elsevier B.V.
This research was supported by NSF award BCS-0827027 and ONR award
N00014-08-1-1015.
Corresponding author at: University of California, Irvine, SSPA 2145, Irvine, CA
92697-5100, United States.
E-mail addresses: buttsc@uci.edu (C.T. Butts), racton@soc.umass.edu
(R.M. Acton), hippj@uci.edu (J.R. Hipp), nnagle@utk.edu (N.N. Nagle).
1 This and all maps shown are based on orthographic projections about a central point in the MSA, with distances in meters. Distance and area calculations throughout the paper are based on these projections.
0378-8733/$ see front matter 2011 Published by Elsevier B.V.
doi:10.1016/j.socnet.2011.08.003
density in this area is approximately 3 persons per square kilometer (far below the U.S. mean of approximately 32 persons/km²), this
is not reflective of the conditions under which most residents of
this region live. Indeed, as the figure indicates, most of the population of this region is concentrated into a small number of dense
communities, surrounded by large areas with few residents. The
extent of this concentration may be appreciated by comparing
the panel on the left with that on the right, which depicts a uniform population distribution over the same area. The difference
is stark. Rather than being embedded in a uniform, low-density
environment, the median resident of this region faces a local population density of approximately 1000 persons/km² (as computed
from block-level data), with densities in some areas being as high
as 12,700 persons/km² or as low as 0.09 persons/km². The micro-environments
in which individuals form ties may thus differ greatly from a uniform baseline, and these micro-environments may themselves be
distributed unevenly across the landscape.
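The contrast between the region-wide mean density and the density faced by the median resident can be illustrated with a person-weighted median over blocks. The following is a minimal sketch only: the block populations and areas are invented for illustration (they are not census values), and the function name `weighted_median` is hypothetical.

```python
def weighted_median(values, weights):
    """Smallest value v such that at least half of the total weight lies at or below v."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("weights must be non-empty and positive")


# Hypothetical blocks as (population, area in km^2); NOT census data.
blocks = [(900, 0.5), (600, 0.5), (300, 0.5), (150, 50.0), (50, 600.0)]

populations = [pop for pop, _ in blocks]
areas = [area for _, area in blocks]
block_densities = [pop / area for pop, area in blocks]

# Region-wide mean density: total persons over total area.
overall_density = sum(populations) / sum(areas)

# Density experienced by the median resident: weight each block by its population.
median_resident_density = weighted_median(block_densities, populations)

print(overall_density, median_resident_density)
```

In this toy region, a few small dense blocks hold most of the residents, so the person-weighted median is orders of magnitude above the region-wide mean, which is the qualitative pattern described above.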
The above observations raise an important question: if human
settlement patterns are extremely heterogeneous, and if spatial
structure influences network structure, then should we not expect
that the geographical variability in population distribution will
have a substantial impact on the structure of social networks? And,
if this is so, what will be the nature of that impact? To investigate these questions, we begin by employing a simple modeling
framework which preserves the marginal relationship between
Fig. 1. Population distributions for Cheyenne, NE MSA. Points in left panel are placed uniformly within census blocks; points in right panel are placed uniformly over the
convex hull of positions from the census-constrained model. Both contain the same number of points (N = 9830).
$\Pr(Y = y \mid \Phi) = \prod_{\{i,j\}} B(Y_{ij} = y_{ij} \mid \Phi_{ij})$, with parameter matrix given by
$\Phi_{ij} = F(D_{ij}, \theta)$. Models of this form have been studied in the context of geographical distances by Butts (2002), Hipp and Perrin
(2009), and Butts and Acton (2011), and are closely related to the latent
space models of Hoff et al. (2002) and Handcock et al. (2007). They
can also be viewed as special cases of the family of gravity models (Haynes and Fotheringham, 1984), which have been used for
several decades in the geographical literature to model interaction between areal units. Butts (2006a) has further shown that the
spatial Bernoulli graphs can be written as a special case of a more
general curved exponential family of graph distributions. By defining canonical parameters $\eta(\theta, d) = \operatorname{logit} F(d, \theta)$, we may write the
pmf for adjacency matrix $Y$ with support $\mathcal{Y}$ as

$$\Pr(Y = y \mid D, \theta) \propto \exp\left[\sum_{\{i,j\}} \eta(\theta, D_{ij})\, y_{ij}\right], \qquad (1)$$

$$\Pr(Y = y \mid D, \theta, \psi) \propto \exp\left[\psi^{T} t(y) + \sum_{\{i,j\}} \eta(\theta, D_{ij})\, y_{ij}\right], \qquad (2)$$
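The step from a product of Bernoulli terms to an exponential-family form with logit canonical parameters rests on one line of algebra. As a sketch, write $p_{ij} = F(D_{ij}, \theta)$ for the tie probability of the pair $\{i,j\}$; this shorthand, and the symbol names $\theta$ and $\eta$, are notation assumed here for illustration.

```latex
\begin{aligned}
\prod_{\{i,j\}} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}}
  &= \prod_{\{i,j\}} (1 - p_{ij}) \left( \frac{p_{ij}}{1 - p_{ij}} \right)^{y_{ij}} \\
  &= \left[ \prod_{\{i,j\}} (1 - p_{ij}) \right]
     \exp\left[ \sum_{\{i,j\}} y_{ij} \operatorname{logit} p_{ij} \right].
\end{aligned}
```

The bracketed product does not depend on $y$, so the pmf is proportional to $\exp\left[\sum_{\{i,j\}} \eta(\theta, D_{ij})\, y_{ij}\right]$ with $\eta(\theta, d) = \operatorname{logit} F(d, \theta)$, provided that $0 < p_{ij} < 1$ for every pair.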
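$$\Pr(Y_{ij} = 1 \mid \ell_i, \theta) = \int_A p(\ell)\, F(D(\ell_i, \ell), \theta)\, d\ell,$$

$$E\left[\sum_{j=2}^{k} Y_{1j} \,\Big|\, \ell_1\right] = \sum_{j=2}^{k} E\left[Y_{1j} \mid \ell_1\right], \qquad E\sum_{j=2}^{k} Y_{1j} = \sum_{j=2}^{k} E\,Y_{1j},$$

which again by the iid assumption reduces to k times a function that depends only on the fixed vertex location pdf and the SIF. Indeed, the above right-hand expression can be rewritten as
$k \int_A \int_A p(\ell)\, p(\ell')\, F(D(\ell, \ell'), \theta)\, d\ell\, d\ell'$, with the double area integral being
the marginal probability of an edge between two randomly selected
vertices.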
This simple exercise leads to an observation that we may call
the in-filling principle: adding vertices to a fixed region in a
uniform way leads to a linear increase in expected local (within-region) degree, while holding the expected local density constant.
(This last follows immediately from the well-known graph identity
$\delta = \bar{d}/(N - 1)$, where $\delta$ is the density and $\bar{d}$ is the mean degree.) Likewise, given two regions of equivalent shape and area and the same
local population distribution, we expect the ratio of their internal
mean degrees to be equal to the ratio of their population sizes (with
their internal densities again being equal). Where the population
gradient is relatively uniform, we thus expect local mean degree to
scale linearly with population density.
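A quick numerical check of this scaling can be run by placing points uniformly in a fixed square and computing expected degree directly from the pairwise tie probabilities. This is a rough sketch under stated assumptions: the power-law `sif` function and all of its parameter values are hypothetical stand-ins rather than the calibrated SIFs used in the study, and the region size is arbitrary.

```python
import math
import random


def sif(distance, p0=0.5, scale=50.0, gamma=2.0):
    """Hypothetical power-law SIF: tie probability decays with distance (assumed form)."""
    return p0 / (1.0 + distance / scale) ** gamma


def expected_mean_degree_and_density(n, side=1000.0, seed=0):
    """Place n points uniformly in a side x side square; return (expected mean degree, expected density)."""
    rng = random.Random(seed)
    pts = [(rng.uniform(0.0, side), rng.uniform(0.0, side)) for _ in range(n)]
    total = 0.0
    for i in range(n):
        xi, yi = pts[i]
        for j in range(i + 1, n):
            xj, yj = pts[j]
            total += sif(math.hypot(xi - xj, yi - yj))
    mean_degree = 2.0 * total / n        # each expected edge adds to two degrees
    density = mean_degree / (n - 1)      # identity: density = mean degree / (N - 1)
    return mean_degree, density


small = expected_mean_degree_and_density(300, seed=1)
large = expected_mean_degree_and_density(600, seed=2)

degree_ratio = large[0] / small[0]    # should be close to the population ratio (2)
density_ratio = large[1] / small[1]   # should be close to 1

print(degree_ratio, density_ratio)
```

Doubling the number of vertices in the same region should roughly double the expected mean degree while leaving the expected density about the same, up to sampling noise in the vertex positions.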
2 Throughout this paper, we will use the term spatial heterogeneity to refer generically to variation in local population density across space. Other forms of heterogeneity are also possible (e.g., non-stationarity of the SIF), but we focus here on this particular case.
Fig. 2. Effects of increasing order on connectivity and cohesion for random graphs
of fixed expected density. Left panel shows probability of connectivity and biconnectivity by order and density. Right panel shows fraction belonging to each k-core
(and no higher) and mean core number (dotted line) at 1% expected density, by
graph order.
core number (black line) rises steadily, growing approximately linearly with N. As with connectivity, these behaviors may be altered
somewhat by spatial clustering. They provide a useful intuition,
however, for the baseline impact of in-filling on local structure.
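The growth of mean core number with graph order at fixed expected density can be reproduced with a small experiment on homogeneous Bernoulli graphs. The sketch below assumes a 1% expected density (as in Fig. 2) and uses a simple quadratic-time peeling routine for core numbers; the function names and the chosen orders are illustrative, not taken from the study's code.

```python
import random


def bernoulli_graph(n, p, rng):
    """Homogeneous Bernoulli graph on n vertices, returned as a list of neighbor sets."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def core_numbers(adj):
    """Core number of each vertex, found by repeatedly peeling a minimum-degree vertex."""
    n = len(adj)
    degree = [len(neighbors) for neighbors in adj]
    removed = [False] * n
    core = [0] * n
    current = 0
    for _ in range(n):
        v = min((u for u in range(n) if not removed[u]), key=degree.__getitem__)
        current = max(current, degree[v])   # core level never decreases while peeling
        core[v] = current
        removed[v] = True
        for w in adj[v]:
            if not removed[w]:
                degree[w] -= 1
    return core


rng = random.Random(7)
density = 0.01   # fixed expected density
mean_core_by_order = {}
for n in (100, 200, 400, 800):
    core = core_numbers(bernoulli_graph(n, density, rng))
    mean_core_by_order[n] = sum(core) / n

print(mean_core_by_order)
```

Because expected mean degree is density times (N - 1), holding density fixed while increasing order raises degree, and the mean core number rises with it.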
3 E.g., we can generate G and G′ using the same random inputs, such that every edge in G′ is also in G; see Butts (2010) for a general treatment of this approach.
mean core number, probability of k-connectivity) satisfy the condition of z, it follows that we can employ the properties of G′ to bound
the behavior of G. On the other hand, not all properties are preserved under graph union (e.g., betweenness centralization), and
the bound obtained in some cases may be too loose to be useful
(e.g., if
). Thus, this is an incomplete solution.
Another factor ignored thus far has been the role of non-local
ties. Clearly, it is possible to identify some region A whose induced
subgraph is disconnected with high probability, while the induced
subgraph of some larger region B ⊃ A is very likely connected;
consider, for instance, the case of vertices distributed at unit
intervals on the real line, with A being a segment of length 2, B
being a segment of arbitrarily long length L, and F a constant
k such that ln L/L ≪ k ≪ 1. Although this example is implausible,
it points to a real phenomenon: particularly when F is heavy-tailed, the locally induced subgraph for a small region may not
effectively characterize the properties of its members within the
larger network. This effect will itself vary across the population
surface, to the extent that high-population regions will tend to be
embedded within or adjacent to other high-population regions, and
low-population regions will likewise be associated with other low-population regions. The exact impact of this effect will vary with
the model SIF, graph statistic examined, and population surface,
and is difficult to characterize on an a priori basis.
Considering simplified scenarios gives us general insights into
the first-order effects of spatial heterogeneity on network structure,
but does not tell us how these factors will play out in practice. To
move beyond these basic intuitions, we must examine the behavior
of spatial Bernoulli graphs under realistic conditions: that is, with
SIFs based on real data, and vertex positions that are reflective of
geographical reality. It is to this problem that we now turn.
3. Bringing geography back in: a simulation study
While the arguments of the previous section suggest various
general ways in which network structure should vary across space,
they also underscore the fact that such effects are contingent on the
underlying population surface: the same mechanisms can lead to
very different networks when applied to populations that are differently distributed. What would we expect to see in real networks,
then, given empirically observed settlement patterns? Simulation
provides a natural means of addressing this problem. Specifically,
we may utilize detailed data on population distributions to simulate draws from spatial Bernoulli graphs with fixed SIFs, and analyze
the resulting networks to examine the impact of spatial structure
on network properties (holding other factors constant); it is this
approach that we employ here. Although the models we employ
are empirically realizable (and our simulations are based on extrapolations from observed data, rather than first principles alone), it
should be borne in mind that our purpose is to explore the theoretical implications of spatial structure for network structure given
a minimal set of assumptions, as opposed to making detailed predictions about particular cases. In this sense, this effort lies at a
midpoint on the intellective versus emulative modeling continuum discussed by Carley (2002), incorporating certain aspects of
empirical detail while retaining enough simplifying assumptions to
permit a relatively general analysis. At the same time, the generality
of the framework in which our models are embedded (i.e., the spatial ERGs of Eq. (2)) permits subsequent extension and elaboration
where appropriate.
3.1. Simulation design
Our simulation study proceeds as follows. First, we choose a set
of locations to examine, selecting them so as to evenly cover the
range of observed population sizes and land areas for U.S. micropolitan and metropolitan areas (as well as a number of other, smaller,
inhabited locations). For each location in this set, we then simulate
population microdistribution using data on block-level population
counts and household size distributions. Finally, we simulate networks using two previously calibrated spatial Bernoulli models for
each location. By analyzing the resulting set of networks, we are
then able to examine the impact of within- and across-location geographical variability on network structure.
Fig. 4. Population size and land area distribution, U.S. micro and metropolitan statistical areas. Vertical and horizontal lines indicate sampling strata for this study;
selected locations are those closest to line intersections.
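The second and third steps of this design (placing individuals within blocks, then drawing ties conditional on distance) can be sketched in a few lines. Everything specific here is an assumption made for illustration: blocks are simplified to axis-aligned rectangles, household structure is ignored, the two toy blocks are invented, and the power-law `sif` with its parameters is a hypothetical stand-in for the calibrated models.

```python
import math
import random


def sif(distance_m, p0=0.5, scale_m=50.0, gamma=2.0):
    """Hypothetical power-law SIF (assumed form and parameters)."""
    return p0 / (1.0 + distance_m / scale_m) ** gamma


def place_population(blocks, rng):
    """Place each block's residents uniformly at random within that block's rectangle."""
    points, labels = [], []
    for name, (xmin, ymin, xmax, ymax), count in blocks:
        for _ in range(count):
            points.append((rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)))
            labels.append(name)
    return points, labels


def draw_network(points, rng):
    """Draw each dyad independently with probability given by the SIF at its distance."""
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
            if rng.random() < sif(d):
                edges.append((i, j))
    return edges


rng = random.Random(42)
# Toy blocks as (name, bounding box in meters, population count); NOT census data.
blocks = [
    ("town", (0.0, 0.0, 100.0, 100.0), 150),
    ("rural", (100.0, 0.0, 5100.0, 5000.0), 50),
]
points, labels = place_population(blocks, rng)
edges = draw_network(points, rng)

degree = [0] * len(points)
for i, j in edges:
    degree[i] += 1
    degree[j] += 1

mean_degree_by_block = {
    name: sum(d for d, lab in zip(degree, labels) if lab == name) / labels.count(name)
    for name in ("town", "rural")
}
print(mean_degree_by_block)
```

Even in this two-block toy case, residents of the dense block end up with far more ties than residents of the sparse block, since the same SIF is applied to very different local distance distributions.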
3.1.1. Location selection
To compare network structure across regions, one would ideally like to employ units that are both well-defined and socially
bounded. While few places in the developed world are fully isolated
from one another, it is nevertheless possible to identify regions that
correspond to relatively well-defined social units with respect to
such processes as daily migration, employment, local commerce,
and everyday interaction. Using such criteria, the U.S. Department
of the Census divides the populated regions of the United States into
micropolitan and metropolitan statistical areas, each of which contains one or more towns, cities, or other agglomerations together
with the immediately surrounding area (to the nearest county
or parish boundary, as applicable). A metropolitan area contains
at least one city with a population of at least 50,000, whereas
a micropolitan area contains at least one city with a population
between 10,000 and 50,000. An area in this sense includes the
primary city and the surrounding county; it also includes adjacent counties if they are highly socially integrated based on journey
to work patterns. At a minimum, micropolitan/metropolitan areas
are thus collections of locally dense population surrounded by a
low-population buffer, although the Census definition also seeks
to avoid splitting areas experiencing strong interaction in other
respects. The unified set of 922 micropolitan and metropolitan
areas, along with the other 1379 smaller counties in the United
States, serves as the population for our study, with the individual
area (henceforth location) being our primary unit of analysis.
In order to cover areas with a wide range of geographical properties, we employ a stratified sample of locations selected by
population size and land area (respectively). As location sizes on
both dimensions scale roughly logarithmically, we identified four
target strata on each dimension based on the overall distribution of
locations. (See Fig. 4.) These stratum values were $10^3$, $10^4$, $10^5$, and
$10^6$ for population size, and $e^7$ km², $e^8$ km², $e^9$ km², and $e^{10}$ km²
for land area. For each pair of population size and land area values, the location was identified whose population and area were as
close as possible to the target (in a least squares sense), yielding
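Table 1
Sample locations, stratified by population and land area.

Population stratum   Location              Population
1 × 10^3             Bristol Bay, AK            1258
                     Golden Valley, MT          1042
                     Esmeralda, NV               971
                     Yakutat, AK                 808
1 × 10^4             Choctaw, MS                9758
                     Cheyenne, NE               9830
                     Quay, NM                 10,155
                     White Pine, NV             9181
1 × 10^5             Lawrence, KS             99,962
                     Cookeville, TN           93,417
                     Idaho Falls, ID         101,677
                     Navajo, AZ               97,470
1 × 10^6             Honolulu, HI            876,156
                     Hartford, CT          1,148,618
                     Rochester, NY         1,037,831
                     Salt Lake City, UT      968,858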
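The selection of one location per population/area target pair can be sketched as a nearest-neighbor search in two dimensions. This is a minimal illustration with loud assumptions: distances are measured on the log scale (motivated by the roughly logarithmic scaling of both dimensions, but not confirmed as the study's exact criterion), and the candidate locations "A" through "D" are invented.

```python
import math


def closest_location(candidates, target_pop, target_area_km2):
    """Name of the candidate minimizing squared log-scale distance to the target (assumed criterion)."""
    def squared_error(item):
        _, (pop, area) = item
        return ((math.log(pop) - math.log(target_pop)) ** 2
                + (math.log(area) - math.log(target_area_km2)) ** 2)
    return min(candidates.items(), key=squared_error)[0]


# Hypothetical candidates as name -> (population, land area in km^2); NOT real locations.
candidates = {
    "A": (1200, 1500.0),
    "B": (9000, 2500.0),
    "C": (11000, 9000.0),
    "D": (95000, 3000.0),
}

pop_strata = [1e3, 1e4]
area_strata = [math.exp(7), math.exp(8)]

# One selected location per (population target, area target) pair.
selected = {
    (pop, area): closest_location(candidates, pop, area)
    for pop in pop_strata
    for area in area_strata
}
print(selected)
```

With a small candidate pool the same location can be chosen for more than one target pair; a fuller implementation would need a rule for such collisions.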
4 A Halton sequence is a deterministic sequence of points that fills space in a uniform manner, while also maintaining a high nearest-neighbor distance. The result (sometimes called a quasi-random distribution) is similar to a set of draws from the uniform distribution, but substantially more evenly placed; see Gentle (1998) for algorithmic details.
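For readers unfamiliar with the construction, a two-dimensional Halton sequence can be generated from radical inverses in coprime bases. The sketch below uses bases 2 and 3, the conventional choice; whether the study used these bases, or applied any scrambling or rescaling to block shapes, is not stated here and is left as an assumption.

```python
def radical_inverse(index, base):
    """Reflect the base-`base` digits of `index` about the radix point."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_points(n, bases=(2, 3)):
    """First n points of the Halton sequence in the unit square (indices start at 1)."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


points = halton_points(8)
for point in points:
    print(point)
```

Points in the unit square can then be mapped into a block's bounding region, discarding any that fall outside the block boundary.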
Fig. 5. Comparison of uniform and quasi-random vertex placement, Quay County, NM MSA. Lines indicate census block boundaries, with artificial elevation shown via vertex
color. Insets provide detail of 2 km × 2 km portion of Tucumcari, NM. (For interpretation of the references to color in this figure legend, the reader is referred to the web
version of the article.)
5 Since Manhattan distance is affected by the choice of coordinates, we also considered the correlation of Euclidean distance against Manhattan distance on a randomly rotated axis set. The resulting median correlations were nearly identical (approx. 0.99 in the unconstrained case, and 0.97 within 100 m).
6 Simulation and analysis were performed using the statnet and sna libraries for R (Handcock et al., 2008; Butts, 2008) and the R spatial tools (Bivand et al., 2008), along with additional functions created by the authors.
discuss our major findings in each area, beginning with cross-locational comparisons.
3.2.1. Graph-level properties across locations
Given our widely dispersed set of study locations, and the high
level of spatial heterogeneity within each location, it seems plausible that few systematic patterns will be found that span the
sample of simulated graphs. If, on the other hand, we nd that
there are aggregate properties that remain stable or that change
in a predictable way across locations, then we may provisionally conclude that these properties will be robust enough to justify
empirical investigation (for networks with similar SIFs to those
considered here, at least). In this section we consider several contexts in which graph-level relationships can occur: comparisons of
spatially conditioned and uniform random graphs; associations of
graph-level properties with aggregate geographical features; and
correlations among multiple global properties on the same networks.
3.2.1.1. Comparison with random baselines. As we have noted, spatial structure affects network structure by adding heterogeneity to
edge probabilities, and by creating correlations among those probabilities (an effect which is distinct from conditional dependence
among edges, as in Pattison and Robins (2002)). Nevertheless, the
possibility exists that such changes will have only a limited impact
on the global properties of the resulting networks. It has long
been known that many global network properties are sharply constrained by basic factors such as size and density (Mayhew and
Levinger, 1976; Anderson et al., 1999; Faust, 2007) and analyses such as those of Watts and Strogatz (1998) show that highly
structured networks can behave much like random graphs in
certain respects. Apart from their substantive adequacy, simple
random graph models are also useful as baselines (Mayhew, 1984a)
against which to compare the behavior of more complex models. To compare the behavior of networks generated under the
spatial Bernoulli model with their homogeneous counterparts, we
therefore constructed a paired comparison sample to our 1600
spatially conditioned networks. For each of our spatially conditioned networks, we drew a single conditional uniform graph (CUG)
with identical size and density to that in the original sample. This
resulted in a sample of equal size to the original, whose corresponding members had the same size and densities (and, by extension,
mean degrees) as the original networks, but which were free of
spatial structure. To assess the ways in which space distorts global
structure in the present context (above and beyond density), we
compare the distributions of several graph-level indices (GLIs) on
the spatial and CUG samples.
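The paired baseline can be sketched as a uniform draw from all graphs with a given number of vertices and edges, which fixes both size and density. In the example below the "observed" network is an invented hub-and-spoke graph used only to show the comparison; the function names are hypothetical and this is not the sna implementation used in the study.

```python
import itertools
import random
import statistics


def draw_cug(n, m, rng):
    """Uniform draw from the set of simple graphs with n vertices and exactly m edges."""
    all_pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(all_pairs, m)


def degree_sd(n, edges):
    """Standard deviation of the degree sequence."""
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return statistics.pstdev(degree)


rng = random.Random(3)
n = 50
# Invented "observed" network: one hub tied to every other vertex (illustration only).
observed = [(0, j) for j in range(1, n)]

# Paired baseline with identical size and density.
baseline = draw_cug(n, len(observed), rng)

observed_sd = degree_sd(n, observed)
baseline_sd = degree_sd(n, baseline)
print(observed_sd, baseline_sd)
```

Because mean degree is pinned by construction, any difference in a graph-level index between the two networks reflects structure beyond size and density.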
Fig. 6 summarizes the relationships between global properties of the spatial and uniform networks, as captured by several
standard graph-level indices. Each panel shows case pairs, with
vertical and horizontal coordinates respectively indicating GLI values for the spatially conditioned and CUG networks. While all of
the selected GLIs show clear differences between the models, the
nature and extent of the deviations vary. The top left panel of Fig.
6, for instance, shows the standard deviation of the degree distribution for each simulated pair. Although the mean degrees in each
case are constrained to be equal, their variations are not: as can
clearly be seen, virtually all of the spatial networks are well above
the 45-degree line, indicating that the degree distributions under
the spatial models are substantially more variable (and, in practice,
more right-skewed) than their random counterparts. Interestingly,
this amplification of variability appears to be quite systematic,
with degree standard deviation in the spatial model scaling as
approximately the 5/3 power of the random baseline (R² = 0.96).
Moreover, we see that this relationship appears to be generally
homogeneous with respect both to the choice of SIF and to the
7 Note that this is not a regression artifact; as the high R² suggests, the result does not reverse when one regresses the CUG score on the spatial model score.
Fig. 6. Graph-Level properties, spatial models versus paired CUG baselines. Each point represents a single simulation outcome (location by SIF by microdistribution), with
color indicating choice of microdistribution. 45-Degree line indicates equality; red lines, where applicable, show least squares prediction of spatial from baseline GLI values.
Inset in last panel shows variables in log-log scale. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
from the spatial model are necessarily similar to those of random graphs with equivalent dyadic characteristics (although they
are somewhat in the cases of mean core number and degree centralization), but that the spatial model deviates from the baseline
model in a quantitatively predictable way. That such relationships
would arise across locations of such highly variable structure is
an interesting finding, and suggests an avenue for gaining further
theoretical leverage. On the other hand, it is also true that some
network properties differ greatly between the spatial and uniform
cases, with little linkage between the two. Transitivity seems to be
a clear example of this, having behavior in the spatial context that
is essentially decoupled from the size/density controlled baseline.
As we shall see, however, this does not mean that transitivity is
Table 2
Regression of logged mean degree on population density (PopDen) and median empty space function (FFunMed), with an interaction for model type.

                             Estimate   Std. error   t-Value   Pr(>|t|)
(Intercept)                    0.47       0.035       13.39    <2e-16***
FriendSIF                      2.37       0.050       47.26    <2e-16***
log(PopDen)                    0.11       0.004       30.78    <2e-16***
log(FFunMed)                   0.13       0.006       22.45    <2e-16***
FriendSIF × log(PopDen)        0.28       0.005       55.74    <2e-16***
FriendSIF × log(FFunMed)       0.25       0.008       30.56    <2e-16***
                                          Estimate   Std. error   t-Value   Pr(>|t|)
(Intercept)                                 2.75       0.141       19.56    <2e-16***
log(NNDist)                                 0.84       0.060       14.00    <2e-16***
log(PopDen)                                 0.09       0.013        6.99    4.13e-12***
FriendSIF                                   5.11       0.199       25.71    <2e-16***
log(NNDist) × log(PopDen)                   0.05       0.004       12.53    <2e-16***
log(NNDist) × FriendSIF                     1.35       0.085       15.89    <2e-16***
log(PopDen) × FriendSIF                     0.18       0.019        9.53    <2e-16***
log(NNDist) × log(PopDen) × FriendSIF       0.07       0.006       12.31    <2e-16***
centralization and density (R² = 0.98 on log scale). It is noteworthy in this regard that a similar relationship does not hold for mean
degree (R² = 0.13 on log scale), bearing in mind that density is more
heavily driven by population size than by mean degree in this sample. It should also be borne in mind that the centralization measure
employed here is already normalized for density as well as graph
size, and hence the observed density relationship is more subtle
than it might appear. We note in this regard that the highest density graphs are simultaneously those that are small and that have
higher mean degree. These graphs will be more likely to, on the
one hand, produce high-degree outliers (since they have relatively
higher degree variance), and, on the other, to have maximum raw
centralization scores (with respect to which the measure is normalized) that are small enough that the normalized measure is not
damped heavily towards zero. Complex measures such as centralization thus require careful interpretation (even when adjusted
for baseline effects), no less so for these than for other network
models.
In summary, strong and systematic relationships exist between
GLIs within this sample. Some appear to be driven by baseline
Table 4
Regression of logged edge length on logged population count (Pop) and SIF, with a population/SIF interaction.

                         Estimate   Std. error   t-Value   Pr(>|t|)
(Intercept)                0.57       0.051       10.97    <2e-16***
log(Pop)                   0.10       0.005       20.72    <2e-16***
FriendSIF                  0.83       0.072       11.60    <2e-16***
log(Pop) × FriendSIF       0.18       0.007       26.98    <2e-16***
Fig. 7. Relationships among various graph-level properties, simulated spatial networks. Each point represents a single simulation outcome (location by SIF by microdistribution), with color indicating choice of microdistribution. Red lines, where applicable, show least squares prediction of vertical axis GLI from horizontal axis GLI. Insets show
variables in log-log scale. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
Fig. 8. Spatial variation in predicted degree (by SIF and layout model), Navajo County, AZ. Vertex color indicates degree, with bluer colors indicating greater numbers
of ties. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
Table 5
AICC selected models for degree distribution, by location, SIF, and placement model.

Location             Freeman/uniform   Festinger/uniform   Freeman/quasi   Festinger/quasi
Bristol Bay, AK      NB                NB                  NB              NB
Golden Valley, MT    NB                NB                  NB              P
Esmeralda, NV        NB                NB                  NB              NB
Yakutat, AK          NB                NB                  NB              NB
Choctaw, MS          NB                NB                  NB              NB
Cheyenne, NE         NB                NB                  NB              NB
Quay, NM             NB                NB                  NB              NB
White Pine, NV       NB                NB                  NB              NB
Lawrence, KS         NB                NB                  NB              NB
Cookeville, TN       W                 NB                  W               NB
Idaho Falls, ID      NB                NB                  NB              NB
Navajo, AZ           NB                NB                  NB              NB
Honolulu, HI         NB                NB                  NB              NB
Hartford, CT         W                 NB                  W               NB
Rochester, NY        NB                NB                  NB              NB
Salt Lake City, UT   NB                NB                  NB              NB
Fig. 9. Spatial variation in predicted core number (by SIF and layout model), Cookeville, TN MSA. Vertex color indicates maximum core membership, with bluer colors
indicating membership in higher-order cores. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)
the convex hull of the set of vertices in a particular subgraph provides an intuitive notion of the region covered by that subgraph,
as illustrated in Fig. 11. The figure graphically illustrates the emergence of large cohesive subgroups (here, members of high-order
cores who are themselves biconnected) within the Cookeville, TN
case. As argued in Section 2, such groups should develop relatively suddenly when a sufficiently large area exceeds the requisite
threshold density; the location of large cores covering the high-density regions of the map is consistent with this behavior. Such
spatially large cohesive sets are of potential interest for theories
such as those of Sampson et al. (1997), which relate to the ability of
social groups to monitor and control activities within a given area.
Models of the kind studied here suggest a relatively sharp boundary between the conditions under which such cohesion is feasible,
and those under which it is not. Such boundaries may account
in part for the frequently voiced sense of qualitative difference
between social interactions in cities and those in sparsely populated
environments.
While the detailed structure of the simulated networks reveals
substantial variation in both positional and group characteristics,
there are nevertheless strong similarities across cases. Fig. 12 shows
the marginal degree distributions for all 64 networks, each representing the aggregate effect of the interaction of population surface
and SIF across vertices. While the profiles of Fig. 12 obviously differ
Fig. 10. Detail of edge structure (quasi-random placement, Friendship SIF) for a portion of the Cookeville, TN MSA. Vertices are shaded by core number, from red (minimum)
to violet (maximum); dark lines indicate census block boundaries. (For interpretation of the references to color in this figure legend, the reader is referred to the web version
of the article.)
in some respects, there are also some similarities (not the least
of which being a relatively long upper tail). Are these similarities only a matter of appearance, or do they indicate a common
underlying functional form? To assess this, we attempted to fit
geometric, negative binomial, Poisson, Waring, and Yule distributions to each of the depicted degree distributions (models fit via
degreenet (Handcock, 2003) using maximum likelihood). After
Fig. 11. Spatial structure of cohesive components, Cookeville TN MSA (uniform placement, Friendship SIF). Shaded regions indicate convex hulls of membership locations
for biconnected sets of k-core members, with pink shading indicating 2-cores, and green indicating 3-cores. Right-hand panel shows detail of dotted area. (For interpretation
of the references to color in this figure legend, the reader is referred to the web version of the article.)
Fig. 12. Marginal degree distributions by location, SIF, and placement model. Friendship model distributions are shown in blue, interaction model distributions in black;
solid lines indicate uniform placement, with quasi-random placement in dotted lines. (For interpretation of the references to color in this figure legend, the reader is referred
to the web version of the article.)
the Poisson preferred in only 1. The Waring cases seem to be associated with the Hartford, CT and Cookeville, TN MSAs under the
Friendship SIF, and may reflect particularly high levels of heterogeneity in these locations. These possible exceptions aside, the vast
majority of cases can be seen to be well-approximated by distributions of the same form, despite differing in population size and land
area by several orders of magnitude.
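The logic of AICC-based selection among candidate degree distributions can be shown with a reduced example. The sketch below is deliberately simplified and is not the degreenet procedure: it compares only the Poisson and geometric families (both have closed-form maximum likelihood estimates), assumes a geometric distribution supported on 0, 1, 2, ..., and uses an invented long-tailed degree sample. The negative binomial, Waring, and Yule fits would require numerical optimization and are omitted.

```python
import math


def poisson_loglik(data):
    """Maximized Poisson log-likelihood; the MLE of the rate is the sample mean."""
    lam = sum(data) / len(data)
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)


def geometric_loglik(data):
    """Maximized geometric log-likelihood on support 0, 1, 2, ...; MLE is p = 1 / (1 + mean)."""
    mean = sum(data) / len(data)
    p = 1.0 / (1.0 + mean)
    return sum(math.log(p) + x * math.log(1.0 - p) for x in data)


def aicc(loglik, k, n):
    """Small-sample corrected AIC for a model with k parameters fit to n observations."""
    return -2.0 * loglik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


# Invented long-tailed degree sample (illustration only, not simulation output).
degrees = [0] * 30 + [1] * 20 + [2] * 14 + [3] * 10 + [5] * 8 + [8] * 6 + [12] * 4 + [20] * 2

n_obs = len(degrees)
scores = {
    "poisson": aicc(poisson_loglik(degrees), 1, n_obs),
    "geometric": aicc(geometric_loglik(degrees), 1, n_obs),
}
selected_model = min(scores, key=scores.get)
print(scores, selected_model)
```

A long upper tail is heavily penalized under the Poisson, whose variance is tied to its mean, which is one reason overdispersed families tend to be preferred for degree distributions of this kind.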
Turning to core number, we note in the marginal distributions
of Fig. 13 the same combination of family resemblance and difference in detail seen earlier in Fig. 12. As before, we attempt to assess
Table 6
AICC selected models for core number distribution, by location, SIF, and placement model.

Location             Freeman/uniform   Festinger/uniform   Freeman/quasi   Festinger/quasi
Bristol Bay, AK      NB                NB                  NB              NB
Golden Valley, MT    NB                NB                  NB              NB
Esmeralda, NV        NB                NB                  NB              NB
Yakutat, AK          NB                NB                  NB              NB
Choctaw, MS          NB                NB                  NB              NB
Cheyenne, NE         NB                NB                  NB              NB
Quay, NM             NB                NB                  NB              NB
White Pine, NV       NB                NB                  NB              NB
Lawrence, KS         NB                NB                  NB              NB
Cookeville, TN       NB                NB                  W               NB
Idaho Falls, ID      NB                NB                  NB              NB
Navajo, AZ           NB                NB                  NB              NB
Honolulu, HI         NB                NB                  NB              NB
Hartford, CT         G                 NB                  G               NB
Rochester, NY        NB                NB                  NB              NB
Salt Lake City, UT   NB                NB                  NB              NB
Fig. 13. Marginal core number distributions by location, SIF, and placement model. Friendship model distributions are shown in blue, interaction model distributions in
black; solid lines indicate uniform placement, with quasi-random placement in dotted lines. (For interpretation of the references to color in this figure legend, the reader is
referred to the web version of the article.)
To summarize our findings regarding within-region/within-network variation, the results from our simulated networks closely
follow our a priori expectations. Degree and core number vary
with the population surface, and cohesively connected subgroups
appear over regions with systematically above-threshold density.
While all of this suggests that many local structural properties
will vary greatly both within and across regions, this variation is
also systematically structured. In addition to the above patterns,
we find that both degree and core number for these networks
are well-modeled by negative binomial distributions (though the
parameters of those distributions clearly vary by location and SIF).
Although it is not clear how robust this result is to the imposition
of other (currently unmodeled) factors, its prevalence here leads to
the speculation that the negative binomial degree and core number
distributions may serve as easily falsifiable signatures for a spatial
Bernoulli process. It is interesting in that light to note that the negative binomial has been found to provide a reasonable fit to at least
some empirical data sets (see, e.g. Hamilton et al., 2008), suggesting
that this pattern is not beyond the bounds of plausibility.
but may still be adequate for many purposes. If so, their simplicity
and tractability (theoretical, computational, and inferential) have
much to recommend them.
In a like vein, we would suggest that in settings for which spatial Bernoulli graphs are inadequate, augmenting the base model
(per Eq. (2)) with covariate effects preserving Bernoulli structure
(e.g., age or race mixing) rather than sources of dependence among
edges should be considered as an initial strategy (at least for modeling of large-scale networks). Although more complex than purely
spatial models, such hybrids retain many of the scalability and theoretical tractability advantages exploited here. One natural avenue
for such expansion is via the use of Blau-space models (McPherson,
1983, 2004) that employ a notion of distance over a combined socio-physical space. (Some initial steps in this direction have been taken
by Hipp and Perrin (2009).) One major obstacle to current progress
in this area is a lack of high-quality network data sets containing
sufficient information on both geography and demographic characteristics to permit estimation of a joint socio-physical SIF. As such
data become available for multiple relational types, it will be possible to substantially expand the range of questions that can be
asked within the spatial network paradigm.
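One way such a hybrid might look is sketched below: physical distance and a demographic (here, age) difference are combined into a single socio-physical distance, which is then passed through a SIF, so that edges remain conditionally independent given locations and covariates. The Euclidean combination rule, the `age_weight` scaling, the power-law SIF, and all names are hypothetical; no joint socio-physical SIF has been estimated here.

```python
import numpy as np

def socio_physical_distance(xy, age, age_weight=0.02):
    """Hypothetical combined distance over physical space and one Blau dimension.
    age_weight converts years of age difference into physical distance units."""
    d_phys = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=-1))
    d_age = np.abs(age[:, None] - age[None, :])
    return np.sqrt(d_phys ** 2 + (age_weight * d_age) ** 2)

def tie_probabilities(dist, p_b=0.5, alpha=5.0, gamma=3.0):
    """Illustrative power-law SIF applied to the combined distance."""
    return p_b / (1.0 + alpha * dist) ** gamma

def draw_graph(prob, rng):
    """Independent Bernoulli draw per dyad; Bernoulli structure is preserved."""
    upper = np.triu(rng.random(prob.shape) < prob, k=1)
    return upper | upper.T

# Three hypothetical persons: two co-located 30-year-olds and a nearby 60-year-old.
xy = np.array([[0.0, 0.0], [0.0, 0.0], [0.1, 0.0]])
age = np.array([30.0, 30.0, 60.0])
prob = tie_probabilities(socio_physical_distance(xy, age))
graph = draw_graph(prob, np.random.default_rng(1))
```

Because the covariate enters only through the dyadic tie probability, the same simulation and bounding arguments that apply to purely spatial Bernoulli graphs carry over unchanged.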
Another obstacle to further theoretical progress is a lack of
detailed population data on the co-evolution of social ties and residential mobility. In this paper, we have focused on the problem
of instantaneous prediction: given the (current) distance structure, predict the contemporaneous properties of an associated
social network. In so doing, we neither make nor require assumptions regarding the dynamic processes that produce this joint
socio-geographic structure: so long as we know how space is
instantaneously related to social relationships, we can predict the
latter from the former. Many interesting questions, however, relate
to these underlying processes, and a detailed understanding of
them would advance current knowledge in many respects. For
instance, we here take the SIF as given (estimated from prior data),
but the marginal relationship it describes clearly arises from a
combination of (possibly tie-influenced) movement of persons in
space, and (possibly spatially influenced) formation and dissolution of social ties. A richer understanding of these mechanisms
could potentially allow for the prediction of spatial interaction
functions themselves, as well as prediction of the circumstances
in which such functions might vary or change. Likewise, sudden
perturbations to normal mobility patterns (e.g., displacement following wars or natural disasters) may produce short-term changes
in socio-geographic structure that are poorly predicted by comparative statics (i.e., equilibrium structure before and significantly
after the event). Modeling such shocks requires knowledge of how
rapidly ties decay following relocation, the rate at which new ties
form in response to this relocation (and to whom), and selective
influences of tie acquisition and loss on secondary mobility. These
are complex phenomena, which in our opinion require a much
deeper understanding of social dynamics than is currently available. However, the first step in developing this understanding will
clearly be the design of studies that are sensitive to both spatial and
temporal concerns.
Finally, we close by reiterating some simple predictions that,
seeming to be robustly present in the cases studied here, lend
themselves to empirical evaluation. First, we predict a positive correlation between local population density and both mean degree
and core number over scales that are at least comparable to the
relevant SIF. Second, we predict transitivity far in excess of CUG
baselines, declining globally in mean degree, increasing in nearest-neighbor distance, and declining in SIF tail weight. Third, we predict
sudden changes in the formation of large, cohesively connected
subgroups with population density, with such groups transitioning
from small and rare to large and common as a threshold function of local population. Finally, we predict that internal density
References
Anderson, B.S., Butts, C.T., Carley, K.M., 1999. The interaction of size and density with graph-level indices. Social Networks 21 (3), 239–267.
Bivand, R.S., Pebesma, E.J., Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R. Springer, New York.
Bossard, J.H.S., 1932. Residential propinquity as a factor in marriage selection. American Journal of Sociology 38, 219–244.
Brakman, S., Garretsen, H., Van Marrewijk, C., Van Den Berg, M., 1999. The return of Zipf: towards a further understanding of the rank-size distribution. Journal of Regional Science 39 (1), 183–213.
Burian, S.J., Brown, M.J., Velugubantla, S.P., 2002. Building height characteristics in three U.S. cities. In: Fourth Symposium on Urban Environment. American Meteorological Society, Norfolk, VA, pp. 129–130.
Butts, C.T., 2001. The complexity of social networks: theoretical and empirical findings. Social Networks 23 (1), 31–71.
Butts, C.T., 2002. Spatial Models of Large-scale Interpersonal Networks. Carnegie Mellon University, Doctoral Dissertation.
Butts, C.T., 2003. Predictability of large-scale spatially embedded networks. In: Breiger, R., Carley, K.M., Pattison, P. (Eds.), Dynamic Social Network Modeling and Analysis: Workshop Summary and Papers. National Academies Press, Washington, DC.
Butts, C.T., 2006a. Curved exponential family parameterizations for spatial network models. In: Presentation to the 26th Sunbelt Network Conference (INSNA).
Butts, C.T., 2006b. Exact bounds for degree centralization. Social Networks 28 (4), 283–296.
Butts, C.T., 2008. Social network analysis with sna. Journal of Statistical Software 24 (6).
Butts, C.T., 2010. Bernoulli graph bounds for general random graphs. Technical Report MBS 10-07, Irvine, CA.
Butts, C.T., Acton, R.M., 2011. Spatial modeling of social networks. In: Nyerges, T., Couclelis, H., McMaster, R. (Eds.), The SAGE Handbook of GIS and Society. SAGE Publications (Chapter 12).
Carley, K.M., 2002. Computational organizational science and organizational engineering. Simulation Modeling Practice and Theory 10, 253–269.
Daraganova, G., Pattison, P., 2007. Social networks and space. In: Presentation to the 2007 International Workshop on Social Space and Geographical Space.
Faust, K., 2007. Very local structure in social networks. Sociological Methodology 37 (1), 209–256.
Festinger, L., Schachter, S., Back, K., 1950. Social Pressures in Informal Groups. Stanford University Press, Stanford, CA.
Freeman, L.C., Freeman, S.C., Michaelson, A.G., 1988. On human social intelligence. Journal of Social and Biological Structure 11, 415–425.
Gentle, J.E., 1998. Random Number Generation and Monte Carlo Methods. Springer, New York.
Hägerstrand, T., 1967. Innovation Diffusion as a Spatial Process. University of Chicago Press, Chicago.
Hamilton, D., Handcock, M.S., Morris, M., 2008. Degree distributions in sexual networks: a framework for evaluating evidence. Sexually Transmitted Diseases 35 (1), 30–40.
Handcock, M.S., 2003. degreenet: Models for Skewed Count Distributions Relevant to Networks. Seattle, WA. Version 1.0.
Handcock, M.S., Hunter, D.R., Butts, C.T., Goodreau, S.M., Morris, M., 2008. statnet: software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software 24 (1).
Handcock, M.S., Raftery, A.E., Tantrum, J.M., 2007. Model based clustering for social networks. Journal of the Royal Statistical Society, Series A 170, 301–354.
Haynes, K.E., Fotheringham, A.S., 1984. Gravity and Spatial Interaction Models. Sage, Beverly Hills, CA.
Hipp, J.R., Perrin, A.J., 2009. The simultaneous effect of social distance and physical distance on the formation of neighborhood ties. City and Community 8 (1), 5–25.
Hoff, P.D., Raftery, A.E., Handcock, M.S., 2002. Latent space approaches to social network analysis. Journal of the American Statistical Association 97 (460), 1090–1098.
Latané, B., Liu, J.H., Nowak, A., Bonevento, M., Zheng, L., 1995. Distance matters: physical space and social impact. Personality and Social Psychology Bulletin 21 (8), 795–805.
Mayhew, B.H., 1984a. Baseline models of sociological phenomena. Journal of Mathematical Sociology 9, 259–281.
Mayhew, B.H., 1984b. Chance and necessity in sociological theory. Journal of Mathematical Sociology 9, 305–339.
Mayhew, B.H., Levinger, R.L., 1976. Size and density of interaction in human aggregates. American Journal of Sociology 82, 86–110.
McPherson, J.M., 1983. An ecology of affiliation. American Sociological Review 48, 519–532.
McPherson, J.M., 2004. A Blau space primer: prolegomenon to an ecology of affiliations. Industrial and Corporate Change 13, 263–280.
McPherson, J.M., Smith-Lovin, L., Cook, J.M., 2001. Birds of a feather: homophily in social networks. Annual Review of Sociology 27, 415–444.
Pattison, P.E., Robins, G.L., 2002. Neighborhood-based models for social networks. Sociological Methodology 32, 301–337.
Ripley, B.D., 1988. Statistical Inference for Spatial Processes. Cambridge University Press, Cambridge.
Sampson, R.J., Raudenbush, S.W., Earls, F., 1997. Neighborhoods and violent crime: a multilevel study of collective efficacy. Science 277, 918–923.
Snijders, T.A.B., 2002. Markov chain Monte Carlo estimation of exponential random graph models. Journal of Social Structure 3 (2).
Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge.
Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of small-world networks. Nature 393, 440–442.
West, D.B., 1996. Introduction to Graph Theory. Prentice Hall, Upper Saddle River, NJ.
White, D.R., Tambayong, L., Kejzar, N., 2008. Oscillatory dynamics of city-size distributions in world historical systems. In: Modelski, G., Devezas, T., Thompson, W.R. (Eds.), Globalization as an Evolutionary Process: Modeling Global Change. Routledge, London.
Zipf, G.K., 1949. Human Behavior and the Principle of Least Effort.