Cartography and Geographic Information Science, Vol. 33, No. 3, 2006, pp. 179-194

Jeremy Mennis, Department of Geography and Urban Studies, 1115 W. Berks St., 309 Gladfelter Hall, Temple University, Philadelphia, PA 19122. Phone: (215) 204-4748, Fax: (215) 204-7388, Email: <jmennis@temple.edu>. Torrin Hultgren, Department of Geography, UCB 360, University of Colorado, Boulder, CO 80309. Email: <torrin47@yahoo.com>.

There has been much recent research on areal interpolation of population data. Much of this interest has been driven by demand for small-area population estimates for regions in which only relatively coarse resolution population data can readily be obtained. Such data sets are useful in a wide range of applications, such as emergency planning and management (Dobson et al. 2000), public health (Hay et al. 2005), and monitoring global population (Sutton et al. 2001). The increasing volume and availability of remotely sensed imagery, which has been shown to indicate population distribution (Liu 2003; Holt et al. 2004; Wu et al. 2005), has driven much of the recent research in areal interpolation of population.

A prominent method in areal interpolation is dasymetric mapping, defined here generally as the use of an ancillary data set to disaggregate coarse resolution population data to a finer resolution (Eicher and Brewer 2001). Recent research suggests that dasymetric mapping can provide more accurate small-area population estimates than many areal interpolation techniques that do not use ancillary data (Mrozinski and Cromley 1999; Gregory 2002). However, this research has not […] the functional relationship of the ancillary data with population density. In traditional dasymetric mapping approaches, this relationship has been specified subjectively by the analyst (Wright 1936; Eicher and Brewer 2001). More recently, statistical methods have been used to characterize this relationship (Goodchild et al. 1993; Langford et al. 1991).

In previous research, we described the development of an algorithm for dasymetric mapping that relies on sampling the source population data to quantify the population density of individual ancillary data classes (Mennis 2003; Mennis and Hultgren 2005). Here, we extend this previous research to present a new “intelligent” dasymetric mapping (IDM) technique that supports a variety of methods for characterizing the relationship between the ancillary data and underlying statistical surface. We refer to the technique as intelligent because an analyst may: 1) establish this relationship subjectively using their own domain knowledge; 2) extract this relationship from the data using a novel empirical sampling technique; or 3) combine the subjective and empirically based methods. The IDM method is implemented as a geographic information system (GIS) extension that facilitates the parameterization of the technique and returns a set of statistics that summarize the quality of the resulting dasymetric map. As a case study, IDM is used to redistribute U.S. Census tract-level data for four population variables for the Denver, Colorado, region to sub-tract units using ancillary land cover data. U.S. Census block-level data for the same region are used to analyze the
accuracy of the derived dasymetric map. Different parameterizations of IDM are compared with other, conventional areal interpolation methods.

Previous Research in Dasymetric Mapping and Areal Interpolation

To our knowledge, the earliest reference to dasymetric mapping is the 1922 population map of European Russia by Russian cartographer Semenov Tian-Shansky (for discussion, see Fabrikant’s (2003) and Bielecka’s (2005) readings of the work of Preobrazenski (1954) and (1956), respectively). J.K. Wright (1936) popularized dasymetric mapping in the U.S. and is often incorrectly cited as its inventor, though he noted the Russian origin of the term “dasymetric” (Wright 1936, p. 104). Modern cartography textbooks define a dasymetric map as one that displays statistical surface data by exhaustively partitioning space into zones that reflect the underlying statistical surface variation (e.g., Dent 1999; Slocum et al. 2003). Ideally, the zones in a dasymetric map should be as near to homogeneous in character as possible, having near constant values within, and having boundaries coincident with, the surface’s steepest escarpments.

Dasymetric mapping as a procedure is applied to data sets for which the underlying statistical surface is unknown, but for which aggregated data already exist, though the zones of aggregation are not derived from the variation in the underlying statistical surface but are rather the result of some convenience of enumeration. The process of dasymetric mapping is thus the transformation of data from the arbitrary zones of data aggregation to a dasymetric map in order to recover and depict the underlying statistical surface. In dasymetric mapping, the transformation of data from the arbitrary zones of the original source data to the meaningful zones of the dasymetric map incorporates the use of an ancillary data set that is separate from, but related to, the variation in the statistical surface (Eicher and Brewer 2001). Dasymetric mapping therefore has a close relationship to areal interpolation: the transformation of data from a set of source zones to a set of target zones with different geometry (Goodchild and Lam 1980).

Recent research in dasymetric mapping has been subsumed in large measure under the topic of areal interpolation. Mrozinski and Cromley (1999) provide a helpful typology of areal interpolation within which dasymetric mapping may be placed. The typology delineates methods for combining choropleth and area-class maps; in the latter case, zone boundaries demark regions of relatively homogeneous character (Mark and Csillag 1989). Mrozinski and Cromley (1999) distinguish between the “alternate geography” problem, in which areal interpolation is used to transform data from the choropleth map source zones to the area-class map target zones, and the “polygon overlay” problem, in which the target zones are formed by the intersection of the choropleth and area-class maps.

The most basic method for areal interpolation is areal weighting, in which a homogeneous distribution of the data throughout each source zone is assumed. Each source zone therefore contributes to the target zone a portion of its data proportional to the percentage of its area that the target zone occupies. In the case of the alternate geography problem, if we denote a choropleth source zone s and an area-class map zone z, then the target zone t = z. The estimation of the count for the target zone is:

$$\hat{y}_t = \sum_{s=1}^{n} \frac{y_s A_{s \cap z}}{A_s} \tag{1}$$

where:
$\hat{y}_t$ = the estimated count of the target zone;
$y_s$ = the count of the source zone;
$A_{s \cap z}$ = the area of the intersection between the source and target zone;
$A_s$ = the area of the source zone; and
$n$ = the number of source zones with which z overlaps (Goodchild and Lam 1980).

In the polygon overlay problem, where t = s ∩ z, and each target zone intersects one and only one source zone, Equation (1) may be simplified to read:

$$\hat{y}_t = \frac{y_s A_t}{A_s} \tag{2}$$

where $A_t$ is the area of the target zone.

Dasymetric mapping can be considered an approach to the polygon overlay areal interpolation problem which seeks to improve on areal weighting by establishing a relationship between the underlying statistical surface and the different classes contained within the area-class map. Dasymetric areal interpolation techniques can be distinguished from other areal interpolation approaches that either do not make use of
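The areal weighting estimates of Equations (1) and (2) can be illustrated with a minimal Python sketch. The function names, zone identifiers, counts, and areas below are hypothetical and chosen only for illustration; they are not taken from the study data or from the IDM software.

```python
def areal_weighting(source_counts, source_areas, intersection_areas):
    """Equation (1): estimate the count of one target zone.

    source_counts[s]      -> y_s, the count of source zone s
    source_areas[s]       -> A_s, the area of source zone s
    intersection_areas[s] -> area of the overlap between source zone s
                             and the target zone
    Only source zones that overlap the target zone appear in
    intersection_areas, so the sum runs over those n zones.
    """
    return sum(
        source_counts[s] * intersection_areas[s] / source_areas[s]
        for s in intersection_areas
    )


def areal_weighting_overlay(y_s, a_t, a_s):
    """Equation (2): the target zone lies inside exactly one source zone."""
    return y_s * a_t / a_s


# Hypothetical example: a target zone overlapping two source zones.
source_counts = {"s1": 1000.0, "s2": 400.0}
source_areas = {"s1": 10.0, "s2": 8.0}
intersection_areas = {"s1": 2.5, "s2": 4.0}

estimate = areal_weighting(source_counts, source_areas, intersection_areas)
# s1 contributes 1000 * 2.5 / 10 = 250; s2 contributes 400 * 4 / 8 = 200.

overlay_estimate = areal_weighting_overlay(y_s=1000.0, a_t=2.5, a_s=10.0)
```

Both functions assume a homogeneous distribution within each source zone, which is the assumption dasymetric methods seek to relax.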
10. Percent cover (80 percent) sampling with regions and presets;
11. Percent cover (90 percent) sampling with regions and presets;
12. Centroid sampling without regions and without presets;
13. Percent cover (70 percent) sampling without regions and without presets;
14. Percent cover (80 percent) sampling without regions and without presets;
15. Percent cover (90 percent) sampling without regions and without presets;
16. Centroid with regions and without presets;
17. Percent cover (70 percent) with regions and without presets;
18. Percent cover (80 percent) with regions and without presets;
19. Percent cover (90 percent) with regions and without presets.

To support the error analysis and comparison of the different areal interpolation maps, the difference between the estimated and actual population data for each Census block was calculated. As has been done in previous research (Eicher and Brewer 2001), we use maps of the count error (the difference between the actual and estimated variable counts) at the validation data level to visually explore the nature of the error. Our quantitative assessment of error is also similar to that used by previous researchers in its use of the root mean square (RMS) error as a summary of the error within each original source zone (Fisher and Langford 1995; Gregory 2002; Mrozinski and Cromley 1999). The error within a given source zone is calculated as:

$$E_s^{RMS} = \sqrt{\frac{\sum_{b \in s} (y_b - \hat{y}_b)^2}{q}} \tag{7}$$

where:
$E_s^{RMS}$ = the RMS error of source zone s;
$y_b$ = the actual population of block b;
$\hat{y}_b$ = the estimated population of block b; and
$q$ = the number of blocks contained within s.

Figure 2. Land cover map of the study area used as the ancillary data in the dasymetric mapping methods (detail of inset box area shown at bottom). County boundaries, clipped to the study area, and county names are also shown for reference.

Results

Dasymetric Map Output

Because the volume of results (including 19 areal interpolation maps, error maps, and summary files) is too large to present here in full, we focus on just one representative IDM output as an example before turning to the quantitative analysis of all 19 maps. Figure 3 shows the map of total population produced using the centroid sampling method with regions and presets (#8). Note that the map depicted in Figure 3 is only one visualization of the vector polygon data layer produced by the IDM run; a more detailed map could easily be generated by using a larger number of class intervals or by altering the interval boundaries. Clearly, the map presented in Figure 3 offers a far more detailed depiction of population density than the analogous choropleth map shown in Figure 1. This is particularly true in suburban and exurban areas, where the tracts tend to be larger, and where the land cover tends to be
Error Maps
Figure 6 shows a block-level count error map for
the map shown in Figure 3, where count error
is calculated as the actual population subtracted
from the estimated population of the block. The
mean count error is zero and the standard devi-
ation is 84. Clearly, a far greater area of blocks
is subject to overestimation, as compared to
underestimation, at greater than one standard
deviation. This reflects the fact that relatively
large rural blocks tend to be overestimated while
relatively small urban blocks tend to be under-
estimated. Similar patterns have been found by
other researchers in dasymetric mapping (Eicher
and Brewer 2001; Harvey, 2002a). In the study
region, these overestimated rural areas occur
primarily on the western border, where the
plains meet the foothills of the Rocky Mountains.
These areas are typically large swaths of sparsely
populated shrubland, encoded as part of the
vegetated class on the land cover map (Figure
2).
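The block-level count error described above, together with the per-source-zone RMS error of Equation (7), can be sketched in Python as follows. The block records are hypothetical, and the use of the population standard deviation to flag blocks beyond one standard deviation is an assumption made for this illustration; the text does not state which form was used.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical validation blocks; "tract" is the source zone containing the block.
blocks = [
    {"id": "b1", "tract": "A", "actual": 100, "estimated": 120},
    {"id": "b2", "tract": "A", "actual": 50, "estimated": 30},
    {"id": "b3", "tract": "B", "actual": 10, "estimated": 40},
    {"id": "b4", "tract": "B", "actual": 200, "estimated": 170},
]


def count_errors(blocks):
    """Count error per block: estimated minus actual (positive = overestimate)."""
    return {b["id"]: b["estimated"] - b["actual"] for b in blocks}


def rms_error_by_source_zone(blocks):
    """Equation (7): RMS error over the q blocks contained in each source zone."""
    squared = defaultdict(list)
    for b in blocks:
        squared[b["tract"]].append((b["actual"] - b["estimated"]) ** 2)
    return {tract: math.sqrt(sum(v) / len(v)) for tract, v in squared.items()}


def flag_beyond_one_sd(errors):
    """Split blocks into over- and underestimates beyond one standard deviation.

    Assumption: population standard deviation (pstdev) about the mean error.
    """
    values = list(errors.values())
    mu, sd = mean(values), pstdev(values)
    over = sorted(k for k, e in errors.items() if e > mu + sd)
    under = sorted(k for k, e in errors.items() if e < mu - sd)
    return over, under


errors = count_errors(blocks)
rms = rms_error_by_source_zone(blocks)
over, under = flag_beyond_one_sd(errors)
```

In this toy data the estimated counts sum to the actual counts within each tract, so the mean count error is zero, mirroring the zero mean reported for the map in Figure 6.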
Figure 7 shows a close-up view of the boxed area in Figure 6, demonstrating the nature of the error.

Figure 4. A visual comparison of the tract-level choropleth map of population density (top) with the dasymetric map of population density (bottom) for the inset box area shown in Figure 3. Note that the class interval and color schemes are the same as for Figure 1A.
Note: “Sample Number” is the number of sampled source zones. “Sample Mean” is the mean data density (in this case, population density)
of the sampled source zones. “Sample SD” is the standard deviation of the data density of the sampled source zones. “Estim. Method” is the
method used to estimate the target density. “Target Mean” is the mean data density of the target zones (the dasymetric map layer polygons).
And “Target SD” is the standard deviation of the data density of the target zones. “Hi Den Res” and “Lo Den Res” stand, respectively, for High and
Low Density Resolution.
Table 2. Abridged version of the summary table associated with a dasymetric mapping of total population using percent
cover sampling with an 80 percent setting, regions, and presets.
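The sampling fields defined in the note to Table 2 can be illustrated with a short Python sketch. It assumes that, under percent cover sampling, a source zone is taken as a sample of an ancillary class when that class covers at least the threshold share of the zone's area, and that, under centroid sampling, the zone is assigned the class found at its centroid. The zone records, class names, and field names are hypothetical, and the population standard deviation is used as an assumption; none of this is taken from the IDM extension itself.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical source zones: count, area, share of area per ancillary class,
# and the ancillary class found at the zone centroid.
zones = [
    {"count": 4000, "area": 2.0, "cover": {"urban": 0.90, "vegetated": 0.10},
     "centroid_class": "urban"},
    {"count": 3000, "area": 1.0, "cover": {"urban": 0.85, "vegetated": 0.15},
     "centroid_class": "urban"},
    # Mostly urban and dense, but its centroid falls in the vegetated class.
    {"count": 5000, "area": 2.5, "cover": {"urban": 0.60, "vegetated": 0.40},
     "centroid_class": "vegetated"},
    {"count": 100, "area": 10.0, "cover": {"vegetated": 0.95, "urban": 0.05},
     "centroid_class": "vegetated"},
]


def sample_class(zone, method="percent_cover", threshold=0.8):
    """Return the class this zone is a sample of, or None if it is not sampled."""
    if method == "centroid":
        return zone["centroid_class"]
    # Percent cover: the dominant class must meet the threshold share.
    cls, share = max(zone["cover"].items(), key=lambda kv: kv[1])
    return cls if share >= threshold else None


def sampling_summary(zones, method="percent_cover", threshold=0.8):
    """Sample Number, Sample Mean, and Sample SD of data density per class."""
    densities = defaultdict(list)
    for zone in zones:
        cls = sample_class(zone, method, threshold)
        if cls is not None:
            densities[cls].append(zone["count"] / zone["area"])
    return {
        cls: {
            "sample_number": len(d),
            "sample_mean": mean(d),
            "sample_sd": pstdev(d),  # assumption: population SD
        }
        for cls, d in densities.items()
    }


percent_cover_80 = sampling_summary(zones, "percent_cover", 0.8)
centroid = sampling_summary(zones, "centroid")
```

In this toy data, the third zone is excluded by the 80 percent threshold but is counted as a vegetated sample under centroid sampling, which raises the vegetated sample mean from 10 to 1005. A large "Sample SD" relative to "Sample Mean" in a summary table may signal this kind of mis-assigned sample.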
tinguish populated ancillary classes from unpopulated ones, the empirical sampling approach of IDM can recover that information and produce an areal interpolation map equal in quality to, or better than, that using the binary method.

This study also indicates that the binary method can be improved by combining an analyst’s domain knowledge of populated/unpopulated areas with a sampling approach for specifying the other ancillary class-statistical surface relationships. Those methods that consistently scored among the lowest in mean CV were those that combined presets with sampling. Our results suggest that improvements to the binary method can be obtained with even moderate sampling quality, when combined with appropriate preset values. The use of refined areal weighting also appears to contribute to improved estimates in the face of poor sampling.

Although IDM shows a clear improvement over areal weighting, caution is advised in interpreting the superiority of one IDM sampling method over the others, given the variability in the results. This variability in accuracy stems in large measure from the advantages and disadvantages of each of the IDM sampling methods. The centroid method guarantees a high sample rate, as each source zone centroid falls within an ancillary class zone. The potential problem with centroid sampling is that it is vulnerable to outliers in the form of, essentially, incorrect samples. For instance, consider the case where a source zone with a high density is covered primarily by ancillary class A, but its centroid lies in ancillary class B. Thus, the high density value will be associated with class B, whereas, in reality, the ancillary class A has the high density.

This shortcoming is corrected by the contained sampling method. However, the requirement for complete containment of the source zone within a target zone is rarely met in situations where the ancillary data are encoded at a finer resolution than the source zone data, as in the present study. A low number of samples is thus obtained, leading to a
Note: Interpolation methods are listed 1-19 on the X and Y axes (see text for the method associated with each number). A significant difference
between methods at the 0.10 level is indicated by a letter at the intersection of two methods. An ‘a’ indicates significant difference in total
population, a ‘b’ in Hispanic population, a ‘c’ in children, and a ‘d’ in households.
Table 4. Results of ANOVA Tamhane's T2 post hoc test of significant difference in mean CV score between each pair-wise combination of areal interpolation maps, for each variable.
reliance on other density estimation methods. The percent cover method offers a compromise in the trade-off between sampling rate and vulnerability to outliers by allowing the user to increase the sampling rate by lowering the percent cover threshold. But as the threshold is lowered, the method becomes increasingly vulnerable to the problems associated with the centroid method.

We emphasize, however, that our purpose here is not to identify the best specific IDM parameterization for all applications. In fact, our results suggest that there is not one most accurate parameterization, but rather the accuracy of each varies according to the nature of the statistical surface under analysis and its relation to the geometry of the source zones, ancillary class zones, and regions. For instance, although IDM runs utilizing regions did not provide any more accurate areal interpolations than those that did not use regions, we speculate that the use of regions may provide a significant improvement in accuracy in other areal interpolation contexts where the regions data better capture spatial variation in the functional relationships between the statistical surface and the individual ancillary classes. In the present study, for example, perhaps an ancillary layer distinguishing urban and rural areas would capture such spatial variation (if, indeed, it exists) better than counties, which can contain both urban and rural areas.

We also argue that through the evaluation provided by the summary file that accompanies the dasymetric map output, an analyst is able to identify the parameterization(s) that are best suited for a particular application. While a comparison of the summary files from multiple IDM runs will not provide a comprehensive accuracy assessment (which is only possible if one has validation data already in hand), it allows the analyst to make an informed decision as to the relative accuracies of multiple IDM parameterizations based on a number of factors:
• The logical rankings of the mean estimated data densities among classes;
• The sampling rate and how the samples were acquired for each class;
• The variability in the sampled source zone data densities for each class;
• The variability in the estimated target zone data densities for each class; and
• The similarity between the sampled data density mean and standard deviation and the estimated data density mean and standard deviation, respectively.

Conclusion

The primary contribution of this research is the presentation of a new dasymetric mapping technique, IDM, which combines an analyst’s domain knowledge with a data-driven methodology to specify the functional relationship of the ancillary data classes with the underlying statistical surface being mapped. The data-driven component of IDM employs a flexible empirical sampling approach to acquire information on the data densities of individual ancillary classes, and it uses the ratio of class densities to redistribute population to sub-source zone areas. We have also shown how summary statistics of the resulting dasymetric map can be used to compare the quality of the output of different IDM parameterizations. Finally, we have demonstrated IDM using a case study of four population variables and presented a visual and quantitative error assessment comparing various IDM parameterizations with conventional areal weighting and binary dasymetric mapping methods. Intelligent dasymetric mapping outperforms areal weighting, and certain IDM parameterizations outperform the binary method.

This research advances previous approaches in areal interpolation and dasymetric mapping in a number of ways. First, unlike previous approaches that quantify the functional relationship between the statistical surface and the ancillary data through strictly subjective or statistical means, IDM supports the ability to combine domain knowledge (using preset density values) with statistical estimation (using empirical sampling) to quantify this relationship. The information provided by the use of presets and sampling is, in a sense, re-used to make density estimates for ancillary classes for which no other density information is available (using refined areal weighting). Intelligent dasymetric mapping also differs from previous approaches in that it offers a variety of parameterization options regarding the sampling technique employed, as well as the use of regions. Thus, instead of a single areal interpolation product, IDM can return multiple products whose relative performance may be compared using the automatically generated summary statistics that IDM provides. Since areal interpolation is, by defini-