doi: 10.1111/j.1365-2389.2011.01375.x
Summary
In central Queensland, Australia, relatively little is known about where gullies occur (gully presence). This is despite a general acceptance among scientists and politicians that gully erosion in the region is an ecologically important process, exacerbated by grazing pressure. We aimed to create a risk map of gully presence for a 4.86 × 10⁶-ha area of central Queensland dominated by grazing and thought to be particularly prone to gully erosion. We achieved this by (i) using light detection and ranging (lidar) technology (vertical accuracy < 0.15 m; spatial resolution 0.5 m) to observe topography on transects at eight selected sites within the study area, (ii) using object-oriented classification to derive gully presence from the lidar observations, (iii) using a random forest to model the relationship between gully presence and a set of readily available explanatory variables (comprising soil, topography and vegetation information; finest spatial resolution 25 m) and (iv) extrapolating the model to unsampled locations. Cross-validation indicated that the predictive ability of the model was modest, with an average area under the receiver operating characteristic curve of 0.62 (where 1.0 is a perfect model and 0.5 is no better than chance). The greatest risk of gully presence was associated with areas of large topographic variation, and where, coincidentally, there was relatively little long-term vegetation cover. Ultimately, however, we acknowledge that the quality of the map is limited by the small area of observed lidar data relative to the study area, the relatively coarse spatial resolution of the explanatory variables and the possibility that gully presence is the result of different processes at different locations.
Introduction
Soil erosion by water can have dramatic and far-reaching consequences. This is especially true in central Queensland, Australia, where east-flowing rivers carry sediment to the southern reaches of the Great Barrier Reef lagoon (Figure 1). The adverse effect of sediment on the water quality of the World Heritage-listed marine ecosystem stimulates environmental and political interest. A plausible hypothesis for the source of at least some of the sediment is gully erosion (Prosser et al., 2001; Rustomji, 2006). We follow Hughes et al. (2001) in defining a gully as a steep-walled, poorly vegetated incision in the landscape with a catchment area of 10 km² or less. This study was motivated by the Australian and Queensland governments' political and environmental imperatives (Reef Water Quality Protection Plan Secretariat (2009) and the Delbessie Agreement (DERM, 2007)) to improve water quality and land condition in the catchments that drain into the Great Barrier Reef
Correspondence: A. H. Eustace. E-mail: Alisa.Eustace@qld.gov.au Received 29 September 2009; revised version accepted 28 February 2011
lagoon. These imperatives emphasize on-ground investment by land managers for the prevention or remediation of erosional features, including gullies. There is, therefore, a clear need for accurate, fine-scale mapping of where gullies occur in the landscape to (i) help target sites for investment, (ii) assist post-investment monitoring and (iii) quantify the contribution of gullies to the sediment budget. Hughes et al. (2001) predicted that the Nogoa River catchment in central Queensland had the largest gully density (line-length of gullies per unit area) of anywhere in Australia. Unfortunately, their mapping was not at a spatial scale suitable to address these imperatives. They also acknowledged that their model was most uncertain in central Queensland, because of a lack of data. With these results in mind, we targeted the Nogoa catchment and the surrounding area for an investigation of where gullies occur. We use the term 'gully presence' to describe a binomial variable, determined on a fine grid over the surface of the land, coded as 'Gully' at locations where incised areas occur, and as 'Non-gully' elsewhere. To map gully presence is an ambitious undertaking, particularly when the area of interest is large. In such a case, the conventional approach by expert interpretation of fine-resolution
2011 The Authors Journal compilation 2011 British Society of Soil Science
Figure 1 Six catchments comprising the Fitzroy Basin (Fitzroy, Isaac, MacKenzie, Dawson, Comet and Nogoa); the boundaries of each are shown as bold grey lines. The main drainage line in each catchment is shown as a black line. The study area is the western portion of the basin, bounded by the black box. The locations of the eight x-configured sites inside the study area are labelled (NB not drawn to scale). Each site contains two lidar transects. Inset: the location of the Fitzroy Basin relative to Queensland (the grey polygon) and to Australia.
imagery followed by extensive field validation might not be an efficient use of resources. A more efficient approach, which we follow here, would be to map gully presence according to its
Table 1 Studies that have used statistical modelling to characterize the spatial distribution of gully attributes
Columns: Study; Country (area of interest); Response variable; Explanatory variables(a); Model; Accuracy
Meyer & Martínez-Casasnovas (1999); Spain (two catchments, 2500 ha each); Presence; T, S, L, H; Logistic regression; Overall accuracy 85%.
Hughes et al. (2001); Australia (continent); Density/mm2; T, S, L, C, G, V; Piece-wise regression
Spain (60 ha); USA (three watersheds; 8750 ha, 2600 and 2448 ha); Lebanon (67 600 ha); Brazil (5200 ha); Belgium (1329 ha)
T, H, B; T, H, B
Correlation of predicted with observed ranged from r = 0.43 to 0.83. Model accounted for 87% of the variation. Overall accuracy 78%.
Bou Kheir et al. (2007); Vrieling et al. (2007); Vanwalleghem et al. (2008); Gutiérrez et al. (2009)
Best model explained 80% of variation in gully size. Best overall accuracy 75%. Overall accuracy 77%.
Presence
(a) Many individual variables were used, but they can be grouped as: B, basin metrics; C, climate; G, geology; H, hydrology; L, land-use; P, proximity metrics; S, soil; T, topography; V, vegetation cover.
2011 The Authors. Journal compilation 2011 British Society of Soil Science, European Journal of Soil Science, 62, 431–441
A potential barrier to developing a statistical model of gully presence concerns the response variable itself, because obtaining adequate spatial information on gully presence is a difficult task. One possibility, which we follow, is to gather fine-resolution topographic information at various locations about the landscape. Light detection and ranging (lidar) technology (Petrie & Toth, 2009) is a source of detailed topographic information, and has been used successfully to quantify change in the landscape caused by erosion (Ritchie, 1995; Thoma et al., 2005). Lidar is costly, and so is generally used for relatively small areas only; however, the data that it provides (typically accurate to <0.25 m laterally and <0.15 m vertically) will be adequate to characterize topographic variation caused by gullies.
Aims
Our aim was to create a risk map for gully presence for a region of central Queensland, Australia, identified as a potential hot-spot of soil erosion. We intended to do this by (i) deducing gully presence from intensive local lidar studies, (ii) creating a statistical model that relates gully presence in the lidar-surveyed areas to a set of readily available explanatory variables and (iii) extrapolating the probability-based predictions of the model to the extent of the study area.
Methods
Study region
Our study area is a 4.86 × 10⁶-ha region in central Queensland, comprising almost all of the Nogoa River and Comet River catchments, and about half of the MacKenzie River catchment (Figure 1). These three catchments drain the western portion of the Fitzroy Basin, the largest east-flowing river system in Australia. The study area has a subtropical climate, and receives about 600 mm year⁻¹ of rain. Generally, about half of this falls during the summer months. Native vegetation is predominantly Brigalow (Acacia harpophylla F. Muell. ex Benth.) and woodlands of Eucalyptus species. These species have been cleared extensively since European settlement such that, now, 78% of the study area is devoted to grazing (Rowland et al., 2006). The dominant Orders of the Australian Soil Classification (Isbell, 1996) are Vertosols and Sodosols (32 and 44% of the area, respectively). The sodicity of the latter order has, in sloping areas where vegetation has been disturbed, resulted in often spectacular erosional features (Hubble & Isbell, 1983).
Delineation of gullies
Lidar is a source of finely detailed topographic information, suitable for the study of gullies. We selected eight sites within the study area where aircraft-mounted lidar observations would be made. The sites were selected by striking an informal balance between geographical spread, prior knowledge about gully hot-spots, and budgetary constraints. Ninety per cent of the lidar-observed area was classified as grazing according to a Fitzroy Basin land-use map (Rowland et al., 2006); the remainder was associated with nature conservation.
Five of the sites were observed on 3–5 February 2007, with an Optech ALS ALTM 3100 Enhanced Accuracy lidar scanner (Optech International Inc., Kiln, MS, USA). The altitude of the sensor was approximately 850 m, which gives a vertical accuracy of <0.10 m to the observations (excluding GPS error) (Petrie & Toth, 2009). The remaining three sites were observed on 27 July 2007, using a Leica ALS50 scanner (Leica Geosystems AG, Heerbrugg St Gallen, Switzerland), at an altitude of approximately 2000 m. The vertical accuracy of this sensor is <0.15 m (Petrie & Toth, 2009). For each site, two lidar transects were acquired in an 'x' configuration (Figure 1), with an average point density of 3.3 points per m². There was a total of 16 transects across the eight sites. The 'x' configuration was used to enable future research into the co-registration of repeated lidar acquisitions within the overlapping area. The dimension of each transect was 5000 m (the length of the flight path) by 250–300 m (the swath width varied according to the sensor used).
Data were supplied from the contractors in a point-cloud configuration that consisted of lateral and vertical coordinates (the latter in metres above sea-level) and the backscatter intensity (i.e. the relative strength of the returning lidar signal). A classification of the point-cloud as either ground or above-ground was also provided, which separated Earth's surface from standing vegetation. The ground classification of the 16 point-cloud files was converted to digital elevation models (DEMs) using an inverse-distance-weighted interpolation algorithm. The backscatter intensities were processed with nearest-neighbour interpolation. The DEMs and backscatter intensities were stored in raster format with 0.5-m pixels.
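The conversion of ground returns to a 0.5-m DEM by inverse-distance weighting can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the text does not report the distance exponent or the search radius, so the values of `power` and `radius` below, and the function names `idw_elevation` and `grid_dem`, are assumptions made only for the example.

```python
import math


def idw_elevation(points, x, y, power=2.0, radius=2.0):
    """Inverse-distance-weighted elevation at (x, y).

    points: iterable of (px, py, pz) ground returns, in metres.
    power and radius are illustrative assumptions, not values from the study.
    Returns None when no ground return lies within the search radius.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, pz in points:
        d = math.hypot(px - x, py - y)
        if d > radius:
            continue
        if d == 0.0:
            return pz  # a return exactly at the cell centre fixes the value
        w = 1.0 / d ** power
        weighted_sum += w * pz
        weight_total += w
    return weighted_sum / weight_total if weight_total > 0.0 else None


def grid_dem(points, x0, y0, ncols, nrows, cell=0.5):
    """Rasterize ground returns to cell-centre elevations (list of rows)."""
    return [
        [idw_elevation(points, x0 + (c + 0.5) * cell, y0 + (r + 0.5) * cell)
         for c in range(ncols)]
        for r in range(nrows)
    ]
```

Nearby returns dominate each cell value because the weights fall off with distance, which suits a point density of about 3.3 returns per m² on a 0.5-m grid. The loop over all points is written for clarity; a spatial index would be needed for transects of realistic size.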
We aimed to delineate gullies from the lidar-derived DEMs to serve as the response variable in a statistical model. To do this, object-oriented classification (Baatz & Schäpe, 2000; Brennan & Webster, 2006) was identified as an appropriate tool, because of the characteristic shapes of gullies, and their affinity for certain parts of the landscape. An additional advantage of object-oriented classification is that it has the potential to semi-automate the laborious, and somewhat subjective, process of manual delineation. Object-oriented classification uses multiresolution segmentation (MRS) (Baatz & Schäpe, 2000; Definiens, 2006) to classify an image (or set of images) into regions of relatively homogenous pixels, known as 'image-objects', on the basis of both spectral and spatial characteristics. This is distinct from a conventional per-pixel classification, which is solely spectral (Benz et al., 2004). We used the MRS algorithm of Definiens 5.0 object-oriented classification software (Definiens, 2006) to segment the lidar information into two classes, 'Gully' and 'Non-gully'. For each of the 16 lidar transects across the eight sites, the images input to the classification algorithm were (i) the DEM, (ii) backscatter intensity, (iii) slope, calculated from the DEM, (iv) local standard deviation of the DEM in a 3-pixel × 3-pixel moving window and (v) as for (iv), but using slope. The MRS requires, as input, user-defined shape and colour factors to control the spectral and
where z = 0 m. Pixels where 0 m < z ≤ 0.15 m were excluded from further analysis to avoid confusion with possible lidar inaccuracies. The final 'Gully' observations were analysed further at a 25-m scale for each of the eight sites.
Explanatory variables
From the digital archives of the Queensland Government we gathered a set of 17 ancillary variables that might plausibly relate to gully presence (Table 3). This information covered the extent of the study area, and related to aspects of soil, topography and vegetation. Explanatory variables originating in polygon formats were converted into 25-m pixel rasters. Differences in resolution were resolved by using a nearest-neighbour algorithm to resample all the explanatory variables to the same grid. Continuous soil attributes were retrieved from an archive of nationwide soil information (CSIRO, 2006; Brough et al., 2006). The attributes were stored as interpolated surfaces in a polygon-based format, unfortunately without estimation variances. This was not ideal, but we retained the information because attributes such as dispersivity and texture
Table 3 Ancillary spatial information for the study area, used as explanatory variables for modelling gully presence
Columns: Variable(a); Label; Units; Comment
Soil
Clay content; Clay_a; ; A horizon
Clay content; Clay_b; ; B horizon
CEC; Cec_a; ; A horizon
CEC; Cec_b; ; B horizon
ESP; Xna_a; ; A horizon
ESP; Xna_b; ; B horizon
Organic carbon; oc_a; ; A horizon
Organic carbon; oc_b; ; B horizon
Soil order; Ord; ; 10 classes(b)
Salinity hazard; Sal; ; 3 classes (low, medium, high)
Topographic
DSM; dsm; m; See Tickle et al. (2009)
Local slope (3 × 3 window); slo_vr; (°)²; Variance
Local slope (3 × 3 window); slo_mx; ; Maximum
Local slope (3 × 3 window); slo_mn; ; Minimum
Drainage network; drn; ; 2 classes (in, out)
Vegetation
Bare-ground index; bgi_me; ; Mean, 1988–2006
Bare-ground index; bgi_sd; ; Standard deviation, 1988–2006
(a) Acronym key: CEC, cation exchange capacity; DSM, digital surface model; ESP, exchangeable sodium percentage.
(b) The orders (followed by the per cent coverage of the study area): Calcarosols (2%), Dermosols (5%), Ferrosols (4%), Kandosols (<1%), Kurosols (2%), Organosols (<1%), Rudosols (8%), Sodosols (44%), Tenosols (1%) and Vertosols (32%).
(–) indicates that the variable was either dimensionless or categorical.
are known to affect the susceptibility of a site to erosion. We also retrieved a map of the soil order from the Australian Soil Classification (Isbell, 1996), and a map of the perceived salinity hazard for the study area.
Topographic information was derived from a one-second (approximately 30-m spatial resolution) digital surface model (DSM), obtained from the Space Shuttle Radar Topography Mission (Farr et al., 2007; Tickle et al., 2009). The DSM was corrected for striping artefacts that affected the original signal; however, the coverage for Australia has not yet been corrected for vegetation height. From the DSM we computed slope, to which we applied a 3 × 3-pixel moving window that derived the local variance, minima and maxima of the slope. The remaining topographic information was a binary variable that determined whether a location fell within 25 m of the drainage network, derived from a map of the stream network of Queensland (1:250 000 scale).
Vegetation information was based on a calibrated empirical model, known as the Bare-Ground Index (BGI) (Scarth et al., 2006), applied to Landsat imagery. BGI is the proportion of ground not covered by vegetation (living or dead) when viewed vertically downwards from a standing position on the ground. The BGI model is applied to the Landsat imagery on a per-pixel basis, but only at those pixels considered to have less than a threshold proportion of tree cover; we used <0.15 as a threshold. Estimates of tree cover are determined by a calibrated empirical model of foliage projective cover (FPC) (Armston et al., 2009). Landsat imagery, from two different sensors (Thematic Mapper and Enhanced Thematic Mapper+), is the basis of the Queensland Government's statewide vegetation-monitoring programme. Imagery for the entire state is acquired once per year, mostly during the dry season (May to October). Details of the geometric and radiometric corrections applied to the imagery can be found in DERM (2008). As part of the rectification process, the images are resampled to 25-m pixels from the original 30 m.
Once all the gully observations and the explanatory variables were rasterized and aligned on the same scale (25 m) and grid, we intersected the gully presence map (25 m) with the set of explanatory variables.
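The moving-window slope statistics described above (local variance, minimum and maximum in a 3 × 3-pixel window) can be sketched as follows. This is a hedged illustration only: the text does not say how edge pixels were treated or whether the population or sample variance was used, so the clipped window and the population variance below are assumptions, and `window_stats` is a hypothetical name.

```python
def window_stats(slope, row, col):
    """Variance, minimum and maximum of slope in the 3 x 3 window at (row, col).

    slope: list of equal-length lists of slope values (degrees).
    Assumptions for this sketch: edge cells use only the part of the window
    that falls inside the grid, and the population variance is returned.
    """
    values = [
        slope[r][c]
        for r in range(max(0, row - 1), min(len(slope), row + 2))
        for c in range(max(0, col - 1), min(len(slope[0]), col + 2))
    ]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return variance, min(values), max(values)
```

Applied to every pixel of the slope raster, the three returned values would give rasters corresponding to the variables labelled slo_vr, slo_mn and slo_mx in Table 4.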
according to the user's requirements. Under the random forest many trees are grown, not just one; however, there are three crucial differences in the computation of a single tree: (i) each tree is based on a bootstrap sample of the n observations, (ii) at each node a random selection of the p explanatory variables is used to find the best split and (iii) each tree is left unpruned (Liaw & Wiener, 2002). For any combination of values taken by the explanatory variables, the random forest will return one prediction from each tree. The multitude of alternatives describes a probability distribution for a single prediction, which is said to be 'bootstrap-aggregated' or 'bagged'. It has been found that the bagged predictions of a random forest are relatively robust compared with the predictions of other classifiers (Breiman, 2001b; Prasad et al., 2006; Moriondo et al., 2008). Compared with logistic regression, the random forest has two favourable characteristics: (i) it copes with non-linear relationships and (ii) it deals implicitly with interactions between explanatory variables. A disadvantage of the random forest is that it is not possible to make a formal inference about the marginal effects of explanatory variables. Another disadvantage is that it is easy for a user to include an inappropriate explanatory variable, leading to an over-parameterized model. We used the library randomForest (Liaw & Wiener, 2002), written for the R statistical software (R Development Core Team, 2009), to relate the explanatory variables to gully presence. For a constant number of explanatory variables, the complexity of a random forest is controlled by three parameters: (i) the number of trees in the forest (t), (ii) the number of randomly selected explanatory variables tried at each node split (m) and (iii) the minimum number of cases needed for a terminal node in a tree (q).
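The three ingredients described above (a bootstrap sample per tree, a random subset of variables per node, and aggregation of the per-tree predictions) can be sketched in a few lines. This is an illustrative Python sketch, not the R randomForest code used in the study; the function names are hypothetical, and the default of floor(√p) candidate variables is the usual classification default, assumed here.

```python
import math
import random


def bootstrap_rows(n, rng):
    """Draw n row indices with replacement; rows never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def candidate_variables(p, rng, m=None):
    """Pick m of the p variables at random for one node split.

    When m is not given, floor(sqrt(p)) is used (an assumption of this sketch).
    """
    if m is None:
        m = max(1, math.isqrt(p))
    return sorted(rng.sample(range(p), m))


def bagged_probability(tree_votes):
    """Fraction of trees voting 'Gully' for one location."""
    return sum(1 for v in tree_votes if v == "Gully") / len(tree_votes)
```

For example, `bagged_probability(["Gully", "Non-gully", "Gully", "Gully"])` gives 0.75. With n rows, roughly 37% of them are left out of any one bootstrap sample, and it is these out-of-bag rows that allow the error of each tree to be assessed without a separate test set.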
The default values used by randomForest are t = 500, m = √p and q = 1, when applied to a categorical response variable such as gully presence. We fitted two random forests to the data: the first used t = 500, and the second t = 100; the values of m and q were held constant at their respective defaults. Two forests were made because we wished to see the effect of a reduced forest on the predictive accuracy of the model, particularly because we were concerned about the length of time it would take the larger forest to extrapolate predictions across the study area. The goodness-of-fit of a random forest, when applied to a categorical variable, is given by the 'out-of-bag' error rate: an individual tree is used to predict the response variable at the rows excluded from the bootstrap sample used to make the tree; the amount of misclassification is then averaged over all trees. For the 500-tree forest only, we assessed the relative importance of each explanatory variable (Liaw & Wiener, 2002) to the classification of gully presence. The randomForest library does this by quantifying how the out-of-bag error changes when the values of an explanatory variable, in the rows excluded from the bootstrap sample, are shuffled randomly; the variable that has the greatest importance to a model is that which, upon shuffling, increases the out-of-bag error most. Díaz-Uriarte & Alvarez de Andrés (2006) suggest that the importance of variables, as calculated by the random forest, is not robust, and should consequently be calculated for the largest possible number of trees.
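The idea behind the importance measure can be sketched as follows. This is a simplified illustration under stated assumptions: it uses one generic classifier function and one set of held-out rows, whereas the randomForest library repeats the calculation on the out-of-bag rows of every tree and averages the result. The names `error_rate` and `permutation_importance` are hypothetical.

```python
import random


def error_rate(predict, rows, labels):
    """Proportion of rows whose predicted class differs from the observed class."""
    wrong = sum(1 for row, label in zip(rows, labels) if predict(row) != label)
    return wrong / len(rows)


def permutation_importance(predict, rows, labels, var_index, rng):
    """Increase in error after shuffling one variable's values among the rows."""
    baseline = error_rate(predict, rows, labels)
    shuffled_column = [row[var_index] for row in rows]
    rng.shuffle(shuffled_column)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted = [
        list(row[:var_index]) + [value] + list(row[var_index + 1:])
        for row, value in zip(rows, shuffled_column)
    ]
    return error_rate(predict, permuted, labels) - baseline
```

A variable that the classifier never uses returns exactly zero, because shuffling it cannot change any prediction; a variable that carries real signal tends to return a positive value. Because the result depends on a random shuffle, it varies between runs, which is consistent with the advice to compute importance from a large number of trees.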
Model extrapolation
Following appraisal of the validation and cross-validation results, we used all the data to make a final model, Mf, to use for extrapolation across the grazing areas of the study site. If, during the process of extrapolation, a class of a categorical variable that was not included in Mf was found, we switched this class with one of those available in the model, selected at random.
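The substitution rule for unseen categorical classes can be written as a short helper. This is a minimal sketch, assuming the substitute is drawn uniformly from the classes seen in training; the text says only that it was selected at random, and `substitute_unseen` is a hypothetical name.

```python
import random


def substitute_unseen(value, known_classes, rng):
    """Return value if the model saw it in training, else a random known class.

    Assumption of this sketch: every known class is equally likely to be chosen.
    """
    if value in known_classes:
        return value
    return rng.choice(sorted(known_classes))
```

For instance, with `known = {"Sodosols", "Vertosols"}`, a pixel mapped as "Ferrosols" would be passed to the model as one of the two known orders. The rule guarantees a prediction everywhere, at the cost of making the prediction partly random wherever the substituted variable is influential.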
Results
Delineation of gullies
Gullies were visually apparent in the 0.6-m resolution Quickbird images associated with the lidar transects (Figure 2a,e). The rules used to classify image-objects as 'Gully' (Table 2a) tended to over-allocate the class, as shown by the red areas of Figure 2(b,f). The over-allocated areas included hills with variable slopes, some roads and infestations of Currant Bush (Carissa ovata R. Br.), a low-lying woody weed. The C. ovata heights were included in the original lidar classification of 'Ground' because the sprawling, dense structure of the shrub could not be penetrated by the lidar signal. These artefacts were removed by using the rules in Table 2(b), which reduced greatly the gully-affected area (Figure 2c,g) such that it conformed to our perception of the features in the Quickbird images. For comparative purposes, the centre-lines of the gullies delineated by the independent expert are also shown (Figure 2d,h). There was good agreement between the contrasting methods. By her own admission the expert's linework was conservative, because of obstruction by trees or, in some cases, cloud. The lidar-based method, which computes the area of gullies, could be useful for future studies of how gullies change through time. Overall, Figure 2 gave us confidence that lidar, coupled to the rules devised for object-oriented classification (Table 2), adequately characterized gully presence.
Modelling
Following the removal of null values from the explanatory variables, n = 21 312 pixels remained for modelling. The out-of-bag error of the random forest M0 was 7.5% for both t = 500 and t = 100. This implied that in more than 90% of cases the models allocated a location correctly to 'Gully' or 'Non-gully'. However, this is a misleading result that reflects the fact that about 90% of the pixels were 'Non-gully', which the models could predict with relatively good accuracy. The importance of each explanatory
Table 4 The importance of each explanatory variable to gully presence, calculated by the random forest (t = 500)
Rank  Variable  Importance
1     dsm       0.038
2     bgi_me    0.029
3     slo_mx    0.023
4     bgi_sd    0.023
5     slo_vr    0.015
6     ord       0.012
7     slo_mn    0.011
8     sal       0.011
9     oc_b      0.007
10    oc_a      0.006
11    clay_a    0.004
12    xna_a     0.004
13    xna_b     0.004
14    cec_a     0.003
15    clay_b    0.003
16    cec_b     0.002
17    drn       0.002
Figure 2 Gully extents mapped using lidar and object-oriented classification: (a) a true-colour Quickbird image (0.6-m resolution, panchromatic-sharpened), for part of one lidar transect, (b) the extent of gullies according to the allocation rules in Table 2(a), (c) the extent of gullies according to the reallocation rules in Table 2(b), and (d) the centre-line of gullies delineated by expert assessment of (a). Panels (e–h) illustrate the same, but for part of another lidar transect.
variable in M0 (for t = 500) is shown in Table 4, ranked in order from most important to the least important. The five most important variables in the model were the DSM, the BGI-related variables and the maximum slope and variance of the slope. This was consistent with the general notion that topography and vegetative cover determine the propensity of soil to erode. The soil order was the next most important variable, which supported the notion that some soil types are intrinsically more erodible than others. Rather than exchangeable sodium percentage or texture as expected, the most important individual soil attributes were the organic carbon of the topsoil and the subsoil. The least important variable was the map of the drainage network. The fact that the individual soil attributes were relatively unimportant suggests one of two possibilities: either soil attributes do not influence gully formation in the study area, or the soil information, as held by the database, was not suited to our particular task. We suspect the latter, because the soil attributes are interpolated surfaces intended for use in models that operate at scales much coarser than 25-m pixels.
The ROC curves associated with the fitted values of the random forest models and the validation and cross-validation predictions are shown in Figure 3. The ROC curves for the fitted values of random forest M0 showed that, at both t = 500 and t = 100, the models predicted gully presence accurately (Figure 3a); the ROC curves are relatively close to the top-left corner of the plot, and the areas under each ROC curve were identical at A = 0.81. Similar results were seen for the validation data (Figure 3b). It was clear that, for predictive purposes, a 100-tree forest would suffice. The ROC curve for the cross-validation predictions (t = 100) of the eight sites is shown in Figure 3(c). The predictive ability of the random forest varied markedly across the study area, with the data in sites 1, 3 and 7 being predicted less well than those in other sites. Site 1 was unique in that its gullies were widespread rather than the localized incisions seen in other sites. This suggests that different processes determine gully presence at different locations. Sites 3 and 7 were associated mainly with minority soil orders rather than the dominant Sodosols and Vertosols (Table 3). During cross-validation, these minority classes were systematically excluded from the random forests. In locations where soil orders in the validation data did not exist in the training data, a soil order from the training data was randomly substituted to enable predictions at these locations. This was carried out on the basis that a less accurate prediction is better than no prediction. As ord was a relatively important variable (Table 4) the predictions at these locations were effectively random. The average area under the ROC curve for cross-validation was A = 0.62, which suggested that the model had a relatively weak ability to predict gully presence accurately over a large area. We contend, however, that the modified cross-validation procedure is likely to have under-estimated predictive ability.
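The area under the ROC curve, A, can be computed directly from predicted probabilities and observed classes. The sketch below is an illustration rather than the routine used in the study: it relies on the standard identity that A equals the probability that a randomly chosen 'Gully' pixel receives a higher score than a randomly chosen 'Non-gully' pixel, and `roc_auc` is a hypothetical name.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    scores: predicted probability of 'Gully' for each pixel.
    labels: True where 'Gully' was observed, False otherwise.
    Ties between a Gully and a Non-gully score count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one pixel of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Unlike overall accuracy, this measure is not flattered by class imbalance. A model that scores every pixel identically would be about 90% accurate on data in which about 90% of pixels are 'Non-gully' (if it always predicted that class), yet its A would be 0.5, no better than chance. The pairwise loop is quadratic in the number of pixels and is written for clarity only.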
As the sample for each step in the modified cross-validation was based on a spatial region rather than a random sample of all the data, it is possible that the
Figure 3 ROC curves (mean true-positive rate against mean false-positive rate). (a) Fitted values: A = 0.81 for t = 500 and A = 0.81 for t = 100.
Discussion
We have shown that lidar and object-oriented classification characterize gully presence (Figure 2) in a useful way. However, it is not practical or reasonable to acquire lidar information for the entire study area because of current costs. A viable alternative, however, is based on the premise that gully presence is determined by soil, topography and vegetation cover, which can be characterized through statistical modelling. We have not been able to consider important history-related variables that can trigger gully development such as tree clearing or animal stocking rates because such information was not readily available for the entire study area. Four studies (Meyer & Martínez-Casasnovas, 1999; Vrieling et al., 2007; Vanwalleghem et al., 2008; Gutiérrez et al., 2009) have adopted a similar approach for modelling gully presence and reported results with varying accuracies. The only study that used the area under the ROC curve for accuracy assessment was Gutiérrez et al. (2009). The performance of our model was worse than that reported in their study. Our model of gully presence for central Queensland had reasonable accuracy at locations near to training sites, but accuracy diminished as spatial distance from the training sites increased. There are three reasons for our modest result. First, the lidar information was concentrated in too few locations across the study site, with the x-configured transects effectively halving the amount of topographic information that might otherwise have been gained. Second, the soil-related explanatory variables were not suited to a mapping exercise at a scale as fine as 25-m pixels. Third, gullies may be caused by different processes at different locations. Lentz et al. (1993) found that, even in small study areas (<5 ha), rills had site-specific correlations with soil attributes.
Our study area was much larger than any of those used by the four studies above, and it may have been unrealistic of us to think that gully presence in this region could be described by a global model.
Figure 3 (continued) (b) Validation: A = 0.83 for t = 500 and A = 0.80 for t = 100. (c) Cross-validation: all t = 100; mean A = 0.62.
Figure 4 (a) Risk map of gully presence. White represents locations either outside the study area or masked (because of water, tree cover or a non-grazing land-use). (b) Relatively large probabilities are found where there is a large variation in terrain and variable vegetation cover. (c) Volcanic plugs have relatively large probabilities around their bases. (d) Discontinuities in the surface signify a change in soil Order.
McBratney et al. (2003) proposed a framework for digital soil mapping, which we have tried to follow. They also foresaw potential problems, such as (i) missing, uninformative or circularly derived explanatory variables, (ii) poor quality of soil information in databases, (iii) 'black box' data-mining techniques and (iv) over-fitting of the model. Each of these problems has, to some degree, influenced our study: (i) and (ii) are the reality of digital soil mapping, where there is an innate urge to make as much use of existing data as possible; we encouraged the possibility of (iii) and (iv) by electing to use a random forest. Breiman (2001b) argued that the predictive ability and the parsimony of a model are mutually exclusive concepts: simple models are undoubtedly easier to interpret but are less accurate. In our case, we considered robust prediction of gully presence to be more important than inference about the underlying mechanism of the process. Random forest is known to be a robust predictor (Breiman, 2001b; Prasad et al., 2006; Moriondo et al., 2008). Breiman (2001a) showed that a random forest does not overfit the information in the sense that
Conclusions
We have created a risk map of gully presence for our study area within central Queensland, Australia. This has been achieved by (i) using fine-resolution lidar to quantify local topography at eight sites in the study area, (ii) carrying out object-oriented classification to derive gully extent from the lidar observations, (iii) developing a random forest to model the relationship between gully presence and soil, topography and vegetation status and (iv) extrapolating the model across the study area at the scale of 25-m pixels. The predictive ability of the model was modest. The risk map of gully presence showed that there is a large probability of gully presence in areas of large variation in topography coincident with relatively low long-term vegetation cover. This agrees with our expectation of where gullies should occur. The quality of the map is constrained by the small area of lidar information collected relative to the study area, the relatively coarse spatial resolution of the explanatory variables and the possibility that gully presence is the result of different processes at different locations. The accuracy of the risk map of gully presence would improve with further lidar acquisitions. A finer-resolution, nationwide, bare-earth digital elevation model and improved soil mapping over the area of interest would also enhance the risk map.
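As an illustration of what the cross-validated area under the ROC curve (A; mean 0.62 in this study) measures, the following is a minimal Python sketch rather than the code used here. The reference list cites R and the ROCR package, so this is only an illustrative stand-in. The function names `roc_auc` and `mean_cv_auc` and the toy labels and scores are hypothetical. The sketch uses the rank-sum identity: A is the probability that a randomly chosen gully pixel receives a larger predicted probability than a randomly chosen non-gully pixel, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels: 1 = gully present, 0 = gully absent (hypothetical coding).
    scores: predicted probabilities of gully presence.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of their 1-based ranks.
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_cv_auc(folds):
    """Average the area over held-out folds, each a (labels, scores) pair."""
    areas = [roc_auc(labels, scores) for labels, scores in folds]
    return sum(areas) / len(areas)


# Toy example (invented values, not data from this study).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 1.0 means every gully pixel is ranked above every non-gully pixel, and 0.5 means the ranking is no better than chance, which is the scale against which 0.62 is described as modest.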
Acknowledgements
This study was funded by the Fitzroy Basin Association and the Queensland Department of Environment and Resource Management (DERM). We have greatly appreciated the support of Christian Witte, Neil Flood, Ken Brook and Cameron Dougall as the study progressed. We thank Dan Tindall and Tessa Chamberlain for their comments on a draft version, and Rebecca Trevithick, DERM's expert gully-delineator.
References
Armston, J.D., Denham, R.J., Danaher, T.J., Scarth, P.F. & Moffiet, T.N. 2009. Prediction and validation of foliage projective cover from
Hubble, G. & Isbell, R.F. 1983. Eastern highlands. In: Soils: An Australian Viewpoint (eds J. Lenaghan & G. Katsntoni), pp. 219–230. CSIRO, Melbourne, Australia/Academic Press, London.
Hughes, A.O., Prosser, I.P., Stevenson, J., Scott, A., Lu, H., Gallant, J. et al. 2001. Gully Erosion Mapping for the National Land and Water Resources Audit. Technical Report 26/01, CSIRO Land and Water, Canberra [WWW document]. URL http://www.clw.csiro.au/publications/technical2001/tr26-01.pdf [accessed on 6 April 2010].
Hyde, K., Woods, S.W. & Donahue, J. 2006. Predicting gully rejuvenation after wildfire using remotely sensed burn severity data. Geomorphology, 86, 496–511.
Isbell, R.F. 1996. The Australian Soil Classification. CSIRO Publishing, Melbourne.
Kuhnert, P.M., Kinsey-Henderson, A., Bartley, R. & Herr, A. 2009. Incorporating uncertainty in gully erosion calculations using the random forests modelling approach. Environmetrics, 21, 493–509.
Lal, R., Mokma, D. & Lowery, B. 1999. Relation between soil quality and erosion. In: Soil Quality and Soil Erosion (ed. R. Lal), pp. 237–258. Soil and Water Conservation Society/CRC Press, Boca Raton, FL.
Lentz, R.D., Dowdy, R.H. & Rust, R.H. 1993. Soil property patterns and topographic parameters associated with ephemeral gully erosion. Journal of Soil & Water Conservation, 48, 354–360.
Liaw, A. & Wiener, M. 2002. Classification and regression by randomForest. R News, 2, 18–22 [WWW document]. URL http://cran.r-project.org/doc/Rnews/Rnews_2002-3.pdf [accessed on 7 April 2010].
Martínez-Casasnovas, J.A., Ramos, M.C. & Poesen, J. 2004. Assessment of sidewall erosion in large gullies using multi-temporal DEMs and logistic regression analysis. Geomorphology, 58, 305–321.
McBratney, A.B., Mendonça Santos, M.L. & Minasny, B. 2003. On digital soil mapping. Geoderma, 117, 3–52.
Meyer, A. & Martínez-Casasnovas, J.A. 1999. Prediction of existing gully erosion in vineyard parcels of NE Spain: a logistic modelling approach. Soil & Tillage Research, 50, 319–331.
Moriondo, M., Stefanini, F.M. & Bindi, M. 2008. Reproduction of olive tree habitat suitability for global change impact assessment. Ecological Modelling, 218, 95–109.
Petrie, G. & Toth, C.K. 2009. Airborne and spaceborne laser profilers and scanners. In: Topographic Laser Ranging and Scanning (eds J. Shan & C.K. Toth), pp. 29–85. CRC Press, Boca Raton, FL.
Prasad, A.M., Iverson, L.R. & Liaw, A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems, 9, 181–199.
Prosser, I.P., Rutherfurd, I.D., Olley, J.M., Young, W.J., Wallbrink, P.J. & Moran, C.J. 2001. Large-scale patterns of erosion and sediment transport in river networks, with examples from Australia. Marine & Freshwater Research, 52, 81–99.
R Development Core Team 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna [WWW document]. URL http://www.R-project.org [accessed on 7 April 2010]. (ISBN 3-900051-07-0).
Reef Water Quality Protection Plan Secretariat 2009. Reef Water Quality Protection Plan [WWW document]. URL http://www.reefplan.qld.gov.au/library/pdf/reefplan2009.pdf [accessed on 6 April 2010].
Rienks, S.M., Botha, G.A. & Hughes, J.C. 2000. Some physical and chemical properties of sediments exposed in a gully (donga) in northern KwaZulu-Natal, South Africa and their relationship to the erodibility of the colluvial layers. Catena, 39, 11–31.
Ritchie, J.C. 1995. Airborne laser altitude measurements of landscape topography. Remote Sensing of Environment, 53, 91–96.
Rowland, T., van den Berg, D., Denham, R., O'Donnell, T. & Witte, C. 2006. Land Use Change Mapping from 1999 to 2004 for the Fitzroy River Catchment. Queensland Department of Natural Resources & Water, Brisbane.
Rustomji, P. 2006. Analysis of gully dimensions and sediment texture from southeast Australia for catchment sediment budgeting. Catena, 67, 119–127.
Scarth, P., Byrne, M., Danaher, T., Henry, B., Hassett, R., Carter, J. et al. 2006. State of the paddock: monitoring condition and trend in groundcover across Queensland. In: Proceedings of the 13th Australasian Remote Sensing and Photogrammetry Conference: Earth observation From Science to Solutions. 20–24 November 2006, Canberra.
Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. 2005. ROCR: visualizing classifier performance in R. Bioinformatics, 21, 3940–3941.
Thoma, D.P., Gupta, S.C., Bauer, M.E. & Kirchoff, C.E. 2005. Airborne laser scanning for riverbank erosion assessment. Remote Sensing of Environment, 95, 493–501.
Tickle, P., Wilson, N., Inskeep, C., Gallant, J., Dowling, T. & Read, A. 2009. Digital Surface Model (DSM) & Digital Elevation Model (DEM) (1 Second SRTM Derived): User Guide, Version 1.0. Geoscience Australia, Canberra.
Vanwalleghem, T., Van Den Eeckhaut, M., Poesen, J., Govers, G. & Deckers, J. 2008. Spatial analysis of factors controlling the presence of closed depressions and gullies under forest: application of rare event logistic regression. Geomorphology, 95, 504–517.
Vrieling, A., Rodrigues, S.C., Bartholomeus, H. & Sterk, G. 2007. Automatic identification of erosion gullies with ASTER imagery in the Brazilian Cerrados. International Journal of Remote Sensing, 28, 2723–2738.
Zou, K.H., O'Malley, A.J. & Mauri, L. 2007. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation, 115, 654–657.