
European Journal of Soil Science, June 2011, 62, 431–441

doi: 10.1111/j.1365-2389.2011.01375.x

A risk map for gully locations in central Queensland, Australia


A. H. Eustace, M. J. Pringle & R. J. Denham
Queensland Department of Environment and Resource Management, Remote Sensing Centre, Ecosciences Precinct, 41 Boggo Road, Dutton Park, Queensland 4102, Australia

Summary
In central Queensland, Australia, relatively little is known about where gullies occur (gully presence). This is despite a general acceptance among scientists and politicians that gully erosion in the region is an ecologically important process, exacerbated by grazing pressure. We aimed to create a risk map of gully presence for a 4.86 × 10⁶-ha area of central Queensland dominated by grazing and thought to be particularly prone to gully erosion. We achieved this by (i) using light detection and ranging (lidar) technology (vertical accuracy < 0.15 m; spatial resolution 0.5 m) to observe topography on transects at eight selected sites within the study area, (ii) using object-oriented classification to derive gully presence from the lidar observations, (iii) using a random forest to model the relationship between gully presence and a set of readily available explanatory variables (comprising soil, topography and vegetation information; finest spatial resolution 25 m) and (iv) extrapolating the model to unsampled locations. Cross-validation indicated that the predictive ability of the model was modest, with an average area under the receiver operating characteristic curve of 0.62 (where 1.0 is a perfect model and 0.5 is no better than chance). The greatest risk of gully presence was associated with areas of large topographic variation, and where, coincidentally, there was relatively little long-term vegetation cover. Ultimately, however, we acknowledge that the quality of the map is limited by the small area of observed lidar data relative to the study area, the relatively coarse spatial resolution of the explanatory variables and the possibility that gully presence is the result of different processes at different locations.

Introduction
Soil erosion by water can have dramatic and far-reaching consequences. This is especially true in central Queensland, Australia, where east-flowing rivers carry sediment to the southern reaches of the Great Barrier Reef lagoon (Figure 1). The adverse effect of sediment on the water quality of the World Heritage-listed marine ecosystem stimulates environmental and political interest. A plausible hypothesis for the source of at least some of the sediment is gully erosion (Prosser et al., 2001; Rustomji, 2006). We follow Hughes et al. (2001) in defining a gully as a steep-walled, poorly vegetated incision in the landscape with a catchment area of 10 km² or less.

This study was motivated by the Australian and Queensland governments' political and environmental imperatives (Reef Water Quality Protection Plan Secretariat (2009) and the Delbessie Agreement (DERM, 2007)) to improve water quality and land condition in the catchments that drain into the Great Barrier Reef lagoon. These imperatives emphasize on-ground investment by land managers for the prevention or remediation of erosional features, including gullies. There is, therefore, a clear need for accurate, fine-scale mapping of where gullies occur in the landscape to (i) help target sites for investment, (ii) assist post-investment monitoring and (iii) quantify the contribution of gullies to the sediment budget.

Hughes et al. (2001) predicted that the Nogoa River catchment in central Queensland had the largest gully density (line-length of gullies per unit area) of anywhere in Australia. Unfortunately, their mapping was not at a spatial scale suitable to address these imperatives. They also acknowledged that their model was most uncertain in central Queensland, because of a lack of data. With these results in mind, we targeted the Nogoa catchment and the surrounding area for an investigation of where gullies occur.

We use the term gully presence to describe a binomial variable, determined on a fine grid over the surface of the land, coded as Gully at locations where incised areas occur, and as Non-gully elsewhere. To map gully presence is an ambitious undertaking, particularly when the area of interest is large. In such a case, the conventional approach by expert interpretation of fine-resolution imagery followed by extensive field validation might not be an efficient use of resources. A more efficient approach, which we follow here, would be to map gully presence according to its relation to a set of readily available environmental attributes. McBratney et al. (2003) describe the general framework by which this might proceed: (i) a set of explanatory variables (the environmental attributes) is assembled from, for example, a database of historical spatial information, (ii) the variables are sampled to correspond to the locations of the observed response variable (gully presence), and fed into an empirical statistical model and (iii) the model is used to predict the value of the response variable at unsampled locations. The procedure might not create a map as accurate as a conventional assessment of gully presence but has the advantage of generating a quantitative estimate of uncertainty (McBratney et al., 2003). For gully presence, the predictions of (iii) are arguably more useful if presented as the probability of occurrence, that is, as a risk map.

For a statistical model of gully presence, spatial information on topography and vegetation will be essential, in line with the definition of a gully proffered above. The susceptibility of soil to erosion will also be affected by its inherent physical, chemical and biological attributes (Lal et al., 1999), although Lentz et al. (1993) showed that such relationships are likely to be site-specific. There is a generalization that dispersivity of soil (measured by the exchangeable sodium percentage and/or the sodium adsorption ratio) has a strong influence on erosion (Rienks et al., 2000; Faulkner et al., 2003). Of the previous studies that have used statistical modelling to characterize the spatial distribution of gully attributes (Table 1), all but one considered topographic information, while information related to soil, hydrology and vegetation cover has been used only occasionally.

Correspondence: A. H. Eustace. E-mail: Alisa.Eustace@qld.gov.au
Received 29 September 2009; revised version accepted 28 February 2011
© 2011 The Authors. Journal compilation © 2011 British Society of Soil Science

Figure 1 Six catchments comprising the Fitzroy Basin (Fitzroy, Isaac, MacKenzie, Dawson, Comet and Nogoa); the boundaries of each are shown as bold grey lines. The main drainage line in each catchment is shown as a black line. The study area is the western portion of the basin, bounded by the black box. The locations of the eight x-configured sites inside the study area are labelled (NB not drawn to scale). Each site contains two lidar transects. Inset: the location of the Fitzroy Basin relative to Queensland (the grey polygon) and to Australia.

Table 1 Studies that have used statistical modelling to characterize the spatial distribution of gully attributes

Study | Country (area of interest) | Response variable | Explanatory variablesᵃ | Model | Accuracy
Meyer & Martínez-Casasnovas (1999) | Spain (two catchments, 2500 ha each) | Presence | T, S, L, H | Logistic regression | Overall accuracy 85%.
Hughes et al. (2001) | Australia (continent) | Density/mm2 | T, S, L, C, G, V | Piece-wise regression | Correlation of predicted with observed ranged from r = 0.43 to 0.83.
Martínez-Casasnovas et al. (2004) | Spain (60 ha) | Presence of sidewall erosion | T, H, B | Logistic regression | Model accounted for 87% of the variation.
Hyde et al. (2006) | USA (three watersheds; 8750 ha, 2600 and 2448 ha) | Rejuvenation | T, H, B | Logistic regression | Overall accuracy 78%.
Bou Kheir et al. (2007) | Lebanon (67 600 ha) | Distribution and size | T, S, H, G | Tree-based models | Best model explained 80% of variation in gully size.
Vrieling et al. (2007) | Brazil (5200 ha) | Presence | ASTER satellite imagery | Maximum likelihood classifier | Best overall accuracy 75%.
Vanwalleghem et al. (2008) | Belgium (1329 ha) | Presence | T, S, P, height above sea-level | Logistic regression | Overall accuracy 77%.
Gutiérrez et al. (2009) | Spain (54 farms, each of at least 100 ha) | Presence | T, V, C | Multivariate adaptive regression splines (MARS) | Areas under the ROC curves between 0.75 and 0.98.

ᵃ Many individual variables were used, but they can be grouped as: B, basin metrics; C, climate; G, geology; H, hydrology; L, land-use; P, proximity metrics; S, soil; T, topography; V, vegetation cover.

A potential barrier to developing a statistical model of gully presence concerns the response variable itself, because obtaining adequate spatial information on gully presence is a difficult task. One possibility, which we follow, is to gather fine-resolution topographic information at various locations about the landscape. Light detection and ranging (lidar) technology (Petrie & Toth, 2009) is a source of detailed topographic information, and has been used successfully to quantify change in the landscape caused by erosion (Ritchie, 1995; Thoma et al., 2005). Lidar is costly, and so is generally used for relatively small areas only; however, the data that it provides (typically accurate to <0.25 m laterally and <0.15 m vertically) will be adequate to characterize topographic variation caused by gullies.

Aims
Our aim was to create a risk map for gully presence for a region of central Queensland, Australia, identified as a potential hot-spot of soil erosion. We intended to do this by (i) deducing gully presence from intensive local lidar studies, (ii) creating a statistical model that relates gully presence in the lidar-surveyed areas to a set of readily available explanatory variables and (iii) extrapolating the probability-based predictions of the model to the extent of the study area.

Methods
Study region
Our study area is a 4.86 × 10⁶-ha region in central Queensland, comprising almost all of the Nogoa River and Comet River catchments, and about half of the MacKenzie River catchment (Figure 1). These three catchments drain the western portion of the Fitzroy Basin, the largest east-flowing river system in Australia. The study area has a subtropical climate, and receives about 600 mm year⁻¹ of rain. Generally, about half of this falls during the summer months. Native vegetation is predominantly Brigalow (Acacia harpophylla F. Muell. ex Benth.) and woodlands of Eucalyptus species. These species have been cleared extensively since European settlement such that, now, 78% of the study area is devoted to grazing (Rowland et al., 2006). The dominant Orders of the Australian Soil Classification (Isbell, 1996) are Vertosols and Sodosols (32 and 44% of the area, respectively). The sodicity of the latter order has, in sloping areas where vegetation has been disturbed, resulted in often spectacular erosional features (Hubble & Isbell, 1983).

Delineation of gullies
Lidar is a source of finely detailed topographic information, suitable for the study of gullies. We selected eight sites within the study area where aircraft-mounted lidar observations would be made. The sites were selected by striking an informal balance between geographical spread, prior knowledge about gully hot-spots, and budgetary constraints. Ninety per cent of the lidar-observed area was classified as grazing according to a Fitzroy Basin land-use map (Rowland et al., 2006); the remainder was associated with nature conservation.

Five of the sites were observed on 3–5 February 2007, with an Optech ALS ALTM 3100 Enhanced Accuracy lidar scanner (Optech International Inc., Kiln, MS, USA). The altitude of the sensor was approximately 850 m, which gives a vertical accuracy of <0.10 m to the observations (excluding GPS error) (Petrie & Toth, 2009). The remaining three sites were observed on 27 July 2007, using a Leica ALS50 scanner (Leica Geosystems AG, Heerbrugg St Gallen, Switzerland), at an altitude of approximately 2000 m. The vertical accuracy of this sensor is <0.15 m (Petrie & Toth, 2009). For each site, two lidar transects were acquired in an x configuration (Figure 1), with an average point density of 3.3 points per 1 m². There was a total of 16 transects across the eight sites. The x configuration was used to enable future research into the co-registration of repeated lidar acquisitions within the overlapping area. The dimension of each transect was 5000 m (the length of the flight path) by 250–300 m (the swath width varied according to the sensor used).

Data were supplied from the contractors in a point-cloud configuration that consisted of lateral and vertical coordinates (the latter in metres above sea-level) and the backscatter intensity (i.e. the relative strength of the returning lidar signal). A classification of the point-cloud as either ground or above-ground was also provided, which separated Earth's surface from standing vegetation. The ground classification of the 16 point-cloud files was converted to digital elevation models (DEMs) using an inverse-distance-weighted interpolation algorithm. The backscatter intensities were processed with nearest-neighbour interpolation. The DEMs and backscatter intensities were stored in raster format with 0.5-m pixels.
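The conversion of ground-classified returns to a 0.5-m DEM can be illustrated with a minimal Python sketch of inverse-distance-weighted interpolation. The distance power (2) and search radius (2 m) are assumptions chosen for illustration, as are the function names; the text does not report the settings or software used for this step.

```python
import math


def idw_elevation(x, y, points, power=2.0, radius=2.0):
    """Inverse-distance-weighted elevation at (x, y).

    points is a list of (px, py, pz) ground returns. power and radius are
    illustrative assumptions, not values reported in the study.
    """
    num = 0.0
    den = 0.0
    for px, py, pz in points:
        d = math.hypot(px - x, py - y)
        if d == 0.0:
            return pz  # a return exactly at the target location is used directly
        if d <= radius:
            w = 1.0 / d ** power  # nearer returns receive larger weights
            num += w * pz
            den += w
    return num / den if den > 0.0 else None  # None: no return within the radius


def grid_dem(points, x0, y0, ncols, nrows, cell=0.5):
    """Rasterise returns to a DEM; values are interpolated at cell centres."""
    return [
        [idw_elevation(x0 + (c + 0.5) * cell, y0 + (r + 0.5) * cell, points)
         for c in range(ncols)]
        for r in range(nrows)
    ]
```

With two returns of 10 m and 20 m elevation placed 2 m apart, the interpolated value midway between them is 15 m, and it moves towards 10 m as the target approaches the first return.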
We aimed to delineate gullies from the lidar-derived DEMs to serve as the response variable in a statistical model. To do this, object-oriented classification (Baatz & Schäpe, 2000; Brennan & Webster, 2006) was identified as an appropriate tool, because of the characteristic shapes of gullies, and their affinity for certain parts of the landscape. An additional advantage of object-oriented classification is that it has the potential to semi-automate the laborious, and somewhat subjective, process of manual delineation.

Object-oriented classification uses multiresolution segmentation (MRS) (Baatz & Schäpe, 2000; Definiens, 2006) to classify an image (or set of images) into regions of relatively homogeneous pixels, known as image-objects, on the basis of both spectral and spatial characteristics. This is distinct from a conventional per-pixel classification, which is solely spectral (Benz et al., 2004). We used the MRS algorithm of Definiens 5.0 object-oriented classification software (Definiens, 2006) to segment the lidar information into two classes, Gully and Non-gully. For each of the 16 lidar transects across the eight sites, the images input to the classification algorithm were (i) the DEM, (ii) backscatter intensity, (iii) slope, calculated from the DEM, (iv) local standard deviation of the DEM in a 3-pixel × 3-pixel moving window and (v) as for (iv), but using slope. The MRS requires, as input, user-defined shape and colour factors to control the spectral and spatial heterogeneity thresholds for the image-objects (Brennan & Webster, 2006). The heterogeneity of the pixels within an image-object is minimized according to a user-defined scale factor. Following a process of trial-and-error, we used a scale factor of 50, and shape and colour factors of 0.5 each to segment the images.

From the properties of the image-objects, we created a set of rules that allocated an image-object to the Gully class (Table 2a). We found that these rules tended to over-allocate the occurrence of Gully image-objects, so additional rules were added to reallocate misclassified Gully image-objects to Non-gully (Table 2b). We applied a visual check to the polygons to ensure that water-holding bodies (creeks and rivers) had not been allocated to Gully. This check was informed by the backscatter intensity, Quickbird satellite imagery (DigitalGlobe Incorporated, 2010) associated with each transect (0.6-m resolution, panchromatic-sharpened), and a Queensland-wide map of the drainage network (at 1:250 000 scale). As a further check on the quality of the object-oriented classification, we asked an independent expert to delineate the centre-line of the gullies in each of the 16 transects, using only the Quickbird imagery as a guide.

We produced a gully extent image by merging the Gully/Non-gully image-objects into a binary raster with 0.5-m pixels for each of the eight sites. An intermediate variable, gully depth, was obtained for each transect to remove potential spurious lidar measurements within the vertical error range of the lidar observations. Gully depth was calculated by linear interpolation of the DEM values associated with Non-gully at the locations of the Gully class, then subtracting the Non-gully DEM. Considering the difference in the spatial resolution of the lidar data and the explanatory variables (see below; 0.5-m pixels versus, at best, 25-m pixels), we aggregated (by arithmetic averaging) gully depth to the 25-m pixels of Landsat imagery.
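The aggregation by arithmetic averaging, together with the 0.15-m depth rule used to code gully presence from the aggregated image, can be sketched as follows. This is an illustrative Python sketch only: the block factor of 50 follows from 25 m / 0.5 m, the function names are hypothetical, and alignment of the lidar grid with the Landsat grid is assumed rather than handled.

```python
def aggregate_mean(depth, factor=50):
    """Average a fine raster (list of rows) over non-overlapping blocks.

    factor=50 corresponds to 0.5-m pixels averaged into 25-m pixels.
    Rows and columns that do not fill a whole block are ignored.
    """
    nrows = len(depth) // factor
    ncols = len(depth[0]) // factor
    out = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            block = [depth[i][j]
                     for i in range(r * factor, (r + 1) * factor)
                     for j in range(c * factor, (c + 1) * factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def classify_depth(z, threshold=0.15):
    """Code aggregated gully depth z (metres) as gully presence."""
    if z > threshold:
        return "Gully"
    if z == 0.0:
        return "Non-gully"
    return None  # excluded: within the vertical error range of the lidar
```

Returning `None` for intermediate depths keeps the excluded pixels distinct from both classes, so they can be dropped before modelling.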
From the image of aggregated gully depth (z) we derived an image of gully presence, coded thematically as Gully at pixels where z > 0.15 m and Non-gully where z = 0 m. Pixels where 0 m < z ≤ 0.15 m were excluded from further analysis to avoid confusion with possible lidar inaccuracies. The final Gully observations were analysed further at a 25-m scale for each of the eight sites.

Table 2 The rules required to classify lidar image-objects as either gully or non-gully categories
Operation: (a) Allocate to gully; (b) Reallocate gully to non-gully
Rule:
Mean slope 15
Mean standard deviation of DEM (3 × 3 window) 50 (m²)
Mean standard deviation of slope 6
Length of longest edge of polygon 18 m
Mean standard deviation of slope 7
Polygon rectangular fit 0.9 (proportion between 0 and 1)
Polygon length-to-width ratio 1.5 and rectangular fit 0.9

Explanatory variables
From the digital archives of the Queensland Government we gathered a set of 17 ancillary variables that might plausibly relate to gully presence (Table 3). This information covered the extent of the study area, and related aspects of soil, topography and vegetation. Explanatory variables originating in polygon formats were converted into 25-m pixel rasters. Differences in resolution were resolved by using a near-neighbour algorithm to resample all the explanatory variables to the same grid.

Table 3 Ancillary spatial information for the study area, used as explanatory variables for modelling gully presence

Variableᵃ | Label | Units | Comment
Soil
Clay content | clay_a | % | A horizon
Clay content | clay_b | % | B horizon
CEC | cec_a | cmol kg⁻¹ | A horizon
CEC | cec_b | cmol kg⁻¹ | B horizon
ESP | xna_a | % | A horizon
ESP | xna_b | % | B horizon
Organic carbon | oc_a | % | A horizon
Organic carbon | oc_b | % | B horizon
Soil order | ord | (–) | 10 classesᵇ
Salinity hazard | sal | (–) | 3 classes (low, medium, high)
Topographic
DSM | dsm | m | See Tickle et al. (2009)
Local slope (3 × 3 window) | slo_vr | (°)² | Variance
Local slope (3 × 3 window) | slo_mx | ° | Maximum
Local slope (3 × 3 window) | slo_mn | ° | Minimum
Drainage network | drn | (–) | 2 classes (in, out)
Vegetation
Bare-ground index | bgi_me | (–) | Mean, 1988–2006
Bare-ground index | bgi_sd | (–) | Standard deviation, 1988–2006

ᵃ Acronym key: CEC, cation exchange capacity; DSM, digital surface model; ESP, exchangeable sodium percentage.
ᵇ The orders (followed by the per cent coverage of the study area): Calcarosols (2%), Dermosols (5%), Ferrosols (4%), Kandosols (<1%), Kurosols (2%), Organosols (<1%), Rudosols (8%), Sodosols (44%), Tenosols (1%) and Vertosols (32%).
(–) Indicates that the variable was either dimensionless or categorical.

Continuous soil attributes were retrieved from an archive of nationwide soil information (CSIRO, 2006; Brough et al., 2006). The attributes were stored as interpolated surfaces in a polygon-based format, unfortunately without estimation variances. This was not ideal, but we retained the information because attributes such as dispersivity and texture are known to affect the susceptibility of a site to erosion. We also retrieved a map of the soil order from the Australian Soil Classification (Isbell, 1996), and a map of the perceived salinity hazard for the study area.

Topographic information was derived from a one-second (approximately 30-m spatial resolution) digital surface model (DSM), obtained from the Space Shuttle Radar Topography Mission (Farr et al., 2007; Tickle et al., 2009). The DSM was corrected for striping artefacts that affected the original signal; however, the coverage for Australia has not yet been corrected for vegetation height. From the DSM we computed slope, to which we applied a 3 × 3-pixel moving window that derived the local variance, minima and maxima of the slope. The remaining topographic information was a binary variable that determined whether a location fell within 25 m of the drainage network, derived from a map of the stream network of Queensland (1:250 000 scale).

Vegetation information was based on a calibrated empirical model, known as the Bare-Ground Index (BGI) (Scarth et al., 2006), applied to Landsat imagery. BGI is the proportion of ground not covered by vegetation (living or dead) when viewed vertically downwards from a standing position on the ground. The BGI model is applied to the Landsat imagery on a per-pixel basis, but only at those pixels considered to have less than a threshold proportion of tree cover; we used <0.15 as a threshold. Estimates of tree cover are determined by a calibrated empirical model of foliage projective cover (FPC) (Armston et al., 2009). Landsat imagery, from two different sensors (Thematic Mapper and Enhanced Thematic Mapper+), is the basis of the Queensland Government's statewide vegetation-monitoring programme. Imagery for the entire state is acquired once per year, mostly during the dry season (May to October). Details of the geometric and radiometric corrections applied to the imagery can be found in DERM (2008).
As part of the rectification process, the images are resampled to 25-m pixels from the original 30 m. Once all the gully observations and the explanatory variables were rasterized and aligned on the same scale (25 m) and grid, we intersected the gully presence map (25 m) with the set of explanatory variables.
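Of the explanatory variables described in this section, the local slope statistics (slo_vr, slo_mx, slo_mn) are simple to illustrate. The Python sketch below computes the variance, minimum and maximum of slope in a 3 × 3 window. It assumes a population variance (divisor 9) and handles interior pixels only; neither detail is stated in the text, and the function name is hypothetical.

```python
def slope_window_stats(slope, r, c):
    """Variance, minimum and maximum of slope in the 3 x 3 window at (r, c).

    slope is a list of rows of slope values (degrees). The population
    variance (divisor 9) is an assumption; (r, c) must be an interior pixel.
    """
    vals = [slope[i][j]
            for i in range(r - 1, r + 2)
            for j in range(c - 1, c + 2)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return var, min(vals), max(vals)
```

Applied to every interior pixel of the slope raster, the three returned values would populate three separate rasters, one per explanatory variable.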

Random forest as a statistical model of gully presence


Our choice of statistical model was the random forest (Breiman, 2001a). This has become a popular tool for the discovery of useful, otherwise hidden, patterns within large volumes of digital data. Soil scientists have recently applied random forests to the problem of attribute mapping (Grimm et al., 2008; Kuhnert et al., 2009). A random forest is an ensemble of classification trees. A classification tree (Breiman et al., 1984) takes an n × p array of explanatory variables (where n is the number of data and p is the number of variables) and splits it into two subsets that increase the within-class homogeneity of a categorical response variable. The rule that defines a split is called a node. Each subset is then split in two, and so on until a predefined threshold of homogeneity is reached. The tree is then usually pruned to a manageable number of nodes according to the user's requirements.

Under the random forest many trees are grown, not just one; however, there are three crucial differences in the computation of a single tree: (i) each tree is based on a bootstrap sample of the n data, (ii) at each node a random selection of the p explanatory variables is used to find the best split and (iii) each tree is left unpruned (Liaw & Wiener, 2002). For any combination of values taken by the explanatory variables, the random forest will return one prediction from each tree. The multitude of alternatives describes a probability distribution for a single prediction, which is said to be bootstrap-aggregated or bagged. It has been found that the bagged predictions of a random forest are relatively robust compared with the predictions of other classifiers (Breiman, 2001b; Prasad et al., 2006; Moriondo et al., 2008). Compared with logistic regression, the random forest has two favourable characteristics: (i) it copes with non-linear relationships and (ii) it deals implicitly with interactions between explanatory variables. A disadvantage of the random forest is that it is not possible to make a formal inference about the marginal effects of explanatory variables. Another disadvantage is that it is easy for a user to include an inappropriate explanatory variable, leading to an over-parameterized model.

We used the library randomForest (Liaw & Wiener, 2002), written for the R statistical software (R Development Core Team, 2009), to relate the explanatory variables to gully presence. For a constant number of explanatory variables, the complexity of a random forest is controlled by three parameters: (i) the number of trees in the forest (t), (ii) the number of randomly selected explanatory variables used to construct each tree (m) and (iii) the minimum number of cases needed for a terminal node in a tree (q).
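The three ingredients that distinguish a random forest from a single tree (a bootstrap sample per tree, a random subset of variables at each node, and aggregation of the per-tree predictions) can be sketched in Python as below. This is an illustration of the ideas only, not the randomForest library used in the study; the function names are hypothetical and the tree-growing step itself is omitted.

```python
import random


def bootstrap_sample(n, rng):
    """Draw n row indices with replacement; also return the out-of-bag rows."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag


def candidate_variables(p, m, rng):
    """Pick m of the p explanatory variables (without replacement) for one node."""
    return rng.sample(range(p), m)


def bagged_probability(votes):
    """Proportion of trees that voted Gully for one location."""
    return sum(1 for v in votes if v == "Gully") / len(votes)
```

For example, if 3 of 10 trees vote Gully at a location, `bagged_probability` returns 0.3, which is the kind of per-location probability that a risk map displays.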
The default values used by randomForest are t = 500, m = √p and q = 1, when applied to a categorical response variable such as gully presence. We fitted two random forests to the data: the first used t = 500, and the second t = 100; the values of m and q were held constant at their respective defaults. Two forests were made because we wished to see the effect of a reduced forest on the predictive accuracy of the model, particularly because we were concerned about the length of time it would take the larger forest to extrapolate predictions across the study area. The goodness-of-fit of a random forest, when applied to a categorical variable, is given by the out-of-bag error rate: an individual tree is used to predict the response variable at the rows excluded from the bootstrap sample used to make the tree; the amount of misclassification is then averaged over all trees. For the 500-tree forest only, we assessed the relative importance of each explanatory variable (Liaw & Wiener, 2002) to the classification of gully presence. The randomForest library does this by quantifying how the out-of-bag error changes when the values of an explanatory variable excluded from the bootstrap sample are shuffled randomly; the variable that has the greatest importance to a model is that which, upon shuffling, increases the out-of-bag error most. Díaz-Uriarte & Alvarez de Andrés (2006) suggest that the importance of variables, as calculated by the random forest, is not robust, and should consequently be calculated for the largest possible number of trees.
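The shuffling idea behind variable importance can be illustrated with a simplified Python sketch. It is an assumption-laden stand-in: it scores a single, already fitted `predict` function on one set of held-out rows, whereas randomForest works tree by tree on each tree's out-of-bag rows. All names are hypothetical.

```python
import random


def error_rate(predict, rows, labels):
    """Proportion of rows whose predicted class differs from the label."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(rows)


def permutation_importance(predict, rows, labels, column, rng):
    """Increase in error after shuffling one explanatory variable.

    rows is a list of lists (one list of variable values per location).
    """
    baseline = error_rate(predict, rows, labels)
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)  # breaks the link between this variable and the labels
    permuted = [row[:column] + [v] + row[column + 1:]
                for row, v in zip(rows, shuffled)]
    return error_rate(predict, permuted, labels) - baseline
```

A variable that the classifier never uses has an importance of exactly zero under this scheme, because shuffling it cannot change any prediction.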


Model validation and cross-validation


We excluded pixels where any of the explanatory variables were associated with null values. There were n pixels with which to build and validate a model of gully presence. The model validation proceeded in two stages. The first stage was a standard validation, where the observations were partitioned randomly into a training dataset (66%) and a validation dataset (33%). The training sample was used to train model M0, which was then used to predict the probability of Gully at the locations associated with the validation dataset. The second stage was a modified cross-validation. We indexed the n observations according to the site (out of eight) to which they belonged and then used all observations in each site in turn to validate predictions from a model formed from the remaining seven sites. Because of the unique characteristics of some sites and the spatial distance between them, a class of a categorical variable could occur in the validation data without having been included in the training of the random forest model during cross-validation. In these rare cases, we switched the untrained categorical class with one of those used to train the model, selected at random. The soil variable ord was the only categorical variable that required this class-switching method to ensure that predictions of the model could be carried out at all validation locations. This was a conservative yet pragmatic approach when our limited sample area is considered.

Validation and cross-validation served different purposes. The former allowed us to assess the performance of a random forest when the model was extrapolated at locations relatively close to where the model was trained. Cross-validation, on the other hand, was needed to address concerns about the relatively small area covered by the transects, and their limited geographical spread.
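The site-wise cross-validation described above amounts to holding out every observation from one site at a time. A minimal Python sketch of the fold construction follows; the function name is hypothetical and the model fitting is omitted.

```python
def leave_one_site_out(site_ids):
    """Yield (held_out_site, train_indices, test_indices) for each site.

    site_ids[i] is the site to which observation i belongs.
    """
    for site in sorted(set(site_ids)):
        test = [i for i, s in enumerate(site_ids) if s == site]
        train = [i for i, s in enumerate(site_ids) if s != site]
        yield site, train, test
```

With eight sites this yields eight folds, each training on seven sites and validating on the remaining one, so the validation pixels are always spatially separated from the training pixels.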
As an example of how the two methods differed, the mean minimum distance between the training sample and the validation sample was 25.7 m on the ground, that is, just more than one pixel; however, for cross-validation the mean minimum distance was 4.5 × 10⁴ m.

The correspondence of observed with predicted probability of gully presence was assessed with a receiver operating characteristic (ROC) curve (Zou et al., 2007). Prior to modelling, all n locations had been identified as either Gully or Non-gully. On the other hand, the random forest returns a probability that a particular location belongs to Gully. At a particular probability threshold, all the predictions greater than the threshold were allocated to Gully and vice versa; tabulation of the results reveals the proportion of correctly identified Gully locations (a true-positive prediction), as well as the proportion of Non-gully locations wrongly allocated as Gully (a false-positive prediction). The proportions change as the threshold changes. A ROC curve summarizes the trade-off between true-positive and false-positive proportions. Ideally, the proportion of true positives should be 1.0 and that of false positives should be 0. Thus, when plotting true positives (ordinate) against false positives (abscissa), the ROC curve for a good model will align closely with the top-left corner; a ROC curve that lies on the 1:1 line implies that the model is no better than random chance. The area under the ROC curve, A, is a useful metric to quantify the predictive accuracy of a model of a binomial variable, where a value of 1.0 indicates perfect agreement, and a value of 0.5 indicates no agreement. We computed ROC curves with the ROCR library (Sing et al., 2005) written for the R statistical software (R Development Core Team, 2009).
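The construction of a ROC curve and its area A can be made concrete with a short Python sketch. It is not the ROCR implementation: it sweeps the threshold over the observed probabilities, treats tied probabilities as a single step, and integrates with the trapezoidal rule. It assumes both classes are present; the function names are hypothetical.

```python
def roc_curve(labels, scores):
    """Return (false-positive, true-positive) proportions as two lists.

    labels[i] is True for Gully; scores[i] is the predicted probability.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, is_gully) in enumerate(pairs):
        if is_gully:
            tp += 1
        else:
            fp += 1
        # Add a point only after the last observation sharing this score.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
               for i in range(1, len(fpr)))
```

A model that ranks every Gully location above every Non-gully location gives A = 1.0, and a model that assigns the same probability everywhere gives A = 0.5.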

Model extrapolation
Following appraisal of the validation and cross-validation results, we used all the data to make a final model, Mf, to use for extrapolation across the grazing areas of the study site. If, during the process of extrapolation, a class of a categorical variable that was not included in Mf was found, we switched this class with one of those available in the model, selected at random.
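The class-switching rule for categorical variables can be sketched in a few lines of Python. The function name and the example soil orders in the usage note are illustrative; the random choice is uniform over the trained classes, which is an assumption about what "selected at random" means here.

```python
import random


def switch_unseen_classes(values, trained_classes, rng):
    """Replace any class the model was not trained on with a random trained class."""
    known = sorted(trained_classes)  # sorted so results are reproducible for a seed
    return [v if v in trained_classes else rng.choice(known) for v in values]
```

For instance, if a model was trained only on Sodosols and Vertosols, a Ferrosols pixel met during extrapolation would be relabelled as one of those two orders before prediction, while pixels of trained orders are left unchanged.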

Results
Delineation of gullies
Gullies were visually apparent in the 0.6-m resolution Quickbird images associated with the lidar transects (Figure 2a,e). The rules used to classify image-objects as Gully (Table 2a) tended to over-allocate the class, shown as the red areas of Figure 2(b,f). The over-allocated areas included hills with variable slopes, some roads and infestations of Currant Bush (Carissa ovata R. Br.), a low-lying woody weed. The C. ovata heights were included in the original lidar classification of Ground because the sprawling, dense structure of the shrub could not be penetrated by the lidar signal. These artefacts were removed by using the rules in Table 2(b), which reduced greatly the gully-affected area (Figure 2c,g) such that it conformed to our perception of the features in the Quickbird images. For comparative purposes, the centre-lines of the gullies delineated by the independent expert are also shown (Figure 2d,h). There was good agreement between the contrasting methods. By her own admission the expert's linework was conservative, because of obstruction by trees or, in some cases, cloud. The lidar-based method, which computes the area of gullies, could be useful for future studies of how gullies change through time. Overall, Figure 2 gave us confidence that lidar, coupled to the rules devised for object-oriented classification (Table 2), adequately characterized gully presence.

Figure 2 Gully extents mapped using lidar and object-oriented classification: (a) a true-colour Quickbird image (0.6-m resolution, panchromatic-sharpened), for part of one lidar transect, (b) the extent of gullies according to the allocation rules in Table 2(a), (c) the extent of gullies according to the reallocation rules in Table 2(b), and (d) the centre-line of gullies delineated by expert assessment of (a). Panels (e–h) illustrate the same, but for part of another lidar transect.

Modelling
Following the removal of null values from the explanatory variables, n = 21 312 pixels remained for modelling. The out-of-bag error of the random forest M0 was 7.5% for both t = 500 and t = 100. This implied that in more than 90% of cases the models allocated a location correctly to Gully or Non-gully. However, this is a misleading result that reflects the fact that about 90% of the pixels were Non-gully, which the models could predict with relatively good accuracy.

Table 4 The importance of each explanatory variable to gully presence, calculated by the random forest (t = 500)

Rank  Variable  Importance
1     dsm       0.038
2     bgi_me    0.029
3     slo_mx    0.023
4     bgi_sd    0.023
5     slo_vr    0.015
6     ord       0.012
7     slo_mn    0.011
8     sal       0.011
9     oc_b      0.007
10    oc_a      0.006
11    clay_a    0.004
12    xna_a     0.004
13    xna_b     0.004
14    cec_a     0.003
15    clay_b    0.003
16    cec_b     0.002
17    drn       0.002

The importance of each explanatory variable in M0 (for t = 500) is shown in Table 4, ranked in order from the most important to the least important. The five most important variables in the model were the DSM, the two BGI-related variables, the maximum slope and the variance of the slope. This was consistent with the general notion that topography and vegetative cover determine the propensity of soil to erode. The soil order was the next most important variable, which supported the notion that some soil types are intrinsically more erodible than others. Rather than exchangeable sodium percentage or texture as expected, the most important individual soil attributes were the organic carbon of the topsoil and the subsoil. The least important variable was the map of the drainage network. The fact that the individual soil attributes were relatively unimportant suggests one of two possibilities: either soil attributes do not influence gully formation in the study area, or the soil information, as held by the database, was not suited to our particular task. We suspect the latter, because the soil attributes are interpolated surfaces intended for use in models that operate at scales much coarser than 25-m pixels.
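The reason why a small overall error can mislead when classes are unbalanced can be shown with a short Python sketch. The class mix below (10% Gully, 90% Non-gully) is rounded from the text for illustration only and is not the actual sample; the function names are hypothetical.

```python
def accuracy(observed, predicted):
    """Proportion of locations allocated to the correct class."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def true_positive_rate(observed, predicted, positive="Gully"):
    """Proportion of observed positives that were predicted as positive."""
    hits = [p == positive for o, p in zip(observed, predicted) if o == positive]
    return sum(hits) / len(hits)


# Illustrative class mix only: 10% Gully, 90% Non-gully (rounded from the text).
observed = ["Gully"] * 100 + ["Non-gully"] * 900
# A trivial classifier that never predicts a gully.
predicted = ["Non-gully"] * 1000

baseline_accuracy = accuracy(observed, predicted)        # 0.9
baseline_tpr = true_positive_rate(observed, predicted)   # 0.0
```

A classifier that never predicts a gully is correct at 90% of these locations yet finds no gullies at all, which is why a threshold-independent measure such as the area under the ROC curve is the more informative summary here.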

The ROC curves associated with the fitted values of the random forest models and the validation and cross-validation predictions are shown in Figure 3. The ROC curves for the fitted values of random forest M0 showed that, at both t = 500 and t = 100, the models predicted gully presence accurately (Figure 3a); the ROC curves are relatively close to the top-left corner of the plot, and the areas under each ROC curve were identical at A = 0.81. Similar results were seen for the validation data (Figure 3b). It was clear that, for predictive purposes, a 100-tree forest would suffice. The ROC curve for the cross-validation predictions (t = 100) of the eight sites is shown in Figure 3(c). The predictive ability of the random forest varied markedly across the study area, with the data in sites 1, 3 and 7 being predicted less well than those in other sites. Site 1 was unique in that its gullies were widespread rather than the localized incisions seen in other sites. This suggests that different processes determine gully presence at different locations. Sites 3 and 7 were associated mainly with minority soil orders rather than the dominant Sodosols and Vertosols (Table 3). During cross-validation, these minority classes were systematically excluded from the random forests. In locations where soil orders in the validation data did not exist in the training data, a soil order from the training data was randomly substituted to enable predictions at these locations. This was carried out on the basis that a less accurate prediction is better than no prediction. As ord was a relatively important variable (Table 4), the predictions at these locations were effectively random. The average area under the ROC curve for cross-validation was A = 0.62, which suggested that the model had a relatively weak ability to predict gully presence accurately over a large area. We contend, however, that the modified cross-validation procedure is likely to have under-estimated predictive ability.
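The way site-wise cross-validation excludes minority soil orders can be sketched as follows. This Python sketch assumes that each cross-validation model holds out all pixels from one site in turn; the function names and the six-pixel data set are hypothetical and serve only to illustrate the mechanism.

```python
def leave_one_site_out(site_ids):
    """Yield (held_out_site, training_indices, test_indices) for each site in turn."""
    for site in sorted(set(site_ids)):
        train = [i for i, s in enumerate(site_ids) if s != site]
        test = [i for i, s in enumerate(site_ids) if s == site]
        yield site, train, test


def unseen_classes(classes, train, test):
    """Classes of a categorical variable present in the test fold but absent from training."""
    return {classes[i] for i in test} - {classes[i] for i in train}


# Hypothetical miniature data set: a site label and a soil order for each pixel.
site_ids = [1, 1, 2, 2, 3, 3]
soil_order = ["Vertosol", "Sodosol", "Vertosol", "Sodosol", "Kandosol", "Kandosol"]

folds = list(leave_one_site_out(site_ids))
missing = {site: unseen_classes(soil_order, train, test) for site, train, test in folds}
# missing == {1: set(), 2: set(), 3: {"Kandosol"}}
```

When the hypothetical site 3 is held out, its only soil order never appears in the training data, so every prediction there depends on a randomly substituted class.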
As the sample for each step in the modified cross-validation was based on a spatial region rather than a random sample of all the data, it is possible that the excluded data contained useful information about gully presence not found elsewhere. This is particularly the case for sites 1, 3 and 7. For t = 100, it is likely that the true area under the curve is somewhere between A = 0.62 and A = 0.80.
The risk of gully presence predicted by Mf (t = 100) for the study area is shown in Figure 4. Only a minor proportion of the study area was affected by class-switching among the ord variable, certainly too small to make a visual impression on the map. In the insets of Figure 4, we have magnified selected areas to highlight their interesting features. Figure 4(b) highlights an area of dramatic topographic variation, known to have historically variable vegetative cover; the risk of gully presence in this area is relatively large. Figure 4(c) shows that the risk of a gully increases at the base of remnant volcanic plugs (the circular features), while the surrounding landscape has a relatively small risk of gully presence. The obvious discontinuities in the spatial pattern in Figure 4(d) are related to the boundaries of soil orders. These boundaries are particularly uncertain, and diminish the quality of the risk map at these locations.

Discussion
We have shown that lidar and object-oriented classification characterize gully presence (Figure 2) in a useful way. However, it is not practical or reasonable to acquire lidar information for the entire study area because of current costs. A viable alternative is based on the premise that gully presence is determined by soil, topography and vegetation cover, which can be characterized through statistical modelling. We have not been able to consider important history-related variables that can trigger gully development, such as tree clearing or animal stocking rates, because such information was not readily available for the entire study area.
Four studies (Meyer & Martínez-Casasnovas, 1999; Vrieling et al., 2007; Vanwalleghem et al., 2008; Gutiérrez et al., 2009) have adopted a similar approach for modelling gully presence and reported results with varying accuracies. The only study that used the area under the ROC curve for accuracy assessment was Gutiérrez et al. (2009). The performance of our model was worse than that reported in their study. Our model of gully presence for central Queensland had reasonable accuracy at locations near to training sites, but accuracy diminished as spatial distance from the training sites increased. There are three reasons for our modest result. First, the lidar information was concentrated in too few locations across the study site, with the x-configured transects effectively halving the amount of topographic information that might otherwise have been gained. Second, the soil-related explanatory variables were not suited to a mapping exercise at a scale as fine as 25-m pixels. Third, gullies may be caused by different processes at different locations. Lentz et al. (1993) found that, even in small study areas (<5 ha), rills had site-specific correlations with soil attributes. Our study area was much larger than any of those used by the four studies above, and it may have been unrealistic of us to think that gully presence in this region could be described by a global model.

Figure 3 Receiver operating characteristic (ROC) curves for the random forest models of gully presence: (a) fitted values of model M0, for t = {500, 100}, (b) values predicted by M0 at the validation locations, for t = {500, 100}, and (c) values predicted by models M1,...,8 at the cross-validation locations, for t = 100. The area under the ROC curve is denoted A. Each panel plots the mean true-positive rate against the mean false-positive rate. Panel legends: (a) A = 0.81 for both t = 500 and t = 100; (b) A = 0.83 for t = 500 and A = 0.80 for t = 100; (c) mean A = 0.62 for t = 100, with sites 1, 3 and 7 labelled.


Figure 4 (a) Risk map of gully presence. White represents locations either outside the study area or masked (because of water, tree cover or a non-grazing land-use). (b) Relatively large probabilities are found where there is a large variation in terrain and variable vegetation cover. (c) Volcanic plugs have relatively large probabilities around their bases. (d) Discontinuities in the surface signify a change in soil order.

McBratney et al. (2003) proposed a framework for digital soil mapping, which we have tried to follow. They also foresaw potential problems, such as (i) missing, uninformative or circularly derived explanatory variables, (ii) poor quality of soil information in databases, (iii) black-box data-mining techniques and (iv) over-fitting of the model. Each of these problems has, to some degree, influenced our study: (i) and (ii) are the reality of digital soil mapping, where there is an innate urge to make as much use of existing data as possible; we encouraged the possibility of (iii) and (iv) by electing to use a random forest. Breiman (2001b) argued that the predictive ability and the parsimony of a model are mutually exclusive concepts: simple models are undoubtedly easier to interpret but are less accurate. In our case, we considered robust prediction of gully presence to be more important than inference about the underlying mechanism of the process. Random forest is known to be a robust predictor (Breiman, 2001b; Prasad et al., 2006; Moriondo et al., 2008). Breiman (2001a) showed that a random forest does not overfit the information in the sense that increasing the number of trees indefinitely will not change the model error. This may be so, but we argue that the complexity of a random forest should always concern the user. We have shown that a relatively small forest can predict as well as a larger forest. When there are many millions of predictions to make, as there were for our study site, the smaller forest will complete the task more efficiently.
While the accuracy of the gully risk map can only be regarded as being modest, we anticipate that it will be used by policymakers to identify areas for gully prevention and remediation. Furthermore, the risk map could conceivably be used by hydrological modellers interested in calculating sediment budgets for particular subcatchments. Additional lidar acquisitions about the Fitzroy Basin will enable us to update the model and increase the spatial extent of the risk map.
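The remark above about forest size and prediction cost can be made concrete with some rough arithmetic. The Python sketch below takes the study area (4.86 × 10⁶ ha) and the 25-m pixel size from the text; the pixel count is an upper bound because masked pixels are not subtracted, and the assumption that cost grows in proportion to the number of trees ignores differences in tree depth.

```python
HECTARE_M2 = 10_000           # square metres per hectare
study_area_ha = 4.86e6        # study area reported in the Summary
pixel_side_m = 25             # finest resolution of the explanatory variables

# Upper bound on pixels to predict: masked pixels are not subtracted here.
n_pixels = study_area_ha * HECTARE_M2 / pixel_side_m ** 2


def tree_evaluations(n_locations, n_trees):
    """Each prediction passes a location down every tree in the forest once."""
    return n_locations * n_trees


cost_500 = tree_evaluations(n_pixels, 500)
cost_100 = tree_evaluations(n_pixels, 100)
ratio = cost_500 / cost_100
```

Under these assumptions there are at most about 7.8 × 10⁷ pixels to predict, and a 500-tree forest requires five times as many tree evaluations as a 100-tree forest for the same map.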
Conclusions
We have created a risk map of gully presence for our study area within central Queensland, Australia. This has been achieved by (i) using fine-resolution lidar to quantify local topography at eight sites in the study area, (ii) carrying out object-oriented classification to derive gully extent from the lidar observations, (iii) developing a random forest to model the relationship between gully presence and soil, topography and vegetation status and (iv) extrapolating the model across the study area at the scale of 25-m pixels. The predictive ability of the model was modest. The risk map of gully presence showed that there is a large probability of gully presence in areas of large variation in topography coincident with relatively low long-term vegetation cover. This agrees with our expectation of where gullies should occur. The quality of the map is constrained by the small area of lidar information collected relative to the study area, the relatively coarse spatial resolution of the explanatory variables and the possibility that gully presence is the result of different processes at different locations. The accuracy of the risk map of gully presence would improve with further lidar acquisitions. A finer-resolution, nationwide, bare-earth digital elevation model and improved soil mapping over the area of interest would also enhance the risk map.

Acknowledgements
This study was funded by the Fitzroy Basin Association and the Queensland Department of Environment and Resource Management (DERM). We have greatly appreciated the support of Christian Witte, Neil Flood, Ken Brook and Cameron Dougall as the study progressed. We thank Dan Tindall and Tessa Chamberlain for their comments on a draft version, and Rebecca Trevithick, DERM's expert gully-delineator.

References
Armston, J.D., Denham, R.J., Danaher, T.J., Scarth, P.F. & Moffiet, T.N. 2009. Prediction and validation of foliage projective cover from Landsat-5 TM and Landsat-7 ETM+ imagery. Journal of Applied Remote Sensing, 3, 335340.
Baatz, M. & Schäpe, A. 2000. Multiresolution segmentation: an optimization approach for high quality multi-scale image segmentation. In: Angewandte Geographische Informationsverarbeitung, Volume XII (eds J. Strobl, T. Blaschke & G. Griesebner), pp. 12–23. Wichmann-Verlag, Heidelberg.
Benz, U.C., Hofmann, P., Willhauck, G., Lingenfelder, I. & Heynen, M. 2004. Multi-resolution, object-oriented fuzzy analysis of remote sensing data for GIS-ready information. ISPRS Journal of Photogrammetry & Remote Sensing, 58, 239–258.
Bou Kheir, R., Wilson, J. & Deng, Y. 2007. Use of terrain variables for mapping gully erosion susceptibility in Lebanon. Earth Surface Processes & Landforms, 32, 1770–1782.
Breiman, L. 2001a. Random forests. Machine Learning, 45, 5–32.
Breiman, L. 2001b. Statistical modelling: the two cultures. Statistical Science, 16, 199–231.
Breiman, L., Friedman, J.H., Olshen, R.A. & Stone, C.J. 1984. Classification and Regression Trees. Wadsworth, Belmont, CA.
Brennan, R. & Webster, T.L. 2006. Object-oriented land cover classification of lidar-derived surfaces. Canadian Journal of Remote Sensing, 32, 162–172.
Brough, D.M., Claridge, J. & Grundy, M.J. 2006. Soil and Landscape Attributes: A Report on the Creation of a Soil and Landscape Information System for Queensland. Natural Resources, Mines & Water, Brisbane. QNRM06186.
CSIRO Australia 2006. ASRIS (Australian Soil Resource Information System) [WWW document]. URL http://www.asris.csiro.au/methods.html [accessed on 20 April 2010].
Definiens 2006. Definiens Professional 5 Reference Book. Version 5.0.6.1. Definiens AG, München, Germany.
DERM (Queensland Department of Environment & Resource Management) 2007. Delbessie Agreement [WWW document]. URL http://www.derm.qld.gov.au/land/state/rural_leasehold/pdf/agreement.pdf [accessed on 6 April 2010].
DERM (Queensland Department of Environment & Resource Management) 2008. Land Cover Change in Queensland 2007–2008 [WWW document]. URL http://www.derm.qld.gov.au/slats/pdf/slats_report_and_regions_0708/slats_report07_08.pdf [accessed on 6 April 2010].
Díaz-Uriarte, R. & Alvarez de Andrés, S. 2006. Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7, 3.
DigitalGlobe Incorporated 2010. DigitalGlobe Constellation: Quickbird Imaging Satellite [WWW document]. URL http://www.digitalglobe.com/digitalglobe2/file.php/784/QuickBird-DS-QB.pdf [accessed on 16 December 2010].
Farr, T.G., Rosen, P.A., Caro, E., Crippen, R., Duren, R., Hensley, S. et al. 2007. The shuttle radar topography mission. Reviews of Geophysics, 45, RG2004.
Faulkner, H., Alexander, R. & Wilson, B.R. 2003. Changes to the dispersive characteristics of soils along an evolutionary slope sequence in the Vera badlands, southeast Spain: implications for site stabilisation. Catena, 50, 243–254.
Grimm, R., Behrens, T., Märker, M. & Elsenbeer, H. 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island - digital mapping using random forests analysis. Geoderma, 146, 102–113.
Gutiérrez, A.G., Schnabel, S. & Felicísimo, A.M. 2009. Modelling the occurrence of gullies in rangelands of southwest Spain. Earth Surface Processes & Landforms, 34, 1894–1902.
Hubble, G. & Isbell, R.F. 1983. Eastern highlands. In: Soils: An Australian Viewpoint (eds J. Lenaghan & G. Katsntoni), pp. 219–230. CSIRO, Melbourne, Australia/Academic Press, London.
Hughes, A.O., Prosser, I.P., Stevenson, J., Scott, A., Lu, H., Gallant, J. et al. 2001. Gully Erosion Mapping for the National Land and Water Resources Audit. Technical Report 26/01, CSIRO Land and Water, Canberra [WWW document]. URL http://www.clw.csiro.au/publications/technical2001/tr26-01.pdf [accessed on 6 April 2010].
Hyde, K., Woods, S.W. & Donahue, J. 2006. Predicting gully rejuvenation after wildfire using remotely sensed burn severity data. Geomorphology, 86, 496–511.
Isbell, R.F. 1996. The Australian Soil Classification. CSIRO Publishing, Melbourne.
Kuhnert, P.M., Kinsey-Henderson, A., Bartley, R. & Herr, A. 2009. Incorporating uncertainty in gully erosion calculations using the random forests modelling approach. Environmetrics, 21, 493–509.
Lal, R., Mokma, D. & Lowery, B. 1999. Relation between soil quality and erosion. In: Soil Quality and Soil Erosion (ed. R. Lal), pp. 237–258. Soil and Water Conservation Society/CRC Press, Boca Raton, FL.
Lentz, R.D., Dowdy, R.H. & Rust, R.H. 1993. Soil property patterns and topographic parameters associated with ephemeral gully erosion. Journal of Soil & Water Conservation, 48, 354–360.
Liaw, A. & Wiener, M. 2002. Classification and regression by randomForest. R News, 2, 18–22 [WWW document]. URL http://cran.r-project.org/doc/Rnews/Rnews_2002-3.pdf [accessed on 7 April 2010].
Martínez-Casasnovas, J.A., Ramos, M.C. & Poesen, J. 2004. Assessment of sidewall erosion in large gullies using multi-temporal DEMs and logistic regression analysis. Geomorphology, 58, 305–321.
McBratney, A.B., Mendonça Santos, M.L. & Minasny, B. 2003. On digital soil mapping. Geoderma, 117, 3–52.
Meyer, A. & Martínez-Casasnovas, J.A. 1999. Prediction of existing gully erosion in vineyard parcels of NE Spain: a logistic modelling approach. Soil & Tillage Research, 50, 319–331.
Moriondo, M., Stefanini, F.M. & Bindi, M. 2008. Reproduction of olive tree habitat suitability for global change impact assessment. Ecological Modelling, 218, 95–109.
Petrie, G. & Toth, C.K. 2009. Airborne and spaceborne laser profilers and scanners. In: Topographic Laser Ranging and Scanning (eds J. Shan & C.K. Toth), pp. 29–85. CRC Press, Boca Raton, FL.
Prasad, A.M., Iverson, L.R. & Liaw, A. 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems, 9, 181–199.
Prosser, I.P., Rutherfurd, I.D., Olley, J.M., Young, W.J., Wallbrink, P.J. & Moran, C.J. 2001. Large-scale patterns of erosion and sediment transport in river networks, with examples from Australia. Marine & Freshwater Research, 52, 81–99.
R Development Core Team 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna [WWW document]. URL http://www.R-project.org [accessed on 7 April 2010]. (ISBN 3-900051-07-0).
Reef Water Quality Protection Plan Secretariat 2009. Reef Water Quality Protection Plan [WWW document]. URL http://www.reefplan.qld.gov.au/library/pdf/reefplan2009.pdf [accessed on 6 April 2010].
Rienks, S.M., Botha, G.A. & Hughes, J.C. 2000. Some physical and chemical properties of sediments exposed in a gully (donga) in northern KwaZulu-Natal, South Africa and their relationship to the erodibility of the colluvial layers. Catena, 39, 11–31.
Ritchie, J.C. 1995. Airborne laser altitude measurements of landscape topography. Remote Sensing of Environment, 53, 91–96.
Rowland, T., van den Berg, D., Denham, R., O'Donnell, T. & Witte, C. 2006. Land Use Change Mapping from 1999 to 2004 for the Fitzroy River Catchment. Queensland Department of Natural Resources & Water, Brisbane.
Rustomji, P. 2006. Analysis of gully dimensions and sediment texture from southeast Australia for catchment sediment budgeting. Catena, 67, 119–127.
Scarth, P., Byrne, M., Danaher, T., Henry, B., Hassett, R., Carter, J. et al. 2006. State of the paddock: monitoring condition and trend in groundcover across Queensland. In: Proceedings of the 13th Australasian Remote Sensing and Photogrammetry Conference: Earth Observation - From Science to Solutions, 20–24 November 2006, Canberra.
Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. 2005. ROCR: visualizing classifier performance in R. Bioinformatics, 21, 3940–3941.
Thoma, D.P., Gupta, S.C., Bauer, M.E. & Kirchoff, C.E. 2005. Airborne laser scanning for riverbank erosion assessment. Remote Sensing of Environment, 95, 493–501.
Tickle, P., Wilson, N., Inskeep, C., Gallant, J., Dowling, T. & Read, A. 2009. Digital Surface Model (DSM) & Digital Elevation Model (DEM) (1 Second SRTM Derived): User Guide, Version 1.0. Geoscience Australia, Canberra.
Vanwalleghem, T., Van Den Eeckhaut, M., Poesen, J., Govers, G. & Deckers, J. 2008. Spatial analysis of factors controlling the presence of closed depressions and gullies under forest: application of rare event logistic regression. Geomorphology, 95, 504–517.
Vrieling, A., Rodrigues, S.C., Bartholomeus, H. & Sterk, G. 2007. Automatic identification of erosion gullies with ASTER imagery in the Brazilian Cerrados. International Journal of Remote Sensing, 28, 2723–2738.
Zou, K.H., O'Malley, A.J. & Mauri, L. 2007. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation, 115, 654–657.
