Soil Carbon Effects

A comparative assessment of support vector regression, artificial neural

networks, and random forests for predicting and mapping soil organic carbon
stocks across an Afromontane la...
Article in Ecological Indicators · May 2015

DOI: 10.1016/j.ecolind.2014.12.028

Ecological Indicators 52 (2015) 394–403
Contents lists available at ScienceDirect
Ecological Indicators
journal homepage: www.elsevier.com/locate/ecolind
A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape
Kennedy Were a,b,∗, Dieu Tien Bui c, Øystein B. Dick a, Bal Ram Singh d
a Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432 Ås, Norway
b Kenya Agricultural Research Institute, Kenya Soil Survey, P.O. Box 14733-00800, Nairobi, Kenya
c Department of Economics and Computer Sciences, Faculty of Arts and Sciences, Telemark University College, NO-3800 Bø, Norway
d Department of Environmental Sciences, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432 Ås, Norway
Article info
Article history: Received 28 June 2014; received in revised form 29 November 2014; accepted 24 December 2014
Keywords: Random forests; Artificial neural networks; Support vector regression; Soil organic carbon; Digital soil mapping; Eastern Mau; Kenya

Abstract
Soil organic carbon (SOC) is a key indicator of ecosystem health, with a great potential to affect climate change. This study aimed to develop, evaluate, and compare the performance of support vector regression (SVR), artificial neural network (ANN), and random forest (RF) models in predicting and mapping SOC stocks in the Eastern Mau Forest Reserve, Kenya. Auxiliary data, including soil sampling, climatic, topographic, and remotely-sensed data were used for model calibration. The calibrated models were applied to create prediction maps of SOC stocks that were validated using independent testing data. The results showed that the models overestimated SOC stocks. Random forest model with a mean error (ME) of −6.5 Mg C ha⁻¹ had the highest tendency for overestimation, while SVR model with an ME of −4.4 Mg C ha⁻¹ had the lowest tendency. Support vector regression model also had the lowest root mean squared error (RMSE) and the highest R² values (14.9 Mg C ha⁻¹ and 0.6, respectively); hence, it was the best method to predict SOC stocks. Artificial neural network predictions followed closely with RMSE, ME, and R² values of 15.5, −4.7, and 0.6, respectively. The three prediction maps broadly depicted similar spatial patterns of SOC stocks, with an increasing gradient of SOC stocks from east to west. The highest stocks were on the forest-dominated western and north-western parts, while the lowest stocks were on the cropland-dominated eastern part. The most important variable for explaining the observed spatial patterns of SOC stocks was total nitrogen concentration. Based on the close performance of SVR and ANN models, we proposed that both models should be calibrated, and then the best result applied for spatial prediction of target soil properties in other contexts.
© 2014 Elsevier Ltd. All rights reserved.
∗ Corresponding author at: Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003, NO-1432 Ås, Norway. Tel.: +47 966 563 62; fax: +47 649 654 01. E-mail address: kenwerez@yahoo.com (K. Were).
http://dx.doi.org/10.1016/j.ecolind.2014.12.028
1470-160X/© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Soils sustain life on Earth by delivering various ecosystem services. For example, they are essential for producing food, fibre, fuel, and raw materials, as well as for maintaining the climatic and terrestrial systems (Chen et al., 2002). The rapid land use-land cover changes (LULCC), especially conversion of natural ecosystems to agro-ecosystems, are straining the world’s soils. Agricultural land uses modify the soil’s physical, chemical, and biological properties leading to soil degradation, particularly depletion of soil organic matter (SOM). This in turn has implications for global climate, food security, and sustainable development. Soil organic carbon (SOC), which is the major constituent of SOM, determines the soil’s physical, chemical, and biological properties. It maintains soil quality by supplying nutrients, enhancing cation exchange capacity, supporting biodiversity, and improving aggregation and water-holding capacity (Bationo et al., 2007). Depletion of SOM occurs because of frequent tillage and other disturbances, which disintegrate the aggregates and alter aeration, moisture, and temperature conditions in the soil. This accelerates microbial decomposition and oxidation of SOM to CO₂, which increases the atmospheric concentrations of CO₂ and global warming (Murty et al., 2002; Batlle-Aguilar et al., 2011; Wiesmeier et al., 2012). The threat of global warming is disturbing because the world’s soils contain about 1500 Pg C (1 Pg = 10¹⁵ g) to 1 m-depth, which is twice the amount of C in the atmospheric pool (750 Pg C), and almost three times that in the biotic pool (610 Pg C) (Lal, 2004; Smith, 2004, 2008). Thus, even slight changes in SOC pool can significantly impact on
the global C cycle, climate, and soil properties (Powlson et al., 2011). In the face of climate change and food insecurity, scientists have focused their attention on LULCC and SOC storage research. There is consensus that sustainable use of soil resources is one of the ways to manage climate change and food insecurity issues. This requires a deeper understanding of the spatial distribution of SOC storage to guide policy formulation.

Consequently, many tools have been developed and tested to help scientists analyze soil processes, and derive spatially-continuous information on soil properties at different scales. The open accessibility to most geographic information systems (GIS) and remotely-sensed data and technologies has boosted these efforts. This forms the basis of digital soil mapping (DSM). In DSM, the variability of a target soil property is explained by its relationships with soil-forming factors, such as topography, climate, land use, vegetation, and soil type. This is underpinned by Jenny’s (1941) seminal work, which considered soil development as a function of climate (c), organisms (o), relief (r), parent material (p), and time (a). This function was later expanded by McBratney et al. (2003) to include soil properties (s) and space (n) under the famous SCORPAN framework. Numerous statistical techniques have been applied in digital mapping of SOC stocks, including multiple linear regression (Meersmans et al., 2008), partial least square regression (Amare et al., 2013), generalized linear models (Yang et al., 2008), linear mixed models (Doetterl et al., 2013; Karunaratne et al., 2014), geographically-weighted regression (Mishra et al., 2010; Kumar et al., 2013), kriging (Cambule et al., 2014), and regression-kriging (Hengl et al., 2004, 2007; Kumar et al., 2012). Recently, a few studies also applied new methods from the machine learning field, such as artificial neural networks (Malone et al., 2009; Jaber and Al-Qinna, 2011; Li et al., 2013), support vector machines (Rossel and Behrens, 2010), boosted regression trees (Martin et al., 2011), and random forests (Grimm et al., 2008; Wiesmeier et al., 2011; Vågen and Winowiecki, 2013; Vågen et al., 2013) to map SOC stocks. Machine learning methods overcome the shortcomings of parametric and non-parametric statistical methods, such as spatial autocorrelation, non-linearity, and overfitting (Drake et al., 2006). This improves the prediction accuracy of spatial models. Despite the merits, application of machine learning techniques in DSM is still rare (Vågen et al., 2013).

Therefore, in this study, we aimed to develop, evaluate, and compare the performance of random forests (RF), support vector machines for regression (SVR), and artificial neural networks (ANN) models in predicting and mapping the variability of SOC stocks in the Eastern Mau Forest Reserve, Kenya. The distinction between this study and the previous ones is that SVR with the recently proposed sequential minimal optimization (SMO) algorithm (Platt, 1999; Smola and Schölkopf, 2004) was implemented. This relatively new algorithm has been valuable for diverse environmental applications, but seldom used to model and map SOC stocks and other soil properties.

2. Materials and methods

2.1. Study area

The study was conducted in the Eastern Mau Forest Reserve (∼650 km²), which is bounded by the latitudes 0°15′–0°40′ S and the longitudes 35°40′–36°10′ E (Fig. 1). It is part of the largest closed-canopy Afromontane forest in Eastern Africa, and provides essential ecosystem services. This is despite the deforestation and degradation experienced since the mid-1990s because of illegal logging, charcoal burning, and encroachments, as well as excision of ∼61,023 ha for human settlement (Government of Kenya, 2009; UNEP, 2009). The climate is cool and humid thanks to the high altitudes ranging between 2210 and 3070 m above sea level. The mean annual rainfall varies between 935 and 1287 mm, while the mean annual temperatures range from 9.8 to 17.5 °C (Jaetzold et al., 2010). The Njoro, Naishi, and Larmudiac Rivers drain the eastern slopes into Lake Nakuru, while the Nessuiet flows northwards into Lake Bogoria, and the Rongai River into Lake Baringo. The physiography and lithology consist of major scarps and uplands covered with pyroclastic rocks (i.e., pumice tuffs) of Tertiary-Quaternary volcanic age. The rocks decompose into deep to very deep, dark reddish brown clayey, friable and smeary soil aggregates with humic topsoils: the resultant soils are classified as Mollic Andosols (McCall, 1967; Jaetzold et al., 2010). The major land uses are forestry, agriculture, and grazing. The red stinkwood (Prunus africana), bamboo (Arundinaria alpina), red cedar (Juniperus procera), African wild olive (Olea europaea ssp. africana), East African olive (Olea capensis ssp. hochstetteri), broad-leaved yellowwood (Podocarpus latifolius), brittlewood (Nuxia congesta), clematis (Clematis hirsuta), schefflera (Schefflera volkensii), and forest dombeya (Dombeya torrida) are dominant in the indigenous forests, and pine (Pinus patula) and cypress (Cupressus lusitanica) in the plantation forests. The major crops grown are maize (Zea mays), beans (Phaseolus vulgaris), wheat (Triticum aestivum), and potatoes (Solanum tuberosum).

2.2. Soil data

2.2.1. Sampling design and soil sampling
We conducted a field campaign from June to August 2012. Before that, sampling points were generated randomly in a GIS with agro-ecological zones as the stratifying factor, and then a map showing the distribution of the points was produced for field use. In the field, plots measuring 30 × 30 m were laid out at each sampling point and soil samples collected at 0–15 cm and 15–30 cm depths from the centres and corners of these plots using an auger. Samples taken from similar depths in a plot were properly mixed and bulked into one composite sample weighing about 500 g. For bulk density (BD) determination, a core ring sampler (5 cm in diameter, 5 cm in height) was used to collect undisturbed samples from each depth at the centre of each plot. Three hundred and twenty (320) soil samples were collected from 160 sampling plots for chemical and physical analysis, and a similar number for BD determination. Supplementary soil data that had been collected similarly from 60 other sampling plots for LULCC impact assessment (Were et al., 2015) were also used. Thus, soil data from 220 sampling plots were used for spatial modelling.

2.2.2. Determination of soil properties
The collected soil samples were air-dried, ground, and passed through a 2 mm mesh at the National Agricultural Research Laboratories. The Walkley-Black wet oxidation method (Nelson and Sommers, 1982) and Kjeldahl digestion method (Bremner and Mulvaney, 1982) were used to determine SOC concentrations and TN concentrations, respectively. The hydrometer method (Day, 1965) was used to analyze particle size distribution, the core method (Blake, 1965) to determine BD, and the Mehlich method (Okalebo et al., 2002) to estimate phosphorus (P) content. A flame-photometer was used to measure potassium (K) content, an atomic absorption spectrophotometer to measure calcium (Ca) and magnesium (Mg) contents, and a pH meter to measure pH (1:2.5 soil-water) (Okalebo et al., 2002).

2.2.3. Estimation of SOC stocks
The SOC stocks, i.e., mass of C per unit area for a given depth, were calculated according to Eq. (1) (Aynekulu et al., 2011):

$$\mathrm{SOC}_{st} = \frac{\mathrm{SOC}}{100} \times \mathrm{BD} \times D \times 100 \tag{1}$$
Fig. 1. Geographical location of the study area.
where SOCst is the soil organic carbon stock (Mg C ha⁻¹), SOC is the soil organic carbon concentration (%, which is then converted to g C g⁻¹ soil), BD is the bulk density (g cm⁻³), D is the depth (cm), and 100 is the multiplication factor to convert the SOC per unit area from g C cm⁻² to Mg C ha⁻¹. Coarse particles were negligible due to the softness of the volcanic rocks; hence, Eq. (1) does not account for them. The SOC stocks in the surface (0–15 cm) and subsurface soils (15–30 cm) were summed up to obtain the total stocks to 30 cm depth.

2.3. Environmental data

Based on the SCORPAN conceptual model of soil development (McBratney et al., 2003) and review of literature (e.g., Liu et al., 2006; Vasques et al., 2010; Kumar and Lal, 2011; Kumar et al., 2012; Li et al., 2013; Shelukindo et al., 2014), we selected a priori nineteen (19) environmental variables (predictors) with the potential to explain the spatial variability of SOC stocks, and retrieved them from existing spatial databases. Climate data (temperature and rainfall) were obtained from www.worldclim.org, land cover data from Were et al. (2013), elevation data (digital elevation model; DEM) from http://srtm.csi.cgiar.org/, and Landsat 8 Operational Land Imager (OLI) data from http://earthexplorer.usgs.gov/. Four terrain parameters, including slope, curvature, aspect, and topographic wetness index (TWI) were extracted from the DEM (Wilson and Gallant, 2000). Normalized Difference Vegetation Index (NDVI) was derived from OLI band 4 (red) and 5 (near infra-red) after conversion of the digital numbers to top-of-atmosphere reflectance. The first component (PC1) from principal component analysis of OLI band 2, 3, 4, 5, 6, and 7 was also included. The data, all of which were in raster format, were transformed to UTM WGS84 Zone 36S and subsets made. The climatic grids were resampled from 1 km to 30 m using the nearest neighbour method, to match them with the other raster grids. As in Kumar and Lal (2011), soil data from the laboratory, including sand content, silt content, clay content, Ca, Mg, P, K, TN, and pH were also integrated into the GIS database both as points in vector format and as raster grids after interpolation by ordinary kriging. Ordinary kriging has been widely used to optimize the prediction of soil properties at unsampled locations in pedological studies (Chaplot et al., 2010; Phachomphon et al., 2010; Tesfahunegn et al., 2011; Marchetti et al., 2012; Elbasiouny et al., 2014). Finally, the attribute values of all the other raster grids (e.g., slope, rainfall, and temperature) were extracted to the points, which were the main input for spatial modelling.

3. Spatial modelling and prediction

3.1. Exploratory data analysis

We first estimated the descriptive statistics of the target variable, followed by pairwise Pearson’s product-moment correlation analysis to detect collinearity between the predictors, as well as their correlation with the target variable. Predictors that were highly correlated (r ≥ 0.8), and had high variance inflation factors (VIFs ≥ 10) in regression analysis were excluded from modelling.
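The collinearity screen described in Section 3.1 can be illustrated with a minimal Python sketch. It is not the authors' workflow (the study used R, Weka, and other tools): the function names, the use of the absolute value of r, and the synthetic elevation, temperature, and slope values are all assumptions made for illustration. The VIFs are read from the diagonal of the inverse correlation matrix, which is equivalent to 1/(1 − R²) from regressing each predictor on the others.

```python
import numpy as np

def high_correlation_pairs(X, names, threshold=0.8):
    """Return predictor pairs whose absolute Pearson r meets the threshold."""
    r = np.corrcoef(X, rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(r[i, j])))
    return pairs

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), taken from the diagonal of the inverse correlation matrix."""
    r = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(r))

# Synthetic stand-ins (NOT the study's data): temperature is built to track elevation.
rng = np.random.default_rng(0)
elevation = rng.normal(2600.0, 200.0, size=220)
temperature = 30.0 - 0.006 * elevation + rng.normal(0.0, 0.05, size=220)
slope = rng.normal(10.0, 3.0, size=220)

names = ["elevation", "temperature", "slope"]
X = np.column_stack([elevation, temperature, slope])

pairs = high_correlation_pairs(X, names)                  # pairs with |r| >= 0.8
vifs = dict(zip(names, variance_inflation_factors(X)))    # VIF per predictor
```

In this toy setting only the elevation-temperature pair is flagged, and both variables carry VIFs far above 10, whereas slope stays near 1. Which member of a flagged pair to drop remains a modelling decision that the sketch does not make.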
3.2. Model training

After exploring the data (n = 220), we randomly split it into training (n = 176) and testing (n = 44) sets. The former was used to model the relationships between the site-specific SOC stocks and the predictors, and the latter to evaluate the predictive performance of the models developed. For modelling purposes, RF, SVR, and ANN algorithms were used.

3.2.1. Random forests
The algorithm is an extension of bagging (i.e., bootstrap aggregation) and a competitor to boosting (Cutler et al., 2012). It uses either categorical (i.e., classification) or continuous (i.e., regression) response variables, and either categorical or continuous predictor variables. As described by Cutler et al. (2012), the algorithm worked by growing an ensemble of regression trees based on binary recursive partitioning, where the predictor space at each tree node was partitioned based on binary splits on a subset of randomly selected predictors. At each binary split, the response data were grouped into two descendant nodes to maximize homogeneity, and the best binary split was selected. The response data for each tree were obtained through bootstrap sampling (with replacement) of original observations in the training set. Each descendant node of the selected split was treated similarly as the original (root) node, and the process continued recursively until a stopping criterion was met at a terminal node. The trees were grown to their maximum sizes with the results being combined by unweighted averaging to make predictions. In RF modelling, the training parameters that needed specification were: (i) the number of trees to grow in the forest (ntree), (ii) the number of randomly selected predictor variables at each node (mtry), and (iii) the minimal number of observations at the terminal nodes of the trees (nodesize). These were set to 1000, 12, and 5, respectively. The default of ntree was 500, but it has been observed that more stable results for estimating variable importance are achieved with a higher number (Grimm et al., 2008).

The training data that were left out of the bootstrap samples (out-of-bag (OOB) samples) were used to estimate prediction error and variable importance. In error estimation, the OOB samples were predicted by the respective trees and by aggregating the predictions, the mean square error (MSE_OOB) was calculated using Eq. (2) (Cutler et al., 2012):

$$\mathrm{MSE}_{OOB} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i^{OOB}\right)^2 \tag{2}$$

where ŷ_i^OOB is the OOB prediction for observation y_i. Regarding variable importance, the values of a specific predictor variable were randomly permuted in the OOB data of a tree while the values of other predictors remained fixed. The modified OOB data were predicted, and the differences between the MSEs obtained from the permuted and original OOB data gave a measure of variable importance.

3.2.2. Support vector machines for regression
Support vector machines (SVM) use kernel functions to project the data onto a new hyperspace where complex non-linear patterns can be simply represented (Gunn, 1998; Williams, 2011). In the new hyperspace, SVM aims to construct an optimal hyperplane that separates classes and creates the widest margin between their data (i.e., classification), or that fits data and predicts (i.e., SVR) with minimal empirical risk and complexity of the modelling function. In this study, SVR was used. Given the training data {(x_i, y_i), i = 1, 2, ..., N}, where x is a vector of the input predictors and y is the values of SOC stocks, the SVR developed an optimal function f(x) that minimized the empirical risk (Eq. (3)) (Pozdnoukhov, 2005; Ruß and Kruse, 2010):

$$R_{emp} = \frac{1}{N}\sum_{i=1}^{N} L_{\varepsilon}\left(y_i - f(x_i)\right) \tag{3}$$

where L_ε is the loss function, which penalizes the model in case of differences between the training data and model predictions (i.e., errors). An ε-insensitive loss function was used (Eq. (4)) where smaller errors than the specified non-negative constant ε were not penalized (Gunn, 1998; Pozdnoukhov, 2005).

$$L_{\varepsilon}\left(y - f(x)\right) = \begin{cases} 0 & \text{for } |y - f(x)| < \varepsilon \\ |y - f(x)| - \varepsilon & \text{otherwise} \end{cases} \tag{4}$$

Prior to developing the SVR function f(x), the quadratic programming optimization problem shown in Eq. (5) was solved using the SMO algorithm (Platt, 1999; Smola and Schölkopf, 2004).

$$\max_{\alpha,\alpha^{*}} \; -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}(\alpha_i - \alpha_i^{*})(\alpha_j - \alpha_j^{*})K(x_i, x_j) - \varepsilon\sum_{i=1}^{N}(\alpha_i + \alpha_i^{*}) + \sum_{i=1}^{N} y_i(\alpha_i - \alpha_i^{*}) \tag{5}$$

with the constraints given in Eq. (6):

$$\begin{cases} \sum_{i=1}^{N}(\alpha_i - \alpha_i^{*}) = 0 \\ 0 \le \alpha_i, \alpha_i^{*} \le C \quad \text{for } i = 1, \ldots, N \end{cases} \tag{6}$$

where α_i and α_i* are the weights (Lagrange multipliers), which determined the influence of each data point on the model (support vectors were the data with non-zero weights), K(x_i, x_j) is the kernel function, and C is the regularization parameter, which determined the trade-off between the training errors and model complexity (i.e., flatness of f(x)). The SMO algorithm decomposed the optimization problem into sub-problems, which were solved step by step (Platt, 1999). At each step, the algorithm selected two Lagrange multipliers, found their optimal values analytically, and updated the SVR function (Eq. (7)) to reflect the new values. The process was repeated until the Lagrange multipliers converged.

$$f(x) = \sum_{i=1}^{N}(\alpha_i - \alpha_i^{*})K(x_i, x) + b \tag{7}$$

where b is a constant threshold. The Gaussian radial basis kernel function of the form in Eq. (8) was used (Tien Bui et al., 2012).

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right) \tag{8}$$

where σ is the bandwidth parameter. The best parameters C and σ obtained using the training data were 5 and 0.1, respectively. Determination of these parameters was carried out using the grid search method (Zhuang and Dai, 2006; Kavzoglu and Colkesen, 2009).

3.2.3. Artificial neural networks
The artificial neural network algorithm simulates human learning processes through establishment and reinforcement of linkages between the input and output data. The linkages then connect input and output data in the absence of training data (Campbell, 2002). Numerous ANN algorithms have been proposed, such as Radial Basis Function (Vojislav, 2001), Elman recurrent (Rakkiyappan and Balasubramaniam, 2008), and Hopfield neural networks (Nguyen et al., 2006); however, Multi-layer perceptron neural networks (MLP Neural Nets) with back-propagation algorithm may be the
most popular, and was selected for this study. The architecture of the MLP Neural Nets consists of input, hidden, and output layers (Fig. 2), each with a set of interconnected nodes (neurons) working in parallel to transform the input data into output values (Lee and Evangelista, 2006; Conforti et al., 2014). In this context, the neurons in the input layer were equal to the 16 predictors that were statistically selected from the original list, while the number of neurons in the hidden layer, which carried weights representing the linkages between the predictors and SOC stocks, was determined using both the training and validation data. That is, values ranging from 1 to 20 were used to build different models, the training and prediction errors of which were calculated (Fig. 3). The network having 2 hidden neurons and the lowest error was selected, with the unipolar sigmoid as the transfer function. The output layer comprised a single neuron that represented the output values of SOC stocks. The training phase was initiated by assigning arbitrary connection weights. Then the algorithm fed forward the input layer to the hidden layer. The neurons in the hidden layer multiplied the inputs by their associated weights, summed up the products, and processed the weighted sums using the transfer function, the results of which were propagated to the output layer (Lee and Evangelista, 2006). The output values were compared with the expected values in the training data, and the errors computed. Through iterative propagation of errors back to the network, the connection weights were automatically adjusted until the target minimum error was attained, and the network was able to assign correct values of SOC stocks to the training data, as well as to new input data in the absence of training data. To achieve this, different tests were conducted and the best learning rate, momentum, and training time (iterations) obtained were 0.01, 0.18, and 500, respectively. Finally, the trained network was used as a feed-forward structure to produce predictions for the entire spatially continuous data.

Fig. 2. Architecture of the MLP neural network for SOC stocks modelling.

Fig. 3. Training and validation errors associated with a given number of neurons in the hidden layer.

3.3. Model testing and comparison

The testing data (n = 44) were used to validate the RF, SVR, and ANN models, and derive statistical measures to compare their performance. The root mean squared error (RMSE) and mean error (ME) were computed from the differences between the predicted SOC stock values and measured values (Eqs. (9) and (10)), to determine the precision and bias of the predictions, respectively.

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{n}} \tag{9}$$

$$\mathrm{ME} = \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)}{n} \tag{10}$$

where ŷ_i is the estimated value, y_i is the measured value and n is the number of measured values in the testing data. The ME should be close to zero, while RMSE should be as small as possible. The coefficient of determination (R²) was also calculated.

3.4. Model application

We applied the output SVR, ANN, and RF models to create prediction surfaces that showed the spatial distribution of SOC stocks. Data preparations, analyses, and geovisualization were carried out using ArcGIS® 10.1, ERDAS IMAGINE® 2013, Microsoft Excel® 2010, Weka 3.6, and R 3.0.1 (R Core Team, 2013) with its add-in packages: “sp”, “maptools”, “rgdal”, “randomForest”, and “raster”.

4. Results and discussion

4.1. Exploratory data analysis

Descriptive statistics of SOC stocks along with other soil properties are presented in Table 1. SOC stocks ranged from 42.0 to 193.4 Mg C ha⁻¹, with a mean and median of 102.7 and 103.2 Mg C ha⁻¹, respectively. The standard deviation and coefficient of variation were 24.6 Mg C ha⁻¹ and 23.9%, respectively, denoting moderate variability of SOC stocks. This can be attributed to environmental factors, such as climate, land cover, and topography, as well as measurement errors. The BD varied between 0.5 and 1.1 g cm⁻³, and the pH between 4.8 and 7.0. The macronutrients, including N, P, and K ranged from 0.2 to 0.9%, 8 to 62.5 me/100 g, and 0.3 to 2.1 me/100 g, respectively, while the soil separates varied from 20 to 53% for sand, 21 to 49% for silt, and 10 to 55% for clay. For all soil properties, the mean and median values were similar indicating normality of data distribution. Some skewness was evident, although quite low and mostly positive. This was probably the influence of extreme data values. Similarly, kurtosis values were low, which implied less peaked values in data distribution.

According to Table 2, Pearson’s coefficients (r) ranged between −0.04 (sand content) and 0.74 (TN content) for the relationships between SOC stocks and the predictors. Among the predictors, the correlation between temperature and elevation (r = −0.99), elevation and TN (r = 0.81), temperature and band 11 (r = 0.81), elevation and band 11 (r = −0.83), band 11 and PC1 (r = 0.82), land cover and elevation (r = −0.83), land cover and temperature (r = 0.83), and land cover and band 11 (r = 0.88) exceeded the threshold correlation value of 0.8 in absolute terms. Therefore, elevation, temperature, and band 11 were excluded from model building: their VIFs in regression analysis were also greater than 10 (not shown here). This reduced the number of predictors from 19 to 16.
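As a quick consistency check on the exclusion reported above, the short Python sketch below uses only the eight correlation values quoted in the text. It verifies that dropping elevation, temperature, and band 11 leaves no pair above the 0.8 threshold. The variable names and the bookkeeping are illustrative assumptions, not part of the original analysis.

```python
from collections import Counter

# Pairs reported in the text with |r| >= 0.8 (values copied from Section 4.1).
pairs = [
    ("temperature", "elevation", -0.99),
    ("elevation", "TN", 0.81),
    ("temperature", "band 11", 0.81),
    ("elevation", "band 11", -0.83),
    ("band 11", "PC1", 0.82),
    ("land cover", "elevation", -0.83),
    ("land cover", "temperature", 0.83),
    ("land cover", "band 11", 0.88),
]
excluded = {"elevation", "temperature", "band 11"}

# Number of flagged pairs each predictor takes part in.
involvement = Counter(name for a, b, _ in pairs for name in (a, b))

# Flagged pairs in which neither member was dropped.
remaining = [(a, b, r) for a, b, r in pairs if a not in excluded and b not in excluded]

# Predictors left for modelling after the exclusion.
n_predictors_left = 19 - len(excluded)
```

Every flagged pair contains at least one of the three excluded predictors, so `remaining` is empty. Land cover, TN, and PC1 can therefore stay in the model without any pairwise correlation at or above the threshold.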
Table 1
Descriptive statistics of SOC stocks and some soil properties in 0–30 cm depth.

Statistic  SOC (Mg ha⁻¹)  C (%)   TN (%)  P (ppm)  K (me %)  Ca (me %)  Mg (me %)  BD (g cm⁻³)  pH     Clay (%)  Silt (%)  Sand (%)
Mean       102.65         4.22    0.42    31.86    1.21      4.04       4.97       0.84         5.84   28.13     35.56     36.25
Median     103.15         4.02    0.41    30.25    1.24      4.10       5.29       0.84         5.81   27.00     36.00     36.00
SD         24.55          1.26    0.13    12.56    0.43      1.44       1.42       0.11         0.50   7.28      5.93      5.89
CV (%)     23.91          29.84   29.64   39.43    35.23     35.58      28.59      13.18        8.64   25.87     16.68     16.24
Kurtosis   0.39           0.79    0.51    −0.75    −0.64     8.27       −0.40      0.05         −0.70  0.85      −0.40     0.31
Skewness   0.97           0.81    0.71    0.31     −0.08     1.67       −0.43      −0.20        0.18   0.79      −0.03     0.00
Range      151.43         6.86    0.67    54.50    1.86      11.00      7.16       0.60         2.19   45.00     28.00     33.00
Minimum    41.99          1.86    0.18    8.00     0.28      1.40       1.38       0.50         4.83   10.00     21.00     20.00
Maximum    193.42         8.72    0.85    62.50    2.14      12.40      8.54       1.10         7.02   55.00     49.00     53.00
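The summary measures in Table 1 follow standard definitions, sketched below in Python. The `describe` helper, the choice of the sample standard deviation (ddof = 1), and the five input values are assumptions for illustration; the paper does not state which SD estimator was used. The last line recomputes the SOC coefficient of variation from the published mean and SD.

```python
import numpy as np

def describe(values):
    """Summary measures of the kind listed in Table 1 (CV expressed in percent)."""
    v = np.asarray(values, dtype=float)
    mean = float(v.mean())
    sd = float(v.std(ddof=1))  # sample SD: an assumption, not stated in the paper
    return {
        "mean": mean,
        "median": float(np.median(v)),
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
        "min": float(v.min()),
        "max": float(v.max()),
        "range": float(v.max() - v.min()),
    }

# Hypothetical SOC stock values (Mg C ha^-1), not the study's measurements.
stats = describe([60.0, 90.0, 100.0, 110.0, 140.0])

# Consistency check on the published SOC column: CV = 100 * SD / mean.
cv_from_table = 100.0 * 24.55 / 102.65
```

The recomputed value rounds to the 23.9% quoted in Section 4.1. The small difference from the tabulated 23.91 is consistent with the table having been computed from unrounded inputs.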
4.2. Relative importance of the predictor variables The contributions of the remaining predictors to the models
were more or less the same; that is, their exclusion marginally
The increases in RMSEs as the predictors were excluded one increased the RMSEs. The importance of the predictors was judged
by one from the SVR, ANN, and RF models can be seen in Fig. 4. Based on the magnitude of increase in RMSE, all models showed TN concentration as the most important variable for explaining the spatial variations of SOC stocks. This was unsurprising for statistical and theoretical reasons. Statistically, the correlation found between SOC stocks and TN concentration was significantly high (Table 2), which made it a good predictor for SOC stocks. The relationship between the two is well defined and has also been reported by Chaplot et al. (2010), Phachomphon et al. (2010), and Elbasiouny et al. (2014). Theoretically, the high correlation can be ascribed to the tight coupling of carbon and nitrogen cycles. For instance, nitrogen supply increases the net uptake of carbon by stimulating biochemical determinants, including the photosynthetic enzymes (Lorenz, 2013), which in turn leads to higher input of carbon and nitrogen to the SOC pool. In addition, mineralization of organic matter not only leads to the breakdown of carbon substrates and emission of CO2, but also to the release of plant-available inorganic nitrogen (Butterbach-Bahl and Dannenmann, 2012). Other nitrogen transformations (e.g., nitrification and denitrification) also use the energy supplied by carbon (Batlle-Aguilar et al., 2011). This finding differs from previous studies, which for instance reported that land use (Wiesmeier et al., 2011) and topographic attributes (Grimm et al., 2008) were the most important predictors of SOC stocks. Past studies, however, seldom included TN concentration in distributed modelling of SOC stocks, mainly because accurate and spatially-exhaustive information on it (and other soil parameters, e.g., pH, CEC, soil moisture, etc.) was lacking.

[...] by the decrease in prediction accuracy after excluding each from the model(s) because SVR, ANN, and RF algorithms did not reveal the functional relationships between the target and predictor variables. This limited their interpretability, and is the reason they are often referred to as “black box” approaches. Therefore, visualization of the prediction surfaces also helped to assess the soil–environment relationships that explained the observed spatial patterns of SOC stocks.

4.3. Spatial prediction and mapping of SOC stocks

The spatial patterns of SOC stocks predicted by SVR, ANN, and RF models are displayed in Fig. 5. Broadly, the three prediction surfaces were similar in terms of the spatial patterns of SOC stocks. There was an increasing gradient of SOC stocks from east to west, with the highest stocks on the western and north-western parts. This clearly reflected the land-cover effect because these were areas covered by the Logoman, Nessuiet, Kiptunga, and Baraget forests. Besides land cover, the highly fertile Andosols, high rainfall, low temperatures, and high altitudes, which favour high net primary productivity and low SOC turnover, also explained the high carbon storage in these parts. The lowest stocks, on the other hand, were distributed on the eastern part, including Teret, Nessuiet, Kapkembu, Tuiyotich, and Sururu locations. These were areas where plantation and indigenous forests had been converted to croplands since the mid-1990s. Thus, the low SOC stocks were due to biomass removal after harvesting, erosive processes, and frequent tillage, which breaks up the soil
aggregates, alters aeration, and accelerates microbial decomposition and oxidation of SOM to CO2 (Murty et al., 2002; Smith, 2008; Eclesia et al., 2012; Wiesmeier et al., 2012). The northern and south-eastern parts exhibited moderate to high SOC stocks. These also were cropland-dominated areas. Higher estimates of SOC stocks in forests and lower estimates in croplands coincide with the findings of Tesfahunegn et al. (2011) in northern Ethiopia.

Table 3 shows the general statistics of the predictions by the SVR, ANN, and RF models. The predicted minimum and mean values approximated the measured ones, whereas the maximum and standard deviation values slightly differed from the measured ones (cf. Table 1). The training data point with the maximum value was probably treated as an outlier by the algorithms resulting in the different maximum values.

Targeted climate change mitigation and sustainable land management strategies in the area will require an understanding of the spatial distribution of SOC stocks; hence, the output SOC stock map is important. The map can guide the identification of areas for differential allocation of resources for carbon sequestration and fertility management. For instance, areas with low SOC stocks, but with good soil fertility potential may be targeted for conservation agriculture and agro-forestry practices.

Fig. 4. Variable importance shown by increase in the RMSEs of the SVR, ANN, and RF models after excluding a predictor. [Plot: predictors excluded from the model (PC1, TWI, Rainfall, Land cover, Aspect, Slope, Curvature, NDVI, Sand, Silt, Mg, Ca, K, P, pH, TN) against increase in RMSE, axis range 13 to 21, with series for SVR, ANN, and RF.]

Fig. 5. Spatially distributed maps of SOC stocks.
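
The exclusion-based importance measure behind Fig. 4 can be illustrated with a minimal Python sketch. It refits a model with each predictor left out in turn and records the increase in validation RMSE relative to the full model. The ordinary least squares learner, the function names, and the synthetic data (with columns arbitrarily named TN, pH, and slope) are assumptions made for illustration only; they are not the SVR, ANN, or RF configurations or the data used in this study.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def fit_predict_ols(X_train, y_train, X_test):
    """Stand-in learner (assumption): least squares with an intercept."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def exclusion_importance(X_train, y_train, X_test, y_test, names,
                         fit_predict=fit_predict_ols):
    """Increase in validation RMSE after excluding each predictor in turn."""
    base = rmse(y_test, fit_predict(X_train, y_train, X_test))
    scores = {}
    for j, name in enumerate(names):
        keep = [k for k in range(X_train.shape[1]) if k != j]
        reduced = rmse(
            y_test,
            fit_predict(X_train[:, keep], y_train, X_test[:, keep]),
        )
        scores[name] = reduced - base
    return base, scores


# Synthetic data (hypothetical): the first column dominates the target.
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)
names = ["TN", "pH", "slope"]

# First 200 rows calibrate the model, the remaining 100 validate it.
base_rmse, importance = exclusion_importance(
    X[:200], y[:200], X[200:], y[200:], names
)
ranked = sorted(importance, key=importance.get, reverse=True)
print(ranked, importance)
```

Any learner with the same fit-and-predict interface could be passed as `fit_predict`, so the ranking logic stays separate from the choice of algorithm.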
4.4. Performance of the spatial models

Table 4 presents the prediction error indices derived from independent validation of the SOC stock maps using the testing dataset. The negative ME signs indicate that the models overestimated SOC stocks. In particular, the RF model with an ME of −6.5 Mg C ha−1 had the highest tendency for overestimation, while the SVR model with an ME of −4.4 Mg C ha−1 showed the lowest tendency for overestimation. The RMSEs from the testing data varied from 14.9 to 17.6 Mg C ha−1, which were comparable to those of the fitted models (i.e., 14.5, 15.4, and 18.3 for SVR, ANN, and RF models, respectively). This implies that the models predicted new data as precisely as they fitted the training ones. The support vector regression model had the lowest RMSE (14.9 Mg C ha−1) and the smallest ME in absolute terms, as well as the highest R2 (0.64) value; hence, it was the best method to predict SOC stocks at the unvisited locations in this context. However, ANN prediction followed closely with RMSE, ME, and R2 values of 15.5, −4.7, and 0.61, respectively. This indicated a modest relative improvement of prediction accuracy by SVR. Thus, in other contexts, both SVR and ANN models should be calibrated, and the best result applied for spatial prediction of target soil properties. The RF model results are comparable to those of other studies in the region (Vågen and Winowiecki, 2013; Vågen et al., 2013), although the other studies reported slightly higher R2 values than this one. This may be because of the different extents of the study areas, topography, sampling densities, or quantity and quality of the auxiliary data used. In addition, these comparative results are consistent with those of Rossel and Behrens (2010) who found that SVR outperformed RF and boosted trees in estimating SOC, clay content, and pH in Australia.

Table 3
Descriptive statistics of the SOC stocks estimated by the SVR, ANN, and RF models.
Model     Min.     Max.      Mean      SD
1. SVR    40.05    169.05    103.75    15.16
2. ANN    44.45    180.99    105.48    16.20
3. RF     58.38    162.94    105.39    14.61

Table 4
Prediction error indices of the SVR, ANN, and RF models.
Model     ME       RMSEcal   RMSEval   R2
1. SVR    −4.42    14.45     14.88     0.64
2. ANN    −4.71    15.39     15.46     0.61
3. RF     −6.51    18.27     17.57     0.53
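
As an illustration of the indices in Table 4, the following Python sketch computes ME, RMSE, and R2 from paired observed and predicted values. It assumes that ME is the mean of observed minus predicted values, which matches the statement that negative ME indicates overestimation, and that R2 is the squared Pearson correlation between observed and predicted values; the exact definitions used in the study are those given in its methods section. The function name and the four sample values are hypothetical.

```python
import numpy as np


def error_indices(observed, predicted):
    """Return ME, RMSE, and R2 for paired observed and predicted values.

    Assumptions: ME = mean(observed - predicted), so a negative ME means
    overestimation; R2 = squared Pearson correlation coefficient.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    residuals = obs - pred
    me = float(np.mean(residuals))
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    r = np.corrcoef(obs, pred)[0, 1]
    return {"ME": me, "RMSE": rmse, "R2": float(r ** 2)}


# Hypothetical SOC stocks (Mg C ha-1): every prediction is 5 units too high.
observed = [100.0, 110.0, 120.0, 130.0]
predicted = [105.0, 115.0, 125.0, 135.0]
indices = error_indices(observed, predicted)
print(indices)
```

In this toy case the constant bias gives a negative ME and an RMSE of the same magnitude, while R2 stays at its maximum, which shows why the three indices are reported together rather than relying on any single one.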
Furthermore, the RMSEs of all models were lower than the standard deviations of the measured values (cf. Table 1), which suggested that application of auxiliary spatial data produced better predictions than what was expected using the measured values alone. Generally, the RMSE values obtained in this study reflected the measurement, laboratory, statistical, and random errors. For example, the soil properties used as predictors were interpolated by ordinary kriging. Thus, the associated interpolation errors were propagated to the subsequent SOC stock estimations. Retrieval of auxiliary spatial data from different sources also meant different data quality. Poor coverage of samples in the south-easternmost and middle parts dominated by thick impenetrable bamboo forests also influenced prediction accuracy in these areas. Lastly, some soil-forming factors (e.g., parent material) were not included in the models for lack of suitable data. Thus, incorporating the missing environmental data, as well as the stochastic component by analysing the spatial structure of residuals with geostatistical techniques (i.e., kriging) (Hengl et al., 2004, 2007), are some of the ways to minimize prediction errors in future.

Table 2
Correlation matrix showing the relationships between the variables used in spatial modelling. Highly correlated predictor variables are in bold.
Variables: 1. SOC stock; 2. TN content; 3. pH; 4. Phosphorus; 5. Potassium; 6. Calcium; 7. Magnesium; 8. Silt; 9. Sand; 10. NDVI; 11. Curvature; 12. Slope; 13. Aspect; 14. Land cover; 15. Rainfall; 16. TWI; 17. PC1; 18. Elevation; 19. Temperature; 20. Band 11.

5. Conclusion

The results have demonstrated that SVR with the SMO algorithm performed best for spatially predicting and mapping the patterns of SOC stocks in the Eastern Mau Forest Reserve, Kenya. However, due to the close performance of SVR and ANN models, we propose that both models should be calibrated, and then the best result applied for spatial prediction of target soil variables in other geographical settings. Data quality cannot be overlooked in the process. The results have also shown that TN is the most important variable explaining the observed variability of SOC stocks in the area, and that contributions of the other environmental factors are only marginal. Overall, the performance of the models in this study will inform the selection of machine learning techniques for spatial prediction of SOC stocks plus other soil functional properties in other environments, while the map generated will be instrumental for formulating spatially-targeted climate change mitigation and sustainable land management strategies. In future, model performance will be improved by incorporating other important environmental
data (e.g., parent material), as well as the stochastic component of SOC stocks.

Acknowledgements

We thank the Research Council of Norway for funding this work through the Norwegian University of Life Sciences. We also thank Mr. P. Owenga for technical support, Mr. E. Thairu for driving skilfully in extreme weather and terrain conditions, and the three anonymous reviewers for their constructive comments.

References

Amare, T., Hergarten, C., Hurni, H., Wolfgramm, B., Yitaferu, B., Selassie, Y.G., 2013. Prediction of soil organic carbon for Ethiopian highlands using soil spectroscopy. ISRN Soil Sci. 720589, 11, http://dx.doi.org/10.1155/2013/720589.
Aynekulu, E., Vågen, T.-G., Shepherd, K., Winowiecki, L., 2011. A protocol for measurement and monitoring soil carbon stocks in agricultural landscapes. Version 1.1. World Agroforestry Centre, Nairobi.
Butterbach-Bahl, L., Dannenmann, M., 2012. Soil carbon and nitrogen interactions and biosphere-atmosphere exchange of nitrous oxide and methane. In: Lal, R., Lorenz, K., Hüttl, R.F., Schneider, B.U., von Braun, J. (Eds.), Recarbonization of the Biosphere: Ecosystems and the Global Carbon Cycle. Springer Science+Business Media, pp. 429–442.
Bationo, A., Kihara, J., Vanlauwe, B., Waswa, B., Kimetu, J., 2007. Soil organic carbon dynamics, functions and management in West African agro-ecosystems. Agric. Syst. 94, 13–25.
Batlle-Aguilar, J., Brovelli, A., Porporato, A., Barry, D.A., 2011. Modelling soil carbon and nitrogen cycles during land use change – a review. Agron. Sustain. Dev. 31, 251–274.
Blake, G.R., 1965. Bulk density. In: Black, C.A. (Ed.), Methods of Soil Analysis, Part 1. Physical and Mineralogical Properties, including Statistics of Measurement and Sampling. American Society of Agronomy, Inc, Madison, Wisconsin, USA.
Bremner, J.M., Mulvaney, C.S., 1982. Nitrogen – total. In: Page, A.L. (Ed.), Methods of Soil Analysis, Part 2. Chemical and Microbiological Properties, 2nd edition. American Society of Agronomy, Inc, Madison, Wisconsin, USA.
Cambule, A.H., Rossiter, D.G., Stoorvogel, J.J., Smaling, E.M.A., 2014. Soil organic carbon stocks in the Limpopo National Park, Mozambique: amount, spatial distribution and uncertainty. Geoderma 213, 46–56.
Campbell, J.B., 2002. Introduction to Remote Sensing. Taylor & Francis, London.
Chaplot, V., Bouahom, B., Valentin, C., 2010. Soil organic carbon stocks in Laos: spatial variations and controlling factors. Global Change Biol. 16, 1380–1393.
Chen, J., Chen, J., Tan, M., Gong, Z., 2002. Soil degradation: a global problem endangering sustainable development. J. Geograph. Sci. 12 (2), 243–252.
Conforti, M., Pascale, S., Robustelli, G., Sdao, F., 2014. Evaluation of prediction capability of the artificial neural networks for mapping landslide susceptibility in the Turbolo River catchment (northern Calabria, Italy). Catena 113, 236–250.
Cutler, A., Cutler, D.R., Stevens, J.R., 2012. In: Zhang, C., Ma, Y. (Eds.), Ensemble Machine Learning: Methods and Applications. Springer Science+Business Media, LLC.
Day, P.R., 1965. Particle fractionation and particle size analysis. In: Black, C.A. (Ed.), Methods of Soil Analysis, Part 1. Physical and Mineralogical Properties, including Statistics of Measurement and Sampling. American Society of Agronomy, Inc, Madison, Wisconsin, USA.
Doetterl, S., Stevens, A., van Oost, K., Quine, T.A., van Wesemael, B., 2013. Spatially explicit regional scale prediction of soil organic carbon stocks in cropland using environmental variables and mixed model approaches. Geoderma 204–205, 31–42.
Drake, J.M., Randin, C., Guisan, A., 2006. Modelling ecological niches with support vector machines. J. Appl. Ecol. 43, 424–432.
Eclesia, R.P., Jobbagy, E.G., Jackson, R.B., Biganzoli, F., Piñeiro, G., 2012. Shifts in soil organic carbon for plantation and pasture establishment in native forests and grasslands of South America. Global Change Biol. 18, 3237–3251.
Elbasiouny, H., Abowaly, M., Abu Alkheir, A., Gad, A., 2014. Spatial variation of soil carbon and nitrogen pools by using ordinary kriging method in an area of north Nile delta, Egypt. Catena 113, 70–78.
Government of Kenya, 2009. Report of the prime minister’s task force on the conservation of the Mau forest complex. [Online], Available: http://www.kws.org/export/sites/kws/info/maurestoration/maupublications/Mau Forest Complex Report.pdf [Accessed 19.01.14].
Grimm, R., Behrens, T., Märker, M., Elsenbeer, H., 2008. Soil organic carbon concentrations and stocks on Barro Colorado Island - Digital soil mapping using Random Forests analysis. Geoderma 146, 102–113.
Gunn, S.R., 1998. Support Vector Machines for Classification and Regression. University of Southampton, Technical report.
Hengl, T., Heuvelink, G.B.M., Rossiter, D.G., 2007. About regression-kriging: from equations to case studies. Comput. Geosci. 33, 1301–1315.
Hengl, T., Heuvelink, G.B.M., Stein, A., 2004. A generic framework for spatial prediction of soil variables based on regression-kriging. Geoderma 120, 75–93.
Jaber, S.M., Al-Qinna, M.I., 2011. Soil organic carbon modelling and mapping in a semi-arid environment using thematic mapper data. Photogrammetric Eng. Remote Sens. 77 (7), 709–719.
Jaetzold, R., Schmidt, H., Hornetz, B., Shisanya, C., 2010. Farm management handbook of Kenya, Vol. II. Natural conditions and farm management information, 2nd edition, Part B Central Kenya, Subpart B1a Southern Rift Valley Province. Ministry of Agriculture, Kenya and German Agency for Technical Cooperation (GTZ), Nairobi.
Jenny, H., 1941. Factors of Soil Formation: A System of Quantitative Pedology. McGraw-Hill.
Karunaratne, S.B., Bishop, T.F.A., Baldock, J.A., Odeh, I.O.A., 2014. Catchment scale mapping of measureable soil organic carbon fractions. Geoderma 219–220, 14–23.
Kavzoglu, T., Colkesen, I., 2009. A kernel function analysis for support vector machines for land cover classification. Int. J. Appl. Earth Observ. Geoinformat. 11 (5), 352–359.
Kumar, S., Lal, R., 2011. Mapping the organic carbon stocks of surface soils using local spatial interpolator. J. Environ. Monit. 13, 3128–3135.
Kumar, S., Lal, R., Liu, D., 2012. A geographically weighted regression kriging approach for mapping soil organic carbon stock. Geoderma 189–190, 627–634.
Kumar, S., Lal, R., Liu, D., 2013. Estimating the spatial distribution of organic carbon density for the soils of Ohio, USA. J. Geograph. Sci. 23 (2), 280–296.
Lal, R., 2004. Soil carbon sequestration to mitigate climate change. Geoderma 123, 1–22.
Lee, S., Evangelista, D.G., 2006. Earthquake-induced landslide susceptibility mapping using an artificial neural network. Nat. Hazards Earth Syst. Sci. 6, 687–695.
Li, Q., Yue, T., Wang, C., Zhang, W., Yu, Y., Li, B., Yang, J., Bai, G., 2013. Spatially distributed modeling of soil organic matter across China: an application of artificial neural network approach. Catena 104, 210–218.
Liu, Z.P., Shao, M.A., Wang, Y.Q., 2006. Large-scale spatial variability and distribution of soil organic carbon across the entire Loess Plateau, China. Soil Res. 50 (2), 114–124.
Lorenz, K., 2013. Ecosystem carbon sequestration. In: Lal, R., Lorenz, K., Hüttl, R.F., Schneider, B.U., von Braun, J. (Eds.), Ecosystem Services and Carbon Sequestration in the Biosphere. Springer Science+Business Media Dordrecht, pp. 39–62.
Malone, B.P., McBratney, A.B., Minasny, B., Laslett, G.M., 2009. Mapping continuous depth functions of soil carbon storage and available water capacity. Geoderma 154, 138–152.
Marchetti, A., Piccini, C., Francaviglia, R., Mabit, L., 2012. Spatial distribution of soil organic matter using geostatistics: a key indicator to assess soil degradation status in central Italy. Pedosphere 22 (2), 230–242.
Martin, M.P., Wattenbach, M., Smith, P., Meersmans, J., Jolivet, C., Boulonne, L., Arrouays, D., 2011. Spatial distribution of soil organic carbon stocks in France. Biogeosciences 8, 1053–1065.
McBratney, A.B., Santos, M.L.M., Minasny, B., 2003. On digital soil mapping. Geoderma 117, 3–52.
McCall, G.J.H., 1967. Geology of the Nakuru-Thomson’s falls-Lake Hannington area: degree sheet No. 35, S.W. Quarter and 43 N.W. Quarter. Report No. 78. Government Printer, Nairobi.
Meersmans, J., de Ridder, F., Canters, F., de Baets, S., van Molle, M., 2008. A multiple regression approach to assess the spatial distribution of soil organic carbon (SOC) at the regional scale (Flanders, Belgium). Geoderma 143, 1–13.
Mishra, U., Lal, R., Liu, D., van Meirvenne, M., 2010. Predicting the spatial variation of the soil organic carbon pool at a regional scale. Soil Sci. Soc. Am. J. 74, 906–914.
Murty, D., Kirschbaum, M.F., McMurtrie, R.E., McGilvray, H., 2002. Does conversion of forest to agricultural land change soil carbon and nitrogen? A review of the literature. Global Change Biol. 8, 105–123.
Nelson, D.W., Sommers, L.E., 1982. Total carbon, organic carbon and organic matter. In: Page, A.L. (Ed.), Methods of Soil Analysis, Part 2. Chemical and Microbiological Properties, 2nd edition. American Society of Agronomy, Inc, Madison, Wisconsin, USA.
Nguyen, M.Q., Atkinson, P.M., Lewis, H.G., 2006. Super-resolution mapping using Hopfield neural network with fused images. IEEE Trans. Geosci. Remote Sens. 44 (3), 736–749.
Okalebo, J.R., Gathna, K.W., Woomer, P.L., 2002. Laboratory Methods for Soil and Plant Analysis: A Working Manual, 2nd edition. Tropical Soil Biology and Fertility Programme, Nairobi.
Phachomphon, K., Dlamini, P., Chaplot, V., 2010. Estimating carbon stocks at regional level using soil information and easily accessible auxiliary variables. Geoderma 155, 372–380.
Platt, J., 1999. Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C.J.C., Smola, A.J. (Eds.), Advances in Kernel Methods – Support Vector Learning. MIT Press, MA, pp. 185–208.
Powlson, D.S., Gregory, P.J., Whalley, W.R., Quinton, J.N., Hopkins, D.W., Whitmore, A.P., Hirsch, P.R., Goulding, K.W.T., 2011. Soil management in relation to sustainable agriculture and ecosystem services. Food Policy 36, S72–S87.
Pozdnoukhov, A., 2005. Support vector regression for automated robust spatial mapping of natural radioactivity. Appl. GIS 1 (2), 21.1–21.10.
R Core Team, 2013. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/
Rakkiyappan, R., Balasubramaniam, P., 2008. Delay-dependent asymptotic stability for stochastic delayed recurrent neural networks with time varying delays. Appl. Math. Comput. 198 (2), 526–533.
Rossel, R.A.V., Behrens, T., 2010. Using data mining to model and interpret soil diffuse reflectance spectra. Geoderma 158, 46–54.
Ruß, G., Kruse, R., 2010. Regression models for spatial data: An example from precision agriculture. In: Perner, P., ICDM 2010, LNAI 6171, pp. 450–463.
Shelukindo, H.B., Semu, E., Msanya, B.M., Singh, B.R., Munishi, P.K.T., 2014. Predictor variables for soil organic carbon contents in the Miombo woodlands ecosystem of Kitonga forest. Int. J. Agric. Sci. 4 (7), 222–231.
Smith, P., 2004. Soils as carbon sinks: the global context. Soil Use Manage 20, 212–218.
Smith, P., 2008. Land use change and soil organic carbon dynamics. Nutr. Cycl. Agroecosyst. 81, 169–178.
Smola, A.J., Schölkopf, B., 2004. A tutorial on support vector regression. Stat. Comput. 14, 199–222.
Tesfahunegn, G.B., Tamene, L., Vlek, P.L.G., 2011. Catchment scale spatial variability of soil properties and implications on site-specific soil management in northern Ethiopia. Soil Till. Res. 117, 124–139.
Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., 2012. Landslide susceptibility assessment in Vietnam using support vector machine, decision tree and Naïve Bayes models. Math. Prob. Eng., 1–26.
UNEP, 2009. Kenya: Atlas of our changing environment. Division of Early Warning and Assessment (DEWA), United Nations Environment Programme (UNEP). [Online]. Available: http://www.unep.org/dewa/africa/kenyaatlas/ [Accessed 28.09.13].
Vojislav, K., 2001. Learning and Soft Computing: Support Vector Machines, Neural Networks, and Fuzzy Logic Models (Complex Adaptive Systems). The MIT Press.
Vågen, T.G., Winowiecki, L.A., 2013. Mapping of soil organic carbon stocks for spatially explicit assessments of climate change mitigation potential. Environ. Res. Lett. 8, http://dx.doi.org/10.1088/1748-9326/8/1/015011, 015011 (9 pp).
Vågen, T.G., Winowiecki, L.A., Abegaz, A., Hagdu, K.M., 2013. Landsat-based approaches for mapping of land degradation prevalence and soil functional properties in Ethiopia. Remote Sens. Environ. 134, 266–275.
Vasques, G.M., Grunwald, S., Comerford, N.B., Sickman, J.O., 2010. Regional modelling of soil carbon at multiple depths within a subtropical watershed. Geoderma 156, 326–336.
Were, K.O., Dick, Ø.B., Singh, B.R., 2013. Remotely sensing the spatial and temporal land cover changes in Eastern Mau forest reserve and Lake Nakuru drainage basin, Kenya. Appl. Geogr. 41, 75–86.
Were, K.O., Singh, B.R., Dick, Ø.B., 2015. Effects of land cover changes on soil organic carbon and total nitrogen stocks in the Eastern Mau Forest Reserve, Kenya (Chapter 6). In: Lal, R., Singh, B.R., Mwaseba, D.L., Kraybill, D., Hansen, D.O., Eik, L.O. (Eds.), Sustainable Intensification to Advance Food Security and Enhance Climate Resilience in Africa. Springer International Publishing, Switzerland, pp. 113–133, http://dx.doi.org/10.1007/978-3-319-09360-4_6.
Wiesmeier, M., Barthold, F., Blank, B., Kögel-Knabner, I., 2011. Digital mapping of soil organic matter stocks using Random Forest modeling in a semi-arid steppe ecosystem. Plant Soil 340, 7–24.
Wiesmeier, M., Spörlein, P., Geuß, U., Hangen, E., Haug, S., Reischl, A., Schilling, B., von Lützow, M., Kögel-Knabner, I., 2012. Soil organic carbon stocks in southeast Germany (Bavaria) as affected by land use, soil type and sampling depth. Global Change Biol. 18, 2233–2245.
Williams, G., 2011. Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, use R. Springer Science+Business Media, LLC, DOI 10.1007/9781441998 2.
Wilson, J.P., Gallant, J.C., 2000. Terrain Analysis: Principles and Applications. John Wiley & Sons, Inc.
Yang, Y., Fang, J., Tang, Y., Ji, C., Zheng, C., He, J., Zhu, B., 2008. Storage, patterns and controls of soil organic carbon in the Tibetan grasslands. Global Change Biol. 14, 1592–1599.
Zhuang, L., Dai, H.H., 2006. Parameter optimization of kernel-based classifier on imbalance text learning. Pricai: 2006. Trends in Artificial Intelligence, Proceedings 4099, 434–443.