Correcting Climate Model Simulations in Heihe River Using The Multivariate Bias Correction Package

Environmental and Ecological Statistics
https://doi.org/10.1007/s10651-018-0410-x
Qiantao Zhu1,2 · Wenzhi Zhao1,2
Received: 14 December 2017 / Revised: 1 August 2018

© Springer Science+Business Media, LLC, part of Springer Nature 2018
Abstract
The simulations from climate models require bias correction before they are used in impact assessments or as predictors in statistical or dynamic downscaling models. Conventional bias correction treats one variable at a time and a single time scale; recent work addressing these limitations has produced the Multivariate Recursive Nesting Bias Correction (MRNBC) and Multivariate Recursive Quantile-matching Nested Bias Correction (MRQNBC) methods. The methods were applied to the mountainous headwater region of the Heihe River. A comparison of the historical and generated statistics shows that the model preserves the key characteristics of the meteorological variables at daily, monthly, seasonal and annual time scales. This study documents how well MRNBC removes the discrepancy between the simulated GCM predictors and the NCEP reanalysis data, and assesses the accuracy of projected future precipitation in the headwater region of the Heihe River. Outputs from ACCESS1-3, a CMIP5 Earth System Model (ESM) with relatively high spatial resolution, were downscaled for the historical period 1960–2005 and the future period 2010–2100 under Representative Concentration Pathways RCP4.5 and RCP8.5. The MRNBC method substantially improves the simulated precipitation. As verified by statistical score metrics, the method appears to be a useful statistical tool for correcting the bias between GCM output and reanalysis data, leading to marked improvements in the predictive accuracy of the precipitation projections. In the headwater region of the Heihe River, projected precipitation shows a stronger increasing trend under RCP8.5 than under RCP4.5. Future precipitation is projected to increase by 8% and 20% in the near- and long-term periods under RCP4.5, and by 14% and 37%, respectively, under RCP8.5.
Keywords Bias correction · Distribution based quantile matching · Heihe River · Nested bias correction
Handling Editor: Pierre Dutilleul.
1 Introduction
General circulation models (GCMs) are mathematical models in the form of a set
of linear and non-linear partial differential equations for hydro-climatic variables,
developed by considering the underlying physics involved in the land, ocean and
atmospheric processes and becoming increasingly sophisticated with improvements
in resolution and the range of processes that are represented. In fact in many cases
GCMs are now more accurately referred to as Earth System Models (ESMs) because
of the number of systems that are included. Despite these improvements and overall
confidence in the representation of large scale responses such as the global tempera-
ture sensitivity, there remain a number of biases in GCM simulations, particularly in
the hydrological cycle. Dynamic downscaling using regional climate models (RCMs) can reduce some of these biases because of its finer resolution. In many cases, however, significant biases persist, either inherited from the driving GCM or arising in the RCM itself, and dynamic downscaling is also often criticized for being computationally expensive. When GCM or RCM simulations are used
in statistical downscaling approaches or directly for impact assessments, bias correc-
tion of the variables is likely required. Traditionally this bias correction has focussed
on correcting the representation of individual variables over a single time-scale of
interest (e.g., daily or monthly data). The underlying idea behind a bias correction
approach is to identify the bias (in a statistic or value itself) for the current climate
and correct the future climate under the assumption that the bias does not change over
time. Daily or monthly standardization forms the most basic bias correction and is
used to correct for systematic biases in the mean and variances of GCM simulations
(Wilby et al. 2004). Another nonparametric bias correction category includes quantile
matching, correction factors and transfer functions based approaches (e.g., Arnell and
Reynard 1996; Chen et al. 2013; Chiew and McMahon 2002; Ines and Hansen 2006;
Piani et al. 2010). These approaches typically look at the biases at individual values
(distributional biases) in the GCM simulations (e.g., Cayan et al. 2008). Li et al. proposed a variation of quantile matching termed equidistant quantile matching (EQM). At each cumulative probability, the approach computes the difference between the observed (reanalysis) and training (GCM current climate) datasets, and uses that difference to correct the testing (GCM future climate) datasets.
There are a number of different approaches to bias correction although most of
these focus on a single variable for a particular location and consider corrections
only over one time scale of interest, for example daily or monthly. Recently, bias
correction approaches have been proposed to deal with the bias correction of multiple
variables. Piani and Haerter proposed a bias correction approach to simultaneously
correct temperature and precipitation variables. They apply a univariate bias correction
to the time series of one variable (e.g., precipitation) conditionally on the bias-corrected
values of the time series for the other variable (e.g., temperature). In recent times copula
based methods have been proposed to consider the joint- and/or spatial dependence
across variables/grids. Mehrotra and Sharma (2015) proposed a multivariate extension
to parametric approaches and also a multivariate and multi-timescale extension of
quantile matching based nonparametric bias correction alternatives. The approach
looks at the biases in the cumulative probability space and applies correction in the
probability space as well.
The multivariate modelling of Mehrotra and Sharma (2015) combines two common AR models. The first has constant parameters over time and represents the daily and annual time series. The second uses periodic parameters to represent the monthly and seasonal characteristics (Salas 1980). The GCM simulations are corrected so that the distributional and persistence attributes at each of these time scales match the observations. The same corrections are then applied to future GCM simulations. This lets the statistical properties change over time while still correcting for biases, on the assumption that the biases are stationary and smaller than the projected changes. More details on the structure of the multivariate bias correction models are given in Salas (1980) and Mehrotra and Sharma (2015).
Commonly used bias correction approaches are applied on a single variable at a time,
consider a selected single time scale (day, month or year) and do not look at the biases
in persistence attributes. When the bias corrected variables are aggregated/averaged
to higher time scales (for example, daily to monthly/seasonal or annual), observed
and bias corrected statistics can be quite different. Johnson and Sharma proposed adding nesting across multiple time scales and persistence correction to the standard bias correction procedure, and named the result the nested bias correction (NBC) approach. The time nesting was found to introduce some imbalance in certain statistics of the bias-corrected series. Mehrotra and Sharma therefore proposed repeating the nested bias correction procedure several times to minimise the biases at all time scales, and named this Recursive Nested Bias Correction (RNBC).
One criticism of bias correction is that it is usually applied to a single variable at a time. The statistics of the corrected data improve, but correcting each variable separately disregards the physical dependencies between variables (Colette et al. 2012; Maraun 2012).
Often bias correction is required for a number of different variables, for example
precipitation and temperature or for upper air variables used as inputs for statisti-
cal downscaling schemes. In some cases, bias corrected variables are combined using
empirical equations to estimate quantities such as potential evapotranspiration or com-
bined implicitly through their use in impact assessment models, such as rainfall-runoff
modelling. Equally problematic can be the poor representation of spatial correlations
if variables are corrected separately for different locations.
Two variants of the multivariate and multi-timescale approach are provided as a package in the R statistical computing environment, developed by Mehrotra et al. (2018) and Mehrotra and Sharma (2015). In addition to preserving dependence across variables, the bias-corrected simulations maintain the correct persistence structure and the distribution over multiple time scales of interest. The package is named Multivariate Bias Correction (MBC). In this paper, we employed both multivariate bias correction approaches, Multivariate Recursive Nested Bias Correction (MRNBC) and Multivariate Recursive Quantile-matching Nested Bias Correction (MRQNBC), together with their variants. We used them to correct the GCM output simulations towards the NCEP reanalysis data. We then provide simple examples of their application in the mountainous headwater region of the Heihe River, using multiple linear regression to confirm the improvement in simulation accuracy. Finally, we investigated precipitation variation under future climate scenarios to understand the impacts of climate change on the available regional water resources.
2 Multivariate bias correction (MBC)
2.1 MBC application
MBC is implemented in R and allows the variants of the MRNBC and MRQNBC bias correction approaches to be applied in a fairly simple manner. The package requires all essential information to be provided in the ‘basic.dat’ file. In addition, four data files must be prepared before running the package: observed and raw data files for the calibration period and for the verification period. The raw and observed files need not have equal lengths in either period. Depending on the requirement, the same file can be used for both periods.

Because the package considers dependence across variables, it can also be used to maintain the observed spatial dependence across multiple locations in the simulations. Users first choose either the MRNBC or the MRQNBC option, then specify the bias correction statistics and time nesting to include. The statistics available for correction are mean only, mean and SD (or distribution), LAG1 autocorrelation, and LAG0 and LAG1 cross-correlation. The time nesting options are daily, monthly, seasonal, annual and tri-annual. The package can apply bias correction either to daily or to monthly time series, and users can define their own seasons. When moving from daily to higher time scales (monthly, seasonal and annual), the data can be either averaged or aggregated. Aggregation is useful for variables such as rainfall.

In addition to the names of the four data files, the ‘basic.dat’ file requires:
- the number of years of data;
- the number of variables;
- the width of the moving window;
- the number of repeats in the recursive procedure;
- the physical lower and upper limits of the variables;
- whether the data include leap years;
- the allocation of calendar months to the specified seasons.

All information is given in free format, separated by spaces. At present, the package allows a maximum of 70 years of daily data, 15 variables, 5 seasons and a 31-day moving window.
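A minimal Python sketch makes the averaging-versus-aggregating option concrete. It is an illustration only: the MBC package itself is written in R, and the function `to_monthly` and its arguments are hypothetical. Summing suits accumulated quantities such as rainfall; averaging suits quantities such as temperature.

```python
from statistics import mean

def to_monthly(daily, month_of_day, aggregate=True):
    """Collapse a daily series to monthly values (hypothetical helper).

    daily        -- list of daily values for one variable
    month_of_day -- list of month labels, e.g. (year, month) tuples, same length
    aggregate    -- True: sum the days (e.g. rainfall); False: average them
    """
    groups = {}
    for value, month in zip(daily, month_of_day):
        groups.setdefault(month, []).append(value)
    combine = sum if aggregate else mean
    # dicts preserve insertion order, so months come out in input order
    return {month: combine(values) for month, values in groups.items()}

rain = [1.0, 0.0, 2.5, 3.0]
labels = [(1961, 1)] * 2 + [(1961, 2)] * 2
print(to_monthly(rain, labels))                   # monthly totals
print(to_monthly(rain, labels, aggregate=False))  # monthly means
```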
Upon successful completion of the program, 6 output files are generated, two files
containing bias corrected time series for calibration and verification periods and four
statistics results files, containing important statistics of (1) Observed and Raw data for
calibration; (2) observed and raw data for verification; (3) observed and bias corrected
data for calibration; and (4) observed and bias corrected data for verification time
periods. Some of the important statistics calculated include means, standard deviations,
skewness, LAG1 and LAG2 auto correlations, and distribution plots at daily, monthly,
seasonal and annual time scales. In case of multiple variables, auto and LAG1 cross
correlations are also computed. The package allows the users to look at a few raw and
bias corrected statistics either in the form of a table or as plots at multiple time scales
of interest. It also provides plots of empirical distribution of raw and bias corrected
time series.
Consider a vector of m predictor variables with t time steps, Z (an m × t matrix), at a local site. The lag-one autocorrelations and the lag-one and lag-zero cross-correlations in the GCM simulations can be corrected to match the observed correlations in time and space (Sarhadi et al. 2016). Let $Z^h$ denote the observations and $Z^g$ the GCM variables. The data are first standardized to form a periodic time series $\hat{Z}^g_i$, which must be modified to match the observations $\hat{Z}^h_i$. The standard multivariate autoregressive order 1 (MAR1) model for the observed and GCM data is (Salas et al. 1985):

$$ \hat{Z}^h_i = C\,\hat{Z}^h_{i-1} + D\,\varepsilon_i \qquad (1) $$

$$ \hat{Z}^g_i = E\,\hat{Z}^g_{i-1} + F\,\varepsilon_i \qquad (2) $$

Here C and D are coefficient matrices derived from the lag-zero and lag-one cross-correlation matrices of the observations $\hat{Z}^h_i$. E and F are calculated in the same way for the standardized GCM outputs. $\varepsilon_i$ is a vector of mutually independent random variates with zero mean and identity covariance matrix.
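The text does not state how C and D (and likewise E and F) are estimated. For a standardized MAR(1) model, the usual moment estimates follow from the lag-zero and lag-one correlation matrices $M_0$ and $M_1$. These symbols are introduced here, and the result is a reference derivation rather than the package's confirmed implementation:

$$
\begin{aligned}
M_0 &= \mathrm{E}\big[\hat{Z}^h_i (\hat{Z}^h_i)^{\mathsf T}\big], \qquad M_1 = \mathrm{E}\big[\hat{Z}^h_i (\hat{Z}^h_{i-1})^{\mathsf T}\big],\\
M_1 &= C\,M_0 \;\Rightarrow\; C = M_1 M_0^{-1},\\
M_0 &= C\,M_1^{\mathsf T} + D D^{\mathsf T} \;\Rightarrow\; D D^{\mathsf T} = M_0 - M_1 M_0^{-1} M_1^{\mathsf T}.
\end{aligned}
$$

The second line follows from multiplying Eq. (1) by $(\hat{Z}^h_{i-1})^{\mathsf T}$ and taking expectations. The third follows from multiplying by $(\hat{Z}^h_i)^{\mathsf T}$, using the independence of $\varepsilon_i$ from past values. D can then be taken as, for example, a Cholesky factor of the right-hand side. For a single variable with lag-one autocorrelation $r_1$, these reduce to $C = r_1$ and $D = \sqrt{1 - r_1^2}$.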
The aim is to modify the lag-0 and lag-1 auto- and cross-correlation structure present in the standardized $\hat{Z}^g_i$ series (encoded in E and F) so that it matches the observed lag-0 and lag-1 structure (encoded in C and D). The result is a new series $Z^g_i$. Rearranging Eq. (2) and solving for $\varepsilon_i$ gives:

$$ \varepsilon_i = F^{-1}\left(\hat{Z}^g_i - E\,\hat{Z}^g_{i-1}\right) \qquad (3) $$

Here $\varepsilon_i$ is the vector of standardized variates left after removing the lag-1 and lag-0 dependence structure from the $\hat{Z}^g_i$ series. Substituting Eq. (3) into Eq. (1), that is, driving the observed model with these GCM-derived variates, turns $\hat{Z}^g_i$ into a series $Z^g_i$ with the desired dependence properties (C and D):

$$ Z^g_i = C\,Z^g_{i-1} + D F^{-1}\hat{Z}^g_i - D F^{-1} E\,\hat{Z}^g_{i-1} \qquad (4) $$
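The scalar case of Eq. (4) (one variable, so C, D, E and F are numbers) can be sketched in Python as follows. This is a hedged illustration, not code from the MBC package. It assumes the standard AR(1) estimates C = observed lag-1 autocorrelation, D = sqrt(1 − C²), E = GCM lag-1 autocorrelation and F = sqrt(1 − E²) for standardized series. The function names are hypothetical, and the recursion is simply started from the first raw value.

```python
import math
import random

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x)
    cov = sum((x[i] - m) * (x[i - 1] - m) for i in range(1, n))
    return cov / var

def correct_lag1(z_hat, c_obs):
    """Scalar form of Eq. (4) for a standardized GCM series z_hat.

    c_obs -- observed lag-1 autocorrelation (scalar C); D = sqrt(1 - C^2)
    E and F are estimated from z_hat itself (assumed AR(1) estimates).
    """
    e = lag1_autocorr(z_hat)
    f = math.sqrt(1.0 - e * e)
    d = math.sqrt(1.0 - c_obs * c_obs)
    z = [z_hat[0]]  # assumption: start the recursion from the first raw value
    for i in range(1, len(z_hat)):
        eps = (z_hat[i] - e * z_hat[i - 1]) / f   # Eq. (3): remove GCM persistence
        z.append(c_obs * z[i - 1] + d * eps)      # Eq. (1) driven by GCM variates
    return z

# Usage: a weakly persistent synthetic "GCM" series corrected to lag-1 = 0.7
random.seed(1)
raw = [random.gauss(0.0, 1.0)]
for _ in range(19999):
    raw.append(0.2 * raw[-1] + math.sqrt(1 - 0.04) * random.gauss(0.0, 1.0))
corrected = correct_lag1(raw, 0.7)
print(round(lag1_autocorr(raw), 2), round(lag1_autocorr(corrected), 2))
```

Only the persistence is changed. The innovations come from the GCM series, so its day-to-day sequence of anomalies is retained.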
For the periodic parameter correction, let the vectors $Z^h_{t,i}$ and $Z^g_{t,i}$ represent the observed and GCM outputs with m variables for month i and year t. The standardized periodic time series, with zero mean and unit variance, is denoted $\hat{Z}^g_{t,i}$. Following Eqs. (3) and (4), the series $Z^g_{t,i}$ that maintains the observed lag-1 serial and cross dependence can be written as:

$$ Z^g_{t,i} = C_i\,Z^g_{t,i-1} + D_i F_i^{-1}\hat{Z}^g_{t,i} - D_i F_i^{-1} E_i\,\hat{Z}^g_{t,i-1} \qquad (5) $$

Here $Z^g_{t,i-1}$ is the value of the corrected time series for the previous month in year t. $C_i$, $D_i$, $E_i$ and $F_i$ are the month-specific counterparts of C, D, E and F. After correction, the time series $Z^g$ is rescaled by the observed mean and standard deviation to give the final corrected time series. Details can be found in Mehrotra and Sharma (2015) and Sarhadi et al. (2016).
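The periodic standardization behind Eq. (5), and the final rescaling by the observed statistics, can be sketched as below. This is an illustrative Python sketch with hypothetical helper names, and it assumes a population standard deviation. The Eq. (5) correction itself, which would act on the standardized series between the two steps, is omitted.

```python
from statistics import mean, pstdev

def monthly_stats(values, months):
    """Mean and population SD of a series for each month label."""
    stats = {}
    for m in set(months):
        subset = [v for v, mm in zip(values, months) if mm == m]
        stats[m] = (mean(subset), pstdev(subset))
    return stats

def standardize(values, months, stats):
    """Periodic standardization: remove month-specific mean and SD."""
    return [(v - stats[m][0]) / stats[m][1] for v, m in zip(values, months)]

def rescale(z, months, obs_stats):
    """Map a standardized series back using the observed monthly mean and SD."""
    return [zi * obs_stats[m][1] + obs_stats[m][0] for zi, m in zip(z, months)]

months = [1, 1, 2, 2]
gcm = [10, 14, 0, 2]
obs = [20, 30, 5, 7]
z = standardize(gcm, months, monthly_stats(gcm, months))
print(z, rescale(z, months, monthly_stats(obs, months)))
```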
Following the monthly corrections, the time series $Z^g$ is aggregated to a seasonal series and the periodic corrections described above are applied again, now indexing over 4 seasons rather than 12 months. This gives $S^g$, where S is the seasonal matrix of simulations, of size p × n/3 for n months. Finally, the series is aggregated to an annual time series whose correlations, standard deviations and means are corrected to form $A^g$. Here A is the matrix of yearly data, of size p × n/12. The corrections at each aggregated time scale are then transferred to the daily time series in a single step (Srikanthan and Pegram 2009):

$$ \bar{Z}^g_{i,j,s,t} = \frac{\bar{Y}^g_{j,s,t}}{Y^g_{j,s,t}} \times \frac{\bar{S}^g_{s,t}}{S^g_{s,t}} \times \frac{\bar{A}^g_{t}}{A^g_{t}} \times Z^g_{i,j,s,t} \qquad (6) $$

Here $\bar{Y}^g_{j,s,t}$, $\bar{S}^g_{s,t}$ and $\bar{A}^g_t$ are the corrected monthly, seasonal and annual values, and $Y^g_{j,s,t}$, $S^g_{s,t}$ and $A^g_t$ are the corresponding aggregated, uncorrected values. The subscript i stands for day, j for month, s for season and t for year.
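A hypothetical numerical illustration of Eq. (6): suppose a month is corrected from 50 to 55 mm, its season is unchanged, and its year is corrected from 600 to 480 mm. A 10 mm day then becomes 10 × 1.1 × 1.0 × 0.8 = 8.8 mm. The small Python sketch below applies the same factors. The function name is hypothetical, and the zero-aggregate guard is an added assumption: a month, season or year with no rainfall makes the ratio undefined, and the text does not say how such cases are handled.

```python
def nested_daily_correction(z_daily, y_corr, y_raw, s_corr, s_raw, a_corr, a_raw):
    """Eq. (6): scale a daily value by corrected/uncorrected ratios of the
    monthly (Y), seasonal (S) and annual (A) aggregates containing that day."""
    for name, raw in (("monthly", y_raw), ("seasonal", s_raw), ("annual", a_raw)):
        if raw == 0:
            raise ValueError(f"{name} aggregate is zero; correction ratio undefined")
    return (y_corr / y_raw) * (s_corr / s_raw) * (a_corr / a_raw) * z_daily

print(nested_daily_correction(10.0, 55.0, 50.0, 150.0, 150.0, 480.0, 600.0))
```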
A three step correction procedure is used to correct biases firstly in the mean, then
the standard deviation and finally the correlations. This ensures that the future climate
change signal is not affected by the bias correction.
2.2 Stepwise procedure followed in the MBC
The initial, univariate steps of the parametric (MRNBC) and quantile-matching (MRQNBC) approaches differ slightly and are described separately below. The multivariate extension follows the same procedure for both. First, calculate the daily mean and standard deviation vectors of all variables of interest for the reanalysis and GCM series. Also calculate the daily lag-0 and lag-1 auto- and cross-correlations across variables, and form the lag-0 and lag-1 correlation matrices. These daily statistics are computed from the data falling within a moving window of pre-specified width (for example, 31 days) centred on the current day of interest.
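The moving-window statistics can be sketched as follows. The sketch makes two assumptions: daily data are stored as 365-day years with leap days dropped, and the window wraps around the year boundary. Neither is confirmed by the text. The package handles leap years through an option in ‘basic.dat’, and the treatment of window edges is not described. Only the mean and SD are shown; the lag-0 and lag-1 correlation matrices would be pooled over the same window. The function name is hypothetical.

```python
from statistics import mean, pstdev

def window_stats(years, day, width=31):
    """Mean and SD for calendar day `day` (0-based), pooling all values within
    a window of `width` days centred on that day, across all years.

    years -- list of years, each a list of 365 daily values (assumed no leap days)
    The window wraps around the year boundary (assumption).
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window is centred")
    half = width // 2
    pooled = []
    for series in years:
        n = len(series)
        for offset in range(-half, half + 1):
            pooled.append(series[(day + offset) % n])
    return mean(pooled), pstdev(pooled)

years = [[float(d) for d in range(365)] for _ in range(2)]
print(window_stats(years, 100))
```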
2.2.1 MRNBC approach
1. Correct the daily raw future climate GCM series of individual variables for bias in
mean by removing the GCM series current climate mean and adding the reanalysis
mean.
2. Subtract the mean of the daily GCM mean-corrected series, then bias-correct the residuals for standard deviation (SD) by dividing by the GCM current climate SD and multiplying by the reanalysis SD. Add the mean back afterwards.
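Steps 1 and 2 can be sketched for a single variable as below. The sketch assumes that `future`, `gcm_current` and `observed` already hold the values falling in the moving window for one calendar day. The function name is hypothetical, and using the population SD is an assumption. Because the anomalies are rescaled about the mean of the shifted future series, any change in the future mean relative to the current GCM climate carries through to the corrected series.

```python
from statistics import mean, pstdev

def mean_sd_correct(future, gcm_current, observed):
    """MRNBC steps 1-2 for one variable and one calendar day/window (sketch)."""
    mu_gc, sd_gc = mean(gcm_current), pstdev(gcm_current)
    mu_obs, sd_obs = mean(observed), pstdev(observed)
    # Step 1: remove the GCM current-climate mean and add the reanalysis mean
    shifted = [x - mu_gc + mu_obs for x in future]
    # Step 2: rescale anomalies about the shifted series' own mean, then add it back
    m = mean(shifted)
    return [(x - m) / sd_gc * sd_obs + m for x in shifted]

cur = [1.0, 2.0, 3.0, 4.0]
obs = [10.0, 12.0, 14.0, 16.0]
print(mean_sd_correct([2.0, 3.0, 4.0, 5.0], cur, obs))  # future = current + 1
```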
2.2.2 MRQNBC approach
1. For each variable separately, fit empirical cumulative distribution functions (CDFs) to the observed data and to the GCM simulated data for the current and future climate.
2. For a given value of the future climate GCM simulation, calculate its cumulative probability. Read the values of the observed CDF and the GCM current climate CDF at this cumulative probability, and calculate the difference between them (observed minus GCM current climate).
3. Obtain the corresponding value for this cumulative probability from the future climate CDF. Apply the difference to this future climate value to obtain the bias-corrected value.
4. Repeat the procedure for every point of the future projection time series to obtain the bias-corrected future projection time series.
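The four steps above amount to an equidistant (difference-based) quantile mapping. A minimal single-variable Python sketch follows. Two choices in it are assumptions, not details taken from the package: linear interpolation between order statistics for the empirical CDFs, and clamping of values outside the future range to probabilities 0 and 1. All function names are hypothetical.

```python
import bisect

def quantile(sorted_vals, p):
    """Empirical quantile by linear interpolation between order statistics."""
    n = len(sorted_vals)
    pos = p * (n - 1)
    lo = int(pos)
    hi = min(lo + 1, n - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def ecdf(sorted_vals, x):
    """Cumulative probability of x, the inverse of quantile() (clamped to [0, 1])."""
    n = len(sorted_vals)
    if x <= sorted_vals[0]:
        return 0.0
    if x >= sorted_vals[-1]:
        return 1.0
    k = bisect.bisect_right(sorted_vals, x) - 1   # sorted_vals[k] <= x < sorted_vals[k + 1]
    frac = (x - sorted_vals[k]) / (sorted_vals[k + 1] - sorted_vals[k])
    return (k + frac) / (n - 1)

def eqm_correct(future, gcm_current, observed):
    fut, cur, obs = sorted(future), sorted(gcm_current), sorted(observed)
    corrected = []
    for x in future:
        p = ecdf(fut, x)                             # step 2: probability in future CDF
        delta = quantile(obs, p) - quantile(cur, p)  # observed minus GCM-current value
        corrected.append(x + delta)                  # step 3: apply to the future value
    return corrected

cur = [1.0, 2.0, 3.0, 4.0, 5.0]
print(eqm_correct([1.5, 2.5, 6.0], cur, [c + 5 for c in cur]))
```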
2.2.3 Correcting for auto and cross dependence
1. Calculate the mean and standard deviation vectors of the daily mean- and SD-corrected GCM series (obtained from step 2 of Sect. 2.2.1 or step 4 of Sect. 2.2.2). Form time series of residuals by subtracting the mean and dividing by the standard deviation. Bias-correct the residuals for day t for lag-1 and lag-0 auto- and cross-correlations; the form of the correction follows a standard multivariate autoregressive model, as discussed in Mehrotra and Sharma (2015).
2. Multiply the bias-corrected residuals by the standard deviation and then add the mean to form the bias-corrected daily time series.
3. Aggregate the daily series to higher (monthly, seasonal and annual) time scales and apply the same steps at these time scales. For the monthly and seasonal time scales, the parameter estimation procedure differs slightly from that used at the daily and annual time scales.
4. Derive weighting factors from the bias correction at each time scale, and multiply the bias-corrected daily time series by these factors to obtain the final bias-corrected time series.
5. Treat the final daily bias-corrected series as a raw GCM daily time series and repeat the above steps multiple times, implementing the recursive scheme described in Mehrotra and Sharma (2012).
2.3 Multiple linear regression
To verify whether the MRNBC method significantly increases the accuracy of the downscaling results, we employed multiple linear regression (MLR), with a training period of 1960–1990 and a validation period of 1991–2005. The MLR models the relationship between the dimensionally reduced atmospheric predictors and the target variable at the local downscaling site by fitting a linear equation at the monthly scale. The MLR model is defined as (Draper and Smith 1981; Montgomery et al. 2012):

$$ Y_i = \beta_0 + \beta_1 X_{1,i} + \beta_2 X_{2,i} + \cdots + \beta_p X_{p,i} + \varepsilon_i \qquad (7) $$

In Eq. (7), $Y_i$ denotes the observed precipitation sub-time series, and X denotes the independent sub-time variable matrices of size p × n. β denotes the regression coefficient matrix of size (p + 1) × n, and $\varepsilon_i$ denotes the difference between the observed and model-expected values.
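Eq. (7) is ordinary least squares. The Python sketch below fits one such equation by solving the normal equations with Gaussian elimination. In the study, the predictors would be the dimensionally reduced monthly atmospheric predictors, and a separate coefficient set could be fitted for each sub-series. The function names are hypothetical, and in practice a numerical library with a QR- or SVD-based least-squares solver would be preferable to the normal equations.

```python
def fit_mlr(X, y):
    """Least-squares estimates [b0, b1, ..., bp] for Eq. (7) (sketch).

    X -- list of n rows, each holding p predictor values
    y -- list of n responses
    """
    A = [[1.0] + [float(v) for v in row] for row in X]  # design matrix with intercept
    k = len(A[0])
    # Augmented normal equations [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(k)]
    for col in range(k):  # forward elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular normal equations (collinear predictors)")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        beta[r] = (M[r][k] - sum(M[r][c] * beta[c] for c in range(r + 1, k))) / M[r][r]
    return beta

def predict(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1 + 2 * a + 3 * b for a, b in X]
print(fit_mlr(X, y))
```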
In this study, the primary identified atmospheric predictors are the monthly precipi-
tation, mean, maximum, and minimum air temperatures, U and V-wind fields, relative
humidity (RHUM), downward longwave radiation flux (DLRF), upward longwave
radiation flux (ULRF), downward shortwave radiation flux (DSRF), upward short-
wave radiation flux (USRF), latent heat flux (LHF) and sensible heat flux (SHF).
Four surface grid points surrounding the downscaling site were chosen for both the GCM and the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) reanalysis data.
3 Study area
The Heihe River is a major inland river in China, located in the central part of the Hexi Corridor. The study area is the headwater region of the Heihe River, the Yingluoxia (YLX) Watershed (Fig. 1). It lies on the northern slope of the Qilian Mountains, covers an area of 10,018 km², and extends from 99° to 101°E and 38° to 39°N. About 90% of the water resources of the Heihe River originate in the YLX Watershed. Water from the watershed supplies more than 1.3 million people and about 266,000 ha of irrigated agricultural land in the middle and lower reaches, and plays a major role in maintaining the stability of the natural ecosystem (Yang et al. 2017a, b). The YLX Watershed is therefore a very important inland area and has attracted much research attention in China. Its climate is characterized by hot, wet summers and cold, dry winters. Annual precipitation decreases from east to west across the region and increases with altitude, from approximately 200 to 700 mm. Detailed descriptions of the YLX Watershed can be found in Cheng et al. (2014).

Fig. 1 a, b Location of the YLX Watershed; c mean monthly rainfall and temperature at the Yeniugou weather station from 1961 to 2013

Table 1 Coupled Model Intercomparison Project phase 5 (CMIP5) model attributes
Model      Modelling centre        Spatial resolution   Data length (Historical / RCP4.5 / RCP8.5)
ACCESS1-3  CSIRO-BOM (Australia)   1.875° × 1.25°       1948–2005 / 2006–2100 / 2006–2100
This study uses two datasets, observed historical weather data and simulated Global Climate Model (GCM) outputs, to correct the bias for the YLX Watershed region over the historical period 1961–2005. The historical weather data were downloaded from the China Meteorological Administration (http://data.cma.cn/) for the period 1961–2005. They come from 4 weather stations in and around the YLX Watershed and include:
- daily maximum, minimum and mean temperature;
- relative humidity (%);
- precipitation (mm);
- wind speed (m/s);
- atmospheric pressure (hPa);
- sunshine duration (h).

The simulated historical daily data for ACCESS1-3 for the same period were acquired from the Coupled Model Intercomparison Project phase 5 (CMIP5) (Table 1). The projected daily data (e.g., daily maximum, minimum and mean temperature, daily relative humidity, wind speed and atmospheric pressure) were acquired for the future period (2006–2100; Table 1).
4 Results
4.1 Bias correction for unequal data lengths
The applicability of the package is demonstrated on three sample datasets included with the package, which are intended to cover a variety of its options:
- the first dataset has unequal lengths of time series for the calibration and verification periods;
- the second has equal lengths of observed and GCM data for the calibration (current) and verification periods;
- the third consists of observed and AR1-model-simulated monthly rainfall.

Here we used the first dataset, which consists of synthetically generated daily time series of 6 variables that closely resemble typical atmospheric variables used in downscaling. We applied the MRQNBC approach to correct these series and to induce the observed spatio-temporal dependence in the simulations.
The first dataset consists of synthetic daily time series of 6 climate variables, mimicking reanalysis (observed) and raw GCM data, with unequal data lengths. Thirty years of daily data (1961–1990) are used for model calibration, whereas another 15 years (1991–2005) are used for model verification. The start and end years are arbitrary and are specified only to account for leap years in the data. The selected model is the quantile-matching multivariate bias correction, with bias correction at daily, monthly, seasonal and annual time scales and three seasons in a year. The LAG0 cross-correlation and LAG1 autocorrelation options are also selected.

Fig. 2 Scatter plots of daily, monthly, seasonal and annual means, standard deviations (SD), and LAG0 and LAG1 auto- and cross-correlations of reanalysis versus raw and bias-corrected GCM data for the calibration and verification periods, using the MRQNBC bias correction approach and dataset 1 (rows: Mean, SD, LAG1-auto, LAG0-cross, LAG1-cross; columns: Raw-calibration, Raw-verification, Bias corrected-calibration, Bias corrected-verification). Points on the plots denote variables. Mean and SD values of all variables are rescaled to lie between −100 and 100
When the bias correction program completes, the package provides four result files with the statistics of:
(1) observed and raw data for calibration;
(2) observed and raw data for verification;
(3) observed and bias-corrected data for calibration;
(4) observed and bias-corrected data for verification.

Scatter plots of the statistics and distribution plots of the raw and bias-corrected time series for the calibration and verification periods are presented in Figs. 2 and 3, respectively. During calibration, the bias correction approach reproduces the statistics of the reanalysis data well at all time scales. It also reproduces the distribution of the variable well at all selected time scales (Fig. 3). Some biases in the statistics remain during the verification period. Although LAG1 cross-correlations and skewness are not modelled explicitly, the bias correction does improve their representation in the corrected time series (Fig. 2).

Fig. 3 Distribution plots of daily, monthly, seasonal and annual time series of reanalysis and raw and bias-corrected GCM data for the calibration and verification periods, for variable 1 of dataset 1 (columns: Raw-calibration, Raw-verification, Bias corrected-calibration, Bias corrected-verification)
4.2 Modelling results comparison
To verify whether the MRNBC method significantly increases the accuracy of the MLR model, we compared the results obtained with and without MRNBC, using a training period of 1960–1990 and a validation period of 1991–2005. We selected ACCESS1-3, which has a relatively high spatial resolution within the GCM ensemble, to test general applicability. Four measures between the modelled and observed results were used to assess the downscaling models: the correlation coefficient (R), the Nash–Sutcliffe efficiency coefficient (NSE), the root mean square error (RMSE) and the mean absolute error (MAE) (Chai and Draxler 2014; Deo et al. 2017; Nash and Sutcliffe 1970). Table 2 shows these performance criteria for ACCESS1-3. Applying MRNBC before the MLR performs better than omitting it. With MRNBC, R is 0.83, NSE is 0.70, RMSE is 22.02 and MAE is 15.85; without it, the values are 0.77, 0.50, 25.25 and 17.76. The MRNBC procedure thus clearly improves the accuracy of the MLR model simulations.

Table 2 Performance of the MLR with and without MRNBC in terms of the correlation coefficient (R), Nash–Sutcliffe coefficient (NSE), root mean square error (RMSE) and mean absolute error (MAE) in the validation period
Method          R      NSE    RMSE    MAE
Without MRNBC   0.77   0.50   25.25   17.76
With MRNBC      0.83   0.70   22.02   15.85

Fig. 4 Empirical cumulative distribution of the reproduced precipitation (mm) in the validation period, with and without MRNBC, against the observations

Table 3 Historical annual precipitation statistics with and without the MRNBC method, compared with observations
Statistic   Observation   With MRNBC   Without MRNBC
STD (mm)    79.48         61.07        25.77
Mean (mm)   525.25        520.45       546.91
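The four scores can be computed as in the short Python sketch below. It uses their standard definitions: Pearson R; NSE = 1 minus the ratio of the sum of squared errors to the sum of squared deviations of the observations from their mean; RMSE; and MAE. The function name is hypothetical, and the study does not describe its own implementation.

```python
import math

def scores(obs, sim):
    """R, NSE, RMSE and MAE of simulated versus observed values."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return {
        "R": cov / (so * ss),                                # Pearson correlation
        "NSE": 1.0 - sse / sum((o - mo) ** 2 for o in obs),  # Nash-Sutcliffe efficiency
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(o - s) for o, s in zip(obs, sim)) / n,
    }

# A constant +1 bias leaves R at 1 but lowers NSE and raises RMSE and MAE
print(scores([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]))
```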
To illustrate the impact of the bias corrections on the GCM-based inputs used as atmospheric predictors, the downscaled results are compared with the observed precipitation for the same period. Figure 4 shows the empirical cumulative distribution of the reproduced precipitation in the validation period. When the MLR is driven by the raw GCM outputs, significant biases remain between the observed and simulated precipitation, especially for high-magnitude extremes. Table 3 compares the historical annual precipitation statistics obtained with and without MRNBC against the observations. With MRNBC, the downscaled mean (520.45 mm) is close to the observed mean (525.25 mm). Its standard deviation (61.07 mm) is also much closer to the observed value (79.48 mm) than the value obtained without MRNBC. Without MRNBC, the mean is higher than observed (546.91 mm) and the standard deviation much lower (25.77 mm).
Fig. 5 Scatter plots of simulated versus observed monthly precipitation (mm) from ACCESS1-3: a without the MRNBC method (fitted line y = 0.6929x + 10.491, R² = 0.6933); b with the MRNBC method (fitted line y = 0.8185x + 6.153, R² = 0.8163)
Fig. 6 Annual variation of projected precipitation (mm) derived from ACCESS1-3 under RCP4.5 and RCP8.5, 2010–2100
Notably, the empirical cumulative distribution of the precipitation projected with the MRNBC method matches that of the observations more closely than the distribution obtained without MRNBC. In particular, applying the MRNBC method to the ACCESS1-3 outputs leads the projected precipitation to follow a distribution similar to that of the observed precipitation and to fit the observed data reasonably well (Fig. 5). Thus, the MRNBC method is able to remove the difference between the observations and the raw GCM simulations, thereby improving the predictive performance of the precipitation projections under the climate change scenarios considered in this paper.
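The correction that MRNBC applies at each time scale can be sketched as follows. This is a minimal, single-time-scale illustration written for this text, not code from the R package: it assumes the multivariate lag-1 autoregressive (MAR(1)) formulation described by Mehrotra and Sharma (2015), in which the standardized GCM series (m variables × T time steps) is transformed so that its means, standard deviations, and lag-0 and lag-1 auto- and cross-correlations match those of the observed (reanalysis) series. The function names are hypothetical, and the recursive nesting across daily, monthly, seasonal and annual scales is omitted.

```python
import numpy as np

def lag_moments(z):
    """Lag-0 and lag-1 moment matrices of a standardized (m x T) series."""
    T = z.shape[1]
    m0 = z @ z.T / T                           # E[z_t z_t^T]
    m1 = z[:, 1:] @ z[:, :-1].T / (T - 1)      # E[z_t z_{t-1}^T]
    return m0, m1

def mar1_params(z):
    """Moment estimates of A, B in the MAR(1) model z_t = A z_{t-1} + B e_t."""
    m0, m1 = lag_moments(z)
    a = m1 @ np.linalg.inv(m0)
    s = m0 - a @ m1.T                          # B B^T = M0 - A M1^T
    b = np.linalg.cholesky(0.5 * (s + s.T))    # symmetrize against round-off
    return a, b

def correct_one_timescale(obs, gcm):
    """Give `gcm` (m x T_g) the mean, std and lag-0/lag-1 correlations of `obs` (m x T_o)."""
    mu_o, sd_o = obs.mean(axis=1, keepdims=True), obs.std(axis=1, keepdims=True)
    mu_g, sd_g = gcm.mean(axis=1, keepdims=True), gcm.std(axis=1, keepdims=True)
    zo, zg = (obs - mu_o) / sd_o, (gcm - mu_g) / sd_g
    a_o, b_o = mar1_params(zo)
    a_g, b_g = mar1_params(zg)
    out = np.empty_like(zg)
    out[:, 0] = zg[:, 0]
    for t in range(1, zg.shape[1]):
        # Recover the GCM's own innovation at step t ...
        e_t = np.linalg.solve(b_g, zg[:, t] - a_g @ zg[:, t - 1])
        # ... and re-impose the observed lag-0 and lag-1 dependence on it.
        out[:, t] = a_o @ out[:, t - 1] + b_o @ e_t
    return out * sd_o + mu_o                   # back to observed mean and std
```

Because the two sets of moments are estimated separately, the GCM and observed series may differ in length. As we understand the full procedure, this step is repeated on series aggregated to each coarser time scale, and the corrections are nested back down to the daily values.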
4.3 Future precipitation projection under ACCESS1-3 scenario
We apply the MRNBC and MLR methods to project future precipitation from the ACCESS1-3 GCM outputs. Figure 6 shows the variation in future precipitation under the RCP4.5 and RCP8.5 warming scenarios from 2010 to 2099. Under RCP4.5, precipitation exhibits an increasing trend, and the same holds for RCP8.5. Compared with RCP4.5, however, RCP8.5 shows a more pronounced variation in future precipitation, with lower precipitation in the near term (2020–2040) and higher precipitation in the long term (2070–2090). Precipitation under RCP4.5 is 570 mm and 630 mm for the near and long term, respectively, and 600 mm and 720 mm under RCP8.5. Compared with the historical precipitation, future precipitation is projected to increase by 8% and 20% for the near and long term, respectively, under RCP4.5, and by 14% and 37% under RCP8.5. Figure 7 shows the future monthly and seasonal evolution of precipitation under the RCP4.5 and RCP8.5 scenarios. Summer (June to August) precipitation clearly dominates, at 367.42 mm and 387.59 mm for the near and long term under RCP4.5 and 373.72 mm and 412.43 mm under RCP8.5, accounting for over 60% of the annual total. It is followed by autumn and spring precipitation, while winter precipitation accounts for less than 10% of the annual total.

Fig. 7 Monthly distribution of projected precipitation by using the MRNBC and MLR methods under RCP4.5 and RCP8.5, for 2020–2050 and 2060–2090 (x-axis: month, 1–12; y-axis: Precipitation (mm))
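The percentage changes quoted above are relative changes, 100 × (future − historical) / historical. As a consistency check, the short sketch below back-calculates the historical mean implied by each reported pair of values; these baselines are derived from rounded figures and are not values reported in this section, and the helper names are hypothetical.

```python
def relative_change(future_mm, historical_mm):
    """Percentage change of a future mean relative to the historical mean."""
    return 100.0 * (future_mm - historical_mm) / historical_mm

def implied_baseline(future_mm, change_pct):
    """Historical mean implied by a future mean and its reported % change."""
    return future_mm / (1.0 + change_pct / 100.0)

# Future means (mm) and reported changes (%) quoted in the text
reported = {
    ("RCP4.5", "near"): (570, 8),
    ("RCP4.5", "long"): (630, 20),
    ("RCP8.5", "near"): (600, 14),
    ("RCP8.5", "long"): (720, 37),
}
for key, (future, pct) in reported.items():
    print(key, round(implied_baseline(future, pct), 1))
```

All four pairs imply a historical mean of roughly 525–528 mm, so the quoted amounts and percentages are mutually consistent.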
Table 4 shows the estimated trend slopes of the ACCESS1-3 derived precipitation projections for different periods and warming scenarios. Under RCP4.5, projected precipitation increases by 0.61 mm/a in the near term, a trend that is not statistically significant, and increases significantly, by 1.30 mm/a, in the long term. Under RCP8.5, precipitation increases significantly in both periods, at 1.29 mm/a in the near term and 3.23 mm/a in the long term. Figure 8 shows the monthly variation of precipitation under the two scenarios and periods, which reveals different patterns of change. Under RCP4.5, the changes in the two periods agree with each other, with an obvious increase in September and a decrease in April. Under RCP8.5, September is again the month with the clearest increase, while the month of decrease differs between 2020–2050 and 2060–2090: May in the near term, and April and August in the long term.
Table 4 Change trends for future precipitation in the near term (2020–2050) and long term (2060–2090) under the RCP4.5 and RCP8.5 scenarios (slope in mm/a)

Period       RCP4.5                 RCP8.5
             Slope   P      Sig.    Slope   P      Sig.
2020–2050    0.61    0.92   –       1.29    2.75   **
2060–2090    1.30    2.31   *       3.23    4.83   **

*Significant at the 0.05 level; **significant at the 0.01 level; – not significant
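The section does not state how the slopes and the P column of Table 4 were computed. A common choice in hydroclimatic trend analysis is Sen's slope with the Mann–Kendall test, and the P values in Table 4 are consistent with Mann–Kendall Z statistics judged against the two-sided critical values 1.96 (0.05 level) and 2.576 (0.01 level); that reading is an assumption. A minimal sketch under that assumption, without tie correction and with hypothetical function names:

```python
import itertools
import math
import statistics

def sens_slope(y):
    """Sen's slope: median of (y[j] - y[i]) / (j - i) over all pairs i < j (unit time step)."""
    pairs = itertools.combinations(range(len(y)), 2)
    return statistics.median((y[j] - y[i]) / (j - i) for i, j in pairs)

def mann_kendall_z(y):
    """Mann-Kendall Z with continuity correction (no adjustment for ties)."""
    n = len(y)
    s = sum((y[j] > y[i]) - (y[j] < y[i])
            for i, j in itertools.combinations(range(n), 2))
    sd = math.sqrt(n * (n - 1) * (2 * n + 5) / 18.0)
    if s > 0:
        return (s - 1) / sd
    if s < 0:
        return (s + 1) / sd
    return 0.0

def significance(z):
    """Table 4 style code: '**' if |Z| > 2.576 (0.01 level), '*' if |Z| > 1.96 (0.05 level)."""
    if abs(z) > 2.576:
        return "**"
    if abs(z) > 1.96:
        return "*"
    return ""

# Applying the thresholds to the P values in Table 4 reproduces its Sig. column
print([significance(z) for z in (0.92, 2.31, 2.75, 4.83)])  # ['', '*', '**', '**']
```

For annual series of 31 years (2020–2050 or 2060–2090), `sens_slope` would return a slope in mm per year, matching the mm/a unit of Table 4.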
Fig. 8 Monthly variation of projected precipitation under RCP4.5 and RCP8.5 during different periods: near term (2020–2050) and long term (2060–2090) (x-axis: month, 1–12; y-axis: Change rate (mm/a))
Although this study improved the downscaling performance of the MLR results and detected the future variation of precipitation in the headwater region of Heihe River, future research should use other models (e.g. Extreme Learning Machine, Artificial Neural Network) to confirm that the combined methods still work. Bayesian model averaging, or an ensemble of bootstrapped models from which confidence intervals are obtained, could also be used to reduce the uncertainty in the downscaled results. The proposed methods could also be applied to the entire Heihe River basin to support future water resource management and decisions.
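The bootstrapped-ensemble idea can be sketched for the MLR downscaling model, Y_i = β0 + β1X1,i + … + βpXp,i + εi: refit the regression on case-resampled data many times and read a percentile interval off the ensemble of predictions. A minimal sketch, assuming ordinary least squares and independent case resampling (serially correlated daily data would call for a block bootstrap instead); the function name and its parameters are hypothetical:

```python
import numpy as np

def bootstrap_mlr_interval(X, y, X_new, n_boot=1000, level=0.95, seed=0):
    """Ensemble mean and percentile interval of MLR predictions from bootstrapped fits.

    X: (n, p) predictors, y: (n,) predictand, X_new: (k, p) predictors to downscale.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])                 # column of ones for beta_0
    A_new = np.column_stack([np.ones(X_new.shape[0]), X_new])
    preds = np.empty((n_boot, X_new.shape[0]))
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)                # resample cases with replacement
        beta, *_ = np.linalg.lstsq(A[rows], y[rows], rcond=None)
        preds[b] = A_new @ beta                          # one ensemble member's prediction
    alpha = 1.0 - level
    lo, hi = np.quantile(preds, [alpha / 2, 1 - alpha / 2], axis=0)
    return preds.mean(axis=0), lo, hi
```

The width of the interval reflects only parameter uncertainty in the regression; residual noise and GCM structural uncertainty would need to be added separately.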
5 Conclusion
The majority of existing bias correction approaches focus on a single variable and consider corrections only over a single time scale of interest, for example daily or monthly. Open-source software in the R statistical computing environment has been developed to provide hassle-free access to multivariate and multi-timescale bias correction alternatives. The software includes the option of running multivariate recursive NBC and two multivariate, timescale-nested approaches based on distribution functions.
This study has documented the performance of Multivariate Recursive Nesting Bias Correction (MRNBC) in removing the discrepancy between the predictors in the simulated GCM and the reanalysis NCEP data, and has assessed the accuracy of projected future precipitation in the headwater region of Heihe River. Outputs of ACCESS1-3, a relatively high spatial resolution GCM from the CMIP5 ESMs, were downscaled for the historical period (1960–2005) and the future period (2010–2100) under the RCP4.5 and RCP8.5 scenarios. The following conclusions can be drawn:
Combining the MRNBC method with MLR downscaling can dramatically improve the performance of the simulated precipitation data. As verified by the statistical score metrics applied to evaluate the results, the developed method appears to be an important statistical tool for correcting the bias between GCM output and reanalysis data, leading to significant improvements in the predictive accuracy of the precipitation projections. Projected precipitation under RCP8.5 exhibits a more significant increasing trend than under the RCP4.5 scenario. Compared with the historical period, the increase in projected precipitation in the headwater region of Heihe River is markedly larger under RCP8.5 than under RCP4.5, at 14% versus 8% for the near term and 37% versus 20% for the long term.
Acknowledgement This study was supported by the national social sciences foundation: water resources
assessment and management system research based on water account in the typical desert oasis of the
Silk Road economic belt (17CGL032), Gansu Provincial key research and development foundation: water
resource assessment and model demonstration in the typical desert oasis of the Silk Road economic belt
(17YF1FA134). The authors would also like to thank the editors and anonymous reviewers for their detailed and constructive comments, which helped to significantly improve the manuscript. The reanalysis data were obtained from the National Center for Environmental Prediction (NCEP) reanalysis provided by the NOAA-CIRES Climate Diagnostics Centre, Boulder, Colorado, USA, from their web site at http://www.cdc.noaa.gov/.
References
Arnell NW, Reynard NS (1996) The effects of climate change due to global warming on river flows in Great
Britain. J Hydrol 183(3–4):397–424
Cayan DR, Maurer EP, Dettinger MD, Tyree M, Hayhoe K (2008) Climate change scenarios for the California region. Clim Change 87(Suppl. 1):21–42. https://doi.org/10.1007/s10584-007-9377-6
Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? Arguments
against avoiding RMSE in the literature. Geosci Model Dev 7:1247–1250
Chen X, Wang D, Chopra M (2013) Constructing comprehensive datasets for understanding human and
climate change impacts on hydrologic cycle. Irrig Drain Syst Eng 2:106. https://doi.org/10.4172/2168-9768.1000106
Cheng G et al (2014) Integrated study of the water–ecosystem–economy in the Heihe River Basin. Natl Sci
Rev 1:413–428
Chiew FHS, McMahon TA (2002) Modelling the impacts of climate change on Australian streamflow.
Hydrol Process 16(6):1235–1245
Colette A, Vautard R, Vrac M (2012) Regional climate downscaling with prior statistical correction of the
global climate forcing. Geophys Res Lett. https://doi.org/10.1029/2012GL052258
Deo RC, Kisi O, Singh VP (2017) Drought forecasting in eastern Australia using multivariate adaptive
regression spline, least square support vector machine and M5Tree model. Atmos Res 184:149–175
Draper N, Smith H (1981) Applied regression analysis. Wiley, New York, p 709
Ines AVM, Hansen JW (2006) Bias correction of daily GCM rainfall for crop simulation studies. Agric For
Meteorol 138:44–53
Maraun D (2012) Nonstationarities of regional climate model biases in European seasonal mean temperature
and precipitation sums. Geophys Res Lett 39:L06706
Mehrotra R, Sharma A (2015) Correcting for systematic biases in multiple raw GCM variables across a
range of timescales. J Hydrol 520:214–223
Mehrotra R, Johnson F, Sharma A (2018) A software toolkit for correcting systematic biases in climate
model simulations. Environ Model Softw 104:130–152
Montgomery DC, Peck EA, Vining GG (2012) Introduction to linear regression analysis. Wiley, London
Nash J, Sutcliffe J (1970) River flow forecasting through conceptual models part I: a discussion of principles.
J Hydrol 10:282–290
Piani C, Haerter JO, Coppola E (2010) Statistical bias correction for daily precipitation in regional climate
models over Europe. Theor Appl Climatol 99:187–192. https://doi.org/10.1007/s00704-009-0134-9
Salas JD (1980) Applied modeling of hydrologic time series. Water Resources Publication, Littleton
Salas JD, Tabios GQ, Bartolini P (1985) Approaches to multivariate modeling of water resources time
series. J Am Water Resour Assoc 21(4):683–708
Sarhadi A, Burn DH, Johnson F, Mehrotra R, Sharma A (2016) Water resources climate change projections
using supervised nonlinear and multivariate soft computing techniques. J Hydrol 536:119–132
Srikanthan R, Pegram GGS (2009) A nested multisite daily rainfall stochastic generation model. J Hydrol
371(1–4):142–153
Wilby RL, Charles SP, Zorita E, Timbal B, Whetton P, Mearns LO (2004) Guidelines for use of climate
scenarios developed from statistical downscaling methods. In: IPCC task group on scenarios for climate
and impact assessment, Geneva, Switzerland
Yang L et al (2017a) Separation of the climatic and land cover impacts on the flow regime changes in two
watersheds of northeastern Tibetan Plateau. Adv Meteorol 2017:15
Yang L et al (2017b) Identifying separate impacts of climate and land use/cover change on hydrological
processes in upper stream of Heihe River, northwest China. Hydrol Process 31:1100–1112
Qiantao Zhu is an assistant professor at the Northwest Institute of Eco-Environment and Resources
(NIEER), CAS, China. His research topics include eco-economy and water resources.
Wenzhi Zhao is a professor at the NIEER, China. His research topics include watershed hydrology, watershed land and water resources, watershed restoration ecology.
Affiliations
Qiantao Zhu1,2 · Wenzhi Zhao1,2
✉ Qiantao Zhu
zhuqiantao@lzb.ac.cn
Wenzhi Zhao
zhaowzh@lzb.ac.cn
1 Northwest Institute of Eco-Environment and Resources, CAS, Donggang West Road
320, Lanzhou 730000, Gansu, China
2 University of the Chinese Academy of Sciences, Beijing 100864, China