Taiwanese Export Trade Forecasting Using Firefly Algorithm Based

Computers & Industrial Engineering 99 (2016) 153161
Contents lists available at ScienceDirect
Computers & Industrial Engineering

journal homepage: www.elsevier.com/locate/caie
Taiwanese export trade forecasting using firefly algorithm based

K-means algorithm and SVR with wavelet transform
R.J. Kuo a,, P.S. Li b
a
Department of Industrial Management, National Taiwan University of Science and Technology, No. 43, Section 4, Kee-Lung Road, Taipei 106, Taiwan
b
AU Optronics Corporation, No. 1 Jhong-Ke Road, Central Taiwan Science Park, Taichung 407, Taiwan
a r t i c l e i n f o a b s t r a c t
Article history: In order to develop a prediction system for export trade value, this study proposes a three-stage forecast-
Received 8 March 2016 ing model which integrates wavelet transform, firefly algorithm-based K-means algorithms and firefly
Received in revised form 9 July 2016 algorithm-based support vector regression (SVR). First, wavelet transform is utilized to reduce the noise
Accepted 12 July 2016
in data preprocessing. Then, the firefly algorithm-based K-means algorithm is employed for cluster anal-
Available online 14 July 2016
ysis. Finally, a forecasting model is built for each cluster individually. For evaluation, this study compares
methods with and without clustering. In addition, both non-wavelet transform and wavelet transform for
Keywords:
data preprocessing are investigated. The experimental results indicate that the forecasting algorithm
Cluster analysis
Firefly algorithm
with both wavelet transform and clustering has better performance. Besides, firefly algorithm-based
Support vector regression SVR outperforms the other algorithms.
Forecasting 2016 Elsevier Ltd. All rights reserved.
Wavelet transform
1. Introduction SVR usually derives more precise results than the other methods.
However, determining the SVR parameters is very important for
Because of Taiwans relatively small domestic market and the establishing the SVR model. Some metaheuristics have been
nations lack of resources, Taiwans economic development counts employed to determine SVR parameters, like genetic algorithm
heavily on international trade. Thus, being able to make forecasts (GA) (Cai, Sheng, & Xiao-bin, 2009; Chen & Wang, 2007; Fang,
about export trade is a very important issue. In recent years, Wang, Qi, & Zheng, 2008; Ju & Wu, 2010), particle swarm optimiza-
because of the innovation of forecasting techniques and improve- tion algorithm (PSO) (Chen & Liu, 2013; Jiansheng & Enhong, 2010;
ment in forecasting accuracy, forecasting methodology is neces- Siamak, Bahrami Jovein, & Ramezanianpour, 2012), differential
sary for enhancing the decision-making process in both industry algorithm (DE) (Cai, Qu, & Li, 2013; Li & Cai, 2008; Pan, Cheng, &
and government. However, according to previous researches on Ding, 2013; Wang, Li, Niu, & Tan, 2012) and firefly algorithm (FA)
export trade forecasting, most forecasting methods usually con- (Kavousi-Fard, Samet, & Marzbani, 2014; Kazem, Sharifi, Hussain,
struct a traditional model or fuzzy time series model as a tool for Saberi, & Hussain, 2013; Xiong, Bao, & Hu, 2014). In addition, a
forecasting future value and data analysis related to export trade two-stage forecasting model was developed based on SVM and
(Tu, Hsien-Lun, & Chi-Chen, 2009; Wang, 2011; Wong, Tu, & metaheuristic which brings a good predictive value than other
Wang, 2010). Therefore, other intelligent methods may be machine learning model (Olatomiwa et al., 2015).
employed in order to obtain greater accuracy in forecasting. This study intends to present a novel three-stage forecasting
Currently, the support vector regression (SVR) model is being model which integrates wavelet transform, FA-based K-means
widely used in many fields to analyze time series forecasting, for algorithms and the FA-based SVR model. Wavelet transform is first
example tourism demand (Chen, 2011; Pai, Hung, & Lin, 2014; applied to reduce the noise in data preprocessing. The FA-based K-
Shahrabi, Hadavandi, & Asadi, 2013), traffic flow (Castro-Neto, means algorithm is then utilized for cluster analysis. Finally, the
Jeong, Jeong, & Han, 2009; Hong, Dong, Chen, & Wei, 2011; Hong, FA- based SVR model is used to develop the forecasting model
Dong, Zheng, & Lai, 2011; Li, Hong, & Kang, 2013) and demand fore- for each cluster individually.
casting (Guanghui, 2012; Lu & Wang, 2010; Wu, 2010). Besides, The remainder of this study is organized as follows. Section 2
presents the SVR model, while the proposed three-stage forecast-
ing model is proposed in Section 3. Section 4 shows the experimen-
Corresponding author. tal results for export forecasting. Finally, the concluding remakes
E-mail addresses: rjkuo@mail.ntust.edu.tw (R.J. Kuo), karenli127@gmail.com are offered in Section 5.
(P.S. Li).
http://dx.doi.org/10.1016/j.cie.2016.07.012
0360-8352/ 2016 Elsevier Ltd. All rights reserved.
154 R.J. Kuo, P.S. Li / Computers & Industrial Engineering 99 (2016) 153161
2. Support vector regression where e is a precision parameter representing the radius of the tube
located around the regression function f x. The region enclosed by
Support vector machine (SVM) is an artificial intelligence the tube is called the e-sensitive zone since the loss function
method which was first developed by Vapnik (1995). SVM is based assumes a zero value in this region and the prediction errors with
on structural risk minimization (SRM) principle which aims to a value smaller than e are not penalized (Lu & Wang, 2010).
minimize an upper bound of generation error; it consists of the Furthermore, w and b are estimated by minimizing the follow-
sum of training error and a confidence interval (Guo, Sun, Li, & ing optimization problem:
Wang, 2008). In Vapnik (1999, 1995) promoted the SVM method 1
to support vector regression (SVR) by using a new type of loss func- Minimize kwk2 3
2
tion called e-insensitive loss function which is used to penalize
yi w; uxi b 6 e
errors as long as they are greater than e; it is assumed that e is Subject to
known beforehand (Guo et al., 2008). SVR is a non-linear kernel-
w; uxi b yi 6 e
based regression method which seeks to locate the best regression In addition, to deal with feasibility issues and to make the
hyperplane with the smallest risk in high dimensional feature method stronger, points forming the e-sensitive band are not elim-
space (Yeh, Huang, & Lee, 2011). inated. Instead, we penalize these points by introducing slack vari-
The SVM model consists of both classification and regression ables ni and ni (Kazem et al., 2013). The SVR minimizes the overall
tasks, it has many advantages, including: a global optimal solution errors as follows:
can be found, the result is a general solution avoiding overtraining,
1 X
and nonlinear solutions can be calculated efficiently due to the Minimize kwk2 C ni ni 4
usage of inner products (Thissen, van Brakel, de Weijer, Melssen, 2 i1
8
& Buydens, 2003). The SVM technique has been used in a range > y
< i w; uxi b 6 e ni
of applications, including financial stock market prediction (Ding,
Subjected to w; uxi b yi 6 e ni
Song, & Zen, 2008; Huang, Nakamori, & Wang, 2005; Kim, 2003; >
:
Tay & Cao, 2001, 2002), electric load and price (Che & Wang, ni; ni P 0 for i 1; . . . ; l
2010; Pai & Hong, 2005; Yan & Chowdhury, 2014) and sales fore- The first term of Eq. (4), having the concept of maximizing the
casting (Wu, 2009). distance of two separated training data, is used to regularize
It is assumed that a set of training data which consists of data weight sizes, penalize large weights and maintain regression func-
points fx1 ; y1 ; . . . ; x ; y , i = 1, 2 , . . . , and xi denotes the space of tion flatness. The second term is to penalize training errors of f x
the input pattern, and yi is the corresponding target value. The and y by using the e-sensitive loss function (Hong, Dong, Chen,
input is first mapped onto an n-dimension feature space by ux, et al., 2011; Hong, Dong, Zheng, et al., 2011). C is a modifying coef-
which is a kernel function, to transform the non-linear input into ficient representing the trade-off between empirical risk and struc-
a linear mode in a high dimensional feature space (Lu, 2014). The ture risk. The optimum value of each parameter can be solved by
goal of SVR is to estimate a function f x that is as accurate as pos- Lagrange with an appropriate modifying coefficient C, band area
sible to get the target value yi for every xi (Guo et al., 2008). The width e and kernel function K (Kao, Chiu, Lu, & Chang, 2013). The
function f is represented using a linear function, as in Eq. (1): general form of the SVR-based regression function can be written
f x w ux b; 1 as Eq. (5):
where f x denotes the forecasting value, w is a vector of weight X

f x; w ai ai Kxi ; x b; 5
coefficients, b is a bias constant, ux is a nonlinear mapping from i1
the input space to the feature space and w ux describes the
dot production in the feature space. In SVR, the problem of nonlin- where ai ; ai are nonzero Lagrangian multipliers and the solution
ear regression in the lower dimension input space (x) is trans- for the dual problem. Any function that meets Mercers condition
formed into a linear regression problem in a high dimension can be used as the kernel function. Although several options for
feature space (Fig. 1(a) and (b)) (Lu & Wang, 2010). the kernel function are available, the most widely used kernel
The robust e-sensitive loss function (Le ) (Fig. 1(c)) given below function is the radial basis function (RBF) which is defined as
is the most commonly used equation: follows:
!
jf x yj e if jf x yj P e kxi xj k2
Le f x; y 2 Kxi ; xj exp ; 6
0 otherwise 2r 2
Fig. 1. Transformation process illustration of an SVR model.

R.J. Kuo, P.S. Li / Computers & Industrial Engineering 99 (2016) 153161 155
where r denotes the width of the RBF. SVR has the best perfor- dilation factor, k is an integer that determines the translation, s0 is a
mance in most forecasting programs as the value of r is set specified fixed dilation step greater than 1 and s0 denotes the loca-
between 0.1 and 0.5 (Cherkassky & Ma, 2004). tion parameter. The wavelet coefficient W w j; k for DWT is defined
as follows:
3. The proposed three-stage forecasting model
1X N1
t
W w j; k x t w k ; 8
2j
j
The framework of the proposed three-stage forecasting model is 22 t0
illustrated in Fig. 2. Firstly, the raw data need to be preprocessed in
where the wavelet coefficient is evaluated at scale s 2 j and loca-
order to delete the missing value or detect the noise by implement-
ing wavelet transform and performing normalization. Secondly, tion c 2 j k to explain the variation of signals at different scales and
FA-based K-means algorithms are built to cluster the data with locations (Partal & Kk, 2006).
similar features; therefore, the best one will be chosen. For the DWT operates two sets of function viewed as high-pass and
next step, the best algorithm is applied to cluster the export trade low-pass filters for wavelet decomposition. The original time series
datasets. Furthermore, the data for each cluster are employed to are passed through high-pass and low-pass filters and separated at
construct the forecasting models, including: GA-SVR, PSO-SVR, different scales. In the decomposition phase, the low-pass filter
FA-SVR and DE-SVR. Finally, this study uses MSE as the perfor- removes the higher frequency components of the signal, while
mance indicator. the high-pass filter picks the remaining parts. Then, the filtered sig-
nals are down-sampled by two and the results are called approxi-
mation coefficients (cA) and detail coefficients (cD). At level 1, the
3.1. Data preprocessing using wavelet transform
cA1 contains the general trend or low frequency components of the
signal f t, and the cD1 is associated with the high frequency com-
Data preprocessing plays a crucial role in data mining. It can
transform the data into a specified format and delete noise or out- ponents of the signal f t (Kao et al., 2013).
liers; the data we get can thereby be more meaningful for analysis. Given a signal X of length N, the DWT consists of log 2 N stages
This study employs wavelet transform to transform the time series at most. The length of each filter is equal to 2N. If n = length(s), the
signals F and G are of length n + 2N 1 and the coefficients cA1 and
data first; data normalization is then implemented.
Wavelet analysis indicates a time-scale representation of the cD1 are of length floor n1 2
N.
signal; it can provide information about both the time and fre-
quency domains (Tan & Pedersen, 2009). At different resolution 3.2. Firefly algorithm-based K-means clustering (FAK)
levels, time series data can be decomposed into some components
by using the wavelet function, which is called the mother wavelet The Firefly Algorithm (FA) is a recent technique of natural inspi-
(Tiwari & Chatterjee, 2010). In addition, wavelet transform can ration that has been utilized for solving nonlinear optimization
detect many properties of a time series that may not be revealed problems. In the FA, only the distance is necessary for the move-
by other signal analysis techniques, such as trends, change points, ment (Senthilnath, Omkar, & Mani, 2011). In Senthilnaths study
self-similarity and discontinuities (Nalley, Adamowski, & Khalil, (2011), the result showed that FA-based K-means algorithm
2012). This step is very important at the beginning of implement- enhanced the performance with PSO-based K-means and artificial
ing forecasting system, since selecting proper input parameters is bee colony-based K-means clustering. The detailed procedure of
the key issue to influence the forecasting accuracy (Mojumder, FA-based K-means algorithm is as follows:
Ong, Chong, Shamshirband, & Abdullah Al, 2016). In the other
way, different feature selection techniques for choosing more sig- Step 1: Parameters setup
nificant data will also reach more accurate prediction easily (Hu,
Bao, Chiong, & Xiong, 2015; Chang, Wu, & Lin, 2016). Furthermore, The parameters include the size of population (N), light absorp-
discrete wavelet transform (DWT) can detect and decompose sig- tion coefficient (c), attractiveness value (b0 ), trade-off constant (a)
nals on all scales, and requires less computation time (Partal & and k clusters.
Kk, 2006). The discrete wavelet transform is as follows (Qi &
Yan, 2009): Step 2: Initialization
!
1 t ks0 s0j Generate N fireflies (X id ) as the clusters centroids, which can be
wj;k t q w ; 7
s0j s0j represented as:
X id zi1 ; . . . ; zij ; . . . zik ; 9

where w is the transforming function called the mother wavelet, t is
the time and s is the translation factor of the wavelet over the time where zij denotes the jth clusters centroids for the ith chromosome
series, s denotes the scale factor, j is an integer that determines the (1 5 i 5 N) and (1 5 j 5 k).
Forecasting
Cluster 1 (FA-SVR)
Wavelet Cluster
Database Normalization analysis
transform Forecasting
(FAK) Cluster 2 (FA-SVR)
Cluster n Forecasting
(FA-SVR)
Fig. 2. The framework of the proposed forecasting model.

Step 3: Fitness calculation other soft computing techniques, FA-based machine learning shows
high estimated accuracy (Soltani, Moghaddam, Karim,
Calculate the fitness value for each particle by using Euclidean Shamshirband, & Sudheer, 2015). The FA is used to determine the
distance. best parameter set within the user-defined number of iterations
for training an optimal SVR forecasting model.
Step 4: Attractiveness FA-SVR is a hybrid method with an integrated firefly algorithm
and support vector regression. FA is able to assist machine learning
In the firefly algorithm, it is assumed that the attractiveness of a model in selection process effectively and can enhance the fore-
firefly is determined by its brightness, which is associated with the casting capability more accurately (Hoang & Pham, 2016). Further-
objective function of the optimization problem. The attractiveness more, FA is a simple, powerful global search technique so that
of a firefly can be formulated as follows: researchers can have promising setting parameters for SVR. The
cr 2ij
detailed FA-SVR procedure is as follows.
b b0 e ; 10
Step 1: Parameters initialization
where b0 is the attractiveness value at r = 0, c is the light absorption
coefficient at the source; and r ij is the Euclidean distance between xi
Initialize the parameters (N, c, b0 , a). The parameters are the
and xj :
size of the population of FA algorithm (N), light absorption coeffi-
v
u d cient (c), attractiveness value (b0 ), trade-off constant (a) and the
uX
r ij kxi xj k t xi;k xj;k 2 ; 11 number of decision variables (D). D is set as 2 due to C and r.
k1
Step 2: Population initialization
Step 5: Firefly movement
Create an N D matrix with uniform probability distribution
The movement of a firefly i, which is attracted to another more random values. The generation method is defined as:
attractive firefly j, is determined by:
xi;j rand highj lowj lowj; 14
cr 2ij 1
xnew
i xold
i be xold
j i
xold a rand ; 12 where i = 1, 2, 3, . . . , N, rand is a random number with a uniform
2
probability distribution; high[j] and low[j] are the upper and lower
where a determines the random behavior of movement. bounds of jth column, respectively. The chromosome x is repre-
sented as x = p1 and p2, where p1 and p2 are C and r, respectively.
Step 6: Euclidean distance calculation
Step 3: Training phase
Calculate the Euclidean distance of all the x data vectors to all
clusters centroids for each firefly. In K-fold cross validation, the training data set is divided into K
subsets. The regression function is built with a given set of param-
Step 7: Assign new cluster eters (C, r), using the K-1 subset as the training set. The fitness
function is defined as the MSE of the 5-fold cross validation
Assign each data vector x to the closest cluster centroid. method on the training dataset, as in Eqs. (15) and (16):
Min f MSEcross validation ; 15
Step 8: New centroids Pl
jai pi j2
MSEcross validation i1 ; 16
Recalculate the cluster centroids vector for each firefly by using: l
1X
Cennew
d xid ; 13 Table 1
nd Average MSE result for B1 dataset.
where Cennew
d is the new centroid vectors of cluster dth, and nd is the Methods
Original Wavelet Clustering
Wavelet-cl
size of cluster d, xid is ith data in cluster d. Models ustering
training 0.000360 0.000270 0.000293 0.000247
GA-SVR
Step 9: Check if the number of iterations is satisfied testing 0.000568 0.000388 0.000485 0.000289
training 0.000355 0.000265 0.000274 0.000248
Stop if the number of iterations is satisfied; otherwise, go back PSO-SVR
testing 0.000515 0.000364 0.000478 0.000293
to Step 3.
training 0.000349 0.000251 0.000273 0.000247
FA-SVR
testing 0.000462 0.000340 0.000424 0.000224
3.3. Forecasting stage
training 0.000468 0.000266 0.000289 0.000246
DE-SVR
This study uses the radial basis function (RBF) for the kernel testing 0.000655 0.000394 0.000449 0.000216
function of the FA- based SVR model. The kernel function is used training 0.000392 0.000294 0.000302 0.000292
to construct a nonlinear decision hyper-surface on the SVR input ARIMA
testing 0.000588 0.000448 0.000475 0.000355
space. It is popular as the accuracy of SVR depends on the choice of
training 0.002444 0.002356 0.000635 0.000572
parameters. The three parameters are regulation constant C, loss BPN
testing 0.002505 0.002153 0.000802 0.000663
function e, and r which is the parameter for the width of RBF. These
parameters are important in regard to the accuracy of SVR forecast- training 0.004415 0.001419 0.002733 0.000609
Regression
ing; this study fixes the e and combines with the metaheuristics testing 0.004509 0.004213 0.002273 0.001890
algorithms for tuning a better parameter set, C and r, to generate Dark gray color means that it is the best method for training data.
the minimum forecasting mean square error (MSE). Comparing with Light gray color means that it is the best method for testing data.
where l is the number of training data samples; ai is the actual value cr 2ij
b b0 e ; 17
and pi is the predicted value.
where b0 is the attractiveness value at r = 0, c is the light absorption
coefficient at the source; and rij is the Euclidean distance between xi
Step 4: Fitness evaluation
and xj :
v
u d
Input the generated populations. Calculate and record the fit- uX
ness values of all individuals by Eqs. (15) and (16). The solution
rij kxi xj k t xi;k xj;k 2 ; 18
k1
with a smaller MSE of the training dataset has a smaller fitness
value; therefore, it has a better chance to survive in the next Step 6: Firefly movement
generation.
The movement of a firefly i, which is attracted to another more
Step 5: Attractiveness attractive firefly j, is determined by:
In the firefly algorithm, it is assumed that the attractiveness of a

firefly is determined by its brightness, which is associated with the
objective function of the optimization problem. The attractiveness Table 4
Average MSE result for European dataset.
of a firefly can be formulated as follows:
Methods Wavelet-cl
Table 2 Models ustering
Average MSE result for B2 dataset.
training 0.006342 0.005961 0.003395 0.002245
Methods Wavelet-cl GA-SVR
Original Wavelet Clustering testing 0.033683 0.011631 0.031525 0.011596
Models ustering training 0.006166 0.006242 0.003365 0.002229
training 0.002107 0.001964 0.001288 0.000862 PSO-SVR
GA-SVR testing 0.030412 0.011521 0.028697 0.010904
testing 0.001913 0.001751 0.001456 0.000809 training 0.006075 0.005978 0.003271 0.002240
training 0.002124 0.001966 0.001247 0.000840 FA-SVR
PSO-SVR testing 0.030150 0.011575 0.028536 0.011457
training 0.002057 0.001886 0.001275 0.000849 DE-SVR
FA-SVR testing 0.030172 0.011820 0.028550 0.011605
training 0.002115 0.001950 0.001471 0.001123 ARIMA
DE-SVR testing 0.037869 0.015136 0.034170 0.012203
training 0.002547 0.002089 0.002287 0.001794 BPN
ARIMA testing 0.038259 0.012917 0.033026 0.015436
training 0.002495 0.002981 0.002257 0.002088 Regression
BPN testing 0.038403 0.012217 0.035361 0.015971
testing 0.003315 0.002572 0.002628 0.001948 Dark gray color means that it is the best method for training data.
training 0.004311 0.002233 0.002920 0.002456 Light gray color means that it is the best method for testing data.
Regression
testing 0.004123 0.002029 0.002559 0.001891
Dark gray color means that it is the best method for training data.
Light gray color means that it is the best method for testing data.
Table 5
Average MSE result for North American dataset.
Table 3
Average MSE result for B3 dataset. Methods
Wavelet-cl
Methods Wavelet-cl ustering
Original Wavelet Clustering Models
Models ustering
training 0.006319 0.006690 0.004601 0.002198
training 0.002290 0.002282 0.000360 0.000150 GA-SVR
GA-SVR testing 0.042741 0.013147 0.025964 0.01295
testing 0.001674 0.001584 0.001240 0.000388
training 0.006522 0.006860 0.004777 0.002162
training 0.002267 0.002233 0.000326 0.000139 PSO-SVR
PSO-SVR testing 0.042350 0.012923 0.025221 0.012554
testing 0.001680 0.001609 0.001136 0.000382
training 0.006124 0.006715 0.004458 0.002158
training 0.002162 0.002091 0.000352 0.000120 FA-SVR
FA-SVR testing 0.042226 0.012992 0.025052 0.01254
testing 0.001642 0.001555 0.001118 0.000355
training 0.002362 0.002073 0.000417 0.000143 training 0.006123 0.006610 0.004479 0.002538
DE-SVR DE-SVR
testing 0.001696 0.001539 0.001132 0.000338 testing 0.042619 0.012921 0.025063 0.01258
ARIMA ARIMA
testing 0.001944 0.001748 0.001441 0.000435 testing 0.044257 0.015092 0.027363 0.017894
BPN BPN
testing 0.002727 0.002552 0.002185 0.000510 testing 0.043799 0.013303 0.026387 0.015306
Regression Regression
testing 0.002445 0.001761 0.002573 0.000563 testing 0.045611 0.016093 0.028969 0.018115
Dark gray color means that it is the best method for training data. Dark gray color means that it is the best method for training data.
Light gray color means that it is the best method for testing data. Light gray color means that it is the best method for testing data.

cr 2ij 1 Computation for Time Series Prediction Minisite website. The first
xnew
i xold
i be x old
j x old
i a rand ; 19
2 one is the exchange rates between US Dollars and UK Pounds (B1),
which is a time series dataset; it shows the mean value of the US
Step 7: Stop criteria
Dollar per British Pound rate, taken each month from January
1981 till July 2005. The second one is the exchange rates between
Check if the specific number of iteration is satisfied to stop; US Dollars and Euros (B2); it shows the mean value of the US Dol-
otherwise, go back to Step 3. lars per Euro rate, taken each month from January 1979 till July
2005. The third one is the Consumer Price Index of Spain (B3); it
Step 8: Testing phase shows the Consumer Price Index (CPI) per month in Spain from
January 1960 till June 2005. The data are managed to use the
The best-performing parameter set is applied to testing model. records of the first five months to predict the value of the sixth
Compute the predicted value for each testing sample according to month.
the trained metaheuristics-based SVR with the best parameter set Three datasets are based on the export trade from Taiwan to
obtained from the training phase in the same cluster. North America (US and Canada), Europe (France, German, Italy
and the UK) and China. Each dataset has monthly records from
4. Experimental results January 2000 to November 2014. Three export trade datasets are
from the Taiwan National Statistics website.
4.1. Data collection
4.2. Computational results
There are three time series benchmark datasets provided by
the Spanish Central Bank from the Evolutionary and Neural K-fold cross-validation (K = 5) is applied to confirm the statisti-
cal independent random process, and every experiment is imple-
Table 6 mented for 30 runs. Tables 16 show the average MSE values of
Average MSE result for Chinese dataset.
training and testing for methods including: original, wavelet, clus-
Methods Wavelet-cl tering and wavelet-clustering process by using FA-SVR. Besides,
Models ustering GA-SVR, PSO-SVR, DE-SVR, ARIMA, BPN and Regression are also
training 0.005013 0.005041 0.001328 0.001166 conducted for comparison purpose. Figs. 38 are the MSE values
GA-SVR of B1, B2, B3, European dataset, North American dataset and Chi-
testing 0.008086 0.006391 0.005277 0.003416
nese dataset, respectively. Besides, the MSE values are lower after
training 0.004973 0.005331 0.001358 0.001003
PSO-SVR the process of clustering and wavelet transform. Furthermore, for
testing 0.008050 0.006372 0.005210 0.003394 B1 dataset, FA-SVR has better results than the other models under
training 0.004697 0.004956 0.001283 0.001096 original, wavelet transform and the clustering processes. However,
FA-SVR
testing 0.007529 0.006363 0.005121 0.003299 ARIMA has a better result than DE-SVR under the original process.
training 0.004690 0.004975 0.001284 0.001283 DE-SVR performs better under wavelet transform with clustering.
DE-SVR For B2 dataset, FA-SVR performs better under the original, wavelet
testing 0.007622 0.006395 0.005149 0.003320
transform and clustering processes. PSO-SVR performs better
training 0.008606 0.007695 0.001849 0.001497 under wavelet transform with clustering. For the B3 dataset, FA-
ARIMA
testing 0.009980 0.008565 0.005880 0.003557 SVR has better results under the original and clustering processes.
training 0.009188 0.006778 0.002049 0.001555 PSO-SVR has better results under wavelet transform, and wavelet
BPN transform with clustering. For the European dataset, FA-SVR has
testing 0.010579 0.008260 0.006774 0.00368
better results under the original and clustering processes. PSO-
training 0.009761 0.007846 0.001868 0.001758
Regression SVR has better results under wavelet transform, and wavelet trans-
testing 0.011166 0.008945 0.005248 0.003781 form with clustering processes. For the American dataset, FA-SVR
Dark gray color means that it is the best method for training data. has better results under the original, clustering and wavelet with
Light gray color means that it is the best method for testing data. clustering processes. DE-SVR has better results under wavelet
Fig. 3. MSE value of each model for B1 dataset.

Fig. 6. MSE value of each model for European dataset.

Fig. 7. MSE value of each model for North American dataset.
Fig. 8. MSE value of each model for Chinese dataset.
transform. For the Chinese dataset, FA-SVR has better results under rithms because its expansive searching technique helps to find
the original, wavelet transform, clustering, and wavelet transform the optimal parameters. According to the experimental results,
with clustering processes. FA-SVR also outperforms ARIMA, BPN and Regression. However,
In fact, the MSE values are very similar among the four meta- some directions are required: simple K-means algorithm is com-
heuristics algorithms. In general, the FA-SVR model has the lowest bined with metaheuristics algorithms to cluster data in this study.
MSE value compared to the other six models, and Regression has In future research, it may be feasible to utilize the time series clus-
the highest MSE value compared to the other six models: B1, B2, tering method for time series dataset. The number of input nodes
B3, European, Northern American and Chinese datasets. influences both the autoregressive structures of time series and
the forecasting performance. Therefore, a technique to systemi-
5. Conclusions cally determine the number of input nodes in SVR will be a poten-
tial direction for future development. In this study, only time series
This study proposes an integrated model of cluster analysis and data are used in the forecasting models. In fact, export trade fore-
metaheuristics- based SVR with wavelet transform to predict the casting is a complex topic to analyze. There are many influential
values of Taiwanese export trade. In order to enhance the accuracy effects, such as the price index of exports, exchange rate and some
of forecasting capability, we not only utilize metaheuristics algo- other financial indices. Furthermore, obtaining more precise
rithms to find the optimal solutions for the SVR model, but also results of forecasting by applying multivariate statistical analysis,
apply wavelet transform and clustering analysis for data prepro- feature selection and measuring the variable weight to forecasting
cessing. Wavelet transform can help to decrease the impact of models is promising.
noise and outliers, and clustering analysis can cluster the data with
similar features; therefore, they can improve the performance. In References
this study, four metaheuristics are applied (GA, PSO, FA and DE)
Cai, L., Qu, S., & Li, X. (2013). Self-adapt evolution SVR in a traffic flow forecasting. In
to optimize the SVRs parameters. The computational results indi- 2013 IEEE international conference on signal processing, communication and
cate that FA usually has better performance than the other algo- computing (ICSPCC) (pp. 15).
Cai, Z.-J., Sheng, L., & Xiao-bin, Z. (2009). Tourism demand forecasting by support Nalley, D., Adamowski, J., & Khalil, B. (2012). Using discrete wavelet transforms to
vector regression and genetic algorithm. In 2nd IEEE international conference on analyze trends in streamflow and precipitation in Quebec and Ontario (1954
computer science and information technology (pp. 144146). 2008). Journal of Hydrology, 475, 204228.
Castro-Neto, M., Jeong, Y.-S., Jeong, M.-K., & Han, L. D. (2009). Online-SVR for short- Olatomiwa, L., Mekhilef, S., Shamshirband, S., Mohammadi, K., Petkovic, D., &
term traffic flow prediction under typical and atypical traffic conditions. Expert Sudheer, C. (2015). A support vector machinefirefly algorithm-based model for
Systems with Applications, 36(3), 61646173. Part 2. global solar radiation prediction. Solar Energy, 115, 632644.
Chang, P.-C., Wu, J.-L., & Lin, J.-J. (2016). A Takagi-Sugeno fuzzy model combined Pai, P.-F., & Hong, W.-C. (2005). Forecasting regional electricity load based on
with a support vector regression for stock trading forecasting. Applied Soft recurrent support vector machines with genetic algorithms. Electric Power
Computing, 38, 831842. Systems Research, 74(3), 417425.
Che, J., & Wang, J. (2010). Short-term electricity prices forecasting based on support Pai, P.-F., Hung, K.-C., & Lin, K.-P. (2014). Tourism demand forecasting using novel
vector regression and Auto-regressive integrated moving average modeling. hybrid system. Expert Systems with Applications, 41(8), 36913702.
Energy Conversion and Management, 51(10), 19111917. Pan, H., Cheng, G., & Ding, J. (2013). Drilling cost prediction based on self-adaptive
Chen, K.-Y. (2011). Combining linear and nonlinear model in forecasting tourism differential evolution and support vector regression. In H. Yin, K. Tang, Y. Gao, F.
demand. Expert Systems with Applications, 38(8), 1036810376. Klawonn, M. Lee, & T. Weise, et al. (Eds.). Intelligent data engineering and
Chen, P.-Y., & Liu, L. (2013). Study on coal logistics demand forecast based on PSO- automated learning IDEAL (Vol. 8206, pp. 6775). Berlin, Heidelberg: Springer.
SVR. In 2013 10th international conference on service systems and service Partal, T., & Kk, M. (2006). Long-term trend analysis using discrete wavelet
management (pp. 130133). components of annual precipitations measurements in Marmara region
Chen, K.-Y., & Wang, C.-H. (2007). Support vector regression with genetic (Turkey). Physics and Chemistry of the Earth, Parts A/B/C, 31(18), 11891200.
algorithms in forecasting tourism demand. Tourism Management, 28(1), Qi, X., & Yan, X. (2009). Forecast of the total volume of import-export trade based on
215226. grey modelling optimized by genetic algorithm. In Third international
Cherkassky, V., & Ma, Y. (2004). Practical selection of SVM parameters and noise symposium on intelligent information technology application (pp. 545547).
estimation for SVM regression. Neural Networks, 17(Mo. 1), 113126. Senthilnath, J., Omkar, S. N., & Mani, V. (2011). Clustering using firefly algorithm:
Ding, Y., Song, X., & Zen, Y. (2008). Forecasting financial condition of Chinese listed Performance study. Swarm and Evolutionary Computation, 1(3), 164171.
companies based on support vector machine. Expert Systems with Applications, Shahrabi, J., Hadavandi, E., & Asadi, S. (2013). Developing a hybrid intelligent model
34(4), 30813089. for forecasting problems: Case study of tourism demand time series.
Fang, S. F., Wang, M. P., Qi, W. H., & Zheng, F. (2008). Hybrid genetic algorithms and Knowledge-Based Systems, 43, 112122.
support vector regression in forecasting atmospheric corrosion of metallic Siamak, S. G., Bahrami Jovein, H., & Ramezanianpour, A. A. (2012). Hybrid support
materials. Computational Materials Science, 44(2), 647655. vector regression Particle swarm optimization for prediction of compressive
Guanghui, W. (2012). Demand forecasting of supply chain based on support vector strength and RCPT of concretes containing metakaolin. Construction and Building
regression method. Procedia Engineering, 29, 280284. Materials, 34, 321329.
Guo, X., Sun, L., Li, G., & Wang, S. (2008). A hybrid wavelet analysis and support Soltani, M., Moghaddam, T. B., Karim, M. R., Shamshirband, S., & Sudheer, C. (2015).
vector machines in forecasting development of manufacturing. Expert Systems Stiffness performance of polyethylene terephthalate modified asphalt mixtures
with Applications, 35(12), 415422. estimation using support vector machine-firefly algorithm. Measurement, 63,
Hoang, N.-D., & Pham, A.-D. (2016). Hybrid artificial intelligence approach based on 232239.
metaheuristic and machine learning for slope stability assessment: A Tan, C., & Pedersen, C. N. S. (2009). Financial time series forecasting using improved
multinational data analysis. Expert Systems with Applications, 46, 6068. wavelet neural network : . Aarhus Universitet, Datalogisk Institut.
Hong, W.-C., Dong, Y., Chen, L.-Y., & Wei, S.-Y. (2011). SVR with hybrid chaotic Tay, F. E. H., & Cao, L. (2001). Application of support vector machines in financial
genetic algorithms for tourism demand forecasting. Applied Soft Computing, 11 time series forecasting. Omega, 29(4), 309317.
(2), 18811890. Tay, F. E. H., & Cao, L. J. (2002). Modified support vector machines in financial time
Hong, W.-C., Dong, Y., Zheng, F., & Lai, C.-Y. (2011). Forecasting urban traffic flow by series forecasting. Neurocomputing, 48(14), 847861.
SVR with continuous ACO. Applied Mathematical Modelling, 35(3), 12821291. Thissen, U., van Brakel, R., de Weijer, A. P., Melssen, W. J., & Buydens, L. M. C. (2003).
Hu, Z., Bao, Y., Chiong, R., & Xiong, T. (2015). Mid-term interval load forecasting Using support vector machines for time series prediction. Chemometrics and
using multi-output support vector regression with a memetic algorithm for Intelligent Laboratory Systems, 69(12), 3549.
feature selection. Energy, 84, 419431. Tiwari, M. K., & Chatterjee, C. (2010). Development of an accurate and reliable
Huang, W., Nakamori, Y., & Wang, S.-Y. (2005). Forecasting stock market movement hourly flood forecasting model using waveletbootstrapANN (WBANN) hybrid
direction with support vector machine. Computers & Operations Research, 32 approach. Journal of Hydrology, 394(34), 458470.
(10), 25132522. Tu, Y.-H., Hsien-Lun, W., & Chi-Chen, W. (2009). An evaluation of comparison
Jiansheng, W., & Enhong, C. (2010). A novel hybrid particle swarm optimization for between multivariate fuzzy time series with traditional time series model for
feature selection and kernel optimization in support vector regression. In 2010 forecasting taiwan export. In 2009 WRI world congress on computer science and
International conference on computational intelligence and security (CIS) information engineering (pp. 462467).
(pp. 189194). Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer-
Ju, Y.-F., & Wu, S.-W. (2010). Village electrical load prediction by genetic algorithm Verlag.
and SVR. In 2010 3rd IEEE international conference on computer science and Vapnik, V. N. (1999). An overview of statistical learning theory. IEEE transactions on
information technology (ICCSIT) (pp. 278281). neural networks, 10(5), 988999.
Kao, L.-J., Chiu, C.-C., Lu, C.-J., & Chang, C.-H. (2013). A hybrid approach by Wang, C.-C. (2011). A comparison study between fuzzy time series model and
integrating wavelet-based feature extraction with MARS and SVR for stock ARIMA model for forecasting Taiwan export. Expert Systems with Applications, 38
index forecasting. Decision Support Systems, 54(3), 12281244. (8), 92969304.
Kavousi-Fard, A., Samet, H., & Marzbani, F. (2014). A new hybrid modified firefly Wang, J., Li, L., Niu, D., & Tan, Z. (2012). An annual load forecasting model based on
algorithm and support vector regression model for accurate short term load support vector regression with differential evolution algorithm. Applied Energy,
forecasting. Expert Systems with Applications, 41(13), 60476056. 94, 6570.
Kazem, A., Sharifi, E., Hussain, F. K., Saberi, M., & Hussain, O. K. (2013). Support Wong, H.-L., Tu, Y.-H., & Wang, C.-C. (2010). Application of fuzzy time series models
vector regression with chaos-based firefly algorithm for stock market price for forecasting the amount of Taiwan export. Expert Systems with Applications,
forecasting. Applied Soft Computing, 13(2), 947958. 37(2), 14651470.
Kim, K.-J. (2003). Financial time series forecasting using support vector machines. Wu, Q. (2009). The forecasting model based on wavelet m-support vector machine.
Neurocomputing, 55(12), 307319. Expert Systems with Applications, 36(4), 76047610.
Li, J., & Cai, Z. (2008). A novel automatic parameters optimization approach based Wu, Q. (2010). Product demand forecasts using wavelet kernel support vector
on differential evolution for support vector regression. In L. Kang, Z. Cai, X. Yan, machine and particle swarm optimization in manufacture system. Journal of
& Y. Liu (Eds.). Advances in computation and intelligence (Vol. 5370, Computational and Applied Mathematics, 233(10), 24812491.
pp. 510519). Berlin, Heidelberg: Springer. Xiong, T., Bao, Y., & Hu, Z. (2014). Multiple-output support vector regression with a
Li, M.-W., Hong, W.-C., & Kang, H.-G. (2013). Urban traffic flow forecasting using firefly algorithm for interval-valued stock price index forecasting. Knowledge-
GaussSVR with cat mapping, cloud model and PSO hybrid algorithm. Based Systems, 55, 87100.
Neurocomputing, 99, 230240. Yan, X., & Chowdhury, N. A. (2014). Mid-term electricity market clearing price
Lu, C.-J. (2014). Sales forecasting of computer products based on variable selection forecasting utilizing hybrid support vector machine and auto-regressive
scheme and support vector regression. Neurocomputing, 128, 491499. moving average with external input. International Journal of Electrical Power &
Lu, C.-J., & Wang, Y.-W. (2010). Combining independent component analysis and Energy Systems, 63, 6470.
growing hierarchical self-organizing maps with support vector regression in Yeh, C.-Y., Huang, C.-W., & Lee, S.-J. (2011). A multiple-kernel support vector
product demand forecasting. International Journal of Production Economics, 128 regression approach for stock market price forecasting. Expert Systems with
(2), 603613. Applications, 38(3), 21772186.
Mojumder, J. C., Ong, H. C., Chong, W. T., Shamshirband, S., & Abdullah Al, M. (2016).
Application of support vector machine for prediction of electrical and thermal
performance in PV/T system. Energy and Buildings, 111, 267277.

Taiwanese Export Trade Forecasting Using Firefly Algorithm Based

Caricato da

Informazioni sul documento

Titolo originale

Copyright

Formati disponibili

Condividi questo documento

Condividi o incorpora il documento

Opzioni di condivisione

Hai trovato utile questo documento?

Questo contenuto è inappropriato?

Copyright:

Formati disponibili

Taiwanese Export Trade Forecasting Using Firefly Algorithm Based

Caricato da

Copyright:

Formati disponibili

Computers & Industrial Engineering 99 (2016) 153161

Contents lists available at ScienceDirect

Computers & Industrial Engineering

Taiwanese export trade forecasting using firefly algorithm based

where f x denotes the forecasting value, w is a vector of weight X

Fig. 1. Transformation process illustration of an SVR model.

X id zi1 ; . . . ; zij ; . . . zik ; 9

Fig. 2. The framework of the proposed forecasting model.

In the firefly algorithm, it is assumed that the attractiveness of a

Fig. 3. MSE value of each model for B1 dataset.

Fig. 4. MSE value of each model for B2 dataset.

Fig. 5. MSE value of each model for B3 dataset.

Fig. 6. MSE value of each model for European dataset.

Fig. 7. MSE value of each model for North American dataset.

Fig. 8. MSE value of each model for Chinese dataset.

Potrebbero piacerti anche