
Expert Systems with Applications 39 (2012) 8369–8379


Using ridge regression with genetic algorithm to enhance real estate appraisal forecasting

Jae Joon Ahn (a), Hyun Woo Byun (a), Kyong Joo Oh (a,*), Tae Yoon Kim (b)

(a) Department of Information and Industrial Engineering, Yonsei University, 134, Shinchon-Dong, Seodaemun-Gu, Seoul 120-749, South Korea
(b) Department of Statistics, Keimyung University, Daegu 704-701, South Korea

Keywords: Ridge regression; Genetic algorithm; Real estate market

Abstract

This study considers the real estate appraisal forecasting problem. While there is a great deal of literature on the use of artificial intelligence and multiple linear regression for this problem, there has always been controversy about which one performs better. Noting that this controversy stems from the difficulty of finding proper predictor variables in real estate appraisal, we propose a modified version of ridge regression, namely ridge regression coupled with a genetic algorithm (GA-Ridge). To examine the performance of the proposed method, an experimental study is carried out on the Korean real estate market, which verifies that GA-Ridge is effective in forecasting real estate appraisal. This study addresses two critical issues regarding the use of ridge regression, i.e., when to use it and how to improve it.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, interest in the performance of real estate markets and real estate investment trusts (REITs) has grown rapidly, since such forecasts are routinely required for asset valuation, property tax, insurance estimation, sales transactions, and estate planning. Conventionally, the sales comparison approach has been widely accepted for forecasting residential real estate. The sales comparison grid method, however, is often questioned for relying too much on subjective judgments to obtain reliable and verifiable data (Wiltshaw, 1995). As a consequence, multiple linear regression (MLR) based on related predictors has been considered a rigorous alternative for enhancing the predictability of real estate and property values, though it immediately faces criticisms such as nonlinearity within the data, multicollinearity among the predictor variables, and the inclusion of outliers in the sample. As is often the case with other financial forecasting problems, this criticism has prompted researchers to resort to the artificial neural network (ANN) as another logical alternative (Ahn, Lee, Oh, & Kim, 2009; Chen & Du, 2009; Dong & Zhou, 2008; Lee, Booth, & Alam, 2005; Lu, 2010; Oh & Han, 2000; Versace, Bhatt, Hinds, & Shiffer, 2004). Follow-up studies observe, however, that neither ANN nor MLR dominates the other, i.e., ANN excels MLR in some cases while MLR excels ANN in others (Dehghan, Sattari, Chehreh, & Aliabadi, 2010; Hua, 1996; Nguyen & Cripps, 2001; Worzala, Lenk, & Silva, 1995). In this study, it will be shown that this confusing episode arises from the difficulty of finding proper predictor variables and can be resolved quite successfully by a modified version of ridge regression, i.e., ridge regression coupled with a genetic algorithm (GA-Ridge).

Theoretically as well as practically, there has been widespread and strong objection to the arbitrary use of ridge regression. The main criticisms are twofold. Firstly, though it is well known that ridge regression is effective when the unknown parameters (the linear coefficients) are known a priori to have small modulus values, such prior information is hard to obtain or implement. Secondly, blind use of ridge regression is likely to turn any non-significant predictor variable into a significant one. Our study addresses these two critical issues and proposes GA-Ridge as a measure that takes care of both.

The rest of the study is organized as follows. Section 2 discusses the background of this article, in particular the difficulty of finding proper predictor variables in real estate forecasting. Section 3 is devoted to a detailed description of the proposed GA-Ridge and discusses its effectiveness in handling the two critical issues of ridge regression. In Section 4, GA-Ridge is tested on the Korean real estate market to demonstrate its effectiveness. Lastly, concluding remarks are given in Section 5.
* Corresponding author. Tel.: +82 2 2123 5720; fax: +82 2 364 7807.
E-mail addresses: redsamzang@yonsei.ac.kr (J.J. Ahn), next1219@nate.com (H.W. Byun), johanoh@yonsei.ac.kr (K.J. Oh), tykim@kmu.ac.kr (T.Y. Kim).
1 Tel.: +82 53 580 5533.

doi:10.1016/j.eswa.2012.01.183

2. Background

2.1. Predictor variables for real estate forecasting

Forecasting of asset pricing is a major issue in real estate practice (Bourassa, Cantoni, & Hoesli, 2010; Chica-Olmo, 2007; McCluskey & Anand, 1999; O'Roarty et al., 1997; Peterson & Flanagan, 2009; Tay & Ho, 1994; Wilson, Paris, Ware, & Jenkins, 2002). Property development relies on prediction of expected costs and returns (Allen, Madura, & Springer, 2000; Evans, James, & Collins, 1993; Juan, Shin, & Perng, 2006; McCluskey & Anand, 1999; Pivo & Fisher, 2010). Property and facilities managers need forecasts of supply and demand as well as of cost and return. Fund and investment managers rely on forecasts of the present and future values of real estate in terms of economic activities. In these real estate forecasting problems, there has been a heated controversy over the superiority of ANN over MLR as the proper tool ever since the use of ANN for residential valuation was first suggested by Jensen (1990). Rossini (1997) assesses the application of ANN and MLR to residential valuation and supports the use of MLR, while Do and Grudnitski (1992) suggest ANN as a superior technique. Worzala et al. (1995) note that while ANN slightly outperforms MLR in some cases, the difference between the two is insignificant. Hua (1996) and Brooks and Tsolacos (2003) support ANN over MLR with some cautionary notes on predictor variables, and McGreal, Berry, McParland, and Turner (2004) express skepticism about ANN. Noting that ANN is designed mainly to model any functional relationship (that is, to correct modeling bias), ANN is expected to excel MLR when there is significant modeling bias from the linear model, while MLR is expected to excel ANN otherwise. Thus the absence of clear-cut superiority between ANN and MLR implicitly suggests that a source of trouble other than incorrect modeling might exist in real estate appraisal forecasting.

One possible source of trouble is the difficulty of finding significant and reliable predictors, as discussed by several authors. Rossini (1997) noticed that quantitative predictor variables such as past sale price, land area, rooms and year of construction tend to lack qualitative content, while qualitative predictor variables such as building style and environment are frequently rather simplistic and fail to capture sufficient information. Similar observations are made by Brooks and Tsolacos (2003) and McGreal et al. (2004). In particular, Brooks and Tsolacos (2003) noticed that the significant predictors depend on the methodology used. These discussions altogether suggest that finding a proper set of predictor variables is hard in real estate appraisal and that it would be highly desirable to handle this predictor selection problem technically.

Table 1
Training and testing periods for the moving window scheme.

Window number   Training period    Testing period
1               1996.07-2004.12    2005.01-2005.06
2               1997.01-2005.06    2005.07-2005.12
3               1997.07-2005.12    2006.01-2006.06
4               1998.01-2006.06    2006.07-2006.12
5               1998.07-2006.12    2007.01-2007.06
6               1999.01-2007.06    2007.07-2007.12
7               1999.07-2007.12    2008.01-2008.06
8               2000.01-2008.06    2008.07-2008.12
9               2000.07-2008.12    2009.01-2009.06
10              2001.01-2009.06    2009.07-2009.12

Fig. 1. Moving window scheme.

2.2. Ridge regression

Ridge regression is known as a very useful tool for alleviating the multicollinearity problem (Walker & Birch, 1988). Its formal formulation is one of least squares subject to a specific type of restriction on the parameters. The standard approach to solving an overdetermined system of linear equations

Y = Xb

is linear least squares, which seeks to minimize the residual

||Y - Xb||^2,

where Y is an n x 1 vector, X is an n x p matrix (n >= p), b is a p x 1 vector and ||.|| is the Euclidean norm. However, the matrix X may be ill-conditioned or singular, yielding a non-unique solution. In order to give preference to a particular solution with desirable properties, a regularization term is included in the minimization:

||Y - Xb||^2 + k ||b||^2.

This regularization improves the conditioning of the problem, thus enabling a numerical solution. An explicit solution, denoted by b^, is given by

b^ = b^(k) = (X'X + kI)^{-1} X'Y,    (1)

where k is a positive number. In applications, the interesting values of k usually lie in the range (0, 1). This procedure is called ridge regression.

It is well known that ridge regression can be regarded as an estimation of b from the data subject to the prior knowledge that smaller values of the b's in modulus are more likely than larger values, and that larger and larger values of the b's are more and more unlikely. Thus ridge regression is quite useful when smaller values of the b's in modulus are expected more than larger values. In this context, one major drawback of ridge regression is its "unchecked arbitrariness" when implemented in practice. Indeed, the characteristic effect of the ridge regression procedure is to change non-significant estimated b's into significant estimated b's, and hence it is questionable whether much real improvement can be achieved by such a procedure. Refer to Draper and Smith (1981).

2.3. Genetic algorithm

GA is a stochastic search technique that can explore large and complicated spaces based on ideas from natural genetics and evolutionary principles (Goldberg, 1989; Holland, 1975; Oh, Kim, & Min, 2005). It has been demonstrated to be effective and robust in searching very large spaces in a wide range of applications (Koza, 1993). GA is particularly suitable for multi-parameter optimization problems with an objective function subject to numerous hard and soft constraints. GA performs the search process in four stages: (i) initialization, (ii) selection, (iii) crossover, and (iv) mutation (Wong & Tan, 1994). In the initialization stage, a population of genetic structures (known as chromosomes), randomly distributed in the solution space, is selected as the starting point of the search. After the initialization stage, each chromosome is evaluated using a user-defined fitness function, whose goal is to encode numerically the performance of the chromosome. For real-world applications of optimization methods such as GA, the choice of the fitness function is the most critical step. In this paper, GA is employed to find the optimal k and a proper set of predictors simultaneously.

Table 2
List of predictor variables.

X1  | Note default rate (NDR) | Raw data
X2  | Size of the run of increasing X1 during the latest 12 months | sum_{i=t-11}^{t} (Z1)_i, Z1 = 1 if (X1)_t > (X1)_{t-1}, 0 otherwise
X3  | Change rate of foreign exchange holdings (FEH) | Ratio of the current month to the same month of the last year
X4  | Change rate of money stock | Ratio of the current month to the same month of the last year
X5  | Change rate of producer price index | Ratio of the current month to the same month of the last year
X6  | Change rate of consumer price index | Ratio of the current month to the same month of the last year
X7  | Change rate of balance of trade | Ratio of the current month to the same month of the last year
X8  | Change rate of index of industrial production | Ratio of the current month to the same month of the last year
X9  | Size of the run of decreasing X8 during the latest 12 months | sum_{i=t-11}^{t} (Z8)_i, Z8 = 1 if (X8)_t < (X8)_{t-1}, 0 otherwise
X10 | Change rate of index of producer shipment | Ratio of the current month to the same month of the last year
X11 | Change rate of index of equipment investment | Ratio of the current month to the same month of the last year
X12 | FEH per gross domestic product | FEH/GDP
X13 | Size of the run of decreasing X12 during the latest 12 months | sum_{i=t-11}^{t} (Z12)_i, Z12 = 1 if (X12)_t < (X12)_{t-1}, 0 otherwise
X14 | Size of the run of decreasing monthly change of X12 during the latest 12 months | sum_{i=t-11}^{t} (Z12')_i, Z12' = 1 if the monthly change of X12 decreases at t, 0 otherwise
X15 | Change rate of FEH per GDP | Ratio of the current month to the same month of the last year
X16 | Balance of trade per GDP | BOT/GDP
X17 | Size of the run of increasing X16 during the latest 12 months | sum_{i=t-11}^{t} (Z16)_i, Z16 = 1 if (X16)_t > (X16)_{t-1}, 0 otherwise
X18 | Size of the run of negative X16 during the latest 12 months | sum_{i=t-11}^{t} (Z16')_i, Z16' = 1 if (X16)_t < 0, 0 otherwise
X19 | Balance of payments (direct investment) | Raw data
X20 | Balance of payments (securities investment) | Raw data
X21 | Other balance of payments | Raw data
X22 | Amount of foreigners' investment in the stock market | Raw data
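To make the closed-form estimator of Eq. (1) in Section 2.2 concrete, it can be sketched in a few lines of Python with NumPy. This is an illustration, not the paper's code: the near-collinear design matrix below is randomly generated to mimic the multicollinearity setting that motivates ridge regression.

```python
import numpy as np

def ridge_estimate(X, Y, k):
    """Ridge estimator b^(k) = (X'X + kI)^{-1} X'Y of Eq. (1); k = 0 gives OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ Y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
X[:, 3] = X[:, 2] + 1e-6 * rng.normal(size=50)  # two nearly collinear columns
Y = X @ np.array([0.5, -0.3, 0.2, 0.1]) + 0.1 * rng.normal(size=50)

b_ols = ridge_estimate(X, Y, 0.0)    # ordinary least squares (k = 0)
b_ridge = ridge_estimate(X, Y, 0.5)  # k in (0, 1), as in Section 2.2
print(np.linalg.norm(b_ols), np.linalg.norm(b_ridge))
```

For k > 0 the ridge solution is shrunk toward zero, so its Euclidean norm is smaller than that of the least squares solution; with the near-collinear columns above, the OLS coefficients are unstable while the ridge coefficients remain moderate.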

3. GA-Ridge algorithm

The procedure of the GA-Ridge algorithm is described as follows. A general multiple linear regression model is represented as

Y_i = D_1 b_1 X_{i1} + ... + D_p b_p X_{ip} + e_i,  i = 1, 2, ..., n,    (2)

where D_j (j = 1, ..., p) equals either 0 or 1 according to the inclusion of predictor variable X_j (j = 1, ..., p) in the model (2). Then, over the numerous combinations of (D_1, D_2, ..., D_p) and various values of 0 < k < 1, the optimum (D_1*, D_2*, ..., D_p*, k*) is searched for by GA, i.e.,

(D_1*, D_2*, ..., D_p*, k*) = argmin_{D_1, D_2, ..., D_p, k} sum_{t=1}^{n} [Y_t - Y^_t(D_1, D_2, ..., D_p, k)]^2
                            = argmin_{D_1, D_2, ..., D_p, k} SSE(D_1, D_2, ..., D_p, k),    (3)

where Y^_t(D_1, D_2, ..., D_p, k) is the t-th component of X(D_1, D_2, ..., D_p) b^(D_1, D_2, ..., D_p, k), X(D_1, D_2, ..., D_p) is an n x q matrix with q <= p, and b^(D_1, D_2, ..., D_p, k) = [X(D_1, ..., D_p)' X(D_1, ..., D_p) + kI]^{-1} X(D_1, ..., D_p)' Y.

Thus the final GA-Ridge regression estimator is

Y^_GA = X(D_1*, D_2*, ..., D_p*) b^(D_1*, D_2*, ..., D_p*, k*).    (4)

Fig. 2. Performance comparison of ANN and MLR for HSI forecasting during evaluation period.
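A minimal sketch of the search in Eq. (3), assuming a simple generational GA: each chromosome is the p inclusion bits D plus one real-valued ridge parameter k in (0, 1), and the fitness is the SSE of the corresponding ridge fit. The population size, truncation selection, crossover and mutation rates, seed, and synthetic data are illustrative assumptions of this sketch, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)

def sse(D, k, X, Y):
    """SSE(D_1, ..., D_p, k) of Eq. (3) for one chromosome."""
    idx = np.flatnonzero(D)
    if idx.size == 0:
        return np.sum(Y ** 2)          # empty model: residual is Y itself
    Xs = X[:, idx]
    b = np.linalg.solve(Xs.T @ Xs + k * np.eye(idx.size), Xs.T @ Y)
    return np.sum((Y - Xs @ b) ** 2)

def ga_ridge(X, Y, pop=30, gens=40, pm=0.1):
    n, p = X.shape
    Ds = rng.integers(0, 2, size=(pop, p))       # inclusion bits D
    ks = rng.uniform(0.01, 0.99, size=pop)       # ridge values k in (0, 1)
    for _ in range(gens):
        fit = np.array([sse(D, k, X, Y) for D, k in zip(Ds, ks)])
        order = np.argsort(fit)                  # truncation selection:
        Ds, ks = Ds[order][: pop // 2], ks[order][: pop // 2]  # keep better half
        child_D, child_k = [], []
        while len(child_D) < pop // 2:           # one-point crossover of parents
            i, j = rng.integers(0, pop // 2, size=2)
            cut = rng.integers(1, p)
            c = np.concatenate([Ds[i][:cut], Ds[j][cut:]])
            if rng.random() < pm:                # mutation: flip one inclusion bit
                c[rng.integers(0, p)] ^= 1
            child_D.append(c)
            child_k.append(np.clip((ks[i] + ks[j]) / 2 + pm * rng.normal(),
                                   0.01, 0.99))  # blend + perturb k
        Ds = np.vstack([Ds, child_D])
        ks = np.concatenate([ks, child_k])
    fit = np.array([sse(D, k, X, Y) for D, k in zip(Ds, ks)])
    best = np.argmin(fit)
    return Ds[best], ks[best]

# illustrative data: only the first three of eight predictors matter
X = rng.normal(size=(120, 8))
Y = X[:, 0] - 0.5 * X[:, 1] + 0.25 * X[:, 2] + 0.1 * rng.normal(size=120)
D_opt, k_opt = ga_ridge(X, Y)
print(D_opt, round(k_opt, 3))
```

Searching D and k "simultaneously", as Remark 2 below emphasizes, is what distinguishes this from tuning k for a fixed, hand-picked predictor set.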
8372 J.J. Ahn et al. / Expert Systems with Applications 39 (2012) 8369–8379

Remark 1. Ridge regression is preferred when one expects smaller b's in modulus. Note that non-significant b's might be included among the smaller b's in modulus. The problem is that it is usually hard to identify such a situation effectively, because judging the b's to be small in modulus requires a subjective judgment. To resolve this problem, we propose to use ridge regression "when neither ANN nor MLR excels the other significantly". This litmus rule for using ridge regression is based on the fact that the quoted situation arises when smaller b's in modulus are very likely or when reliable predictor variables are hard to find. Note that this litmus rule is desirable because it does not depend much on subjective judgment. Refer to Section 4 for how to implement the litmus rule in practice.

Remark 2. One strong criticism against ridge regression is that b^(k) in (1) arbitrarily changes non-significant estimated b's into significant estimated b's. The estimator b^(D_1*, D_2*, ..., D_p*, k*) of (4) prevents this unchecked arbitrariness effectively, since the optimal k and the optimal predictor variables are searched for "simultaneously". Note that GA plays a key role in the GA-Ridge algorithm because it is particularly suitable for multi-parameter optimization problems with an objective function subject to numerous hard and soft constraints.

Table 3
Significance test for the 22 predictor variables when MLR is used for HSI forecasting.

(a) Window No. 1
Predictor   Coefficient   p-Value
X1          0.0014402     0.5826
X2          0.0010064     0.3458
X3          0.058566      0.0553
X4          0.054909      0.2217
X5          0.082096      0.0757
X6          0.053088      0.655
X7          2.686E-05     0.411
X8          0.017776      0.7411
X9          0.0008066     0.326
X10         0.039748      0.4673
X11         0.012778      0.1496
X12         7.649E-05     0.1309
X13         0.0002201     0.7276
X14         0.003623      0.0005
X15         0.071041      0.0166
X16         1.9552        0.0004
X17         0.0007475     0.396
X18         0.0022285     0.008
X19         1.94E-06      0.1587
X20         4.56E-07      0.358
X21         3.37E-07      0.4
X22         1.54E-09      0.0455

(b) Window No. 10
X1          0.0047141     0.6146
X2          0.0004101     0.6759
X3          0.076859      0.2488
X4          0.094025      0.048
X5          0.11661       0.0647
X6          0.14433       0.3806
X7          4.36E-05      0.46
X8          0.15811       0.0313
X9          0.0012202     0.2552
X10         0.15055       0.0569
X11         0.012823      0.3028
X12         5.252E-05     0.2384
X13         0
X14         0.0009064     0.4315
X15         0.09998       0.1568
X16         0.23245       0.6975
X17         0.0016523     0.0887
X18         0.0006417     0.7232
X19         3.88E-07      0.6875
X20         1.83E-07      0.642
X21         1.13E-07      0.654
X22         2.76E-10      0.5506

4. Empirical studies

4.1. Experimental setting

In this experimental study, forecasting of the home sales index (HSI) and the home rental index (HRI) in the Korean real estate market is considered. These monthly indexes are produced and maintained by KB bank, one of the major banks in Korea, for the purpose of monitoring real estate market movement. In this study the forecasting analysis of HSI and HRI covers the period from July 1996 to December 2009. In order to evaluate the forecasting accuracy of the GA-Ridge algorithm under different experimental situations, a "moving window scheme" is employed. A moving window is a block of time series data of size l comprising a first sub-block of size l1 and a second sub-block of size l2 (i.e., l = l1 + l2); the window moves by l2 each time, and thus each moving
window of size l overlaps the next window by l - l2. Here the latter sub-block of size l2 is held out for evaluation purposes, while the GA-Ridge algorithm is fitted on the former sub-block of size l1. The moving window scheme with 10 windows is illustrated in Table 1 and Fig. 1. Refer also to Jang, Lai, Jiang, Pan, and Chien (1993) and Hwarng (2001).

For experimenting with the GA-Ridge algorithm with monthly HRI or monthly HSI as the predicted variable, the predictor variables used for monitoring economic conditions in Ahn, Oh, Kim, and Kim (2011) are employed as predictors here. The three major economic variables (foreign exchange rates, interest rates, and the stock market index) and key macroeconomic predictors such as GDP and trade balance, together with their derivations, are included to obtain 22 predictors; refer to Table 2. Note that all the predictors are monthly data and were developed for the purpose of monitoring Korean economic conditions by Ahn et al. (2011). What is behind this selection of predictors is that the economic condition itself obviously has a strong influence on the real estate market but is hard to quantify as a single predictor. Thus it is decomposed into the predictors in Table 2 instead.

In order to evaluate the forecasting accuracy, the following three distance metrics (5)-(7) are employed: root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE):

RMSE = sqrt( (1/n) sum_{t=1}^{n} (Y_t - Y^_t)^2 ),    (5)

MAE = (1/n) sum_{t=1}^{n} |Y_t - Y^_t|,    (6)

MAPE = (1/n) sum_{t=1}^{n} ((Y_t - Y^_t) / Y_t)^2 x 100,    (7)

where Y_t and Y^_t are respectively the actual and forecasted values of HSI or HRI at time t. While both MAE and RMSE are simple measures of the discrepancies between the predicted values and the actual observations, MAPE measures the scaled discrepancy at each t.

4.2. Experimental results

Forecasting analysis is done for both HSI and HRI. Since the forecasting analysis results for HRI are quite similar to those for HSI, a detailed forecasting analysis of HSI is given first and a brief summary of the HRI forecasting analysis is given later. As a prior validity check for using GA-Ridge for HSI forecasting, we examined two things. Firstly, we compare the performance of MLR and ANN, which is our
Fig. 3. Predicted vs actual HSI during testing period.

Table 4
Numerical comparison of four forecasting methods for HSI.

Distance metric   GA-Ridge   Multiple regression   Ridge regression   ANN
RMSE              0.0074     0.0104                0.0088             0.0110
MAE               0.0055     0.0086                0.0069             0.0088
MAPE              239.66     304.91                244.72             291.05
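The three metrics of Eqs. (5)-(7), as used to produce the comparison above, can be sketched directly. The series below are illustrative, and Eq. (7) is implemented as printed in this paper, with the relative error squared before scaling by 100.

```python
import math

def rmse(actual, forecast):
    # Eq. (5): square root of the mean squared error
    n = len(actual)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / n)

def mae(actual, forecast):
    # Eq. (6): mean absolute error
    n = len(actual)
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / n

def mape(actual, forecast):
    # Eq. (7): mean of squared relative errors, scaled by 100
    n = len(actual)
    return 100 * sum(((y - f) / y) ** 2 for y, f in zip(actual, forecast)) / n

y = [100.0, 102.0, 101.0, 103.0]      # illustrative actual index values
yhat = [99.0, 103.0, 100.5, 102.0]    # illustrative forecasts
print(rmse(y, yhat), mae(y, yhat), mape(y, yhat))
```

Note that the conventional MAPE uses the absolute rather than squared relative error; the squared form here follows Eq. (7) as reconstructed from the text.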

Table 5
p-Values of 6 paired t-tests for four forecasting methods on HSI.

                      GA-Ridge   Multiple regression   Ridge regression   ANN
(a) MSE
GA-Ridge              -          0.000*                0.000*             0.000*
Multiple regression              -                     0.003*             0.266
Ridge regression                                       -                  0.009*
ANN                                                                       -
(b) MAE
GA-Ridge              -          0.000*                0.001*             0.000*
Multiple regression              -                     0.000*             0.421
Ridge regression                                       -                  0.009*
ANN                                                                       -
(c) MAPE
GA-Ridge              -          0.000*                0.000*             0.001*
Multiple regression              -                     0.011*             0.425
Ridge regression                                       -                  0.043*
ANN                                                                       -

* Significant at 5%.
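The paired t-tests behind Table 5 compare, for each pair of methods, the per-period error series W_j defined in the text. A sketch in stdlib Python only; since the t-distribution CDF is not in the standard library, the two-sided p-value below uses a normal approximation, which is an assumption of this sketch rather than the paper's procedure.

```python
import math

def paired_t(w1, w2):
    """Paired t-test on two per-period error series (e.g. squared errors).

    Returns the t statistic and a two-sided p-value; the p-value uses a
    N(0, 1) approximation to the t distribution, adequate for large n.
    """
    d = [a - b for a, b in zip(w1, w2)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)  # sample variance of differences
    t = mean / math.sqrt(var / n)
    p = math.erfc(abs(t) / math.sqrt(2))             # two-sided tail probability
    return t, p

# illustrative error series: method 1 has consistently smaller errors
w1 = [0.010, 0.012, 0.009, 0.011, 0.010, 0.013, 0.008, 0.012]
w2 = [0.020, 0.022, 0.018, 0.025, 0.021, 0.024, 0.019, 0.023]
t, p = paired_t(w1, w2)
print(round(t, 2), p < 0.05)
```

A significantly negative t here means the first method's errors are smaller on average, which is how entries such as the GA-Ridge row of Table 5 should be read.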

litmus rule for using GA-Ridge (refer to Remark 1). Fig. 2 shows that neither ANN nor MLR excels its counterpart uniformly throughout the 10 windows (recall that our experiments use the moving window scheme with 10 windows described in Fig. 1). Secondly, the significances of the 22 predictors are tested individually when they are employed for MLR at each window (see Table 3). Table 3 shows that most of the predictors are not significant and that the significant predictors found at each window vary. In addition, it shows that the estimated coefficients are close to zero in modulus. For editorial purposes, test results for only windows 1 and 10 are given in Table 3 (the others point to similar conclusions). This is not really surprising: each predictor constituting the current economic condition is expected to have only an indirect influence, though the current economic condition itself evidently has a great influence on HSI. As a result, it seems technically as well as intuitively desirable to employ GA-Ridge as an appropriate method for forecasting on this particular problem.

For forecasting performance comparison with GA-Ridge, three other forecasting methods are considered: MLR, pure ridge regression and ANN. The pure ridge regression method is considered here in order to assess how effectively GA-Ridge resolves the unchecked arbitrariness of ridge regression mentioned in Remark 2. Fig. 3 depicts the forecasting results when each method is employed during the testing periods of the moving window scheme. Note that the testing periods are connected continuously without any time break, starting from January 2005 (refer to Fig. 1). Fig. 3 is summarized numerically by Table 4, which reports the RMSE, MAE and MAPE values from Fig. 3 for evaluating the performances of the four methods. It is easy to see from Table 4 that GA-Ridge is superior to the other methods across the three distance metrics.

To understand things better, mean difference tests (or paired t-tests) are done for the 6 pairs out of the 4 methods. From the calculation of MSE, a set of data W_j = {w_tj = (Y_tj - Y^_tj)^2 : t = 1, ..., n} for j = 1 (GA-Ridge), j = 2 (MLR), j = 3 (pure ridge) and j = 4 (ANN) is obtained, and paired tests are done for the six pairs (W1, W2), (W1, W3), (W1, W4), (W2, W3), (W2, W4) and (W3, W4). A similar procedure is done for MAE and MAPE. The results of these paired tests, given in Table 5, verify that the performances of the four methods are significantly different from each other except for the pair (ANN, MLR). This, together with Fig. 2 and Table 4, confirms the superior performance of GA-Ridge and the insignificant difference between

Table 6
Significance test for the selected variables when GA-Ridge is used for HSI forecasting.

(a) Window No. 1 (ridge value 0.0019)
Selected variable   Coefficient   p-Value
X1                  0.0037        0.0831
X2                  0.0013        0.0747
X3                  0.0381        0.0045
X5                  0.0984        0.0001
X14                 0.0025        0.0042
X15                 0.0473        0.0009
X16                 1.7609        0.0001
X18                 0.0019        0.0001
X22                 1.93E-09      0.0007

(b) Window No. 2 (ridge value 0.0007)
X3                  0.0416        0.0234
X4                  0.0539        0.1198
X5                  0.0767        0.0013
X9                  0.0011        0.0501
X12                 0.0001        0.0192
X14                 0.0021        0.0004
X15                 0.0522        0.0055
X16                 2.1558        0.0001
X18                 0.0017        0.0001
X19                 0.000002      0.187
X20                 0.0000005     0.2578
X22                 9.83E-10      0.1143

(c) Window No. 10 (ridge value 0.0027)
X4                  0.0545        0.0314
X5                  0.1291        0.0001
X6                  0.1397        0.1571
X8                  0.1324        0.0054
X9                  0.0011        0.1278
X10                 0.1152        0.0169
X16                 0.5481        0.2067
X17                 0.0019        0.0372
Fig. 4. Performance comparison of ANN and MLR for HRI forecasting during testing period.

Table 7
Significance test for the 22 predictor variables when MLR is used for HRI forecasting.

(a) Window No. 1
Predictor   Coefficient   p-Value
X1          0.00005       0.9909
X2          0.00204       0.2479
X3          0.06572       0.1896
X4          0.06273       0.3963
X5          0.09897       0.1922
X6          0.05789       0.7676
X7          4.231E-05     0.4324
X8          0.07073       0.4263
X9          0.00073       0.5915
X10         0.02647       0.7689
X11         0.01088       0.4547
X12         0.00011       0.1922
X13         0.00041       0.6975
X14         0.00396       0.0185
X15         0.05529       0.2516
X16         2.65423       0.0031
X17         0.00029       0.8365
X18         0.00357       0.0099
X19         3.072E-06     0.1759
X20         2.065E-06     0.0131
X21         4.27E-07      0.5178
X22         7.82E-10      0.5339

(b) Window No. 10
X1          0.01136       0.2566
X2          7.792E-06     0.994
X3          0.07624       0.2827
X4          0.12016       0.0183
X5          0.07448       0.2648
X6          0.06548       0.7083
X7          6.922E-05     0.2719
X8          0.11270       0.1465
X9          0.00070       0.4835
X10         0.08147       0.3294
X11         0.01323       0.3183
X12         2.074E-05     0.6609
X13         0             0.0000
X14         0.00097       0.4296
X15         0.08063       0.2826
X16         0.04988       0.9376
X17         0.00204       0.0491
X18         0.00170       0.3789
X19         3.71E-07      0.7183
X20         3.33E-07      0.4286
X21         1.21E-07      0.6512
X22         8.52E-11      0.8627

performances of MLR and ANN. Finally, the predictors selected by GA-Ridge are examined for their significance at each window in Table 6, which shows that almost all the selected predictors are turned into significant ones by the GA-Ridge method. Again, for editorial purposes, results for windows 1, 2 and 10 only are given.

The above comparison studies altogether indicate the following: (i) ANN and MLR are equally matched; (ii) pure ridge is improved significantly by GA-Ridge; (iii) GA-Ridge easily excels the others. Note that (i) recommends the use of GA-Ridge (see Remark 1), while (ii) implies that the arbitrariness of pure ridge is checked by GA-Ridge (see Remark 2).

For the forecasting analysis of HRI, almost identical steps are carried out. Fig. 4 shows that neither ANN nor MLR excels its counterpart uniformly throughout the 10 windows. The significances of the 22 predictor variables are tested for HRI in Table 7, which suggests that most of the predictor variables have a weak influence on HRI, though some of them show strong significance depending on the window. For forecasting performance comparison, the four forecasting methods are considered again. Fig. 5 depicts the forecasting results when each method is employed during the testing periods of the moving window scheme. Fig. 5 is summarized numerically by Table 8, which confirms that GA-Ridge is superior to the other methods across the three distance metrics. Again, mean difference tests (or paired t-tests) are done for the 6 pairs out of the four methods from the calculation of MSE, MAE and MAPE. The results of the paired tests in Table 9 verify that the performances of the four methods are significantly different from each other except for the pair (ANN, MLR). Finally, the predictors and the ridge value k selected by GA-Ridge are examined for their significance at each window in Table 10, which shows that almost all the selected predictors are turned into significant ones by the GA-Ridge method.

5. Concluding remarks

We studied ridge regression as an alternative tool in real estate forecasting, where one usually faces difficulty finding proper predictors. GA-Ridge is proposed here and its performance is examined against other forecasting methods. It is shown that GA-Ridge is not only successful for real estate forecasting but also nicely settles critical issues in ridge regression. Experimental results are given as justification of GA-Ridge. It is noteworthy from the experimental results that GA-Ridge becomes a particularly good solution when a desirable predictor is hard to quantify but might be
Fig. 5. Predicted vs actual HRI during testing period.

Table 8
Numerical comparison of four forecasting methods for HRI.

Distance metric   GA-Ridge   Multiple regression   Ridge regression   ANN
RMSE              0.0057     0.0111                0.0070             0.0112
MAE               0.0045     0.0079                0.0051             0.0085
MAPE              131.14     208.97                140.73             242.38

Table 9
p-Values of 6 paired t-tests for four forecasting methods on HRI.

                      GA-Ridge   Multiple regression   Ridge regression   ANN
(a) MSE
GA-Ridge              -          0.003*                0.020*             0.000*
Multiple regression              -                     0.002*             0.470
Ridge regression                                       -                  0.000*
ANN                                                                       -
(b) MAE
GA-Ridge              -          0.000*                0.022*             0.000*
Multiple regression              -                     0.000*             0.298
Ridge regression                                       -                  0.001*
ANN                                                                       -
(c) MAPE
GA-Ridge              -          0.000*                0.018*             0.002*
Multiple regression              -                     0.000*             0.151
Ridge regression                                       -                  0.049*
ANN                                                                       -

* Significant at 5%.

decomposed into various other predictors having less influence on the response.

Table 10
Significance test for the selected variables when GA-Ridge is used for HRI forecasting.

(a) Window No. 1 (ridge value 0.01396)
Selected variable   Coefficient   p-Value
X2                  0.0007        0.4126
X3                  0.0088        0.0968
X4                  0.1116        0.0107
X5                  0.1686        0.0001
X7                  0.0000        0.2943
X12                 0.0001        0.0036
X14                 0.0019        0.0428
X16                 2.3114        0.0001
X18                 0.0019        0.0009
X19                 0.0000        0.3463
X20                 0.0000        0.0005

(b) Window No. 2 (ridge value 0.01531)
X1                  0.0041        0.0964
X3                  0.01289       0.0001
X5                  0.1242        0.0003
X7                  0.0001        0.0453
X10                 0.0228        0.0068
X14                 0.0032        0.0006
X16                 2.4530        0.0001
X18                 0.0017        0.0002
X19                 0.0000        0.1365
X20                 0.0000        0.0001

(c) Window No. 9 (ridge value 0.0005)
X4                  0.0673        0.0183
X5                  0.1591        0.0001
X6                  0.2497        0.0339
X7                  0.0001        0.1600
X8                  0.0356        0.0173
X9                  0.0015        0.0465
X14                 0.0012        0.0238
X17                 0.0012        0.1975
X22                 5.75E-10      0.0974

Acknowledgment

T. Y. Kim's work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (MEST) (KRF-2011-0015936).

References

Ahn, J. J., Lee, S. J., Oh, K. J., & Kim, T. Y. (2009). Intelligent forecasting for financial time series subject to structural changes. Intelligent Data Analysis, 13, 151–163.
Ahn, J. J., Oh, K. J., Kim, T. Y., & Kim, D. H. (2011). Usefulness of support vector machine to develop an early warning system for financial crisis. Expert Systems with Applications, 38, 2966–2973.
Allen, M. T., Madura, J., & Springer, T. M. (2000). REIT characteristics and the sensitivity of REIT returns. Journal of Real Estate Finance and Economics, 21, 141–152.
Bourassa, S. C., Cantoni, E., & Hoesli, M. (2010). Predicting house prices with spatial dependence: A comparison of alternative methods. Journal of Real Estate Research, 32, 139–159.
Brooks, C., & Tsolacos, S. (2003). International evidence on the predictability of returns to securitized real estate assets: Econometric models versus neural networks. Journal of Property Research, 20, 133–155.
Chen, W. S., & Du, Y. K. (2009). Using neural networks and data mining techniques for the financial distress prediction model. Expert Systems with Applications, 36, 4075–4086.
Chica-Olmo, J. (2007). Prediction of housing location price by a multivariate spatial method: Cokriging. Journal of Real Estate Research, 29, 92–114.
Dehghan, S., Sattari, G., Chehreh, C. S., & Aliabadi, M. A. (2010). Prediction of uniaxial compressive strength and modulus of elasticity for Travertine samples using regression and artificial neural networks. Mining Science and Technology, 20, 41–46.
Do, A. Q., & Grudnitski, G. (1992). A neural network approach to residential property appraisal. The Real Estate Appraiser, 58, 38–45.
Dong, M., & Zhou, X. S. (2008). Knowledge discovery in corporate events by neural network rule extraction. Applied Intelligence, 29, 129–137.
Draper, N., & Smith, H. (1981). Applied regression analysis. New York: Wiley.
Evans, A., James, H., & Collins, A. (1993). Artificial neural networks: An application to residential valuation in the UK. Journal of Property Valuation & Investment, 11, 195–204.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. New York: Addison-Wesley.
Holland, J. H. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control and artificial intelligence. Cambridge: The MIT Press.
Hua, C. (1996). Residential construction demand forecasting using economic indicators: A comparative study of artificial neural networks and multiple regression. Construction Management and Economics, 14, 125–134.
Hwarng, H. B. (2001). Insights into neural-network forecasting of time series corresponding to ARMA(p, q) structures. Omega, 29, 273–289.
Jang, G. S., Lai, F., Jiang, B. W., Pan, C. C., & Chien, L. H. (1993). Intelligent stock trading system with price trend prediction and reversal recognition using dual-module neural networks. Applied Intelligence, 3, 225–248.
Jensen, D. (1990). Artificial intelligence in computer-assisted mass appraisal. Property Tax Journal, 9, 5–26.
Juan, Y. K., Shin, S. G., & Perng, Y. H. (2006). Decision support for housing customization: A hybrid approach using case-based reasoning and genetic algorithm. Expert Systems with Applications, 31, 83–93.
Koza, J. (1993). Genetic programming. Cambridge: The MIT Press.
Lee, K., Booth, D., & Alam, P. (2005). A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms. Expert Systems with Applications, 29, 1–16.
Lu, C. J. (2010). Integrating independent component analysis-based denoising scheme with neural network for stock price prediction. Expert Systems with Applications, 37, 7056–7064.
McCluskey, W., & Anand, S. (1999). The application of intelligent hybrid techniques for the mass appraisal of residential properties. Journal of Property Investment & Finance, 17, 218–238.
McGreal, S., Berry, J., McParland, C., & Turner, B. (2004). Urban regeneration, property performance and office markets in Dublin. Journal of Property Investment & Finance, 22, 162–172.
Nguyen, N., & Cripps, A. (2001). Predicting housing value: A comparison of multiple regression analysis and artificial neural networks. Journal of Real Estate Research, 22, 313–336.
Oh, K. J., & Han, I. (2000). Using change-point detection to support artificial neural networks for interest rates forecasting. Expert Systems with Applications, 19, 105–115.
Oh, K. J., Kim, T. Y., & Min, S. (2005). Using genetic algorithm to support portfolio optimization for index fund management. Expert Systems with Applications, 28, 371–379.
O’Roarty, B., Patterson, D., McGreal, W. S., & Adair, A. S. (1997). A case based reasoning approach to the selection of comparable evidence for retail rent determination. Expert Systems with Applications, 12, 417–428.
Peterson, S., & Flanagan, A. B. (2009). Neural network hedonic pricing models in mass real estate appraisal. Journal of Real Estate Research, 31, 148–164.
Pivo, G., & Fisher, J. D. (2010). Income, value and returns in socially responsible office properties. Journal of Real Estate Research, 32, 243–270.
Rossini, P. (1997). Artificial neural networks versus multiple regression in the valuation of residential property. Australian Land Economics Review, 3, 1–12.
Tay, D., & Ho, D. (1994). Intelligent mass appraisal. Journal of Property Tax Assessment and Administration, 1, 5–25.
Versace, M., Bhatt, R., Hinds, O., & Shiffer, M. (2004). Predicting the exchange traded fund DIA with a combination of genetic algorithms and neural networks. Expert Systems with Applications, 27, 417–425.
Walker, E., & Birch, J. B. (1988). Influence measures in ridge regression. Technometrics, 30, 221–227.
Wilson, I. D., Paris, S. D., Ware, J. A., & Jenkins, D. H. (2002). Residential property price time series forecasting with neural networks. Knowledge-Based Systems, 15, 335–341.
Wiltshaw, D. G. (1995). A comment on methodology and valuation. Journal of Property Research, 12, 157–161.
Wong, F., & Tan, C. (1994). Hybrid neural, genetic, and fuzzy systems. In G. J. Deboeck (Ed.), Trading on the edge (pp. 243–261). New York: Wiley.
Worzala, E., Lenk, M., & Silva, A. (1995). An exploration of neural networks and its application to real estate valuation. Journal of Real Estate Research, 32, 185–202.