Article info

Keywords: Ridge regression; Genetic algorithm; Real estate market

Abstract

This study considers the real estate appraisal forecasting problem. While there is a great deal of literature about the use of artificial intelligence and multiple linear regression for the problem, there has always been controversy about which one performs better. Noting that this controversy is due to the difficulty of finding proper predictor variables in real estate appraisal, we propose a modified version of ridge regression, i.e., ridge regression coupled with genetic algorithm (GA-Ridge). In order to examine the performance of the proposed method, an experimental study is done for the Korean real estate market, which verifies that GA-Ridge is effective in forecasting real estate appraisal. This study addresses two critical issues regarding the use of ridge regression, i.e., when to use it and how to improve it.
© 2012 Elsevier Ltd. All rights reserved.
1. Introduction

In recent years, interest in the performance of real estate markets and real estate investment trusts (REITs) has grown rapidly, since they are usually required for asset valuation, property tax, insurance estimations, sales transactions, and estate planning. Conventionally, the sales comparison approach has been widely accepted to forecast residential real estate. The sales comparison grid method, however, is often questioned for relying too much on subjective judgments for obtaining reliable and verifiable data (Wiltshaw, 1995). As a consequence, multiple linear regression (MLR) based on related predictors has been considered as a rigorous alternative enhancing predictability of real estate and property value, but it immediately faces criticism on grounds such as nonlinearity within the data, multicollinearity in the predictor variables and the inclusion of outliers in the sample. As is often the case with other financial forecasting problems, this criticism has prompted researchers to resort to artificial neural networks (ANN) as another logical alternative (Ahn, Lee, Oh, & Kim, 2009; Chen & Du, 2009; Dong & Zhou, 2008; Lee, Booth, & Alam, 2005; Lu, 2010; Oh & Han, 2000; Versace, Bhatt, Hinds, & Shiffer, 2004). Follow-up studies observe, however, that neither ANN nor MLR dominates the other, i.e., ANN excels MLR in some cases while MLR excels ANN in other cases (Dehghan, Sattari, Chehreh, & Aliabadi, 2010; Hua, 1996; Nguyen & Cripps, 2001; Worzala, Lenk, & Silva, 1995). In this study, it will be shown that this confusing situation arises from the difficulty of finding proper predictor variables and can be resolved quite successfully by a modified version of ridge regression, i.e., ridge regression coupled with genetic algorithm (GA-Ridge).

Theoretically as well as practically, there has been widespread strong objection to arbitrary use of ridge regression. The main criticisms are twofold. Firstly, though it is well known that ridge regression is effective for the case where the unknown parameters (or the linear coefficients) are known a priori to have small modulus values, it is hard to obtain or implement such prior information. Secondly, blind use of ridge regression is likely to change any nonsignificant predictor variable into a significant one easily. Our study addresses these two critical issues and proposes GA-Ridge as a measure that takes care of both.

The rest of the study is organized as follows. Section 2 discusses the background of this article, including the difficulty of finding proper predictor variables in real estate forecasting. Section 3 is devoted to a detailed description of the proposed GA-Ridge and discusses its effectiveness for handling the two critical issues of ridge regression. In Section 4, GA-Ridge is applied to the Korean real estate market to demonstrate its effectiveness. Lastly, concluding remarks are given in Section 5.
⇑ Corresponding author. Tel.: +82 2 2123 5720; fax: +82 2 364 7807.
E-mail addresses: redsamzang@yonsei.ac.kr (J.J. Ahn), next1219@nate.com (H.W. Byun), johanoh@yonsei.ac.kr (K.J. Oh), tykim@kmu.ac.kr (T.Y. Kim).
1 Tel.: +82 53 580 5533.
0957-4174/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2012.01.183

2. Background

2.1. Predictor variable for real estate forecasting

Forecasting of asset pricing is a major issue in real estate practice (Bourassa, Cantoni, & Hoesli, 2010; Chica-Olmo, 2007;
8370 J.J. Ahn et al. / Expert Systems with Applications 39 (2012) 8369–8379
McCluskey & Anand, 1999; O'Roarty et al., 1997; Peterson & Flanagan, 2009; Tay & Ho, 1994; Wilson, Paris, Ware, & Jenkins, 2002). Property development relies on prediction of expected costs and returns (Allen, Madura, & Springer, 2000; Evans, James, & Collins, 1993; Juan, Shin, & Perng, 2006; McCluskey & Anand, 1999; Pivo & Fisher, 2010). Property and facilities managers need forecasts of supply and demand as well as of cost and return. Funds and investment managers rely on forecasts of the present and future values of real estate in terms of economic activities. In these real estate forecasting problems, there has been a heated controversy over the superiority of ANN over MLR as the proper tool since the use of ANN for residential valuation was first suggested by Jensen (1990). Rossini (1997) seeks to assess the application of ANN and MLR to residential valuation and supports the use of MLR, while Do and Grudnitski (1992) suggest ANN as a superior technique. Worzala et al. (1995) notice that while ANN slightly outperforms MLR in some cases, the difference between the two is insignificant. Hua (1996) and Brooks and Tsolacos (2003) support ANN over MLR with some cautionary notes on predictor variables, and McGreal, Berry, McParland, and Turner (2004) express skepticism about ANN. Noting that ANN is designed mainly for the purpose of modeling any functional relationship (that is, mainly to correct modeling bias or assumptions), ANN is expected to excel MLR when there is significant modeling bias from the linear model, while MLR is expected to excel ANN otherwise. Thus the absence of clear-cut superiority between ANN and MLR implicitly suggests that some source of trouble other than incorrect modeling might exist in real estate appraisal forecasting.

Table 1
Training and testing period for moving window scheme.

Window number   Training period     Testing period
1               1996.07–2004.12     2005.01–2005.06
2               1997.01–2005.06     2005.07–2005.12
3               1997.07–2005.12     2006.01–2006.06
4               1998.01–2006.06     2006.07–2006.12
5               1998.07–2006.12     2007.01–2007.06
6               1999.01–2007.06     2007.07–2007.12
7               1999.07–2007.12     2008.01–2008.06
8               2000.01–2008.06     2008.07–2008.12
9               2000.07–2008.12     2009.01–2009.06
10              2000.01–2009.06     2009.07–2009.12
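The moving window scheme of Table 1 can be generated programmatically. The sketch below is only illustrative: it assumes every window has the same training length (102 months), a six-month testing period, and a six-month shift between windows, which reproduces rows 1 to 9 of Table 1. Row 10 as printed starts its training period at 2000.01, whereas this fixed-length rule yields 2001.01. The function names `add_months` and `moving_windows` are hypothetical helpers, not part of the original study.

```python
from typing import List, Tuple

Month = Tuple[int, int]  # (year, month), month in 1..12


def add_months(ym: Month, n: int) -> Month:
    """Shift a (year, month) pair by n months."""
    idx = ym[0] * 12 + (ym[1] - 1) + n
    return idx // 12, idx % 12 + 1


def moving_windows(start: Month, train_len: int, test_len: int,
                   step: int, count: int) -> List[Tuple[Month, Month, Month, Month]]:
    """Return (train_start, train_end, test_start, test_end) for each window.

    Assumption: fixed-length training window that slides forward by `step` months.
    """
    windows = []
    for w in range(count):
        train_start = add_months(start, w * step)
        train_end = add_months(train_start, train_len - 1)
        test_start = add_months(train_end, 1)
        test_end = add_months(test_start, test_len - 1)
        windows.append((train_start, train_end, test_start, test_end))
    return windows


def fmt(ym: Month) -> str:
    """Format a month the way Table 1 does, e.g. 1996.07."""
    return f"{ym[0]}.{ym[1]:02d}"


windows = moving_windows((1996, 7), train_len=102, test_len=6, step=6, count=10)
for number, (tr0, tr1, te0, te1) in enumerate(windows, start=1):
    print(number, f"{fmt(tr0)}-{fmt(tr1)}", f"{fmt(te0)}-{fmt(te1)}")
```

Each model would then be fitted on the training months of a window and scored only on that window's testing months, so no testing observation is seen during fitting.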
One possible source of trouble is the difficulty of finding significant and reliable predictors, as discussed by several authors. Rossini (1997) noticed that quantitative predictor variables such as past sale price, land area, rooms and year of construction tend to suffer from lack of qualitative measures, while qualitative predictor variables such as building style and environments are frequently rather simplistic and fail to capture sufficient information. Similar observations are made by Brooks and Tsolacos (2003) and McGreal et al. (2004). In particular, Brooks and Tsolacos (2003) noticed that significant predictors depend on the methodology used. These discussions altogether suggest clearly that finding a proper set of predictor variables is hard in real estate appraisal and that it would be highly desirable to take care of this predictor selection problem technically.

2.2. Ridge regression

Ridge regression is known as a very useful tool for alleviating the multicollinearity problem (Walker & Birch, 1988). Its formal formulation is given as one of least squares subject to a specific type of restrictions on the parameters. The standard approach to solve an overdetermined system of linear equations

\[ Y = X\beta \]

is known as linear least squares and seeks to minimize the residual

\[ \| Y - X\beta \|^2 , \]

where $Y$ is an $n \times 1$ vector, $X$ is an $n \times p$ matrix ($n \ge p$), $\beta$ is a $p \times 1$ vector and $\|\cdot\|$ is the Euclidean norm. However, the matrix $X$ may be ill conditioned or singular, yielding a non-unique solution. In order to give preference to a particular solution with desirable properties, a regularization term is included in this minimization:

\[ \| Y - X\beta \|^2 + k \|\beta\|^2 . \]

This regularization improves the conditioning of the problem, thus enabling a numerical solution. An explicit solution, denoted by $\hat{\beta}$, is given by

\[ \hat{\beta} = \hat{\beta}(k) = (X'X + kI)^{-1} X'Y , \tag{1} \]

where $k$ is a positive number. In applications, the interesting values of $k$ usually lie in the range of (0, 1). This procedure is called ridge regression.

It is well known that ridge regression can be regarded as an estimation of $\beta$ from the data subject to prior knowledge that smaller values in modulus of the $\beta$'s are more likely than larger values, and that larger and larger values of the $\beta$'s are more and more unlikely. Thus ridge regression is quite useful when smaller values in modulus of the $\beta$'s are expected more than larger values. In this context, one major drawback of ridge regression is the "unchecked arbitrariness" when it is implemented in practice. Indeed the characteristic effect of the ridge regression procedure is to change any non-significant estimated $\beta$ to a significant estimated $\beta$, and hence it is questionable whether much real improvement can be achieved by such a procedure. Refer to Draper and Smith (1981).

Fig. 1. Moving window scheme.

2.3. Genetic algorithm

GA is a stochastic search technique that can explore large and complicated spaces based on ideas from natural genetics and evolutionary principles (Goldberg, 1989; Holland, 1975; Oh, Kim, & Min, 2005). It has been demonstrated to be effective and robust in searching very large spaces in a wide range of applications (Koza, 1993). GA is particularly suitable for multi-parameter optimization problems with an objective function subject to numerous hard and soft constraints. GA performs the search process in four
stages: (i) initialization, (ii) selection, (iii) crossover, and (iv) mutation (Wong & Tan, 1994). In the initialization stage, a population of genetic structures (known as chromosomes) that are randomly distributed in the solution space is selected as the starting point of the search. After the initialization stage, each chromosome is evaluated using a user-defined fitness function. The goal of the fitness function is to encode numerically the performance of the chromosome. For real-world applications of optimization methods such as GA, the choice of the fitness function is the most critical step. In this paper, GA is employed for finding the optimal k and a proper set of predictors simultaneously.

Table 2
List of predictor variables.
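Because GA is asked to choose both the ridge value k and the predictor subset, it helps to see how the ridge estimate of Eq. (1) responds to each. The following is a minimal sketch, assuming NumPy, no intercept term, and predictors that are already centered and scaled; the function name `ridge_fit`, the `mask` argument, and the synthetic data are illustrative assumptions and do not come from the original study.

```python
import numpy as np


def ridge_fit(X, y, k, mask=None):
    """Ridge coefficients (X'X + kI)^{-1} X'y on the selected columns.

    mask: optional 0/1 sequence of length p; unselected coefficients stay at zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    cols = np.arange(p) if mask is None else np.flatnonzero(mask)
    beta = np.zeros(p)
    if cols.size:
        Xs = X[:, cols]
        A = Xs.T @ Xs + k * np.eye(cols.size)
        # Solve the linear system rather than forming the inverse explicitly.
        beta[cols] = np.linalg.solve(A, Xs.T @ y)
    return beta


# Synthetic example (assumed data): columns 0 and 1 are nearly collinear.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 1e-3 * rng.normal(size=50), rng.normal(size=50)])
y = X @ np.array([1.0, 1.0, 0.5]) + 0.1 * rng.normal(size=50)

beta_ls = ridge_fit(X, y, k=0.0)       # ordinary least squares (k = 0)
beta_ridge = ridge_fit(X, y, k=0.5)    # ridge with k inside (0, 1)
beta_subset = ridge_fit(X, y, k=0.5, mask=[1, 0, 1])  # drop the redundant column

print("least squares:", beta_ls)
print("ridge        :", beta_ridge)
print("ridge, subset:", beta_subset)
```

Increasing k shrinks the coefficient vector toward zero, which stabilizes the two nearly collinear coefficients; the mask shows the second lever, removing a predictor outright.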
3. GA-Ridge algorithm

The procedure of the GA-Ridge algorithm is described in this section. A general multiple linear regression model is represented as follows:

\[ Y_i = D_1 \beta_1 X_{i1} + \cdots + D_p \beta_p X_{ip} + \varepsilon_i , \quad i = 1, 2, \ldots, n, \tag{2} \]
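One way to read Eq. (2) is to treat each D_j as a 0/1 switch that keeps or drops predictor j; this reading is an assumption here, since the definition of D_j is not spelled out at this point. Under that assumption, a chromosome can hold the p switches together with the ridge value k, and the four GA stages of Section 2.3 can be sketched as below. The fitness function (negative validation mean squared error), the tournament selection, the uniform crossover, the mutation settings, and the synthetic data are all illustrative choices, not the operators used in the original experiments.

```python
import numpy as np


def ridge_fit(X, y, k):
    """Closed-form ridge estimate of Eq. (1)."""
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)


def fitness(mask, k, X_tr, y_tr, X_va, y_va):
    """Negative validation MSE of ridge on the selected columns (higher is better)."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return -np.inf  # an empty predictor set is never preferred
    beta = ridge_fit(X_tr[:, cols], y_tr, k)
    resid = y_va - X_va[:, cols] @ beta
    return -float(np.mean(resid ** 2))


def ga_ridge(X_tr, y_tr, X_va, y_va, pop_size=30, generations=40,
             mutation_rate=0.05, seed=0):
    """Search jointly over predictor switches and k; returns (score, mask, k)."""
    rng = np.random.default_rng(seed)
    p = X_tr.shape[1]
    # (i) initialization: random switches and random k in (0, 1)
    masks = rng.integers(0, 2, size=(pop_size, p))
    ks = rng.uniform(1e-6, 1.0, size=pop_size)
    best = (-np.inf, None, None)
    for _ in range(generations):
        scores = np.array([fitness(m, k, X_tr, y_tr, X_va, y_va)
                           for m, k in zip(masks, ks)])
        i = int(np.argmax(scores))
        if scores[i] > best[0]:
            best = (float(scores[i]), masks[i].copy(), float(ks[i]))
        # (ii) selection: binary tournament on fitness
        a = rng.integers(0, pop_size, size=pop_size)
        b = rng.integers(0, pop_size, size=pop_size)
        parents = np.where(scores[a] >= scores[b], a, b)
        mates = rng.permutation(parents)
        # (iii) crossover: uniform on the switches, random blend on k
        take_first = rng.random((pop_size, p)) < 0.5
        child_masks = np.where(take_first, masks[parents], masks[mates])
        w = rng.random(pop_size)
        child_ks = w * ks[parents] + (1.0 - w) * ks[mates]
        # (iv) mutation: flip a few switches, jitter k, keep k inside (0, 1]
        flip = rng.random((pop_size, p)) < mutation_rate
        child_masks = np.where(flip, 1 - child_masks, child_masks)
        child_ks = np.clip(child_ks + rng.normal(0.0, 0.05, pop_size), 1e-6, 1.0)
        masks, ks = child_masks, child_ks
    return best  # best chromosome among all evaluated generations


# Synthetic data (assumed): only the first two of six predictors matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 6))
y = 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=120)
X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]

best_score, best_mask, best_k = ga_ridge(X_tr, y_tr, X_va, y_va)
print("selected switches:", best_mask, "ridge value k:", best_k)
```

Scoring each chromosome on data held out from the fit is what keeps the search from simply rewarding larger predictor sets; in a moving window setting the held-out block would naturally be the most recent part of the training period.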
Table 4
Numerical comparison of four forecasting methods for HSI.
Table 5
p-Values of 6 paired t-tests for four forecasting methods on HSI.
Fig. 4. Performance comparison of ANN and MLR for HRI forecasting during testing period.
Table 7
Significance test for 22 predictor variables when MLR is used for HRI forecasting.
performances of MLR and ANN. Finally the predictors selected by GA-Ridge are examined for their significance at each window in Table 6, which shows that almost all the selected predictors are changed to significant ones in the GA-Ridge method. Again, for editorial purposes, results for windows 1, 2 and 10 are given.

The above comparison studies altogether indicate the following: (i) ANN and MLR are evenly matched. (ii) Pure ridge is improved significantly by GA-Ridge. (iii) GA-Ridge excels the others easily. Note that (i) recommends the use of GA-Ridge (see Remark 1) while (ii) implies that the arbitrariness of pure ridge is checked by GA-Ridge (see Remark 2).

For forecasting analysis of HRI, almost identical steps are followed. Fig. 4 shows that neither ANN nor MLR excels its counterpart uniformly throughout the 10 windows. The significance of the 22 predictor variables is tested for HRI in Table 7, which suggests that most of the predictor variables have weak influence on HRI though some of them show strong significance depending on the window. For forecasting performance comparison, the four forecasting methods are considered again. Fig. 5 depicts the forecasting result when each method is employed during the testing periods of the moving window scheme. Fig. 5 is then summarized numerically by Table 8, which confirms that GA-Ridge is superior to the other methods across the three distance metrics. Again, mean difference tests (or paired t-tests) are done for the 6 pairs out of the four methods from the calculation of MASE, MAE and MAPE. Results of the paired tests in Table 9 verify that the performances of the four methods are significantly different from each other except for the pair (ANN, MLR). Finally the predictors and the ridge value k selected by GA-Ridge are examined for their significance at each window in Table 10, which shows that almost all the selected predictors are changed to significant ones in the GA-Ridge method.

Table 8
Comparison of four forecasting methods for HRI.

Table 9
p-Values of 6 paired t-tests for four forecasting methods on HRI.

5. Concluding remarks

We studied ridge regression as an alternative tool in real estate forecasting, where one usually faces difficulty finding proper predictors. GA-Ridge is proposed here and its performance is examined against other forecasting methods. It is shown that GA is not only successful for real estate forecasting but also nicely settles critical issues in ridge regression. Experimental results are given for justification of GA-Ridge. It is noteworthy from the experimental results that GA-Ridge becomes a perfect solution particularly when a desirable predictor is hard to quantify but might be
decomposed into various other predictors having less influence on response.

Table 10
Significance test for the selected variables when GA-Ridge is used for HRI forecasting.

Selecting variables   Coefficient   p-Value   Ridge value

(a) Window No. 1
X2                    0.0007        0.4126    0.01396
X3                    0.0088        0.0968
X4                    0.1116        0.0107
X5                    0.1686        0.0001
X7                    0.0000        0.2943
X12                   0.0001        0.0036
X14                   0.0019        0.0428
X16                   2.3114        0.0001
X18                   0.0019        0.0009
X19                   0.0000        0.3463
X20                   0.0000        0.0005

(b) Window No. 2
X1                    0.0041        0.0964    0.01531
X3                    0.01289       0.0001
X5                    0.1242        0.0003
X7                    0.0001        0.0453
X10                   0.0228        0.0068
X14                   0.0032        0.0006
X16                   2.4530        0.0001
X18                   0.0017        0.0002
X19                   0.0000        0.1365
X20                   0.0000        0.0001

(c) Window No. 9
X4                    0.0673        0.0183    0.0005
X5                    0.1591        0.0001
X6                    0.2497        0.0339
X7                    0.0001        0.1600
X8                    0.0356        0.0173
X9                    0.0015        0.0465
X14                   0.0012        0.0238
X17                   0.0012        0.1975
X22                   5.75E10       0.0974

Acknowledgment

T. Y. Kim's work was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (MEST) (KRF-2011-0015936).

References

Ahn, J. J., Lee, S. J., Oh, K. J., & Kim, T. Y. (2009). Intelligent forecasting for financial time series subject to structural changes. Intelligent Data Analysis, 13, 151–163.
Ahn, J. J., Oh, K. J., Kim, T. Y., & Kim, D. H. (2011). Usefulness of support vector machine to develop an early warning system for financial crisis. Expert Systems with Applications, 38, 2966–2973.
Allen, M. T., Madura, J., & Springer, T. M. (2000). REIT characteristics and the sensitivity of REIT returns. Journal of Real Estate Finance and Economics, 21, 141–152.
Bourassa, S. C., Cantoni, E., & Hoesli, M. (2010). Predicting house prices with spatial dependence: A comparison of alternative methods. Journal of Real Estate Research, 32, 139–159.
Brooks, C., & Tsolacos, S. (2003). International evidence on the predictability of returns to securitized real estate assets: Econometric models versus neural networks. Journal of Property Research, 20, 133–155.
Chen, W. S., & Du, Y. K. (2009). Using neural networks and data mining techniques for the financial distress prediction model. Expert Systems with Applications, 36, 4075–4086.
Chica-Olmo, J. (2007). Prediction of housing location price by a multivariate spatial method: Cokriging. Journal of Real Estate Research, 29, 92–114.
Dehghan, S., Sattari, G., Chehreh, C. S., & Aliabadi, M. A. (2010). Prediction of uniaxial compressive strength and modulus of elasticity for Travertine samples using regression and artificial neural networks. Mining Science and Technology, 20, 41–46.
Do, A. Q., & Grudnitski, G. (1992). A neural network approach to residential property appraisal. The Real Estate Appraiser, 58, 38–45.
Dong, M., & Zhou, X. S. (2008). Knowledge discovery in corporate events by neural network rule extraction. Applied Intelligence, 29, 129–137.
Draper, N., & Smith, H. (1981). Applied regression analysis. New York: Wiley.
Evans, A., James, H., & Collins, A. (1993). Artificial neural networks: An application to residential valuation in the UK. Journal of Property Valuation & Investment, 11, 195–204.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. New York: Addison-Wesley.
Holland, J. H. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control and artificial intelligence. Cambridge: The MIT Press.
Hua, C. (1996). Residential construction demand forecasting using economic indicators: A comparative study of artificial neural networks and multiple regression. Construction Management and Economics, 14, 125–134.
Hwarng, H. B. (2001). Insights into neural-network forecasting of time series corresponding to ARMA(p, q) structures. Omega, 29, 273–289.
Jang, G. S., Lai, F., Jiang, B. W., Pan, C. C., & Chien, L. H. (1993). Intelligent stock trading system with price trend prediction and reversal recognition using dual-module neural networks. Applied Intelligence, 3, 225–248.
Jensen, D. (1990). Artificial intelligence in computer-assisted mass appraisal. Property Tax Journal, 9, 5–26.
Juan, Y. K., Shin, S. G., & Perng, Y. H. (2006). Decision support for housing customization: A hybrid approach using case-based reasoning and genetic algorithm. Expert Systems with Applications, 31, 83–93.
Koza, J. (1993). Genetic programming. Cambridge: The MIT Press.
Lee, K., Booth, D., & Alam, P. (2005). A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms. Expert Systems with Applications, 29, 1–16.
Lu, C. J. (2010). Integrating independent component analysis-based denoising scheme with neural network for stock price prediction. Expert Systems with Applications, 37, 7056–7064.
McCluskey, W., & Anand, S. (1999). The application of intelligent hybrid techniques of residential properties. Journal of Property Investment & Finance, 17, 218–238.
McGreal, S., Berry, J., McParland, C., & Turner, B. (2004). Urban regeneration, property performance and office markets in Dublin. Journal of Property Investment & Finance, 22, 162–172.
Nguyen, N., & Cripps, A. (2001). Predicting housing value: A comparison of multiple regression analysis and artificial neural networks. Journal of Real Estate Research, 22, 313–336.
Oh, K. J., & Han, I. (2000). Using change-point detection to support artificial neural networks for interest rates forecasting. Expert Systems with Applications, 19, 105–115.
Oh, K. J., Kim, T. Y., & Min, S. (2005). Using genetic algorithm to support portfolio optimization for index fund management. Expert Systems with Applications, 28, 371–379.
O'Roarty, B., Patterson, D., McGreal, W. S., & Adair, A. S. (1997). A case based reasoning approach to the selection of comparable evidence for retail rent determination. Expert Systems with Applications, 12, 417–428.
Peterson, S., & Flanagan, A. B. (2009). Neural network hedonic pricing models in mass real estate appraisal. Journal of Real Estate Research, 31, 148–164.
Pivo, G., & Fisher, J. D. (2010). Income, value and returns in socially responsible office properties. Journal of Real Estate Research, 32, 243–270.
Rossini, P. (1997). Artificial neural networks versus multiple regression in the valuation of residential property. Australian Land Economics Review, 3, 1–12.
Tay, D., & Ho, D. (1994). Intelligent mass appraisal. Journal of Property Tax Assessment and Administration, 1, 5–25.
Versace, M., Bhatt, R., Hinds, O., & Shiffer, M. (2004). Predicting the exchange traded fund DIA with a combination of genetic algorithms and neural networks. Expert Systems with Applications, 27, 417–425.
Walker, E., & Birch, J. B. (1988). Influence measures in ridge regression. Technometrics, 30, 221–227.
Wilson, I. D., Paris, S. D., Ware, J. A., & Jenkins, D. H. (2002). Residential property price time series forecasting with neural networks. Knowledge-Based Systems, 15, 335–341.
Wiltshaw, D. G. (1995). A comment on methodology and valuation. Journal of Property Research, 12, 157–161.
Wong, F., & Tan, C. (1994). Hybrid neural, genetic, and fuzzy systems. In G. J. Deboeck (Ed.), Trading on the edge (pp. 243–261). New York: Wiley.
Worzala, E., Lenk, M., & Silva, A. (1995). An exploration of neural networks and its application to real estate valuation. Journal of Real Estate Research, 32, 185–202.