China

Technological Forecasting & Social Change 130 (2018) 123–134
Big Data analytics for forecasting tourism destination arrivals with the applied Vector Autoregression model

Yuan-Yuan Liu a,b, Fang-Mei Tseng c,⁎, Yi-Heng Tseng c

a School of Management, Guizhou University (The Northern Campus), Huaxi District, 550025 Guiyang City, Guizhou, China
b Faculty of Economics and Business Administration, Vilnius University, Sauletekio al. 9, II Building, 10222 Vilnius, Lithuania
c College of Management, Yuan Ze University, Taiwan
ARTICLE INFO

Keywords: Big Data analytics; Vector Autoregression model; Granger causality; Destination Management and Marketing

ABSTRACT

The prediction of tourist numbers is important for Destination Management and Marketing. While most existing methods rely on well-structured statistical data, using web search queries of the destination to forecast its tourist arrivals is a new way to apply Big Data analytics. However, there are no studies exploring correlation of weather, temperatures, weekends and public holidays with tourism destination arrivals and web search queries of the destination, respectively. This study uses the Vector Autoregressive modeling to examine the Granger causality between actual arrivals of the studied cultural tourism destination and its web search queries, and to explore the correlation mentioned above. The striking result is that weather has no correlation either with actual arrivals of the studied cultural tourism destination, or with its web search queries. Meanwhile, unlike previous researchers who discuss the predictive power of web queries on actual tourism flows, this study emphasizes their reciprocal predictive powers upon each other. The originality of this study is exemplifying the utilization of Big Data analytics in the tourism domain with Big Data datasets, data capture techniques, analytical tools, and analysis results. This study further digs possible reasons for an identified short time lag length (p = 2), to provide insights for Destination Management and Marketing.
⁎ Corresponding author.
E-mail address: fmtseng@saturn.yzu.edu.tw (F.-M. Tseng).
https://doi.org/10.1016/j.techfore.2018.01.018
Received 27 February 2017; Received in revised form 21 December 2017; Accepted 14 January 2018
Available online 02 February 2018
0040-1625/ © 2018 Elsevier Inc. All rights reserved.

1. Introduction

We are in the era of Big Data (Miller, 2008). Big Data is a term that primarily describes datasets that are so large, unstructured, and complex that they require advanced and unique technologies to store, manage, analyze, and visualize (Chen et al., 2012). Since the advent of the World Wide Web, major parts of tourism information processing and transactions are handled electronically (Buhalis, 2006), so numerous travel-related electronic traces are left. These electronic traces include a large variety and a huge amount of tourism information, such as pre-trip planning and information searching, reservation and booking, post-trip experience sharing and recommendation, as well as photo uploading and other social media activities. These large, unstructured, and complex electronic traces become tourism Big Data, which need to be captured and analyzed so as to uncover hidden patterns, correlations and other insights in the tourism domain.

Research on Big Data in tourism has recently proliferated. Utilizing Big Data analysis, Liu et al. (2017) reveal determinants of hotel customer satisfaction in language groups; Xiang et al. (2015) prove the inherent relationship between hotel guest experience and satisfaction, and Zhang et al. (2017) show the development of tourism as a discipline in China by using meta-analysis. Big Data analysis has also been employed to forecast tourist flows. Gunter and Önder (2016) test the ability of 10 traffic indicators of Google Analytics of the Viennese Destination Management Organization (DMO) to predict the tourist arrivals to Vienna by applying big data shrinking methods with VAR modeling. Li et al. (2017) adopt data from search engine queries to forecast tourism demand for Beijing. Meanwhile, Big Data analytics is found in works related to Destination Management and Marketing (DMM). Fuchs et al. (2014) illustrate how Big Data analytics is applied at a Swedish mountain tourism destination for its knowledge infrastructure to create and apply knowledge for the destination as a learning organization. Marine-Roig and Anton Clavé (2015) highlight the application of Big Data analytics in supporting Barcelona as a smart destination by examining its online image on thousands of travel blogs and online travel reviews (OTRs). Miah et al. (2017) deploy Big Data analytics for tourist behavior study. Besides, Big Data techniques in tourism research are developed, such as text mining in tourism (Godnov and Redek, 2016), opinion mining in tourism (Bucur, 2015; Marrese-Taylor et al., 2014), cross media and multi-agent Big Data collection for
tourism perception research (Guan and Du, 2015), and hardware for tourism organizations to download huge amounts of data (d'Amore et al., 2015).

In the area of tourism research, weather and temperatures are crucial topics, which have attracted widespread attention and interest from researchers. Travel motivations, satisfaction, tourism demand, and destination images are the topics most often studied in relation to destination weather or temperatures. Weather has been identified as both a motivator and a disincentive factor for travel in the theory of 'push and pull' (Crompton, 1979). Some researchers also find that climate and weather conditions may motivate holiday travel to holiday destinations (Becken, 2013; De Freitas, 2003; Gómez Martín, 2005; Falk, 2014; Gössling et al., 2012). Weather may affect the profitability of the tourism industry by reducing or increasing customer satisfaction and loyalty for destinations (Becken and Hay, 2012). Bad weather experiences reduce tourist satisfaction more than good weather experiences increase it (Coghlan and Prideaux, 2009). Weather patterns and tourism demand are widely discussed (Becken, 2013; Falk, 2014; Gössling et al., 2012; Kaján and Saarinen, 2013). Weather is an 'intangible asset' of destinations as it is one part of destination images (Baloglu and Mangaloglu, 2001; Echtner and Ritchie, 1991; Gallarza et al., 2002; Pike, 2010; Tasci and Gartner, 2007). Weather and temperatures are components of tourism demand and destination image (Day et al., 2013). In addition, there is also research on the impacts of weekends and weekdays on tourism motivations, for example, differences in motivations between weekends and weekdays affecting Japanese spa tourism (Kamata and Misui, 2015). However, few studies consider weather, temperatures, weekends and public holidays together in a single piece of research to explore their impacts on tourism destination arrivals, or on web queries about the destination.

To explore the complex relationships among these variables, we introduce Xijiang Thousand Households Miao Village (the Miao Village, in short) as the studied destination. It is a leading Miao cultural tourism destination in Guizhou Province, China. Moreover, it is the biggest and most complete original Miao Village in China, probably even in the world. It presents tourists with the original Miao lifestyle in a Miao Village located in the mountains, surrounded by rivers and streams. It is a cultural destination with abundant natural resources, and about 95% of its tourists are domestic. Thus, the research questions are the following:

1: Can web search queries of the Miao Village predict the actual tourism arrivals to the village, and vice versa?
2: What is the time lag length if prediction in (1) is possible?
3: How will weather, temperatures, weekends and public holidays impact actual tourism arrivals to the village and the web search queries of the village?

The main contribution of the study lies in its uniqueness of involving weather, temperatures, weekends and public holidays together with tourism destination arrivals and web search queries of the destination in one model to explore their complex relationships. Furthermore, unlike previous papers on forecasting tourism flows, this paper emphasizes the reciprocal predictive power between tourism destination arrivals and web search queries of the destination. Besides, adopting Big Data analytics for the data and analysis needed exemplifies its application in tourism arrivals forecasting and Destination Management and Marketing (DMM).

The remainder of the paper is structured as follows. Section 2 overviews the recent literature relevant to tourism forecasting modeling, providing the rationale for using the chosen methodology. Section 3 presents the data preparation and cleansing procedures with a conceptual framework of the destination's Big Data dataset, while Section 4 contains a discussion of the VAR (p) modeling methodology. Section 5 presents the results, and Section 6 concludes the study with implications and a discussion.

2. Literature review

2.1. The impacts of weather and temperatures on tourism

Holiday travels are motivated by the climate and weather conditions at holiday destinations (Becken, 2013; De Freitas, 2003; Gómez Martín, 2005; Gössling et al., 2012). Crompton (1979) presents a 'push-and-pull' model where push factors are those motivating tourists to travel away from home while pull factors are those driving them toward destinations. Falk (2014) finds that warm weather is a pull factor for tourists to select a destination, and in addition, temperatures, duration of sunshine, as well as precipitation are important factors in summer seasons. On the contrary, Thapa (2012) finds that in the long term tourist expectations of poor weather at a destination may restrain their visitations. However, the impact of weather conditions differs considerably according to destinations and types of touristic activities (Lohmann and Kaim, 1999). Smith (1993) finds that cultural tours and urban breaks are not that dependent on weather conditions. Instead, holidays are reliant on natural resources and outdoor activities. Besides, tourist perceptions of the attractiveness of weather conditions vary according to destinations. Jeuring and Peters (2013) conclude that for nature-based tourism the mist in the mountains may limit possible visitations, while a valley surrounded by clouds is more impressive than one under bright sunny weather.

Regression models have been used to estimate the impact of weather on tourism. The results of the first-difference regression models of Falk (2014) show that average sunshine duration and temperatures have significant positive impacts on domestic overnight stays while average precipitation has significant negative effects. In his research, Falk (2014) shows that the positive impact of temperatures on tourism is limited. The relationship between temperature and the number of visitations is non-linear in the form of an inverted u-shape curve (Gössling and Hall, 2006; Rossello-Nadal et al., 2011), which indicates a decline in temperature's effect after a given point (Falk, 2014). Besides, there is a 1-year time lag before temperatures and sunshine duration positively impact foreign overnight stays (Falk, 2014). However, Scott et al. (2007) examine the impact of monthly temperatures and precipitation on tourist flows to Waterton Lakes National Park of Canada by using monthly data from 1996 to 2003 and remarkably find no relationship between visitations in peak summer (July and August) and weather. Agnew and Palutikof (2006) present a relationship between domestic tourism demand and weather conditions in the UK, and a relationship between outbound tourist flows and weather conditions. They find that in the UK, domestic tourism is more responsive to variations of weather conditions, and in general, the tourism industry of the UK benefits from warm and dry conditions. The time lag in their research is 1 year (or season) before international tourism is affected by weather conditions. Taylor and Ortiz (2009) explain the time lag variations by noting that domestic residents are more spontaneous, whereas international tourists need more time to plan their visitations well in advance.

2.2. Forecasting tourism volumes and Vector Autoregressive (VAR) modeling in the forecasts

Different methodologies can be applied to forecast tourism volumes, but their performances vary in terms of accuracy. Athanasopoulos et al. (2011) evaluate the performance of various methods for forecasting tourism flows. In their research, monthly series, quarterly series, and annual series are included. Uni- and multivariate time series approaches, and econometric models are implemented. Algorithms such as Forecast Pro, ARIMA, and exponential smoothing are included. Specific methods, such as the Theta method and damped trend, are employed. Frameworks, such as static and dynamic regression, autoregressive distributed lag models, and time varying parameter models, are incorporated. They find that pure time-series approaches are more
accurate for forecasting tourism data than models with explanatory variables, while Song and Li (2008) find that no single model could consistently outperform other models in all situations. Gunter and Önder (2015) compare the accuracy of uni- and multivariate models. Error Correction-ADLM, Classical VAR, Bayesian VAR, TVP, ARMA, and ETS are used to forecast the international city tourism demand for Paris from its five most important foreign source markets, and the results vary according to forecast horizon and source market, which is similar to the result of Witt and Witt (1995). Li et al. (2017) propose a framework called the Generalized Dynamic Factor Model (GDFM) as a composite search index, which is claimed to be more accurate than traditional time-series models or models with an index created by principal component analysis.

However, with the advent of web search query techniques, the way researchers acquire data to forecast tourism volumes has changed. Instead of adopting data from traditional statistical channels, researchers have begun to employ data from web search engines to forecast tourism volumes by applying various forecasting models and frameworks. Google and Baidu, two of the most widely used search engines, have been applied in tourism flows forecasting. Rivera (2016) adopts search queries from Google Trends to forecast the number of hotel nonresident registrations in Puerto Rico with a dynamic linear model. Besides Google Trends, Google Analytics, which is usually used for web site performance analysis, is also employed for forecasting tourism arrivals. For example, Gunter and Önder (2016) employ 10 website traffic indicators of Google Analytics from the Viennese Destination Management Organization (DMO) website to predict the actual tourist arrivals to Vienna with the VAR model class. In their research, Bayesian estimation of the VAR, reduction to a factor-augmented VAR, Bayesian estimation of FAVAR, and the novel Bayesian FAVAR are used as big data shrinkage to avoid over-parameterization. Furthermore, the performances of search engines in forecasting tourism volumes have also been discussed. Yang et al. (2015) compare the predictive powers of Google Trends and Baidu Index in predicting tourism volumes in Hainan, China. The result shows that, compared to the corresponding Autoregressive Moving Average (ARMA) models, both search engines could significantly reduce forecasting errors, but Baidu Index outperforms because of its bigger market share in China.

Forecasting tourism volumes with VAR (p) modeling has been adopted and shown to be accurate. Song and Witt (2006) review the recent literature on tourism modeling and forecasting and identify the Vector Autoregressive (VAR) modeling technique as a sound basis for accurately predicting tourism flows to Macau. They adopt a general VAR (p) model for Macau to predict its tourism arrivals from its eight major origin countries/regions over the period 2003–2008. They predict that Macau will be faced with an increase in tourism demand from mainland China, which has been confirmed. The study reveals that a VAR model can be used to produce accurate medium- to long-term forecasts. There are two advantages of applying VAR modeling in tourism forecasting. For one thing, the VAR modeling is a theory based approach, and for the other the modeling procedure is simplified. Moreover, Huang et al. (2017) employ Baidu Index to provide data of web search queries to predict tourist volumes to The Forbidden City of China. In the study, an ADF test is applied to implement unit root testing to ensure the stationarity of the time-series data. The co-integration test is used to examine and prove that there is a long-term equilibrium relationship between actual arrivals to The Forbidden City and its web search queries. Residuals are used to filter keywords, and those with negative values are excluded. A Granger causality test is implemented, and the result is that the actual arrivals and the Baidu Index search queries Granger cause each other. The time lag length (p = 0.9975) is identified based on the Akaike Information Criterion (AIC) and Schwarz Criterion (SC).

In Table 1, the referred articles are sorted in accordance with their forecast modeling and methods. The first half of the table presents forecast modeling and methods based on traditional statistical data, while the second half presents forecast modeling and methods based on web search queries. The forecast performance of each referred forecast modeling or method is included.

Drawing on Table 1, the reasons for the choice to use VAR modeling for this research are listed below. First, because the VAR modeling can be used to construct and test mutual causality between variables (Gunter and Önder, 2016), it may fit the research question to identify the Granger causality between the destination web search queries and its actual tourism arrivals. Second, the lagged dependent variable possibly reflecting word-of-mouth effects and consumer habit persistence is automatically included in the VAR specification (Song and Witt, 2006), so the VAR modeling may fit this research in the context of web search queries which reflect either potential tourists' pre-trip information search of the destination or their post-trip behavior of sharing travel experiences or making recommendations through online texts, images, or videos. Finally, the VAR modeling has consistently accurate forecasting performance (Athanasopoulos et al., 2011; Gunter and Önder, 2015; Song and Witt, 2006). However, as this research includes only a few variables (e.g., web search queries, tickets, weather, temperature, weekends, and public holidays), limitations such as over-parameterization, running out of degrees of freedom (Song and Witt, 2006), or high dimensionality (Gunter and Önder, 2016) will not arise in this research, so further shrinkage methods (Gunter and Önder, 2016) are not needed.

Besides, the limitation of Granger causality has been taken into consideration. Despite its name, Granger causality is not necessarily true causality. It is a statistical test for determining whether one time-series variable is useful in forecasting another; it does not establish which variable determines or causes the other. In this research, we intend to test the Granger causality between the actual arrivals (tickets sold) and the web search queries to identify their predictive power upon each other. However, there might be other potential sources of probabilistic dependence impacting both variables even though the null hypothesis is rejected. For example, tourist word-of-mouth may have impacts on both the actual arrivals and the web search queries of the destination, but it has not been included as a variable.

2.3. Crawling the data

A web crawler is also known as a web spider or web robot. It is a program that automatically browses the web and assembles web content. A web crawler must compare a topic of the internet with the content of collected web pages. There should be proper keywords to describe the chosen topic for the comparison. Appropriate keywords should be those frequently used in web pages to describe the chosen topic (Rungsawang and Angkawattanawit, 2005). A web crawler is given a starting set of web pages as its seed pages, extracts the outgoing links that appear in the seed pages, and determines what links to visit next based on certain criteria. A crawler will not stop visiting pages until a desired number of pages have been downloaded (Batsakis et al., 2009). The general working principles of a web crawler are crawling, indexing, searching, filtering, and the sorting/ranking of information (Bhushan and Nath, 2013).

There are four types of web crawlers: general purpose web crawlers, topical web crawlers, incremental web crawlers, and deep web crawlers. In our research, the topical web crawler is adopted. A topical web crawler, also known as a focused crawler, is topic oriented. It collects web pages related to a specific topic. Its performance is more accurate (Ahmadi-Abkenari and Selamat, 2012) because it attempts to focus its crawling process on pages relevant to the topic, and it keeps the overall number of downloaded web pages for processing (Pivk et al., 2007) to a minimum while maximizing the percentage of relevant pages (Batsakis et al., 2009).
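The working principle of a topical crawler described in Section 2.3 (seed pages, keyword-based relevance, link extraction, and a page budget) can be illustrated with a minimal sketch. It is not the crawler used in this study: the function `topical_crawl`, the `fetch` callback, the keyword-count relevance rule, and the toy in-memory "web" are all assumptions made for illustration, and no real network access or Baidu interface is involved.

```python
from collections import deque


def topical_crawl(seed_urls, fetch, keywords, max_pages, min_hits=1):
    """Breadth-first topical crawl (illustrative sketch).

    fetch(url) is a caller-supplied, hypothetical function that returns
    (page_text, outgoing_links); this sketch never touches the network.
    """
    frontier = deque(seed_urls)   # pages waiting to be visited
    seen = set(seed_urls)         # never queue the same URL twice
    relevant = []                 # URLs of pages judged on-topic

    # Stop when the frontier is empty or the page budget is reached.
    while frontier and len(relevant) < max_pages:
        url = frontier.popleft()
        text, links = fetch(url)
        # Relevance test (assumed rule): count keyword occurrences in the text.
        hits = sum(text.lower().count(k.lower()) for k in keywords)
        if hits < min_hits:
            continue              # off-topic page: its links are not followed
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return relevant


# Toy in-memory pages standing in for real web pages (invented data).
TOY_WEB = {
    "seed": ("Xijiang Miao Village travel guide", ["a", "b"]),
    "a": ("Tickets for the Miao Village night view", ["c"]),
    "b": ("Unrelated sports news", ["d"]),
    "c": ("Long table feast in the Miao Village", []),
    "d": ("Miao Village accommodation", []),
}

pages = topical_crawl(["seed"], TOY_WEB.__getitem__, ["Miao Village"], max_pages=10)
print(pages)
```

In this toy run, page "d" is never visited because it is only reachable through the off-topic page "b". That is the trade-off a focused crawler accepts in exchange for downloading fewer irrelevant pages.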
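The Granger causality test discussed in Section 2.2 asks whether lagged values of one series improve the forecast of another beyond that series' own lags. A minimal sketch of the underlying F-test follows. It assumes a plain bivariate setting without the exogenous variables used in this study, fits both regressions by ordinary least squares in pure Python, and runs on synthetic data in which "queries" lead "arrivals" by construction. The helper names `ols_rss` and `granger_f` are hypothetical, and the default lag p = 2 simply mirrors the lag length reported in the abstract.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of an OLS fit, via the normal equations."""
    k = len(rows[0])
    # Build X'X and X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    # Solve (X'X) beta = X'y by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, targets))


def granger_f(cause, effect, p=2):
    """F statistic for H0: lags 1..p of `cause` do not help predict `effect`."""
    n = len(effect)
    targets = effect[p:]
    # Restricted model: constant plus p own lags of `effect`.
    own = [[1.0] + [effect[t - j] for j in range(1, p + 1)] for t in range(p, n)]
    # Unrestricted model: additionally p lags of `cause`.
    full = [row + [cause[t - j] for j in range(1, p + 1)]
            for row, t in zip(own, range(p, n))]
    rss_r = ols_rss(own, targets)
    rss_u = ols_rss(full, targets)
    dof = len(targets) - (2 * p + 1)   # observations minus unrestricted parameters
    return ((rss_r - rss_u) / p) / (rss_u / dof)


# Synthetic data (not the study's data): queries lead arrivals by one day.
random.seed(0)
queries = [random.gauss(0, 1) for _ in range(300)]
arrivals = [0.0] + [0.8 * queries[t - 1] + 0.1 * random.gauss(0, 1)
                    for t in range(1, 300)]

f_q_to_a = granger_f(queries, arrivals)   # large: queries help predict arrivals
f_a_to_q = granger_f(arrivals, queries)   # small: no feedback was built in
print(f_q_to_a, f_a_to_q)
```

A large F statistic leads to rejecting the null hypothesis of no Granger causality in that direction. Running the test both ways, as above, is what allows reciprocal predictive power to be assessed, with the caveat from Section 2.2 that an omitted common driver can produce the same pattern.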
Table 1
Literature conclusion with different forecasting modeling and method.
Each entry lists: literature author; benchmark; data type; data period; then, per method, the setting it is suitable for and its forecasting performance.

Athanasopoulos et al. (2011). Benchmark: Snaïve. Data type: traditional statistical data. Data period: monthly, quarterly, annual.
  1. Pure time series (Forecast Pro, ARIMA, ETS). Suitable: (1) in smaller scale evaluations (2) without explanatory variables (3) in fully automated series (4) for seasonal data. Performance: Snaïve > pure time series.
  2. Theta, damped trend. Monthly data: Others > Theta/damped trend. Quarterly data: Damped trend > Snaïve > Theta. Annual data: Theta > Snaïve.
  3. Static, Dynamic, ADLM, TVP, VAR. Suitable: when allowing for time varying parameters within specifications. Performance: challenging in models with exogenous or explanatory variables.

Gunter and Önder (2015). Benchmark: Naïve. Data type: traditional statistical data. Data period: monthly.
  EC-ADLM. Suitable: not suitable for any of the studied tourist source markets. Performance: EC-ADLM > Naïve.
  VAR. Suitable: (1) for German tourist source market and two or three months ahead (2) for Japanese tourist source market and short and long-term (3) all studied source markets and two years ahead. Performance: VAR > others.
  Bayesian VAR. Suitable: Italian and German tourist source markets, and six months to one year ahead. Performance: BVAR > others.
  TVP. Suitable: German tourist source market and two years ahead. Performance: TVP > others.
  ARMA. Suitable: (1) UK tourist source market and six months ahead (2) US tourist source market as well as short and long-term. Performance: ARMA > others.
  ETS. Suitable: (1) Japanese tourist source market and one year ahead. Performance: ETS > others.
  Note: Naïve has been significantly outperformed across all source markets and forecast horizons.

Song and Witt (2006). Benchmark: no comparison. Data type: traditional statistical data. Data period: quarterly.
  VAR. Suitable: (1) for medium- to long-term (2) theory based approach allowing for impulse response and examining responses to 'shocks' in the economic variables. Performance: (1) the lagged dependent variable reflecting word-of-mouth effects and consumer habit persistence is automatically included (2) a systematic approach relaxing the exogenous assumption of the explanatory variables (3) likely to be over-parameterized.

Gunter and Önder (2016). Benchmark: ARIMA, ETS, Naïve. Data type: web search queries data. Data period: 1, 2, 3, 6, 12 months.
  Novel BFAVAR (Bayesian factor-augmented VAR with the predictive information of Google Analytics). Shorter horizons (h = 1, 2 months): ARIMA > others. Longer horizons (h = 3, 6, 12 months): Novel BFAVAR > others.

Huang et al. (2017). Benchmark: ARIMA. Data type: web search queries data. Data period: daily.
  Novel ADLM (ADLM with Baidu keywords as explanatory variables). Suitable: with specific keywords of Baidu Index. Performance: Novel ADLM > ARIMA.

Li et al. (2017). Benchmark: ARMA, Principal Component Analysis (PCA). Data type: web search queries data. Data period: weekly.
  GDFM (Generalized Dynamic Factor Model). Suitable: (1) high dimensional search engine queries data (2) short-term (1 week and 4 week). Performance: GDFM > others.

Rivera (2016). Benchmark: SARIMA, Holt-Winter (HW), Snaïve. Data type: web search queries data. Data period: monthly.
  Novel DLM (dynamic linear model with Google search query volume). Suitable: for over 6 months horizon but not at shorter horizon. Performance: Novel DLM > others.

Yang et al. (2015). Benchmark: ARMA. Data type: web search queries data. Data period: monthly.
  Time series model with shifted and summed up aggregated Baidu search data; time series model with shifted and summed up aggregated Google search data. Suitable: Baidu search engine has a bigger market share in China than that of Google. Performance: time series with Baidu search data > time series with Google search data.
Fig. 1. Number of daily tickets sold at the destination and keywords web search queries of the destination, 2015.
3. Data

3.1. Data acquisition

3.1.1. Destination data warehouse

The Destination Management Organization (DMO) of the Miao Village provides the study with its Big Data Warehouse which contains data of ① daily tickets sold in 2015, shown in Fig. 1, ② daily weather of 2015 released by China Meteorological Administration (CMA), given as descriptive data: sunny, cloudy, overcast, light rain, shower, thunderstorm, light snow, glazed frost, freeze and heavy snow, ③ daily average temperatures of 2015 released by CMA, given in degrees centigrade, in Fig. 2, ④ weekdays and weekends of 2015 according to the calendar, and ⑤ public holidays announced by General Office of the National Council of China.

3.1.2. Keywords selection

Tourists put keywords into search engines to get destination information before their visitations. To reveal the relationship between the tourist web search queries and the actual visitations, we first need to identify and select the keywords put into the search engine by potential tourists.

In this study, we use a function of Baidu Index. Its 'demand mapping' presents the most relevant and most frequently searched keywords about the destination. Thus we find six keywords: 'Travel guide of Xijiang Thousand Households Miao Village', 'Night View of Xijiang Thousand Households Miao Village', 'Long table feast of Xijiang Thousand Households Miao Village', 'Accommodations in Xijiang Thousand Households Miao Village', 'Tickets of Xijiang Thousand Households Miao Village', and 'Xijiang Miao Village' (all these keywords are in Chinese). There are two reasons why we do not use corresponding English translations of these keywords as clues for web search queries. For one thing, according to the village DMO report, only about 5% of tourists are inbound foreign tourists. Thus the language used for searching information will mainly be Chinese. For the other, there are various semantic translations of even one Chinese word, so these six keywords in Chinese would yield too many additional English translations.

3.1.3. Keywords web search queries with a topical web crawler

Baidu Index provides only weekly search queries in the shape of a curve, yet we need the exact number of daily web search queries of the selected keywords. Thus we use a topical web crawler to get the exact numbers of web search queries of the keywords. Fig. 1 presents the number of daily tickets sold at the destination and the daily web search queries of the destination in the year of 2015, while Fig. 2 shows daily average temperature at the destination.
Fig. 2. Daily average temperature at the destination, 2015.
Fig. 3. The Big Data dataset for the research.
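The linking of the separate daily sources into one dataset (Fig. 3), together with the 0/1 coding of weather and weekends described in Section 3.3.2, can be sketched as follows. This is an illustrative sketch only: the function `build_row`, the dictionary layout, and the sample values are invented, and the coding of the public holiday variable x4 as 1 = holiday is an assumption, since its definition is not given in this part of the text.

```python
from datetime import date

# Good/poor weather split as listed in Section 3.3.2.
GOOD_WEATHER = {"sunny", "cloudy", "overcast"}
POOR_WEATHER = {"light rain", "shower", "thunderstorm", "light snow",
                "glazed frost", "freeze", "heavy snow"}


def build_row(day, tickets, queries, weather, temperature, public_holidays):
    """Link one day's records from the separate sources into one model row."""
    label = weather[day]
    if label not in GOOD_WEATHER | POOR_WEATHER:
        raise ValueError(f"unknown weather label: {label!r}")
    return {
        "date": day,
        "y1": tickets[day],                         # daily tickets sold
        "y2": queries[day],                         # daily web search queries
        "x1": 0 if label in GOOD_WEATHER else 1,    # 0 = good, 1 = poor weather
        "x2": temperature[day],                     # daily average temperature (deg C)
        "x3": 1 if day.weekday() >= 4 else 0,       # 1 = Friday to Sunday
        "x4": 1 if day in public_holidays else 0,   # assumed coding: 1 = public holiday
    }


# Invented sample values for two days in 2015 (not the study's data).
d1, d2 = date(2015, 10, 1), date(2015, 10, 2)      # a Thursday and a Friday
tickets = {d1: 12000, d2: 15000}
queries = {d1: 3400, d2: 3900}
weather = {d1: "sunny", d2: "light rain"}
temperature = {d1: 18.5, d2: 16.0}
public_holidays = {d1, d2}

rows = [build_row(d, tickets, queries, weather, temperature, public_holidays)
        for d in sorted(tickets)]
for row in rows:
    print(row)
```

Keying every source by calendar date keeps the join explicit, and raising an error on an unrecognized weather label avoids silently coding a new category as poor weather.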
3.2. The Big Data dataset of the research

The very word "Big" indicates size (Jagadish, 2015). 'Big Data' has been characterized as a five-dimension data management and analytics paradigm (Varma et al., 2016). Volume represents the vast amount of data, much bigger than that of traditional data. Variety stands for various data with heterogeneous formats from different sources. Velocity means data processing at its gathering speed in real or near-real time. Veracity is the uncertainty, noise and abnormality in data. Value reflects the hidden opportunities and insights revealed by statistical and analytical methods (Ang and Seng, 2016; Colombo and Ferrari, 2015; Fan et al., 2015; Huang et al., 2015; Jin et al., 2015; Miah et al., 2017; Power, 2015; Wang et al., 2017). These are simply the Big Data 5Vs.

However, the term 'Big Data' remains ill-defined if we talk of data volume only (Mohanty, 2015). What matters is not the big amount of data but extracting hidden information from it, making sense of it and exploring its values. Thus we develop our unique Big Data dataset integrating the data of destination web search queries, daily tickets sold at the destination, weather and temperatures at the destination, as well as weekends and public holidays. The linked data become the Big Data dataset of our study, shown in Fig. 3.

As depicted in Fig. 3, the Big Data dataset is different from a traditional destination data warehouse which includes only simple information such as tickets and sales, and the 5Vs features of Big Data are reflected by our Big Data dataset. The feature 'volume' is embodied in the largely increased daily information of several different components: daily tickets, weather, temperatures, weekends, public holidays, and web search queries. The data of daily tickets, weather, temperatures, weekends, and public holidays grow bigger and bigger as time goes by, and the huge volume of web search queries is composed of countless data (traces) of trip information. The feature 'variety' is presented by the traditional structured data, semi-structured data, and unstructured data. The daily tickets are traditional data, and weather, temperatures, and calendar information (weekends and public holidays) are semi-structured data, while web search queries (e.g. texts, images and videos) are unstructured data. 'Velocity' and 'Veracity' are reflected by the process of data capture, storage and transformation, while 'value' is realized by data analysis to uncover hidden patterns and facts.

Big Data analytics should be a workflow with different components connected in a specific data management architecture, and it should be able to acquire and store data, transform and process data for analysis, include analytical tools to solve specific problems, and deliver meaningful information to discover business values and insights in a timely manner. In this research, we aim to identify the correlation of weather, temperatures, weekends and public holidays with the destination's tourist arrivals and web search queries of the destination, as well as the Granger causality between the destination's actual arrivals and its web search queries. Thus Big Data analytics helps the research link and organize the captured data in a Big Data dataset so that the VAR modeling as an analytical tool can provide analysis results in order to answer the targeted questions. The uncovered phenomena in turn call for digging further for potential reasons and insights for Destination Management and Marketing.

3.3. Data preparation and cleansing

In this study, a VAR (p) model is constructed to explore the impacts of weather, temperatures, weekends and public holidays on actual arrivals and web search queries of the Village. Each group of data from Fig. 2 will be included into the model.

3.3.1. The variables

We set the daily tickets sold at the village as independent variable y1
Fig. 4. Daily web search queries of the destination's main potential source cities of visitors, 2015.
and the daily web search queries of the village as variable y2; both are endogenous variables of the model. We set the daily weather as exogenous variable x1, daily temperature as exogenous variable x2, weekends as exogenous variable x3 and public holidays as exogenous variable x4. A practical problem with estimating a VAR (p) model is that degrees of freedom quickly run out when more variables are introduced, although it is desirable to include as much information as possible (Song and Witt, 2006).

3.3.2. Dummy variables

As x2 is numerical data, we set the non-numerical variables x1, x3, and x4 as dummy variables, shown in Fig. 4. In x1, 0 represents good weather and 1 represents poor weather. The weather conditions of sunny, cloudy and overcast are regarded as good weather, while conditions of light rain, shower, thunderstorm, light snow, glazed frost, freeze and heavy snow are considered poor weather. In x3, 0 represents weekdays from Monday to Thursday, while 1 represents weekends from Friday to Sunday, because many of the tourists arrive at the destination on Friday afternoon or evening. In x4, 0 represents non-holidays and 1 represents public holidays.

3.3.3. Missing data

There were 5 flooding days in 2015, which resulted in a 5-day temporary shutdown of the destination, so those five days are set as missing data.

4. Methodology

4.1. Unit root and co-integration tests

Before modeling, we should check that the variables are stationary and share a long-term equilibrium, to avoid inaccurate results. Thus we use the Augmented Dickey-Fuller (ADF) statistic of the unit root test to check stationarity and the co-integration test to check the long-term equilibrium. Traditional VAR models are designed for stationary variables without time trends (Song and Witt, 2006), and nonstationary variables cannot pass the unit root test, so nonstationary variables need to be de-trended. The Hodrick-Prescott filter is used for the de-trend treatment. In the equation below, z_t stands for a nonstationary variable and z_t^{HP} for the trend component extracted by the filter; the de-trend treatment is to use the value of z_t^{Bias} in the VAR (p) model.

$$ z_t = z_t^{HP} + z_t^{Bias} $$

4.2. VAR (p) modeling

The VAR model is a system of equations in which more than one variable is treated as endogenous and the values of the variables are regressed against lagged dependent variables in the system (Song and Witt, 2006). The general VAR (p) model is shown below as Eq. (1), where p is the lag length of the VAR, Yt is an m-vector time series of the endogenous variables, and m is the number of endogenous variables. Yt−j is the lagged endogenous vector acting as a determinant of the system. k is the number of predetermined variables and Xt is a k-vector time series.

$$ \underset{[m \times 1]}{Y_t} = \underset{[m \times 1]}{C} + \sum_{j=1}^{p} \underset{[m \times m]}{B_j}\, \underset{[m \times 1]}{Y_{t-j}} + \underset{[m \times k]}{A}\, \underset{[k \times 1]}{X_t} \tag{1} $$

To construct the VAR (p) model, identifying and selecting the lag length is a key step. On the one hand, greater values of lag length better reflect the dynamic features of the model (Song and Witt, 2006). On the other hand, greater p values introduce a larger number of parameters into the model: if a VAR model has m equations, there will be m + pm² coefficients to be estimated, and an unrestricted VAR model is likely to be over-parameterized, with fewer degrees of freedom. Thus the Akaike Information Criterion (AIC) or the Schwarz Criterion (SC) can be employed to identify and select the proper p value.

4.3. Granger causality test

The Granger causality test can be used to test whether the lagged terms of a variable jointly have an impact on the current value of the other variable(s). If the effect is significant, the variable Granger-causes the other variable(s); if the effect is not significant, there is no Granger-causal relationship between (among) the variables. The existence of Granger causality shows the predictive power of the variables upon each other. There are two endogenous variables in the study: daily tickets sold y1 and daily web search queries y2. Thus, in order to explore the Granger causality between y1 and y2, two null hypotheses are established:

H0: y2 does not Granger cause y1. Meanwhile, H0: y1 does not Granger cause y2.

If the test statistic χ2 is significant, H0 will be rejected, and thus y2 Granger causes y1. In the second hypothesis, if the test statistic χ2 is significant, H0 will be rejected, and thus y1 Granger causes y2.
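As a rough illustration of the decision rule in Section 4.3, the following Python sketch turns a χ² statistic into a p-value and a reject/keep decision. It assumes the statistic is a Wald statistic whose degrees of freedom equal the number of excluded lag terms (2 for a VAR(2) with one excluded variable); the paper does not state the degrees of freedom, and the function names are hypothetical. For even degrees of freedom the χ² survival function has a closed form, so only the standard library is needed.

```python
import math


def chi2_sf_even_df(stat: float, df: int = 2) -> float:
    """Survival function P(X > stat) of a chi-square variable with even df.

    For df = 2n the closed form is exp(-s/2) * sum_{i<n} (s/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("the closed form used here needs a positive even df")
    if stat < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    half = stat / 2.0
    partial_sum = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * partial_sum


def rejects_no_granger_causality(stat: float, df: int = 2,
                                 alpha: float = 0.05) -> bool:
    """Return True when H0 ("x does not Granger cause y") is rejected.

    df = 2 is an assumption (two excluded lags), not a value from the paper.
    """
    return chi2_sf_even_df(stat, df) < alpha
```

Under the df = 2 assumption the p-value reduces to exp(−χ²/2), so the 5% critical value is 2·ln(20) ≈ 5.99; larger statistics lead to rejecting H0.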
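The coding rules for the dummy variables in Section 3.3.2 can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the English weather labels, the function name `encode_day`, and the set of public holidays supplied by the caller are all assumptions.

```python
from datetime import date

# Weather labels treated as good (0) or poor (1), following Section 3.3.2.
# The exact label strings are assumed for illustration.
GOOD_WEATHER = {"sunny", "cloudy", "overcast"}
POOR_WEATHER = {"light rain", "shower", "thunderstorm", "light snow",
                "glazed frost", "freeze", "heavy snow"}


def encode_day(day: date, weather: str, public_holidays: set) -> dict:
    """Return the dummy variables x1 (weather), x3 (weekend), x4 (holiday)."""
    label = weather.strip().lower()
    if label in GOOD_WEATHER:
        x1 = 0
    elif label in POOR_WEATHER:
        x1 = 1
    else:
        raise ValueError(f"unknown weather label: {weather!r}")
    # date.weekday(): Monday is 0 ... Sunday is 6, so Friday-Sunday is 4-6.
    x3 = 1 if day.weekday() >= 4 else 0
    x4 = 1 if day in public_holidays else 0
    return {"x1": x1, "x3": x3, "x4": x4}
```

The temperature x2 is numerical and is therefore not encoded here; it enters the model after de-trending.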
Table 2
Results of unit root test.

Variables   ADF value   1% critical value   5% critical value   10% critical value   Conclusion
y1_bias     −7.0197     −3.4482             −2.8693             −2.5710              Stationary
y2_bias     −5.6609     −3.4482             −2.8693             −2.5710              Stationary
x2_bias     −7.3528     −3.4482             −2.8693             −2.5710              Stationary

5. Results

5.1. The result of unit root test

It has been mentioned that non-stationary variables cannot pass the unit root test unless they are de-trended. The non-stationary variables y1, y2, and x2 are de-trended and shown as y1_bias, y2_bias, and x2_bias. Dummy variables do not have to pass the unit root test. In Table 2, the time series of y1_bias, y2_bias, and x2_bias are all stationary at the 1% significance level, and thus they pass the unit root test.

In the following part of the study, these de-trended variables will be shown just as y1, y2, and x2; the subscript 'bias' will be omitted.

5.2. The result of lag rank identification and VAR (p) modeling

Both AIC and SC reach their smallest values at a lag length of 2, so p = 2 in this study, as shown in Table 3.

Table 3
Selection of lag order.

Lag   AIC         SC
0     33.86263    33.97103
1     31.60227    31.75402
2     31.43790*   31.63302*

* Represents the minimum value of AIC or SC among different lags.

As there are two endogenous variables in this study, Yt can be written as

$$ Y_t = \begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix} $$

where y1,t is the daily tickets sold and y2,t is the daily web search queries of the village. Besides, there are four exogenous variables in this study,

$$ X_t = \begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \end{bmatrix} $$

where x1,t is a dummy variable representing the daily weather at the village, x2,t represents the daily temperature at the village, x3,t is a dummy variable representing weekends, and x4,t is a dummy variable representing public holidays.

The VAR (p) model can be expressed as Eq. (2), or as two separate regression equations, Eqs. (3) and (4).

$$
\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix}
= \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}
+ \begin{bmatrix} b_{1,1,1} & b_{1,1,2} \\ b_{2,1,1} & b_{2,1,2} \end{bmatrix}
\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix}
+ \begin{bmatrix} b_{1,2,1} & b_{1,2,2} \\ b_{2,2,1} & b_{2,2,2} \end{bmatrix}
\begin{bmatrix} y_{1,t-2} \\ y_{2,t-2} \end{bmatrix}
+ \begin{bmatrix} a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\ a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \end{bmatrix}
\begin{bmatrix} x_{1,t} \\ x_{2,t} \\ x_{3,t} \\ x_{4,t} \end{bmatrix}
+ \begin{bmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{bmatrix}
\tag{2}
$$

$$
y_{1,t} = c_1 + b_{1,1,1}\, y_{1,t-1} + b_{1,1,2}\, y_{2,t-1} + b_{1,2,1}\, y_{1,t-2} + b_{1,2,2}\, y_{2,t-2} + a_{1,1}\, x_{1,t} + a_{1,2}\, x_{2,t} + a_{1,3}\, x_{3,t} + a_{1,4}\, x_{4,t} + \varepsilon_{1,t}
\tag{3}
$$

$$
y_{2,t} = c_2 + b_{2,1,1}\, y_{1,t-1} + b_{2,1,2}\, y_{2,t-1} + b_{2,2,1}\, y_{1,t-2} + b_{2,2,2}\, y_{2,t-2} + a_{2,1}\, x_{1,t} + a_{2,2}\, x_{2,t} + a_{2,3}\, x_{3,t} + a_{2,4}\, x_{4,t} + \varepsilon_{2,t}
\tag{4}
$$

The estimation results are shown in Eqs. (3r) and (4r). The numbers in square brackets are the t-statistics of the estimated coefficients:

$$
\begin{aligned}
y_{1,t} ={}& \underset{[-2.24224]^{**}}{(-330.6728)} + \underset{[11.0593]^{***}}{0.588703}\, y_{1,t-1} + \underset{[12.6704]^{***}}{4.194850}\, y_{2,t-1} \\
&+ \underset{[-1.77800]^{*}}{(-0.088829)}\, y_{1,t-2} + \underset{[-5.65951]^{***}}{(-2.236216)}\, y_{2,t-2} \\
&+ \underset{[-0.14611]}{(-25.29595)}\, x_{1,t} + \underset{[1.16602]}{25.81741}\, x_{2,t} + \underset{[4.60950]^{***}}{832.3226}\, x_{3,t} \\
&+ \underset{[0.98949]}{427.0785}\, x_{4,t}
\end{aligned}
\tag{3r}
$$

$$
\begin{aligned}
y_{2,t} ={}& \underset{[1.13501]}{27.97765} + \underset{[-1.30891]}{(-0.011646)}\, y_{1,t-1} + \underset{[21.6004]^{***}}{1.195313}\, y_{2,t-1} \\
&+ \underset{[-2.63713]^{***}}{(-0.022022)}\, y_{1,t-2} + \underset{[-2.12649]^{**}}{(-0.140441)}\, y_{2,t-2} \\
&+ \underset{[0.89869]}{26.00541}\, x_{1,t} + \underset{[2.54386]^{**}}{9.414443}\, x_{2,t} + \underset{[-2.78492]^{***}}{(-84.05159)}\, x_{3,t} \\
&+ \underset{[-1.04572]}{(-75.44115)}\, x_{4,t}
\end{aligned}
\tag{4r}
$$

The joint AIC and SC of these two equations are 31.44 and 31.63. Besides, *, **, and *** represent significance at the 10%, 5%, and 1% probability levels, respectively. The estimates show that the first order lag of y1 has a significant positive correlation with the current value of y1 at the 1% probability level. The second order lag of y1 has a significant negative correlation with the current value of y2 at the 1% probability level. The first order lag of y2 has a significant positive correlation with the current value of y1, and it has a significant positive correlation with the current value of y2, both at the 1% probability level. The second order lag of y2 has a significant negative correlation with the current value of y1 at the 1% probability level, and it has a significant negative correlation with the current value of y2 at the 5% probability level.

The impacts of x1, x2, x3, and x4 on y1 and y2 can also be found. x1 has no significant correlation with either y1 or y2. x2 has significant positive correlation with y2 at 5% probability level. x3 has significant positive correlation with y1 and significant negative correlation with y2 at 1% probability level. x4 has significant positive correlation with y1 at 1% probability level.

5.3. The result of co-integration test

The Johansen co-integration test is applied to examine the long-term equilibrium of the endogenous variables. The results are shown in Table 4.

There is a long-term equilibrium between y1 and y2, and the equation is

$$ y_1 = 3.4114\, y_2 $$

5.4. The result of Granger causality test

Previously we constructed the null hypotheses H0: y2 does not Granger cause y1, and H0: y1 does not Granger cause y2. The results of the hypothesis tests are shown in Table 5. The results indicate Granger causality between y1 and y2 at the 0.005 probability level.

It can be inferred from the findings that 1) y1 Granger causes y2, and vice versa; 2) weather has neither correlation with actual arrivals of the
village nor correlation with web search queries of the village; 3) temperatures have no correlation with actual arrivals of the village but they have significant positive correlation with web search queries of the village; 4) weekends have significant positive correlation with actual arrivals of the village and significant negative correlation with web search queries of the village; 5) public holidays have significant positive correlation with actual arrivals of the village; and 6) there is a 2-day lag length indicating spontaneous domestic visitors (Taylor and Ortiz, 2009), tallying with the destination report that about 95% of the tourists are domestic.

Table 4
The co-integration test.

Hypothesized no. of CE(s)   Eigenvalue   Trace statistic   0.05 critical value   Prob.**
None*                       0.273084     123.3911          15.49471              0.0001
At most 1*                  0.025395     9.208817          3.841466              0.0024

Cointegrating equation(s): Log likelihood 2445.196
Normalized cointegrating coefficients (standard error in parentheses)
y1          y2
1.000000    −3.4114
            (0.2631)

Table 5
Granger causality test.

H0                             Chi-square statistic   P value   Conclusion
y1 does not Granger cause y2   21.0055                0.0000    Rejected
y2 does not Granger cause y1   186.9238               0.0000    Rejected

6. Implications and discussion

The theoretical contribution of this study lies in showing how weather and temperatures affect domestic tourism and travel motivations. Previous research studies reveal the impacts of weather on tourism. Smith (1993) finds that cultural tours are not that dependent on weather conditions. Falk (2014) claims that warm weather is a particular pull factor for tourists to select certain destinations. Thapa (2012) reveals that tourist expectations of poor weather at a destination restrain their travels. Agnew and Palutikof (2006) show that UK tourism is sensitive to weather variability. However, in this study, it is found that weather does not impact domestic tourist visitations to cultural destinations that have natural resources and that weather does not impact web search queries about the destination. Because web search queries reflect tourist travel intentions, it can be inferred that weather does not impact tourist travel intention. In addition to Falk's (2014) finding about temperatures' significant positive impact on domestic tourist overnight stays, our finding reveals that temperatures have significant and positive impacts on domestic tourist intentions to visit cultural destinations that have natural resources. On the practical side, the contribution of this finding lies in the effectiveness of electronic destination marketing (e-marketing). Because temperatures significantly and positively impact web search queries, an increase in temperature is associated with an increase in web search queries. Thus, the higher the temperature is, the more actively tourists will engage in internet sharing or searching for information about the destination. Effective destination e-marketing may occur when temperatures are high and tourists are active on the internet.

The impacts of weekends reveal tourists' travel preferences and their destination-based search preferences on the internet. In this study, we set Monday through Thursday as weekdays and Friday through Sunday as weekends. Weekends are significantly positively correlated with actual arrivals to the village and significantly negatively correlated with web search queries about the village. It can be inferred that tourists prefer visiting cultural destinations that have natural resources from Friday through Sunday and actively search for destination information, prepare for their visitations, or share travel experiences from Monday through Thursday. Thus, this finding's practical contribution lies in destination management and the effectiveness of destination e-marketing. DMOs should be prepared with sufficient services and adequate management for weekends, including Friday, and effective e-marketing may occur from Monday to Thursday.

The impact of public holidays on actual arrivals reveals challenges faced in destination management. In this research, it is found that public holidays have a significant positive correlation with actual tourist arrivals to the village. It is vital for DMOs to predict tourist arrivals in order to avoid overcrowding, sudden tourist surges and conflicts, prevent potential safety hazards, and provide sufficient services and proper management, especially when they are organizing special events or offering free entry during public holidays. Destination overcrowding may even challenge public management because of free highway tolls during public holidays. Attractive events, free destination entry, plus free highway tolls may worsen the traffic jams on highways. Besides, the short time lag (2 days) requires DMOs to be very sensitive in predicting, and flexible in managing and adjusting, their tactics.

This study was carried out based on one year of daily data, so the data do not show long-term lag lengths. However, to further explore the reasons for such a short 2-day lag length, and to investigate the destination tourist portfolio in the context of the short lag length, we employ the user profile function of the Baidu Index. We identified two important clues about how to proceed with our research. One, the Baidu Index lists the top 10 cities whose web search queries about the destination account for the bulk of the total. Two, the most discussed topics about the destination in Baidu Knows (the internet community of the Baidu search engine) are issues related to self-driving tours of the destination.

According to these two clues, we use web crawlers to identify a new list of the top 10 cities from which internet users search for destination information, and to obtain the absolute numbers of their web queries. The city list and the corresponding numbers of daily web queries are presented in Fig. 4, and these 10 cities are the main potential source cities of visitors to the destination. Our findings about these 10 cities are different from those of Baidu Index, and our findings are more accurate than those of Baidu Index because we integrate six keywords about the destination while the result of Baidu Index only stems from one keyword. However, we compared our city list with that on Baidu Index and found only a slight difference between the two: the final city on the Baidu Index list is Kunming.

Fig. 5 shows cities' web query proportions. The biggest proportion of web queries about the destination is made from Guiyang City, which is about 200 km away from the destination, and Guiyang is the capital city of Guizhou Province, in which the destination is located. The remaining nine cities are Beijing, Chengdu, Shanghai, Guangzhou, Chongqing, Shenzhen, Nanning, Wuhan, and Hangzhou. Moreover, these 10 cities fall into two groups. On the one hand, Chengdu, Chongqing, Nanning, Wuhan, and Guiyang are neighboring cities of the destination. This explains why the most discussed topic on Baidu
Knows is self-driving tours to the destination. Thus, some potential tourists are self-driving tourists from neighboring cities, which can further explain the spontaneity of the tourists who visit the destination and the short time lag length: tourists check information about the route, the destination, and so on two days before their departure. On the other hand, the other cities, Beijing, Shanghai, Guangzhou, Shenzhen, and Hangzhou, are typical developed cities in China, so the DMO launches commercial advertisements in these cities to draw the attention of potential tourists. Besides, the DMO provides special offers to tourists from Hangzhou because of the G20 conference in Hangzhou, and the DMO intends to draw attention to this city. We can also infer that the advertisement is working because web search queries partly come from potential tourists' information searching behaviors.

Fig. 6 depicts the proportions of web queries made from the neighboring cities and the distant but advertisement-targeted cities. The proportion of web queries made from neighboring cities is slightly higher than that of the distant advertisement-targeted cities. Although the annual web search query data cannot fully explain the reasons for the short 2-day lag length, the spontaneity of self-driving tourists from neighboring cities may be one of the main reasons. Moreover, there might be research limitations because the list only covers 10 cities. Further research can be carried out for more detailed explanations of the short time lag length and the potential tourist source cities.

Fig. 5. The destination's main potential source cities and their market shares, 2015.

Fig. 6. The destination's potential visitors from neighboring cities and distant cities, 2015.

Different from previous research studies, this study uses Baidu Index to get clues for deeper information crawling, while Huang et al. (2017) directly import web query data from Baidu Index. The main reason we used a web crawler to get web query data is that Baidu Index does not provide detailed web query data anymore. Instead, it only provides
curves to show the trends of web queries. This phenomenon suggests the importance of the destination Big Data warehouse's ability to capture internet-based data. In addition, we use Baidu Index rather than Google Trends because 95% of the destination's tourists are domestic, and the Baidu search engine has a bigger market share in China than Google does (Yang et al., 2015). However, a study of Google search data in forecasting can be as important as that of Baidu for destinations that have more inbound tourists. One more comparison between the findings of Huang et al. (2017) and this study was conducted regarding time lag length. Studying the Forbidden City using ADLM modeling, including Baidu keyword search queries as explanatory variables, Huang et al. (2017) conclude a 1–2 day time lag length. However, the Forbidden City, as one of the most famous and interesting cultural heritage sites in China, attracts numerous inbound tourists every day, while very few tourists of Miao Village are inbound tourists. Still, the time lag length of Huang et al. (2017) is almost as short as that of our study. Although we apply different forecasting methods than Huang et al. (2017), both studies are based on the Baidu search engine and the daily tourist arrivals to each destination. The reason for the phenomenon may be that search engines reflect significant differences in the trip planning behaviors of different tourists with different search languages and search engine preferences.

This study uniquely reflects the nature of Granger causality by emphasizing the reciprocity between tourism destination arrivals and web search queries of the destination, which extends the domain of the current knowledge of tourism flow forecasting. The predictive power of actual arrivals on web search queries may reflect tourists' searching for destination information before their visits and their willingness to share their experiences and recommendations through web pages after their visits. It can reflect the influences and effects of electronic word-of-mouth (eWOM). It can also reflect tourist travel intentions and questions about the destination.

This study exemplifies Big Data analytics in Destination Management and Marketing (DMM), especially destination e-marketing, which reveals that information is one of the destination assets, and destination data is helpful for making sense of the destination context, transforming and enhancing the values of destination services, and strengthening destination competitiveness. However, the keyword selection in this study relied on the Demand Mapping function of Baidu Index, which may cause limitations because some other keywords may have been overlooked. Although many works examine the accuracy of forecasting models and the performances of different web search engines' forecasting functions, there are few works that focus on the predictive power of web queries. Future research may be developed to investigate the relationship between changes in numbers of tourist arrivals and changes in web queries.

To conclude this study, we present factors that may influence tourism arrivals and domestic tourists' intentions to visit cultural destinations that have natural resources. First, weather does not impact tourist arrivals to cultural destinations that have natural resources, and it does not impact domestic tourists' intentions to visit the destination. These findings are different from those of previous studies that reveal that weather does impact tourism demand (Agnew and Palutikof, 2006; Falk, 2014; Thapa, 2012), but they reinforce the finding that there is no relationship between weather and tourism arrivals at a Canadian national park in peak summer (Scott et al., 2007). Second, temperatures do not impact the numbers of tourist arrivals to cultural destinations that have natural resources, but temperatures have significant impacts on domestic tourist intentions to visit such destinations. Finally, both weekends and public holidays significantly and positively impact the numbers of domestic tourist arrivals to cultural destinations that have natural resources, while weekends significantly and negatively impact domestic tourists' intentions to visit such destinations.

The originality of this study is exemplified through the utilization of Big Data analytics in the tourism domain with a data capturing technique (the use of a web crawler), an analytical tool (VAR modeling), analysis results (findings of correlation and Granger causality), and further digging for potential reasons for the phenomenon of the identified two-day time lag length. The research is developed based on data from the studied DMO's Big Data warehouse. We further develop a VAR model as an analytical tool with which to identify the correlations in the data in a static manner. However, Big Data analytics should be more than a Big Data warehouse. It should be based on information technology with an architecture of data layers and a workflow of data transformation within the business intelligence environment of the destination data warehouse. It should be able to encompass various analytical techniques and tools. These techniques and tools should be constructed to have sophisticated functionalities to facilitate tourism destination information integration and provide DMOs with insights about how to meet tourist needs and future market trends. It should be able to take advantage of cloud computing to support real-time analytical capabilities. The main trend of future tourism Big Data analytics should be embodied in an architecture with data organization and functionalities that integrate data capturing and data analysis with specific goals, techniques, and tools. A shift toward more unstructured data, captured by the increasing use of sensors and remote monitors supporting destination services, is called for, and destination big data visualization is required. We suggest that DMOs adopt business intelligence and develop Big Data analytics.

Thus, in the context of tourism Big Data analytics, the contributions of this research are the following: We exemplify what tourism Big Data (TBD) can be by linking traditional structured data (ticket information) with unstructured data (web queries), and we demonstrate tourism Big Data analytics (TBDA) as an architecture connecting information technology for capturing data (web crawlers) and analyzing data with a specific tool (VAR modeling), and how TBDA can serve DMOs by matching their needs with specific information technology using the correct analytical tool. Although we did not focus on data storage, this research shows the systematic philosophy of Big Data analytics in the tourism domain.

Acknowledgments

We would like to extend our gratitude to the Destination Management Organization (DMO) of Xijiang Thousand Households Miao Village for their support of the study. We acknowledge Heiko A. von der Gracht (Associate Editor) and two anonymous reviewers for their valuable suggestions and comments.

References

Agnew, M.D., Palutikof, J.P., 2006. Impacts of short-term climate variability in the UK on demand for domestic and international tourism. Clim. Res. 31, 109–120.
Ahmadi-Abkenari, F., Selamat, A., 2012. An architecture for a focused trend parallel Web crawler with the application of clickstream analysis. Inf. Sci. 184, 266–281.
d'Amore, M., Baggio, R., Valdani, E., 2015. A Practical Approach to Big Data in Tourism: A Low Cost Raspberry Pi Cluster. Springer International.
Ang, L.-M., Seng, K.P., 2016. Big sensor data applications in urban environments. Big Data Res. 4, 1–12.
Athanasopoulos, G., Hyndman, R.J., Song, H., Wu, D.C., 2011. The tourism forecasting competition. Int. J. Forecast. 27, 822–844.
Baloglu, S., Mangaloglu, M., 2001. Tourism destination images of Turkey, Egypt, Greece, and Italy as perceived by US-based tour operators and travel agents. Tour. Manag. 22, 1–9.
Batsakis, S., Petrakis, E.G.M., Milios, E., 2009. Improving the performance of focused web crawlers. Data Knowl. Eng. 68, 1001–1013.
Becken, S., 2013. A review of tourism and climate change as an evolving knowledge domain. Tour. Manag. Perspect. 6, 53–62.
Becken, S., Hay, J., 2012. Climate Change and Tourism: From Policy to Practice. Routledge, United Kingdom (280 pp.).
Bhushan, R., Nath, R., 2013. Web crawler–a review. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 8 (3), 54–57.
Bucur, C., 2015. Using opinion mining techniques in tourism. Procedia Econ. Financ. 23.
Buhalis, D., 2006. The impact of ICT on tourism competition. In: Papatheodorou, A. (Ed.), Corporate Rivalry and Market Power: Competition Issues in the Tourism Industry. IB Tauris, London, pp. 143–171.
133
Chen, H.C., Chiang, R.H.L., Storey, V.C., 2012. Business intelligence and analytics: from Scott, D., Jones, B., Konopek, J., 2007. Implications of climate and environmental change
Big Data to big impact. MIS Q. 36 (4), 1165–1188. for nature-based tourism in the Canadian Rocky Mountains: a case study of Waterton
Coghlan, A., Prideaux, B., 2009. Welcome to the Wet Tropics: the importance of weather Lakes National Park. Tour. Manag. 28 (2), 570–579.
in reef tourism resilience. Curr. Issue Tour. 12 (2), 89–104. Smith, K., 1993. The influence of weather and climate on recreation and tourism. Weather
Colombo, P., Ferrari, E., 2015. Privacy aware access control for big data: a research 48, 398–404.
roadmap. Big Data Res. 2. Song, H., Li, G., 2008. Tourism demand modelling and forecasting—a review of recent
Crompton, J.L., 1979. Motivations for pleasure vacation. Ann. Tour. Res. 6, 408–424. research. Tour. Manag. 29, 203–220.
Day, J., Chin, N., Sydnor, S., Cherkauer, K., 2013. Weather, climate, and tourism per- Song, H., Witt, S.F., 2006. Forecasting international tourist flows to Macau. Tour. Manag.
formance: a quantitative analysis. Tour. Manag. Perspect. 5, 51–56. 27, 214–224.
De Freitas, C.R., 2003. Tourism climatology: evaluating environmental information for Tasci, A.D.A., Gartner, W.C., 2007. Destination image and its functional relationships. J.
decision making and business planning in the recreation and tourism sector. Int. J. Travel Res. 45 (4), 413–425.
Biometeorol. 48, 45–54. Taylor, T., Ortiz, R.A., 2009. Impacts of climate change on domestic tourism in the UK: a
Echtner, C., Ritchie, J., 1991. The meaning and measurement of destination image. J. panel data estimation. Tour. Econ. 15 (4), 803–812.
Tour. Stud. 2 (2), 2–12. Thapa, B., 2012. Why did they not visit? Examining structural constraints to visit Kafue
Falk, M., 2014. Impact of weather conditions on tourism demand in the peak summer National Park, Zambia. J. Ecotour. 11 (1), 74–83.
season over the last 50 years. Tour. Manag. Perspect. 9, 24–35. Varma, P.C.V., Chakravarthy, K.V.K., Kumari, V.V., Raju, S.V., 2016. Analysis of a net-
Fan, S., Lau, R.Y.K., Zhao, J.L., 2015. Demystifying big data analytics for business in- work IO Bottleneck in big data environments based on Docker Containers. Big Data
telligence through the lens of marketing mix. Big Data Res. 2. Res. 3, 24–28.
Fuchs, M., Höpken, W., Lexhagen, M., 2014. Big Data analytics for knowledge generation in tourism destinations - a case from Sweden. J. Destination Mark. Manag. 3 (4).
Gallarza, M., Saura, I.G., Garcia, H.C., 2002. Destination image: towards a conceptual framework. Ann. Tour. Res. 29 (1), 56–78.
Godnov, U., Redek, T., 2016. Application of text mining in tourism: case of Croatia. Ann. Tour. Res. 58.
Gómez Martín, M.B., 2005. Weather, climate and tourism: a geographical perspective. Ann. Tour. Res. 32 (3), 571–591.
Gössling, S., Hall, C.M., 2006. Uncertainties in predicting tourist flows under scenarios of climate change. Clim. Chang. 79 (3–4), 163–173.
Gössling, S., Scott, D., Hall, C.M., Ceron, J.-P., Dubois, G., 2012. Consumer behaviour and demand response of tourists to climate change. Ann. Tour. Res. 39, 36–58.
Guan, D., Du, J., 2015. Cross-media Big Data: A Tourism Perception Research Based on Multi-agent. Springer-Verlag.
Gunter, U., Önder, I., 2015. Forecasting international city tourism demand for Paris: accuracy of uni- and multivariate models employing monthly data. Tour. Manag. 46, 123–135.
Gunter, U., Önder, I., 2016. Forecasting city arrivals with Google Analytics. Ann. Tour. Res. 61, 199–212.
Huang, T., Lan, L., Fang, X., An, P., Min, J., Wang, F., 2015. Promises and challenges of big data computing in health sciences. Big Data Res. 2, 2–11.
Huang, X., Zhang, L., Ding, Y., 2017. The Baidu Index: uses in predicting tourism flows–a case study of the Forbidden City. Tour. Manag. 58, 301–306.
Jagadish, H.V., 2015. Big data and science: myths and reality. Big Data Res. 2.
Jeuring, J.H.G., Peters, K.B.M., 2013. The influence of the weather on tourist experiences: analysing travel blog narratives. J. Vacat. Mark. 19 (3), 209–219.
Jin, X., Wah, B.W., Cheng, X., Wang, Y., 2015. Significance and challenges of big data research. Big Data Res. 2.
Kaján, E., Saarinen, J., 2013. Tourism, climate change and adaptation: a review. In: Current Issues in Tourism. 3 Routledge.
Kamata, H., Misui, Y., 2015. The difference of Japanese spa tourists motivation in weekends and weekdays. Procedia. Soc. Behav. Sci. 175, 210–218.
Li, X., Pan, B., Law, R., Huang, X., 2017. Forecasting tourism demand with composite search index. Tour. Manag. 59, 57–66.
Liu, Y., Teichert, T., Rossi, M., Li, H., Hu, F., 2017. Big Data for big insights: investigating language-specific drivers of hotel satisfaction with 412,784 user-generated reviews. Tour. Manag. 59.
Lohmann, M., Kaim, E., 1999. Weather and holiday destination preferences image, attitude and experience. Tour. Rev. 54 (2), 54–64.
Marine-Roig, E., Anton Clavé, S., 2015. Tourism analytics with massive user-generated content: a case study of Barcelona. J. Destination Mark. Manag. 4.
Marrese-Taylor, E., Velásquez, J.D., Bravo-Marquez, F., 2014. A novel deterministic approach for aspect-based opinion mining in tourism products reviews. Expert Syst. Appl. 41.
Miah, S.J., Vu, H.Q., Gammack, J., McGrath, M., 2017. A big data analytics method for tourist behaviour analysis. Inf. Manag. 54 (6), 771–785.
Miller, E., 2008. Community cleverness required. Nature 455, 1.
Mohanty, H., 2015. Big data: an introduction. In: Mohanty, H. (Ed.), Studies in Big Data. 11 Springer.
Pike, S., 2010. Destination branding case study: tracking brand equity for an emerging destination between 2003 and 2007. J. Hosp. Tour. Res. 34 (1), 124–139.
Pivk, A., Cimiano, P., Sure, Y., Gams, M., Rajkovic, V., Studer, R., 2007. Transforming arbitrary tables into logical form with TARTAR. Data Knowl. Eng. 60 (3), 567–595.
Power, D.J., 2015. ‘Big Data’ Decision Making Use Cases. Springer International Publishing.
Rivera, R., 2016. A dynamic linear model to forecast hotel registrations in Puerto Rico using Google Trends data. Tour. Manag. 57, 12–20.
Rossello-Nadal, J., Riera-Font, A., Cardenas, V., 2011. The impact of weather variability on British outbound flows. Clim. Chang. 105, 281–292.
Rungsawang, A., Angkawattanawit, N., 2005. Learnable topic-specific web crawler. Comput. Appl. 28, 97–114.
Wang, H., Xu, Z., Pedrycz, W., 2017. An overview on the roles of fuzzy set techniques in big data processing: trends, challenges and opportunities. Knowl.-Based Syst. 118.
Witt, S.F., Witt, C.A., 1995. Forecasting tourism demand: a review of empirical research. Int. J. Forecast. 11, 447–475.
Xiang, Z., Schwartz, Z., Gerdes Jr., J.H., Uysal, M., 2015. What can big data and text analytics tell us about hotel guest experience and satisfaction? Int. J. Hosp. Manag. 44.
Yang, X., Pan, B., Evans, J.A., Lv, B., 2015. Forecasting Chinese tourist volume with search engine data. Tour. Manag. 46, 386–397.
Zhang, L., Lan, C., Qi, F., Wu, P., 2017. Development pattern, classification and evaluation of the tourism academic community in China in the last ten years: from the perspective of big data of articles of tourism academic journals. Tour. Manag. 58.

Yuan-Yuan Liu is a Ph.D. candidate in Management at Faculty of Economics and Business Administration, Vilnius University, Lithuania and a senior lecturer at School of Management, Guizhou University, China. She received her Master of Engineering, and Master of Business Administration from Gävle University, Sweden. Her research interests can be categorized into three major areas: 1) Big Data analytics in tourism; 2) Behavioral science in tourism; and 3) Tourism products innovation.

Fang-Mei Tseng received her Ph.D. in management of technology from National Chiao Tung University in Taiwan. Currently, she is a professor at College of Management, Yuan Ze University. Her research interest includes data analysis, technology forecasting and technology assessment, new product sales forecasting, and new product/service development and performance evaluation.

Yi-Heng Tseng is an associate professor at College of Management, Yuan Ze University, Taiwan. He received his PhD of Economics from Department of Economics, National Taiwan University. His research interests cover Applied Macroeconomics, Financial Market, and Investor's Behavior. In recent years, his results of research were published in the Journal of Financial Studies, Economic Modelling, Pacific-Basin Finance Journal, Research in International Business and Finance, and International Journal of Tourism Research.


