
Tourism Management 63 (2017) 54-65

Contents lists available at ScienceDirect

Tourism Management
journal homepage: www.elsevier.com/locate/tourman

Sochi 2014 Olympics on Twitter: Perspectives of hosts and guests


Andrei P. Kirilenko*, Svetlana O. Stepchenkova
The Department of Tourism, Recreation and Sport Management, University of Florida, P.O. Box 118208, Gainesville, FL 32611-8208, United States

Highlights

• Comparative study of Twitter discourse on Sochi 2014 Olympics in Russian and English.
• Text mining of 400,000 Twitter messages collected over 6-month period.
• Analysis of change in main topics of discussion over time.
• Differences in pre- and post-Games sentiment towards issues surrounding the Olympics.

Article info

Article history:
Received 17 May 2016
Received in revised form 2 April 2017
Accepted 2 June 2017

Keywords:
Mega sports event
Olympic Games
Sentiment analysis
Sochi
Social networks
Twitter

Abstract

Mega sports events create multiple benefits for the host country but can also bring into focus the political and social problems. This study provides a comprehensive description of the public discourse about Sochi 2014 Winter Olympics on Twitter in two languages, Russian and English. The former represents the perspective of hosts and the latter that of the guests. The study traces the temporal dynamics of the most salient issues and conducts sentiment analysis of public attitudes. It also examines whether sentiments toward the Games changed as the event unfolded, that is, whether the event succeeded in creating a more positive image of the Games. It was found that while the positive attitudes expressed in the tweets about the Sochi Olympics improved throughout the course of the Games, this improvement was practically significant only for the hosts' segment of the sample, with much smaller improvement in the guests’ segment.

© 2017 Elsevier Ltd. All rights reserved.

* Corresponding author.
E-mail address: andrei.kirilenko@ufl.edu (A.P. Kirilenko).
http://dx.doi.org/10.1016/j.tourman.2017.06.007
0261-5177/© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Mega sports events stimulate economic growth, leave infrastructure legacies, and create opportunities for image promotion (Matos, 2006). They attract worldwide visibility and publicity for the host country, provide a nucleus for a positively framed public discourse, and, as such, have the potential to improve attitudes toward the host country and its people. For residents, successful mega sports events facilitate feelings of unity and national pride within the country and inspire volunteering movements (Getz, 1997; Kim & Petrick, 2005). They also expand opportunities for local communities and domestic tourists to support their national team, attend competitions in their favorite sports, meet foreign visitors, enjoy the overall festive atmosphere and, directly and indirectly, affect quality of life (Kaplanidou et al., 2013).

In 2012, Russia entered the list of the world's top 10 tourism destinations for the first time with 26 million international arrivals. It remained the ninth most-visited country in 2012, 2013, and 2014 (United Nations World Tourism Organization, 2013, 2014, 2015) before a downward trend set in in 2015. In preparation for the Sochi 2014 Winter Olympics, organizers focused on modernizing the
telecommunications, electric power, and transportation infrastructures in the region, developments that would facilitate future tourism growth and attract other major cultural and sporting events to Sochi (e.g., Formula 1 Grand Prix series and matches of the 2018 FIFA World Cup). Other benefits that Russia sought included an improved image and reputation, both internationally and domestically. Russia promoted the Olympics through social networks such as Facebook, YouTube, Twitter, Вконтакте (V kontakte, meaning “in contact”) and Flickr, with several public relations campaigns starting as early as 2005 (Losevskaya, 2013). For example, an official Twitter project oriented primarily toward international users had 36,370 followers and accumulated approximately 4000 tweets. The official page for Russian-language users (http://twitter.com/sochi2014_ru) had 109,000 readers and content consisting of approximately 3000 messages (Losevskaya, 2013).

Domestic and international policies of the host country can negatively affect destination perceptions because these policies are the subject of representations, discussions and interpretations in both general and social media. Therefore, one of the problems discussed during the preparation stage was the cost of the Games, which was estimated at USD 12 billion in the original Olympic bid (Müller, 2011). The final figure, however, was USD 51 billion (USD 520 million per event), surpassing the cost of the 2008 Summer Olympics in Beijing (USD 44 billion total; USD 146 million per event) as the most expensive Olympics in history. The total cost of the preceding Winter Olympics in Vancouver, Canada, was USD 6.4 billion (USD 74 million per event). The cost of the Olympic Games themselves was estimated to be USD 6.5 billion, and the remaining costs were due to Sochi infrastructural projects exacerbated by the climate in the Sochi area, which is sub-tropical (Oliphant, 2013) (e.g., the cost of a 48-km mountain road between the Olympic venues alone was estimated at $7.8 billion: Arnold & Foxall, 2014).

Besides cost, other themes potentially damaging to the image of the Sochi Olympics and Russia in general were discussed in the media as early as several years prior to the Games. These problems included issues associated with the terrorist activity in the region, security required to suppress said terrorist activity, and corruption (Taylor, 2014). With regard to the scale of the last factor, Russia ranked 127th out of 175 countries on the Transparency International Corruption Perception Index with a score of 28 out of 100 in 2013, where 100 indicated “no corruption” (Transparency International, 2013). Releasing the “high profile” dissidents Mikhail Khodorkovsky, the former owner of the Yukos oil company, and the members of the Pussy Riot punk rock group Nadezhda Tolokonnikova and Maria Alyokhina prior to the Games may be considered an attempt by the Russian government to control the narrative about the Sochi Olympics and Russia in a situation in which relationships between Russia and the West have been noticeably “cooling off” (Coaffee, 2015). However, controversies over gay rights in Russia, threats of terrorism, and the involvement of Russia in the unfolding political crisis and military conflict in Ukraine might have adversely affected the image of the country.

Sports are traditionally viewed as a vehicle to promote happy relationships between peoples and bring them together. However, mega sports events can also act as the lens through which the host country is viewed, amplifying weaknesses and bringing problems into focus. To date, no studies have comprehensively described the public discourse surrounding a mega sports event such as the Olympic Games; such a description would highlight issues of main public interest and, ultimately, have the potential to yield better strategies with respect to image promotion. The Big Data approach seems to be paramount for such a study, and Twitter, an Internet service that allows users to post brief online messages that are visible to their social networks, seems remarkably suited to this purpose. Twitter is one of the most egalitarian social media outlets with 297 million users worldwide (data for 2013, 2nd quarter as cited in Zeng & Gerritsen, 2014). Content analysis, which provides a systematic reduction of content flow, is easily facilitated with tweets, as these short messages (140 characters at most) have already undergone the process of data reduction and incorporated conventions and shortcuts in the form of hash tags (Kirilenko & Stepchenkova, 2014).

Discussions on Twitter provide vast amounts of data about various topics of social importance. A 2013 meta-analysis of 575 peer-reviewed publications that used Twitter data identified geography, marketing, natural disaster management, linguistics and politics as the primary topical domains for the studies (Williams, Terras, & Warwick, 2013). Twitter as a source of data can be useful for tourism research as well, especially in the areas of destination marketing, imaging and branding, crisis management and risk analysis. A general discussion of social media and user-generated content in tourism is available from, for example, Leung, Law, van Hoof, and Buhalis (2013), Lu and Stepchenkova (2015) and Zeng and Gerritsen (2014). Lu and Stepchenkova registered the arrival of tourism and hospitality studies that utilized the Big Data approach; however, all six studies in their review identified mobility patterns of tourists using the geolocational capabilities of mobile devices. More Big Data studies appeared recently on geospatial analytics in tourism using database accommodations data (Supak, Brothers, Bohnenstiehl, & Devine, 2015) and geotagged photos from Flickr (Onder, Koerbitz, & Hubmann-Haidvogel, 2014; Vu, Li, Law, & Ye, 2015) and Panoramio (Garcia-Palomares et al., 2015). There have also been a few publications predicting hotel demand using web-traffic data (Xiang, Schwartz, Gerdes, & Uysal, 2015; Yang, Pan, & Song, 2014), studies on destination image and marketing using Facebook posts (Mariani, Di Felice, & Mura, 2016), travel blogs (Marine-Roig & Clave, 2015) and social networks other than Twitter (Koltringer & Dickinger, 2015; Shao, Li, Morrison, & Wu, 2016), as well as satisfaction studies that used accommodations data from reservation websites (Radojevic, Stanisic, & Stanic, 2015).

However, a search in the Science Direct database of social sciences and sports and recreation academic journals with the keywords “Twitter AND tourism” and “Twitter AND Olympic” in journal paper titles, abstracts and keywords returned only three articles (as of May 9, 2016, in Public Relations Review, Sport Management Review and Annals of Tourism Research). There are also publications in journals not indexed by Science Direct (e.g., Hambrick and Pegoraro (2014) reviewed tweets under three hash tags used during the Sochi Olympics on Twitter (#WeAreWinter, #SochiProblem and #CheersToSochi)). To the best of our knowledge, none of these publications used the Big Data approach to analyze the public discourse surrounding a mega Olympic Games event based on Twitter data.

The primary purpose of this study is to provide a comprehensive description of how the Sochi 2014 Winter Olympics was portrayed on Twitter and what the issues of main public interest were from the perspectives of hosts and guests. To this end, we relied on tweets in Russian, which were considered as reflective of the hosts’ perspective, and on tweets in English, which were accepted as a proxy to represent the perspective of the guests. Thus, in the study the terms hosts and guests are used as convenient labels assigned based on the language of a tweet. Specifically, this investigation uses a Big Data approach (Kitchin, 2014) to identify the most salient topics of public discussion before, during and after the Games, the temporal dynamics of these topics, and attitudes expressed in Twitter messages. Given a number of problems surrounding the Sochi Olympics, our study also examines whether sentiments toward the Games changed as the event unfolded and whether attitudes
toward the event differed between hosts and guests. The rationale for the study was to see whether a mega sports event is influential enough to change the attitudes toward the event and, by doing this, cast a more favorable light on the hosting country. The attainment of these goals results in a comprehensive and detailed description of public discourse in English and Russian surrounding the Sochi 2014 Winter Olympics and comparisons of the two perspectives, which potentially may contribute to a better understanding of influential issues in the process of mega sports event image formation and destination image formation.

Considering that there have not been enough studies that use Big Data for analyses of public discourse surrounding mega sports events, this study also serves a secondary purpose that is method-related. We would like to point out explicitly that, while this study uses the terminology of “traditional” image studies, this research is data-driven, not theory-driven. It uncovers temporal, topical, and attitudinal patterns in the data and, by doing so, contributes to the mega-event body of knowledge as well as image studies. Inevitably, some of the methodological issues pertain to Twitter as a data medium (e.g., data collection, filtering and identifying the geolocational information of individual tweets). Another important methodological issue deals with using automated sentiment analysis classifiers to extract attitudes of hosts and guests from Twitter messages. Considering both subject-matter and method goals, this paper proceeds as follows. The Methods section explains our approach and the issues associated with the data collection and preparation, hash tag analysis to determine the primary topics of public discourse surrounding the Sochi Olympics, and the sentiment analysis algorithm. The Results section describes the findings and summarizes the outcomes, visualizing them with a series of figures and tables. Finally, the concluding section provides the discussion of findings and their implications, the methodological considerations of the study, and avenues for further research.

2. Methods

2.1. Data collection and filtering

The Sochi Winter Olympics were held on February 7-23, 2014, with the opening rounds of several events held on February 6. The dates for the Paralympics were March 7-16, 2014. For one year, from November 1, 2013 through October 31, 2014, we systematically collected Twitter data related to the Sochi Olympics by performing a Twitter search with adaptive frequency, ranging from once every three minutes during the Olympic Games to 12 times per day during the pre- and post-Games periods. The searches were performed using Python code that used the Twitter REST API, version 1.1. After the duplicates were removed, 7.8 million tweets in total were collected using the keywords sochi, olympics, paralympics and their translations. Fig. 1 illustrates the frequency distribution of the collected tweets.

Additionally, we collected a random sample of 3,275,263 tweets posted between June 13 and July 3, 2013 without using any keywords in order to establish the tweeting “baseline” in different languages. For this purpose, every 10 minutes within the sampling period a random sample of 1000 tweets was collected with Twitter API search without using any search keyword. The purpose of this data collection was to establish major characteristics of Twitter data regardless of the topic of discussion, such as Twitter users’ geographical distribution, relative tweeting volume in various languages, mean tweeting sentiment in different languages, and similar characteristics. The baseline data allowed us to identify deviations and anomalies in the Olympic Games tweeting stream as compared to the “baseline” tweeting.

The three most frequent languages in the sample were English (En), Russian (Ru) and Japanese; when combined, these languages accounted for 75% of all collected tweets. A full description of the sample with respect to the language composition of the tweets can be found in Kirilenko and Stepchenkova (2017). For the purpose of this study, we only analyzed tweets in En and Ru, except when we discuss the sample representativeness. To ensure the quality of the collected sample, the tweets that were unrelated to the Sochi Olympics were identified (and subsequently removed) using a two-step procedure: (1) time period filtering and (2) hash tag filtering. First, a random sample of 20 tweets per month in each language (En and Ru) was selected and its tweets were manually classified as relevant or not relevant to the study. The percentage of irrelevant tweets in these monthly samples varied significantly: from November 2013 to March 2014, the percentage varied between 0 and 5% and then increased sharply, reaching 90% in October 2014. Therefore, the database was restricted to the period spanning November 1, 2013 to March 31, 2014. Second, the most frequent hash tags were extracted from the collected tweets, and a random sample of 100 tweets per hash tag was manually classified as being related or unrelated to the Sochi Olympics. Only tweets with the hash tags that had at least 90% relevant tweets were left in the dataset. For example, the tweets containing any of the hash tags #sochi2014, #olympics2014, or #sochi would be included for further analysis. Tweets with only one hash tag #winterolympic would be filtered out (the author might refer to a different Olympic game); however, a tweet with two hash tags #winterolympic and #sochi2014 would be included.

In total, 439,106 tweets in En and Ru spanning the period between November 1, 2013 and March 31, 2014 were retained for further analysis. The distribution of tweets across languages differed significantly from the distribution in a random sample (baseline) of all tweets (Table 1). Within the November 1, 2013 through March 31, 2014 period, the month of February when the

Fig. 1. Distribution of the daily number of collected tweets in the two most prevalent languages, English and Russian, and in all other languages.
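
The two-step relevance filtering described in Section 2.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' code: the function names (`in_study_period`, `extract_hashtags`, `relevant_hashtags`, `keep_tweet`), the hash tag regular expression, and the input format of manually labeled pairs are hypothetical. Only the study window (November 1, 2013 to March 31, 2014) and the 90% relevance threshold are taken from the text.

```python
import re
from collections import defaultdict
from datetime import date

# Study window stated in Section 2.1.
PERIOD_START = date(2013, 11, 1)
PERIOD_END = date(2014, 3, 31)

# Assumed hash tag pattern; \w is Unicode-aware in Python 3, so Cyrillic tags match.
HASHTAG_RE = re.compile(r"#\w+")


def in_study_period(posted_on):
    """Step 1: keep only tweets posted inside the study window."""
    return PERIOD_START <= posted_on <= PERIOD_END


def extract_hashtags(text):
    """Return the lowercase hash tags found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def relevant_hashtags(labeled_sample, threshold=0.9):
    """Step 2a: find hash tags whose manually labeled sample is mostly relevant.

    labeled_sample is an iterable of (hashtag, is_relevant) pairs, one pair
    per manually classified tweet (hypothetical input format).
    """
    counts = defaultdict(lambda: [0, 0])  # tag -> [relevant, total]
    for tag, is_relevant in labeled_sample:
        entry = counts[tag.lower()]
        entry[1] += 1
        if is_relevant:
            entry[0] += 1
    return {tag for tag, (rel, total) in counts.items() if rel / total >= threshold}


def keep_tweet(text, allowed_tags):
    """Step 2b: keep a tweet if at least one of its hash tags is allowed."""
    return any(tag in allowed_tags for tag in extract_hashtags(text))
```

With this rule, a tweet tagged only with a low-relevance tag is dropped, while a tweet that combines it with a high-relevance tag is kept, which mirrors the #winterolympic and #sochi2014 example in the text.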

Table 1
Distribution of tweets in En and Ru languages. The two rightmost columns represent percentages of tweets in the Sochi sample and in a random baseline sample of 3,275,263 tweets.

Language   Count     Olympic tweets (%)   Random sample (%)
English    372,221   60.4                 33.8
Russian    66,885    10.9                 0.9

Note: the total percentage is less than 100% since the table presents only two languages, Russian and English.

Olympic Games were held generated the majority of tweets (60%), followed by March (18%). For Ru, February and March generated a comparably high fraction of the tweeted messages (55% and 28%, respectively). Accordingly, we separated the analysis into two periods: the Pre-Games (PreG) period of November 2013 through January 2014 and the Games and Post-Games period (GPG) inclusive of February and March 2014.

2.2. Geolocating the tweets

We used the self-described place of residence of the owners of the Twitter accounts to estimate the origins of the tweets. The online geographical database GeoNames (http://www.geonames.org) was used to resolve the textual place-of-residence definition into a geographical latitude and longitude. We resolved geolocation ambiguity by assigning the coordinates of the largest of the possible populated places as the place of residence. For example, we resolved “London” to “London, UK” and “London, CA” to “London, Ontario, Canada.” In total, we were able to estimate the locations of roughly 35% of the tweets that we collected, and these tweets are hereafter referred to as “geolocated tweets.” We estimated the geography of tweeting on the Games based on these geolocated tweets assuming that they were representative of the entire collected dataset. We estimated that 88% of tweets in Russian were sent from users living in Russia and that 86% of tweets in English were sent from users living outside Russia. This assessment confirmed the relevance of the hosts and guests labels that were assigned to the tweets depending on their language. In the absence of known discriminating mechanisms, we believe that our findings for the geolocated sample are generalizable to the entire population of collected tweets.

Approximately 2% of the geolocated tweets also had geographical coordinates of the tweeter sent by his or her mobile device. These GPS coordinates are normally assumed to define the device's true location. These tweets were used to validate the geolocation algorithm described above. The distribution of the haversine distances between the estimated and true coordinates is long-tailed, yet the coordinates of 75% of the geolocated tweets were found to have an error of less than 40 km. Hence, the geolocational algorithm was judged to be acceptable. For a full description of the geolocation and data validation algorithms see Kirilenko, Molodtsova and Stepchenkova (2015).

Geolocational analysis revealed that the guest population was predominantly represented by three English-speaking countries: the United States, Canada, and the United Kingdom; these countries also dominated the discussions about the Olympic Games on Twitter (Fig. 2A). In Russia, the tweets mainly originated within a 200-km radius of the two largest cities: Moscow (43%) and St. Petersburg (18%) and the Games’ location, together with the largest city in the region, Krasnodar (18%). The rest of the country was responsible for only 39% of the tweets in Ru (Fig. 2B).

2.3. Hash tag analysis

Topical analysis of Twitter messages was based on hash tags, which are user-provided metadata that serve to group similarly tagged tweets. Since tagging a tweet with multiple hash tags is a common practice, there were 54,756 hash tags in the database in total; the top six hash tags (#Sochi2014, #Сочи2014 [#Sochi2014], #Olympics, #SochiProblems, #RoadToSochi and #TeamUSA) accounted for over half of all hash tag usage. We did not consider the top two hash tags, #Sochi2014 (41.4% of all hash tags) and #Сочи2014 (2.9%), in our analysis because the same keywords were used to select the sample. Of the remaining hash tags, the 75 most frequently used hash tags were used in more than 50% of tweets. To this list of the 75 most frequently used hash tags, we added hash tags that were in the “top 10” on a certain day or in the “top 25” during a certain week to detect any high-interest topic that was localized in time. The hash tags in the resulting set were classified into eight broad themes, or categories, described below:

1. Events: hash tags related to (1) the opening ceremony, (2) the closing ceremony, and (3) the Paralympic Games. Tweets with hash tags from groups (1) and (2) were tightly distributed within two or three days after the respective event. Tags related to the Paralympics (3) rarely appeared outside an approximately three-week period centered on the Paralympics week.
2. News: hash tags of general media, broadcasting companies (e.g., #bbcsochi) and generic hash tags (e.g., #news).
3. Sports & Sportsmen: hash tags related to specific sports (e.g., #hockey), sports in general (e.g., #sport), sporting organizations (e.g., #ioc: International Olympic Committee), particular games (e.g., #canvsusa: Canada vs. USA) and descriptors used with particular sports (e.g., #icequeen with figure skating). There were spikes in this category during the pre-Olympic trials (e.g., January 12-18) and during key events in the most popular sports, such as ice hockey. There was a drastic reduction in the number of tweets in this category during the Paralympics.
4. Anticipation: hash tags related to awareness campaigns (e.g., #roadToSochi: campaign of support and raising awareness of the Games by Team Great Britain), countdowns (e.g., #100days, #50days), and the Olympic torch relay (e.g., #torch or #iss: torch travel to the International Space Station between November 6 and 12, 2013). The last category also included the names of cities and places on the torch route. Anticipation was the strongest category throughout almost the entire PreG period.
5. Cheering: hash tags related to specific countries and their teams (e.g., #teamusa, #gocanada) and Olympic cheering campaigns (#teamvisa). This category had two distinct spikes during the first few and final few days of the Games. Similar to the Sports & Sportsmen category, Cheering was not well populated during the Paralympic Games. Two “country” hash tags, #russia and #ukraine, were not classified under the Cheering theme. The hash tag #russia was assigned to the theme Other and removed from further analysis because of its predominant usage as a geographical descriptor in combination with other hash tags, e.g., the tweet “Sad there was no mention of the immoral treatment of the #LGBT community in #Russia. I guess ignorance is bliss. #Sochi2014” was classified as belonging to the Problems & Politics category because of its #LGBT tag and also the Other category; the hash tag #Russia defined the geographical location but not the topic of discussion. The hash tag #ukraine was classified under the Problems & Politics category because the Ukraine Crisis was the predominant topic under this tag.
6. Problems & Politics: hash tags directly related to problems during the Olympics, e.g., #sochiproblem, and hash tags related to political issues surrounding the Games, e.g., #putin, #LGBT and #ukraine. Several major topics drove this theme. The most significant political issue by frequency of mentions present in

Fig. 2. Distribution of tweets on Sochi Olympic Games between the countries in the entire sample (A) and in Russia alone (B).
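
The geolocation steps of Section 2.2 (resolving an ambiguous place name to the largest populated place, then validating estimated coordinates against GPS coordinates with the haversine distance) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the function names, the candidate-record format, and the population figures in the usage note are hypothetical placeholders rather than GeoNames output, and the Earth radius is the commonly used mean value. Only the "largest populated place" rule and the 40 km error limit come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def resolve_place(candidates):
    """Pick the most populous candidate for an ambiguous place name.

    candidates: list of dicts with "name", "lat", "lon", "population"
    (hypothetical record format).
    """
    return max(candidates, key=lambda c: c["population"])


def share_within(errors_km, limit_km=40.0):
    """Fraction of validation tweets whose location error is below limit_km."""
    return sum(e < limit_km for e in errors_km) / len(errors_km)
```

As a usage example, passing two candidates for "London" (one in the UK, one in Ontario, with placeholder populations of 8,900,000 and 400,000) to `resolve_place` returns the UK record, and `share_within` applied to the haversine errors of the GPS-tagged tweets gives the kind of summary reported in the text (75% of tweets with an error below 40 km).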

tweets under this theme was LGBT discrimination in Russia and related concerns about athletes' safety. Hash tags included not only those directly related to the issue (e.g., #LGBT) but also the #cheersToSochi campaign by Coca-Cola and McDonald's, which were initially conceived to inspire athletes but were overtaken by LGBT activists (Hambrick & Pegoraro, 2014). The second significant political issue was the protests over the Games' location. Finally, #sochiProblems and similar hash tags indicated problems with infrastructure and organization of the Games' activists (Hambrick & Pegoraro, 2014). This sub-category was responsible for more than half of all Olympics-related tweets during the single day immediately preceding the opening ceremony when large numbers of Games guests and participants arrived in Sochi. After the first few days of the Games, the number of tweets in this category gradually decreased, but it did not reach zero until the end of the Paralympics.
7. Volunteering: the hash tags related to volunteering at the Sochi Games, predominantly in the Ru domain. Due to a relatively small number of tweets in this category, it is included only in the overall hash tag analysis.
8. Other: generic hash tags and tags that did not fit into any of the above categories. Examples include #sochi (geographical descriptor), #юмор [jokes], #f4f (used for status inflation and promising to follow back anyone who follows the user), #100рублей [100 rubles] (promotion of a new Sochi Games-themed 100 ruble bill) and many others. We note that a very large percentage of Ru tweets fell into this category. This category was excluded from the tests.

2.4. Sentiment analysis with automated classifiers

Sentiment analysis software attempts to automatically extract the sentiment expressed in unstructured textual data by classifying units of analysis as expressing positive, negative, or neutral emotions of different strength. The approaches for automated sentiment analysis draw on two ideas: lexicon-based and learning-based text classification. The lexicon-based approach (Ding, Liu, & Yu, 2008; Taboada et al., 2011) starts with a set of words for which a typical sentiment (positive or negative) is defined. The sentiment of the entire textual unit is then derived based on the balance of words with negative and positive polarity, subject to linguistic rules. The most prominent resource used for the lexicon-based classifiers is Princeton's WordNet (Miller, 1995), which currently contains over 100,000 sets of synonyms (synsets). SentiWordNet (Baccianella, Andrea, & Fabrizio, 2010) extends
the functionality of WordNet by assigning polarity to its synsets. Even though lexicon-based classification tends to be very efficient, it also has significant drawbacks, namely, that (1) the polarity of words may differ depending on the context and (2) the informal language used in many forms of online communication (e.g., LOL for “laughing out loud”) is very dynamic and, therefore, frequently absent from the databases. The machine-learning approach to sentiment analysis (Pang & Lillian, 2008) typically includes training a learning algorithm such as Maximum Entropy or Naïve Bayes on a dataset of textual units with known sentiments and accordingly is free of these constraints. The learning approach is, however, extremely time consuming since it requires manual classification of a large representative sample of texts; a training set of over 10,000 documents is not unusual. Because of this constraint, the training part is frequently skipped, which reduces the classification quality and, subsequently, the validity of the data.

Three automatic sentiment analysis algorithms were considered for this study: one learning-based classifier and two lexicon-based classifiers. The Deeply Moving machine-learning algorithm, developed at Stanford University, is based on the Recursive Neural Network on top of the grammatical structure (Socher et al., 2013), hence utilizing some benefits of the lexicon-based approach. We used the pre-trained algorithm based on the classification of movie reviews provided by the authors of the program, which presents a limitation. The Pattern sentiment analysis algorithm, developed by CLiPS (Computational Linguistics & Psycholinguistics) at the University of Antwerp (www.clips.ua.ac.be), is a lexicon-based algorithm. Pattern separately evaluates each word's polarity using SentiWordNet's ratings. Finally, SentiStrength (Thelwall, Buckley, Paltoglou, Cai, & Kappas, 2010), developed at the University of Wolverhampton, UK, is also a lexicon-based program. The program employs a set of words derived from the social network MySpace, which makes it specifically suitable for rating short informal comments such as those found on Twitter.

We evaluated all three sentiment-analysis instruments by comparing the machine-estimated sentiment with human coding using a separate set of 600 En tweets. SentiStrength's sentiment analysis best matched the manual classification, an expected result given that SentiStrength's standard En classifier is calibrated on MySpace social network data. Additionally, SentiStrength has the capability of detecting emotion in over 25 languages, including Ru, by using language-specific dictionaries. We, however, found that the quality of emotion detection in Ru was rather low: the estimated correlation between the machine-based sentiments and a manually classified set of 3000 tweets in Ru was 0.38 for positive emotions and 0.49 for negative compared with roughly 0.56-0.60 for texts in En (Thelwall et al., 2010; Thelwall, 2013). Based on this preliminary research, we further analyzed the sentiment expressed by Twitter users toward Games-related events by conducting sentiment analysis of collected tweets with the SentiStrength

normalized positive sentiment for a tweet with the sentiment equal to the mean baseline sentiment is 1 for both En and Ru.

3. Results

3.1. Distribution of tweets between hosts and guests

Both En and Ru Twitter users demonstrated pronounced interest in the Sochi Games compared with users of other languages. Normally, only 34% of Twitter users mark En as their language; we found that 60% of users in the Olympic sample marked En as their language. For Ru, this difference was even more striking: 1% of the baseline tweets were in Ru vs. 11% in the Olympic sample (Table 1). For comparison, tweets in Spanish constituted 12% of the baseline tweeting, but only 4% in the Olympic sample. In terms of geographical location, three English-speaking countries, the United States, Canada and the United Kingdom (25.9%, 23.0% and 6.7% of tweets in all languages, respectively), and Russia (16.4%) were responsible for nearly three quarters of all tweets in all languages. Together, Ru and En users made up the two largest groups tweeting on the Games, representing the perspectives of hosts and guests, respectively.

3.2. Topics discussed

The percentage of tweets classified into each of the eight major discussion topics is listed in Table 2; the change in the discussed topics over time is illustrated in Fig. 3. The most noticeable difference between the languages was that a significantly larger number of tweets about the Games’ problems were tweeted in En (15.3%) than in Ru (4.1%). The most frequently noted problems in the En tweets were issues with infrastructure such as hotels not being ready for the Games (38.2%) and human rights in Russia (20.4%). A similar percentage of En and Ru tweets discussed sports (17.4% vs. 23.3%), but there were large differences between sports disciplines: the most prevalent were hockey (33.1% En and 39.8% Ru) and figure skating (12.4% En and 48.3% Ru). A large percentage of En tweets pertained to cheering for national teams (32.0%), mostly the Canadian and the US teams.

Not surprisingly, there were very few Volunteering tweets in languages other than Ru. Finally, in the Other category, the top place was universally occupied by the tags #sochi and/or #russia. Aside from these two hash tags, for both En and Ru, promotion tags were prevalent in the Other category. In En, the third and fourth most common tags in the Other category were the #create and #chatwing tags, advertising the online chat service ChatWing. In Ru, in third place, after #sochi and #russia, were status-inflating hash tags promising to reciprocate followers (those who subscribe to read your tweets), e.g., #ru_ff.

The mean sentiment for each category of topics is provided in
algorithm. Table 3. As expected, the tweets in the Events (such as the Games’
We rescaled the SentiStrength scores to [-4, 0] for negative opening ceremony), Sports and Sportsmen, Cheering, and, for Ru
sentiments and [0, 4] for positive sentiments; higher absolute tweets, Volunteering (at the Games) had the highest positive sen-
numbers corresponded to stronger emotions, and zero represented timents. The tweets in the News and Problems & Politics categories
neutral sentiment. Since the SentiStrength algorithm uses different had the lowest positive sentiments; the former because the News
dictionaries for En and Ru tweets, raw En and Ru sentiment scores category tends to have a mostly neutral sentiment. The Problems &
are incompatible. To render the Ru and En sentiment scales Politics category had the most negative sentiment (1.7 times more
compatible, we applied SentiStrength to random samples of negative in En than the mean baseline sentiment) (Table 3).
approximately 100,000 En and 100,000 Ru tweets from the
“baseline” set; on average, the En sentiment scores were nearly 3e5 3.3. Temporal dynamics of the topics and sentiments
times those of the Ru scores: 0.49 vs. 0.17 for positive scores
and "0.32 vs. "0.07 for negative scores. To enable a comparison of We summarized the number of tweets in a particular category
sentiments expressed in En and Ru, we normalized the sentiment (Fig. 3) and averaged their sentiments (Fig. 4) on a daily basis. Prior
of each tweet in the Sochi Games sample dividing by the absolute to the Olympics, the most populated categories were Anticipation
value of the respective mean baseline sentiment. For example, the and Cheering with discussions about the Olympic torch relay and

Table 2
Distribution of tweets over the main categories (percentages).

Language  N        News  Problems & Politics  Volunteering  Anticipation  Cheering  Sports & Sportsmen  Events  Others
English   762,475  5.6   15.3                 0.0           9.2           32.0      17.4                10.6    9.8
Russian   165,425  10.3  4.1                  1.2           5.9           12.8      23.3                12.4    30.0

Note: If a tweet contains multiple hash tags, it is counted several times; thus, N is greater than the total number of tweets in the sample.
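
The counting rule in the note to Table 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hashtag-to-category mapping `TAG_TO_CATEGORY`, the function name, and the sample tweets are hypothetical, and counting once per hashtag is one possible reading of the note; the study's actual classification procedure is not reproduced here.

```python
from collections import Counter

# Hypothetical mapping from hashtags to topical categories (illustrative only).
TAG_TO_CATEGORY = {
    "#teamusa": "Cheering",
    "#hockey": "Sports & Sportsmen",
    "#sochiproblems": "Problems & Politics",
    "#openingceremony": "Events",
}


def category_percentages(tweets):
    """Count each tweet once per hashtag, then convert counts to percentages.

    Hashtags missing from the mapping fall into "Others".
    """
    counts = Counter()
    for hashtags in tweets:
        for tag in hashtags:
            counts[TAG_TO_CATEGORY.get(tag.lower(), "Others")] += 1
    n = sum(counts.values())  # N counts hashtag occurrences, so N >= number of tweets
    return n, {cat: round(100.0 * c / n, 1) for cat, c in counts.items()}


# Three hypothetical tweets, each represented by its list of hashtags.
sample = [["#TeamUSA", "#hockey"], ["#SochiProblems"], ["#hockey", "#sochi"]]
n, shares = category_percentages(sample)
```

With this rule, three tweets carrying five hashtags yield N = 5, which is why N in Table 2 exceeds the number of tweets in the sample.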

Fig. 3. Dynamics of tweets in English (A) and in Russian (B) in different categories. The categories “Others” and “Volunteering” are not shown.

Table 3
Mean normalized sentiment of English and Russian tweets on the Sochi Olympics in relation to the sentiment expressed in a random baseline sample of tweets.

Category             English              Russian
                     Positive  Negative   Positive  Negative
Events               1.62      -0.63      1.78      -0.13
News                 0.85      -0.89      0.62      -0.26
Sports & Sportsmen   1.23      -0.78      1.71      -0.46
Anticipation         1.16      -0.41      0.96      -0.22
Cheering             1.36      -0.49      1.95      -0.12
Problems & Politics  0.89      -1.70      0.91      -0.52
Volunteering         –         –          1.82      -0.43
Other                1.31      -0.89      1.32      -0.30

Note: The sentiments were normalized to a common value so that the mean normalized sentiment of background tweets in En and in Ru is 1 for positive sentiment and -1 for negative sentiment. For example, the Events positive sentiment in En (1.62) is 62% higher than the mean sentiment in the En baseline tweets.

cheering for the national Olympic teams. During the PreG period, positive sentiments by far surpassed negative sentiments, but in a few instances the majority of Sochi Olympic tweets were very negative. The day when the discussion on Twitter took the most negative turn was November 26–27th, 2013, reflecting alleged human rights violations in Russia. Frequently retweeted messages on this day included “Olympic torch throws light on human rights violations …” and “… Violence Against Gays in Russia …”. SentiStrength's estimated sentiment for both of these tweets was -3. Two other days with prevalent negative tweets during the PreG period included a misclassified sentiment about the torch relay over a slope of an active volcano (November 13th; see Discussion section) and concerns about the Games' security after a train station bombing in Volgograd, Russia, on December 29th.

In the Ru segment of Twitter, the identified PreG negative sentiments were not always reliable due to the relatively small number of tweets with a negative sentiment on each particular day. The mean number of Ru daily tweets from November through December 2013 was 114, with a minimum of only four tweets. A few angry messages on such an unpopulated day could have resulted in a significant negative sentiment anomaly. An example from the Sochi dataset included four highly negative messages from people dissatisfied with traffic disruptions because of the torch relay: these messages led to registering a highly negative sentiment on December 6th. In contrast, the mean daily number of En tweets during this period was 652. Similarly, during the GPG period of February through March 2014, the average number of Ru tweets was 932, and only the last two days of March had fewer than 50 tweets. This improved the reliability of the daily sentiment analysis.

A few days prior to the Games (but within the GPG period according to our classification), as the guests started arriving in Sochi, the En topics of discussion changed; there was an increased number of tweets expressing a negative sentiment. A large fraction of tweets during this period were classified into Problems & Politics; example topics included the alleged unpreparedness of the Games venues, the extermination of stray dogs and environmental issues. The Problems & Politics tweets peaked the day before the opening ceremony and gradually decreased by the end of the first week of the Games; they were replaced with cheering for athletes and discussions of sports. It is important to note that there were very few Problems & Politics tweets in the Ru segment.
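
The reliability concern described above, where a handful of angry messages dominates a low-volume day, can be illustrated with a small aggregation sketch. The record layout, the function name `daily_negative_means`, and the threshold of 50 tweets are assumptions made for this example (the threshold merely echoes the "fewer than 50 tweets" remark); this is not the study's actual procedure.

```python
from collections import defaultdict
from statistics import mean


def daily_negative_means(records, min_tweets=50):
    """Average the negative sentiment per day and flag low-volume days.

    records: iterable of (day, negative_score) pairs, scores in [-4, 0].
    Returns {day: (mean_score, n_tweets, reliable)}.
    """
    by_day = defaultdict(list)
    for day, score in records:
        by_day[day].append(score)
    return {
        day: (mean(scores), len(scores), len(scores) >= min_tweets)
        for day, scores in by_day.items()
    }


# Hypothetical data: four angry tweets on a quiet day vs. a busy, mostly neutral day.
records = (
    [("2013-12-06", -3)] * 4
    + [("2014-02-07", 0)] * 90
    + [("2014-02-07", -3)] * 10
)
summary = daily_negative_means(records)
```

In this toy input the quiet day averages -3 from only four tweets and is flagged as unreliable, while the busy day, with more angry tweets in absolute terms, stays close to neutral.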

Fig. 4. Sentiment analysis of the Olympic Games tweets in Russian and English. Positive (green) and negative (blue) sentiments are shown separately. The horizontal dashed lines show the mean background sentiment computed for a random sample of Russian and English tweets. Note that the background sentiment of Russian tweets is almost four times lower than that of English tweets; accounting for this difference. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
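
The baseline normalization that puts En and Ru sentiment on a common scale can be sketched as follows. The baseline means are the values reported earlier in the text (0.49 and 0.17 for positive scores, -0.32 and -0.07 for negative scores in En and Ru, respectively); the function name and the dictionary layout are assumptions made for this illustration.

```python
# Mean baseline sentiment per language, as reported in the text.
BASELINE = {
    "en": {"positive": 0.49, "negative": -0.32},
    "ru": {"positive": 0.17, "negative": -0.07},
}


def normalize(score, language, polarity):
    """Divide a rescaled score by the absolute value of the matching baseline mean.

    After this step the baseline itself maps to +1 (positive) or -1 (negative),
    so a normalized value of 1.62 reads as 62% above the baseline.
    """
    return score / abs(BASELINE[language][polarity])


# The same raw score carries a different meaning on the two scales.
en_value = normalize(0.49, "en", "positive")    # a baseline-level En tweet
ru_value = normalize(0.49, "ru", "positive")    # the same raw score in Ru
neg_value = normalize(-0.32, "en", "negative")  # a baseline-level negative En tweet
```

Dividing by the absolute value keeps the sign of negative scores, which is why the negative baseline appears at -1 rather than +1.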

Together with the changes in the Problems & Politics category, the negative sentiment tweets spiked during the first week of February and then gradually decreased. Finally, a considerable number of tweets during the GPG period were classified as Events (e.g., the Opening and Closing Ceremonies) and News (e.g., references to TV broadcasts of the Games).

The rest of this section reports the results of t-test comparisons between the sentiments expressed (1) in Ru and En tweets; (2) during the PreG and GPG periods; and (3) in each of the eight topical categories:

(1) Ru and En tweets: Overall, the positive mean daily sentiments of the Sochi Games tweets were higher than the respective sentiments in the baseline sample (Fig. 4) for both languages (En: M = 1.19, t(152) = 7.78, p < 0.001; Ru: M = 1.25, t(152) = 3.35, p < 0.01; recall that M = 1 corresponds to the mean positive baseline sentiment). The negative sentiment was slightly better than the baseline (M = -1 corresponds to the mean negative baseline sentiment) for En (M = -0.92, t(152) = 2.17, p = 0.032) and much better than the baseline for Ru (M = -0.47, t(152) = 7.86, p < 0.001). These results indicate that the Sochi Olympics were associated with an increase in positive sentiments and a decrease in negative sentiments in Twitter discussions for both hosts and guests.

(2) PreG and GPG periods: The positive sentiment was not only higher than the baseline, it also increased during and after the Games. Both the En and Ru samples exhibited a statistically significant increase in positive sentiments from the PreG to the GPG period based on a t-test: for En, from 1.12 to 1.29 (t(151) = 3.25, p = 0.001); for Ru, from 0.83 to 1.89 (t(151) = 8.33, p < 0.001), i.e., almost twice the baseline sentiment. However, we did not find a difference in negative sentiments between the PreG and GPG periods for either language: En (t(151) = -0.76, p = 0.45) or Ru (t(151) = -0.34, p = 0.76). The test results seem to suggest that the Sochi Olympics succeeded in generating an increase in positive emotions and attitudes and that they were less successful at bringing negative attitudes down.

(3) Topical categories: The largest increase in En positive sentiment was observed in the Events category, which can be chiefly attributed to tweets covering the (very successful) opening and closing ceremonies; however, the increase in all other categories except for [Games] Anticipation was statistically highly significant (Table 4). The change in Ru positive sentiment had the same sign but was much higher than in the En segment for all topical categories. The change in negative sentiment in En and Ru was, however, very dissimilar. Even though the change in negative sentiment expressed in all tweets was not statistically significant, as noted in the previous paragraph, it was significant in separate categories; the sign of this change was opposite in the En and Ru segments. In four categories out of five, En tweets had more intensive negative sentiment (statistically significant in three categories), with particularly negative sentiment expressed in the Problems & Politics category. For the Ru tweets, the direction of change was reversed: during the GPG period, in four categories out of five, Ru tweets reduced negative sentiment (statistically significant in three categories).

4. Discussion

The study investigated how the mega sports event of the Sochi 2014 Winter Olympic Games was portrayed on Twitter in two languages, English and Russian. The Ru language segment was accepted as representing the perspective of the hosts and the En tweets were accepted as being a proxy for that of the guests. Using a Big Data approach (Kitchin, 2014), this study obtained a comprehensive description of public discourse surrounding the event. Specifically, we compared the perspectives of the two audiences on the topics of public interest related to the Sochi Olympics, topical temporal dynamics, and the sentiments expressed in Twitter discussions to investigate whether the attitudinal perceptions of the Sochi

Table 4
Mean normalized sentiment during the PreG and GPG periods for English and Russian tweets on the Sochi Olympics. The p column shows the significance of the t-test for means.

Language  Category             Positive sentiment     Negative sentiment
                               PreG   GPG    p        PreG     GPG    p
English   Events               0.76   1.43   0.000    -0.53    -0.63  0.240
English   News                 0.63   0.82   0.000    -0.53    -0.91  0.000
English   Sports & Sportsmen   0.84   1.27   0.000    -0.44    -0.84  0.000
English   Anticipation         1.12   0.92   0.000    -0.41    -0.31  0.000
English   Cheering             0.98   1.39   0.000    -0.47    -0.50  0.003
English   Problems & Politics  0.76   0.86   0.000    -1.63    -2.22  0.000
Russian   Events               0.76   1.76   0.000    N/A (a)  -0.14
Russian   News                 0.24   0.65   0.000    -0.57    -0.29  0.000
Russian   Sports & Sportsmen   0.41   1.82   0.000    -1.00    -0.43  0.008
Russian   Anticipation         1.00   0.88   0.400    -0.29    -0.14  0.080
Russian   Cheering             0.71   2.12   0.000    -0.14    -0.14  0.930
Russian   Problems & Politics  0.18   1.06   0.000    -0.29    -0.57  0.380
Russian   Volunteering (b)     1.59   1.94   0.220    -0.86    -0.29  0.100

(a) Very small sample (24 tweets).
(b) No Volunteering category in En.

Olympics exhibited a positive temporal change. Together, the English and Russian languages represented 71% of tweets obtained on the Sochi Games in the 10 most prominent languages used on Twitter. Effectively, our sample predominantly pertains to Russia and three English-speaking winter-sports countries, namely, the United States, Canada, and the United Kingdom (Fig. 2A). In Russia, the tweets mainly originated from the vicinity of the two largest cities, Moscow and St. Petersburg, and from the region surrounding Sochi, the location of the Games (Fig. 2B).

The tweet volume started with the anticipation phase three to four months prior to the Sochi Olympics and was highly concentrated around the Games period; it then decreased quickly after the Games were over. The Paralympics generated noticeably less excitement than the Olympics events. The primary topics of discussion were the opening and closing ceremonies, news and updates about the Games tweeted by various media outlets and the general public, discussions of particular sports, such as ice hockey, figure skating and others, cheering for national teams, and following the route of the torch relay (see Table 2, Figs. 1 and 3). Both segments, Ru and En, exhibited a noticeable increase in tweet volume as compared to the baseline in the respective segments, indicating that the Games brought forward a number of new topics (e.g., those in the Anticipation category) and at the same time acted as an amplifier for the topics already under discussion (e.g., Sochi infrastructure problems or human rights in Russia). In addition, a number of the issues classified under the topics of Anticipation, Cheering, News, Sports, Events and Volunteering can be viewed as pertaining to mega sports events in general, while issues from the Problems & Politics category are likely to be country-specific (e.g., critique of Russian LGBT policies, the Pussy Riot controversy, stray dogs in Sochi, or the conflict in Ukraine) and as such may require different image promotion and management strategies from the host country before, during, and after the Games. Thus, the patterns identified in the data with respect to discussion topics, their volume, and temporal dynamics can serve as a first-draft “blueprint” of how a public discussion of a mega sports event unfolds in social media and what issues come to the front of the public interest at any particular point in time.

What may be good news for the host countries of mega sports events is that the negative sentiment is confined primarily to the group of issues classified under the Problems and Politics category, and this group of issues accounted for a relatively small fraction of the discussion volume (approx. 20%). Among the most prominent topics that cast an unfavorable light on the event itself and Russia as its host were Sochi's infrastructure problems and the situation with LGBT rights in Russia. Several times during the PreG period, LGBT issues surged in the whole volume of Twitter messages, especially in relation to the anti-LGBT laws (the so-called “homosexual propaganda laws”) passed in Russia prior to the Games (see Coaffee, 2015 for a discussion). It should be noted, however, that the distribution of topics among guests from the countries not included in the analysis may differ from the ones that we studied. For example, we found that the Japanese segment of Twitter, the next largest after En and Ru (not included in the current analysis), differed from the En segment. The Japanese segment paid less attention to the issues surrounding the Games other than sports (mostly figure skating). For example, the LGBT topic discussed above was very pronounced in En but not in the Ru or Japanese segments of the collected data. This finding supports the conclusion that the Olympics can serve as a lens through which controversial and/or contested issues (in this case, between Russia and the collective West) can be brought into focus. From the perspective of management practice, it should be pointed out that the methodological approach proposed in this paper can be adjusted to monitor public discourse surrounding any mega sports event in general, as well as with respect to a particular country or a specific topic.

With respect to the question of whether the Sochi Olympics brought positive change in attitudes toward the event itself and, by association, toward the host country, we found that positive sentiment did increase in both segments, though the increase was larger in the Ru segment. The highly positive Volunteering category in the Ru segment and the considerable increase in the positive sentiment during and after the Games indicate that the domestic image of the Games noticeably improved. This is not surprising as the Sochi Olympics boosted the feelings of national pride in Russia. Thus, the Ru tweets rarely mentioned problems or politics and also were more active in discussing various Olympics events than the En tweets. The Problems and Politics issues were more prominent in the En segment: approximately 15% of all En messages vs. only 4% of Ru messages. We also found that while there was little overall change in the negative attitudes between the PreG and the GPG periods, in several categories the negative attitudes improved in the hosts' segment and worsened in the guests' segment (Table 4). For the guest segment, the difference in the negative sentiment was the largest in the News, Sports & Sportsmen, and Problems and Politics categories. In addition, the negative sentiment in the En segment was 1.7 times larger than the baseline. These findings further indicate that mega sports events can act as a magnifying lens for “contested” issues between the host country and the guest countries; therefore, the hosting countries need to take such dynamics into account in their management and promotional strategies. Thus, managing negative news related to crises like terrorist attacks or infrastructure unpreparedness (e.g., embarrassments in Sochi hotels or green water in Rio swimming pools) is of utmost importance; the few days prior to the event can be considered a “moment of truth”: this is the time point when spikes in negative sentiment can be especially damaging.

We would like to stress one more time that the concept of “hosts” in the paper refers to the hosting country as a whole. One should keep in mind that people from different regions of the country might have different perspectives on the Games. For example, for the 2008 Beijing Olympics, Chen and Tian (2015) found quite small (just a few percentage points), but statistically significant differences in the perspectives of the residents of Beijing, the principal hosting city, and Qingdao, the city that organized the sailing competitions. We, however, find it unlikely that the local sentiment variability in Russia would make a practically significant impact on the analysis. Unlike the Beijing Olympics, the Sochi Olympic Games were held just a few kilometers from the Russian

border and far away from the major urban population centers. Because the local population in the area is 76,000 people (2010 census), the sample in this study overwhelmingly represents the residents of Russian regions not directly involved in or affected by the Games. While we found a small percentage of tweets expressing attitudes of the locals toward the Games, e.g., complaints from the residents about the Olympic Torch Relay interrupting their daily commute, the overall negative sentiment is mainly a reflection of the sentiment of Russian people living in metropolitan areas other than Sochi.

Knowledge about channels of information dissemination can help to bring the message across. The collected tweets referenced various media resources: of the entire database, 28% of the tweets (174,428) contained references to at least one resource on the Internet; 50% of these references pointed to only 43 (0.39%) of the most popular Internet domains. By far, the most frequently referenced domains (29%) belonged to the photography and video-sharing services Instagram and YouTube. Among traditional news outlets, the largest number of tweets referred to the BBC and Vesti.ru (a Russian-language news channel) and dedicated sports channels, such as NBC Olympics (the holder of the USA broadcast rights), sportbox.ru and other Russian-language sports outlets. A considerable number of tweets referenced the domains belonging to national committees and the International Olympic Committee (e.g., Olympic.org, Olympic.ca), and to other sports organizations such as the National Hockey League. Among non-traditional news sources, the Facebook, Tumblr, and WordPress blogging and social network domains, together with the Ru language social networks V Kontakte (vk.com) and LiveJournal, were the most popular. Overall, it seems that the variability of news resources cited in Twitter discussions did not translate into either a large volume or a large variety of issues that could have been considered damaging to the image of the Games and Russia in general: for example, the issue of terrorism was almost non-existent beyond a spike on October 21, 2013. These findings might indicate that mega sports events, and the Olympic Games in particular, are viewed by the general public as predominantly non-political events.

4.1. Methodology considerations

With respect to the secondary goal of this study pertaining to the methodology of Big Data analysis using Twitter, we turn to discussing the sentiment analysis procedure with automated classifiers. We found the performance of automated sentiment analysis to be generally satisfactory, as validated by the manual classification of selected positive and negative sentiment tweets; however, a few issues need to be pointed out. Normally, the misclassified sentiment of a textual unit does not present a problem since the error is compensated for by a large number of other, correctly classified units. However, the distribution of messages on Twitter was highly skewed: among the 68,226 retweeted messages in our sample, two thirds were retweeted only once, yet the two most retweeted messages were retweeted over 1000 times. A misclassification of such highly influential tweets could distort the sentiment analysis, especially if a misclassification occurred on a day with a relatively small number of tweets. When daily tweets are aggregated into a single case (unit of analysis), the sentiment score may easily become an outlier for subsequent analyses.

For example, a November 13th highly negative sentiment event (Fig. 4), registered in both the Ru and En samples, was apparently misclassified by SentiStrength. Variations of the tweet “… Flame on the slope of Avachinskaya sopka, active volcano of Kamchatka. Gorgeous view! #TorchRelay …”, retweeted over 160 times, were rated as -2, i.e., moderately negative. Note, however, that the positive sentiment of the same tweet was correctly identified as well, yielding a neutral sum (Fig. 4). Within a relatively low tweeting frequency period, such as November through December 2013, a misclassification of a message with multiple retweets may lead to Type 1 errors; the same is true even for days with a high tweeting intensity when a retweeted message comes from a highly influential blogger. For example, the message “… Mao [Asada] - you was great, special thanks for Axel 3,5! You're real fighter!!” was misclassified due to the negative connotation of the word “fighter.” This message (citing the famous figure skater Evgeni Plushenko) was retweeted 1600 times on February 21st, leading to a detection of negative sentiment on this day. As in the previous example, the positive sentiment of this message was also correctly identified, yielding a neutral sum (Fig. 4). To deal with cases of this type, in this exploratory study we manually reviewed the events classified by SentiStrength as both highly negative and highly positive. However, evaluating the same textual data with multiple sentiment analysis tools, as described in Section 2.4, seems to be a viable option for more extensive analysis because the results obtained from classification by different software packages may potentially reduce the possibility of high-impact misclassifications.

Another methodological issue related to sentiment analysis is associated with incomparable sentiments expressed in different languages. Indeed, both lexical and learning algorithms are based on the results of manual evaluation of sentiment expressed in words and/or sentences in a dictionary or a training dataset. Different groups conducting this manual evaluation in their own native languages will inherently have dissimilar connotations of how strongly different words express emotions. Additionally, the accepted ways of expressing a sentiment over social networks may differ among countries. For example, we found that the mean positive sentiment expressed in a random baseline sample of approximately 100,000 tweets in En was almost three times higher than the corresponding value in Ru. We suspect that this difference is mainly related to the smaller size of the Ru dictionary used by SentiStrength compared with its En counterpart; however, it may also partially reflect the real differences in the sentiments expressed in tweets in these two languages. Effectively, sentiment evaluations conducted in different languages use different scales and therefore are incompatible. To introduce a common scale, we normalized the sentiments expressed in tweets of our Sochi sample by dividing them by the mean baseline sentiment of a random baseline sample of tweets in each respective language. The normalized tweets are comparable in the sense that they measure the strength of the sentiment in relation to the sentiment of an “average” tweet in the respective language.

5. Conclusion

Public interest in mega sports events like the Olympic Games is reflected in discussions in traditional and social media, and these discussions, related to sports in general and the broader problems surrounding the Games, contribute to the image of the event itself and, consequently, to the image of the host country. Overall, the results of the study suggest that the Sochi Olympics succeeded in generating an increase in positive sentiment, but the Games were less successful in bringing negative sentiment down. The Sochi Olympics produced larger positive effects in terms of favorability of attitudes for domestic audiences: it boosted national pride and the sense of accomplishment, and the volunteer movement contributed to a better, more responsible civic society in Russia. The Sochi Games produced less positive impressions on the guest audiences: in this particular case, it may be reflective of wider problems between the host country and the countries where the guest segment

studied in this research came from. However, while the study uncovered a number of themes of public discourse that can be viewed as potentially damaging to Russia's image, their aggregated volume was much smaller than the combined volume of other, more positive themes, pointing to the conclusion that, on a larger scale of things, Russia managed its image well through the Games, despite the number of problems surrounding the Games and the strained relations between Russia and the English-speaking guest countries. We also demonstrated that information for managerial implications is discernible using Big Data analytics and are confident that the potential use of such information will only increase when the results of Big Data studies on mega sports events and the knowledge of how to look for such information on Twitter and other social media datasets accumulate. Thus, the results obtained in this study provide a useful starting point for the host countries of future Olympic events.

Even though this study used the Twitter social network to examine public discourse about the Sochi Olympics, this approach is expandable to user-generated content from a wider set of social networks and other Web 2.0 applications. Overall, we demonstrated that despite a virtual absence of research into mega sports events based on Big Data analysis of social networks in the scientific literature, social networks provide ample data about the major topics of interest and the political issues surrounding events, including their sentiment and spatial and temporal dynamics. Assigning locations to user-generated messages and classifying them according to variants of content analysis enable investigation into differences between countries and regions in terms of a specific sporting event. Additionally, messages with geographical locations provided by users enable analyses of travel patterns of event visitors, which were not reported here but can be found in Kirilenko and Stepchenkova (2017). Finally, although it was unexplored in this paper, an important opportunity of mapping the connections inside the collected data has the potential to provide insights into information flows between people and groups discussing events.

References

geographical landscape, and temporal dynamics. In Analytics in smart tourism design (pp. 215–234). Springer International Publishing.
Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 1–12. http://dx.doi.org/10.1177/2053951714528481.
Költringer, C., & Dickinger, A. (2015). Analyzing destination branding and image from online sources: A web content mining approach. Journal of Business Research, 68(9), 1836–1843.
Leung, D., Law, R., van Hoof, H., & Buhalis, D. (2013). Social media in tourism and hospitality: A literature review. Journal of Travel & Tourism Marketing, 30, 3–22.
Losevskaya, E. (2013). Роль «новых медиа» в информационной кампании «Сочи-2014» (The role of new media in the Sochi-2014 informational campaign). Журнал социологии и социальной антропологии (The Journal of Sociology and Social Anthropology), 16(5), 203–220.
Lu, W., & Stepchenkova, S. (2015). User-generated content as a research mode in tourism and hospitality applications: Topics, Methods, and Software. Journal of Hospitality Marketing & Management, 24, 119–154.
Mariani, M. M., Di Felice, M., & Mura, M. (2016). Facebook as a destination marketing tool: Evidence from Italian regional destination management organizations. Tourism Management, 54, 321–343.
Marine-Roig, E., & Clavé, S. A. (2015). Tourism analytics with massive user-generated content: A case study of Barcelona. Journal of Destination Marketing & Management, 4(3), 162–172.
Matos, P. (2006). Hosting mega sports events: A brief assessment of their multi-dimensional impacts. In Paper presented at the Copenhagen conference on the economic and social impact of hosting mega sport events.
Miller, G. A. (1995). WordNet: A lexical database for English. Communications of the ACM, 38(11), 39–41.
Müller, M. (2011). State dirigisme in megaprojects: Governing the 2014 Winter Olympics in Sochi. Environment and Planning A, 43(9), 2091–2108.
Oliphant, R. (2013, October 30). Sochi: Chaos behind the scenes of world's most expensive Winter Olympics. The Telegraph. Retrieved from http://www.telegraph.co.uk/news/worldnews/europe/russia/10414885/Sochi-chaos-behind-the-scenes-of-worlds-most-expensive-Winter-Olympics.html.
Önder, I., Koerbitz, W., & Hubmann-Haidvogel, A. (2014). Tracing tourists by their digital footprints: The case of Austria. Journal of Travel Research, 0047287514563985.
Pang, B., & Lee, L. (2008). Opinion Mining and Sentiment Analysis. Foundations and Trends in Information Retrieval, 2(1–2), 1–135.
Radojevic, T., Stanisic, N., & Stanic, N. (2015). Solo travellers assign higher ratings than families: Examining customer satisfaction by demographic group. Tourism Management Perspectives, 16, 247–258.
Shao, J., Li, X., Morrison, A. M., & Wu, B. (2016). Social media micro-film marketing by Chinese destinations: The case of Shaoxing. Tourism Management, 54, 439–451.
Socher, R., Perelygin, A., Wu, J. Y., Chuang, J., Manning, C. D., Ng, A. Y., et al. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the conference on empirical methods in natural language processing (EMNLP) (pp. 1631–1642).
Supak, S., Brothers, G., Bohnenstiehl, D., & Devine, H. (2015). Geospatial analytics for
federally managed tourism destinations and their demand markets. Journal of
Arnold, R., & Foxall, A. (2014). Lord of the (five) Rings: Issues at the 2014 Sochi Destination Marketing & Management, 4(3), 173e186.
Winter Olympic Games: Guest editors' introduction. Problems of Post-Commu- Taboada, M., Brooke, J., Tofiloski, M., Voll, K., & Stede, M. (2011). Lexicon-based
nism, 61, 3e12. methods for sentiment analysis. Computational Linguistics, 37(2), 267e307.
Baccianella, S., Andrea, E., & Fabrizio, S. (2010). SentiWordNet 3.0: An enhanced Taylor, A. (2014). Why Sochi is by far the most expensive Olympics ever. Business
lexical resource for sentiment analysis and opinion mining. In LREC (Vol. 10, pp. Insider. Retrieved from http://www.businessinsider.com/why-sochi-is-by-far-
2200e2204). the-most-expensive-olympics-ever-2014-1.
Chen, F., & Tian, L. (2015). Comparative study on residents' perceptions of follow-up Thelwall, M. (2013). Heart and soul: Sentiment strength detection in the social web
impacts of the 2008 Olympics. Tourism Management, 51, 263e281. with SentiStrength. Proceedings of the CyberEmotions, 1e14.
Coaffee, J. (2015). The uneven geographies of the Olympic carceral: From excep- Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010). Sentiment
tionalism to normalisation. The Geographical Journal, 181(3), 199e211. strength detection in short informal text. Journal of the American Society for
Ding, X., Liu, B., & Yu, P. S. (2008). A holistic lexicon-based approach to opinion Information Science and Technology, 61(12), 2544e2558.
mining. In Proceedings of the 2008 international conference on web search and Transparency International. (2013). Corruption perceptions index 2013. Retrieved
data mining (pp. 231e240). ACM. from https://www.transparency.org/cpi2013/results.
García-Palomares, J. C., Gutie !rrez, J., & Mínguez, C. (2015). Identification of tourist United Nations World Tourism Organization. (2013). UNWTO tourism highlights.
hot spots based on social networks: A comparative analysis of European met- Retrieved from http://www.e-unwto.org/doi/pdf/10.18111/9789284415427.
ropolises using photo-sharing services and GIS. Applied Geography, 63, 408e417. United Nations World Tourism Organization. (2014). UNWTO tourism highlights.
Getz, D. (1997). Event management and event tourism. New York: Cognizant Retrieved from http://www.e-unwto.org/doi/pdf/10.18111/9789284416226.
Communication Corporation. United Nations World Tourism Organization. (2015). UNWTO tourism highlights.
Hambrick, M. E., & Pegoraro, A. (2014). Social sochi: Using social network analysis to Retrieved from http://www.e-unwto.org/doi/pdf/10.18111/9789284416899.
investigate electronic word-of-mouth transmitted through social media com- Vu, H. Q., Li, G., Law, R., & Ye, B. H. (2015). Exploring the travel behaviors of inbound
munities. International Journal of Sport Management and Marketing, 15(3e4), tourists to Hong Kong using geotagged photos. Tourism Management, 46,
120e140. 222e232.
Kaplanidou, K., Karadakis, K., Gibson, H., Thapa, B., Walker, M., Geldenhuys, S., et al. Williams, S. A., Terras, M. M., & Warwick, C. (2013). What do people study when
(2013). Quality of life, event impacts, and mega-event support among South they study Twitter? Classifying Twitter related academic papers. Journal of
African residents before and after the 2010 FIFA World Cup. Journal of Travel Documentation, 69(3), 384e410.
Research, 52(5), 631e645. Xiang, Z., Schwartz, Z., Gerdes, J. H., & Uysal, M. (2015). What can big data and text
Kim, S. S., & Petrick, J. F. (2005). Residents' perceptions on impacts of the FIFA 2002 analytics tell us about hotel guest experience and satisfaction? International
world cup: The case of Seoul as a host city. Tourism Management, 26, 25e38. Journal of Hospitality Management, 44, 120e130.
Kirilenko, A. P., Molodtsova, T., & Stepchenkova, S. O. (2015). People as sensors: Yang, Y., Pan, B., & Song, H. (2014). Predicting hotel demand using destination
Mass media and local temperature influence climate change discussion on marketing organization's web traffic data. Journal of Travel Research, 53(4),
Twitter. Global Environmental Change, 30, 92e100. 433e447.
Kirilenko, A. P., & Stepchenkova, S. O. (2014). Public microblogging on climate Zeng, B., & Gerritsen, R. (2014). What do we know about social media in tourism? A
change: One year of Twitter worldwide. Global Environmental Change, 26, review. Tourism Management Perspectives, 10, 27e36.
171e182.
Kirilenko, A. P., & Stepchenkova, S. O. (2017). Sochi Olympics on Twitter: topics,

ANDREI P. KIRILENKO, PHD Dr. Kirilenko is an Associate Professor in the Department of Tourism, Recreation, and Sport Management at the University of Florida. He received his first degree in Applied Mathematics and a Ph.D. in Computer Science, and has held positions at the Center for Ecology & Forest Productivity in Russia, the European Forest Institute in Finland, the US Environmental Protection Agency laboratory in Oregon, Purdue University, and the University of North Dakota. His research interests include Big Data analysis, tourism analytics, climate change impacts, and sustainability issues, especially water and food security.

SVETLANA STEPCHENKOVA, PHD Svetlana Stepchenkova is an Assistant Professor at the Dept. of Tourism, Recreation & Sport Management, University of Florida. Her research interests are destination marketing and branding, media and user-generated communications in tourism, and social studies methodology.
