
2019 the 4th IEEE International Conference on Big Data Analytics
Predicting Financial Markets Using The Wisdom of Crowds
Anasse Bari
Computer Science Department, CIMS
New York University
USA
e-mail: abari@nyu.edu

Pantea Peidaee
Computer Science Department, CIMS
New York University
USA
e-mail: pp1788@nyu.edu

Aniruddh Khera
Computer Science Department, CIMS
New York University
USA
e-mail: ak5146@nyu.edu

Jianghao Zhu
Computer Science Department, CIMS
New York University
USA
e-mail: jz2575@nyu.edu

Hongting Chen
Computer Science Department, CIMS
New York University
USA
e-mail: hc1924@nyu.edu

Abstract—In the world of finance, one key lesson is the importance of psychology in the behavior of financial markets. Many investors are irrationally exuberant when making financial decisions, but predictive analytics can generate insights that are free of investors’ emotions, and hence irrational exuberance in decision-making can be mitigated. Data sources that investors adopt in their investment decision-making are, in most cases, traditional, including quarterly earnings reports and financial statements. In this work, we propose a predictive analytics framework that aims at mining insights from two alternative data sources: news articles and micro-blogs. We investigate the predictive correlation and causation between (1) collective opinion mining in news articles fused with Twitter mood and (2) movements in financial markets. Experimental results indicate a relationship between stock market prices and collective opinion mining variations on news articles combined with Twitter’s sentiment variations. The framework introduced in this work could potentially be adopted as a supplement to the conventional analyses being used in major investment banks. This research was partially funded by the Australian government under the Awards-Endeavour research grant.

Keywords—predictive analytics, machine learning, artificial intelligence, applied data science, finance, wall street, investment banking

I. INTRODUCTION
Words matter. Even a few hundred characters on Twitter can influence stock prices, as seen in two recent instances that made headlines:
1. In March 2015, Elon Musk of Tesla tweeted about unveiling a new product: “Major new product line – not a car – will be unveiled at our Hawthorne Design Studio on Thurs 8pm, April 30.” It was reported that in just a few minutes, a one billion US dollar increase was registered in Tesla’s capitalization [1] and a 3% increase in Tesla’s stock price. Netizens flocked to guess possible new products that Tesla might announce: some thought it might be a new solar energy product, a Tesla hovercraft, or a flying broom. It turned out that there was no major product announcement; Elon Musk was simply referring to a software update that could help Tesla owners locate charging stations [2]. Just a few minutes after the Hawthorne event, Tesla stock fell several points.
2. Analogously, in December of 2016, President Trump tweeted: “Based on the tremendous cost and cost overruns of the Lockheed Martin F-35, I have asked Boeing to price-out a comparable F-18 Super Hornet!” After the tweet, it was reported that Lockheed Martin shares fell about two percent, while Boeing shares were up 0.5 percent. The President’s tweet might have lost Lockheed Martin about one billion dollars in market value [3].
Data is becoming a catalyst of decision-making, growth and change; indeed, it is the new oil. However, data in its raw form is useless. Dr. Eric Siegel said that “The real value of raw data is what is discovered therein” [4]. Predictive analytics can interrogate data and derive data-driven insights. It is the art and science of mining data to extract novel and ultimately useful trends to make better data-driven decisions [5]. Using predictive analytics to understand financial markets has long been a research subject in both academia and finance.
In 1999, Wysocki was the first researcher to investigate the predictive power of micro-blogs in finance. He investigated the predictive correlation between Yahoo! Finance message boards and stock prices [6] and discovered that message volume can predict changes in next-day stock returns. However, in 2001, Tumarkin et al. [7] contradicted the work of Wysocki and reported no correlative or causative
978-1-7281-1282-4/19/$31.00 ©2019 IEEE 334

link between the number of messages and next-day stock returns. A key difference between the two studies was the nature of the data source they adopted in their analysis. While Tumarkin analyzed data from RagingBull.com, Wysocki analyzed Yahoo message boards.
In late 2012, Oliveira et al. [8] analyzed StockTwits.com, a social media platform founded in 2008 that was designed for sharing ideas between investors, traders, and entrepreneurs, and discovered that messages posted on social media have an influence over the stock market and thus can be used to enhance the prediction of trading volume. In recent years, many researchers [9], [10] and [11] have expanded on the work published in 2011 where the authors used the sentiment score of Twitter data to predict the percent change of the Dow Jones Industrial Average (DJIA) over time. The authors’ analysis relied on two major open source sentiment tools: OpinionFinder, which measures positive and negative mood, and Google-Profile of Mood States (GPOMS), which models the emotion in text as a six-dimensional vector: calm, alert, sure, vital, kind, and happy. Correlation does not always imply causation; therefore, Granger causality analysis was used to further investigate this correlation. Although the analysis was done on a short stream of data and lacks depth in terms of testing, it reflects a promising accuracy of 87% on predicting the daily up and down changes of the DJIA. The research study presented in this work was motivated by the following question: Could a machine learning analysis of micro-blogs and news articles really help predict changes in stock prices?
We investigate in this study whether there is a predictive correlation between (1) opinions expressed in collections of news articles and Twitter and (2) movement of stock prices on a daily basis. We use a real dataset of underlying stock prices of the Australian exchange-traded fund (ETF), along with a large collection of tweets and news articles associated with the underlying ETF securities in question. We designed an ensemble of sentiment analysis algorithms that measure sentiment variations in Twitter and news headlines in 2017. We learned that there is a consistent predictive correlation between sentiment variations from news and Twitter, and movements in stock prices. Our results register an accuracy of 78.33% in the predictive models that estimate the percentage change in the stock market based on the news articles and Twitter sentiment. After that, we conducted regression analysis (autocorrelation, autoregression and regression) as well as causality analysis to measure the prediction power of these data sources on the price movement of the ETF. In the next sections, we outline details about the data used in this study, as well as the data preprocessing, data modeling, and experimental results.

II. ALTERNATIVE DATA SOURCES IN FINANCE
Alternative data is a relatively new term in the world of investment banking. It refers to data collected from non-traditional data sources that can provide new perspectives about the entity or the event in question. Inclusion of alternative data in the predictive analytics process can provide insights beyond the ones that can be learned from traditional financial data sources. In investment banking, for instance, data scientists mine satellite images of shopping mall parking lots as an alternative data source to predict the revenue numbers of the business entity in which they are considering making an investment [12]. In this paper we consider two alternative data sources: news articles and Twitter data.

A. Twitter Data
We collected over eight hundred thousand tweets in 2017. The tweets were associated with fourteen companies listed on the Australian ETF. Table I shows the fourteen companies included in this analysis. We designed taxonomies of related terms and concepts for each of the fourteen companies that were used to help filter the noisy tweets and to associate the tweets with the entities in question. The taxonomy we built contained terms related to company names, leadership team names, company products, related industries, and competitors.
Figure 1 illustrates the first layer of the universe of taxonomies that we built to get a better quality of tweets and news articles relevant to the entities in the ETF in question. A location filter for Australia was applied to all tweets to enhance the quality of Twitter data. More details on the pre-processing of the data are given in the next section.

TABLE I. ENTITIES BASED IN AUSTRALIA INCLUDED IN THIS ANALYSIS
Entity Name | Description
Commonwealth Bank of Australia (CBA)* | The largest retail bank in Australia. Main products are loans and insurance.
Westpac Banking Corp (WBC)* | Second largest retail bank in Australia. Main products are loans and insurance.
Australia & New Zealand Bank (ANZ)* | Third largest retail bank in Australia. Main products are loans and insurance.
BHP Billiton Ltd (BHP)* | Biggest mining company in Australia. Main product is iron ore.
National Australia Bank (NAB)* | Fourth largest retail bank in Australia. Main products are loans and insurance.
CSL Ltd (CSL)* | A pharmaceutical company whose main product is vaccines.
Wesfarmers (WES)* | Holds one of the largest retail supermarket and petrol station chains.
Woolworths (WOW)* | Holds one of the largest retail supermarket and petrol station chains.
Rio Tinto (RIO)* | One of the biggest metal mining companies.
Macquarie Group (MQG)* | Biggest investment bank in Australia. Also has a small retail bank.
Woodside Petroleum (WPL) | Big petroleum exploration company.
Scentre Group (SCG) | Owns retail shopping centers and is a branch of Westfield.
Suncorp Group (SUN) | A retail bank and insurance company.
Westfield Corp (WFD)* | Property management company. Owns retail centers in Australia, USA, UK.
* Large weighting in the S&P/ASX 200 index (major Australian index) and EWA (major Australian ETF trading in the US)

B. News Articles
The second alternative data source that we adopted in this analysis was news articles. We tried several APIs that could
be used to collect news related to the Australian market and in the end incorporated news from two sources: HotCopper and Bing News. HotCopper is an Australian stock market online chat forum that allows its users to discuss financial topics. Using these two sources, we assembled over three thousand news articles from 2017 about the top fourteen companies in the Australia Index Fund (EWA) mentioned in Table I.

C. Stock Market Data
The Yahoo Finance interface was used to collect the data for the Australia Index Fund (EWA). Data was collected for daily open, close, high, and low prices and traded volume. An exchange-traded fund, ETF for short, is a marketable security that tracks an index or a basket of assets like an index fund. It differs from a mutual fund in that an ETF trades like a common stock on a stock exchange; an ETF can be bought and sold. Investors might be more attracted to an ETF than to mutual fund shares because an ETF has higher daily liquidity and lower fees than mutual funds. One of the main reasons that led us to select an ETF for our data is that it offers exposure to the Australian equity market based on social media sentiment. Studying the Australia Index Fund can be useful for financial decision making given that the underlying equities and the trading region of the fund are in two different countries with two sets of macro-economic factors.

Figure 1. Graph of the ETF’s entities

III. EXPERIMENTAL METHODS AND RESULTS
The data science problem addressed in this paper can be formulated as follows: Given (1) historical daily prices and volume of the exchange-traded fund, (2) a collection of tweets associated with the ETF, and (3) news articles published on any given day that are related to the ETF in question, predict the direction of the ETF stock for the next day (up or down). In the following sections, we explain the data pre-processing methods, opinion mining algorithms, feature engineering methods, and data classification techniques used in this study.

A. Data Pre-processing
We applied standard text processing algorithms to Twitter data to remove special characters, numbers, and stop words. We removed all punctuation aside from exclamation points and question marks, as they were used in the sentiment analysis that will be described later in this section. Stemming was applied to all tweets, reducing words with the same root to a common stem. For instance, “connection”, “connections”, and “connective” were all linked to their root form, “connect”. Tweets and news articles were stored in a Hadoop cluster, and then a map-reduce version of the Porter stemming algorithm was applied. After employing these pre-processing techniques, all the stemmed terms and tweets were combined into one document-feature matrix. We adopted a time-based Term Frequency and Inverse Document Frequency (TF-IDF) as a measure to normalize terms and their frequency for a specific window of time. Equation 1 represents the inverse document frequency, where D is the collection of daily documents and N is the total number of days under the window of observation. Term frequency tf(t, d) corresponds to the occurrence of a word over a day in the tweets and news documents.

$$\mathrm{idf}(t, d) = \log\left(\frac{N}{\left|\{d \in D : t \in d\}\right|}\right) \qquad (1)$$

B. Ensemble of Opinion Mining Algorithms for Generating Mood Time Series
Opinion mining is the process of computationally categorizing opinions expressed in text [13]. The goal of a sentiment analysis algorithm is to determine the attitude and emotion of the author towards the topic mentioned in his or her text [14]. After pre-processing, we designed an ensemble of two sentiment analysis algorithms: OpinionFinder (OF) and the Stanford natural language processing programming interface (SNLP). OpinionFinder is a sentiment analysis algorithm that can be used to identify sentence-level subjectivity [15]. OF relies on a lexicon of over nine thousand positive and negative words. The algorithm is known in the literature for its capability to successfully analyze the emotional context of a large collection of text.
For every tweet and news article headline, we computed a sentiment score based on the positive and negative words contained in the text. Although OF pulls from a large dictionary of words, it ignores the word order of the sentence, which could potentially lead to inaccuracies in predicting sentiment scores. For that reason, we adopted Stanford NLP in our ensemble in addition to OpinionFinder. SNLP is a deep learning model that builds up features based on the sentence structure. It computes the sentiment based on how words compose the meaning of longer phrases. The SNLP algorithm is based on a Recursive Neural Network implementation that builds on top of grammatical structures.
As we were processing the tweets, we found an overwhelming use of emojis, relatively small digital images (typically smileys and ideograms) used to express an emotion. In order to enhance the sentiment score on Twitter, we included analysis of emojis, which were frequently used in the tweets we analyzed.
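
As a rough illustration of lexicon-based scoring with emoji support, the sketch below counts positive and negative words plus emoji in each text and averages the scores per day to form a sentiment time series. It is a minimal sketch, not the OpinionFinder or SNLP implementation: the word lists, the emoji weights, the normalization, and the function names `text_score` and `daily_sentiment` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy lexicons (assumption); the lexicon described in the text
# contains over nine thousand positive and negative words.
POSITIVE = {"gain", "profit", "strong", "growth"}
NEGATIVE = {"loss", "fall", "weak", "risk"}
# Assumed emoji weights: slightly smiling face, rocket, slightly frowning face, pouting face.
EMOJI_SCORES = {
    "\U0001F642": 1.0,
    "\U0001F680": 1.0,
    "\U0001F641": -1.0,
    "\U0001F621": -1.0,
}


def text_score(text):
    """Score one text as (positive - negative + emoji sum) / number of matches."""
    tokens = text.lower().split()
    pos = sum(1 for tok in tokens if tok in POSITIVE)
    neg = sum(1 for tok in tokens if tok in NEGATIVE)
    emoji = [EMOJI_SCORES[ch] for ch in text if ch in EMOJI_SCORES]
    hits = pos + neg + len(emoji)
    if hits == 0:
        return 0.0  # no lexicon word or emoji found: treat as neutral
    return (pos - neg + sum(emoji)) / hits


def daily_sentiment(records):
    """Average the text scores per day; records is an iterable of (day, text)."""
    by_day = defaultdict(list)
    for day, text in records:
        by_day[day].append(text_score(text))
    return {day: sum(scores) / len(scores) for day, scores in sorted(by_day.items())}
```

Calling `daily_sentiment` on a list of `(day, text)` pairs returns one score per day in the range [-1, 1], which is the shape of input that a lagged correlation analysis needs.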
Our results indicate that including emojis in the analysis enhanced sentiment measurement, particularly in Twitter data, as shown in Figure 2. Sentiment analysis was applied to both tweets and news article titles, aiming to derive one sentiment score per day for each data source in order to generate sentiment time series.

Figure 2. Performance measure of different sentiment analysis algorithms

C. Feature Engineering
Tweets and news article headlines were combined in a document-term matrix where the rows represented data records from the two alternative data sources (tweets and news articles), and the columns represented TF-IDF scores for the stemmed key-phrases. To cluster the terms, we applied data clustering algorithms and cluster analysis. The best silhouette score was for k=3, so the terms were clustered into three topics. We analyzed the words used in news articles that were published before the ETF stock prices went down and noticed a trend in the words being used and their influence on the stock. Figure 3 shows a word cloud representation of the words and key-phrases used in articles that we believe positively influenced the ETF’s stock price, and Figure 4 shows the word cloud for articles that had a negative impact.

Figure 3. Positive effect phrases
Figure 4. Negative effect phrases

Ontology can be described as the linguistic expression of the underlying relations within a concept, with the relevant dictionary of words representing them. Studying the ontologies used in articles that were associated with upward or downward movements of the ETF helped in deriving the topics and features for the final feature space. For each tweet or news article, the final set of features selected was the following: opening price of the previous day, closing price of the previous day, weighted Twitter sentiment score of the previous day, sentiment score on news articles of the three previous days, topics modeled from both news and tweets, and percentage change on the ETF from the previous day (up vs down).

D. Correlation and Lag Analysis
In order to analyze whether Twitter sentiment is correlated with, and possibly a predictor of, ETF stock prices, we investigated the correlation coefficients between the two time series at various lags. Consider the Twitter sentiment time series x = {x_1, ..., x_n} and the ETF stock price time series y = {y_1, ..., y_n}. The cross-correlation γ at lag h is then defined as shown in equation 2:

$$\gamma(h) = \frac{\sum_{t} (x_{t+h} - \bar{x})(y_t - \bar{y})}{\sqrt{\sum_{t} (x_t - \bar{x})^2}\,\sqrt{\sum_{t} (y_t - \bar{y})^2}} \qquad (2)$$

where \bar{x} and \bar{y} are the mean values of x and y, respectively. We use the cross-correlation to estimate the correlation between x_{t+h} and y_t by keeping y still and moving x forward or backward in time by a lag of h. When h < 0, x leads y, and vice versa. We observed a strong cross-correlation of 0.79 between the ETF stock prices (y_t) and the 1-day lagged Twitter sentiment (x_{t-1}), as shown in Figure 6. Similarly, when we carried out the same investigation with the 3-day lagged news sentiment (z_{t-3}), we observed a cross-correlation of 0.71, shown in Figure 5. Thus x_{t-1} and z_{t-3} could be potential predictors of y_t, and x and z lead y by lag=1 and lag=3 respectively.
Table II shows the feature ranking score of the features used in this analysis. We applied feature ranking using information gain (IG) for each feature to build the classification model. The table summarizes the IG for each feature, which represents the change in entropy of the outcome (stock moving up) once the feature is known. Let p_x be the probability that an arbitrary sample in the sample set N belongs to class C_x, estimated by |C_{x,N}|/|N|.
The entropy needed to classify a sample in N:

$$\mathrm{Info}(N) = -\sum_{x=1}^{m} p_x \log_2(p_x) \qquad (3)$$

Information needed to classify N after using feature F to split N into partitions N_y:

$$\mathrm{Info}_F(N) = \sum_{y} \frac{|N_y|}{|N|} \times I(N_y) \qquad (4)$$

where I(N_y) is the entropy of partition N_y as defined in equation 3.
Information Gain for a feature F is given by:

$$IG(F) = \mathrm{Info}(N) - \mathrm{Info}_F(N) \qquad (5)$$

TABLE II. FEATURE WEIGHT BY INFORMATION GAIN
Feature | Weight
OpenPrice(t-1) | 0.384
ClosePrice(t-1) | 0.263
TwitterSent(t-1) | 0.143
NewsSent(t-3) | 0.109
Topic | 0.093
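
The information gain ranking of equations 3 to 5 can be sketched in a few lines. This is a minimal sketch under stated assumptions: features are taken to be already discretized into categorical values, and the function names `entropy`, `info_after_split`, and `information_gain` are illustrative rather than taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(N) = -sum_x p_x * log2(p_x) over the class labels in N (equation 3)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def info_after_split(feature_values, labels):
    """Info_F(N): entropy of each partition N_y weighted by |N_y| / |N| (equation 4)."""
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    total = len(labels)
    return sum(len(part) / total * entropy(part) for part in partitions.values())


def information_gain(feature_values, labels):
    """IG(F) = Info(N) - Info_F(N) (equation 5)."""
    return entropy(labels) - info_after_split(feature_values, labels)
```

A feature that separates the up (1) and down (0) days perfectly recovers the full entropy of the labels, while a feature whose values are spread evenly across both classes has a gain of zero; ranking features by this value gives an ordering of the kind shown in Table II.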
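
The lagged cross-correlation used in the correlation and lag analysis can be sketched as follows. This is a minimal sketch that assumes the standard sample cross-correlation (products of deviations from the mean, normalized by the full-series sums of squares); the function name `cross_correlation` and the toy series are hypothetical.

```python
import math


def cross_correlation(x, y, h):
    """Sample cross-correlation between x[t+h] and y[t]; h < 0 means x leads y."""
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum only over t for which both t and t + h fall inside the series.
    num = sum((x[t + h] - mean_x) * (y[t] - mean_y)
              for t in range(max(0, -h), min(n, n - h)))
    den = math.sqrt(sum((v - mean_x) ** 2 for v in x)
                    * sum((v - mean_y) ** 2 for v in y))
    return num / den


# Toy example (assumed data): y repeats x one step later, so x leads y by one step.
x = [0, 1, 0, 0, 0, 0]
y = [0, 0, 1, 0, 0, 0]
best_lag = max(range(-2, 3), key=lambda h: cross_correlation(x, y, h))
```

Scanning a small range of lags and keeping the one with the largest coefficient is one simple way to pick candidate lagged predictors such as the 1-day lagged Twitter sentiment.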
E. Time Series Cross Validation
We applied cross validation on the stock price time series, also known as roll-forward cross-validation, and compared 1-step, 2-step, ..., 12-step forecasts using Mean Absolute Error (MAE).

Tree Algorithm Pseudo code
Input: Sample set & Set of features with entropies:
    Map<Features> featMap   // sorted in descending order
Output: Decision tree classification model to predict stock movement: 1 for market up & 0 for down

classificationTreeModel()
    while (stopCondition != True)
        createNewNode(node.left);
        createNewNode(node.right);
        if (checkStopCondition(node) == True)
            predictedLabel   // 1 for market up, 0 for market down
            print samplesUnderClass()
            stopCondition = True;
        else
            continue;

createNewNode()
    node <- select feature F based on Info Gain & threshold of node.parent
    update features map featMap with discretized values after splitting
    return node;

checkStopCondition(node)
    // stop as soon as any one of the conditions holds
    if (isNodeHomogeneous() || noFeatureRemain() || noSampleRemain())
        return True
    else
        return False

Figure 5. ETF price vs. news articles sentiment
Figure 6. ETF price vs. Twitter sentiment
Figure 7. ETF price vs. Twitter sentiment & news articles sentiment

F. Autocorrelation and Auto-regression of Time Series
In this phase of the project, the autocorrelation and auto-regression of the ETF price are assessed and presented as shown in Figure 8. After one time step, the correlation becomes negative and oscillates around zero. Using a one-day lag of the open price to run the regression and to predict the direction of the market open price movement does not generate valuable information, as can be seen in Figure 9. This means that the one-day lag of the open price, used as the only feature, is not predictive on its own.

Figure 8. Autocorrelation of ETF close prices
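
The autocorrelation and one-lag auto-regression described in this section can be sketched with plain Python. This is a minimal sketch, assuming the usual sample autocorrelation and an ordinary least-squares fit of y[t] = a + b * y[t-1]; the function names `autocorrelation` and `fit_ar1` are illustrative, not the study's code.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a series at a non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    prev, curr = series[:-1], series[1:]
    n = len(prev)
    mean_p = sum(prev) / n
    mean_c = sum(curr) / n
    sxx = sum((p - mean_p) ** 2 for p in prev)
    sxy = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    b = sxy / sxx
    a = mean_c - b * mean_p
    return a, b
```

An autocorrelation that stays close to zero beyond the first few lags suggests that past prices alone carry little information about the next move, which matches the observation above that the one-day lag is not predictive on its own.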
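
The roll-forward cross-validation of the time series cross validation step can be sketched as below. This is a minimal sketch under stated assumptions: the training window expands by one observation at each step, and the forecaster is a hypothetical naive baseline (`naive_forecast`) that repeats the last observed value, standing in for whichever model is being evaluated.

```python
def naive_forecast(history, steps):
    """Hypothetical baseline forecaster: repeat the last observed value."""
    return [history[-1]] * steps


def roll_forward_mae(series, max_horizon, min_train, forecast=naive_forecast):
    """Return {h: MAE of h-step-ahead forecasts} using an expanding training window."""
    abs_errors = {h: [] for h in range(1, max_horizon + 1)}
    for end in range(min_train, len(series)):
        history = series[:end]  # only data before the forecast origin is visible
        predictions = forecast(history, max_horizon)
        for h in range(1, max_horizon + 1):
            target = end + h - 1
            if target < len(series):
                abs_errors[h].append(abs(series[target] - predictions[h - 1]))
    return {h: sum(errs) / len(errs) for h, errs in abs_errors.items() if errs}
```

With `max_horizon=12` this yields one MAE per forecast horizon from 1 to 12 steps, so the horizons can be compared directly. Because each forecast only sees data before its origin, the evaluation avoids the look-ahead leakage that shuffled k-fold validation would introduce on a time series.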
Figure 9. Autoregression of ETF close prices

G. Granger Causality and Regression Models
After the autoregression and autocorrelation analysis of ETF prices, the Granger causality test was applied to analyze the causal relation between Twitter sentiment, news sentiment, and ETF prices. From Table III, we conclude that sentiment scores show a causality effect on the next-day open price; for example, the twitter/open@t+1 test at lag 1 yields p-values of 0.0161 (F test) and 0.0026 (chi2 test). Thus, the data show that Twitter combined with news articles demonstrates a causality effect for one lag time step of the ETF open price, with an average accuracy of 76%.

TABLE III. GRANGER CAUSALITY RESULT OF OPEN, CLOSE, AND VOLUME FOR TWO LAGS.
Parameters | Lag# | F Test | p-value | chi2 Test | p-value
twitter/close@t | 1 | 0.0069 | 0.9351 | 0.0083 | 0.9272
twitter/close@t | 2 | 0.3725 | 0.6974 | 1.0836 | 0.5817
twitter/close@t+1 | 1 | 3.0750 | 0.1014 | 3.7339 | 0.0533
twitter/close@t+1 | 2 | 2.5271 | 0.1250 | 7.3516 | 0.0253
twitter/open@t+1 | 1 | 7.4886 | 0.0161 | 9.0933 | 0.0026
twitter/open@t+1 | 2 | 3.9207 | 0.0518 | 11.4058 | 0.0033
twitter/volume@t+1 | 1 | 0.0179 | 0.8955 | 0.0217 | 0.8828
twitter/volume@t+1 | 2 | 0.2041 | 0.8184 | 0.5936 | 0.7432
Hotcopper/close@t | 1 | 0.5889 | 0.4556 | 0.7150 | 0.3978
Hotcopper/close@t | 2 | 0.1718 | 0.8444 | 0.4998 | 0.7789
Hotcopper/close@t+1 | 1 | 0.0392 | 0.8460 | 0.0476 | 0.8274
Hotcopper/close@t+1 | 2 | 0.3996 | 0.6800 | 1.1624 | 0.5592
Hotcopper/open@t+1 | 1 | 0.0582 | 0.8129 | 0.0707 | 0.7904
Hotcopper/open@t+1 | 2 | 0.0530 | 0.9486 | 0.1542 | 0.9258
Hotcopper/volume@t+1 | 1 | 0.0213 | 0.8860 | 0.0259 | 0.8721
Hotcopper/volume@t+1 | 2 | 2.5088 | 0.1266 | 7.2982 | 0.0260
bing/close@t | 1 | 0.2114 | 0.6528 | 0.2566 | 0.6124
bing/close@t | 2 | 0.7204 | 0.5082 | 2.0956 | 0.3507
bing/close@t+1 | 1 | 0.0190 | 0.8924 | 0.0230 | 0.8793
bing/close@t+1 | 2 | 1.3879 | 0.2901 | 4.0377 | 0.1328
bing/open@t+1 | 1 | 0.2041 | 0.6583 | 0.2478 | 0.6186
bing/open@t+1 | 2 | 2.1621 | 0.1615 | 6.2899 | 0.0431
bing/volume@t+1 | 1 | 0.4450 | 0.5156 | 0.5403 | 0.4623
bing/volume@t+1 | 2 | 0.6531 | 0.5395 | 1.8999 | 0.3868

Then, to evaluate the prediction power of these features, several regression models with different combinations of features were implemented, as reported in Table IV and Figure 10. In Figure 10, the red line shows ground truth market prices and the blue line shows the prediction of the models for different combinations of features. As future work, we will consider applying the biologically inspired algorithms in [15] and [16] to compare the results against the regression models used in this study.

TABLE IV. LINEAR REGRESSION RESULTS
Parameters | F-statistic | R2
twitter | 2.901 | 0.153
twitter, Hotcopper | 1.503 | 0.167
twitter, Hotcopper, bing | 0.9404 | 0.168
twitter, Hotcopper, bing, open@t-1 | 6.478 | 0.666
twitter, Hotcopper, bing, open@t-1, close@t-1 | 5.632 | 0.701
twitter, Hotcopper, bing, open@t-1, close@t-1, volume@t-1 | 4.962 | 0.73

Figure 10. Regression models prediction using different combinations of features

IV. CONCLUSION
In this study, we investigated whether collective crowd sentiments mined from news articles, micro-blogs and emojis could provide a reliable supplement to fundamental research. Examination of these collective sentiments against the price movement of our sample financial product (Australia Index Fund) identified a correlation and causality between crowd sentiment and movement in the financial
market. Fusing the wisdom of the crowds learned from two alternative data sources, both Twitter and news articles, could provide the investor with an edge to position them ahead of the pack.

ACKNOWLEDGEMENTS
This work was partially supported by the Australian government department of education under the Australia Awards-Endeavour research and fellowship grant.

REFERENCES
[1] Cornell Bradford, and Aswath Damodaran. "Tesla: Anatomy of a Run-up." The Journal of Portfolio Management 41.1 (2014): 139-151.
[2] Malhotra, Claudia Kubowicz, and Arvind Malhotra. "How CEOs can leverage twitter." MIT Sloan Management Review 57.2 (2016): 73.
[3] Ge Qi, Alexander Kurov, and Marketa Halova Wolfe. "Stock Market Reactions to Presidential Social Media Usage: Evidence from Company-Specific Tweets." (2017).
[4] Siegel, Eric. Predictive analytics: The power to predict who will click, buy, lie, or die. Hoboken: Wiley, 2013.
[5] Usama M. Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. "Knowledge Discovery and Data Mining: Towards a Unifying Framework." KDD. Vol. 96. 1996.
[6] Peter D Wysocki. "Cheap talk on the web: The determinants of postings on stock message boards." (1998).
[7] Robert Tumarkin, and Robert F. Whitelaw. "News or noise? Internet postings and stock prices." Financial Analysts Journal 57.3 (2001): 41-51.
[8] Nuno Oliveira, Paulo Cortez, and Nelson Areal. "On the predictability of stock market behavior using stocktwits sentiment and posting volume." Portuguese Conference on Artificial Intelligence. Springer, Berlin, Heidelberg, 2013.
[9] Johan Bollen, Huina Mao, and Xiaojun Zeng. "Twitter mood predicts the stock market." Journal of computational science 2.1 (2011): 1-8.
[10] Sohangir, Sahar, et al. "Big Data: Deep Learning for financial sentiment analysis." Journal of Big Data 5.1 (2018):StockMarketPredictionUsingTwitterSentimentAnalysis.pdf) 15 (2012).
[11] Huina Mao, Scott Counts, and Johan Bollen. "Predicting financial markets: Comparing survey, news, twitter and search engine data." arXiv preprint arXiv:1112.1051 (2011).
[12] Sharad Goel et al. "Predicting consumer behavior with Web search." Proceedings of the National academy of sciences 107.41 (2010): 17486-17490.
[13] Bari, Anasse, and Goktug Saatcioglu. "Emotion Artificial Intelligence Derived from Ensemble Learning." 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). IEEE, 2018.
[14] Bari, A., Chaouchi, M., & Jung, T. (2016). Predictive analytics for dummies. John Wiley & Sons.
[15] Bellaachia, A., & Bari, A. (2012, June). Flock by leader: a novel machine learning biologically inspired clustering algorithm. In International Conference in Swarm Intelligence (pp. 117-126). Springer, Berlin, Heidelberg.
[16] Bellaachia, A., & Bari, A. (2012, March). A flocking based data mining algorithm for detecting outliers in cancer gene expression microarray data. In Information Retrieval & Knowledge Management (CAMP), 2012 International Conference on (pp. 305-311). IEEE.