0 Voti positivi0 Voti negativi

21 visualizzazioni9 pagineMay 26, 2014

© © All Rights Reserved

PDF, TXT o leggi online da Scribd

© All Rights Reserved

21 visualizzazioni

© All Rights Reserved

- The Woman Who Smashed Codes: A True Story of Love, Spies, and the Unlikely Heroine who Outwitted America's Enemies
- Steve Jobs
- NIV, Holy Bible, eBook
- NIV, Holy Bible, eBook, Red Letter Edition
- Hidden Figures Young Readers' Edition
- Cryptonomicon
- Make Your Mind Up: My Guide to Finding Your Own Style, Life, and Motavation!
- The Golden Notebook: A Novel
- Alibaba: The House That Jack Ma Built
- The 10X Rule: The Only Difference Between Success and Failure
- Autonomous: A Novel
- Hit Refresh: The Quest to Rediscover Microsoft's Soul and Imagine a Better Future for Everyone
- Hit Refresh: The Quest to Rediscover Microsoft's Soul and Imagine a Better Future for Everyone
- Life After Google: The Fall of Big Data and the Rise of the Blockchain Economy
- Algorithms to Live By: The Computer Science of Human Decisions
- Console Wars: Sega, Nintendo, and the Battle that Defined a Generation
- The Innovators: How a Group of Hackers, Geniuses, and Geeks Created the Digital Revolution

Sei sulla pagina 1di 9

Applying Data Mining Techniques to Stock Market

Analysis

Gabriel Fiol-Roig, Margaret Mir-Juli, and Andreu Pere Isern-Dey

*

Abstract. The stock market can be viewed as a particular data mining and artifi-

cial intelligence problem. The movement in the stock exchange depends on capi-

tal gains and losses and most people consider the stock market erratic and

unpredictable. However, patterns that allow the prediction of some movements

can be found. Stock market analysis deals with the study of these patterns. It

uses different techniques and strategies, mostly automatic that trigger buying

and selling orders depending on different decision making algorithms. It can be

considered as an intelligent treatment of past and present financial data in order

to predict the stock market future behavior. Therefore it can be viewed as an ar-

tificial intelligence problem in the data mining field. This paper aims to study,

construct and evaluate these investment strategies in order to predict future

stock exchanges. Firstly, data mining techniques will be used to evaluate past

stock prices and acquire useful knowledge through the calculation of some fi-

nancial indicators. Next artificial intelligence strategies will be used to construct

decision making trees.

Keywords: Data Mining, Artificial Intelligence, Stock Market, Investment Strate-

gies, Decision Trees.

1 Introduction to the Problem

The problem considered deals with the design and validation of a data mining de-

cision system which generates buying and selling orders in the stock market [1].

The decision systems design has three main phases as indicated in Figure 1: the

data processing phase, the data mining phase and the evaluation phase.

The data processing phase selects from the raw data base a data set that focuses

on a subset of attributes or variables on which knowledge discovery is to be

*Gabriel Fiol-Roig

.

Margaret Mir-Juli

.

Andreu Pere Isern-Dey

Departament de Cincies Matemtiques i Informtica

Universitat de les Illes Balears

Ctra. de Valldemossa km 7,5, 07122 Palma de Mallorca, Spain

e-mail: {biel.fiol,margaret.miro}@uib.es

520 G. Fiol-Roig, M. Mir-Juli, and A. Pere Isern-Dey

Data

Base

Data

Processing

Data

Mining

Evaluation

Transformed Data Patterns

Knowledge

Fig. 1 The Decision System

performed. It also removes outliers and redundant information, and uses finan-

cial indicators to represent the processed data by means of an Object Attribute

Table (OAT). The data mining phase converts the data contained in the OAT

into useful patterns, in particular decision trees are found [2]. The evaluation

phase proves the consistency of pattern by means of a testing set. The positively

evaluated decision system can then be used in real world situations that will al-

low for its validation.

2 The Data Processing Phase

The original data base is formed by financial information available in different

web pages. In particular, data from the American stock market, available from

Google finance, have been used. Out of all the variables originally considered the

target data is formed by: date (D), opening price (O), closing price (C), daily high

(H), daily low (L) and daily volume (V) as illustrated in Table 1.

Table 1 Data format available from Google Finance

Date Open High Low Close Volume

15-Dec-08 13.30 13.39 13.23 13.34 144567

In order to obtain a decision making system based on classification techniques a

particular company must be selected. This company must satisfy some restrictions:

- It must not be a technological security, in order to avoid using data from year

2000 economic bubble,

- It must not be a financial institution, due to the erratic behaviour of the past

years,

- It must have a significant influence within the American stock market,

- It must be a reference company.

Due to these restrictions and after a careful study of the stock market, Alcoa was

selected to complete the project. Alcoa is the world's leading producer and man-

ager of primary aluminum, fabricated aluminum and alumina facilities, and is ac-

tive in all major aspects of the industry. The targeted data includes information

from the beginning of 1995 till the end of 2008.

Applying Data Mining Techniques to Stock Market Analysis 521

2.1 Financial Indicators

This target data must be processed and transformed into useful data written in

terms of financial indicators [3]. The most widely used indicators are based on

graphical methods (where price oscillation is studied) and on mathematical meth-

ods (where mathematical equations are applied).

The Simple Moving Average (SMA) is the unweighted mean of the previous n

data points. It is commonly used with time series data to smooth out short-term

fluctuations and highlight longer-terms trends and is often used for the analysis of

financial data such as stock prices (P), returns or trading volumes. There are vari-

ous popular values for n, like 10 days (short term), 30 days (intermediate term), 70

days (long term) or 200 days depending on the number of days considered.

n

P P P

P SMA

n n

n

1 1

1

) (

+ + +

=

+

The moving averages are interpreted as support in a rising market or resistance in

a falling market. The SMA treats all data points equally and can be disproportion-

ally influenced by old data points. The exponential moving average addresses this

point by giving extra weight to more recent data points.

The Exponential Moving Average (EMA) applies weighting factors which de-

creases exponentially, giving much more importance to recent observations while

not discarding older observations totally. The degree of weighting decrease is ex-

pressed as a constant smoothing factor , which can be expressed in terms of n.

1

2

+

=

n

And the exponential moving average is expressed as:

) 1 ( ) (

1

+ =

+ n n n

EMA P P EMA

The moving averages are adequate indicators to determine tendencies. When a

short term moving average crosses over a longer term moving average we have a

rising tendency. Whereas when a short term moving average crosses under a

longer term moving average we have a falling tendency. The moving averages

have a drawback since the information provided has a time lag of several days.

This can be avoided by using more powerful indicators.

The Moving Average Convergence Divergence (MACD) is a combination of

two exponential moving averages with different number of data points (days). The

MACD is calculated by subtracting the y-day EMA from the x-day EMA. A z-day

EMA of the MACD, called the signal line, is plotted on top of the MACD, func-

tioning as a trigger for buy and sell signals. Variables x, y, and z are the parame-

ters of the MACD, usually x = 12, y = 26 and x = 9. This is illustrated in Figure 2,

the dark line is the MACD whereas the light line is the signal, on the top of the

figure the histogram (signal minus MACD) is drawn.

522 G. Fiol-Roig, M. Mir-Juli, and A. Pere Isern-Dey

Fig. 2 The Moving Average Convergence Divergence

The MACD is a trend-following momentum indicator that shows the relation-

ship between two moving averages of prices. Its two-folded interpretation is as

follows. If the MACD line and signal line are considered, when the MACD falls

below the signal line, it may be time to sell. Conversely, when the MACD rises

above the signal line, the price of the asset is likely to experience upward momen-

tum, it may be time to buy. If the histogram and the price curb are considered, a

divergence can be found and buying-selling signals can be obtained.

The stochastic (K) is an indicator that finds the range between an assets high

(H) and low price (L) during a given period of time, typically 5 sessions (1 week)

or 20 sessions (1 month). The current securities price at closing (C) is then ex-

pressed as a percentage of this range with 0% indicating the bottom of the range

and 100% indicating the upper limits of the range over the time period covered.

L H

L C

K

=100

An interesting interpretation for the stochastic can be obtained if the moving aver-

age associated with the stochastic D(K) is considered: The crossing points be-

tween K and D trigger buy and sell signals. When K crosses over a rising D it may

be time to buy, conversely when K crosses over a falling D selling might be con-

venient. If the stochastic and the price curb are considered, a divergence can be

found and buying-selling signals can be obtained.

The representation of the stochastic can be seen in Figure 3.

Fig. 3 The Stochastic Indicator (K)

The Bollinger Bands (BB) consists of a set of three curves drawn in relation to

security prices. The middle band is a measure of the intermediate-term trend, usu-

ally a moving average (simple or exponential) over n periods. This middle band

serves as the base to construct the upper and lower bands. The interval between

the upper and lower bands and the middle band is determined by volatility, typi-

cally the interval is Q times the standard deviation of the same data that were used

Applying Data Mining Techniques to Stock Market Analysis 523

to calculate the moving average. The default parameters most commonly used are

n = 20 periods and Q = 2 standard deviations.

95% of security prices can be found within the Bollinger Bands, the band repre-

sents areas of support and resistance when the market shows no tendencies. When

the bands lie close together a period of low volatility in stock price is indicated.

When they are far apart a period of high volatility in price is indicated.

Generally Bollinger Bands are used together with other indicators to reinforce

its validity. However, on its own they can trigger buy and sell signals. When

prices touch the lower band it might be convenient to buy, conversely when prices

touch the upper band selling might be convenient.

2.2 Processing of the Financial Indicators

The financial indicators introduced above will aid us in the construction of our

Object Attribute Table from the raw data base. In order to calculate these indica-

tors, R a language for statistical computing and graphics, is used [4]. R is a popu-

lar programming language used by a growing number of data analysts inside

corporations and academia. R is applied mainly as a statistical package but it can

be used in other areas of science such as numerical analysis, signal processing,

computer graphics and so on because of the number of growing implemented li-

braries in R to these scientific areas. See [5] for details on the use of R in acade-

mia and other institutions. R is available as Free Software under the terms of the

Free Software Foundations GNU General Public in source code form. It compiles

and runs on a wide variety of UNIX platforms and similar systems (including

FreeBSDband Linux, Windows and MacOS).

The financial indicators used in this paper are calculated by the application of

Rs TTR package to the raw data retrieved from Google finance. These indicators

must now be transformed into useful data as follows.

2.3 Transformed Data: The OAT

The next step in the data processing phase is the generation of the Object Attribute

Table [6, 7]. The indicators considered are the following: MACD (Moving Aver-

age Convergence Divergence), EMA(C) (Exponential Moving Average of the

Closing Price), EMA(V) (Exponential Moving Average of the Volume), K (Sto-

chastic) and BB (Bollinger Bands) and they generate eight binary attributes A1,

A2, A3, A4, A5, A6, A7 and A8 in the following manner.

- The MACD indicator identifies buying (B) or selling (S) orders depending

on the crossing of intermediate term moving averages. Two binary attrib-

utes A1 and A2 are used as follows: if an S signal is generated then A1=

1; if a B signal is generated then A2= 1.

- The EMA(C) indicator when compared to the stocks actual price generates

attribute A3, if the assets ranking is strong then A3=1.

524 G. Fiol-Roig, M. Mir-Juli, and A. Pere Isern-Dey

- The EMA(V) indicator when compared to the actual volume generates at-

tribute A4, if the assets reliability increases then A4= 1.

- The K indicator also identifies buying (B) or selling (S) orders depending

on the crossing of moving averages, in particular K and D. Two binary at-

tributes A5 and A6 are used as follows: if an S signal is generated then

A5= 1; if a B signal is generated then A6= 1.

- The BB indicator generates 4 zones. Two binary attributes A7 and A8 are

used as follows: if the assets ranking is strong then A7= 1 and A8=1; if

the assets ranking is weak then A7= 0 and A8=0; if a B hint is found then

A7= 0 and A8=1; if an S hint is obtained then A7= 1 and A8=0.

Thus, the OAT of transformed data has 8 columns corresponding to the above

mentioned binary attributes and a total of 3515 rows corresponding to available

transformed financial data, as illustrated in Table 2. The data contained in the ta-

ble must be filtered. First of all, repeated rows are eliminated. Secondly, incom-

plete rows are also removed. The remaining table has 61 rows. The next step is to

assign the class to each of the remaining rows taking into account the values of the

attributes as indicated by the financial expert. Three different classes are used de-

pending on whether buying (B), selling (S) or no action (N) must be taken.

Table 2 Object Attribute Table

A1 A2 A3 A4 A5 A6 A7 A8 class

0 0 0 0 1 0 1 0 S

0 1 0 0 0 0 0 1 B

0 0 1 0 0 1 0 1 N

0 0 0 1 0 1 1 0 N

1 0 1 0 0 0 0 1 S

1 0 1 1 0 0 1 0 S

3 The Data Mining Phase

In the data mining phase, the data contained in the OAT is converted into useful

patterns. In particular, a decision tree will be obtained. The decision tree is one of

the most popular classification algorithms. A decision tree can be viewed as a

graphical representation of a reasoning process and represents a sequence of deci-

sions, interrelationships between alternatives and possible outcomes. There are

different trees available depending on the evaluation functions considered. In this

project, the ID3 algorithm is used due to its simplicity and speed [8]. ID3 is based

on information theory to determine the most informative attribute at each step of

the process. In this sense, it uses information gain to measure how well a given at-

tribute separates the training examples into the classes.

Applying Data Mining Techniques to Stock Market Analysis 525

Figure 4 shows the resulting decision tree. The decision tree has a total of 37

nodes, 18 corresponding to the 8 attributes used and 19 leaf nodes or branches.

The average branch length is 6.

Attributes A1, A2 and A5 appear at the root of the tree suggesting the impor-

tance of indicators MACD and K in the classifying process. This is in accordance

with the experts that consider MACD and K as key indicators in the buying-selling

decisions, whereas EMA(C), EMA(V) and BB are used as reinforcement indicators.

A2

0

A1

A5

A7

A8 A8

0

0

0

0

1

1

1

0

A3

A8

0

0

1

A4

0

S

S

S N

N

B

A6

B

1

1

1

1

S

0

A4

1

1

0

A7

N

0

A6

N

A7

N B B

1

A7

A6 A6

N S N A3

B N

N

1

1 0

1

1

1

0

0 0 1

0

1 0

A2

0

A1

A5

A7

A8 A8

0

0

0

0

1

1

1

0

A3

A8

0

0

1

A4

0

S

S

S N

N

B

A6

B

1

1

1

1

S

0

A4

1

1

0

A7

N

0

A6

N

A7

N B B

1

A7

A6 A6

N S N A3

B N

N

1

1 0

1

1

1

0

0 0 1

0

1 0

Fig. 4 The ID3 Decision Tree

4 The Evaluation Phase

The evaluation phase proves the consistency of the ID3 decision tree by means of

a testing set. The testing set is formed by the stock market prices of a different as-

set, of similar characteristics to the one used to train the tree. This raw data are

transformed into a testing OAT using the previous indicators and attributes. Now

the decision tree is applied to the testing OAT and all instances are classified.

These classes obtained with the tree are compared with the classes corresponding

to the instances of the testing OAT as suggested by the expert. The evaluation of

the method provides the results shown on Table 3.

Table 3 Evaluation of the method

Parameters 3-class tree

Correct decision 113

Incorrect decision 176

Correct decision (%) 39.10

Incorrect decision (%) 60.90

526 G. Fiol-Roig, M. Mir-Juli, and A. Pere Isern-Dey

These results might seem poor. However, stock market analysis is an unpre-

dictable field and even though accuracy is desired, profit associated to capital gain

is crucial. Therefore, classification accuracy is not the only parameter that must be

evaluated. When profit is considered, the method provides the following results: if

an instance is correctly classified by the tree, the percentage gain is 579.71%, if an

instance is not correctly classified by the tree, the percentage loss is -461.56%.

This produces a total yield of 118.15% which is an acceptable result.

5 Conclusions and Future Work

The stock market is an unpredictable, erratic and changeable domain. Neverthe-

less, tools that predict the stock markets behaviour exist. This paper considers a

method based on financial indicators to generate a decision tree that classifies buy-

ing-selling orders. A decision tree in Artificial Intelligence is equivalent to an

automatic investment system in the stock market analysis. The procedure devel-

oped here is long and costly, but once the tree is generated, its automatic and ap-

plicable to any financial asset.

The initial raw data are daily stock market values of Alcoa, these data are trans-

formed into an OAT and used as a training set to construct a decision tree by

means of the ID3 algorithm. This decision tree has been tested and evaluated. The

results show a 40% accuracy percentage and a 118% profit gain that are consid-

ered satisfactory.

Currently, a second OAT with only two classes, buying (B) and selling (S) is

being studied. In this case the no action class (N) has not been considered. This 2-

class decision tree will allow for a more complete evaluation of the method.

This paper offers a limited vision of one of the many solutions available. The

following aspects can be considered as future work: other different financial indi-

cators may offer better results and should be studied, also other decision trees,

such as C4.5 or J48, can be obtained using different learning algorithms and

should be evaluated. The stock market is a dynamic world, and the decision tree

could present time-related disturbances. Therefore, an autonomous system capable

of dealing with these changes should be considered.

Acknowledgements. The authors want to thank Dami Vaquer, Gerard Draper and Jaume

Jaume for their helpful comments and suggestions.

References

1. Fayyed, U., Piatetsky-Shapiro, G., Smyth, P.: From Data Mining to Knowledge Discov-

ery in Databases. American Association for Artificial Intelligence, AI Magazine 96, 37

54 (Fall 1996)

2. Fiol-Roig, G.: UIB-IK: A Computer System for Decision Trees Induction. In: Ra,

Z.W., Skowron, A. (eds.) ISMIS 1999. LNCS, vol. 1609, pp. 601611. Springer,

Heidelberg (1999)

Applying Data Mining Techniques to Stock Market Analysis 527

3. Weinstein, S.: Stans Weinsteins Secrets For Profiting in Bull and Bear Markets.

McGraw-Hill, New York (1988)

4. The R project for Statistical Computing, http://www.r-project.org/

5. http://www.nytimes.com/2009/01/07/technology/

business-computing/07program.html

6. Mir-Juli, M.: Knowledge Discovery in Databases Using Multivalued Array Algebra.

In: Moreno-Daz, R., Pichler, F., Quesada-Arencibia, A. (eds.) Computer Aided Sys-

tems Theory - EUROCAST 2009. LNCS, vol. 5717, pp. 1724. Springer, Heidelberg

(2009)

7. Fiol-Roig, G.: Learning from Incompletely Specified Object Attribute Tables with Con-

tinuous Attributes. Frontiers in Artificial Intelligence and Applications, vol. 113, pp.

145152 (2004)

8. Quinlan, J.R.: Induction of decision trees. Machine Learning 1, 81106 (1986)

- A Novel Approach for Keyword extraction in learning objects using text miningCaricato daInternational Journal for Scientific Research and Development - IJSRD
- Jonathan Luke Paz ChargesCaricato daCaleb
- 5 Advanced Counter Trend Trading TechniquesCaricato dachandra369
- IFT2day OLIVER VALEZ.pdfCaricato daMohamedKeynan
- Daily CallsCaricato daAnujAsthana
- research paperCaricato daRishu Kaul
- Correlation Measure for Neutrosophic Re¯ned Sets and its application in Medical DiagnosisCaricato daDon Hass
- UntitledCaricato daeurolex
- ECL Research sample reportCaricato dawinnit25
- US WMA Overshoot AnalysisCaricato daHarman Bajwa
- Cup Pattern Breakout With MACaricato daMarlene
- Trade Smart OscillatorsCaricato daBimal Mahesh
- ΙΣΤΟΡΙΚΑ ΣΚΑΚΙCaricato dageorge
- UntitledCaricato daapi-237713995
- Technical Analysis ProjectCaricato dakshitish_michel
- ToolsCaricato daSiva Kalimuthu
- Forecast ExampleCaricato daEhli Hibre
- SMOOTHING METHODS OR.xlsxCaricato daAgrawal Anshul
- mgmt PART 1.xlsxCaricato daAnthony Anthony
- wu2017Caricato dalintasumroh
- SAS Miner Get Started 53Caricato daTruely Male
- IT444_Mid1 [MA]Caricato daMB
- All About Sheep and GoatsCaricato daMuhammad Umair Niaz
- data science_2Caricato daSatyavaraprasad Balla
- Concept and TeachniqueCaricato daIrfaan Khan Qasmi
- Triantakonstantis_MountrakisCaricato datrdimitrios
- Lecture2-ClassificationPrediction.pdfCaricato davemuriraja
- IpMGT497Caricato dagrvoneandonly
- DEVELOPING A FRAMEWORK FOR PREDICTION OF HUMAN PERFORMANCE CAPABILITY USING ENSEMBLE TECHNIQUESCaricato daCS & IT

- The Practice of Medicinal Chemistry, 4th EditionCaricato dalibrary2540
- SIRT1 Is a Regulator in High Glucose-Induced Inflammatory Response in RAW264.7 CellsCaricato dalibrary2540
- Sildenafil Vaginal SuppositoriesCaricato dalibrary2540
- Non-Thermal Plasma as a New Food Preservation MethodCaricato dalibrary2540
- Absorption and Metabolism of Pelargonidin-3-GlucosideCaricato dalibrary2540
- Cellular Automata Models of Cardiac ConductionCaricato dalibrary2540
- Neglected Taxa, Morphology and MoleculesCaricato dalibrary2540
- List of UniversitiesCaricato dalibrary2540
- 1v600860Caricato dalibrary2540
- Creative Space, Cultural Industry ClustersCaricato dalibrary2540
- Michael Acceptors as a Tool for Anticancer Drug DesignCaricato dalibrary2540
- On the Interaction and Integration of WheelsetCaricato dalibrary2540
- Performance Pay and Job SatisfactionCaricato dalibrary2540
- Ozone for Phenol Treatment in Industrial WastewaterCaricato dalibrary2540
- Thinking Aloud, Talking, And LeathinkingCaricato dalibrary2540
- Conferences & ExhibitionsCaricato dalibrary2540
- School IssuesVoice of ExperienceCaricato dalibrary2540
- Mesoscopic Mechanical Analysis of Filled ElastomerCaricato dalibrary2540
- Monoids, Acts and Categories.Caricato dalibrary2540
- English Colour Terms in ContextCaricato dalibrary2540
- Evaluation of Antioxidant Activity of Flavonoid ExtractsCaricato dalibrary2540
- Measurements of Hydrodynamic Loading OnCaricato dalibrary2540
- Applying Cognitive Linguistics to Second LanguageCaricato dalibrary2540

- 6-Electrical System 710655Caricato daEdwin Ruiz Vargas
- sony-ne5-mb-3e sony SCHEMATICCaricato davideoson
- EWA_PPRCaricato daAtul As Streetwarrior
- HKMO2016heatans.pdfCaricato daChai Usajai Usajai
- Lect 10 - Bottlenecks and Product MixCaricato daPowerup Capital
- Ci830 Memory MapsCaricato daEnrique Francisco Wekwert
- Tip 1Caricato daWM Suffy
- Lg w2600hp BfsCaricato dafwkl
- Identification Numbers and Check Digit SchemesCaricato daAniruddha Singhal
- 5 Info RetrievalCaricato daRaditya Adi Baskara
- Needle and Metal Contamination ControlCaricato daaristian27
- Manual ProtoCessorCaricato dajohaquin love
- Shapiro Wilk TableCaricato daAndrei Supra Sumeringah
- mn06130lCaricato dasheppie68
- Datasheet DRAM AsincronaCaricato daMatteo Zappia
- A Fixed Position Constraints in a Travelling Salesman Problem (TSP) with Multiple Job FacilitiesCaricato dadbpublications
- Influence of Water Bodies on Outdoor Air Temperature in Hot and Humid ClimateCaricato daNedyomukti Imam Syafii
- PT2262Caricato da9xbacninh
- Famous InequalitiesCaricato daG
- 030405 EnCaricato daRegistr Registr
- Business English Tests SampleCaricato daBONALA KONDAL
- EEPROM 28C16 datasheetCaricato daCesar
- SPC-RT-flex96C_WECS-9520_Dynex_2014-03.pdfCaricato dasammy_stuff1
- Cost HealthcareCaricato daDavide Falcioni
- Chapter 7Caricato datrushalvora
- MATH+1110_BOOKACaricato dabalvinder singh
- Ecg 09 ClippingCaricato daMihai Căta
- A7VO re92203_2009-06Caricato daLuisRenatoFerreiradaSilva
- Address DecodingCaricato daAunas Manzoor

## Molto più che documenti.

Scopri tutto ciò che Scribd ha da offrire, inclusi libri e audiolibri dei maggiori editori.

Annulla in qualsiasi momento.