
University of Engineering and Technology, Lahore

Stock Market Price Prediction using Machine Learning
Using K-Nearest Neighbors, Linear Regression and Long Short-Term Memory Algorithms
By:
Declaration

We hereby declare that we carried out the work reported in this project, “Stock Market Price
Prediction using Machine Learning”, under the supervision of our professor, who delivered
the course presentations. We solemnly declare that, to the best of our knowledge, no part of
this report has been submitted here or elsewhere in a previous application for the award of
a degree. All sources of knowledge used have been duly acknowledged.
Approval

This is to certify that the report titled “Stock Market Price Prediction using Machine
Learning” has been carried out by and has been read and approved for meeting part of the
requirements and regulations. It has been thoroughly reviewed and checked for both
content and accuracy.
Acknowledgment

Many capable people, including our seniors, have contributed to this machine learning
project. The report has been thoroughly reviewed and checked for both content and accuracy.
Our respected professor contributed greatly to this project throughout the many phases of
development, and Dr. Kashif Javed did an outstanding job editing the manuscript.
Thanks to all for managing the art and text program. We wish to express our
appreciation to those already mentioned as well as to the reviewers, who provided many
valuable suggestions and constructive criticism that greatly influenced this report.
Dedications

This report is dedicated to my parents and teachers.


Abstract

In this project, “Stock Market Price Prediction using Machine Learning”, we learnt many
technical skills while working through the algorithms. The project also served to polish
our practical knowledge of how machine learning works. Training the models, testing them
on the validation and test sets, and calculating errors and accuracy gave us a brief
overview of machine learning algorithms and of the platforms used to run them.

Contents

Chapter No.1
Introduction
Chapter No.2
Literature Review
Chapter No.3
Problem Statement
Chapter No.4
Proposed Solution
Chapter No.5
Research Paper -I
5.1 Methodology
5.2 Mathematical Calculations and Visualizing Models
5.3 Visualization Graph
5.4 Data Description, Analysis, Result and Conclusion
Chapter No.6
Implementing Research Paper -I
6.1 Our Data Set Collection
6.2 Division of Data Set
6.3 Environment & IDE
6.4 Preprocessing of Dataset
6.3 KNN Algorithm
6.4 Results
6.5 Analysis, Comparison and Conclusion
Chapter No.7
Research Paper -II
7.1 Methodology
7.2 Raw Data
7.3 Data Preprocessing
7.4 Feature Extraction
7.5 Training Neural Network
7.6 Analysis, Results and Conclusion
Chapter No.8
Implementing Research Paper -II
8.1 LSTM Algorithm
8.2 Results
8.3 Comparison, Analysis and Conclusion
Chapter No.9
Conclusion
Chapter No.10
References

List of Figures

Figure 1: Relation between Predicted/Actual Closing Price for 1-year Period
Figure 2: Dataset of Descon Oxychem
Figure 3: Reading Dataset
Figure 4: Output after Reading Dataset
Figure 5: Jupyter Notebook for Python Code
Figure 6: Creating Data Frame Variables
Figure 7: Output after Preprocessing
Figure 8: Basic Graph between Data and Closing Price
Figure 9: KNN Model for Making Predictions
Figure 10: KNN Results
Figure 11: Calculating ERMS
Figure 12: Comparing Results
Figure 13: High/Low/Open/Close with 500 epochs
Figure 14: LSTM Model
Figure 15: LSTM Results
Figure 16: Comparing Results

Chapter No.1
Introduction

A stock market is a collection of buyers and sellers who trade stocks. Companies issue
stocks to raise capital, and investors buy them in order to own a portion of the company.
The stock market is volatile: it is very difficult to predict the future stock price of a
company because prices fluctuate every day. A company’s stock depends on many factors, and
these factors affect stock markets in nonlinear and discontinuous ways. Examples include
company-related news, political events and natural disasters. Stock price prediction is one
of the most important issues investigated in academic and financial research.
The stock market is an evolutionary, complex and dynamic system. Market prediction is
characterized by noise, data intensity, non-stationarity, uncertainty and hidden
relationships. The prediction of trends in the stock market has been a challenging and
important research topic. It is challenging because the data is noisy and non-stationary. It
is important because it can yield useful results for decision makers. The stock market is
a place where companies raise large amounts of capital and trade their shares. Stock
market prediction challenges the Efficient Market Hypothesis, which states that it is
impossible to predict the market because it is efficient; a number of researchers report
that the stock market can be predicted to some extent. The ability to predict the stock
market is an important factor for investors who want to make money. It also helps investors
make buying or selling decisions that generate higher profits.
This report delves into the use of algorithms to predict stock market prices. This is an
important area of knowledge because it can give rise to many prediction techniques in the
future and because it links the field of finance with the field of computer science. To
explain the approach to the research question, the following sections set out the scope of
the project.
This report explores the KNN, Linear Regression and Long Short-Term Memory algorithms as
methods of predicting stock market movements. Several steps are followed as we attempt to
answer and explore the research question.

Chapter No.2
Literature Review

M. Suresh Babu et al. (2012) investigate the major clustering algorithms, namely K-Means,
hierarchical clustering and reverse K-Means, and compare their performance in terms of
their ability to build correct class-wise clusters. The proposed method consists of three
phases. First, they convert each financial report into a feature vector and use
hierarchical agglomerative clustering (HAC) to divide the feature vectors into clusters,
considering both qualitative and quantitative features of the financial reports. Second,
they combine the advantages of the two clustering methods to propose an effective
clustering method. Third, choosing an appropriate number of parts in HAC limits the
clusters produced and thereby improves the quality of the clustering produced by K-means.
Khalid Alkhatib et al. (2013) applied the k-Nearest Neighbour algorithm and a non-linear
regression approach to predict stock prices for a sample of six major companies listed on
the Jordanian stock exchange, in order to help investors, management, decision makers and
users make correct and informed investment decisions. According to the results, the kNN
algorithm is robust with a small error ratio; consequently, the results were rational and
reasonable.

Mahajan Shubhrata D et al. (2016) aim to predict future stock values using a prediction
concept. The system parses records, computes the predicted value and sends it to the user.
It also automatically performs operations such as buying and selling shares using an
automation concept, for which it uses the Naïve Bayes algorithm. Real-time access is
provided by downloading logs from the Yahoo Finance website and storing them in a dataset.
The experiments reveal a high potential of the Naïve Bayes algorithm for predicting the
return on investment in the share market.
Xiao Ding et al. (2015) proposed a deep learning method for event-driven stock market
prediction. First, events are extracted from news text and represented as dense vectors,
trained using a novel neural tensor network. Second, a deep convolutional neural network is
used to model both the short-term and long-term influences of events on stock price
movements. They demonstrated that deep learning is useful for event-driven stock price
movement prediction by proposing a novel neural tensor network for learning event
embeddings, and by using a deep convolutional neural network to model the combined
influence of long-term and short-term events on stock price movements.
Adebiyi Ayodele et al. (2012) proposed research work to improve the accuracy of daily
stock price prediction of stock market indices using artificial neural networks. The study
used three-layer multilayer perceptron models (a feedforward neural network model) trained
with the backpropagation algorithm. The paper shows that a hybridized approach has the
potential to enhance the quality of investors’ decision making in the stock market by
offering more accurate stock prediction than the existing technical analysis based
approach.
Luckyson Khaidem et al. (2016) proposed a novel way to minimize the risk of investment in
the stock market by predicting the returns of a stock using a class of powerful machine
learning algorithms known as ensemble learning. They used four supervised learning
algorithms: Logistic Regression, Gaussian Discriminant Analysis, Quadratic Discriminant
Analysis and SVM.
Dinesh Bhuria et al. (2017) surveyed stock market prediction using regression techniques
and proposed an efficient regression approach to predict the stock market price from stock
market data. In future, the results of the multiple regression approach could be improved
by using a larger number of variables. The study is intended to help stock brokers and
investors when investing money in the stock market. Prediction plays an important role in
the stock market business, which is a very complicated and challenging process because of
the dynamic nature of the stock market.
Murtaza Roondiwala, Harshal Patel and Shraddha Varma (2018) modeled and predicted the stock
returns of NIFTY 50 using LSTM. They collected 5 years of historical NIFTY 50 data and used
it for training and validating the model.

Chapter No.3
Problem Statement
The stock market is an evolutionary, complex and dynamic system. Market prediction is
characterized by noise, data intensity, non-stationarity, uncertainty and hidden
relationships. The prediction of trends in the stock market has been a challenging and
important research topic. It is challenging because the data is noisy and non-stationary. It
is important because it can yield useful results for decision makers.

Stock market analysis is divided into two parts: Fundamental Analysis and Technical
Analysis.

• Fundamental Analysis involves analyzing the company’s future profitability.
• Technical Analysis, on the other hand, identifies trends in the stock market.

The problem statement we have chosen to work with in this project follows from the
hypothesis that the LSTM algorithm is a more precise way of predicting closing prices. The
problem statement is formulated below:
Is using the LSTM algorithm a more precise way of predicting the future closing prices of
equities than using the more common methods of KNN and Linear Regression? In other words,
we compare the three algorithms we have used for simulating future stock market price
prediction.

Chapter No.4
Proposed Solution
In order to predict future stock prices, we analyzed papers and give an overview of how
these algorithms produce precise and accurate predictions. In this project we used several
algorithms, and we observed that not all of the algorithms implemented can predict the data
we need. There is a basic need for computerized and automated approaches that make
effective and efficient use of the huge amount of financial data, in order to help
companies and individuals with strategic planning and investment decisions. We propose an
online learning algorithm for predicting the end-of-day price of a given stock with the
help of KNN, Linear Regression and Long Short-Term Memory (LSTM), a type of Recurrent
Neural Network (RNN).

Chapter No.5
Research Paper -I
Stock Price Prediction Using K-Nearest Neighbor (kNN) Algorithm
Khalid Alkhatib, Hassan Najadat, Ismail Hmeidi (Vol. 3 No. 3; March 2013)

The paper applies the kNN algorithm to stock data. Its mathematical calculations and
visualization models are discussed below.

5.1 Methodology
The k-nearest neighbor technique is a machine learning algorithm that is considered simple
to implement (Aha et al. 1991). The stock prediction problem can be mapped onto a
similarity-based classification. The historical stock data and the test data are mapped
into a set of vectors, where each vector has N dimensions, one for each stock feature.
Then a similarity metric such as Euclidean distance is computed to make a decision. kNN is
considered a lazy learner: it does not build a model or function in advance, but returns
the k records of the training data set that are most similar to the test record (i.e. the
query record). A majority vote is then performed among the selected k records to determine
the class label, which is assigned to the query record.
The prediction of the stock market closing price is computed using kNN as follows:
a) Determine the number of nearest neighbors, k.
b) Compute the distance between the training samples and the query record.

[Flow diagram: Determine Nearest Neighbors 'k' → Compute Distance between Training Samples
and Query Record → Apply Mathematical Calculations and Find Performance of Model →
Visualization Graph]
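
The classification procedure described in this section can be sketched in a few lines of
Python. This is a minimal illustration and not the paper's code: the function names, the
three-value feature vectors and the "up"/"down" labels are assumptions made up for the
example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_X, train_y, query, k=3):
    """Label the query record by majority vote among its k nearest records."""
    # Distance from the query to every training record, paired with its label.
    scored = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Keep the k records with the smallest distance (highest similarity).
    scored.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in scored[:k]]
    # Majority vote among the selected k records.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical records: (open, high, low) with an illustrative class label.
train_X = [(10.0, 10.5, 9.8), (10.2, 10.8, 10.0), (12.0, 12.1, 11.0), (12.5, 12.6, 11.4)]
train_y = ["up", "up", "down", "down"]

print(knn_classify(train_X, train_y, (10.1, 10.6, 9.9), k=3))
```

Because kNN is a lazy learner, all of the work happens at query time; there is no separate
training step.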

5.2 Mathematical Calculations and Visualizing Models


The calculations include error estimation, total sum of squared error, average error,
cumulative closing price when sorted using predicted values, k-values and training Root
Mean Square (RMS) errors.

a) Root Mean Square Deviation (RMSD):

$$\mathrm{RMSD} = \sqrt{(Y - X)^2}$$

b) Explained Sum of Squares (ESS) is computed as follows:

$$\mathrm{ESS} = \sum_{i=1}^{n} (y_i - y)^2$$

where $y_i$ is the predicted value and $y$ is the actual value.

c) Average Estimated Error (AEE) is the total sum of RMS errors for all variables in the
stock records divided by the total number of records $n$:

$$\mathrm{AEE} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{RMS}_i$$
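
As a rough sketch, the three quantities in this section could be computed as follows. The
function names are hypothetical, the sample values are made up, and the formulas follow the
definitions in this section as we read them (a per-record RMS error, a sum of squared
differences between predicted and actual values, and the mean of the per-record RMS errors).

```python
import math


def rms_error(predicted, actual):
    """Per-record RMS error: the square root of the squared difference."""
    return math.sqrt((predicted - actual) ** 2)


def sum_of_squares(predicted, actual):
    """Sum of squared differences between predicted and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def average_estimated_error(predicted, actual):
    """Sum of the per-record RMS errors divided by the number of records."""
    errors = [rms_error(p, a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# Made-up predicted and actual closing prices, for illustration only.
predicted = [2.0, 4.0]
actual = [1.0, 6.0]

print(sum_of_squares(predicted, actual))
print(average_estimated_error(predicted, actual))
```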

5.3 Visualization Graph


To evaluate the performance of the kNN learning model, a lift graph is drawn for the stock
values of different companies. The lift graph shows the ratio between the results obtained
with and without the predictive model. The other graph used is a curve plot that shows the
relation between the actual and predicted stock prices.

5.4 Data Description, Analysis, Result and Conclusion


The sample data was extracted from the Jordanian stock exchange. The study sample
included stock data of five randomly selected companies listed on the Jordanian stock
exchange, used as a sample training dataset for the period June 4, 2009 to December 24,
2009. Closing price is the main factor that affects the prediction process for a specific
stock under the kNN algorithm. The kNN algorithm was applied to 1,000 records to estimate
predicted values for each stock. The results were rational and reasonable, and the
predictions were close to the actual stock prices. Such results, both for these predictions
in particular and for data mining techniques in real life in general, are a good indication
that data mining techniques such as kNN could help decision makers at various levels.

Figure 1: Relation between predicted/actual closing price for 1-year period



Chapter No.6
Implementing Research Paper -I

6.1 Our Data Set Collection


There are multiple variables in the dataset: date, open, high, low, last, close,
total_trade_quantity, and turnover.
• The columns Open and Close represent the starting and final price at which the stock is
traded on a particular day.
• High, Low and Last represent the maximum, minimum, and last price of the share for
the day.
• Total Trade Quantity is the number of shares bought or sold in the day, and Turnover
(Lacs) is the turnover of the particular company on a given date.
Some dates are missing, such as weekends. The data is that of Descon Oxychem Ltd.

Figure 2: Dataset of Descon Oxychem



Figure 3: Reading Dataset

Figure 4: Output after Reading Dataset

6.2 Division of Data Set


The data set is divided into three parts: a training set, a validation set and a test set,
with 75% training data, 10% validation data and 15% test data.
The training set contains rows [0:852], the validation set [853:1096] and the test set
[1097:1218].
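
A chronological split of this kind might look like the sketch below. The function name and
the default fractions are assumptions for illustration. Note that the index ranges quoted
above work out to roughly 70/20/10 of 1219 rows rather than 75/10/15, so the fractions are
left as parameters.

```python
def chronological_split(rows, train_frac=0.75, valid_frac=0.10):
    """Split time-ordered rows into train, validation and test parts.

    The rows are not shuffled: the oldest records go to the training set
    and the most recent ones to the test set.
    """
    n = len(rows)
    train_end = int(n * train_frac)
    valid_end = train_end + int(n * valid_frac)
    return rows[:train_end], rows[train_end:valid_end], rows[valid_end:]


# Stand-in for 100 time-ordered records.
rows = list(range(100))
train, valid, test = chronological_split(rows)
print(len(train), len(valid), len(test))
```

Keeping the order intact matters for price series: shuffling before splitting would let the
model see records from after the period it is asked to predict.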

6.3 Environment & IDE


We used Python 3 in Jupyter Notebook. The interactive notebooks used in this project have
an ".ipynb" file extension. The IPython Notebook is now known as the Jupyter Notebook. It is
an interactive computational environment in which you can combine code execution, rich
text, mathematics, plots and rich media.

Figure 5: Jupyter Notebook for Python Code

6.4 Preprocessing of Dataset


Next, we create a data frame that contains only the Date and Close price columns.

Figure 6: Creating Data Frame Variables
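
The original notebook code appears only as a figure, so the following is a stand-in sketch
rather than the code we ran. It uses Python's standard csv module instead of a data frame
library; the column names mirror the variables listed in Section 6.1, and the three sample
rows are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Invented sample rows; the real file holds the Descon Oxychem data.
SAMPLE_CSV = """Date,Open,High,Low,Last,Close,Total Trade Quantity,Turnover (Lacs)
2019-01-03,10.0,10.6,9.9,10.4,10.5,1000,1.05
2019-01-01,9.5,10.1,9.4,9.9,10.0,1200,1.20
2019-01-02,10.0,10.3,9.8,10.2,10.2,900,0.92
"""


def load_date_close(handle, date_format="%Y-%m-%d"):
    """Keep only the Date and Close columns, sorted from oldest to newest."""
    reader = csv.DictReader(handle)
    rows = [
        (datetime.strptime(record["Date"], date_format), float(record["Close"]))
        for record in reader
    ]
    rows.sort(key=lambda row: row[0])
    return rows


series = load_date_close(io.StringIO(SAMPLE_CSV))
for date, close in series:
    print(date.date(), close)
```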



Figure 7: Output after Preprocessing

Figure 8: Basic Graph between Data and Closing Price

6.3 KNN Algorithm


An interesting ML algorithm that one can use here is kNN (k nearest neighbours). Based on
the independent variables, kNN finds the similarity between new data points and old data
points. The code in Figure 9 imports the libraries, reads the data, creates a data frame
with the target values, creates the features, splits off the test and validation sets,
scales the data, fits the model and makes predictions.

Figure 9: KNN Model for Making Predictions
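
Since the model code survives only as a figure, here is a small self-contained sketch of
the same steps: scale the features, find the nearest neighbours and average their closing
prices. It is an approximation under stated assumptions and not our notebook code: the
helper names, the (month, day of week) features and the prices are all hypothetical.

```python
import math


def min_max_fit(rows):
    """Record the (min, max) of every feature column of the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_transform(rows, bounds):
    """Scale every feature to [0, 1] using bounds learned on the training rows."""
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds))
        for row in rows
    ]


def knn_regress(train_X, train_y, query, k):
    """Predict a price as the mean target of the k nearest training rows."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return sum(train_y[i] for i in order[:k]) / k


# Hypothetical date-derived features (month, day of week) and closing prices.
train_X = [(1, 0), (1, 4), (6, 2), (12, 1)]
train_y = [10.0, 12.0, 20.0, 30.0]

bounds = min_max_fit(train_X)
scaled_train = min_max_transform(train_X, bounds)
scaled_query = min_max_transform([(1, 1)], bounds)[0]

prediction = knn_regress(scaled_train, train_y, scaled_query, k=2)
print(prediction)
```

Scaling is fitted on the training rows only and then applied to the query, so that no
information from the validation or test data leaks into the model.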

6.4 Results
The results are given below, where the orange line in the plot shows the predicted price
and the blue line shows the original price.

Figure 10: KNN Results

6.5 Analysis, Comparison and Conclusion


The root-mean-square deviation or root-mean-square error is a frequently used measure of
the differences between values predicted by a model or an estimator and the values
observed. Lower values of RMSE indicate better fit.

Figure 11: Calculating ERMS

Figure 12: Comparing Results

Our KNN results are less promising than those reported in the paper: we obtained an RMS
error of 6.1, whereas the paper reports 0.8.

Chapter No.7
Research Paper -II

Predicting Stock Prices Using LSTM
Murtaza Roondiwala, Harshal Patel, Shraddha Varma (International Journal of Science and
Research (IJSR), Index Copernicus Value (2015): 78.96 | Impact Factor (2015): 6.391)

Long Short-Term Memory is one of the most successful RNN architectures. LSTM
introduces the memory cell, a unit of computation that replaces traditional artificial
neurons in the hidden layer of the network. With these memory cells, networks are able to
associate memories with inputs that are remote in time, and are therefore suited to
grasping the structure of data dynamically over time with high prediction capacity.

7.1 Methodology
Various types of neural networks can be developed by combining different factors such as
network topology and training method. For this experiment, the authors considered Recurrent
Neural Networks and Long Short-Term Memory. This section discusses the methodology of their
system, which consists of the following stages:

[Flow diagram: Raw Data → Data Preprocessing → Feature Extraction → Train Neural Network →
Prediction on Test Set → Calculate RMSE → Visualize Graph]

7.2 Raw Data


In this stage, the historical stock data is collected from https://www.quandl.com/data/NSE
and this historical data is used for the prediction of future stock prices.

7.3 Data Preprocessing


The pre-processing stage involves:
a) Data discretization: part of data reduction, but of particular importance, especially
for numerical data.
b) Data transformation: normalization.
c) Data cleaning: filling in missing values.
d) Data integration: integration of data files.
After the dataset has been transformed into a clean dataset, it is divided into training
and testing sets for evaluation. Here, the training values are taken as the more recent
values. Testing data is kept at 5-10 percent of the total dataset.

7.4 Feature Extraction


In this stage, only the features that are to be fed to the neural network are chosen. The
features are chosen from Date, Open, High, Low, Close and Volume.

7.5 Training Neural Network


In this stage, the data is fed to the neural network, which is initialized with random
biases and weights and trained for prediction. The paper’s LSTM model is composed of a
sequential input layer followed by 2 LSTM layers, a dense layer with ReLU activation and
finally a dense output layer with a linear activation function.

7.6 Analysis, Results and Conclusion


To analyze the efficiency of the system, the authors used the Root Mean Square Error
(RMSE). Training minimizes the error, that is, the difference between the target and the
obtained output value, as measured by the RMSE. RMSE is the square root of the mean of the
squares of all the errors. RMSE is very commonly used and makes an excellent general
purpose error metric for numerical predictions. Compared to the similar Mean Absolute
Error, RMSE amplifies and severely punishes large errors.
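
A small worked example makes the difference between the two metrics concrete. The two
error lists below are invented: both have the same Mean Absolute Error, but one
concentrates the error in a single large miss.

```python
import math


def rmse(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


even_errors = [2.0, 2.0, 2.0, 2.0]    # four moderate misses
spiky_errors = [0.0, 0.0, 0.0, 8.0]   # one large miss

print(mae(even_errors), rmse(even_errors))    # 2.0 2.0
print(mae(spiky_errors), rmse(spiky_errors))  # 2.0 4.0
```

The MAE is 2.0 in both cases, while the RMSE doubles for the list with the single large
error, which is the amplification described above.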

After performing various simulations with different numbers of parameters and epochs,
they observed that a set of four features (High/Low/Open/Close) with 500 epochs gave the
best results, with a training RMSE of 0.00983 and a testing RMSE of 0.00859.

Figure 13: High/Low/Open/close with 500 epochs



Chapter No.8
Implementing Research Paper -II

In this implementation, the first four steps (Dataset Collection, Division of Dataset,
Environment & IDE and Preprocessing of Dataset) are the same as for the KNN model, and the
root mean square error is calculated in the same way. So, proceeding further:

8.1 LSTM Algorithm


LSTMs are widely used for sequence prediction problems and have proven to be extremely
effective. The reason they work so well is that an LSTM is able to store past information
that is important and forget the information that is not. An LSTM has three gates:

• The input gate: adds information to the cell state.
• The forget gate: removes the information that is no longer required by the model.
• The output gate: selects the information to be shown as output.

Figure 14: LSTM Model
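
To show how the three gates interact, the sketch below implements a single step of a
standard LSTM cell with scalar input and state. It is a teaching example, not the model
from our notebook: the parameter names and values are arbitrary assumptions, and a real
model would use vectors, weight matrices and a deep learning library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a scalar LSTM cell; p holds the weights and biases."""
    # Forget gate: how much of the previous cell state to keep.
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])
    # Input gate: how much of the new candidate to add to the cell state.
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])
    # Candidate value computed from the current input and previous output.
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])
    # Output gate: how much of the cell state to expose as output.
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


# Arbitrary illustrative parameters (a trained model would learn these).
params = {
    "wf": 0.5, "uf": 0.1, "bf": 1.0,
    "wi": 0.6, "ui": 0.2, "bi": 0.0,
    "wg": 0.9, "ug": 0.3, "bg": 0.0,
    "wo": 0.4, "uo": 0.1, "bo": 0.0,
}

h, c = 0.0, 0.0
for x in [0.1, 0.2, 0.3]:  # a short sequence of normalized prices
    h, c = lstm_step(x, h, c, params)
print(h, c)
```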

8.2 Results
The following graph shows the relation between the actual and predicted stock price.

Figure 15: LSTM Results



8.3 Comparison, Analysis and Conclusion

The LSTM model is stable and robust with a small error ratio, so the results are rational
and reasonable. The RMSE is found to be 0.04; the confusion matrix is given below.

Figure 16: Comparing Results

Confusion matrix for the actual gradient vs. the predicted gradient. The accuracy is found
to be 83.3%.

            Up    Down    Total
Up          14       1       15
Down         3       6        9
Total       17       7       24
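
The reported accuracy can be checked directly from the four cell counts in the table (14,
1, 3 and 6, which sum to 24). The helper below is a small sketch written for this check;
the function name is our own.

```python
def accuracy(matrix):
    """Share of records on the diagonal of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Cell counts from the table above: rows and columns are Up, Down.
confusion = [[14, 1], [3, 6]]

acc = accuracy(confusion)
print(round(acc * 100, 1))  # 83.3
```

Twenty of the 24 gradients are classified correctly, which gives the 83.3% quoted above.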

Chapter No.9
Conclusion

We implemented two papers in Jupyter and compared our results with those reported in the
research papers. Overall, LSTM performed better than the KNN approach, with an RMSE of 0.04
against 6.1 for KNN. The project was a great experience and a learning platform that
allowed us to polish our technical skills and practical knowledge. We learnt a great deal
of practical work related to machine learning algorithms, including training, testing and
implementation. Training the models, testing them on the validation and test sets, and
calculating errors and accuracy gave us a brief overview of machine learning algorithms
and of the platforms used to run them.

Chapter No.10
References

https://www.analyticsvidhya.com/blog/2018/10/predicting-stock-price-machine-learningnd-deep-learning-techniques-python/

https://github.com/jerrytigerxu/Stock-Price-Prediction

[1] Rakhi Mahant, Trilok Nath Pandey, Alok Kumar Jagadev and Satchidananda Dehuri,
“Optimized Radial Basis Functional Neural Network for Stock Index Prediction,”
International Conference on Electrical, Electronics, and Optimization Techniques
(ICEEOT), 2016.

[2] Kai Chen, Yi Zhou and Fangyan Dai, “A LSTM-based method for stock returns
prediction: A case study of China stock market,” IEEE International Conference on Big
Data, 2015.

[3] A.U.S.S Pradeep, Soren Goyal, J. A. Bloom, I. J. Cox and M. Miller,
“Detection of statistical arbitrage using machine learning techniques in Indian Stock
market,” IIT Kanpur, April 15, 2013.

[4] Khalid Alkhatib, Hassan Najadat, Ismail Hmeidi and Mohammed K. Ali Shatnawi, “Stock
Price Prediction Using K-Nearest Neighbor (kNN) Algorithm,” Vol. 3 No. 3; March 2013.

[5] Murtaza Roondiwala, Harshal Patel and Shraddha Varma, “Predicting Stock Prices Using
LSTM,” International Journal of Science and Research (IJSR), ISSN (Online): 2319-7064,
Index Copernicus Value (2015): 78.96 | Impact Factor (2015): 6.391.
