Slovak University of Technology, Bratislava, Slovak Republic
1 xishaque@is.stuba.sk
2 lhudec@fiit.stuba.sk
Abstract— Deep Learning is an area of Machine Learning research which can be used to manipulate large amounts of information in an intelligent way by using the functionality of computational intelligence. A deep learning system is a fully trainable system, beginning from the raw input and ending with the final output of recognized objects. Feature selection is an important aspect of deep learning which can be applied for dimensionality reduction or attribute reduction, making the information more explicit and usable. Deep learning can build various learning models which can abstract unknown information by selecting a subset of relevant features. This property of deep learning makes it useful for analyzing the highly complex information present in intrusive data, or in information flowing within a web system or network, which needs to be analyzed to detect anomalies. Our approach uses these intelligent abilities of Deep Learning to build a smart Intrusion Detection System.

Keywords — Deep learning, Intrusion Detection System, Computational Intelligence.

I. INTRODUCTION

The web system is an environment containing a huge number of mechanisms and technologies, including the HTTP protocol, client-side and server-side applications, the web browser, and scripting mechanisms such as JavaScript and CGI. Applications built on these infrastructures are widely accepted, but at a cost to their security because of inconsistencies found in these technologies; implementing security on these infrastructures is therefore a big challenge, and as a result many web applications deployed on these systems are exposed to security vulnerabilities. There are three basic types of security vulnerabilities within a web system: input validation, session management, and application logic. Some common attacks which exploit these vulnerabilities are discussed here. Script injection and dataflow attacks belong to the class of input validation vulnerabilities; they usually inject malicious scripts into web content, resulting in malfunctioning output of the web system, e.g. cross-site scripting (XSS) and SQL injection. The latter targets web applications that have a database system, where normal SQL queries can be used to define and manipulate the data.

To deal with the problem of pattern change or attack modification, Computational Intelligence is a suitable technique, as it offers the high detection accuracy needed for building intelligent detection models that automatically detect anomalous activities. The basic requirement of these methodologies is to train on labelled data and understand the behavior of the data, resulting in a high resource cost. These kinds of classifiers follow an iterative process: understanding the behavior of patterns, making some mathematical adjustments, and then predicting the outcome. Deep learning can learn or extract features directly from raw data, and it has the capability to automatically find the features which are important and useful for classification. Different deep learning techniques are used for the extraction process, such as deep belief networks, restricted Boltzmann machines and autoencoders. One more advantage of deep learning is its ability to handle huge amounts of raw data. In this research, we propose to handle intrusion detection for a web system using a Deep Learning approach. The deep learning approach extracts key features from the data obtained from the web system, which is exposed to many vulnerabilities since it is on the internet. A stacked denoised autoencoder is then used as an enhancement stage; this classifier is used to classify the attacks and to distinguish between normal and anomalous traffic.

II. RELATED WORK

Sharma et al. [1] proposed a system comprising a Deep Belief Network for anomaly detection and, at the same time, compared its performance with a classic neural network. The comparison was done by considering the features learned and how well each model can detect anomalies. They also explored the set of features learned by each layer of the deep architecture, and provided a simple, reliable, accurate and fast mechanism for visualizing the features at the upper levels. The authors tried to develop a system for abnormality detection which reduces the amount of data to be processed manually by focusing only on a specific part of the data; for this, a good feature selection system is required, and this selection of good features results in good abnormality detection. Deep Learning is a modern tool which can learn features directly from raw data by deriving high-level features from low-level features to form a hierarchical representation. The performance of the system is tested on two types of data with Precision, Recall and F-score; one set is discrete in nature and the other is normal data. The authors used the MNIST [2] dataset, with handwritten digits as digit data and face data as non-digit data. Experimentally, the authors explored and compared two types of structures: one is the classic neural network, and the other is the deep belief network, built by stacking Restricted Boltzmann Machines. From this comparison it is observed that the deep architecture performs better than the conventional neural network, and that deep architectures can be further explored and applied to detect anomalies. On average, a 42.4% to 66.5% improvement was achieved on the two datasets used in the experimentation.

In [8] the authors come up with an approach that detects intrusions using the state transition analysis technique (STAT). A comparison of machine learning techniques such as Random Forest, Logistic Regression, Decision Tree, the AdaBoost classifier and the SGD classifier is done for web intrusion detection, covering attacks like SQL injection, buffer overflow and XSS. Of the compared machine learning techniques, Logistic Regression comes out as the best learning method for this problem; the performance is further improved with different feature extractions and parameter tuning. The CSIC HTTP 2010 dataset is used. A large number of web vulnerabilities can be targeted here, such as injections; broken authentication and session management; cross-site scripting; insecure direct object references; web application and server misconfiguration; sensitive data exposure; broken access control; cross-site request forgery; use of components with known vulnerabilities; and unvalidated redirects and forwards. These vulnerabilities are included in the test data which is used to test the web applications. The web intrusion detection architecture is divided into two parts: the first part trains and validates the machine learning techniques, and the other part captures packets on the network card. The training and validation process uses a dataset to train and test the machine learning models, which are trained until they fulfil the required condition in the validation phase; later, the live captured packets are classified with these models to detect attacks. The authors used SciKit Learn [9] to apply the learning methods to the selected dataset. A comparative analysis is done where this dataset is applied to all the machine learning techniques mentioned above; each machine learning algorithm's performance is measured by the precision, recall and F1-score [8] on the testing dataset, where precision indicates how useful the search results are, recall indicates how complete the search results are, and the F1-score is a measure of test accuracy. Experimental results show that Logistic Regression achieves both a high recall rate and a high precision rate as compared to the other machine learning algorithms.

Our approach applies the concept of Deep Learning to reduce the dimensionality, or extract the key features, of the data which the web system is dealing with. Restricted Boltzmann machines and autoencoders are the two methods in Deep Learning which have the capability to extract key features from unlabeled data [3, 4]. In the work of Alrawashdeh and Purdy [5], the main idea is the detection of intrusions over a network using a Deep Belief Network formed by layering Restricted Boltzmann Machines together. Feature extraction is done using a single hidden layer of an RBM; the processed output from this layer is passed to another RBM, which results in a deep belief network. The trained data is then passed through a layer of logistic regression to assign each sample to its class, so that the output of the logistic regression can be categorized into anomalous and normal data. The authors of [6] come up with an approach for detecting anomalies in code using deep learning, combining autoencoders and deep belief networks: the autoencoders are used to extract the key features and the deep belief networks are applied to detect the anomalies.

Until now, the neural network has been the main technique used for classification in anomaly detection, but in our approach we use a stacked denoised autoencoder to classify normal and anomalous traffic on the web system. A new dataset will be generated over a web system which will carry both normal and anomalous data.

III. WEB SYSTEM
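The precision, recall and F1-score used throughout the comparisons above can be computed directly from a classifier's predictions. The following is a minimal sketch in plain Python; the label values and example predictions are invented for illustration and are not taken from any of the cited experiments.

```python
# Minimal illustration of the precision / recall / F1 metrics used to
# compare web-intrusion classifiers. Labels: 1 = attack, 0 = normal.
# The example predictions below are made up for demonstration only.

def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # how useful the flagged results are
    recall = tp / (tp + fn) if tp + fn else 0.0     # how complete the detection is
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # ground-truth labels
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]   # a hypothetical classifier's output
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # prints precision=0.75 recall=0.75 f1=0.75
```

In real experiments the equivalent scikit-learn functions (`precision_score`, `recall_score`, `f1_score`) would normally be used, as in [8].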
Web systems can easily be exploited through the HTTP/HTTPS protocols. HTTP is a protocol designed for communication between a client and a server, and HTTPS is mainly used for a secured connection by encrypting that communication. One disadvantage of the HTTPS protocol with respect to intrusion detection is that the encryption blinds network-based detection systems, causing packet inspection to fail because the data packets are in encrypted form. Host-based intrusion detection systems, on the other hand, do not face this problem, as the endpoint is protected and the data is decrypted back to its original form there [7]. In our experiment we will be using a small web application which can upload, download and manipulate images online.

IV. INTRUSION DETECTION SYSTEM

In a computer system, an Intrusion Detection System scans the events occurring in a web or network system and investigates them for threats, intrusions and unauthorized access or entry [10]. An unstable system with many loopholes is the main victim of these intrusions, which attempt to access or manipulate information and make the system unusable or unreliable. Denial of Service (DoS) makes the machine unavailable to the user, while worms and viruses spreading over the network exploit users' information and take advantage of the privileged access of vulnerable host systems [11]. An Intrusion Detection System basically protects systems and makes them ready to handle attacks. A detailed description of the different types of Intrusion Detection Systems is shown in Fig. 1.

Deep learning is used in this work for feature extraction and transformation. Each layer utilizes the output obtained from the previous layer as its input; with an arrangement of such transformations, very complex and integrated functions can be simplified and learned. The classification process takes place when layers at a higher level amplify the inputs from the lower level so that they can easily be distinguished. With high-end hardware, high-dimensional datasets can easily be manipulated and used in various applications.
Our main purpose of using deep learning in our research is to reduce the dimensionality of the dataset obtained from the web system.
Fig-1 Intrusion Detection System
V. CONVENTIONAL FEATURE LEARNING ALGORITHMS

Conventional feature extraction algorithms are shallow in nature, and their main purpose is to learn the changes happening in data that make it simpler to extract important information when building classifiers [12]. Mostly, feature extraction algorithms are either linear or nonlinear, generative or discriminative, supervised or unsupervised, local or global. Principal Component Analysis (PCA) is an unsupervised, linear, global and generative feature extraction algorithm, whereas Linear Discriminant Analysis (LDA) is a supervised, nonlinear, local and discriminative algorithm.

The PCA algorithm, because of its simplicity, has the capability to transform raw data into an orthogonal dimensional space. High-variance principal components are suitable for solving machine learning problems, since they result from the conversion of correlated variables into linearly uncorrelated variables. Predicting whether a datapoint in a dataset belongs to one of two classes can easily be done with the high-variance principal components, and hence the best features will be selected in order to separate the data. Being unsupervised, the model does not consider the label of each data point: PCA observes the dataset as a whole and finds the direction of highest variance, and each subsequent direction captures the next highest variance while being orthogonal to the previous one. However, for datasets where the separation of the data is not based on high variance, even the most important principal component will not work; this is one of the important drawbacks of PCA [13], and it can be eliminated with the use of Deep Learning.

Fig-2 Deep learning system

Feature selection or extraction is an optimization process which decreases the dimensionality of a dataset by selecting only those features which are interesting, without duplication and irrelevancy. The selected best features are highly optimal for increasing the accuracy of the classification model while reducing the computation time and the space required for storing the information. It also takes care of noise reduction in the data and avoids the over-fitting problem [14], [15]. The use of deep learning in numerous fields is due to the huge development of three aspects: its ability to learn features, the availability of huge labelled datasets, and the modern-day computing power of graphical processing units (GPUs).

VI. DEEP LEARNING FOR FEATURE EXTRACTION

Deep learning is a technology for learning and representation in which many levels of modules exist; the modules are nonlinear and transform the information present at the first level to higher levels which are more abstract in nature.

Autoencoders are a deep learning method proposed by G.E. Hinton in 2006 and are basically used for feature extraction. The complete structure of an autoencoder is divided into two parts, an encoding part and a decoding part, with three layers: input layer, hidden layer and output layer. At the intersection of the encoder and decoder there is a layer called the code layer; this layer is the core of the autoencoder, representing the important features of a high-dimensional dataset with nested structure while also setting the reduced dimensionality of that dataset. When the hidden layer neurons are fewer in number than both the input layer neurons and the output layer neurons, dimensionality reduction of the data is obtained. Autoencoders basically comprise three steps: pretraining, unrolling and fine-tuning [16]. The encoder maps an input x to a code

y = s(Wx + b)

which is then transformed back (decoded) to

z = s(W′y + b′)

where y is the latent representation or code, s is a non-linearity, and W is the weight matrix. The average reconstruction error between x and z is then measured, for example as the mean squared difference between them.

VIII. PERFORMANCE ANALYSIS

For our approach we propose to study the following performance metrics.
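The encoding y = s(Wx + b) and decoding z = s(W′y + b′) steps, together with the average reconstruction error, can be sketched in a few lines of NumPy. This is an illustrative, untrained autoencoder: the layer sizes, the tied decoder weights and the random initialization are assumptions made for the sketch, and real use would require training the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_code = 8, 3   # illustrative sizes: 8 input features -> 3-dimensional code

# Untrained, randomly initialized parameters (training would adjust these).
W = rng.normal(scale=0.1, size=(n_code, n_in))   # encoder weight matrix
b = np.zeros(n_code)                             # encoder bias
W_prime = W.T                                    # tied decoder weights (a common choice)
b_prime = np.zeros(n_in)                         # decoder bias

def s(a):
    """Sigmoid non-linearity."""
    return 1.0 / (1.0 + np.exp(-a))

def encode(x):
    return s(W @ x + b)              # y = s(Wx + b): the latent code

def decode(y):
    return s(W_prime @ y + b_prime)  # z = s(W'y + b'): the reconstruction

X = rng.random((5, n_in))            # five fake 8-dimensional samples
Z = np.array([decode(encode(x)) for x in X])

# Average reconstruction error, here measured as mean squared error.
avg_error = float(np.mean((X - Z) ** 2))
print("code dimension:", n_code, "avg reconstruction error:", round(avg_error, 4))
```

With fewer code units than inputs (3 versus 8 here), the code layer performs exactly the dimensionality reduction described above.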
Accuracy: the degree to which requests in the input data are correctly classified by the proposed framework.

False Positive Rate: the ratio between the number of negative events wrongly categorized as positive and the total number of actual negative events.

Capability to detect attacks or vulnerabilities: the capability of the system to detect attacks or vulnerabilities as compared to systems which are not hybridized, i.e. where the deep learning approach is not used for feature extraction. A comparison will be done after the experimentation part, showing the clear difference between our approach and the other approaches mentioned in the literature.

VII. PROPOSED SYSTEM ARCHITECTURE

Fig-3 Proposed architecture for Intrusion detection system

FEATURE EXTRACTION AND CLASSIFICATION USING A STACKED DENOISED AUTOENCODER

Stacked autoencoders contain more than one hidden layer [17]. Stacking increases the expressive capacity of the model, allowing the autoencoder to distinguish between attacks and normal traffic. In a stacked autoencoder, the output of each layer is fed to the next layer:

h1 = f(x), hi = f(hi−1) …encoding

g1 = g(hi), gi = g(gi−1) …decoding

The stacked autoencoder is pretrained using greedy layer-wise training: the raw input is fed to the first encoder layer and its parameters are learned, after which this layer transforms the raw input into the hidden units of the first layer. The second layer is trained thereafter to obtain its parameters. This process is repeated for all layers, where the parameters of each layer are trained individually while the parameters of the other layers are kept unchanged. The denoising of the autoencoders is necessary, as it prevents the autoencoders from over-fitting. Our aim is not just to target known attacks but also to identify new or unknown attacks; our system should be able to handle cases that were not covered in the training phase. Denoising corrupts the original input by inserting some noise into the input data, so the encoder must repair the input by reconstructing it from the corrupted version, which forces the hidden layer to capture the statistical dependencies between the inputs. The value obtained from the stacked denoised autoencoder is then compared against an onset (threshold) value: if it is greater than the onset value the request is taken as a normal request, but if it is less the request is taken as an abnormal request, i.e. an attack.

REFERENCES

[1] Sharma, Manoj Kumar, Debdoot Sheet, and Prabir Kumar Biswas. "Abnormality Detecting Deep Belief Network." Proceedings of the International Conference on Advances in Information Communication Technology & Computing. ACM, 2016.
[2] Paxson, V. "Bro: a system for detecting network intruders in real-time." In: Proceedings of the 7th USENIX Security Symposium, 1998, San Antonio, TX.
[3] E. Albin, "A Comparative Analysis of Snort and Suricata Intrusion Detection Systems", Naval Postgraduate School, Dudley Knox Library, September 2011.
[4] OpenWIPS-ng. http://www.openwips-ng.org/. Accessed: August 8, 2013.
[5] Alrawashdeh, Khaled, and Carla Purdy. "Toward an Online Anomaly Intrusion Detection System Based on Deep Learning." Machine Learning and Applications (ICMLA), 2016 15th IEEE International Conference on. IEEE, 2016.
[6] Li, Yuancheng, Rong Ma, and Runhai Jiao. "A hybrid malicious code detection method based on deep learning." International Journal of Security and Its Applications 9.5 (2015).
[7] Agarwal, Nancy, and Syed Zeeshan Hussain. "A closer look at Intrusion Detection System for web applications." arXiv preprint arXiv:1803.06153 (2018).
[8] Pham, Truong Son, and Tuan Hao Hoang. "Machine learning techniques for web intrusion detection — A comparison." Knowledge and Systems Engineering (KSE), 2016 Eighth International Conference on. IEEE, 2016.
[9] Ilgun, Koral, Richard A. Kemmerer, and Phillip A. Porras. "State transition analysis: A rule-based intrusion detection approach." IEEE Transactions on Software Engineering 21.3 (1995): 181-199.
[10] J. P. Anderson, "Computer Security Threat Monitoring and Surveillance," James P. Anderson Co., Fort Washington, Pennsylvania, Tech. Rep., April 1980.
[11] Cannady, J. "Artificial neural networks for misuse detection." In: Proceedings of the National Information Systems Security Conference, 1998, pp. 443–456.
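The stacking, the denoising corruption, and the threshold decision described above can be put together in a short sketch. All weights here are random and untrained (greedy layer-wise pretraining would fit them one layer at a time), the layer sizes, noise level and threshold are placeholders, and this sketch treats the autoencoder's output as a reconstruction error, following the common convention for reconstruction-based detectors of flagging a request as an attack when its error exceeds the threshold.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [16, 8, 4]   # illustrative: 16 input features -> 8 -> 4 (code layer)

def s(a):
    """Sigmoid non-linearity."""
    return 1.0 / (1.0 + np.exp(-a))

# One weight matrix per encoder layer; the decoder reuses the transposes
# (tied weights). In the real method each layer is pretrained greedily:
# fit layer 1 on the (noise-corrupted) raw input, freeze it, fit layer 2
# on layer 1's output, and so on, then fine-tune the whole stack.
Ws = [rng.normal(scale=0.1, size=(sizes[i + 1], sizes[i]))
      for i in range(len(sizes) - 1)]

def corrupt(x, noise=0.1):
    """Denoising step: perturb the input so the network must repair it."""
    return x + rng.normal(scale=noise, size=x.shape)

def reconstruct(x):
    h = corrupt(x)
    for W in Ws:              # h1 = f(x), hi = f(h_{i-1})   ...encoding
        h = s(W @ h)
    for W in reversed(Ws):    # g1 = g(hi), gi = g(g_{i-1})  ...decoding
        h = s(W.T @ h)
    return h

def classify(x, threshold=0.5):
    """Label a request vector by its mean squared reconstruction error."""
    error = float(np.mean((x - reconstruct(x)) ** 2))
    return ("attack" if error > threshold else "normal"), error

label, err = classify(rng.random(16))
print(label, round(err, 4))
```

A trained model would replace the random `Ws` with pretrained and fine-tuned parameters, and the threshold would be calibrated on validation traffic.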
[12] Bengio, Y., Courville, A., Vincent, P. "Representation learning: a review and new perspectives." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013; 35:1798-1828.
[13] "Limitations of Applying Dimensionality Reduction using PCA" by Roberto Reif, https://www.robertoreif.com/blog/2018/1/9/pca.
[14] Liu, H., Yu, L. "Toward integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 4, pp. 491-502, 2005.
[15] Bostani, H., Sheikhan, M. "Hybrid of binary gravitational search algorithm and mutual information for feature selection in intrusion detection systems," Soft Computing, Vol. 21, No. 9, pp. 2307-2324, 2017.
[16] Li, Yuancheng, Rong Ma, and Runhai Jiao. "A hybrid malicious code detection method based on deep learning." International Journal of Security and Its Applications 9.5 (2015).
[17] Vincent, Pascal, et al. "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion." Journal of Machine Learning Research 11.Dec (2010): 3371-3408.