Slovak University of Technology, Bratislava, Slovak Republic
1 xishaque@is.stuba.sk
2 lhudec@fiit.stuba.sk
Abstract— Deep Learning is an area of Machine Learning research which can be used to manipulate large amounts of information in an intelligent way by using the functionality of computational intelligence. A deep learning system is a fully trainable system, beginning from the raw input and ending with the final output of recognized objects. Feature selection is an important aspect of deep learning which can be applied for dimensionality reduction or attribute reduction, making the information more explicit and usable. Deep learning can build various learning models which can abstract unknown information by selecting a subset of relevant features. This property of deep learning makes it useful for analyzing the highly complex information present in intrusive data, or in information flowing within a web system or network, which needs to be analyzed to detect anomalies. Our approach uses these intelligent abilities of Deep Learning to build a smart Intrusion Detection System.

Keywords — Deep learning, Intrusion Detection System, Computational Intelligence.

I. INTRODUCTION

The web system is an environment containing a huge number of mechanisms and technologies, including the HTTP protocol, client-side and server-side applications, the web browser, and scripting mechanisms such as JavaScript and CGI. Applications built on these infrastructures are widely accepted, but at a cost to their security because of inconsistencies found in these technologies; implementing security on these infrastructures is therefore a big challenge, and as a result many web applications deployed on these systems are exposed to security vulnerabilities. There are three basic types of security vulnerabilities within a web system: input validation, session management, and application logic. Some common attacks which exploit these vulnerabilities are discussed here. Script injection and dataflow attacks belong to the class of input validation vulnerabilities; they usually inject malicious scripts into web content, resulting in malfunctioning output of the web system, e.g. cross-site scripting (XSS) and SQL injection. The latter targets web applications that have a database system, where normal SQL queries can be used to define and manipulate the data.

To deal with the problem of pattern change or attack modification, Computational Intelligence is a suitable technique, as it offers the high detection accuracy needed for building intelligent detection models that automatically detect anomalous activities. The basic requirement of these methodologies is to train on labelled data and understand the behavior of the data, resulting in a high resource cost. These kinds of classifiers follow an iterative process: understanding the behavior of patterns, making some mathematical adjustments, and then predicting the outcome. Deep learning can learn or extract features directly from raw data, and it has the capability to automatically find the features which are important and useful for classification. Different deep learning techniques are used for the extraction process, such as deep belief networks, restricted Boltzmann machines and autoencoders. One more advantage of deep learning is its ability to handle huge amounts of raw data. In this research, we propose to handle intrusion detection for a web system using a Deep Learning approach. The deep learning approach extracts key features from the data obtained from the web system, which is exposed to many vulnerabilities since it is on the internet. A stacked denoised autoencoder is then used as an enhancement stage; this classifier is used to classify the attacks and to distinguish between normal and anomalous traffic.

II. RELATED WORK

Sharma et al. [1] proposed a system comprising a Deep Belief Network for anomaly detection and, at the same time, compared its performance with a classic neural network. The comparison was done by considering the features learned and how well each model can detect anomalies. They also explored the set of features learned by each layer of the deep architecture, and provided a simple, reliable, accurate and fast mechanism for visualizing the features at the upper levels. The authors tried to develop a system for abnormality detection which reduces the amount of data to be processed manually by focusing only on a specific part of the data; for this, a good feature selection system is required, and this selection of good features results in good abnormality detection. Deep Learning is a modern tool which can learn features directly from raw data by deriving high-level features from low-level features to form a hierarchical representation. The performance of the system is tested on two types of data with Precision, Recall and F-score; one set is discrete in nature and the other is normal data. The authors used the MNIST [2] dataset, with handwritten digits as digit data and face data as non-digit data. Experimentally, the authors explored and compared two types of structures: one is the classic neural network, and the other is the deep belief network, built by stacking Restricted Boltzmann Machines. From this comparison it is observed that the deep architecture performs better than the conventional neural network, and that deep architectures can be further explored and applied to detect anomalies. On average, a 42.4% to 66.5% improvement was achieved on the two datasets used in the experimentation.

In [8] the authors come up with an approach that detects intrusions using the state transition analysis technique (STAT). A comparison of machine learning techniques such as Random Forest, Logistic Regression, Decision Tree, the AdaBoost classifier and the SGD classifier is done for web intrusion detection, covering attacks like SQL injection, buffer overflow and XSS. Of the compared machine learning techniques, Logistic Regression comes out as the best learning method for this problem; the performance is further improved with different feature extractions and parameter tuning. The CSIC HTTP 2010 dataset is used. A large number of web vulnerabilities can be targeted here, such as injections; broken authentication and session management; cross-site scripting; insecure direct object references; web application and server misconfiguration; sensitive data exposure; broken access control; cross-site request forgery; use of components with known vulnerabilities; and unvalidated redirects and forwards. These vulnerabilities are included in the test data which is used to test the web applications. The web intrusion detection architecture is divided into two parts: the first part trains and validates the machine learning techniques, and the other part captures packets on the network card. The training and validation process uses a dataset to train and test the machine learning models, which are trained until they fulfil the required condition in the validation phase; later, the live captured packets are classified with these models to detect attacks. The authors used SciKit Learn [9] to apply the learning methods to the selected dataset. A comparative analysis is done where this dataset is applied to all the machine learning techniques mentioned above; each machine learning algorithm's performance is measured by the precision, recall and F1-score [8] on the testing dataset, where precision indicates how useful the search results are, recall indicates how complete the search results are, and the F1-score is a measure of test accuracy. Experimental results show that Logistic Regression achieves both a high recall rate and a high precision rate as compared to the other machine learning algorithms.

Our approach applies the concept of Deep Learning to reduce the dimensionality, or extract the key features, of the data which the web system is dealing with. Restricted Boltzmann machines and autoencoders are the two methods in Deep Learning which have the capability to extract key features from unlabeled data [3, 4]. In the work of Alrawashdeh and Purdy [5], the main idea is the detection of intrusions over a network using a Deep Belief Network formed by layering Restricted Boltzmann Machines together. Feature extraction is done using a single hidden layer of an RBM; the processed output from this layer is passed to another RBM, which results in a deep belief network. The trained data is then passed through a layer of logistic regression to assign each sample to its class, so that the output of the logistic regression can be categorized into anomalous and normal data. The authors of [6] come up with an approach for detecting anomalies in code using deep learning, combining autoencoders and deep belief networks: the autoencoders are used to extract the key features and the deep belief networks are applied to detect the anomalies.

Until now, the neural network has been the main technique used for classification in anomaly detection, but in our approach we use a stacked denoised autoencoder to classify normal and anomalous traffic on the web system. A new dataset will be generated over a web system which will carry both normal and anomalous data.

III. WEB SYSTEM
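The precision, recall and F1-score used throughout the comparisons above can be computed directly from a classifier's predictions. The following is a minimal sketch in plain Python; the label values and example predictions are invented for illustration and are not taken from any of the cited experiments.

```python
# Minimal illustration of the precision / recall / F1 metrics used to
# compare web-intrusion classifiers. Labels: 1 = attack, 0 = normal.
# The example predictions below are made up for demonstration only.

def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # how useful the flagged results are
    recall = tp / (tp + fn) if tp + fn else 0.0     # how complete the detection is
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # ground-truth labels
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]   # a hypothetical classifier's output
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # prints precision=0.75 recall=0.75 f1=0.75
```

In real experiments the equivalent scikit-learn functions (`precision_score`, `recall_score`, `f1_score`) would normally be used, as in [8].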
Web systems can easily be exploited through the HTTP/HTTPS protocols. HTTP is a protocol designed for communication between a client and a server, and HTTPS is mainly used for a secured connection by encrypting that communication. One disadvantage of the HTTPS protocol with respect to intrusion detection is that the encryption blinds network-based detection systems, causing packet inspection to fail because the data packets are in encrypted form. Host-based intrusion detection systems, on the other hand, do not face this problem, as the endpoint is protected and the data is decrypted back to its original form there [7]. In our experiment we will be using a small web application which can upload, download and manipulate images online.

IV. INTRUSION DETECTION SYSTEM

In a computer system, an Intrusion Detection System scans the events occurring in a web or network system and investigates them for threats, intrusions and unauthorized access or entry [10]. An unstable system with many loopholes is the main victim of these intrusions, which attempt to access or manipulate information and make the system unusable or unreliable. Denial of Service (DoS) makes the machine unavailable to the user, while worms and viruses spreading over the network exploit users' information and take advantage of the privileged access of vulnerable host systems [11]. An Intrusion Detection System basically protects systems and makes them ready to handle attacks. A detailed description of the different types of Intrusion Detection Systems is shown in Fig. 1.

Deep learning is used in this work for feature extraction and transformation. Each layer utilizes the output obtained from the previous layer as its input; with an arrangement of such transformations, very complex and integrated functions can be simplified and learned. The classification process takes place when layers at a higher level amplify the inputs from the lower level so that they can easily be distinguished. With high-end hardware, high-dimensional datasets can easily be manipulated and used in various applications.
Our main purpose of using deep learning in our research is to reduce the dimensionality of the dataset obtained from the web system.
Fig-1 Intrusion Detection System
V. CONVENTIONAL FEATURE LEARNING ALGORITHMS

Conventional feature extraction algorithms are shallow in nature, and their main purpose is to learn the changes happening in data that make it simpler to extract important information when building classifiers [12]. Mostly, feature extraction algorithms are either linear or nonlinear, generative or discriminative, supervised or unsupervised, local or global. Principal Component Analysis (PCA) is an unsupervised, linear, global and generative feature extraction algorithm, whereas Linear Discriminant Analysis (LDA) is a supervised, nonlinear, local and discriminative algorithm.

The PCA algorithm, because of its simplicity, has the capability to transform raw data into an orthogonal dimensional space. High-variance principal components are suitable for solving machine learning problems, since they result from the conversion of correlated variables into linearly uncorrelated variables. Predicting whether a datapoint in a dataset belongs to one of two classes can easily be done with the high-variance principal components, and hence the best features will be selected in order to separate the data. Being unsupervised, the model does not consider the label of each data point: PCA observes the dataset as a whole and finds the direction of highest variance, and each subsequent direction captures the next highest variance while being orthogonal to the previous one. However, for datasets where the separation of the data is not based on high variance, even the most important principal component will not work; this is one of the important drawbacks of PCA [13], and it can be eliminated with the use of Deep Learning.

Fig-2 Deep learning system

Feature selection or extraction is an optimization process which decreases the dimensionality of a dataset by selecting only those features which are interesting, without duplication and irrelevancy. The selected best features are highly optimal for increasing the accuracy of the classification model while reducing the computation time and the space required for storing the information. It also takes care of noise reduction in the data and avoids the over-fitting problem [14], [15]. The use of deep learning in numerous fields is due to the huge development of three aspects: its ability to learn features, the availability of huge labelled datasets, and the modern-day computing power of graphical processing units (GPUs).

VI. DEEP LEARNING FOR FEATURE EXTRACTION

Deep learning is a technology for learning and representation in which many levels of modules exist; the modules are nonlinear and transform the information present at the first level to higher levels which are more abstract in nature.

Autoencoders are a deep learning method proposed by G.E. Hinton in 2006 and are basically used for feature extraction. The complete structure of an autoencoder is divided into two parts, an encoding part and a decoding part, with three layers: input layer, hidden layer and output layer. At the intersection of the encoder and decoder there is a layer called the code layer; this layer is the core of the autoencoder, representing the important features of a high-dimensional dataset with nested structure while also setting the reduced dimensionality of that dataset. When the hidden layer neurons are fewer in number than both the input layer neurons and the output layer neurons, dimensionality reduction of the data is obtained. Autoencoders basically comprise three steps: pretraining, unrolling and fine-tuning [16]. The encoder maps an input x to a code

y = s(Wx + b)

which is then transformed back (decoded) to

z = s(W′y + b′)

where y is the latent representation or code, s is a non-linearity, and W is the weight matrix. The average reconstruction error between x and z is then measured, for example as the mean squared difference between them.

VIII. PERFORMANCE ANALYSIS

For our approach we propose to study the following performance metrics.
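The encoding y = s(Wx + b) and decoding z = s(W′y + b′) steps, together with the average reconstruction error, can be sketched in a few lines of NumPy. This is an illustrative, untrained autoencoder: the layer sizes, the tied decoder weights and the random initialization are assumptions made for the sketch, and real use would require training the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

n_in, n_code = 8, 3   # illustrative sizes: 8 input features -> 3-dimensional code

# Untrained, randomly initialized parameters (training would adjust these).
W = rng.normal(scale=0.1, size=(n_code, n_in))   # encoder weight matrix
b = np.zeros(n_code)                             # encoder bias
W_prime = W.T                                    # tied decoder weights (a common choice)
b_prime = np.zeros(n_in)                         # decoder bias

def s(a):
    """Sigmoid non-linearity."""
    return 1.0 / (1.0 + np.exp(-a))

def encode(x):
    return s(W @ x + b)              # y = s(Wx + b): the latent code

def decode(y):
    return s(W_prime @ y + b_prime)  # z = s(W'y + b'): the reconstruction

X = rng.random((5, n_in))            # five fake 8-dimensional samples
Z = np.array([decode(encode(x)) for x in X])

# Average reconstruction error, here measured as mean squared error.
avg_error = float(np.mean((X - Z) ** 2))
print("code dimension:", n_code, "avg reconstruction error:", round(avg_error, 4))
```

With fewer code units than inputs (3 versus 8 here), the code layer performs exactly the dimensionality reduction described above.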
Accuracy: the degree to which requests in the input data are correctly classified by the proposed framework.

False Positive Rate: the ratio between the number of negative events wrongly categorized as positive and the total number of actual negative events.

Capability to detect attacks or vulnerabilities: the capability of the system to detect attacks or vulnerabilities as compared to systems which are not hybridized, i.e. where the deep learning approach is not used for feature extraction. A comparison will be done after the experimentation part, showing the clear difference between our approach and the other approaches mentioned in the literature.

VII. PROPOSED SYSTEM ARCHITECTURE

Fig-3 Proposed architecture for Intrusion detection system

FEATURE EXTRACTION AND CLASSIFICATION USING A STACKED DENOISED AUTOENCODER

Stacked autoencoders contain more than one hidden layer [17]. Stacking increases the expressive capacity of the model, allowing the autoencoder to distinguish between attacks and normal traffic. In a stacked autoencoder, the output of each layer is fed to the next layer:

h1 = f(x), hi = f(hi−1) …encoding

g1 = g(hi), gi = g(gi−1) …decoding

The stacked autoencoder is pretrained using greedy layer-wise training: the raw input is fed to the first encoder layer and its parameters are learned, after which this layer transforms the raw input into the hidden units of the first layer. The second layer is trained thereafter to obtain its parameters. This process is repeated for all layers, where the parameters of each layer are trained individually while the parameters of the other layers are kept unchanged. The denoising of the autoencoders is necessary, as it prevents the autoencoders from over-fitting. Our aim is not just to target known attacks but also to identify new or unknown attacks; our system should be able to handle cases that were not covered in the training phase. Denoising corrupts the original input by inserting some noise into the input data, so the encoder must repair the input by reconstructing it from the corrupted version, which forces the hidden layer to capture the statistical dependencies between the inputs. The value obtained from the stacked denoised autoencoder is then compared against an onset (threshold) value: if it is greater than the onset value the request is taken as a normal request, but if it is less the request is taken as an abnormal request, i.e. an attack.

REFERENCES

[1] Sharma, Manoj Kumar, Debdoot Sheet, and Prabir Kumar Biswas. "Abnormality Detecting Deep Belief Network." Proceedings of the International Conference on Advances in Information Communication Technology & Computing. ACM, 2016.
[2] Paxson, V. "Bro: a system for detecting network intruders in real-time." In: Proceedings of the 7th USENIX Security Symposium, 1998, San Antonio, TX.
[3] E. Albin, "A Comparative Analysis of Snort and Suricata Intrusion Detection Systems", Naval Postgraduate School, Dudley Knox Library, September 2011.
[4] OpenWIPS-ng. http://www.openwips-ng.org/. Accessed: August 8, 2013.
[5] Alrawashdeh, Khaled, and Carla Purdy. "Toward an Online Anomaly Intrusion Detection System Based on Deep Learning." Machine Learning and Applications (ICMLA), 2016 15th IEEE International Conference on. IEEE, 2016.
[6] Li, Yuancheng, Rong Ma, and Runhai Jiao. "A hybrid malicious code detection method based on deep learning." International Journal of Security and Its Applications 9.5 (2015).
[7] Agarwal, Nancy, and Syed Zeeshan Hussain. "A closer look at Intrusion Detection System for web applications." arXiv preprint arXiv:1803.06153 (2018).
[8] Pham, Truong Son, and Tuan Hao Hoang. "Machine learning techniques for web intrusion detection — A comparison." Knowledge and Systems Engineering (KSE), 2016 Eighth International Conference on. IEEE, 2016.
[9] Ilgun, Koral, Richard A. Kemmerer, and Phillip A. Porras. "State transition analysis: A rule-based intrusion detection approach." IEEE Transactions on Software Engineering 21.3 (1995): 181-199.
[10] J. P. Anderson, "Computer Security Threat Monitoring and Surveillance," James P. Anderson Co., Fort Washington, Pennsylvania, Tech. Rep., April 1980.
[11] Cannady, J. "Artificial neural networks for misuse detection." In: Proceedings of the National Information Systems Security Conference, 1998, pp. 443–456.
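The stacking, the denoising corruption, and the threshold decision described above can be put together in a short sketch. All weights here are random and untrained (greedy layer-wise pretraining would fit them one layer at a time), the layer sizes, noise level and threshold are placeholders, and this sketch treats the autoencoder's output as a reconstruction error, following the common convention for reconstruction-based detectors of flagging a request as an attack when its error exceeds the threshold.

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [16, 8, 4]   # illustrative: 16 input features -> 8 -> 4 (code layer)

def s(a):
    """Sigmoid non-linearity."""
    return 1.0 / (1.0 + np.exp(-a))

# One weight matrix per encoder layer; the decoder reuses the transposes
# (tied weights). In the real method each layer is pretrained greedily:
# fit layer 1 on the (noise-corrupted) raw input, freeze it, fit layer 2
# on layer 1's output, and so on, then fine-tune the whole stack.
Ws = [rng.normal(scale=0.1, size=(sizes[i + 1], sizes[i]))
      for i in range(len(sizes) - 1)]

def corrupt(x, noise=0.1):
    """Denoising step: perturb the input so the network must repair it."""
    return x + rng.normal(scale=noise, size=x.shape)

def reconstruct(x):
    h = corrupt(x)
    for W in Ws:              # h1 = f(x), hi = f(h_{i-1})   ...encoding
        h = s(W @ h)
    for W in reversed(Ws):    # g1 = g(hi), gi = g(g_{i-1})  ...decoding
        h = s(W.T @ h)
    return h

def classify(x, threshold=0.5):
    """Label a request vector by its mean squared reconstruction error."""
    error = float(np.mean((x - reconstruct(x)) ** 2))
    return ("attack" if error > threshold else "normal"), error

label, err = classify(rng.random(16))
print(label, round(err, 4))
```

A trained model would replace the random `Ws` with pretrained and fine-tuned parameters, and the threshold would be calibrated on validation traffic.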
[12] Bengio, Y., Courville, A., Vincent, P. "Representation learning: a review and new perspectives." IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013; 35:1798-1828.
[13] "Limitations of Applying Dimensionality Reduction using PCA" by Roberto Reif, https://www.robertoreif.com/blog/2018/1/9/pca.
[14] Liu, H., Yu, L. "Toward integrating feature selection algorithms for classification and clustering," IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 4, pp. 491-502, 2005.
[15] Bostani, H., Sheikhan, M. "Hybrid of binary gravitational search algorithm and mutual information for feature selection in intrusion detection systems," Soft Computing, Vol. 21, No. 9, pp. 2307-2324, 2017.
[16] Li, Yuancheng, Rong Ma, and Runhai Jiao. "A hybrid malicious code detection method based on deep learning." International Journal of Security and Its Applications 9.5 (2015).
[17] Vincent, Pascal, et al. "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion." Journal of Machine Learning Research 11.Dec (2010): 3371-3408.