
IJCST Vol. 3, Issue 1, Jan.-March 2012 | ISSN: 0976-8491 (Online), 2229-4333 (Print)

Performance Comparison of Features Reduction Techniques for Intrusion Detection System


1Rupali Datti, 2Shilpa Lakhina
1,2Dept. of CSE, TIT, Bhopal, MP, India

Abstract
The network traffic data available for the design of an intrusion detection system are typically large and contain much ineffective information, so the worthless information must be removed from the original high-dimensional data. In this paper, we compare the performance of two feature reduction techniques on the NSL-KDD dataset, which is now publicly available for the evaluation of intrusion detection systems. These techniques are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). After reduction, the error back-propagation algorithm is used for classification. Our results show that PCA performs better than LDA on a small data set, while LDA is superior to PCA on a large data set.

Keywords
Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Intrusion Detection, Neural Networks

I. Introduction
Intrusion Detection Systems (IDSs) are among the main tools for providing security in computer systems and networks. They detect intrusions and attacks through the analysis of TCP/IP packet data [2]. Based on the data source, IDSs are classified into host-based and network-based; detection techniques are classified into misuse detection and anomaly detection. Misuse detection generates an alarm when a known attack signature is matched, while anomaly detection identifies activities that deviate from the normal behavior of the monitored system and thus has the potential to detect novel attacks [1].
Most existing intrusion detection systems use all 41 features of the KDD'99 dataset for their evaluation. Since some of these features are redundant and irrelevant, they make the detection process more time consuming. By using feature reduction techniques we can improve classification accuracy and reduce the computing resources, both memory and CPU time, required to detect an attack. To improve generalization ability, we usually generate a small set of features from the original input variables by dimensionality reduction, a preliminary transformation applied to the data prior to the use of classification tools [3-4]. Dimensionality reduction offers several significant advantages [5]: the low-dimensional data are reliable in the sense that they are guaranteed to reflect genuine properties of the original data, and their computational complexity is low, both in time and in space.
Traditional and state-of-the-art dimensionality reduction methods can generally be classified into Feature Extraction (FE) [6] and Feature Selection (FS) [7] approaches. In general, FE approaches are more effective than FS techniques [8] (except in some particular cases) and have been shown to be very effective for real-world dimensionality reduction problems [9], [10-11]. These algorithms extract features by projecting the original high-dimensional data into a lower-dimensional space through algebraic transformations. Classical FE algorithms are generally classified into linear and nonlinear algorithms. Linear algorithms, such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), project the high-dimensional data to a lower-dimensional space by linear transformations according to some criterion.

II. The Data Set
KDD'99 is the most widely used data set for the evaluation of intrusion detection systems [13], but it suffers from redundant records and an uneven distribution of attacks. To address these issues, our work uses NSL-KDD [14], a new data set consisting of selected records of the complete KDD data set that does not suffer from the mentioned shortcomings; some of the advantages of NSL-KDD over KDD'99 are listed in [13]. Each single-connection vector in the network traffic data contains 41 features and is labeled as either normal or an attack, with exactly one specific attack type. The simulated attacks fall into one of the following four categories:
A. Denial of Service (DoS): the attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine. Examples: ping of death and SYN flood.
B. User to Root (U2R): the attacker starts out with access to a normal user account on the system and exploits some vulnerability to gain root access to the system. Example: buffer overflow attack.
C. Remote to Local (R2L): an attacker who can send packets to a machine over a network but has no account on that machine exploits some vulnerability to gain local access as a user of that machine. Example: password guessing.
D. Probing: an attempt to gather information about a network and find known vulnerabilities of its systems, which can later be exploited to attack them. Example: port scanning.
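As a concrete illustration of working with this data, the following is a minimal sketch of loading the NSL-KDD training file in Python. The file name (KDDTrain+.txt) and column layout are assumptions about the public distribution, not something stated in the paper.

```python
import pandas as pd

# A minimal, hypothetical loader for NSL-KDD. KDDTrain+.txt has no header
# row: 41 connection features, then the class label (some versions of the
# file append one extra "difficulty" column after the label).
df = pd.read_csv("KDDTrain+.txt", header=None)
X = df.iloc[:, :41]        # the 41 connection features (Tables 1-3)
y = df.iloc[:, 41]         # "normal" or a specific attack name
print(X.shape, y.value_counts().head())
```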


III. Network Data Feature Labels

Table 1: Basic Features of Individual TCP Connections
Label   Network Data Feature
A       duration
B       protocol_type
C       service
D       flag
E       src_bytes
F       dst_bytes
G       land
H       wrong_fragment
I       urgent

Table 2: Content Features Within a Connection Suggested by Domain Knowledge
Label   Network Data Feature
J       hot
K       num_failed_logins
L       logged_in
M       num_compromised
N       root_shell
O       su_attempted
P       num_root
Q       num_file_creations
R       num_shells
S       num_access_files
T       num_outbound_cmds
U       is_hot_login
V       is_guest_login

Table 3: Traffic Features Computed Using a Two-Second Time Window
Label   Network Data Feature
W       count
X       srv_count
Y       serror_rate
Z       srv_serror_rate
AA      rerror_rate
AB      srv_rerror_rate
AC      same_srv_rate
AD      diff_srv_rate
AE      srv_diff_host_rate
AF      dst_host_count
AG      dst_host_srv_count
AH      dst_host_same_srv_rate
AI      dst_host_diff_srv_rate
AJ      dst_host_same_src_port_rate
AK      dst_host_srv_diff_host_rate
AL      dst_host_serror_rate
AM      dst_host_srv_serror_rate
AN      dst_host_rerror_rate
AO      dst_host_srv_rerror_rate

Table 1, Table 2 and Table 3 show all the features found in a connection. For easier referencing, each feature is assigned a label (A to AO).

IV. Techniques of Feature Reduction
In feature extraction, new features are derived from the original set of features using some mapping (linear or nonlinear). Feature extraction methods (also known as transformation methods) reduce the dimensionality by projecting the original D-dimensional feature space onto a d-dimensional subspace (where d < D) through a transformation; each feature in the reduced feature set is a combination of all the features in the original feature set [15]. In our work, we compare two linear dimensionality reduction techniques: Principal Component Analysis and Linear Discriminant Analysis, as sketched below.
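To make the projection idea concrete, here is a minimal illustration (the sizes and the random matrix are hypothetical, for demonstration only): in the linear case, feature extraction is just a matrix product, and PCA and LDA differ only in how they choose the transformation W.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 41))   # 100 connections x D=41 NSL-KDD features
W = rng.normal(size=(41, 8))     # a D x d transformation; PCA/LDA choose W
X_reduced = X @ W                # each new feature mixes all 41 originals
print(X_reduced.shape)           # (100, 8)
```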

V. Principal Component Analysis
The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the original dataset. It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences [16]. Given M observations x_1, ..., x_M, the average observation is defined as

\bar{x} = \frac{1}{M} \sum_{i=1}^{M} x_i

The deviation from the average is defined as

\Phi_i = x_i - \bar{x}

and, stacking the deviations as the columns of the matrix A = [\Phi_1, \ldots, \Phi_M], the sample covariance matrix of the data set is defined as

C = \frac{1}{M} \sum_{i=1}^{M} \Phi_i \Phi_i^T = \frac{1}{M} A A^T

Now compute the eigenvalues of C, arrange them in descending order \lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_N, compute the corresponding eigenvectors, and keep only the terms corresponding to the K largest eigenvalues. To choose K, use the criterion

\frac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{N} \lambda_i} > \text{Threshold (e.g., 0.9 or 0.95)}
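A minimal NumPy sketch of this procedure (our own illustration, not the authors' Java implementation; rows are samples here, so A^T A / M gives the same covariance matrix as the column-wise formulation above):

```python
import numpy as np

def pca_reduce(X, threshold=0.95):
    """Reduce X (n_samples x n_features) with PCA, keeping the smallest K
    components whose cumulative variance ratio exceeds `threshold`."""
    mean = X.mean(axis=0)                    # average observation x_bar
    A = X - mean                             # deviations phi_i = x_i - x_bar
    cov = A.T @ A / A.shape[0]               # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix -> eigh
    order = np.argsort(eigvals)[::-1]        # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    K = int(np.searchsorted(ratio, threshold)) + 1  # smallest K over threshold
    return A @ eigvecs[:, :K], eigvecs[:, :K], mean
```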

VI. Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis finds the vectors in the underlying space that best discriminate among classes [17]. For all samples of all classes, the between-class scatter matrix S_B and the within-class scatter matrix S_W are defined by

S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T

S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T

where N_i is the number of training samples in class i, c is the number of distinct classes, \mu_i is the mean vector of the samples belonging to class i, X_i is the set of samples belonging to class i with x_k being the k-th sample of that class, and \mu is the overall mean of all samples. S_W represents the scatter of features around the mean of each class, and S_B represents the scatter of the class means around the overall mean. The goal is to maximize S_B while minimizing S_W, in other words to maximize the ratio det|S_B| / det|S_W|. This ratio is maximized when the column vectors of the projection matrix are the eigenvectors of S_W^{-1} S_B. In order to prevent S_W from becoming singular, Information Gain is used as a preprocessing step.
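Again as an illustrative sketch (not the paper's implementation; a small ridge term stands in here for the Information Gain preprocessing the authors use to keep S_W non-singular):

```python
import numpy as np

def lda_reduce(X, y, d):
    """Project X (n_samples x n_features) onto the d most discriminative
    axes: the leading eigenvectors of Sw^-1 Sb."""
    classes = np.unique(y)
    mu = X.mean(axis=0)                      # overall mean of all samples
    n_feat = X.shape[1]
    Sb = np.zeros((n_feat, n_feat))
    Sw = np.zeros((n_feat, n_feat))
    for c in classes:
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        diff = (mu_c - mu).reshape(-1, 1)
        Sb += len(Xc) * diff @ diff.T        # between-class scatter
        Sw += (Xc - mu_c).T @ (Xc - mu_c)    # within-class scatter
    Sw += 1e-6 * np.eye(n_feat)              # guard against singular Sw
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]   # most discriminative first
    W = eigvecs[:, order[:d]].real
    return X @ W, W
```

Note that S_B has rank at most c - 1, so with the five NSL-KDD classes LDA can yield at most four discriminative axes, which is consistent with the reduction to 4 features reported in the conclusion.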
VII. Classification Using the Back-Propagation Algorithm
After reduction by both algorithms (PCA and LDA), the back-propagation algorithm is used for classification of the attack classes, as it is capable of multi-class classification.

VIII. Experimental Setup and Results
We used Java for the implementation. A feed-forward back-propagation neural network algorithm was developed for the training process; the network has to discriminate between the different kinds of attacks. We used 25,192 training samples and 11,850 test samples as the small data set, and 311,029 training samples and 25,192 test samples as the large data set (each with 41 features).
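The paper's network was written in Java; as a rough, hedged equivalent in Python, scikit-learn's MLPClassifier can stand in for the custom feed-forward back-propagation network. The variable names below are placeholders, assumed to come from the loading and reduction sketches above.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report

# X_train_red / X_test_red: PCA- or LDA-reduced feature matrices from the
# sketches above; y_train / y_test: the five labels
# (normal, dos, probe, u2r, r2l). Placeholders, not the paper's code.
clf = MLPClassifier(hidden_layer_sizes=(25,),  # one hidden layer of 25 units
                    max_iter=100,              # 100 training steps (Sec. VIII)
                    solver="sgd")              # gradient-descent backprop
clf.fit(X_train_red, y_train)
print(classification_report(y_test, clf.predict(X_test_red)))
```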

Table 4: Experimental Results After Feature Reduction Using LDA and PCA With the Small Data Set (Training Steps = 100, Hidden Layer = 25)

Attack Class          LDA      PCA      Target
Normal                14433    13695    13637
DoS                   8905     9308     9234
Probe                 1854     2189     2289
U2R                   0        0        11
R2L                   0        0        21
Training time taken   72343    45609    -

Fig. 1: Comparison of Number of Detections With LDA and PCA on the Small Data Set (Hidden Layer = 25, Max Training Steps = 100)

Fig. 2: Comparison of Training Time With LDA and PCA on the Small Data Set

Table 5: Experimental Results After Feature Reduction Using LDA and PCA With the Large Data Set (Training Steps = 100, Hidden Layer = 25)

Attack Class          LDA      PCA      Target
Normal                13812    14576    13637
DoS                   9114     8221     9234
Probe                 2266     982      2289
U2R                   0        0        11
R2L                   0        1413     21
Training time taken   72094    505907   -

Fig. 3: Comparison of Number of Detections With LDA and PCA on the Large Data Set (Hidden Layer = 25, Max Training Steps = 100)

Fig. 4: Comparison of Training Time With LDA and PCA on the Large Data Set

IX. PCA Versus LDA
The unsupervised PCA might outperform LDA when the number of samples per class is small or when the training data sample the underlying distribution non-uniformly. One never knows in advance the underlying distributions of the different classes, so one could argue that in practice it is difficult to ascertain whether the available training data are adequate for the job. In other words, PCA outperforms LDA when the number of training samples per class is small, whereas LDA, which is supervised, outperforms PCA when the number of training samples per class is large. Only linearly separable classes will remain separable after applying LDA; this is not the case with PCA. The projection axes chosen by PCA might not provide good discrimination power, while LDA preserves as much of the class-discriminatory information as possible.



X. Conclusion
Our experimental results show that with the small data set, feature extraction using LDA and PCA reduces the 41 features to 4 and 8 features respectively. The results in Table 4 show that the time taken to train the artificial neural network with LDA is larger than with PCA, and that the classification into the 5 classes (Normal, DoS, Probe, U2R, R2L) by PCA is closer to the target result than that of LDA. Hence PCA outperforms LDA in both accuracy and training time when the data set is small. With the large data set, our results show that feature extraction using LDA and PCA reduces the 41 features to 4 and 5 features respectively. The results in Table 5 show that the training time with PCA is larger than with LDA, and that the classification into the 5 classes by LDA is closer to the target result than that of PCA. Hence, with the large data set, LDA outperforms PCA in both accuracy and training time.

References
[1] H. Debar et al., "Towards a taxonomy of intrusion detection systems", Computer Networks, pp. 805-822, April 1999.
[2] A. Sundaram, "An Introduction to Intrusion Detection", Crossroads: The ACM Student Magazine, 2(4), 1996.
[3] M. A. Carreira-Perpinan, "Continuous latent variable models for dimensionality reduction and sequential data reconstruction", PhD thesis, 2001.
[4] C. Archer, "Adaptive Principal Component Analysis", Technical Report CSE-02-008, Oregon Health & Science University, November 2002.
[5] K. K. Paliwal, "Dimensionality reduction of the enhanced feature set for the HMM-based speech recognizer", Digital Signal Processing 2, pp. 157-173, 1992.
[6] D. D. Lewis, "Feature Selection and Feature Extraction for Text Categorization", Proc. Workshop on Speech and Natural Language, pp. 212-217, 1992.
[7] A. L. Blum, P. Langley, "Selection of Relevant Features and Examples in Machine Learning", Artificial Intelligence, Vol. 97, No. 1-2, pp. 245-271, 1997.
[8] R. O. Duda, P. E. Hart, D. G. Stork, "Pattern Classification", 2nd ed., John Wiley, 2001.
[9] W. Fan, M. D. Gordon, P. Pathak, "Effective Profiling of Consumer Information Retrieval Needs: A Unified Framework and Empirical Comparison", Decision Support Systems, Vol. 40, pp. 213-233, 2004.
[10] R. Hoch, "Using IR Techniques for Text Classification in Document Analysis", Proc. 17th Ann. Int'l ACM SIGIR Conf. on Research and Development in Information Retrieval, pp. 31-40, 1994.
[11] D. D. Lewis, "Feature Selection and Feature Extraction for Text Categorization", Proc. Workshop on Speech and Natural Language, pp. 212-217, 1992.
[12] A. M. Martinez, A. C. Kak, "PCA versus LDA", IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 23, pp. 228-233, 2001.
[13] M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set", Proc. 2009 IEEE Symposium on Computational Intelligence in Security and Defense Applications.
[14] NSL-KDD data set. [Online] Available: http://www.nsl.cs.unb.ca/NSL-KDD.
[15] N. Khosla, "Dimensionality Reduction Using Factor Analysis", 2004.

[16] L. I. Smith, "A Tutorial on Principal Components Analysis", 2002.
[17] K. Delac, M. Grgic, S. Grgic, "Independent Comparative Study of PCA, ICA, and LDA on the FERET Data Set", University of Zagreb, FER, Unska 3/XII, Zagreb, Croatia, Vol. 15, pp. 252-260, 2006.

Rupali Datti Pathak is an Assistant Professor in the Department of Computer Science & Engineering at Prestige Institute of Engineering & Science, Indore, India. She completed her B.E. and M.Tech in Computer Science & Engineering in Bhopal, India. Her research work includes one national paper and two international papers. She is a lifetime member of ISTE (Indian Society for Technical Education) and a member of the editorial board of IJCGA (International Journal of Computer Graphics & Animation).

Shilpa Lakhina completed her B.E. in Computer Science & Engineering from Truba Institute of Technology, Bhopal, in 2005, and her M.Tech in Computer Science & Engineering from Technocrats Institute of Technology, Bhopal, in 2010. Her research work includes three national papers and two international papers. She is a lifetime member of ISTE.

