

Intrusion Detection Systems Using Decision Trees
and Support Vector Machines
Sandhya Peddabachigari, Ajith Abraham*, Johnson Thomas
Department of Computer Science, Oklahoma State University, USA

Abstract
Security of computers and of the networks that connect them is of increasing significance. Intrusion detection is one mechanism for providing security to computer networks. Although mechanisms for intrusion detection already exist, there is a need to improve their performance. Data mining techniques are a relatively new approach to intrusion detection. In this paper we investigate and evaluate decision trees, a data mining technique, as an intrusion detection mechanism and compare them with Support Vector Machines (SVM). Intrusion detection with decision trees and SVM was tested on the benchmark 1998 DARPA Intrusion Detection data set. Our results show that decision trees give better overall performance than SVM.

1. Introduction
Attacks on the nation's computer infrastructures are becoming an increasingly serious problem. Computer security is defined as the protection of computing systems against threats to confidentiality, integrity, and availability [Sum97]. Confidentiality (or secrecy) means that information is disclosed only according to policy; integrity means that information is not destroyed or corrupted and that the system performs correctly; availability means that system services are available when they are needed. Computing systems here refers to computers, computer networks, and the information they handle. Security threats come from different sources, such as natural forces (floods), accidents (fire), failure of services (power), and people, known as intruders. There are two types of intruders: external intruders, who are unauthorized users of the machines they attack, and internal intruders, who have permission to access the system but with some restrictions. Traditional prevention techniques such as user authentication, data encryption, avoidance of programming errors, and firewalls are used as the first line of defense for computer security. Each has limitations: if a password is weak and is compromised, user authentication cannot prevent unauthorized use; firewalls are vulnerable to configuration errors and to ambiguous or undefined security policies, and they are generally unable to protect against malicious mobile code, insider attacks, and unsecured modems; programming errors cannot be avoided entirely because system and application software grows in complexity and changes rapidly, leaving exploitable weaknesses behind. Intrusion detection is therefore required as an additional layer of protection. It is useful not only for detecting successful intrusions but also for providing information needed for timely countermeasures.

*
Corresponding author email: ajith.abraham@ieee.org

An intrusion is defined [Hlm90] as "any set of actions that attempt to compromise the integrity, confidentiality or availability of a resource". This includes a deliberate unauthorized attempt to access information, manipulate information, or render a system unreliable or unusable. An attacker can gain illegal access to a system by fooling an
authorized user into providing information that can be used to break into a system. An
attacker can deliver a piece of software to a user of a system which is actually a trojan
horse containing malicious code that gives the attacker system access. Bugs in trusted
programs can be exploited by an attacker to gain unauthorized access to a computer
system. There are legitimate actions that one can perform that when taken to the extreme
can lead to system failure. An attacker can gain access because of an error in the
configuration of a system. In some cases it is possible to fool a system into giving access
by misrepresenting oneself. An example is sending a TCP packet that has a forged source
address that makes the packet appear to come from a trusted host. Intrusions are classified [Sun96] into six types:
1. Attempted break-ins, which are detected by atypical behavior profiles or violations
of security constraints.
2. Masquerade attacks, which are detected by atypical behavior profiles or violations
of security constraints.
3. Penetration of the security control system, which are detected by monitoring for
specific patterns of activity.
4. Leakage, which is detected by atypical use of system resources.
5. Denial of service, which is detected by atypical use of system resources.
6. Malicious use, which is detected by atypical behavior profiles, violations of security
constraints, or use of special privileges.

1.1 Intrusion Detection


The process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusions is known as intrusion detection. Intrusion detection is classified into two types: misuse intrusion detection and anomaly intrusion detection.
1. Misuse intrusion detection uses well-defined patterns of attacks that exploit weaknesses in system and application software to identify intrusions. These patterns are encoded in advance and matched against user behavior to detect intrusions.
2. Anomaly intrusion detection uses the normal usage behavior patterns to identify the
intrusion. The normal usage patterns are constructed from the statistical measures of
the system features, for example, the CPU and I/O activities by a particular user or
program. The behavior of the user is observed and any deviation from the constructed
normal behavior is detected as intrusion.
There are two options for securing a system completely: either prevent the threats and vulnerabilities that arise from flaws in the operating system and in application programs, or detect them, take action to prevent them in the future, and repair the damage. In practice it is impossible, and even if possible it would be extremely difficult and expensive, to write a completely secure system, and a transition to such systems across the entire world would be an equally difficult task. Cryptographic methods can be compromised if passwords and keys are stolen. No matter how secure a system is, it is vulnerable to insiders who abuse their privileges. There is also an inverse relationship between the level of access control and usability: more access controls make a system less user-friendly and more likely to go unused. An intrusion detection system is a program (or set of programs) that analyzes what happens, or has happened, during execution and tries to find indications that the computer has been misused. An intrusion detection system does not eliminate the need for preventive mechanisms; it works as the last line of defense in securing the system. Data mining approaches are a relatively new technique for intrusion detection. There is a wide variety of data mining algorithms drawn from the fields of statistics, pattern recognition, machine learning, and databases. Previous research on data mining approaches for intrusion detection has identified several types of algorithms as useful techniques. Classification is one such data mining technique that has been investigated for intrusion detection models. In this paper we investigate the decision tree as an intrusion detection model and compare it with support vector machines; the comparison shows the advantages and drawbacks of the decision tree model.
We investigate and evaluate intelligent systems for intrusion detection. Our specific objectives are:
1. to investigate and test the decision tree as an intrusion detection model, and
2. to compare and evaluate its performance against a support vector machine intrusion detection model.

2. Literature Review
James Anderson [And80] first proposed that audit trails should be used to monitor
threats. All the available system security procedures were focused on denying access to
sensitive data from an unauthorized source. Dorothy Denning [Den87] first proposed the
concept of intrusion detection as a solution to the problem of providing a sense of
security in computer systems. The basic idea is that intrusion behavior involves abnormal usage of the system. The model is a rule-based pattern matching system: models of normal usage of the system are constructed, actual usage is verified against them, and any significant deviation from normal usage is flagged as abnormal. This model served as an abstract model for further developments in this field, is known as the generic intrusion detection model, and is depicted in Figure 1 [Kum95].
[Figure 1 shows the generic intrusion detection model: an event generator driven by audit trails, network packets, and application trails feeds an activity profile and a rule set; the rule set can assert new rules and modify existing rules, the activity profile is updated (with a clock generating anomaly records), and new profiles are generated dynamically.]

Figure 1: A Generic Intrusion Detection Model

Statistical approaches compare the recent behavior of a user of a computer system with previously observed behavior, and any significant deviation is considered an intrusion. This approach requires the construction of a model of normal user behavior. IDES (Intrusion Detection Expert System) [Lun90] first exploited the statistical approach for the detection of intruders. It uses the intrusion detection model proposed by Denning [Den87] and audit trail data as suggested by Anderson [And80]. IDES maintains profiles, each a description of a subject's normal behavior with respect to a set of intrusion detection measures. Profiles are updated periodically, allowing the system to learn new behavior as users alter theirs. User behavior is compared against these profiles and any significant deviation is reported as an intrusion. IDES also uses an expert system component to detect misuse intrusions. The system later developed into NIDES (Next-generation Intrusion Detection Expert System) [Lun93]. The advantage of this approach is that it adaptively learns user behavior and is thus potentially more sensitive than human experts. It also has several disadvantages. The system can be gradually trained so that abnormal behavior comes to be regarded as normal, leaving intruders undetected. Determining the threshold above which an intrusion should be flagged is difficult: setting the threshold too low results in false positives (normal behavior detected as an intrusion), and setting it too high results in false negatives (intrusions going undetected). Attacks that depend on the order of events cannot be detected, as statistical analysis is insensitive to the sequence of events.
Predictive pattern generation uses a rule base of user profiles defined as statistically weighted event sequences [Tcl90]. This method of intrusion detection attempts to predict future events based on events that have already occurred. The system develops sequential rules of the form E1 – E2 – E3 → (E4 = 94%; E5 = 6%), where the various E's are events derived from the security audit trail, and the percentages on the right-hand side of the rule represent the probability of occurrence of each of the consequent events given the occurrence of the antecedent sequence. This rule means that for the observed sequence E1 followed by E2 followed by E3, the probability of event E4 occurring next is 94% and that of E5 is 6%. The rules are generated inductively with an information-theoretic algorithm that measures the applicability of rules in terms of coverage and predictive power. An intrusion is detected if the observed sequence of events matches the left-hand side of a rule but the following events significantly deviate from its right-hand side. The main advantages of this approach are its ability to detect and respond quickly to anomalous behavior and the relative ease of detecting users who try to train the system during its learning period. The main problem with the system is its inability to detect some intrusions if that particular sequence of events has not been recognized and encoded into the rules.
The state transition analysis approach constructs a graphical representation of intrusion behavior as a series of state changes that lead from an initial secure state to a target compromised state. Using the audit trail as input, an analysis tool can be developed to compare the state changes produced by the user with state transition diagrams of known penetrations. State transition diagrams form the basis of a rule-based expert system for detecting penetrations called the State Transition Analysis Tool (STAT) [Por92]. The STAT prototype is implemented for UNIX-based systems in USTAT (Unix State Transition Analysis Tool) [Ilg92]. The main advantage of the method is that it detects intrusions independently of the audit trail record: the rules are produced from the effects of sequences of audit records on system state, whereas purely rule-based methods use the audit record sequences themselves. It is also able to detect cooperative attacks, variations of known attacks, and attacks spanning multiple user sessions. The disadvantages of the approach are that it can only construct patterns from sequences of events, not from more complex forms, and that some attacks cannot be detected because they cannot be modeled with state transitions.
The keystroke monitoring technique uses a user's keystrokes to detect intrusion attempts. The main approach is to pattern-match the sequence of keystrokes against predefined attack sequences. The main problems with this approach are the lack of operating system support for capturing keystroke sequences and the many ways of expressing the same attack as different keystroke sequences. Shell programs such as bash and ksh provide user-definable aliases, which make it difficult for this technique to detect intrusion attempts unless some semantic analysis of the commands is performed. Automated attacks by malicious executables also cannot be detected by this technique, as it only analyzes keystrokes.
IDES [Lun90] used expert system methods for misuse intrusion detection and statistical methods for anomaly detection. The IDES expert system component evaluates audit records as they are produced. The audit records are viewed as facts, which map to rules in the rule base; firing a rule increases the suspicion rating of the user corresponding to that record. Each user's suspicion rating starts at zero and is increased with each suspicious record. Once the suspicion rating surpasses a predefined threshold, an intrusion is detected. There are some disadvantages to the expert system method. An intrusion scenario that does not trigger a rule will not be detected by the rule-based approach. Maintaining and updating a complex rule-based system can be difficult. The rules in the expert system have to be formulated by a security professional, which means the strength of the system depends on the ability of the security personnel.
The model-based approach attempts to model intrusions at a higher level of abstraction than audit trail records. This allows administrators to specify penetrations abstractly, shifting to the expert system the burden of determining which audit records are part of a suspect sequence. The technique differs from the rule-based expert system technique, which simply attempts to pattern-match audit records against expert rules. Garvey and Lunt's [Gl91] model-based approach consists of three parts: an anticipator, a planner, and an interpreter. The anticipator generates the next set of behaviors to be verified in the audit trail based on the currently active models and passes these sets to the planner. The planner determines how the hypothesized behavior is reflected in the audit data and translates it into a system-dependent audit trail match. The interpreter then searches for this data in the audit trail. The system accumulates evidence in this way until a threshold is reached, at which point it signals an intrusion attempt. The advantage of this model is that it can predict the intruder's next move from the intrusion model, which can be used to take preventive measures, to decide what to look for next, and to verify the intrusion hypothesis. It also reduces the amount of data to be processed, since the planner and interpreter filter the data based on their knowledge of what to look for, which leads to efficiency. There are drawbacks as well: the intrusion patterns must always occur in the behavior the system is looking for, otherwise it cannot detect them.
The pattern matching approach [Ks94] encodes known intrusion signatures as patterns that are then matched against the audit data. Intrusion signatures are classified using structural interrelationships among the elements of the signatures. The pattern signatures are matched against the audit trails, and any match is reported as an intrusion. Intrusions can thus be understood and characterized in terms of the structure of events needed to detect them. A pattern matching model is implemented using colored Petri nets in IDIOT [Ks95]: an intrusion signature is represented as a Petri net, and the notions of start state and final state define what constitutes a match. This approach has several advantages. The system can be cleanly separated into different parts, which allows different solutions to be substituted for each component without changing the overall structure of the system. Pattern specifications are declarative, meaning that the pattern representation of an intrusion signature specifies what needs to be matched rather than how it is matched. Declarative specification of intrusion patterns enables them to be exchanged across different operating systems with different audit trails. There are a few problems with this approach. Constructing patterns from attack scenarios is difficult and requires human expertise, and only attack scenarios that are known and encoded as patterns can be detected, which is the common problem of misuse detection.

3. Data Mining Techniques for Intrusion Detection


Data mining is a relatively new approach to intrusion detection. Data mining is defined [Gsr98] as "the semi-automatic discovery of patterns, associations, changes, anomalies, rules, and statistically significant structures and events in data". There are many different types of data mining algorithms, including classification, link analysis, clustering, association, rule abduction, deviation analysis, and sequence analysis. Using these algorithms, data mining extracts knowledge from large data sets and embodies it in the intrusion detection model. This approach treats intrusion detection as a data analysis process, whereas the previous approaches were knowledge engineering processes.

Figure 2: Data mining process of building Intrusion detection models

Data mining approaches for intrusion detection were first implemented in Mining Audit Data for Automated Models for Intrusion Detection (MADAMID) [Lsm98]. The data mining process of building intrusion detection models is depicted in Figure 2 [Lee99]. First, raw data is converted into ASCII network packet information, which in turn is converted into connection-level information. These connection-level records contain within-connection features such as service and duration. Data mining algorithms are applied to this data to create models that detect intrusions. The data mining algorithms used in this approach are RIPPER (a rule-based classification algorithm), a meta-classifier, a frequent episode algorithm, and association rules. These algorithms are applied to audit data to compute models that accurately capture the actual behavior of intrusions as well as normal activities.
The RIPPER algorithm [Coh96] was used to learn the classification model in
order to identify normal and abnormal behavior. Frequent episode algorithm and
association rules together are used to construct frequent patterns from audit data records.
These frequent patterns represent the statistical summaries of network and system activity
by measuring the correlations among system features and sequential co-occurrence of
events. From the constructed frequent patterns the consistent patterns of normal activities
and the unique intrusion patterns are identified and analyzed, and then used to construct
additional features. These additional features are useful in learning the detection model
more efficiently in order to detect intrusions. The RIPPER classification algorithm is then used to learn the detection model, and a meta-classifier is used to learn the correlations of intrusion evidence from multiple detection models and to produce a combined detection model. The main advantage of this system is the automation of data analysis through data mining, which enables it to learn rules inductively, replacing the manual encoding of intrusion patterns. However, some novel attacks may not be detected.
Audit Data Analysis and Mining (ADAM) [Bcj01] combines association rules and a classification algorithm to discover attacks in audit data. Association rules are used to gather knowledge about the nature of the audit data, since information about patterns within individual records can improve classification efficiency. The system has two phases: a training phase and a detection phase. In the training phase, a database of frequent item sets is created using only attack-free data. This database serves as a profile against which frequent item sets found later will be compared. Next, a sliding-window, on-line algorithm finds frequent item sets in the last D connections and compares them with those stored in the attack-free database, discarding those that are deemed normal. In this phase a classifier is also trained to learn a model for detecting attacks. In the detection phase, a dynamic algorithm produces item sets that are considered suspicious, and the previously learned classifier labels each item set as an attack, a false alarm (normal event), or unknown. Unknown attacks are those that cannot be classified either as false alarms or as known attacks. This method detects only anomaly attacks.

3.1 Neural Networks


Neural networks have been used both in anomaly intrusion detection as well as in
misuse intrusion detection. For anomaly intrusion detection, neural networks were
modeled to learn the typical characteristics of system users and identify statistically
significant variations from the user's established behavior. In misuse intrusion detection, the neural network receives data from the network stream and analyzes it for instances of misuse.
In the first neural network approach to intrusion detection [Dbs92], the system learns to predict a user's next command based on a sequence of previous commands, using a shifting window of the w most recent commands. The predicted command is compared with the user's actual command, and any deviation is signaled as an intrusion. If w is too small there will be many false positives, and if it is too big some attacks may not be detected. NNID (Neural Network Intrusion Detector) [Rlm98] identifies users based on the distribution of commands they use. The system has three phases. In the first phase it collects training data from the audit logs for each user over some period and constructs, for each user, a vector representing their command usage. In the second phase, a neural network is trained to identify the user from these command distribution vectors. In the final phase the network identifies the user for each new command distribution vector; if the identified user differs from the actual user, an anomaly intrusion is signaled.
A neural network for misuse detection can be implemented [Can98] in two ways. The first approach incorporates the neural network component into an existing or modified expert system: the neural network filters the incoming data for suspicious events and forwards them to the expert system, which improves the effectiveness of the detection system. The second approach uses the neural network as a stand-alone misuse detection system, receiving data from the network stream and analyzing it for misuse intrusions. There are several advantages to this approach. The network can learn the characteristics of misuse attacks and identify instances that are unlike any it has observed before. It recognizes known suspicious events with a high degree of accuracy, works well on noisy data, and its inherent speed is helpful for real-time intrusion detection. The main problem is training: the training phase requires a very large amount of data.

3.2 Support Vector Machines


Support Vector Machines [Mjs02] have been proposed as a novel technique for intrusion detection. A Support Vector Machine (SVM) maps input (real-valued) feature vectors into a higher-dimensional feature space through some nonlinear mapping. SVMs are powerful tools for solving classification, regression, and density estimation problems. They are built on the principle of structural risk minimization [Vla95], which seeks a hypothesis h with the lowest probability of error; for separable data this is achieved by finding the separating hyperplane with the maximum margin. Computing the hyperplane that separates the data points, i.e. training an SVM, leads to a quadratic optimization problem [Vla95], [Joa98]. SVMs use kernel functions to handle nonlinear problems: a kernel turns a linear algorithm into a nonlinear one via an implicit map into a feature space. There are many kernel functions, such as polynomial kernels, radial basis functions, and two-layer sigmoid kernels. The user provides one of these functions when training the classifier, which selects support vectors along the surface of this function. SVMs classify data using these support vectors, which are the members of the set of training inputs that outline the hyperplane in feature space. The implementation of an SVM intrusion detection system has two phases: training and testing. The main advantage of this method is the speed of SVMs, since the capability of detecting intrusions in real time is very important. SVMs can learn a larger set of patterns and scale better, because the classification complexity does not depend on the dimensionality of the feature space. SVMs can also update the training patterns dynamically whenever a new pattern appears during classification. The main disadvantage is that an SVM can only handle binary classification, whereas intrusion detection requires multi-class classification.
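
As an illustration of the training and testing phases just described (not the system evaluated in this paper, which used WEKA), the following minimal sketch trains a binary SVM with a polynomial kernel on placeholder connection features using scikit-learn; the data arrays are hypothetical.

    # Minimal sketch of a binary SVM intrusion classifier with a polynomial kernel.
    # Requires scikit-learn; X holds numeric connection features, y holds 0 = normal / 1 = attack.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 41))      # placeholder for 41 connection features
    y_train = rng.integers(0, 2, size=200)    # placeholder labels: 0 = normal, 1 = attack
    X_test = rng.normal(size=(50, 41))

    clf = SVC(kernel="poly", degree=2)        # polynomial kernel, as discussed above
    clf.fit(X_train, y_train)                 # training phase: solves the quadratic problem
    print(clf.n_support_)                     # number of support vectors per class
    predictions = clf.predict(X_test)         # testing phase: 0/1 decision for each record

In a real setting the features would be scaled and the labels taken from preprocessed connection records rather than random placeholders.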

3.3 Decision Trees


Decision tree induction is one of the classification algorithms used in data mining. A classification algorithm inductively constructs a model from a pre-classified data set. Each data item is defined by the values of its attributes, and classification may be viewed as a mapping from a set of attribute values to a particular class. The decision tree classifies a given data item using the values of its attributes. The tree is initially constructed from a set of pre-classified data. The main approach is to select the attribute that best divides the data items into their classes; according to the values of this attribute the data items are partitioned, and the process is applied recursively to each partitioned subset. The process terminates when all the data items in the current subset belong to the same class. A node of a decision tree specifies the attribute by which the data is to be partitioned. Each node has a number of edges, which are labeled according to the possible values of the attribute in the parent node. An edge connects either two nodes or a node and a leaf. Leaves are labeled with a decision value for categorizing the data.
Induction of the decision tree uses the training data, which is described in terms of the attributes. The main problem is deciding which attribute will best partition the data into the various classes. The ID3 algorithm [Qun86] uses an information-theoretic approach to solve this problem. Information theory uses the concept of entropy, which measures the impurity of a set of data items. The entropy value is small when the class distribution is uneven, that is, when most or all of the data items belong to one class, and it is higher when the class distribution is more even, that is, when the data items are spread across more classes. Information gain is a measure of the utility of each attribute in classifying the data items, and it is computed from entropy: information gain is the decrease from the impurity (entropy) of the complete set of data items to the weighted average impurity of the subsets produced by splitting on an attribute. Therefore, the attribute with the largest information gain is considered the most useful for classifying the data items.
To classify an unknown object, one starts at the root of the decision tree and follows the branch indicated by the outcome of each test until a leaf node is reached; the class named at the leaf node is the resulting classification. Decision tree induction has been implemented in several algorithms, including ID3 [Qun86], which was later extended into C4.5 [Qun93] and C5.0, and CART [Bre84]. Of particular interest to this work is the C4.5 decision tree algorithm. C4.5 avoids overfitting the data when building the tree, handles continuous attributes, can choose an appropriate attribute selection measure, handles training data with missing attribute values, and improves computational efficiency. C4.5 builds the tree from a set of data items by selecting the best attribute to test, dividing the data items into subsets according to that test, and then applying the same procedure to each subset recursively. The best attribute for dividing a subset at each stage is selected using the information gain of the attributes.
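
To make the entropy and information-gain calculation concrete, here is an illustrative sketch (not taken from the paper) that computes the information gain of one categorical attribute over a small hypothetical set of records.

    # Illustrative computation of entropy and information gain for one categorical attribute.
    from collections import Counter
    from math import log2

    def entropy(labels):
        # Impurity of a list of class labels: 0 when all labels agree, higher when mixed.
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def information_gain(rows, attribute, target):
        # Entropy of the whole set minus the weighted entropy of its partitions.
        base = entropy([r[target] for r in rows])
        weighted = 0.0
        for value in {r[attribute] for r in rows}:
            subset = [r[target] for r in rows if r[attribute] == value]
            weighted += len(subset) / len(rows) * entropy(subset)
        return base - weighted

    # Hypothetical connection records: protocol type vs. normal/attack label.
    records = [
        {"protocol": "tcp", "label": "attack"},
        {"protocol": "tcp", "label": "attack"},
        {"protocol": "udp", "label": "normal"},
        {"protocol": "icmp", "label": "normal"},
    ]
    print(information_gain(records, "protocol", "label"))  # the attribute with the largest gain is chosen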

3.3.1 Decision Trees as Intrusion Detection Model


Intrusion detection can be considered a classification problem in which each connection or user is identified either as one of the attack types or as normal, based on some existing data. Decision trees can solve this classification problem: they learn a model from the data set and can classify new data items into one of the classes specified in that set. Decision trees can be used for misuse intrusion detection, since they learn a model from the training data and can then predict whether future data belongs to one of the attack types or is normal. Decision trees work well with large data sets, which is important given the large amounts of data that flow across computer networks, and their high performance makes them useful for real-time intrusion detection. Decision trees produce easily interpretable models, which a security officer can inspect and edit, and these models can also be converted into rule-based models with minimal processing. The generalization accuracy of decision trees is another useful property for an intrusion detection model: after the models are built there will always be new attacks that are small variations of known attacks, and the ability to detect these new intrusions rests on the generalization accuracy of the trees.

4. Experimentation Setup And Performance Evaluation

4.1 Intrusion Data


The KDD Cup 1999 intrusion detection contest data [KDD99] is used in our experiments. This data was prepared for the 1998 DARPA Intrusion Detection Evaluation program by MIT Lincoln Labs [MIT], which acquired nine weeks of raw TCP dump data. The raw data was processed into about five million connection records. The data set contains 24 attack types, all of which fall into four main categories:
1. Denial of Service (DOS): In this type of attack an attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine. Examples are Apache2, Back, Land, Mailbomb, SYN Flood, Ping of death, Process table, Smurf, Teardrop.
2. Remote to User (R2L): In this type of attack an attacker who does not have an account on a remote machine sends packets to that machine over a network and exploits some vulnerability to gain local access as a user of that machine. Examples are Dictionary, Ftp_write, Guest, Imap, Named, Phf, Sendmail, Xlock.
3. User to Root (U2R): In this type of attack an attacker starts out with access to a normal user account on the system and is able to exploit a vulnerability to gain root access to the system. Examples are Eject, Loadmodule, Ps, Xterm, Perl, Fdformat.
4. Probing: In this type of attack an attacker scans a network of computers to gather information or find known vulnerabilities. An attacker with a map of the machines and services available on a network can use this information to look for exploits. Examples are Ipsweep, Mscan, Saint, Satan, Nmap.

The original data contains 744 MB of data with 4,940,000 records. The data set has 41 attributes for each connection record plus one class label. Some attributes are derived features that are useful in distinguishing normal connections from attacks. Some features examine only the connections in the past two seconds that have the same destination host as the current connection and calculate statistics related to protocol behavior, service, etc.; these are called same-host features. Other features examine only the connections in the past two seconds that have the same service as the current connection and are called same-service features. Same-host and same-service features together are called the time-based traffic features of the connection-level records. The connection records were also sorted by destination host, and features were constructed using a window of 100 connections to the same host instead of a time window; these are called host-based traffic features. R2L and U2R attacks do not have sequential patterns like DOS and Probe attacks, because the former embed the attack in the data portions of packets whereas the latter involve many connections in a short amount of time. Features that look for suspicious behavior in the data portions of packets, such as the number of failed logins, are therefore constructed; these are called content features.
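
As an illustration of how a time-based same-host traffic feature of this kind could be derived (a simplified sketch, not the actual KDD preprocessing), the code below counts, for each connection, the preceding connections to the same destination host within a two-second window; the record layout is an assumption.

    # Illustrative derivation of a time-based "same host" feature: for each connection,
    # count the preceding connections to the same destination host within the last
    # two seconds. The (timestamp, destination host) record layout is an assumption.
    from collections import deque

    def same_host_counts(connections, window=2.0):
        # connections: list of (timestamp_seconds, dst_host) pairs sorted by time.
        recent = deque()          # (timestamp, dst_host) pairs still inside the window
        counts = []
        for ts, host in connections:
            while recent and ts - recent[0][0] > window:
                recent.popleft()  # drop records older than the window
            counts.append(sum(1 for _, h in recent if h == host))
            recent.append((ts, host))
        return counts

    conns = [(0.0, "10.0.0.5"), (0.5, "10.0.0.5"), (1.0, "10.0.0.9"), (2.6, "10.0.0.5")]
    print(same_host_counts(conns))  # [0, 1, 0, 0]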

4.2 Experimentation Setup and Results Analysis


Our experiments have two phases, namely a training phase and a testing phase. In the training phase the system constructs a model from the training data so as to give maximum generalization accuracy (accuracy on unseen data). In the testing phase the test data is run against the constructed model to detect intrusions. We wrote a C++ program to map the 24 attack types in the data to the four classes of attacks. The main purpose of the intrusion detection models is to classify each record as one of the four attack types or as normal. The data set for our experiments contained 11982 records, randomly sampled from the full data set. The sample has five different classes; the random sampling takes records from each class in proportion to its size, except that the smallest class is included in its entirety. The sample is then divided into training data with 5092 records and testing data with 6890 records. All the intrusion detection models are trained and tested with the same data. As the data set has five different classes we perform a 5-class classification: normal data belongs to class 1, probe to class 2, denial of service (DOS) to class 3, user to root (U2R) to class 4, and remote to local (R2L) to class 5. We used the WEKA (Waikato Environment for Knowledge Analysis) software for both decision trees and SVM. WEKA accepts data in the ARFF (Attribute-Relation File Format) format, an ASCII text format containing a list of instances, each described by a set of attributes; ARFF files contain only nominal (categorical) and numeric values. The data must first be saved in comma-separated (.CSV) format, for example through a spreadsheet. The next step is to add the data set's name using the @relation tag and the attribute information using @attribute tags, followed by the @data line. For nominal attributes, all possible values need to be listed at the start of the file; we wrote C++ programs to preprocess the data and obtain the value sets for all the nominal attributes. Our data set had seven nominal attributes. We used an AMD Athlon 1.67 GHz processor with 992 MB of RAM for our experiments.
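
The attack-to-class mapping and ARFF conversion were done with C++ programs; as those programs are not listed in the paper, the sketch below is a hypothetical Python equivalent over a simplified three-column CSV, showing the @relation/@attribute/@data structure described above (file layout, attribute set, and label spellings are assumptions).

    # Hypothetical preprocessing sketch: map individual attack labels to the five classes
    # and emit a minimal ARFF file. The CSV layout (duration, protocol, label), the
    # attribute set, and the label spellings are assumptions, not the paper's actual code.
    import csv

    CLASS_OF = {"normal": "normal",
                "smurf": "dos", "teardrop": "dos", "back": "dos",
                "ipsweep": "probe", "nmap": "probe", "satan": "probe",
                "loadmodule": "u2r", "perl": "u2r",
                "ftp_write": "r2l", "imap": "r2l", "phf": "r2l"}

    def csv_to_arff(csv_path, arff_path):
        with open(csv_path) as src, open(arff_path, "w") as dst:
            dst.write("@relation kdd_sample\n")
            dst.write("@attribute duration numeric\n")
            dst.write("@attribute protocol {tcp,udp,icmp}\n")   # nominal values listed up front
            dst.write("@attribute class {normal,probe,dos,u2r,r2l}\n")
            dst.write("@data\n")
            for duration, protocol, label in csv.reader(src):
                # unknown labels default to "normal" here purely to keep the sketch short
                dst.write(f"{duration},{protocol},{CLASS_OF.get(label, 'normal')}\n")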

4.2.1 Decision Tree


To compare decision tree performance with SVM, which is a binary classifier, we used binary decision tree classifiers, although decision trees are capable of handling the 5-class classification problem directly. We constructed five different classifiers. The data is partitioned into the two classes of "Normal" and "Attack" patterns, where Attack is the collection of the four classes of attacks (Probe, DOS, U2R, and R2L). The objective is to separate normal and attack patterns. We repeat this process for all five classes. First a classifier is constructed using the training data, and then the testing data is run through the constructed classifier to label each record as normal or attack. Table 1 summarizes the results on the test data. It shows the training and testing times of the classifier in seconds for each of the five classes and its accuracy in percentage terms.

Class     Training time (sec)   Testing time (sec)   Accuracy (%)
Normal    1.53                  0.03                 99.64
Probe     3.09                  0.02                 99.86
DOS       1.92                  0.03                 96.83
U2R       1.16                  0.03                 68.00
R2L       2.36                  0.03                 84.19

Table 1: Performance of Decision Tree
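
The decision tree experiments were run in WEKA; purely as an illustration of the normal-versus-attack setup and of how the timings in Table 1 can be collected, here is a sketch using scikit-learn's decision tree with placeholder data.

    # Sketch of the binary (one class vs. the rest) decision tree runs, timing training
    # and testing as in Table 1. Uses scikit-learn and placeholder data purely for
    # illustration; the experiments in the paper were run with WEKA.
    import time
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    def run_binary_detector(X_train, y_train, X_test, y_test, target_class):
        yb_train = (y_train == target_class).astype(int)   # 1 = class of interest, 0 = rest
        yb_test = (y_test == target_class).astype(int)

        clf = DecisionTreeClassifier()
        t0 = time.time()
        clf.fit(X_train, yb_train)
        train_time = time.time() - t0

        t0 = time.time()
        accuracy = accuracy_score(yb_test, clf.predict(X_test))
        test_time = time.time() - t0
        return train_time, test_time, accuracy

    # Placeholder arrays standing in for the 5092 training and 6890 testing records.
    rng = np.random.default_rng(1)
    X_tr, y_tr = rng.normal(size=(500, 41)), rng.integers(1, 6, size=500)
    X_te, y_te = rng.normal(size=(300, 41)), rng.integers(1, 6, size=300)
    for cls in range(1, 6):          # classes 1-5: Normal, Probe, DOS, U2R, R2L
        print(cls, run_binary_detector(X_tr, y_tr, X_te, y_te, cls))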

4.2.2 Support Vector Machines


As SVMs handle only binary classification problems, we employ five SVMs for the 5-class intrusion detection task. We divided the data into the two classes of "Normal" and "Attack" patterns, where Attack is the collection of the four classes of attacks (Probe, DOS, U2R, and R2L). The classifier is learned from the training data and then used on the test data to classify records as normal or attack. This process is repeated for all classes. The results are summarized in Table 2, which shows the training time and testing time in seconds for each of the five classes and the accuracy in percentage terms.

Class     Training time (sec)   Testing time (sec)   Accuracy (%)
Normal    5.02                  0.13                 99.64
Probe     1.33                  0.13                 98.57
DOS       19.24                 2.11                 99.78
U2R       3.05                  0.95                 40.00
R2L       2.02                  0.13                 34.00

Table 2: Performance of the SVM

The kernel option defines the feature space in which the training set examples are classified. Both our trial-and-error experiments and a previous study [Aa02] showed that the polynomial kernel often performs well on most data sets, so we decided to use the polynomial kernel for our experiments. We observed that different polynomial degrees give different performance for different classes of data; the results are presented in Table 3. We therefore used different polynomial degrees for different classes.

Class     Degree 1   Degree 2   Degree 3
Normal    99.64      99.64      99.64
Probe     98.57      64.85      61.72
DOS       70.99      99.92      99.78
U2R       40.00      40.00      40.00
R2L       33.92      31.44      28.06

Table 3: Classification accuracy (%) of different polynomial kernel degrees
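
The degree selection behind Table 3 amounts to training one SVM per candidate degree and comparing test accuracy; the sketch below outlines that loop with scikit-learn and placeholder data (an illustration of the procedure, not the original WEKA runs).

    # Sketch of the per-class polynomial-degree comparison behind Table 3: train one
    # SVM per candidate degree and report its test accuracy. Placeholder data;
    # illustration only, not the original WEKA experiments.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(2)
    X_train, y_train = rng.normal(size=(400, 41)), rng.integers(0, 2, size=400)
    X_test, y_test = rng.normal(size=(200, 41)), rng.integers(0, 2, size=200)

    for degree in (1, 2, 3):
        model = SVC(kernel="poly", degree=degree)
        model.fit(X_train, y_train)
        acc = accuracy_score(y_test, model.predict(X_test))
        print(f"degree {degree}: accuracy {acc:.4f}")   # choose the best degree per class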

4.2.3 Comparison of Decision tree with SVM


To evaluate the performance of the decision tree intrusion detection model we compared it with the SVM in terms of accuracy and training and testing times; the results are summarized in Table 4. The decision tree gives better accuracy than the SVM for the Probe, R2L, and U2R classes and worse accuracy for the DOS class, while for the Normal class both give the same performance. The difference in accuracy between decision trees and SVM is small for the Normal, Probe, and DOS classes but significant for the U2R and R2L classes. These two classes have little training data compared to the other classes, so we can conclude that the decision tree gives good accuracy with small training data sets. The training and testing times are also generally lower for the decision tree than for the SVM.

                    Decision Tree                         SVM
Class     Training   Testing   Accuracy    Training   Testing   Accuracy
          time (s)   time (s)  (%)         time (s)   time (s)  (%)
Normal    1.53       0.03      99.64       5.02       0.13      99.64
Probe     3.09       0.02      99.86       1.33       0.13      98.57
DOS       1.92       0.03      96.83       19.24      2.11      99.78
U2R       1.16       0.03      68.00       3.05       0.95      40.00
R2L       2.36       0.03      84.19       2.02       0.13      34.00

Table 4: Performance Comparison of Decision tree with SVM

The graph in Figure 3 shows the performance of the decision tree and the SVM in terms of accuracy for the R2L class of data. The R2L data set contains 563 data points; as it is difficult to show all of them in the graph, 30 data points were used. A classification value of 1 in the graph represents a correct classification and a value of 2 represents a misclassification. The graph shows that the SVM misclassified far more points than the decision tree, which classified most of them correctly. We conclude that the decision tree performs very well compared to the SVM on R2L class data.

[Figure 3 plots the classification values (y-axis: Classification, 0 to 2.5) for 30 R2L data points (x-axis: Data Points, 1 to 30) for three series: Actual, Decision tree, and SVM.]

Figure 3: Performance Comparison of Decision tree with SVM

5. Conclusions
In this research we have investigated some new techniques for intrusion detection and evaluated their performance on the benchmark KDD Cup 99 intrusion data. We first explored the decision tree as an intrusion detection model, then conducted experiments with support vector machines (SVM) and compared the decision tree's performance with this model. As the decision tree was used as a binary classifier, we employed five classifiers for the 5-class classification. The empirical results indicate that the decision tree gives better accuracy than the SVM for the Probe, U2R, and R2L classes, the same accuracy for the Normal class, and slightly worse accuracy for the DOS class. From the results for the U2R and R2L classes, which have little training data and for which the decision tree outperforms the SVM, we can say that the decision tree works well with small training data. The results also show that the training and testing times of the decision tree classifiers are generally better than those of the SVM.

Moreover, the decision tree is capable of multi-class classification, which is not possible with SVM; multi-class classification is a very useful feature for intrusion detection models.
With the increasing incidence of cyber attacks, building effective intrusion detection models with good accuracy and real-time performance is essential. Data mining is a relatively new approach to intrusion detection, and more data mining techniques should be investigated and evaluated as intrusion detection models. Future work will include a hybridization approach that combines different models in order to overcome their individual limitations and improve performance through their complementary features. Ensemble approaches with various combinations of classifiers should also be investigated to improve performance.

References

[Aa02] A. B. M. S. Ali, A. Abraham. An Empirical Comparison of Kernel Selection for Support Vector
Machines. 2nd International Conference on Hybrid Intelligent Systems: Design, Management and
Applications, The Netherlands, 2002.

[And80] J. P. Anderson. Computer Security Threat Monitoring and Surveillance. Technical report, James P
Anderson Co., Fort Washington, Pennsylvania, April 1980.

[Bcj01] D. Barbara, J. Couto, S. Jajodia and N. Wu. ADAM: A Testbed for Exploring the Use of Data
Mining in Intrusion Detection. SIGMOD Record, 30(4):15--24, 2001.

[Bre84] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth
Inc., 1984.

[Can98] J. Cannady. Artificial Neural Networks for Misuse Detection. National Information Systems
Security Conference, 1998.

[Coh96] William Cohen. Learning Trees and Rules with Set-Valued Features. American Association for
Artificial Intelligence (AAAI), 1996.

[Den87] D. E. Denning. An Intrusion Detection Model. In IEEE Transactions on Software Engineering, pp.
222-228, February 1987.

[Dbs92] H. Debar, M. Becke, D. Siboni. A Neural Network Component for an Intrusion Detection System.
Proceedings of the IEEE Computer Society Symposium on Research in Security and Privacy,
1992.

[Gl91] T. D. Garvey and T. F. Lunt. Model based intrusion detection. In Proceedings of the 14th National
Computer Security Conference, pages 372-385, October 1991.

[Gsr98] R. Grossman, S. Kasif, R. Moore, D. Rocke, and J. Ullman. Data Mining Research: Opportunities
and Challenges, A report of three NSF workshops on Mining Large, Massive, and Distributed
Data, January 1998.

[Hlm90] R. Heady, G. Luger, A. Maccabe, and M. Servilla. The Architecture of a Network level Intrusion
Detection System. Technical report, Department of Computer Science, University of New Mexico,
August 1990.

[Joa98] Joachims T. Making Large-Scale SVM Learning Practical. LS8-Report, University of Dortmund,
LS VIII-Report, 1998.

[Ilg92] K. Ilgun. USTAT: A Real-Time Intrusion Detection System for UNIX. Master Thesis, University of
California, Santa Barbara, November 1992.

[KDD99] KDD Cup 99 intrusion detection data set.
<http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz>

[Ks94] S. Kumar and E. H. Spafford. An Application of Pattern Matching in Intrusion Detection. Technical
Report CSD-TR-94-013, Purdue University, 1994.

[Ks95] S. Kumar. Classification and Detection of Computer Intrusions. PhD Thesis, Department of
Computer Science, Purdue University, August 1995.

[Lee99] W. Lee. A Data Mining Framework for Constructing Features and Models for Intrusion Detection
Systems. PhD Thesis, Computer Science Department, Columbia University, June 1999.

[Lsm98] W. Lee and S. Stolfo. Data Mining Approaches for Intrusion Detection. In proceedings of the 7th
USENIX Security Symposium, 1998.

[Lsm99] W. Lee and S. Stolfo and K. Mok. A Data Mining Framework for Building Intrusion Detection
Models. In Proceedings of the IEEE Symposium on Security and Privacy, 1999.

[Lun90] T. F. Lunt, A. Tamaru, F. Gilham, et al. A Real-Time Intrusion-Detection Expert System (IDES).
Final Technical Report, Project 6784, SRI International, 1990.

[Lun93] T. Lunt. Detecting intruders in computer systems. In Proceedings of the 1993 Conference on
Auditing and Computer Technology, 1993.

[MIT] MIT Lincoln Laboratory. <http://www.ll.mit.edu/IST/ideval/>

[Mjs02] S. Mukkamala, G. Janoski, A. Sung. Intrusion Detection Using Neural Networks and Support
Vector Machines. Proceedings of the IEEE International Joint Conference on Neural Networks,
pp. 1702-1707, 2002.

[Por92] P. A. Porras. STAT: A State Transition Analysis Tool for Intrusion Detection. Master’s Thesis,
Computer Science Dept., University of California, Santa Barbara, July 1992.

[Qun86] J. R. Quinlan. Induction of Decision Trees. Machine Learning, 1:81-106, 1986.

[Qun93] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.

[Rlm98] J. Ryan, M. J. Lin, R. Miikkulainen. Intrusion Detection with Neural Networks. Advances in
Neural Information Processing Systems 10, Cambridge, MA: MIT Press, 1998.

[Sum97] R. C. Summers. Secure Computing: Threats and Safeguards. McGraw Hill, New York, 1997.

[Sun96] A. Sundaram. An Introduction to Intrusion Detection. ACM Cross Roads, Vol. 2, No. 4, April
1996.

[Tcl90] H. S. Teng, K. Chen and S. C. Lu. Security Audit Trail Analysis Using Inductively Generated
Predictive Rules. In Proceedings of the 11th National Conference on Artificial Intelligence
Applications, pages 24-29, IEEE, IEEE Service Center, Piscataway, NJ, March 1990.

[Vla95] Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
