Exploring On Deployment of Real Time Data Mining Based Intrusion Detection Systems

ISSN: 2395-0560
International Research Journal of Innovative Engineering

www.irjie.com
Volume 1, Issue 5, May 2015
Anthony Raj. A
Assistant Professor, Dept. of CS, Sri Bhagawan Mahaveer Jain College, KGF, Karnataka; Research Scholar, PRIST University, Thanjavur
Dr. A. Arul Lawrence Selvakumar
Professor & Head, Dept. of CSE, Rajiv Gandhi Institute of Technology, Bangalore, Karnataka, India
Abstract: As new internet technologies such as Big Data and cloud computing, and new trends in e-business transactions, keep
expanding, information security remains a great concern for internet users, online service providers, programmers, and
researchers working on current information security problems. For system developers, who identify data safety requirements and
then analyse, design, code, test, deploy and maintain current versions of intrusion detection systems, this has become a great
challenge. Existing intrusion detection systems (IDS) need to be upgraded and tested online against the latest attacks and threats
that come with emerging technology. Offline intrusion detection systems are becoming outdated as e-commerce grows, and the
demand for real-time intrusion detection is becoming more prominent in today's complex, high-speed data network
environments.
In this manuscript we present the key features required for deploying a real-time data mining based intrusion detection
system that fits today's complex network environment. We present the important aspects of the domain in a systematic way.
First we present the main objectives and requirements of a real-time data mining based IDS. Next we describe the current
network attack scenario, and then we propose a data mining based IDS model and the components it requires for detecting,
preventing and responding to attacks. We then present cost-effective data mining techniques for real-time IDS that aim at a high
detection rate, high accuracy and a low false positive rate. Next we present the issues related to real-time data mining based IDS
and the parameters used for evaluating an IDS. Finally we present the findings and limitations of the study, with a conclusion.
Keywords: Real time IDS, Alert reduction, False positive rate, Anomaly detection, DARPA data set.
1. Introduction
An intrusion detection system (IDS) must be able to detect, prevent, and react to intrusions in computer data networks.
Intrusions are malicious activities intended to steal or destroy data, or to deny services to their intended users. Intrusion
detection systems are generally classified by the data source they depend on: host based, network based, sensor alerts, and
application logs. The key components of an IDS are sensors and detectors, a database, a management server, and a management
interface. Based on time aspects, IDS are of two types: offline IDS and real-time IDS. An offline IDS collects data from the live
network and analyses it for intrusions after the sessions have ended, whereas a real-time IDS examines traffic and detects
intrusions while the sessions are still active. Intrusion detection systems employ two different analysis strategies: anomaly
detection and misuse detection. In the current scenario, traditional methods such as misuse detection are not sufficient on their
own and need to be combined with data mining techniques to detect intrusions more accurately. Most intrusion detection
systems use binary classification algorithms, which differentiate between normal activity and intrusions.
2015, IRJIE-All Rights Reserved
In this paper we present a real-time data mining based intrusion detection system, with a focus on the anomaly detection
analysis strategy. In the next section we present the role of real-time IDS and the important features of intrusion detection
systems. In later sections we select the features that are required for real-time data mining based intrusion detection. [1, 4]
2. Key features of Intrusion detection system (IDS)

Data networks become more complex day by day, so it is necessary to identify the data network environment in which the
real-time IDS is to be deployed. The IDS feature diagram below classifies IDS by the following features: information source,
analysis strategy, time aspects, architecture, activeness, and continuity. One significant aspect of this diagram is that it guides
the researcher or developer in identifying his or her area of interest in deploying a real-time IDS.
In this paper we use the following features in deploying the data mining based real-time IDS. For the proposed real-time
IDS, the information source is mainly network based; the analysis strategy combines anomaly and misuse detection; the time
aspect is real-time prediction; the architecture is a distributed and heterogeneous environment; for activeness an active
response method is used; and for continuity the IDS performs continuous monitoring. The procedures, ideas and techniques
used and proposed are explained in detail in later sections. [5, 7]
Figure: Features of intrusion detection system
2.1. Attacks on trusted entities

Our aim in this paper is to present newly emerging attacks and to detect them with a real-time data mining based IDS. Recent
studies on network attacks reveal that attackers have found a new approach: they reach victims through the trusted entities that
victims depend upon, such as the internet, web application programs and other e-business transactions, both online and offline.
Attackers exploit trusted entities by compromising trusted sites and application programs with the intention of breaking the
integrity of the system. It has been found that 61% of vulnerabilities were in web applications, and that many dangerous viruses
have been launched via web pages. Hence detecting, analysing and reducing such attacks is essential. In this paper we present
an IDS model that monitors trusted entities using a real-time data mining based intrusion detection system. [8]
3. Proposed IDS Model

The proposed model of a real-time data mining based intrusion detection system, shown in the accompanying figure, is
potentially useful in detecting automated intrusion attacks. Its components are sensors, detectors, a data warehouse, and a data
mining based model generator. The model supports collecting, storing, sharing and computing on very large volumes of data,
with load balancing in a multi-sensor, multi-detector environment. The model is designed for compatibility, reliability,
accuracy and high performance.
The main issue in a real-time intrusion detection system is that the enormous, continuous stream of incoming data from
outside information sources, such as the internet or local networks, needs to be stored, handled, monitored and distributed
carefully. In our IDS model we introduce a new component, a load balancing switch, for distributing raw data across multiple
sensors, IDS detectors and computing resources such as disk drives, central processing units or network links. Altogether, the
model is built to cope with a real-time environment. The load balancing component is intended to optimize resource use,
minimize response time, maximize throughput, and avoid overloading any single resource.
Using multiple components with load balancing instead of a single component may also increase reliability through
redundancy. In our proposed model, load balancing is provided by dedicated software or hardware, such as a multilayer switch
or a Domain Name System server process. [9, 14]
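The distribution step performed by the load balancing switch can be illustrated with a minimal Python sketch. It assumes a hash-based policy in which every packet of one network flow is sent to the same sensor; the paper does not specify the policy, and the sensor names, the `flow_key` and `pick_sensor` helpers and the example addresses are all hypothetical.

```python
import hashlib
from collections import Counter


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Build a direction-independent key so both halves of a session match."""
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{proto}|{low[0]}:{low[1]}|{high[0]}:{high[1]}"


def pick_sensor(key, sensors):
    """Map a flow key to one sensor using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return sensors[int.from_bytes(digest[:8], "big") % len(sensors)]


sensors = ["sensor-a", "sensor-b", "sensor-c"]  # hypothetical sensor names

# Both directions of the same TCP session produce the same key.
forward = flow_key("10.0.0.5", 51000, "192.0.2.10", 80, "tcp")
reverse = flow_key("192.0.2.10", 80, "10.0.0.5", 51000, "tcp")

# Count how 300 distinct flows are spread over the sensors.
load = Counter(
    pick_sensor(flow_key("10.0.0.5", port, "192.0.2.10", 80, "tcp"), sensors)
    for port in range(50000, 50300)
)
print(load)
```

Keeping a whole flow on one sensor lets that sensor see complete sessions, which session-level detection usually needs. A production switch would also have to handle sensor failure and uneven flow sizes, which this sketch ignores.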
3.1. Challenges of Managing an Intrusion Detection System (IDS)
Intrusion detection systems employ two main detection methods, anomaly detection and misuse detection. Studies reveal that
there are many approaches to these detection methods, but the main challenge with all of them is that they generate an
enormous number of alerts per day, which is unmanageable. The challenges include managing the flood of alerts, creating
actionable reports, and following up on the reported alerts. 99% of alerts are reported to be false alerts, or false positives, in
which legitimate actions are wrongly classified as intrusions. The main challenge is to correct the imbalance between actual and
false alarms. A common finding in the research papers we studied is that anomaly based IDS produce more false positive alerts
than misuse based IDS. In the next section we present the data mining techniques most applicable to real-time systems, along
with false positive alert reduction techniques through which the accuracy of an IDS can be increased. [15]
4. Study of Applied Techniques
In this section we present data mining techniques that are applicable to real-time IDS: combined misuse and anomaly
detection methods, machine learning algorithms, and false positive reduction techniques. Finally we discuss IDS evaluation
parameters.
4.1 Combined Misuse and Anomaly Detection Algorithms
Misuse detection and anomaly detection algorithms are distinct in nature. A misuse detection algorithm is trained on
labeled normal and intrusion data, whereas an anomaly detection algorithm is trained on normal data only. The drawbacks of
misuse detection algorithms are that they cannot detect unanticipated, novel attacks and that the signature database has to be
revised for each newly discovered attack.
The drawbacks of anomaly detection methods are that they generate a high false alarm rate and that detected deviations
do not necessarily represent actual attacks. We use a combined misuse and anomaly detection method to alleviate these
limitations. Because even the combined methods do not guarantee 100% accuracy, current research papers show that
researchers are increasingly interested in machine learning algorithms that offer better detection rates, adaptability and
accuracy in real-time IDS. [16]
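A combined misuse and anomaly detector can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature strings, the use of payload length as the single feature, the z-score threshold of 3.0 and all function names are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev

# Hypothetical signature database: substrings of known-bad payloads.
SIGNATURES = {"sql-injection": "' OR 1=1", "path-traversal": "../../"}


def misuse_match(payload):
    """Return the name of the first matching signature, or None."""
    for name, pattern in SIGNATURES.items():
        if pattern in payload:
            return name
    return None


class AnomalyModel:
    """Profile of normal behaviour, learned from normal-only training data."""

    def __init__(self, normal_sizes):
        self.mu = mean(normal_sizes)
        self.sigma = stdev(normal_sizes)

    def score(self, size):
        """Distance from the normal mean, in standard deviations."""
        return abs(size - self.mu) / self.sigma


def classify(payload, model, threshold=3.0):
    """Check known signatures first, then fall back to the anomaly score."""
    hit = misuse_match(payload)
    if hit is not None:
        return "attack (signature: " + hit + ")"
    if model.score(len(payload)) > threshold:
        return "suspicious (anomaly)"
    return "normal"


# Train the anomaly model on payload sizes observed during normal operation.
model = AnomalyModel([40, 42, 38, 45, 41, 39, 43, 44])

print(classify("a" * 41, model))
print(classify("a" * 200, model))
print(classify("id=1' OR 1=1", model))
```

The signature check catches known attacks without raising anomaly alerts for them, while the anomaly score still flags traffic that matches no signature, which is the complementary behaviour the combined method relies on.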
4.2 Machine learning Algorithms
Machine learning (ML) algorithms automatically learn to build models and predict results based on past experience,
observations or instruction, and avoid static program instructions; that is, the algorithms are statistically based rather than
purely rule based. There are two types of machine learning: supervised learning and unsupervised learning.
In an unsupervised learning algorithm the correct results are not provided during training. The algorithm extracts hidden
features from unlabeled data. Labeling can still be carried out if labels are available for a small number of objects that
represent the desired classes. Researchers have found that unsupervised learning algorithms can be used to cluster the input
data on the basis of statistical properties alone.
In a supervised learning algorithm the training data includes both the inputs and the desired results. These methods are
found to be fast and accurate. The key terms used in machine learning are the training set, the validation set, and the test set.
The training set is the set of examples used for learning, that is, to fit the parameters of the classifier. The validation set is the
set of examples used to estimate the error and to tune the architecture of the classifier. The test set is the set of examples used
to assess the performance of the final classifier. The most commonly used ML algorithms are given in the table below. [16]
Table: Machine learning algorithms
Unsupervised
  Continuous
    Clustering & Dimensionality Reduction
      o SVD
      o PCA
      o K-Means
  Categorical
    Association Analysis
      o Apriori
      o FP-Growth
    Hidden Markov Model
Supervised
  Continuous
    Regression
      o Linear
      o Polynomial
    Decision Trees
    Random Forests
  Categorical
    Classification
      o KNN
      o Trees
      o Logistic regression
      o Naive Bayes
      o SVM
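The roles of the training, validation and test sets described in this section can be illustrated with a short Python sketch. The 60/20/20 proportions, the fixed seed and the toy records are assumptions chosen for the example; the paper does not prescribe them.

```python
import random


def split_dataset(records, train=0.6, validation=0.2, seed=0):
    """Shuffle once, then cut into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],                  # fit the classifier parameters
        shuffled[n_train:n_train + n_val],   # estimate error, tune the model
        shuffled[n_train + n_val:],          # final performance assessment
    )


# Toy labeled records: (record id, label). Every tenth record is an attack.
records = [(i, "attack" if i % 10 == 0 else "normal") for i in range(100)]
train_set, validation_set, test_set = split_dataset(records)

print(len(train_set), len(validation_set), len(test_set))
```

The three sets never overlap, so the test set measures performance on records the classifier has not seen during fitting or tuning.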
In our study, machine learning algorithms were found to work best on classification problems in real-time IDS. Unsupervised
learning algorithms in particular are reported to be more effective and efficient, because they learn adaptively and use low-cost
features. [16, 17]
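As an illustration of unsupervised learning on unlabeled data, the following Python sketch runs plain K-means (Lloyd's algorithm) on a handful of two-dimensional points. The points, the choice of k = 2, the seeding from the first k points, and the heuristic that the smallest cluster holds the candidate anomalies are all assumptions made for the example, not results from the paper.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignment


# Hypothetical, already normalized connection features (two per record).
points = [(1.0, 1.0), (9.0, 9.5), (1.2, 0.8), (0.9, 1.1),
          (1.1, 1.3), (9.2, 9.1), (0.8, 0.9)]

centroids, assignment = kmeans(points, k=2)

# Heuristic assumption: attacks are rare, so the smallest cluster is suspect.
sizes = Counter(assignment)
suspect_cluster = min(sizes, key=sizes.get)
suspects = [p for p, a in zip(points, assignment) if a == suspect_cluster]
print(suspects)
```

No labels are used at any point; the grouping comes only from the statistical properties of the records, which is what makes this kind of method attractive when labeled attack data is scarce.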
4.3 False positive Reduction Techniques
In this section we present data mining techniques that address the false positive alerts generated by anomaly and misuse
detection algorithms. The main objectives of these techniques are management of false positive alerts, reduction of false
positive alerts, and distinguishing real attacks from false positive alerts and low priority events. Research papers published
during the last decade describe two main approaches to reducing false alarms in IDS: configuration and detection techniques,
and alert processing techniques.
4.3.1 Detection Techniques
Applied detection techniques
SVM
C4.5
Decision Tree Classification, Rule-based Classification
Decision Classification, Bayesian clustering
Self-Organizing Map, K-means Clustering
Sequential Association Mining
All of the above techniques were compared on accuracy, detection rate and false alarm rate for different types of attacks
and network load environments. Overall, the researchers in the studies we reviewed combined two or more techniques and
found that this yields better results, that is, a lower false alarm rate and a higher detection rate. [17]
4.3.2 Alerts Processing Techniques
Applied Alert processing techniques
Sequential Association Mining, fuzzy alert Aggregation
Clustering (Attribute Oriented Induction)
Machine learning (Adaptive learning for alert classification (ALAC)), clustering
Quality Parameters, Normalization, Root cause analysis
Multi-Level Clustering (Fuzzy Cognitive Modeling)
Statistical Filtering, Rule-based Classification
Self-Organizing Map, K-means Clustering
The alert processing techniques reviewed above, and their experimental results, are useful to the security analyst in
identifying root causes and reducing the future alert load. [17]
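One simple form of alert processing, alert aggregation, can be sketched in Python as follows. The sketch merges alerts that share a signature and source address and arrive within a time window. The 60-second window, the field names and the sample alerts are assumptions for illustration; this is not the fuzzy aggregation method of the reviewed papers.

```python
def aggregate(alerts, window=60):
    """Merge alerts with the same (signature, src) that arrive close in time."""
    groups = []
    open_group = {}  # (signature, src) -> most recent group for that key
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = (alert["signature"], alert["src"])
        group = open_group.get(key)
        if group is not None and alert["time"] - group["last"] <= window:
            # Same cause, still inside the window: fold into the open group.
            group["count"] += 1
            group["last"] = alert["time"]
        else:
            group = {"signature": key[0], "src": key[1],
                     "first": alert["time"], "last": alert["time"],
                     "count": 1}
            groups.append(group)
            open_group[key] = group
    return groups


# Hypothetical raw alerts; "time" is in seconds.
alerts = [
    {"time": 0, "signature": "port-scan", "src": "10.0.0.5"},
    {"time": 5, "signature": "port-scan", "src": "10.0.0.5"},
    {"time": 20, "signature": "port-scan", "src": "10.0.0.5"},
    {"time": 30, "signature": "ssh-brute", "src": "10.0.0.9"},
    {"time": 200, "signature": "port-scan", "src": "10.0.0.5"},
]

merged = aggregate(alerts)
for group in merged:
    print(group)
```

Five raw alerts collapse into three groups here, so the analyst reviews one entry per incident instead of one entry per packet, and the count on each group hints at a shared root cause.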
4.4 IDS evaluating Parameters
In this section we present the standard measures reviewed for evaluating an IDS: detection rate, false alarm rate, the
tradeoff between detection rate and false alarm rate, performance, fault tolerance, speed, cost, resource usage, and
effectiveness. Current studies reveal that detection accuracy and false alarm rate are the most challenging issues in designing
an effective IDS.
In this paper we consider prediction ability to be one of the most important standard evaluation parameters. Prediction
ability means avoiding or reducing misclassification and giving correct classification results for events, i.e. the ability to
discern normal behavior from attack behavior. An IDS prediction has four possible outcomes: a false positive (FP), where a
normal event is classified as an attack; a false negative (FN), where an attack is classified as normal; a true negative (TN),
where a normal event is correctly labeled as normal; and a true positive (TP), where an attack is correctly classified as an
attack. From these four outcome counts, accuracy and precision are computed as Accuracy = (TP+TN) / (TP+TN+FN+FP) and
Precision = TP / (TP+FP). Overall, an IDS is expected to produce a high detection rate, a low false alarm rate and high
accuracy. [16, 17]
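The accuracy and precision formulas can be checked with a short Python sketch. The counts below are hypothetical and chosen to show a rare-attack setting. Detection rate and false alarm rate are named in the text but not defined there, so the sketch assumes the usual definitions, TP / (TP+FN) and FP / (FP+TN), with "attack" as the positive class.

```python
def evaluate(tp, tn, fp, fn):
    """Compute IDS evaluation measures from the four outcome counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        # Assumed standard definitions, not stated in the text:
        "detection_rate": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
    }


# Hypothetical day of traffic: 10,000 events, of which 100 are attacks.
metrics = evaluate(tp=90, tn=9700, fp=200, fn=10)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the accuracy is 0.979 while the precision is only about 0.31, so more than two thirds of the raised alerts are false even though the false alarm rate is near 2%. This is why accuracy alone is a weak measure when attacks are rare.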
4.4.1 Evaluation Data Sources
Another aspect of IDS evaluation is the data source. The standard data sets we reviewed are KDD CUP 99, DARPA 1998,
DARPA 1999 and DARPA 2000; real-world data sets have also been used to evaluate data mining techniques and their
experimental results.
4.4.2 KDD CUP 99

The KDD CUP 99 data set was used in the Third International Knowledge Discovery and Data Mining Tools Competition,
which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining.
The task was to build an intrusion detection model able to distinguish between normal connections and intrusions. The data
set includes a wide variety of intrusions simulated in a military network environment, and it is widely used for evaluating
anomaly detection methods. The statistical analysis of the entire KDD data set that we reviewed revealed major drawbacks
that affect the evaluated IDS and lead to a very poor evaluation of anomaly detection methods. To overcome these drawbacks,
new data sets and algorithms need to be explored and combined with the KDD data set for better evaluation of IDS. [17, 18]
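For readers who want to experiment with the data set, the following Python sketch parses records in the comma-separated KDD CUP 99 layout, in which each line holds 41 features followed by a label that ends in a period. The two sample lines are illustrative records written for this example, and `parse_record` is a hypothetical helper that reads only the first six features and the label.

```python
# Illustrative records in the KDD CUP 99 layout: 41 features, then a label.
SAMPLE_LINES = [
    "0,tcp,http,SF,181,5450,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,8,8,"
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,9,9,"
    "1.00,0.00,0.11,0.00,0.00,0.00,0.00,0.00,normal.",
    "0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,"
    "0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,255,"
    "1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,smurf.",
]


def parse_record(line):
    """Extract a few basic features and a binary label from one record."""
    fields = line.strip().split(",")
    label = fields[-1].rstrip(".")  # labels carry a trailing period
    return {
        "duration": int(fields[0]),
        "protocol_type": fields[1],
        "service": fields[2],
        "flag": fields[3],
        "src_bytes": int(fields[4]),
        "dst_bytes": int(fields[5]),
        "label": label,
        # Binary view used by two-class detectors: normal versus attack.
        "is_attack": label != "normal",
    }


parsed = [parse_record(line) for line in SAMPLE_LINES]
for record in parsed:
    print(record["protocol_type"], record["service"], record["label"],
          record["is_attack"])
```

Collapsing the specific attack names into a single attack class matches the binary normal versus intrusion task described in the text; a multi-class study would keep the original labels instead.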
4.4.3 DARPA Data set
MIT Lincoln Laboratory presented an IDS evaluation methodology as a practical solution for evaluating the performance
of intrusion detection systems. Reviews of the DARPA IDS evaluation data set have found that it has limitations: it is
considered outdated and unable to accommodate the latest trends in attacks. We intend to present the various attack
classifications found in networks in detail in our thesis; in this paper we describe only in brief the IDS that support the use of
the DARPA IDS evaluation data set. Snort and Cisco IDS are signature-based IDSs, while PHAD and ALAD are anomaly
detectors; all of these support the DARPA data set for IDS evaluation. [17, 19]
5. Findings of the Study
In this paper we explored the features of IDS, which give researchers a starting point for identifying the area in which they
are interested.
The presented architecture for a real-time data mining based IDS, with its load balancing switch, helps in managing
intrusions in data networks and in real-time distributed networks, and in load balancing and processing the data.
The challenges of managing an intrusion detection system (IDS) in the enterprise, as presented in this paper, are useful for
researchers to keep track of open issues and find new solutions during their research work.
The applied techniques presented focus on machine learning algorithms, which are much needed for real-time data mining
based IDS. Their advantage is that they are found to be more accurate than human-crafted rules. Their disadvantage is that
machine learning (ML) needs a lot of labeled data.
The techniques presented for reducing false positives are found to be useful in alert handling and false alert reduction. They
are fully automated and able to adjust to environment changes without human intervention.
The review of IDS evaluation parameters reveals that, besides the DARPA and KDD CUP 99 data sets, researchers have used
several other data sets to evaluate the efficiency and performance of proposed techniques and to compare results with
others. [16, 17, 18, 19]
6. Conclusion
In this paper we presented the main features of real-time IDS and the data mining techniques required for enhancing
real-time IDS. We focused on data mining techniques that aim to reduce false positives and alert load, and presented the
detection and alert processing techniques most used during the last decade for that purpose. There is increasing interest in data
mining techniques to get better results. We find that there are some open issues and limitations of the study that can be
considered for further exploration. Most false alert reduction techniques act only in offline mode. To counteract this,
researchers have presented real-time, adaptive machine learning algorithms, which are found to yield far better results than the
previously applied offline techniques, but much attention is still required to reduce the complexity of these techniques and the
size of the training set during the system lifetime. In future, much research needs to be done on detecting intrusions at the
application level, owing to the emerging trends of e-business, Big Data and cloud computing. Many current attacks target
trusted entities rather than the system or network level. Finally, finding a better evaluation approach for real-time IDS than
the KDD CUP 99 and DARPA data sets remains an open challenge for researchers. [16, 17, 18, 19]
ACKNOWLEDGEMENT
I am proud of the blessings of wisdom and understanding that God has bestowed upon us. I would like to thank Prof. Philip
K. Chan of the Computer Science Department, Florida Institute of Technology, Melbourne, Prof. Ahmad Faraahi of Payame
Noor University, Tehran, Iran, and Prof. N. M. Shekokar for their valuable studies and reviews. Finally I thank Prof. Dr. Arul,
who is my guide, and other unknown authors for their ideas, encouragement and support.
REFERENCES
[1] http://www.idrbt.ac.in/PDFs/PT%20Reports/2010/IDS_Shilpa_2010.pdf
[2] http://www.docin.com/p-50767073.html
[3] http://minds.cs.umn.edu/talks/tutorial.pdf
[4] http://en.wikipedia.org/wiki?curid=328144
[5] http://minds.cs.umn.edu/talks/tutorial.pdf
[6] http://www.acc.com/_cs_upload/vl/membersonly/SampleFormPolicy/1189499_1.doc
[7] http://ewh.ieee.org/cmte/cis/mtsc/ieeecis/tutorial2007/Dipankar_Dasgupta_2007.pdf
[8] http://www.symantec.com/region/br/request/relatorio/en/Shared/files/5_Attackers_Entities.pdf
[9] http://www.techrepublic.com/resource-library/whitepapers/load-balanced-with-distributed-self-organization-in-file-sharing-and-fileaccessing/
[10] http://taags.net/
[11] http://icmping.com/index.php/solutions/load-balancing
[12] http://seminarprojects.com/Thread-real-time-data-mining-based-intrusion-detection-full-report
[13] http://www.ijser.org/viewPaperDetail.aspx?I014046
[14] http://www.citefactor.org/article/index/2782/data-mining-techniques-for-real-time-intrusion-detection-systems
[15] http://www.sans.org/reading-room/whitepapers/detection/intrusion-detection-s
[16] Monali Shetty, Prof. N.M. Shekokar, Data Mining Techniques for Real Time Intrusion Detection Systems, International Journal of Scientific & Engineering Research, Volume 3, Issue 4, April 2012
[17] Asieh Mokarian, Ahmad Faraahi, Arash Ghorbannia Delavar, False Positive Reduction Techniques in Intrusion Detection System: A Review, IJCSNS International Journal of Computer Science and Network Security, Vol. 13, No. 10, October 2013
[18] http://www.ee.ryerson.ca/~bagheri/papers/cisda.pdf
[19] http://people.scs.carleton.ca/~soma/id-2007w/readings/mahoney-darpa.pdf
[20] Prof. Wenke Lee, Prof. Philip K., Department of Computer Science, Florida Institute of Technology, Melbourne FL 32901, Real Time Data Mining-based Intrusion Detection