
BAYESIAN NETWORK

References

[1] Jiawei Han: Data Mining: Concepts and Techniques, ISBN 1-55860-489-8, Morgan Kaufmann Publishers.
[2] Stuart Russell, Peter Norvig: Artificial Intelligence: A Modern Approach, Pearson Education.
[3] Kandasamy, Thilagavati, Gunavati: Probability, Statistics and Queueing Theory, Sultan Chand Publishers.
[4] D. Heckerman: A Tutorial on Learning with Bayesian Networks. In Learning in Graphical Models, ed. M. I. Jordan, The MIT Press, 1998.
[5] http://en.wikipedia.org/wiki/Bayesian_probability
[6] http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf
[7] http://www.murrayc.com/learning/AI/bbn.shtml
[8] http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
[9] http://en.wikipedia.org/wiki/Bayesian_belief_network

CONTENTS

HISTORY
CONDITIONAL PROBABILITY
BAYES THEOREM
NAÏVE BAYES CLASSIFIER
BELIEF NETWORK
APPLICATION OF BAYESIAN NETWORK
PAPER ON CYBER CRIME DETECTION

HISTORY

Bayesian probability was named after Reverend Thomas Bayes (1702-1761).
He proved a special case of what is currently known as Bayes' Theorem.
The term "Bayesian" came into use around the 1950s.
Pierre-Simon, Marquis de Laplace (1749-1827) independently proved a generalized version of Bayes' Theorem.

http://en.wikipedia.org/wiki/Bayesian_probability

HISTORY (Cont.)

1950s - New knowledge in Artificial Intelligence
1958 - Genetic Algorithms by Friedberg (Holland and Goldberg ~1985)
1965 - Fuzzy Logic by Zadeh at UC Berkeley
1970 - Bayesian Belief Networks at Stanford University (Judea Pearl 1988)
The ideas proposed above were not fully developed until later. BBNs became popular in the 1990s.

http://www.construction.ualberta.ca/civ606/myFiles/Intro%20to%20Belief%20Network.pdf

HISTORY (Cont.)
Current uses of Bayesian networks:
Microsoft's printer troubleshooter.
Diagnosing diseases (MYCIN).
Predicting oil and stock prices.
Controlling the space shuttle.
Risk analysis - schedule and cost overruns.

CONDITIONAL PROBABILITY

Probability: how likely is it that an event will happen?

Sample space S
An element of S is an elementary event.
An event A is a subset of S.
P(A) is the probability of event A; P(S) = 1.

For events A and B:
P(A|B) - the probability that event A occurs given that event B has already occurred.

Example:
There are 2 baskets. B1 has 2 red balls and 5 blue balls. B2 has 4 red balls and 3 blue balls. What is the probability of picking a red ball from basket 1?

CONDITIONAL PROBABILITY
The question above asks for P(red ball | basket 1).
Intuitively, this is the probability of a red ball when the sample space is restricted to basket 1 only.
So the answer is 2/7.
The equations used for conditional probabilities are:
P(A|B) = P(A∩B) / P(B)   [Product Rule]
P(A,B) = P(A) * P(B)   [if A and B are independent]
How do you solve P(basket 2 | red ball)?

BAYESIAN THEOREM
A special case of Bayes' Theorem:
P(A∩B) = P(B) x P(A|B)
P(B∩A) = P(A) x P(B|A)
Since P(A∩B) = P(B∩A),
P(B) x P(A|B) = P(A) x P(B|A)
=> P(A|B) = [P(A) x P(B|A)] / P(B)

BAYESIAN THEOREM
Solution to P(basket 2 | red ball):
P(basket 2 | red ball) = [P(b2) x P(r|b2)] / P(r)
= [(1/2) x (4/7)] / (6/14)
= 2/3 ≈ 0.67
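
The same calculation can be checked with a few lines of Python (a minimal sketch; the variable names are ours):

    # Two baskets: B1 has 2 red / 5 blue balls, B2 has 4 red / 3 blue; a basket is chosen at random.
    p_b1, p_b2 = 0.5, 0.5
    p_red_given_b1 = 2 / 7
    p_red_given_b2 = 4 / 7

    # Total probability of drawing a red ball: P(r) = P(b1)P(r|b1) + P(b2)P(r|b2) = 3/7 = 6/14
    p_red = p_b1 * p_red_given_b1 + p_b2 * p_red_given_b2

    # Bayes' theorem: P(b2 | r) = P(b2) * P(r | b2) / P(r)
    p_b2_given_red = p_b2 * p_red_given_b2 / p_red
    print(round(p_b2_given_red, 2))   # 0.67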

BAYESIAN THEOREM
Example 2: A medical cancer diagnosis problem.
There are 2 possible outcomes of a diagnosis: +ve, -ve. We know 0.8% of the world population has cancer. The test gives a correct +ve result 98% of the time and a correct -ve result 97% of the time.
If a patient's test returns +ve, should we diagnose the patient as having cancer?

BAYESIAN THEOREM
P(cancer) = 0.008          P(-cancer) = 0.992
P(+ve|cancer) = 0.98       P(-ve|cancer) = 0.02
P(+ve|-cancer) = 0.03      P(-ve|-cancer) = 0.97

Using Bayes' formula:

P(cancer|+ve) = P(+ve|cancer) x P(cancer) / P(+ve)
              = (0.98 x 0.008) / P(+ve) = 0.0078 / P(+ve)
P(-cancer|+ve) = P(+ve|-cancer) x P(-cancer) / P(+ve)
              = (0.03 x 0.992) / P(+ve) = 0.0298 / P(+ve)
Since 0.0298 > 0.0078, the patient most likely does not have cancer.
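
A small Python check of the same comparison (a sketch; variable names are ours). Because P(+ve) is common to both posteriors it cancels in the comparison, and normalizing by the sum of the two numerators recovers the actual posterior probabilities:

    p_cancer, p_not_cancer = 0.008, 0.992
    p_pos_given_cancer, p_pos_given_not_cancer = 0.98, 0.03

    # Unnormalized posteriors (numerators of Bayes' rule)
    num_cancer = p_pos_given_cancer * p_cancer              # 0.00784
    num_not_cancer = p_pos_given_not_cancer * p_not_cancer  # 0.02976

    # P(+ve) is the sum of the numerators, so normalizing gives the posteriors
    p_pos = num_cancer + num_not_cancer
    print(round(num_cancer / p_pos, 2))       # 0.21 -> P(cancer | +ve)
    print(round(num_not_cancer / p_pos, 2))   # 0.79 -> P(-cancer | +ve)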

BAYESIAN THEOREM

General form of Bayes' Theorem:

Given mutually disjoint events E1, E2, ..., En with P(Ei) > 0 (i = 1, 2, ..., n) whose union covers the sample space, and any event A:
P(Ei|A) = [P(Ei) x P(A|Ei)] / Σj P(Ej) x P(A|Ej),   i = 1, 2, ..., n

BAYESIAN THEOREM

Example:
There are 3 boxes. B1 has 2 white, 3 black and 4 red balls. B2 has 3 white, 2 black and 2 red balls. B3 has 4 white, 1 black and 3 red balls. A box is chosen at random and 2 balls are drawn; one is white and the other is red. What is the probability that they came from the first box?

BAYESIAN THEOREM
Let E1, E2, E3 denote the events of choosing B1, B2, B3 respectively. Let A be the event that the 2 balls selected are one white and one red.
P(E1) = P(E2) = P(E3) = 1/3
P(A|E1) = [2C1 x 4C1] / 9C2 = 2/9
P(A|E2) = [3C1 x 2C1] / 7C2 = 2/7
P(A|E3) = [4C1 x 3C1] / 8C2 = 3/7

BAYESIAN THEOREM
P(E1|A) = [P(E1) x P(A|E1)] / Σi P(Ei) x P(A|Ei) = 0.23729
P(E2|A) = 0.30508
P(E3|A) = 1 - (0.23729 + 0.30508) = 0.45763
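
The same posteriors can be reproduced with a short Python sketch (the list layout and names are ours):

    from math import comb

    p_box = [1/3, 1/3, 1/3]
    # P(A | Ei): probability of drawing one white and one red ball from each box
    p_a_given_e = [
        comb(2, 1) * comb(4, 1) / comb(9, 2),   # B1: 2 white, 4 red out of 9 balls -> 2/9
        comb(3, 1) * comb(2, 1) / comb(7, 2),   # B2: 3 white, 2 red out of 7 balls -> 2/7
        comb(4, 1) * comb(3, 1) / comb(8, 2),   # B3: 4 white, 3 red out of 8 balls -> 3/7
    ]

    denom = sum(p * l for p, l in zip(p_box, p_a_given_e))           # total probability of A
    posteriors = [p * l / denom for p, l in zip(p_box, p_a_given_e)]
    print([round(x, 5) for x in posteriors])   # [0.23729, 0.30508, 0.45763]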

BAYESIAN CLASSIFICATION
Why use Bayesian Classification:
Probabilistic learning: calculates explicit probabilities for hypotheses, and is among the most practical approaches to certain types of learning problems.
Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.

BAYESIAN CLASSIFICATION

Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities.
Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.

NAÏVE BAYES CLASSIFIER

A simplifying assumption: the attributes are conditionally independent given the class.

This greatly reduces the computation cost; only the class distribution and the per-attribute counts need to be estimated.

NAÏVE BAYES CLASSIFIER

The probabilistic model of the NBC is to find the probability of a certain class given multiple (assumed conditionally independent) feature events.
The naïve Bayes classifier applies to learning tasks where each instance x is described by a conjunction of attribute values and where the target function f(x) can take on any value from some finite set V. A set of training examples of the target function is provided, and a new instance is presented, described by the tuple of attribute values <a1, a2, ..., an>. The learner is asked to predict the target value, or classification, for this new instance.

NAÏVE BAYES CLASSIFIER

Abstractly, the probability model for a classifier is a conditional model
P(C|F1, F2, ..., Fn)
over a dependent class variable C with a small number of outcomes, or classes, conditional on several feature variables F1, ..., Fn.
Naïve Bayes decision rule:
classify(f1, ..., fn) = argmaxc [P(C=c) x P(F1=f1|C=c) x P(F2=f2|C=c) x ... x P(Fn=fn|C=c)] / P(F1, F2, ..., Fn)
Since P(F1, F2, ..., Fn) is common to all classes, we do not need to evaluate the denominator for comparisons.
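
A minimal Python sketch of this decision rule (the data structures are ours, not part of the original slides): the prior and the per-feature likelihoods are multiplied for each class, and the class with the largest product is returned; the denominator P(F1, ..., Fn) is never evaluated.

    def naive_bayes_predict(priors, likelihoods, features):
        # priors: {class: P(C)}; likelihoods: {class: {feature_value: P(F|C)}}
        scores = {}
        for c, prior in priors.items():
            score = prior
            for f in features:
                score *= likelihoods[c].get(f, 0.0)
            scores[c] = score
        # argmax over the classes; normalization is not needed for the decision
        return max(scores, key=scores.get), scores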

NAÏVE BAYES CLASSIFIER

Tennis example (the PlayTennis training data table appears here).

NAÏVE BAYES CLASSIFIER

Problem:
Use the training data above to classify the following instances:
a) <Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong>
b) <Outlook=overcast, Temperature=cool, Humidity=high, Wind=strong>

NAÏVE BAYES CLASSIFIER

Answer to (a):
P(PlayTennis=yes) = 9/14 = 0.64
P(PlayTennis=no) = 5/14 = 0.36
P(Outlook=sunny|PlayTennis=yes) = 2/9 = 0.22
P(Outlook=sunny|PlayTennis=no) = 3/5 = 0.60
P(Temperature=cool|PlayTennis=yes) = 3/9 = 0.33
P(Temperature=cool|PlayTennis=no) = 1/5 = 0.20
P(Humidity=high|PlayTennis=yes) = 3/9 = 0.33
P(Humidity=high|PlayTennis=no) = 4/5 = 0.80
P(Wind=strong|PlayTennis=yes) = 3/9 = 0.33
P(Wind=strong|PlayTennis=no) = 3/5 = 0.60

NAÏVE BAYES CLASSIFIER

P(yes) x P(sunny|yes) x P(cool|yes) x P(high|yes) x P(strong|yes) = 0.0053
P(no) x P(sunny|no) x P(cool|no) x P(high|no) x P(strong|no) = 0.0206
So the class for this instance is "no". We can normalize the probability by:
0.0206 / (0.0206 + 0.0053) = 0.795
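
The same arithmetic, written out as a quick Python check (numbers taken directly from the slide above):

    p_yes = 9/14 * 2/9 * 3/9 * 3/9 * 3/9   # P(yes) P(sunny|yes) P(cool|yes) P(high|yes) P(strong|yes)
    p_no  = 5/14 * 3/5 * 1/5 * 4/5 * 3/5   # P(no)  P(sunny|no)  P(cool|no)  P(high|no)  P(strong|no)
    print(round(p_yes, 4), round(p_no, 4))   # 0.0053 0.0206
    print(round(p_no / (p_yes + p_no), 3))   # 0.795 -> classified as "no"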

NAÏVE BAYES CLASSIFIER

Answer to (b):
P(PlayTennis=yes) = 9/14 = 0.64
P(PlayTennis=no) = 5/14 = 0.36
P(Outlook=overcast|PlayTennis=yes) = 4/9 = 0.44
P(Outlook=overcast|PlayTennis=no) = 0/5 = 0
P(Temperature=cool|PlayTennis=yes) = 3/9 = 0.33
P(Temperature=cool|PlayTennis=no) = 1/5 = 0.20
P(Humidity=high|PlayTennis=yes) = 3/9 = 0.33
P(Humidity=high|PlayTennis=no) = 4/5 = 0.80
P(Wind=strong|PlayTennis=yes) = 3/9 = 0.33
P(Wind=strong|PlayTennis=no) = 3/5 = 0.60

NAÏVE BAYES CLASSIFIER

Estimating probabilities:
In the previous example, P(overcast|no) = 0, which forces
P(no) x P(overcast|no) x P(cool|no) x P(high|no) x P(strong|no) = 0.0.
This causes problems in the comparison because the other probabilities are not considered at all. We can avoid this difficulty by using the m-estimate.

NAÏVE BAYES CLASSIFIER

M-estimate formula:
(c + k) / (n + m), where c/n is the original probability estimate used before, k = 1, and m is the equivalent sample size (here taken as the number of distinct values of the attribute).

Using this method, our new probability values are given below.

NAÏVE BAYES CLASSIFIER

New answer to (b):
P(PlayTennis=yes) = 10/16 = 0.63
P(PlayTennis=no) = 6/16 = 0.37
P(Outlook=overcast|PlayTennis=yes) = 5/12 = 0.42
P(Outlook=overcast|PlayTennis=no) = 1/8 = 0.13
P(Temperature=cool|PlayTennis=yes) = 4/12 = 0.33
P(Temperature=cool|PlayTennis=no) = 2/8 = 0.25
P(Humidity=high|PlayTennis=yes) = 4/11 = 0.36
P(Humidity=high|PlayTennis=no) = 5/7 = 0.71
P(Wind=strong|PlayTennis=yes) = 4/11 = 0.36
P(Wind=strong|PlayTennis=no) = 4/7 = 0.57

NAÏVE BAYES CLASSIFIER

P(yes) x P(overcast|yes) x P(cool|yes) x P(high|yes) x P(strong|yes) = 0.011
P(no) x P(overcast|no) x P(cool|no) x P(high|no) x P(strong|no) = 0.00486

So the class of this instance is "yes".
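
A small sketch of the m-estimate as this example applies it, with k = 1 and m equal to the number of distinct values of the attribute (our reading of the numbers above):

    def m_estimate(count, total, num_values, k=1):
        # Smoothed estimate (c + k) / (n + m), with m = number of distinct attribute values
        return (count + k) / (total + num_values)

    # P(overcast | no): raw count 0 out of 5; Outlook has 3 values (sunny, overcast, rain)
    print(m_estimate(0, 5, 3))             # 0.125 (the 0.13 above) instead of 0.0
    # P(overcast | yes): raw count 4 out of 9
    print(round(m_estimate(4, 9, 3), 2))   # 0.42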

NAÏVE BAYES CLASSIFIER

The conditional probability values of all the attributes with respect to the class are pre-computed and stored on disk.
This prevents the classifier from computing the conditional probabilities every time it runs.
The stored data can be reused to reduce the computation cost of each subsequent run.

BAYESIAN BELIEF NETWORK

In the naïve Bayes classifier we make the assumption of class conditional independence: given the class label of a sample, the values of the attributes are conditionally independent of one another.
However, there can be dependencies between attribute values. To handle these we use a Bayesian belief network, which provides a joint conditional probability distribution.
A Bayesian network is a form of probabilistic graphical model. Specifically, a Bayesian network is a directed acyclic graph of nodes representing variables and arcs representing dependence relations among the variables.

BAYESIAN BELIEF NETWORK

A Bayesian network is a representation of the joint distribution over all the variables represented by the nodes in the graph. Let the variables be X(1), ..., X(n).
Let Parents(A) be the parents of node A. Then the joint distribution for X(1) through X(n) is represented as the product of the probability distributions P(Xi | Parents(Xi)) for i = 1 to n. If Xi has no parents, its probability distribution is said to be unconditional; otherwise it is conditional.

BAYESIAN BELIEF NETWORK

(Figure: the Cloudy / Sprinkler / Rain / WetGrass example network with its conditional probability tables.)

BAYESIAN BELIEF NETWORK


By the chain rule of probability, the joint probability of all the nodes in the graph above is:
P(C, S, R, W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)
where W = Wet Grass, C = Cloudy, R = Rain, S = Sprinkler.
Example: P(W, ¬R, S, C)
= P(W|S,¬R) * P(¬R|C) * P(S|C) * P(C)
= 0.9 * 0.2 * 0.1 * 0.5 = 0.009
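
This factorization can be checked in a few lines of Python. The CPT values below are the standard ones for this example and are consistent with the numbers used on these slides (a sketch; the dictionaries and variable names are ours):

    # CPT entries (True = the event occurs)
    p_c = 0.5                                # P(Cloudy)
    p_s_given_c = {True: 0.1, False: 0.5}    # P(Sprinkler | Cloudy)
    p_r_given_c = {True: 0.8, False: 0.2}    # P(Rain | Cloudy)
    p_w_given_sr = {(True, True): 0.99, (True, False): 0.9,
                    (False, True): 0.9, (False, False): 0.0}   # P(WetGrass | Sprinkler, Rain)

    # P(W, -R, S, C) = P(W | S, -R) * P(-R | C) * P(S | C) * P(C)
    joint = p_w_given_sr[(True, False)] * (1 - p_r_given_c[True]) * p_s_given_c[True] * p_c
    print(round(joint, 3))   # 0.9 * 0.2 * 0.1 * 0.5 = 0.009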

BAYESIAN BELIEF NETWORK


What is the probability of wet grass on a given day, P(W)?
P(W) = P(W|S,R) * P(S) * P(R) +
       P(W|S,¬R) * P(S) * P(¬R) +
       P(W|¬S,R) * P(¬S) * P(R) +
       P(W|¬S,¬R) * P(¬S) * P(¬R)
where P(S) = P(S|C) * P(C) + P(S|¬C) * P(¬C)
      P(R) = P(R|C) * P(C) + P(R|¬C) * P(¬C)
P(W) = 0.5985
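
A sketch reproducing this calculation, mirroring the slide's formula (the CPT numbers are the standard ones for this network and are consistent with the 0.5985 result):

    p_c = 0.5
    p_s_given = {True: 0.1, False: 0.5}      # P(S | C), P(S | -C)
    p_r_given = {True: 0.8, False: 0.2}      # P(R | C), P(R | -C)
    p_w_given = {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.9, (False, False): 0.0}   # P(W | S, R)

    p_s = p_s_given[True] * p_c + p_s_given[False] * (1 - p_c)   # 0.3
    p_r = p_r_given[True] * p_c + p_r_given[False] * (1 - p_c)   # 0.5

    p_w = sum(p_w_given[(s, r)]
              * (p_s if s else 1 - p_s)
              * (p_r if r else 1 - p_r)
              for s in (True, False) for r in (True, False))
    print(round(p_w, 4))   # 0.5985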

Advantages of the Bayesian Approach

Bayesian networks can readily handle incomplete data sets.
Bayesian networks allow one to learn about causal relationships.
Bayesian networks readily facilitate the use of prior knowledge.

Sources/References

Naive Bayes Spam Filtering Using Word-Position-Based Attributes - http://www.ceas.cc/papers-2005/144.pdf
by Johan Hovold, Department of Computer Science, Lund University, Box 118, 221 00 Lund, Sweden. [E-mail: johan.hovold.363@student.lu.se]
[Presented at CEAS 2005 - Second Conference on Email and Anti-Spam, July 21 & 22, at Stanford University]

Tom Mitchell, Machine Learning, Tata McGraw-Hill.

A Bayesian Approach to Filtering Junk E-Mail,
Mehran Sahami (Computer Science Department, Gates Building, Stanford University, Stanford, CA), Susan Dumais, David Heckerman, Eric Horvitz (Microsoft Research, Redmond, WA). [{sdumais, heckerma, horvitz}@microsoft.com]
[Presented at the AAAI Workshop on Learning for Text Categorization, July 1998, Madison, Wisconsin]

PROBLEM

A real-world Bayesian application: learning to classify text.
Instances are text documents.
We might wish to learn the target concept "electronic news articles that I find interesting", or "pages on the World Wide Web that discuss data mining topics".
In both cases, if a computer could learn the target concept accurately, it could automatically filter the large volume of online text documents to present only the most relevant documents to the user.

TECHNIQUE

Learning how to classify text, based on the naive Bayes classifier.
It is a probabilistic approach and is among the most effective algorithms currently known for learning to classify text documents.
The instance space X consists of all possible text documents.
We are given training examples of some unknown target function f(x), which can take on any value from some finite set V.
We will consider the target function of classifying documents as interesting or uninteresting to a particular person, using the target values "like" and "dislike" to indicate these two classes.

Design issues

How to represent an arbitrary text document in terms of attribute values.

How to estimate the probabilities required by the naive Bayes classifier.

Approach

Our approach to representing arbitrary text documents is disturbingly simple: given a text document, such as this paragraph, we define an attribute for each word position in the document and define the value of that attribute to be the English word found in that position. Thus, the current paragraph would be described by 111 attribute values, corresponding to the 111 word positions. The value of the first attribute is the word "our", the value of the second attribute is the word "approach", and so on. Notice that long text documents will require a larger number of attributes than short documents. As we shall see, this will not cause us any trouble.

ASSUMPTIONS

Assume we are given a set of 700 training documents that a friend has classified as "dislike" and another 300 she has classified as "like".
We are now given a new document and asked to classify it.
Let us assume the new text document is the preceding paragraph.

We know P(like) = 0.3 and P(dislike) = 0.7 in the current example.
We need P(ai = wk|vj) (here we introduce wk to indicate the kth word in the English vocabulary).
Estimating the class conditional probabilities (e.g., P(a1 = "our" | dislike)) is more problematic, because we must estimate one such probability term for each combination of text position, English word, and target value.
There are approximately 50,000 distinct words in the English vocabulary, 2 possible target values, and 111 text positions in the current example, so we must estimate 2 x 111 x 50,000 ≈ 10 million such terms from the training data.

We shall assume the probability of encountering a specific word wk (e.g., "chocolate") is independent of the specific word position being considered (e.g., a23 versus a95).
We estimate the entire set of probabilities P(a1 = wk|vj), P(a2 = wk|vj), ... by the single position-independent probability P(wk|vj).
The net effect is that we now require only 2 x 50,000 distinct terms of the form P(wk|vj).
We adopt the m-estimate, with uniform priors and with m equal to the size of the word vocabulary.

P(wk|vj) = (nk + 1) / (n + |Vocabulary|)
where n is the total number of word positions in all training examples whose target value is vj, nk is the number of times the word wk is found among these n word positions, and |Vocabulary| is the total number of distinct words (and other tokens) found within the training data.

Final Algorithm

LEARN_NAIVE_BAYES_TEXT(Examples, V)
Examples is a set of text documents along with their target values. V is the set of all possible target values. This function learns the probability terms P(wk|vj), describing the probability that a randomly drawn word from a document in class vj will be the English word wk. It also learns the class prior probabilities P(vj).
1. Collect all words, punctuation, and other tokens that occur in Examples:
   Vocabulary <- the set of all distinct words and tokens occurring in any text document from Examples
2. Calculate the required P(vj) and P(wk|vj) probability terms:
   For each target value vj in V do
     docsj <- the subset of documents from Examples for which the target value is vj
     P(vj) <- |docsj| / |Examples|
     Textj <- a single document created by concatenating all members of docsj
     n <- total number of word positions in Textj
     For each word wk in Vocabulary
       nk <- number of times word wk occurs in Textj
       P(wk|vj) <- (nk + 1) / (n + |Vocabulary|)

CLASSIFY_NAIVE_BAYES_TEXT(Doc)
Return the estimated target value for the document Doc. ai denotes the word found in the ith position within Doc.
   positions <- all word positions in Doc that contain tokens found in Vocabulary
   Return vNB, where
   vNB = argmax over vj in V of P(vj) * Π (over i in positions) P(ai|vj)
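
A compact Python sketch of these two procedures as described above (tokenization and the data structures are our choices; log-probabilities are used to avoid underflow on long documents):

    import math
    from collections import Counter, defaultdict

    def learn_naive_bayes_text(examples):
        # examples: list of (document_text, target_value) pairs
        vocabulary, docs_by_class = set(), defaultdict(list)
        for text, v in examples:
            tokens = text.lower().split()
            vocabulary.update(tokens)
            docs_by_class[v].append(tokens)

        priors, word_probs = {}, {}
        for v, docs in docs_by_class.items():
            priors[v] = len(docs) / len(examples)                  # P(vj)
            counts = Counter(w for doc in docs for w in doc)       # word counts in Textj
            n = sum(counts.values())                               # word positions in Textj
            word_probs[v] = {w: (counts[w] + 1) / (n + len(vocabulary))   # m-estimate
                             for w in vocabulary}
        return vocabulary, priors, word_probs

    def classify_naive_bayes_text(doc, vocabulary, priors, word_probs):
        positions = [w for w in doc.lower().split() if w in vocabulary]
        best, best_score = None, float("-inf")
        for v in priors:
            score = math.log(priors[v]) + sum(math.log(word_probs[v][w]) for w in positions)
            if score > best_score:
                best, best_score = v, score
        return best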

During learning, the procedure LEARN_NAIVE_BAYES_TEXT examines all training documents to extract the vocabulary of all words and tokens that appear in the text, then counts their frequencies among the different target classes to obtain the necessary probability estimates. Later, given a new document to be classified, the procedure CLASSIFY_NAIVE_BAYES_TEXT uses these probability estimates to calculate vNB according to the equation above. Note that any words appearing in the new document that were not observed in the training set are simply ignored by CLASSIFY_NAIVE_BAYES_TEXT.

Effectiveness of the Algorithm

Problem: classifying Usenet news articles.
The target classification for an article is the name of the Usenet newsgroup in which the article appeared.
In the experiment described by Joachims (1996), 20 electronic newsgroups were considered (listed below).
1,000 articles were collected from each newsgroup, forming a data set of 20,000 documents. The naive Bayes algorithm was then applied using two-thirds of these 20,000 documents as training examples, and performance was measured over the remaining third.
The 100 most frequent words were removed (these include words such as "the" and "of"), and any word occurring fewer than three times was also removed. The resulting vocabulary contained approximately 38,500 words.
The accuracy achieved by the program was 89%.
comp.graphics, comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware, comp.windows.x,
misc.forsale, rec.autos, rec.motorcycles, rec.sport.baseball, rec.sport.hockey,
alt.atheism, soc.religion.christian, talk.religion.misc, talk.politics.guns, talk.politics.mideast, talk.politics.misc,
sci.space, sci.crypt, sci.electronics, sci.med

APPLICATIONS

A newsgroup posting service that learns to assign documents to the appropriate newsgroup.
The NEWSWEEDER system - a program for reading netnews that allows the user to rate articles as he or she reads them. NEWSWEEDER then uses these rated articles (i.e., its learned profile of user interests) to suggest the most highly rated new articles each day.
Naive Bayes spam filtering using word-position-based attributes.

Thank you !

Bayesian Learning Networks Approach to Cybercrime Detection

Bayesian Learning Networks Approach to Cybercrime Detection

N S ABOUZAKHAR, A GANI and G MANSON
The Centre for Mobile Communications Research (C4MCR),
University of Sheffield, Sheffield
Regent Court, 211 Portobello Street,
Sheffield S1 4DP, UK
N.Abouzakhar@dcs.shef.ac.uk
A.Gani@dcs.shef.ac.uk
G.Manson@dcs.shef.ac.uk

M ABUITBEL and D KING
The Manchester School of Engineering,
University of Manchester
IT Building, Room IT 109,
Oxford Road,
Manchester M13 9PL, UK
mostafa.abuitbel@stud.man.ac.uk
David.king@man.ac.uk

REFERENCES
1. David J. Marchette, Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint, 2001, Springer-Verlag, New York, Inc, USA.
2. Heckerman, D. (1995), A Tutorial on Learning with Bayesian Networks, Technical Report MSR-TR-95-06, Microsoft Corporation.
3. Michael Berthold and David J. Hand, Intelligent Data Analysis: An Introduction, 1999, Springer, Italy.
4. http://www.ll.mit.edu/IST/ideval/data/data_index.html, accessed on 01/12/2002.
5. http://kdd.ics.uci.edu/, accessed on 01/12/2002.
6. Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2000, Morgan Kaufmann, USA.
7. http://www.bayesia.com, accessed on 20/12/2002.

Motivation behind the paper

Growing dependence of modern society on telecommunication and information networks.
The increase in the number of networks interconnected to the Internet has led to an increase in security threats and cybercrimes.

Structure of the paper

In order to detect distributed network attacks as early as possible, a probabilistic approach based on Bayesian networks, currently under research and development, has been proposed.

Where can this model be utilized?

Learning agents which deploy the Bayesian network approach are considered a promising and useful tool for determining suspicious early events of Internet threats.

Before we look at the details given in the paper, let's understand what Bayesian networks are and how they are constructed.

Bayesian Networks

A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.

Syntax:
a set of nodes, one per variable
a directed, acyclic graph (a link means "directly influences")
a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
Some conventions

Variables are depicted as nodes.
Arcs represent probabilistic dependence between variables.
Conditional probabilities encode the strength of the dependencies.
Missing arcs imply conditional independence.

Semantics
The full joint distribution is defined as the product of the local conditional distributions:

P(X1, ..., Xn) = Π (i = 1 to n) P(Xi | Parents(Xi))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e)
= P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
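
As a quick numerical check in Python (the CPT values here are the usual textbook numbers for the burglary/alarm example and are assumptions, since they are not shown on this slide):

    # Assumed CPT entries for the burglary network
    p_b, p_e = 0.001, 0.002                # P(Burglary), P(Earthquake)
    p_a_given_nb_ne = 0.001                # P(Alarm | -Burglary, -Earthquake)
    p_j_given_a, p_m_given_a = 0.90, 0.70  # P(JohnCalls | Alarm), P(MaryCalls | Alarm)

    # P(j ^ m ^ a ^ -b ^ -e) = P(j|a) P(m|a) P(a|-b,-e) P(-b) P(-e)
    joint = p_j_given_a * p_m_given_a * p_a_given_nb_ne * (1 - p_b) * (1 - p_e)
    print(round(joint, 7))   # ~0.0006281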

Example of Construction of a BN

Back to the discussion of the paper.

Description

This paper shows how a Bayesian network probabilistically detects communication network attacks, allowing for generalization of Network Intrusion Detection Systems (NIDSs).

Goal
How well does our model detect or classify attacks and respond to them later on?
The system requires the estimation of two quantities:
the probability of detection (PD)
the probability of false alarm (PFA)
It is not possible to simultaneously achieve a PD of 1 and a PFA of 0.

Input Dataset

The 2000 DARPA Intrusion Detection Evaluation Program, which was prepared and managed by MIT Lincoln Labs, has provided the necessary dataset.

Sample dataset

Construction of the network


The following figure shows the Bayesian
network that has been automatically
constructed by the learning algorithms of
BayesiaLab.
The target variable, activity_type, is directl
y
connected to the variables that heavily
contribute to its knowledge such as servic

e
and protocol_type.

Data Gathering
MIT Lincoln Labs set up an environment to acquire several weeks of raw TCP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. The generated raw dataset contains a few million connection records.

Mapping the simple Bayesian network that we saw to the one used in the paper.

Observation 1:
As shown in the next figure, the most probable activity corresponds to a smurf attack (52.90%), an ecr_i (ECHO_REPLY) service (52.96%) and the icmp protocol (53.21%).

Observation 2:

What would happen if the probability of receiving ICMP protocol packets is increased? Would the probability of having a smurf attack increase?
Setting the protocol to its ICMP value increases the probability of having a smurf attack from 52.90% to 99.37%.

Observation 3:

Let's look at the problem from the opposite direction. If we set the probability of a portsweep attack to 100%, then the values of some associated variables will inevitably vary.
We note from Figure 4 that the probabilities of the TCP protocol and the private service have increased from 38.10% to 97.49% and from 24.71% to 71.45% respectively. We can also notice an increase in the REJ and RSTR flags.

How do the previous examples work?
PROPAGATION: evidence entered at one node (e.g., setting protocol = icmp) is propagated through the network, updating the probabilities at the other nodes.


Benefits of the Bayesian Model

The benefit of using Bayesian IDSs is the ability to adjust our IDS's sensitivity.
This allows us to trade off between accuracy and sensitivity.
Furthermore, the automatic detection of network anomalies by learning allows the normal activities to be distinguished from the abnormal ones.
It also allows network security analysts to see the amount of information being contributed by each variable in the detection model to the knowledge of the target node.

Performance evaluation

QUESTIONS OR QUERIES

Thank you !
