414, 1/4/05
Module 1
The Axioms of Probability
SET THEORY
We begin our discussion of set theory with a number of definitions and specifications of
nomenclature. A set is a collection of objects called elements. A subset B of a set A is a set
all of whose elements are contained in A. The set notation is as follows:
$$A = \{\zeta_1, \zeta_2, \ldots, \zeta_n\},$$
where the Greek letters zeta, $\zeta_i$, are the elements of the set A. Alternatively, one can
specify a set by stating a property that all of its elements share.
For example, denote the faces of a die by $f_i$. These faces are elements of the set
$S = \{f_1, f_2, \ldots, f_6\}$. Here $n = 6$, thus S has $2^6 = 64$ subsets,
$$\{\},\ \{f_1\},\ \ldots,\ \{f_6\},\ \{f_1, f_2\},\ \ldots,\ \{f_1, f_2, f_3\},\ \ldots,\ S.$$
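The count of $2^6 = 64$ subsets can be checked by direct enumeration. The following is a minimal Python sketch; the labels `f1` to `f6` and the helper name `all_subsets` are chosen here for illustration only.

```python
from itertools import chain, combinations

def all_subsets(elements):
    """Return every subset of `elements` as a frozenset, from {} up to the full set."""
    items = list(elements)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r)
                                         for r in range(len(items) + 1))]

S = {"f1", "f2", "f3", "f4", "f5", "f6"}   # faces of the die
subsets = all_subsets(S)
print(len(subsets))                         # number of subsets of S
```

The empty set and S itself are both included in the list, as in the enumeration above.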
$$D = \{ht\};$$
These subsets are represented by their properties. The above examples dealt with discrete events;
an example of a continuum of events might be the set of all points in the square with sides
(0, T), where S = {set of all points in square} (see sketch).
[Sketch: the square $0 \le x \le T$, $0 \le y \le T$ in the (x, y) plane.]
A possible subset of S might be the points such that $x \le y$, the shaded portion in this sketch:
[Sketch: the same square with the region $x < y$, above the line $x = y$, shaded.]
Set Operations
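We can represent sets graphically in terms of what are known as Venn diagrams. For
example, the following statements can be represented this way:
If $C \subset B$ and $B \subset A$ then $C \subset A$.
$A = B$ if and only if $A \subset B$ and $B \subset A$.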
You should be able to demonstrate these properties using Venn diagrams. We also define a
number of operations:
Union: $A \cup B$, also written $A + B$
This is the set whose elements are in set A or the set B, or in both sets A and B. This union
operation is associative and commutative:
Associative: $(A + B) + C = A + (B + C)$;
Commutative: $A + B = B + A$.
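Intersection: $A \cap B$, also written $AB$
This is the set with elements that are common to both A and B. This operation is commutative,
associative, and distributive:
Commutative: $AB = BA$;
Associative: $(A \cap B) \cap C = A \cap (B \cap C)$;
Distributive: $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.
Definition: two sets A and B are mutually exclusive (disjoint) if they have no elements in common, i.e.,
$$AB = \{\}, \quad \text{or} \quad A \cap B = \{\}.$$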
A partition of a set is a collection of mutually exclusive and exhaustive subsets of the original
set. For example, a partition of the set, S, would be denoted
$$[A_1, A_2, \ldots, A_n],$$
where
$$A_1 + A_2 + \cdots + A_n = S; \qquad A_i A_j = \{\} \ \text{for } i \ne j.$$
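The complement, $\bar{A}$, of a set A is the set consisting of all elements of S that are not in A. It follows
that
$$A + \bar{A} = S; \qquad A\bar{A} = \{\}; \qquad \bar{S} = \{\}.$$
De Morgan's law states that
$$\overline{A + B} = \bar{A}\,\bar{B}; \qquad \overline{AB} = \bar{A} + \bar{B}.$$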
You can see a graphical depiction of this relationship in De Morgan's Law on the
Mathematical Reference Tools page. It follows directly from this law that if unions and
intersections are interchanged and all sets are replaced by their complements, then the identity is
unchanged.
original            replacement
$A$                 $\bar{A}$
$\bar{A}$           $A$
$\cup$ or $+$       $\cap$ or $\cdot$
$\cap$ or $\cdot$   $\cup$ or $+$
$S$                 $\{\}$
$\{\}$              $S$
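De Morgan's law can be checked on a small finite example. The sketch below uses Python's built-in `frozenset`; the universal set (the six die faces) and the two events A and B are illustrative choices, not taken from the notes.

```python
S = frozenset(range(1, 7))   # die faces, used as the universal set
A = frozenset({2, 4, 6})     # illustrative event: even face
B = frozenset({1, 2, 3})     # illustrative event: face of at most three

def complement(X, universe=S):
    """Elements of the universe that are not in X."""
    return universe - X

# Complement of a union equals the intersection of the complements.
law1 = complement(A | B) == complement(A) & complement(B)
# Complement of an intersection equals the union of the complements.
law2 = complement(A & B) == complement(A) | complement(B)
print(law1, law2)
```

Here `|` plays the role of the union (+) and `&` the role of the intersection (product) in the notation of these notes.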
PROBABILITY SPACE
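The set S of all possible outcomes is called the certain event; for the die,
$S = \{f_1, f_2, f_3, f_4, f_5, f_6\}$. To each event A we assign a number $P(A)$, the
probability of A, chosen to satisfy the following axioms:
I    $P(A) \ge 0$
II   $P(S) = 1$
III  If $AB = \{\}$, then $P(A + B) = P(A) + P(B)$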
Note that the last axiom deals with mutual exclusivity of the events A and B.
Two important properties that follow from these axioms are that
$$P\{\} = 0,$$
and
$$\text{for any } A, \quad P(A) = 1 - P(\bar{A}).$$
Proof of the second property relies on the fact that $P(A + \bar{A}) = 1$, i.e., $A + \bar{A}$ is the certain event (axiom
#2), and $P(A + \bar{A}) = P(A) + P(\bar{A})$ because of the mutual exclusivity of the two events (axiom
#3). Note that $P(A) = 1$ does not imply that A is the certain event and that $P(A) = 0$ does not
imply that A is the empty set, {} (see the Equality of events discussion that follows).
Now in general, the probability of the union of two events is given by
$$P(A + B) = P(A) + P(B) - P(AB).$$
As an exercise, draw the Venn diagrams for the following proof of this relationship.
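The proof is given by writing A + B and B as unions of mutually exclusive events:
$$A + B = A + \bar{A}B; \qquad B = AB + \bar{A}B.$$
By axiom #3, $P(A + B) = P(A) + P(\bar{A}B)$ and $P(B) = P(AB) + P(\bar{A}B)$; eliminating
$P(\bar{A}B)$ gives the result. Similarly, if $B \subset A$, then $A = B + A\bar{B}$ and therefore
$$P(A) \ge P(B).$$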
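The union formula and the two decompositions used in its proof can be verified numerically. This is a minimal sketch that assumes a fair die (equally likely faces) and two illustrative events; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

S = frozenset(range(1, 7))

def P(event):
    """Probability of an event for a fair die (assumption: equally likely faces)."""
    return Fraction(len(event), len(S))

A = frozenset({2, 4, 6})   # illustrative event
B = frozenset({1, 2, 3})   # illustrative event
not_A = S - A

lhs = P(A | B)                    # P(A + B)
rhs = P(A) + P(B) - P(A & B)      # P(A) + P(B) - P(AB)
step1 = P(A) + P(not_A & B)       # A + B written as A plus (not A)B
step2 = P(A & B) + P(not_A & B)   # B written as AB plus (not A)B
print(lhs, rhs, step1, step2)
```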
Equality of events
The events A and B are equal if A and B consist of the same elements. The events A
and B are equal with probability 1 if $P(A\bar{B} + \bar{A}B) = 0$. What are the implications of this? Well,
if $P(A) = P(B)$ then A and B are equal in probability. This does not mean that A and B are
equal. In fact, they may be mutually exclusive. Moreover, "$P(A) = P(B)$" tells us nothing about
$P(AB)$. But if $P(A\bar{B} + \bar{A}B) = 0$, then we conclude that
$$P(A) = P(B) = P(AB).$$
Class, F , of events
For one reason or another, we may not be interested in all possible subsets of a set. For
example, in the casting of a die, we may be interested only in the showing of an even number of
spots. In this case, the class, F , of subsets that we would consider consists of the events
{} , {even} , {odd } , S . We thus would assign probabilities only to these four events. In other
cases, it may not be possible to assign probabilities to all possible outcomes, e.g., the probability
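of choosing a particular real number (the set of real numbers is uncountably infinite). We thus
make the following definition: a field F is a nonempty class of sets such that
If $A \in F$ then $\bar{A} \in F$.
If $A \in F$ and $B \in F$ then $A \cup B \in F$.
It follows that
if $A \in F$ and $B \in F$ then $A \cap B \in F$.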
A field also contains the certain event, S, as well as the impossible event, {} .
Finally, all subsets that can be written as unions or intersections of subsets of F are also in F .
Borel Fields
Suppose $A_1, A_2, \ldots, A_n, \ldots$ is an infinite sequence of sets in F. If all unions and intersections
of these sets also belong to F , then F is a Borel field. The class of all subsets of S is a Borel
field. Suppose C is a class of subsets of S, but is not a Borel field. If we attach other subsets of
S, we can form a field with C as a subset. In fact there exists a smallest Borel field containing all
the elements of C .
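Example
Let $S = \{a, b, c, d\}$ and let C consist of the sets $\{a\}, \{b\}$.
The smallest (Borel) field containing $\{a\}, \{b\}$ consists of the sets
$$\{\},\ \{a\},\ \{b\},\ \{a, b\},\ \{c, d\},\ \{b, c, d\},\ \{a, c, d\},\ S.$$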
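For a finite S, the smallest field containing a given class can be built by repeatedly adding complements and unions until nothing new appears. The following Python sketch does this for the example; the function name `smallest_field` is hypothetical, and the approach relies on S being finite.

```python
def smallest_field(S, generators):
    """Close `generators` under complement and union within the finite set S."""
    S = frozenset(S)
    field = {frozenset(), S} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)
        for X in current:
            # Candidates: the complement of X and its union with every known set.
            candidates = [S - X] + [X | Y for Y in current]
            for c in candidates:
                if c not in field:
                    field.add(c)
                    changed = True
    return field

F = smallest_field("abcd", [{"a"}, {"b"}])
for X in sorted(F, key=lambda s: (len(s), sorted(s))):
    print(sorted(X))
```

Closure under intersection does not need a separate step: by De Morgan's law it follows from closure under complement and union.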
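If the events $A_1, A_2, \ldots, A_n$ are mutually exclusive, repeated application of axiom #3 gives
$$P(A_1 + A_2 + \cdots + A_n) = P(A_1) + P(A_2) + \cdots + P(A_n).$$
Extension to infinitely many sets does not follow directly. Rather, we cite the axiom of infinite
additivity: if the events $A_1, A_2, \ldots$ are mutually exclusive, then
$$P(A_1 + A_2 + \cdots) = P(A_1) + P(A_2) + \cdots$$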
Question: Is the class of events $\{\}, \{h\}, \{t\}, S$ of a coin toss, with $S = \{h, t\}$, a Borel field? Ans: Yes, it contains complements, unions and intersections of the
elementary events. We assign probabilities as follows:
$$P\{\} = 0; \qquad P\{t\} = q; \qquad P\{h\} = p; \qquad P(S) = p + q = 1.$$
These concepts have addressed discrete events, but can be extended to the real line. For example,
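suppose $\alpha(x)$ is a function such that
$$\int_{-\infty}^{\infty} \alpha(x)\,dx = 1; \qquad \alpha(x) \ge 0.$$
We define the probability of the event $\{x \le x_i\}$ as
$$P\{x \le x_i\} = \int_{-\infty}^{x_i} \alpha(x)\,dx.$$
Likewise,
$$P\{x_1 < x \le x_2\} = \int_{x_1}^{x_2} \alpha(x)\,dx.$$
Since the events $\{x \le x_1\}$ and $\{x_1 < x \le x_2\}$ are mutually exclusive and
$$\{x \le x_1\} + \{x_1 < x \le x_2\} = \{x \le x_2\},$$
it follows that
$$P\{x \le x_1\} + P\{x_1 < x \le x_2\} = P\{x \le x_2\}.$$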
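The additivity relation above can be checked numerically for a concrete density. The sketch below is only an illustration: the exponential density, the function names `alpha` and `integrate`, and the values of `x1` and `x2` are all assumptions made here, not part of the notes.

```python
import math

def alpha(x):
    """Illustrative density (assumption): exponential, zero for x < 0."""
    return math.exp(-x) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

x1, x2 = 0.5, 2.0
# The density vanishes below 0, so integrating from 0 stands in for -infinity.
p_le_x1 = integrate(alpha, 0.0, x1)     # P{x <= x1}
p_between = integrate(alpha, x1, x2)    # P{x1 < x <= x2}
p_le_x2 = integrate(alpha, 0.0, x2)     # P{x <= x2}
print(p_le_x1 + p_between, p_le_x2)
```

The two printed values agree up to the error of the numerical integration.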
CONDITIONAL PROBABILITY
The probability of a given event provided that another event has occurred is called a conditional
probability. For example, the conditional probability of the event A, given that event M has
occurred is written
$$P(A|M) = \frac{P(AM)}{P(M)}; \qquad P(M) \ne 0.$$
If $A \subset M$, then
$$P(A|M) = \frac{P(A)}{P(M)} \ge P(A)$$
(because $AM = A$).
Question: Do conditional probabilities satisfy the Axioms of Probability?
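1) Because $P(AM) \ge 0$ and $P(M) > 0$, $P(A|M) \ge 0$.
1st Axiom OK
2) Because $SM = M$, $P(S|M) = P(M)/P(M) = 1$.
2nd Axiom OK
3) If $AB = \{\}$, then AM and BM are also mutually exclusive, so
$$P((A + B)|M) = \frac{P((A + B)M)}{P(M)} = \frac{P(AM) + P(BM)}{P(M)} = P(A|M) + P(B|M).$$
3rd Axiom OK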
We conclude that all results involving probabilities hold also for conditional probabilities.
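The three checks above can be reproduced on a small example. This is a minimal sketch assuming a fair die; the conditioning event M and the mutually exclusive events A and B are illustrative choices, and the helper name `P_given` is hypothetical.

```python
from fractions import Fraction

S = frozenset(range(1, 7))

def P(event):
    """Probability of an event for a fair die (assumption)."""
    return Fraction(len(event), len(S))

def P_given(A, M):
    """Conditional probability P(A|M) = P(AM) / P(M), defined only for P(M) != 0."""
    if P(M) == 0:
        raise ValueError("P(M) must be nonzero")
    return P(A & M) / P(M)

M = frozenset({2, 4, 6})   # conditioning event: even face
A = frozenset({2})         # A and B are mutually exclusive
B = frozenset({4, 5})

axiom1 = P_given(A, M) >= 0
axiom2 = P_given(S, M) == 1
axiom3 = P_given(A | B, M) == P_given(A, M) + P_given(B, M)
print(P_given(A, M), axiom1, axiom2, axiom3)
```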
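Total Probability and Bayes' Theorem (see biographical note on the Glossary page)
For a partition $U = [A_1, A_2, \ldots, A_n]$ of S and an arbitrary event B,
$$P(B) = P(B|A_1)P(A_1) + \cdots + P(B|A_n)P(A_n). \qquad (1)$$
Proof:
$$B = BS = B(A_1 + A_2 + \cdots + A_n) = BA_1 + BA_2 + \cdots + BA_n.$$
The events $BA_i$ are mutually exclusive because the $A_i$ are, so
$$P(B) = P(BA_1) + P(BA_2) + \cdots + P(BA_n). \qquad (2)$$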
Furthermore, because
$$P(BA_i) = P(B|A_i)P(A_i),$$
substituting into (2) yields (1). Likewise, because
$$P(BA_i) = P(A_i|B)P(B),$$
we obtain
$$P(A_i|B) = P(B|A_i)\,\frac{P(A_i)}{P(B)},$$
and inserting (1) for $P(B)$, we therefore have:
Bayes' Theorem
$$P(A_i|B) = \frac{P(B|A_i)P(A_i)}{P(B|A_1)P(A_1) + \cdots + P(B|A_n)P(A_n)}.$$
Nomenclature:
$P(A_i)$: a priori probability
$P(A_i|B)$: a posteriori probability
Example
Consider four bins containing the indicated total number of components, some good and some
defective (see sketch). The experiment consists of randomly picking a bin and from it choosing
one component. What is the probability that the chosen component is defective, P ( D ) ?
Bin   Total   Good (g)   Defective (d)
B1    2000    1900       100
B2     500     300       200
B3    1000     900       100
B4    1000     900       100
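Because the bin is picked at random,
$$P(B_1) = P(B_2) = P(B_3) = P(B_4) = \frac{1}{4}.$$
The conditional probabilities of choosing a defective component are
$$P(D|B_1) = \frac{100}{2000} = 0.05; \qquad P(D|B_2) = \frac{200}{500} = 0.40;$$
$$P(D|B_3) = \frac{100}{1000} = 0.10; \qquad P(D|B_4) = \frac{100}{1000} = 0.10.$$
By total probability,
$$P(D) = P(D|B_1)P(B_1) + \cdots + P(D|B_4)P(B_4) = 0.05 \cdot \frac{1}{4} + 0.40 \cdot \frac{1}{4} + 0.10 \cdot \frac{1}{4} + 0.10 \cdot \frac{1}{4},$$
$$P(D) = 0.1625.$$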
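The total probability calculation can be reproduced in a few lines. The sketch below takes the bin counts from the example and uses exact fractions; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# (good, defective) counts per bin, taken from the example
bins = {"B1": (1900, 100), "B2": (300, 200), "B3": (900, 100), "B4": (900, 100)}

# Each bin is equally likely to be picked: P(Bi) = 1/4
prior = {name: Fraction(1, len(bins)) for name in bins}
# P(D|Bi) = defective count / total count in bin i
p_def_given = {name: Fraction(d, g + d) for name, (g, d) in bins.items()}

# Total probability: P(D) = sum over i of P(D|Bi) P(Bi)
p_D = sum(p_def_given[name] * prior[name] for name in bins)
print(p_D, float(p_D))
```

Note that P(D) is not the overall fraction of defective components (500 out of 4500), because the bin is chosen first with equal probability regardless of its size.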
Now suppose that we examine the selected component and find that it is defective. What is the
probability that it came from bin #2, i.e., what is $P(B_2|D)$? We know that
$P(D) = 0.1625$
$P(D|B_2) = 0.40$
$P(B_2) = 0.25$
Furthermore, $P(D)P(B_2|D) = P(B_2)P(D|B_2)$,
or rearranging this equation, we have
$$P(B_2|D) = \frac{P(B_2)P(D|B_2)}{P(D)} = \frac{(0.25)(0.40)}{0.1625} = 0.615.$$
The a priori probability of selecting bin #2 is 0.25, and the a posteriori probability of the defective
part having come from bin #2 is 0.615.
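The same calculation gives the a posteriori probability of every bin at once. This is a minimal sketch of Bayes' theorem applied to the example's numbers; the function name `posterior` is hypothetical.

```python
from fractions import Fraction

# A priori probabilities P(Bi) and conditional probabilities P(D|Bi) from the example
prior = {"B1": Fraction(1, 4), "B2": Fraction(1, 4),
         "B3": Fraction(1, 4), "B4": Fraction(1, 4)}
likelihood = {"B1": Fraction(100, 2000), "B2": Fraction(200, 500),
              "B3": Fraction(100, 1000), "B4": Fraction(100, 1000)}

def posterior(prior, likelihood):
    """Bayes' theorem: P(Bi|D) = P(D|Bi) P(Bi) / sum over k of P(D|Bk) P(Bk)."""
    evidence = sum(likelihood[k] * prior[k] for k in prior)   # P(D), total probability
    return {k: likelihood[k] * prior[k] / evidence for k in prior}

post = posterior(prior, likelihood)
for name, p in post.items():
    print(name, round(float(p), 3))
```

The a posteriori probabilities sum to 1, since the bins form a partition.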
Independence
The events A and B are said to be independent if
$$P(AB) = P(A)P(B).$$
Example
Consider two trains, X and Y, arriving in a station between 8:00 AM and 8:20 AM. Train X
stops for 4 min and train Y for 5 min. We assume that the arrival times are independent. The
space of arrival times is illustrated in the sketch.
[Sketch: the square of arrival times, 0 to 20 min along the X axis and 0 to 20 min along the Y axis.]
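If we define the event A = {train X arrives in interval $(t_1, t_2)$}, then it seems intuitive that the
probability of this event be proportional to the length of this interval;
$$P(A) = \frac{t_2 - t_1}{20} = P\{t_1 \le x \le t_2\},$$
where x and y denote the arrival times of trains X and Y. Likewise, for the event
B = {train Y arrives in interval $(t_3, t_4)$},
$$P(B) = \frac{t_4 - t_3}{20}.$$
[Sketch: the rectangle $t_1 \le x \le t_2$, $t_3 \le y \le t_4$ inside the square of arrival times.]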
Note that this may appear to be a Venn diagram, but it is not. Why? Based on our assumption of
independence, we have the following:
$$P(AB) = P(A)P(B) = \frac{(t_2 - t_1)(t_4 - t_3)}{400}.$$
Question: Why is this probability not the probability that both trains are in the station at the same
time?
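Now consider the event $C = \{x \le y\}$, i.e., train X arrives before train Y. This region of the
space of arrival times is illustrated below.
[Sketch: the square of arrival times divided by the line $x = y$ into the region $y > x$ above it and the region $y < x$ below it.]
The region $y > x$ is a triangle of area 200, so
$$P(C) = \frac{200}{400}.$$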
Finally, we ask: what is the probability that the trains meet (the commuter knows from
experience that this probability is zero)? To attack this problem, consider the sketches below.
These time lines are, in fact, variations on the theme of Venn diagrams.
[Sketch: two time lines. Train X is in the station from x to x + 4; train Y is in the station from y to y + 5.]
To have any overlap between these two periods, we must satisfy the two following constraints:
$$y \le x + 4 \quad \text{(train Y arrives before train X leaves)};$$
$$x \le y + 5 \quad \text{(train X arrives before train Y leaves)}.$$
The combination of these two requirements gives the event $D = \{-4 \le x - y \le 5\}$. This region is
shown shaded in the sketch below,
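[Sketch: the square of arrival times with the band between the lines $y = x + 4$ and $y = x - 5$ shaded.]
The shaded band has area $400 - \tfrac{1}{2}(16)^2 - \tfrac{1}{2}(15)^2 = 159.5$, so
$$P(D) = \frac{159.5}{400}.$$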
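The value 159.5/400 can be cross-checked both geometrically and by simulation. The sketch below assumes, as in the example, independent arrival times that are uniform over the 20-minute window; the function name `estimate_meet`, the number of trials, and the seed are arbitrary choices made here.

```python
import random
from fractions import Fraction

T = 20                 # arrival window in minutes
STOP_X, STOP_Y = 4, 5  # stopping times of trains X and Y

# Exact area of the band -STOP_X <= x - y <= STOP_Y inside the T-by-T square:
# the square minus the two corner triangles cut off by y = x + 4 and y = x - 5.
area = Fraction(T * T) - Fraction((T - STOP_X) ** 2, 2) - Fraction((T - STOP_Y) ** 2, 2)
p_meet_exact = area / (T * T)

def estimate_meet(trials=200_000, seed=0):
    """Monte Carlo estimate using independent uniform arrival times."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = rng.uniform(0, T), rng.uniform(0, T)
        if -STOP_X <= x - y <= STOP_Y:   # the two stays in the station overlap
            hits += 1
    return hits / trials

p_meet_mc = estimate_meet()
print(float(area), float(p_meet_exact), p_meet_mc)
```

The simulated frequency should land close to the exact value, with the usual sampling scatter.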
End Module 1