
1. Introduction to Probability

Most people are familiar with the idea of a probability. For instance, if we know that a coin
is unbiased, the probability that any coin toss results in heads would be 1/2. More generally, a
probability represents the degree of confidence we can place upon the occurrence of a future event.
Many fields of science and economics involve the study of future events, and it is often the goal in
these fields to find patterns within observed data. In order to conclude that a pattern exists, however,
we must first know what it means for a process to take place randomly. Elements of probability
theory are therefore naturally used to develop models to predict future events. These types of
problems are not limited to the physical sciences, but also apply very generally to many fields of
economics, competitive games, and a wide range of processes that occur according to chance.

1.1 Probabilities and events


A probability is the frequency at which some event is likely to occur according to chance. Such an
event might be rolling a six on a die, a bus arriving at a bus stop, or a coin toss resulting in heads.
Events are often denoted using uppercase letters, A, B,C, . . ., etc. For instance, we might designate
drawing an ace from a deck of cards as event A or a bus arriving at a particular bus stop as event B.
We might also use subscripts (A1 , A2 , A3 , . . .) to denote different possible events. The probability of
an event A is then denoted Pr(A). For example, if A is the event that a particular coin toss results in
heads, we would have Pr(A) = 1/2, assuming the coin is unbiased. Probabilities are always given as
a number between 0 and 1.

Rule 1.1 — Probabilities. The probability of any event A occurring always gives

0 ≤ Pr(A) ≤ 1 (1.1)

When event A is impossible, we have Pr(A) = 0, while an event that inevitably occurs is
assigned a probability of Pr(A) = 1. (As a short side-note, although it is the case that an impossible
event is assigned a probability of zero, it is not the case that all events assigned a probability of
zero are impossible. We learn why this is in Chapter 3.)

Equally-likely outcomes
In probability theory, we often consider cases in which we can divide possible events into equally-
likely outcomes. In the most straightforward cases, we have some number of equally likely
outcomes that cannot co-occur. The numbers in a single roll of a die are a good example: you
cannot roll both a two and a three on any particular roll of the die. When two or more events
cannot co-occur, these events are called mutually exclusive.
Definition 1.2 — Mutually exclusive events. Events which cannot co-occur are called
mutually exclusive.

Consider now an experiment in which there are some number of mutually exclusive outcomes.
For instance, if we roll a die, there are six mutually exclusive outcomes. In specific cases, we
can assume that all of the possible outcomes are equally likely. If there are n mutually exclusive
outcomes that are equally likely, then each possible outcome occurs with probability 1/n.

Rule 1.3 — Probabilities for n mutually exclusive outcomes. For n mutually exclusive out-
comes, each equally likely, the probability p of any particular outcome occurring is
p = 1/n (1.2)

■ Example 1.1 — Rolling a die. As a simple example, suppose we roll a six-sided die. Let us
designate the event that a single die roll results in one as A1, the event that the roll results in two as A2,
etc. If the die is fair, each of the six possibilities is equally likely. The probability that any
particular event occurs is Pr(Ai) = 1/6 for all i = 1, 2, 3, 4, 5, 6. ■

■ Example 1.2 — Tossing a coin. In another simple example we toss a coin, which lands either
heads or tails. We can let H represent the event that the coin toss results in heads, and T
represent the event of landing on tails. Assuming that the coin is fair, the probability of landing
heads is Pr(H) = 1/2. Similarly, the probability of landing tails is Pr(T) = 1/2. ■
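The counting behind Rule 1.3 and the two examples above can be sketched in a few lines of Python. This is a minimal illustration (the variable names are our own), using exact fractions rather than floating-point numbers:

```python
from fractions import Fraction

# Outcome space for a single roll of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]

# Rule 1.3: with n mutually exclusive, equally likely outcomes,
# each outcome occurs with probability 1/n.
n = len(outcomes)
p = Fraction(1, n)

print(p)           # 1/6
print(p * n == 1)  # the probabilities across all outcomes sum to 1 -> True
```

The same sketch with `outcomes = ["heads", "tails"]` gives p = 1/2, matching Example 1.2.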

Suppose again that we have several possible events, each equally likely. In some cases, we
consider more than one of these events to represent a “success”. A success can be any
designated set of outcomes, such as drawing an ace from a deck of cards, rolling the same
value with a pair of dice, etc. Assuming all possible events are equally likely, the total probability
of success is equal to the fraction of these events that count as a success.

Rule 1.4 — Probability of success for equally-likely outcomes. Suppose that we have n
possible mutually-exclusive outcomes, each equally-likely. If m of these possible events comprise
a success, the probability of success is
p = m/n (1.3)

Again, note that the probability p of some type of outcome is between 0 and 1. Assuming
that some, but not all, of the outcomes comprise a success, this probability lies strictly between
0 and 1.
■ Example 1.3 — Drawing a ball from an urn. Suppose there are 100 balls in an urn, where 20
are colored white and 80 are colored red. If we were to choose a ball at random from the urn, we
would have the probabilities Pr(White) = 20/100 = 1/5 and Pr(Red) = 80/100 = 4/5 for choosing a white
ball or a red ball, respectively. ■
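Rule 1.4 amounts to counting successes and dividing by the number of outcomes. A minimal sketch for the urn of Example 1.3 (the helper `pr` and the list representation are illustrative choices, not part of the text):

```python
from fractions import Fraction

# The urn from Example 1.3: 20 white balls and 80 red balls,
# each ball equally likely to be drawn.
balls = ["white"] * 20 + ["red"] * 80

def pr(success_color, urn):
    """Rule 1.4: probability of success = m (successes) / n (outcomes)."""
    m = urn.count(success_color)
    n = len(urn)
    return Fraction(m, n)

print(pr("white", balls))  # 1/5
print(pr("red", balls))    # 4/5
```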

Outcome spaces

For any trial or experiment, we should always be aware of the set of possible outcomes. In some
cases, such as the roll of a die or the toss of a coin, the set of possible outcomes is relatively
straight-forward. For any particular experiment, the set of all possible outcomes is called the
outcome space.
Definition 1.5 — Outcome spaces. The set of possible outcomes of a given experiment is
called the outcome space of the experiment.

The outcome space is usually denoted as Ω (read “Omega”). For example, if we were to roll
a die, the outcome space would consist of six possible outcomes, Ω = {1, 2, 3, 4, 5, 6}. Similarly,
if we were to draw balls from an urn containing some number of red balls and white balls, the
outcome space would be Ω = {Red, White}. In many cases, such as the roll of a die or a toss of a
coin, the outcome space is finite. In other cases, the outcome space can be infinite. For instance, in
some cases the outcome space can comprise every non-negative integer (Ω = {0, 1, 2, 3, . . .}) or even
every real number (Ω = ℝ, the set of all real numbers).
Suppose now we were to perform an experiment, in which an event A may or may not occur.
The event that A does not occur is called the complement¹ of A.
Definition 1.6 — The complement of an event. For any event A, the event that A does not
occur during the experiment is called the complement of A, and is denoted Aᶜ.

For instance, if we were to designate drawing an ace from a deck of cards as event A, drawing a
card of any other rank would be designated as event Aᶜ. Likewise, if event B were choosing a white
ball from an urn, we would designate the event of drawing a ball of any other color as Bᶜ. When we
know the probability of an event A, it is straightforward to determine the probability of Aᶜ.

Rule 1.7 — Probability of the complement of an event. For any event A, we have

Pr(Aᶜ) = 1 − Pr(A) (1.4)

One way to illustrate events within the outcome space is a Venn diagram. Figure 1.1 shows an
example of a Venn diagram. The total region of the box in a Venn diagram comprises the outcome
space Ω of the experiment, while delineated regions represent possible events. The Venn diagram
illustrates the difference between two mutually exclusive events and two non-mutually exclusive
events.

■ Example 1.4 — Roulette wheel. There are 38 pockets on a roulette wheel, each with a dis-
tinct number (1 through 36, and 0 and 00). In this case, the outcome space would be Ω =
{1, 2, 3, . . . , 36, 0, 00}. Assuming an equal probability of landing in each pocket, there is then a
probability of p = 1/38 that a ball will land in a specific pocket on a particular roll. Pockets 0 and 00
are colored green, while 18 pockets are colored red and 18 are colored black. We could therefore
denote the events that a ball lands in a green, red, or black pocket by G, R, and B, respectively.
If we were interested in the color of the pocket, rather than the number of the pocket, the out-
come space would then be Ω = {G, R, B}. The probability that the ball lands in a green pocket is
Pr(G) = 2/38 = 1/19. The probability that it lands in a red pocket is the same as the probability that it
lands in a black pocket, so that Pr(R) = Pr(B) = 18/38 = 9/19. ■
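The roulette probabilities above, together with the complement rule (Rule 1.7), can be verified exactly in a short sketch (variable names are illustrative):

```python
from fractions import Fraction

# Example 1.4's wheel: 38 equally likely pockets.
n_pockets = 38
pr_green = Fraction(2, n_pockets)   # pockets 0 and 00
pr_red = Fraction(18, n_pockets)
pr_black = Fraction(18, n_pockets)

# Rule 1.7: the complement of "green" is "not green" (i.e. red or black).
pr_not_green = 1 - pr_green

print(pr_green)                           # 1/19
print(pr_not_green)                       # 18/19
print(pr_not_green == pr_red + pr_black)  # True
```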

¹ In some other texts, the complement of event A is denoted Ā instead of Aᶜ.




Figure 1.1: Venn diagrams. Three Venn diagrams are shown, where the total region of each box
comprises the outcome space Ω of an experiment. Events are represented by regions within the
box. Events A and B are mutually exclusive on the left, meaning that there is no possible overlap
between the events. In the center panel, however, A and B are not mutually exclusive, and share
overlap. The right panel depicts a single event A and its complement Aᶜ, which comprises all of the
outcome space not including event A.

1.2 Addition rule in probability


Suppose now that we are interested in whether an event A occurs or an event B occurs. The event that
either A or B occurs is called the union of events A and B (denoted A ∪ B). For instance, given a
single roll of a six-sided die, we could let A1, A2, A3, A4, A5, A6 represent the six possible outcomes
of the roll, so that Ai represents the outcome of rolling a value of i with the die. The probability
that the roll results in either a five or a six is then written Pr(A5 ∪ A6). For now, we will consider
only the case in which the events A and B are mutually exclusive, so that at most one of the events
can occur. The probability that one event or the other occurs is simply the sum of the two
probabilities.

Rule 1.8 — Addition rule for mutually exclusive events. If A and B are mutually exclusive
events, the probability of A or B occurring is

Pr(A ∪ B) = Pr(A) + Pr(B) (1.5)

This rule does not hold if A and B are not mutually exclusive.

If A and B are not mutually exclusive, the addition rule takes on a slightly different form,
discussed later in this chapter. However, as long as events A and B cannot co-occur, the
formula holds, even in cases where the probabilities of the two events differ.
We note that the addition rule can be extended to any number of events, as long as all the events
are mutually exclusive. For instance, for three mutually exclusive events A1 , A2 , and A3 , we have

Pr(A1 ∪ A2 ∪ A3) = Pr(A1) + Pr(A2 ∪ A3)
                 = Pr(A1) + Pr(A2) + Pr(A3) (1.6)

That is, if we have k events A1, A2, . . . , Ak in which no two events can occur
at the same time, then the probability that one of these events will occur is simply the sum of the
probabilities of these events.

Rule 1.9 — Addition rule for k mutually exclusive events. If A1 , A2 , . . . , Ak are all mutually
exclusive, the probability that one of these events will occur is

Pr(A1 ∪ A2 ∪ . . . ∪ Ak) = Pr(A1) + Pr(A2) + . . . + Pr(Ak) (1.7)




Figure 1.2: A partition of events. Events B1 , B2 , B3 , and B4 form a partition of a larger event B.
Note that B1 , B2 , B3 , B4 are all mutually exclusive and comprise the entire event B.

■ Example 1.5 — Balls in an urn. Suppose there are 100 balls in an urn, where 40 are colored
red, 40 are colored blue, and 20 are colored white. The probability of choosing a red ball is
Pr(Red) = 40/100 = 2/5, and the probability of choosing a white ball is Pr(White) = 20/100 = 1/5. Since we
cannot choose a ball that is both red and white, the probability of choosing either a red or a white
ball is given by Pr(Red ∪ White) = 2/5 + 1/5 = 3/5. ■
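The addition rule for mutually exclusive events (Rule 1.8) is a direct sum. A minimal check of Example 1.5, with the probabilities stored in an illustrative dictionary:

```python
from fractions import Fraction

# Example 1.5's urn: 40 red, 40 blue, and 20 white balls.
pr = {
    "red": Fraction(40, 100),
    "blue": Fraction(40, 100),
    "white": Fraction(20, 100),
}

# Rule 1.8: for mutually exclusive events, Pr(A or B) = Pr(A) + Pr(B).
pr_red_or_white = pr["red"] + pr["white"]
print(pr_red_or_white)  # 3/5
```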

Partitions
In many cases, we might have some set of events that together comprise a larger event. For instance,
the event that we draw a heart from a standard 52-card poker deck actually comprises thirteen
separate possible events, one event for each of the thirteen possible ranks. This is an example of a
partition. A partition of events is a collection of events comprising a larger event, where all of the
events are mutually exclusive. Such a partition is illustrated in Figure 1.2, where events B1, B2, B3,
and B4 form a partition of a larger event B. As illustrated in the Venn diagram, these events are all
mutually exclusive. Event B comprises the union of all four events, so that B = B1 ∪ B2 ∪ B3 ∪ B4.
More generally, whenever an event B is partitioned into k distinct events B1 , B2 , . . . , Bk , we have

Pr(B) = Pr(B1 ) + Pr(B2 ) + . . . + Pr(Bk ) (1.8)

which results from the fact that all events in a partition are mutually exclusive.
■ Example 1.6 — Drawing a card from a deck. Suppose we were to draw a card from a standard
52-card deck. Thirteen cards in the deck are hearts, so that Pr(♥) = Pr(♥2) + Pr(♥3) + . . . +
Pr(♥10) + Pr(♥J) + Pr(♥Q) + Pr(♥K) + Pr(♥A) = 13 × 1/52 = 1/4. This should make intuitive sense,
since the four suits occur in equal numbers throughout the deck. ■
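The partition sum in Example 1.6 can be spelled out term by term, as a sketch (the rank list is just a convenient representation):

```python
from fractions import Fraction

# The thirteen ranks of hearts in a 52-card deck.
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]

# Each specific heart is one of 52 equally likely cards; the thirteen
# single-rank events partition the event "draw a heart" (Eq 1.8).
pr_heart = sum(Fraction(1, 52) for _ in ranks)
print(pr_heart)  # 1/4
```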

■ Example 1.7 — Roulette wheel. A standard roulette wheel has 38 pockets, which are colored
green, red, and black. Only 2 pockets (0 and 00) are colored green. The probability of landing on a
green pocket is therefore Pr(G) = 1/38 + 1/38 = 1/19. ■

Perhaps the most essential partition of events is the partition of the entire outcome space.
If the outcome space Ω is given by Ω = {A1, A2, . . . , Ak}, then A1, A2, . . . , Ak
together form a partition of the outcome space, and these events are naturally mutually
exclusive. Regardless of the experiment, and whether the possible outcomes have equal or
distinct probabilities, the sum of the probabilities across every possible outcome is always
equal to 1. This rule should make intuitive sense, since it is inevitable that one of
the events A1, A2, . . . , Ak must occur.

Rule 1.10 — Sum of probabilities across the outcome space. Suppose that A1 , A2 , . . . , Ak
form a partition across the outcome space, so that W = {A1 , A2 , . . . , Ak }. We have
Σᵢ₌₁ᵏ Pr(Ai) = Pr(A1) + Pr(A2) + . . . + Pr(Ak) = 1 (1.9)

Here, the symbol Σ indicates a summation, so that Σᵢ₌₁ᵏ xi represents the sum x1 +
x2 + . . . + xk. The rule above holds regardless of whether events A1, A2, . . . , Ak take on equal or
distinct probabilities.
■ Example 1.8 — Roulette wheel. A roulette wheel has 38 pockets, where two pockets
are green, 18 pockets are red, and 18 pockets are black. If G, R, and B represent the
events of landing in a green, red, and black pocket, respectively, we have Pr(G) + Pr(R) + Pr(B) =
2/38 + 18/38 + 18/38 = 1. ■

1.3 Conditional probabilities


In certain cases, the probability of occurrence for some event A can depend on whether or not
another event B occurs. For instance, suppose an urn contains a number of balls of different colors,
and we were to draw a ball at random from the urn. If we were to draw a second ball from the urn
without replacing the first ball, the color of the first ball would affect the probability that the second
ball was a particular color.
Definition 1.11 — Conditional probabilities. The probability of event A given event B is
written Pr(A|B). This is called a conditional probability.

The conditional probability Pr(A|B) is read “the probability of A given B”, or “the probability
of A conditioned upon B”.
■ Example 1.9 — Drawing balls from an urn. Suppose that there are five colored balls in an urn,
two white and three red, as illustrated in Figure 1.3. If we draw two balls from the urn successively
without replacing the first ball, the probability that the second ball is white depends
on the color of the first ball. Let Bwhite be the event that the first ball is white and Bred be the event
that the first ball is red. If the first ball was white, the probability that the second ball will also be
white would be Pr(Awhite|Bwhite) = (2 − 1)/(5 − 1) = 1/4, where Awhite is the event that the second ball is white.
On the other hand, if the first ball was red, the probability that the second ball will be white would
be Pr(Awhite|Bred) = 2/(5 − 1) = 1/2. ■

Intersection of events
The event in which two events A and B both occur is called the intersection of events A and B.
The intersection of two events A and B is written A ∩ B. To simplify notation, we usually write
the probability that both A and B occur as Pr(A, B), rather than Pr(A ∩ B). This is called the joint
probability of events A and B.

Rule 1.12 — Joint probabilities. The joint probability that events A and B both occur is written
Pr(A, B). The conditional probability Pr(A|B) is related to the joint probability, so that

Pr(A|B) = Pr(A, B) / Pr(B) (1.10)


Figure 1.3: Drawing balls from an urn. An urn initially contains five balls, two white and three
colored balls. Suppose we were to draw a ball at random from the urn and keep the ball in hand. If
we were to draw a second ball from the urn without replacing the first ball, then the probability that
the second ball is white would depend on the color of the first ball drawn.

The probability that A occurs when we know that B has occurred is the frequency at
which both occur together divided by the probability of B.
■ Example 1.10 — Rolling two dice. Suppose that a friend were to roll a pair of dice, and report
the sum of the two dice. If the sum of the two dice was five, then what is the probability that a one
was rolled with at least one of the dice? Let Bk represent the event that the sum of the dice is k.
There are then a total of 6 × 6 = 36 possible equally-likely outcomes. Four of the 36 possibilities
produce a sum of five, so we have Pr(B5) = 4/36 = 1/9. If A1 represents the event that the outcome
for at least one of the dice was a one, then in only two of these four possibilities is a one
rolled, so that Pr(A1, B5) = 2/36 = 1/18. The conditional probability that a one was rolled with one
of the dice given that the sum was five is then Pr(A1|B5) = (1/18)/(1/9) = 1/2. ■
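The conditional probability in Example 1.10 can be verified by enumerating all 36 outcomes directly, a sketch of Rule 1.12 in code (the event names mirror the example):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for a pair of dice.
rolls = list(product(range(1, 7), repeat=2))

b5 = [r for r in rolls if sum(r) == 5]      # event B5: the sum is five
a1_and_b5 = [r for r in b5 if 1 in r]       # events A1 and B5 jointly

pr_b5 = Fraction(len(b5), len(rolls))               # 1/9
pr_a1_and_b5 = Fraction(len(a1_and_b5), len(rolls)) # 1/18

# Rule 1.12: Pr(A1 | B5) = Pr(A1, B5) / Pr(B5)
print(pr_a1_and_b5 / pr_b5)  # 1/2
```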

1.4 Independent events


We have discussed cases in which an event B affects the probability of another event A. In many
cases, however, the occurrence of one event may not affect the probability of another event. For
example, if we were to roll a die repeatedly, none of the rolls would affect the outcome of any of
the other rolls. Two events A and B are called independent whenever the probability of A does not
depend upon whether or not event B occurs.
Definition 1.13 — Independence of events. Two events A and B are said to be independent
whenever Pr(A) = Pr(A|B).

■ Example 1.11 — Drawing with replacement. Suppose we were to randomly draw a ball from
an urn as in Example 1.9, but that we were to replace the first ball back into the urn prior to drawing
a second time. In this case, the color of the ball from the second draw is unaffected by the color of
the ball during the first draw. If we let B represent the event that the first ball is white and let A
represent the event that the second ball is white, the probability of A would be 2/5, regardless of the
color of the ball in the first draw. In this case, Pr(A) = Pr(A|B) = 2/5, and therefore events A and B
are independent. ■

■ Example 1.12 — Independence of suit and rank. Let A be the event of drawing an ace from
a standard 52-card deck of cards, and let ♥ denote the event of drawing a heart from the deck. These
events are independent, even though we are drawing only a single card from the deck. Namely,
the probability of drawing an ace is Pr(A) = 4/52 = 1/13. The probability of drawing a heart from the
deck is Pr(♥) = 1/4, since there are four suits. If we know we have drawn an ace, the probability
that the ace will be a heart is still Pr(♥|A) = 1/4 = Pr(♥), since only one of the four aces is a heart.
Therefore, events A and ♥ are independent events. ■
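Definition 1.13 can be checked mechanically for Example 1.12 by building the deck and comparing Pr(♥|A) with Pr(♥). This is an illustrative sketch; the rank and suit labels are our own encoding:

```python
from fractions import Fraction
from itertools import product

ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely cards

aces = [c for c in deck if c[0] == "A"]
hearts = [c for c in deck if c[1] == "hearts"]
both = [c for c in deck if c == ("A", "hearts")]

pr_ace = Fraction(len(aces), 52)     # 1/13
pr_heart = Fraction(len(hearts), 52) # 1/4
pr_both = Fraction(len(both), 52)    # 1/52

# Definition 1.13: A and "heart" are independent if Pr(heart | ace) = Pr(heart).
pr_heart_given_ace = pr_both / pr_ace
print(pr_heart_given_ace == pr_heart)  # True
```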

1.5 The product rule


If A and B are two separate events, we might want to determine the probability that both events
A and B will occur. Recall that the joint probability of A and B is the probability of both events
occurring, and is denoted Pr(A, B). Using the formula for the conditional probability, we can
rearrange the equation to find the joint probability of A and B.

Rule 1.14 — Product rule of two events. For any two events A and B (independent or not), the
joint probability of both occurring is

Pr(A, B) = Pr(A|B) × Pr(B) = Pr(B|A) × Pr(A) (1.11)

Note the symmetry in the formula above, where the probability Pr(A, B) that both A and B
occur is equal to both Pr(A|B) × Pr(B) as well as Pr(B|A) × Pr(A).
■ Example 1.13 — Drawing without replacement. Suppose that again we have five balls in an
urn, where two balls are white and three balls are red. If we draw two balls successively from the
urn without replacing the first ball, then the probability that both of the balls are white would be
Pr(Awhite, Bwhite) = Pr(Awhite|Bwhite) Pr(Bwhite) = (2 − 1)/(5 − 1) × 2/5 = 1/10, where Bwhite is the event that the
first ball is white and Awhite is the event that the second ball is white. ■

■ Example 1.14 — Drawing two cards. Suppose we were to draw two cards from a standard
52-card deck. Let A1 be the event that the first card drawn was an ace, and let A2 be the event that
the second card drawn was an ace. The probability Pr(A1, A2) that both cards would be aces would
then be Pr(A1, A2) = Pr(A2|A1) × Pr(A1) = (4 − 1)/(52 − 1) × 4/52 = 1/221. ■
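The product rule of Example 1.14 reduces to multiplying two exact fractions, sketched below with illustrative variable names:

```python
from fractions import Fraction

# Example 1.14: draw two cards without replacement from a 52-card deck.
pr_a1 = Fraction(4, 52)           # first card is an ace
pr_a2_given_a1 = Fraction(3, 51)  # second is an ace, given the first was

# Rule 1.14: Pr(A1, A2) = Pr(A2 | A1) x Pr(A1)
pr_both_aces = pr_a2_given_a1 * pr_a1
print(pr_both_aces)  # 1/221
```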

Note that in both the above examples, the probability of the second event is dependent upon the
occurrence of the first event. The probability that a second card drawn from a deck of cards is an
ace, for instance, depends upon the rank of the first card drawn.

Joint probabilities of independent events


Recall that if two events A and B are independent, we have Pr(A|B) = Pr(A). In such a case, the
joint probability simplifies to be the product of the individual probabilities Pr(A) and Pr(B), so that

Pr(A, B) = Pr(A|B) × Pr(B)
         = Pr(A) × Pr(B) (1.12)

We can easily extend this rule to find the joint probability of any number of independent events.
For instance, if A1 , A2 , and A3 are three independent events, we have

Pr(A1, A2, A3) = Pr(A1) × Pr(A2, A3)
               = Pr(A1) × Pr(A2) × Pr(A3) (1.13)

This form of the product rule is frequently used in probability theory and statistics, since we
often want to determine the joint probability of many independent events.

Rule 1.15 — Product rule of independent events. For independent events A and B, the


Figure 1.4: Probabilities for the sum of two six-sided dice. Numbers in the matrix on the left show
the sum of the outcomes for two six-sided dice. The probability Pr(Prime) that the sum on the
dice is a prime number (circled entries) is Pr(Prime) = 15/36. The histogram on the right shows the
probabilities of the sum on two six-sided dice.

probability that both A and B will occur is given by

Pr(A, B) = Pr(A) × Pr(B) (1.14)

More generally, for any number of independent events A1 , A2 , . . . , Ak , we have


Pr(A1, A2, . . . , Ak) = Πᵢ₌₁ᵏ Pr(Ai) = Pr(A1) × Pr(A2) × . . . × Pr(Ak) (1.15)

Here, the symbol Π denotes the product over multiple factors. Namely, we have Πᵢ₌₁ᵏ xi =
x1 × x2 × x3 × . . . × xk. Note that this form of the product rule only applies if all the events are
independent of each other.
■ Example 1.15 — Rolling three dice. Suppose we were to roll three fair dice. The probability
that we would roll 3 threes can be determined using the product rule. Since the probability
that we obtain a single three is Pr(3) = 1/6, the probability that we would obtain 3 threes is then
Pr(3, 3, 3) = 1/6 × 1/6 × 1/6 = 1/216 ≈ 0.0046. ■
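Because the three rolls of Example 1.15 are independent, Eq 1.15 is just a repeated product, which a short sketch makes explicit:

```python
from fractions import Fraction

# Example 1.15: three independent fair dice, each showing a three
# with probability 1/6.
pr_three = Fraction(1, 6)

# Eq 1.15: for independent events, multiply the individual probabilities.
pr_all_threes = pr_three ** 3
print(pr_all_threes)          # 1/216
print(float(pr_all_threes))   # ≈ 0.0046
```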

Whenever we roll two fair dice, each of the 36 possibilities is equally likely. We have to
differentiate between the two dice, so that one of the dice is the “first die” while the other is the
“second die”. Assuming each die is fair, each of the six possible outcomes of a single die roll occurs
with probability p = 1/6. Each possibility given two dice then has a probability of p = 1/36, since the values
on the two dice are independent of one another.
Suppose now we were to add the numbers on the two dice, so that the outcome space would be
Ω = {2, 3, 4, . . . , 12}. If we wanted to determine the probability of each possible outcome, we could
simply determine the sum for each of the 36 possible value-pairs of the two dice. The left-hand
panel of Figure 1.4 shows the 36 possible outcomes and the resulting sum of the two dice. Since
the two die rolls are independent of one another, the 36 entries are all equally likely. The
probability distribution of the sum of the two values is plotted in a histogram in the right-hand
panel of Figure 1.4. We will discuss many possible probability distributions throughout the book.
We can also create a table containing the probability distribution across all of the possible values
for the sum of the two dice, as in Table 1.1. Note that the sum of the probabilities across the entire
outcome space is equal to 1.

Probabilities for a pair of six-sided dice

Sum on dice        2     3     4     5     6     7     8     9    10    11    12
Probabilities   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
(Reduced)       1/36  1/18  1/12   1/9  5/36   1/6  5/36   1/9  1/12  1/18  1/36

Table 1.1: Probabilities of the summed values on a pair of dice. The probability distribution of a
pair of fair six-sided dice is shown for each possible outcome, as illustrated on the right of Figure
1.4.
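The distribution in Table 1.1 can be rebuilt by tallying the 36 equally likely value-pairs, a sketch that also confirms Rule 1.10's total of 1:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Tally the sum over all 36 equally likely value-pairs of two dice
# (the matrix of Figure 1.4, flattened).
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

for s, p in dist.items():
    print(s, p)  # e.g. 7 maps to 1/6

# Rule 1.10: the probabilities across the whole outcome space sum to 1.
print(sum(dist.values()) == 1)  # True
```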

Addition rule for non-mutually exclusive events


We have previously discussed the addition rule for two mutually exclusive events. When two events
A and B are mutually exclusive, the probability that either A or B will occur is simply the sum of
the individual probabilities, so that Pr(A [ B) = Pr(A) + Pr(B).
However, in cases in which A and B are not mutually exclusive, the addition rule must be
revised. If we recall Figure 1.1, we see that in cases where A and B can occur together,
simply adding the probabilities Pr(A) + Pr(B) counts the overlap between A and B twice. The
probability that A or B occurs is therefore not simply the sum of the two probabilities; instead, we
must subtract the joint probability Pr(A, B) from the sum, so that this overlap is counted only
once.
Rule 1.16 — General addition rule for two events. For any two events A and B, the probability
that A or B will occur is given by

Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A, B) (1.16)

Note that if A and B are mutually exclusive, Pr(A, B) = 0.

Eq 1.16 is more general than the addition rule for two mutually exclusive events,
given previously in Eq 1.5. That is, if A and B cannot occur together, then Pr(A, B) = 0. In such a
case, we again have Pr(A ∪ B) = Pr(A) + Pr(B), as in Eq 1.5.
■ Example 1.16 — Rolling a three. Suppose we were to roll two six-sided dice. If we want to
determine the probability that at least one of the dice lands on a three, we can let A3 be the event
that the first die lands on a three and B3 be the event that the second die lands on a three. It is
easy to check from the left panel of Figure 1.4 that the probability of rolling at least one three is
Pr(A3 ∪ B3) = Pr(A3) + Pr(B3) − Pr(A3, B3) = 1/6 + 1/6 − 1/36 = 11/36. This is not the same as simply adding
the two probabilities Pr(A3) + Pr(B3), since there is the possibility that both dice will land on a
three. ■
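The overlap subtraction in Rule 1.16 can be checked exactly for Example 1.16 (illustrative variable names):

```python
from fractions import Fraction

# Example 1.16: at least one three when rolling two dice.
pr_a3 = Fraction(1, 6)   # first die shows a three
pr_b3 = Fraction(1, 6)   # second die shows a three
pr_both = pr_a3 * pr_b3  # independent rolls, so 1/36

# Rule 1.16: subtract the overlap so it is counted only once.
pr_either = pr_a3 + pr_b3 - pr_both
print(pr_either)  # 11/36
```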

1.6 Multiple events


Suppose now we have a number of events A1, A2, . . . , Ak, and we want to determine the joint
probability of all events taking place. For instance, if we wanted to determine the probability
of drawing five hearts from a standard 52-card poker deck, we could let ♥i be the event that the
ith card drawn is a heart. To determine the probability of drawing five hearts, we would have to
determine the probability that events ♥1, ♥2, ♥3, ♥4, ♥5 would all occur. In this case, the events
are not independent, since the suit of the first card will affect the probability of drawing a heart on
the second card, and so on.
Recall that the joint probability Pr(A, B) of two events A and B is given by Pr(A, B) = Pr(A|B) ×
Pr(B). For three events A, B, and C, we can write the joint probability as Pr(A, B, C) = Pr(A ∩ B, C),
so that A ∩ B is now written as a single event. The joint probability of all three events is then

Pr(A, B, C) = Pr(A ∩ B, C) = Pr(C|A ∩ B) × Pr(A, B)
            = Pr(C|A, B) × Pr(B|A) × Pr(A) (1.17)

where Pr(C|A, B) is the conditional probability of event C given that A and B have both
occurred. This procedure can be extended to include any number of events.

Rule 1.17 — Joint probability of k events. For any events A1 , A2 , . . . , Ak , the joint probability
of these k events is

Pr(A1, . . . , Ak) = Pr(Ak|Ak−1, Ak−2, . . . , A1) × . . . × Pr(A2|A1) × Pr(A1) (1.18)

Note that the above equation is a more general formula than that given in Eq 1.15. If all events
are independent, then any conditional probability Pr(Ai|Ai−1, . . . , A1) simplifies to Pr(Ai), since
events A1, A2, . . . , Ai−1 would not affect the probability of occurrence for event Ai. The formula
given in Eq 1.18 holds very generally, even if the events are not independent.
■ Example 1.17 — Drawing balls from an urn. Suppose an urn contains five balls, two white
balls and three red balls. If we were to draw three balls from the urn successively without
replacement, let A1, A2, and A3 represent the events of drawing a red ball on the first, second,
and third draw, respectively. The joint probability that all three balls are red would then be
Pr(A1, A2, A3) = Pr(A3|A2, A1) × Pr(A2|A1) × Pr(A1) = (3 − 2)/(5 − 2) × (3 − 1)/(5 − 1) × 3/5 = 1/10. ■

■ Example 1.18 — Drawing a queen. Suppose we were to draw cards from a standard 52-
card deck without replacing the cards until we draw a queen. Let A1, A2, . . . represent the
events of drawing a card other than a queen on the first draw, second draw, and so on, and
let Q1, Q2, . . . represent the events of drawing a queen on the first draw, second draw, and so
on. The probability that it would take exactly three cards to obtain a queen would then be
Pr(A1, A2, Q3) = Pr(Q3|A1, A2) × Pr(A2|A1) × Pr(A1). It is easy to check that this joint probability
is given by Pr(A1, A2, Q3) = 4/50 × 47/51 × 48/52 ≈ 0.068. ■
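The chain of conditional probabilities in Rule 1.17 is again just a product of exact fractions; a sketch for Example 1.18:

```python
from fractions import Fraction

# Example 1.18: exactly three draws (without replacement) to reach a queen.
pr_a1 = Fraction(48, 52)             # first card is not a queen
pr_a2_given_a1 = Fraction(47, 51)    # second card is not a queen either
pr_q3_given_a1_a2 = Fraction(4, 50)  # third card is one of the 4 queens

# Rule 1.17 (chain rule): multiply the conditional probabilities in order.
pr_seq = pr_q3_given_a1_a2 * pr_a2_given_a1 * pr_a1
print(float(pr_seq))  # ≈ 0.068
```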

1.7 Sequences of events


Repeated trials arise frequently in both probability theory and statistics. A trial in this case can
represent a coin toss, the roll of a die, etc., whereas an experiment consists of one or more trials.
In certain processes, a number of events A1 , A2 , A3 , . . . occur in order. Such a process is called a
sequence of events.
Definition 1.18 — Sequences of events. A sequence of events consists of events A1, A2, A3, . . .,
where each event Ai can occur only after the events that precede it.

In the special case where each event Ai can occur only when A1, A2, . . . , Ai−1 have all occurred,
the probability that the second event occurs is the joint probability Pr(A1, A2), since
event A2 can occur only if A1 has taken place. Similarly, A3 can occur only if both A1 and A2
have taken place; therefore the probability that the third event occurs is the joint probability
Pr(A1, A2, A3). In general, the kth event Ak can occur only if events A1, A2, . . . , Ak−1 have all taken
place, and thus the probability of this event is Pr(A1, A2, . . . , Ak). This type of scenario is illustrated
in Figure 1.5.
14 Chapter 1. Introduction to Probability

[Figure 1.5 diagram: a chain of events A1 → A2 → A3 → · · ·, with arrows labeled Pr(A2|A1), Pr(A3|A1, A2), . . ., and a branch from each step to the complementary events (A2)c, (A3)c, (A4)c.]

Figure 1.5: Sequence of possible events A1, A2, A3, . . .. Each event Ak is dependent upon the
occurrence of the previous events A1, A2, . . . , Ak−1. If we assume event A1 occurs, the probability that
event A2 will occur is the conditional probability Pr(A2|A1). If we assume events A1 and A2 both
occur, the probability that event A3 will occur is then Pr(A3|A1, A2), and so on. Such a sequence of
events can be extended to any number of possible events.

Rule 1.19 — A sequence of k events. For a sequence of events A1, A2, A3, . . . in which event
Ak can occur only if A1, A2, . . . , Ak−1 have also occurred, the probability that Ak occurs is
given by the joint probability Pr(Ak) = Pr(A1, A2, . . . , Ak).

⌅ Example 1.19 — Tossing a coin repeatedly. Suppose that we were to flip a coin repeatedly
until the coin landed tails. Let A1, A2, A3, . . . represent the events of landing heads on the first
toss, second toss, third toss, and so on. The probability that there is a kth toss that lands heads is
Pr(A1, A2, . . . , Ak). Thus, the probability that the first toss lands heads is Pr(A1) = 1/2. The probability
that there is a second toss that lands heads is Pr(A1, A2) = Pr(A2|A1) × Pr(A1) = 1/2 × 1/2 = 1/4. Note
that the probability that the second event takes place is a joint probability: it is the probability that
there exists a second toss and also that the second toss results in heads. The event that the kth coin
toss results in tails is then (Ak)c. The probability that there are a total of k tosses, with the kth toss
landing tails, is then

Pr(A1, A2, . . . , Ak−1) × Pr((Ak)c|A1, . . . , Ak−1) = (1/2)^(k−1) × 1/2 = (1/2)^k.   ⌅
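The probability (1/2)^k that the experiment ends on exactly the kth toss can be tabulated directly; this short sketch (an addition, not from the original text) also shows the probabilities summing toward 1:

```python
from fractions import Fraction

def prob_exactly_k_tosses(k):
    """Probability of k-1 heads followed by a tail on toss k, for a fair coin."""
    return Fraction(1, 2) ** (k - 1) * Fraction(1, 2)

for k in range(1, 5):
    print(k, prob_exactly_k_tosses(k))   # 1/2, 1/4, 1/8, 1/16

# The probabilities over all k approach 1: the process ends with probability 1.
partial_sum = sum(prob_exactly_k_tosses(k) for k in range(1, 31))
print(float(partial_sum))  # very close to 1
```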

Memoryless processes
In some sequences of events, the probability that an event Ak occurs depends upon the previous
event Ak−1, but on no event prior to Ak−1. In other words, the probability that Ak occurs depends strictly
upon whether or not Ak−1 has occurred. Such a process is said to have a memoryless property,
because each successive event depends only upon the occurrence of the event immediately before it.
Definition 1.20 — Memoryless processes. A sequence of events A1, A2, . . . , Ak is said to have
a memoryless property whenever Pr(Ai|Ai−1, Ai−2, . . . , A1) = Pr(Ai|Ai−1).

Memoryless processes encompass a wide range of (potentially very complex) processes. Many
processes are memoryless, although by no means all processes work in this manner. For now, we
discuss only scenarios in which a sequence of events is memoryless.
⌅ Example 1.20 — Inheritance. Every individual contains two copies of an autosomal gene
in their DNA, one inherited from the mother, and one from the father. Dominant mutations are
mutations in which a single mutant copy of the gene causes an abnormality in the individual.
Suppose a single individual has a rare dominant mutation in one copy of the gene. Let A1 , A2 , A3 , . . .

[Figure 1.6 image: six urns, numbered 1 through 6, each containing a different mixture of white and black balls.]

Figure 1.6: Drawing a ball from one of six urns. Suppose we were to roll a die, and draw a single
ball from the corresponding urn. The total probability of drawing a white ball can be obtained using
the total probability formula.

denote the events that the first generation, the second generation, the third generation, and so
on will acquire the same mutant copy of the gene. The probability that an individual in the
second generation will obtain the mutant copy is Pr(A1, A2) = Pr(A2|A1) × Pr(A1) = 1/2 × 1/2 = 1/4.
In general, the probability that the mutant copy of the gene is present in the kth generation is

Pr(A1, A2, . . . , Ak) = Pr(Ak|Ak−1) × · · · × Pr(A2|A1) × Pr(A1) = (1/2)^k.   ⌅
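As an illustrative aside (not part of the original text), the exact probability (1/2)^k can be compared against a simple Monte Carlo simulation of the inheritance chain; because the chain is memoryless, each generation is simulated using only the state of the previous one:

```python
import random
from fractions import Fraction

def p_generation_k(k):
    """Exact probability that the mutant copy reaches generation k."""
    return Fraction(1, 2) ** k

# Monte Carlo check for k = 3 generations.
random.seed(0)
trials = 100_000
hits = 0
for _ in range(trials):
    present = True
    for _ in range(3):
        # Each generation inherits the copy with probability 1/2,
        # depending only on whether the previous generation had it.
        present = present and (random.random() < 0.5)
    hits += present
print(hits / trials, float(p_generation_k(3)))  # both near 0.125
```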

1.8 Total probability formula


Suppose we have a partition of the total outcome space Ω = {B1, B2, . . . , Bk} and a second type of
event A. As a simple example, suppose we have six urns as shown in Figure 1.6, each filled with
white and colored balls. We then roll a die and choose the corresponding urn. If we choose a single
ball from the chosen urn, what is the total probability of choosing a white ball?
Let A represent the event that we choose a white ball, and let B1, B2, . . . , B6 represent the
outcomes of the die roll. In this case, we have more information about the joint probabilities
Pr(A, Bi) than about the probability of event A itself, since it is not immediately obvious how probable it
would be to draw a white ball. On the other hand, we can always find the joint probability Pr(A, Bi)
of choosing the ith urn and drawing a white ball from that urn.
The total probability of drawing a white ball is then Pr(A) = ∑_{i=1}^{6} Pr(A, Bi). In other words, the probability of event A is the sum of the joint
probabilities of event A with each possible simultaneous event Bi. The product rule then produces the
total probability formula.

Rule 1.21 — Total probability formula. For a partition B1, B2, . . . , Bk of the total outcome space
and a second event A, the total probability of A is given by

Pr(A) = ∑_{i=1}^{k} Pr(A|Bi) × Pr(Bi)   (1.19)

⌅ Example 1.21 — Drawing a white ball. Suppose we were to roll a die and subsequently
choose one of the urns in Figure 1.6. If we then choose a ball at random from the urn, we can
determine the probability that the ball is white according to the total probability formula. Let
B1, B2, . . . , B6 represent the six possible outcomes of the die roll, and let A represent the event
of drawing a white ball. The probability that we roll a one and choose a white ball is then
Pr(A, B1) = Pr(A|B1) × Pr(B1) = 2/5 × 1/6 = 1/15. Likewise, the probability that we roll a two and
choose a white ball is Pr(A, B2) = Pr(A|B2) × Pr(B2) = 1/4 × 1/6 = 1/24, and so on. It is easy to check
that the total probability of choosing a white ball is Pr(A) = 1/6 × (2/5 + 1/4 + 1/2 + 1/5 + 4/5 + 1/4) = 0.40. Note
that this is not the same as simply condensing all balls into a single urn and drawing from that one
urn.   ⌅
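The total probability formula for this example is easy to verify in code; the white-ball fractions below are those used in this example (a sketch added here, not part of the original text):

```python
from fractions import Fraction

# Pr(white | urn i) for urns 1..6, and a fair die choosing the urn.
p_white_given_urn = [Fraction(2, 5), Fraction(1, 4), Fraction(1, 2),
                     Fraction(1, 5), Fraction(4, 5), Fraction(1, 4)]
p_urn = Fraction(1, 6)

# Total probability formula: Pr(A) = sum over i of Pr(A|Bi) * Pr(Bi).
p_white = sum(p * p_urn for p in p_white_given_urn)
print(p_white)  # 2/5
```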


Color    Urn 1    Urn 2    Urn 3    Urn 4    Urn 5    Urn 6

White    0.167    0.104    0.208    0.083    0.333    0.104
Black    0.167    0.208    0.139    0.222    0.056    0.208

Table 1.2: Choosing the urn. If we were to roll a die and choose the corresponding urn in Figure
1.6, drawing a ball at random would result in either a white ball or a black ball. Entries in the table
show the probability that we had chosen each particular urn, given that we know the color of the
ball. Each probability is determined according to Bayes' formula.

1.9 Bayes’ rule


Suppose again we have the six urns illustrated previously in Figure 1.6. This time a friend rolls a
die and selects one of the urns, but he does not tell us the outcome of the die roll. Instead, he draws
a ball from the urn and hands it to us, and we see that it is white. What is the probability the ball originally came from the
first urn?
To answer this question, we use Bayes' formula, which is derived from the symmetry in the product rule. Note that the joint probability of A and B can be
written as Pr(A, B) or Pr(B, A). Since these two ways of writing the joint probability are equivalent,
we have Pr(A|B) × Pr(B) = Pr(B|A) × Pr(A). This symmetry gives rise to Bayes' formula.

Rule 1.22 — Bayes' formula. For two events A and B, we have

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)   (1.20)

More generally, if A1, A2, . . . , An form a partition of the entire outcome space Ω, we have

Pr(Ah|B) = Pr(B|Ah) Pr(Ah) / ∑_{i=1}^{n} Pr(B|Ai) Pr(Ai)   (1.21)

Note that the second formula (Eq 1.21) is a result of the total probability formula Pr(B) =
∑_{k=1}^{n} Pr(B|Ak) Pr(Ak). Bayes' formula is used often in probability theory and, to a much larger
extent, statistics. In much of statistical theory, we seek to determine the mechanisms that caused a
particular outcome. This is explained in much more detail in the second part of this book. For now,
let us consider the use of Bayes’ formula in probability theory.
Bayes’ formula is used in cases where we know that a particular event has occurred, and our
goal is to find the circumstances that produced the event. More specifically, the goal is to assign a
probability to an event that may have produced our observation. In the scenario above, it is the goal
to determine which urn the ball came from (or, equivalently, the outcome of the die roll).
⌅ Example 1.22 — Finding the urn. Suppose a friend rolls a die and chooses one of the urns in
Figure 1.6 based upon the outcome of the die roll. Without telling us the outcome of the die roll, he
draws a ball at random from the urn. If he shows us that the ball is white, what is the probability that
the ball came from Urn 5? If we knew he had chosen Urn 5, the probability of drawing a white ball
at random from the fifth urn would be Pr(A|B5) = 4/5, where A is the event of choosing a white ball,
and B5 is the event that the ball was drawn from the fifth urn. The probability that the fifth
urn was chosen in the first place is Pr(B5) = 1/6. From Example 1.21, we know that the total probability
of choosing a white ball is Pr(A) = 2/5, so Bayes' formula gives

Pr(B5|A) = Pr(A|B5) Pr(B5) / Pr(A) = (4/5 × 1/6) / (2/5) = 1/3.

The remaining probabilities are given in Table 1.2.   ⌅
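The full white row of Table 1.2 follows from applying Bayes' formula to each urn in turn; the following sketch (an addition, not part of the original text) reproduces those entries:

```python
from fractions import Fraction

# Likelihoods Pr(white | urn i) from Figure 1.6 and a uniform prior of 1/6.
likelihood = [Fraction(2, 5), Fraction(1, 4), Fraction(1, 2),
              Fraction(1, 5), Fraction(4, 5), Fraction(1, 4)]
prior = [Fraction(1, 6)] * 6

# Total probability of a white ball (the denominator in Bayes' formula).
evidence = sum(l * p for l, p in zip(likelihood, prior))

# Posterior probability of each urn, given that the ball is white.
posterior = [l * p / evidence for l, p in zip(likelihood, prior)]
print([round(float(q), 3) for q in posterior])
# [0.167, 0.104, 0.208, 0.083, 0.333, 0.104]
```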


Likelihoods, priors, and posterior probabilities


The numerator in Bayes' formula is a product of two probabilities. The conditional probability
Pr(B|A) in Eq 1.20 is called the likelihood. The likelihood provides a measure of how 'likely'
it is to observe an outcome (such as drawing a white ball) assuming we know how this outcome
was produced (which urn was chosen). The second probability, Pr(A) in Eq 1.20, is called the prior
probability. This is the probability that our event occurs prior to performing
the experiment. For instance, if we choose one of the urns based upon a die roll, then each urn
has a probability of Pr(Bi) = 1/6 of being chosen. Ultimately, the goal is to determine the posterior
probability (Pr(A|B) in Eq 1.20). Bayes' formula therefore gives rise to the relation

posterior ∝ likelihood × prior   (1.22)

The relation 'A ∝ B' is read 'A is proportional to B'.


We generally use Bayes' formula in cases where we can easily determine the likelihood Pr(B|A)
of the observed outcome (event B), but where it is unknown whether event A has occurred. We then
use Bayes' formula to determine the posterior probability Pr(A|B) that A has occurred, given the
observed event.
In many cases, the difficulty in applying Bayes' formula lies in determining the prior probability.
If the prior probability cannot be determined, then Bayes' formula cannot be used.
In many cases, though, the prior probability can at least be estimated. Statisticians often argue
about when it is or is not reasonable to use Bayes' formula. When the prior probability can be properly
deduced, however, there is no controversy over its use. The example below
illustrates another scenario in which we can use Bayes' formula.
⌅ Example 1.23 — Drawing from one of two decks. Suppose we have two decks of cards:
a standard 52-card deck, and a 58-card deck with an extra nine, ten, jack, queen, king, and ace
of hearts. After choosing one of the decks at random, the prior probability of having chosen the
standard deck is Pr(52deck) = 1/2. The likelihood of drawing the ace of hearts from the standard
deck is Pr(♥A|52deck) = 1/52. The total probability of obtaining an ace of hearts can then be
determined using the total probability formula. Namely, Pr(♥A) = Pr(♥A|58deck) Pr(58deck) +
Pr(♥A|52deck) Pr(52deck) = 2/58 × 1/2 + 1/52 × 1/2 = 1/58 + 1/104. The probability that the
standard 52-card deck was chosen, given that we drew the ace of hearts, is thus
Pr(52deck|♥A) = Pr(♥A|52deck) Pr(52deck) / Pr(♥A) = 0.358.   ⌅
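The two-deck computation can be written out the same way (a sketch added here, not from the original text); note the 58-card deck holds two aces of hearts:

```python
from fractions import Fraction

prior_52 = Fraction(1, 2)        # each deck equally likely to be chosen
prior_58 = Fraction(1, 2)
like_52 = Fraction(1, 52)        # Pr(ace of hearts | standard deck)
like_58 = Fraction(2, 58)        # Pr(ace of hearts | 58-card deck)

# Total probability of drawing the ace of hearts.
evidence = like_52 * prior_52 + like_58 * prior_58

# Posterior probability that the standard deck was chosen.
posterior_52 = like_52 * prior_52 / evidence
print(round(float(posterior_52), 3))  # 0.358
```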

Summary

Event        Nomenclature              Probability    Rule

Event A      A occurs                  Pr(A)          0 ≤ Pr(A) ≤ 1
Not A        Complement of A           Pr(Ac)         Pr(Ac) = 1 − Pr(A)
A and B      Joint probability         Pr(A, B)       Pr(A, B) = Pr(A|B) Pr(B)
A given B    Conditional probability   Pr(A|B)        Pr(A|B) = Pr(A, B)/Pr(B)
A or B       Union of A and B          Pr(A ∪ B)      Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A, B)

Events

• Events are denoted using uppercase letters (A, B, C, etc.)
• The probability of event A is denoted as Pr(A)
• The event that A does not occur is Ac

Probabilities

• Probabilities are always a frequency between 0 and 1, inclusive
• Pr(A) = 0 if event A is impossible
• Pr(A) = 1 if event A must occur
• Pr(A) = 1 − Pr(Ac)
• Events A and B are mutually exclusive if they cannot occur together
• For n mutually-exclusive outcomes A1, A2, . . . , An, each equally probable, each
event Ai occurs with probability Pr(Ai) = 1/n
• For n mutually-exclusive outcomes, each equally probable, the probability of
success is p = m/n whenever m of these outcomes comprise a success

Outcome space

• The outcome space is the total set of possible outcomes
• The outcome space is denoted as Ω (read "Omega")
• The sum of all probabilities across the entire outcome space is always equal to
one, so that ∑_{i=1}^{n} Pr(Ai) = 1 where Ω = {A1, A2, . . . , An}

Addition rule

• Event A ∪ B is the event that event A or event B occurs (i.e., the union of A and
B)
• Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A, B)
• Pr(A ∪ B) = Pr(A) + Pr(B) for mutually exclusive events A and B

Conditional probabilities

• The probability of event A given event B is written Pr(A|B) = Pr(A, B)/Pr(B)
• Pr(A|B) is sometimes said to be the probability of A conditioned upon B

Independent events

• Events are ‘independent’ when event A does not affect the probability of B
• A and B are independent whenever Pr(A|B) = Pr(A)

Product rule

• Pr(A, B) = Pr(A|B) × Pr(B) = Pr(B|A) × Pr(A)
• Pr(A, B) = Pr(A) × Pr(B) for independent events A and B

Total probability rule

• For any partition of events B1, B2, . . . , Bn across the outcome space, the proba-
bility of another event A is Pr(A) = ∑_{i=1}^{n} Pr(A, Bi)
• The total probability formula is also written Pr(A) = ∑_{i=1}^{n} Pr(A|Bi) Pr(Bi)

Bayes’ formula
• Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)
• Pr(Ai|B) = Pr(B|Ai) Pr(Ai) / ∑_{j=1}^{n} Pr(B|Aj) Pr(Aj), where A1, A2, . . . , An form a partition of the out-
come space Ω
• Pr(B|A) is called the "likelihood", Pr(A) is called the "prior probability", and
Pr(A|B) is the "posterior probability"
• posterior ∝ likelihood × prior

Exercises
(For all exercises in the book, the answers to the even-numbered exercises are given in the Appendix.)

Exercise 1.1 For an event A, suppose Pr(A) = p. What is


1. The probability Pr(A [ A)?
2. The probability Pr(A [ Ac )?
3. The probability Pr(A, A)?
4. The probability Pr(A, Ac )?
5. The probability Pr(A|A)?
6. The probability Pr(A|Ac )?

Exercise 1.2 Show that if A and B are not mutually exclusive events, then Pr(A ∪ B) <
Pr(A) + Pr(B). ⌅

Exercise 1.3 Show that if Pr(A|B) = Pr(A|Bc ) then events A and B are independent. ⌅

Exercise 1.4 Suppose we were to draw a card from a standard 52-card deck. What is the
probability of drawing a face card (Jack, Queen, King)? What is the probability of drawing five
cards, all of which are face cards? ⌅

Exercise 1.5 We have an urn filled with 100 balls.


• 40 balls are colored only red
• 20 balls are colored red with white dots
• 20 balls are colored yellow
• 20 balls are colored yellow with white dots
We choose one ball from the urn. What is the probability that we choose
1. A ball without white dots, given that the ball is yellow?
2. A ball with white dots, given that the ball is colored red?
3. A ball colored yellow, given that it has white dots?

Exercise 1.6 Suppose we have two standard decks of 52 cards, one colored red and one colored
blue on the back of the cards. Five cards are chosen from the red deck, where three of the cards
are hearts. Six cards are chosen from the blue deck, where two of the cards are hearts. Without
replacing the eleven cards, we then shuffle both decks together, until the cards are ordered
randomly. A friend chooses a card from this deck, which happens to be a heart (without showing
the color on the back). What is the probability that the card is from the red deck versus the blue
deck? ⌅

Exercise 1.7 Let A and B be two distinct events. Suppose that the probability of each is given
by the values Pr(A) = p and Pr(B) = q. Show that in all cases Pr(A|B) ≤ p/q. In what case is
Pr(A|B) = p/q? ⌅

Exercise 1.8 A person has a rare genetic condition for which there is a 1/2 probability of passing
it on to her child. What is the probability that her
1. Great grandchild has the condition?
2. Grandchild has the condition, given that the grandchild's sister has the condition?
3. Grandchild has the condition, given that the grandchild's cousin has the condition?

Exercise 1.9 Suppose we were to roll two dice and add the numbers on both of the dice. If we
were to repeat the experiment, again rolling two dice and adding the numbers on both of the
dice, what is the probability that the two sums will be equal? ⌅

Exercise 1.10 A ball is drawn from an urn in which 20% of the balls are colored white, and is
then returned to the urn. If the ball is white, another ball is drawn from
the urn and replaced. The process is repeated until a ball of a different color is drawn, at which
point the process stops. What is the probability that exactly n balls are drawn? ⌅

Exercise 1.11 An urn contains balls, of which 20% are colored red and 80% are colored blue.
Three balls are drawn from the urn, each ball being replaced before the next is drawn.
What is the probability that one of the balls is colored red? ⌅

Exercise 1.12 Five different six-sided dice are rolled simultaneously. What is the probability
that a sequence is rolled; that is, a one, two, three, four, and five, or a two, three, four, five, and
six? ⌅

Exercise 1.13 We have an urn with a very large number of balls, each numbered between one
and 20 (inclusive), with each number occurring with equal probability. Six balls are drawn.
What is the probability that all six balls have numbers that are equal to or less than fourteen? ⌅

Exercise 1.14 What is the probability of drawing a five-card flush from a deck of 52 cards (i.e.,
where each of the five cards have the same suit)? ⌅

Exercise 1.15 We have a box filled with computer chips, 20% of which are defective. All of the
computer chips are tested for defectiveness, but only 6% are diagnosed as defective. None
of the computer chips that are not defective were diagnosed as defective. All of the computer
chips not diagnosed as defective were sold to retailers. If we buy a single computer chip from
the retailers, what is the probability that the computer chip is defective? ⌅

Exercise 1.16 A particular dice game is played so that three dollars is won if the die lands on a
two, while one dollar is won if the die lands on a six. What should the price per roll be for a fair
bet? ⌅

Exercise 1.17 A dart board consists of a circle 14 inches in diameter. If the bulls eye is one
inch in diameter, what is the probability of hitting the bulls eye on a random throw (assuming
the board itself is hit)? Hint: the area of a circle is πr^2, where r is the radius and d = 2r is the
diameter of the circle. ⌅

Exercise 1.18 An urn of balls contains 30% black balls, 30% red balls, 20% white balls, and
20% yellow balls. If a black ball is drawn, it is replaced and another ball is drawn; otherwise,
the ball drawn is not replaced. Balls are drawn until a ball that is not black is drawn, at which
point the process stops. What is the probability of eventually drawing a red ball? ⌅

Exercise 1.19 A standard 52 card deck has 13 possible ranks and four different suits. A
pinochle deck has 48 cards, with 6 ranks (9, 10, Jack, Queen, King, Ace) and four suits, where
each card occurs twice in the deck. If we pick one card from a standard deck and one card from
the pinochle deck, what is the probability that those two cards will be the same? ⌅

Exercise 1.20 Two players, player 1 and player 2, take turns rolling a six-sided die, starting
with player 1. The first player to roll a six wins the game. What is the probability of winning
for each player? (Hint: The series ∑_{k=0}^{∞} p^k is given by ∑_{k=0}^{∞} p^k = 1/(1 − p) whenever |p| < 1. The
probability that player 1 wins on the kth turn is Pr(k) = (5/6)^(2k−2) × 1/6. Why is this?) ⌅

Exercise 1.21 Suppose we play a card game in which we are dealt five cards at random, choose
four or fewer cards to discard, and replace the discarded cards with the same number of cards from the
remaining deck. Suppose we want to determine the best strategy when first being dealt a pair
of queens and an ace: Is it better to replace three of the cards, keeping the two queens and
discarding the ace, or to replace two cards, keeping the two queens and the ace? Determine the
better strategy if we want to obtain:
1. Two pairs of different ranking cards
2. Three-of-a-kind (where the remaining cards are unmatching)

Historical note

Interest in probability theory began among French mathematicians such as Blaise Pascal (1623-
1662), Girard Desargues (1591-1661), and Pierre de Fermat (1601-1665), primarily based upon an
interest in the odds of winning games of chance. The first writing on the subject was a short publi-
cation by Christiaan Huygens (1629-1695) called De ratiociniis in ludo aleae, or "On Reasoning
in Games of Chance", published in 1657. This was one of the first official publications to outline
some of the basic foundations of probability theory. The publication introduced the concept of
fair-stakes odds in dice games and the chances of winning games associated with possible events
occurring at equal probabilities, although the scenarios discussed involved odds of winning that
were themselves unequal.


Probability theory first aroused interest among mathematicians based upon the following
question: Suppose that in a two-person game, the first person to win a total of three games wins the
stakes. One of the players has won two of the three games, and the second player has won only
one game. If, at such a point, the players decide to stop playing, how do they divide the stakes
for fair-bet odds, so that each player receives a fraction of the stakes proportional to the odds of
eventually winning the game? There are several variations on the topic that Huygens addresses in
this first publication on probability theory, introducing the concept of expectation, which is a topic
discussed in the next chapter.
