1 Introduction
In probability theory and statistics, Bayes' theorem (alternatively Bayes' law or Bayes' rule) relates current to prior belief. It also relates current to prior evidence. It is important in the mathematical manipulation of conditional probabilities.[1] Bayes' rule can be derived from more basic axioms of probability, specifically conditional probability.

When applied, the probabilities involved in Bayes' theorem may have any of a number of probability interpretations. In one of these interpretations, the theorem is used directly as part of a particular approach to statistical inference. In particular, with the Bayesian interpretation of probability, the theorem expresses how a subjective degree of belief should rationally change to account for evidence: this is Bayesian inference, which is fundamental to Bayesian statistics. However, Bayes' theorem has applications in a wide range of calculations involving probabilities, not just in Bayesian inference.

Bayes' theorem is stated mathematically as the following simple form:[1]

P(A|B) = P(B|A) P(A) / P(B)

For an epistemological interpretation: for proposition A and evidence or background B,[3]

P(A), the prior probability, is the initial degree of belief in A.

P(A|B), the posterior probability, is the degree of belief having accounted for B.

The quotient P(B|A)/P(B) represents the support B provides for A.

Another form of Bayes' theorem that is generally encountered when looking at two competing statements or hypotheses is:

P(A|B) = P(B|A) P(A) / (P(B|A) P(A) + P(B|¬A) P(¬A))
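As a minimal numerical sketch of the simple form (the probability values below are illustrative assumptions, not taken from the article):

```python
def posterior(prior_a, likelihood, evidence):
    """Simple form of Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)."""
    return likelihood * prior_a / evidence

# Assumed illustrative values: P(A) = 0.3, P(B|A) = 0.8, P(B) = 0.4.
p_a_given_b = posterior(prior_a=0.3, likelihood=0.8, evidence=0.4)
print(round(p_a_given_b, 3))  # 0.6
```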
Bayes' theorem is named after Rev. Thomas Bayes (/ˈbeɪz/; 1701–1761), who first showed how to use new evidence to update beliefs. Bayes' unpublished manuscript was significantly edited by Richard Price before it was posthumously read at the Royal Society. Bayes' algorithm remained unknown until it was independently rediscovered and further developed by Pierre-Simon Laplace, who first published the modern formulation in his 1812 Théorie analytique des probabilités.
With W the event that the conversation was held with a woman, M the event that it was held with a man, and L the event that the person had long hair,

P(W|L) = P(L|W) P(W) / P(L) = P(L|W) P(W) / (P(L|W) P(W) + P(L|M) P(M))
= (0.75 × 0.50) / (0.75 × 0.50 + 0.15 × 0.50) = 5/6 ≈ 0.83,

i.e., the probability that the conversation was held with a woman, given that the person had long hair, is about 83%. More examples are provided below.

The theorem can also be illustrated with a 2×2 contingency table of outcome counts, with w outcomes in A ∩ B, x in A ∩ ¬B, y in ¬A ∩ B and z in ¬A ∩ ¬B:

           B        ¬B       Total
  A        w        x        w+x
  ¬A       y        z        y+z
  Total    w+y      x+z      w+x+y+z

Counting directly,

P(A|B) = w / (w+y).

The same value follows from Bayes' theorem, since

P(B) = (w+y) / (w+x+y+z)

and

P(B|A) P(A) = (w / (w+x)) × ((w+x) / (w+x+y+z)) = w / (w+x+y+z),

so that

P(A|B) = P(B|A) P(A) / P(B) = (w / (w+x+y+z)) / ((w+y) / (w+x+y+z)) = w / (w+y).
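Both calculations above can be checked numerically. In the sketch below the table counts are illustrative assumptions, while 0.75, 0.15 and 0.50 are the values from the long-hair example:

```python
# Illustrative counts (assumed): w in A∩B, x in A∩¬B, y in ¬A∩B, z in ¬A∩¬B.
w, x, y, z = 20, 30, 10, 40
total = w + x + y + z

p_a = (w + x) / total      # P(A)
p_b = (w + y) / total      # P(B)
p_b_given_a = w / (w + x)  # P(B|A)

# Bayes' theorem reproduces the direct count w/(w+y).
p_a_given_b = p_b_given_a * p_a / p_b
assert abs(p_a_given_b - w / (w + y)) < 1e-12

# Long-hair example: P(W|L) = P(L|W)P(W) / (P(L|W)P(W) + P(L|M)P(M)).
p_w_given_l = (0.75 * 0.50) / (0.75 * 0.50 + 0.15 * 0.50)
print(round(p_w_given_l, 2))  # 0.83
```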
P(A), the prior, is the initial degree of belief in A.

P(A|B), the posterior, is the degree of belief having accounted for B.

The quotient P(B|A)/P(B) represents the support B provides for A.
3.2 Frequentist interpretation

Illustration of the frequentist interpretation with tree diagrams: one diagram branches on A first, with labels P(A), P(B|A) and P(A ∩ B); the other branches on B first, with labels P(B), P(A|B) and P(B ∩ A). Knowledge of one diagram is sufficient to deduce the other.

4 Forms

4.1 Events

For events A and B, provided that P(B) ≠ 0,

P(A|B) = P(B|A) P(A) / P(B).
The simple form can also be written as P(A|B) = c P(B|A) P(A), where c is a normalization constant:

c = 1 / (P(A) P(B|A) + P(¬A) P(B|¬A)).

More generally, for a partition {Aj} of the event space, the extended form is

P(Ai|B) = P(B|Ai) P(Ai) / Σj P(B|Aj) P(Aj).
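A sketch of the extended form over a partition (the three hypotheses and their numbers are assumptions chosen for illustration):

```python
def posterior_over_partition(priors, likelihoods):
    """Extended form: P(Ai|B) = P(B|Ai) P(Ai) / sum_j P(B|Aj) P(Aj)."""
    joints = [lk * pr for lk, pr in zip(likelihoods, priors)]
    evidence = sum(joints)  # P(B), by the law of total probability
    return [j / evidence for j in joints]

# Assumed partition A1, A2, A3 with priors P(Ai) and likelihoods P(B|Ai).
post = posterior_over_partition([0.5, 0.3, 0.2], [0.1, 0.4, 0.5])
print([round(p, 3) for p in post])  # posteriors sum to 1
```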
4.2 Random variables

Consider a sample space generated by two random variables X and Y. In principle, Bayes' theorem applies to the events A = {X = x} and B = {Y = y}. However, terms become 0 at points where either variable has finite probability density. To remain useful, Bayes' theorem may be formulated in terms of the relevant densities (see Derivation).

Diagram illustrating the continuous case: the joint density fX,Y(x,y) is drawn as a surface, with the densities fX(x) and fY(y|X=x) each enclosing area 1; the volume of a thin strip of width dx is P(X=x), the strip of width dy is P(Y=y), and the volume over the cell dx dy is P(X=x ∩ Y=y), so that

P(Y=y|X=x) = P(X=x ∩ Y=y) / P(X=x),

P(X=x|Y=y) = P(X=x ∩ Y=y) / P(Y=y).

4.2.1 Simple form

If X is continuous and Y is discrete,

fX(x|Y=y) = P(Y=y|X=x) fX(x) / P(Y=y).

If X is discrete and Y is continuous,

P(X=x|Y=y) = fY(y|X=x) P(X=x) / fY(y).

If both X and Y are continuous,

fX(x|Y=y) = fY(y|X=x) fX(x) / fY(y).

4.2.2 Extended form

The denominator can be eliminated with the law of total probability; for a continuous conditioning variable this becomes an integral:

fY(y) = ∫ fY(y|X=ξ) fX(ξ) dξ.

4.3 Bayes' rule

Bayes' rule is Bayes' theorem in odds form:

O(A1:A2|B) = O(A1:A2) · Λ(A1:A2|B)

where

Λ(A1:A2|B) = P(B|A1) / P(B|A2)

is called the Bayes factor or likelihood ratio. The odds between two events is simply the ratio of the probabilities of the two events; thus

O(A1:A2) = P(A1) / P(A2),

O(A1:A2|B) = P(A1|B) / P(A2|B).

So the rule says that the posterior odds are the prior odds times the Bayes factor; in other words, the posterior is proportional to the prior times the likelihood.
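The odds-form update can be sketched in a few lines (the probability values are assumed for illustration); the check at the end confirms that multiplying prior odds by the Bayes factor agrees with conditioning directly:

```python
# Assumed illustrative values.
p_a1, p_a2 = 0.2, 0.8          # priors P(A1), P(A2)
p_b_a1, p_b_a2 = 0.9, 0.3      # likelihoods P(B|A1), P(B|A2)

prior_odds = p_a1 / p_a2       # O(A1:A2)
bayes_factor = p_b_a1 / p_b_a2 # Λ(A1:A2|B)
posterior_odds = prior_odds * bayes_factor

# Direct conditioning: P(Ai|B) ∝ P(B|Ai) P(Ai), so the ratio of the
# joint probabilities equals the posterior odds.
assert abs(posterior_odds - (p_b_a1 * p_a1) / (p_b_a2 * p_a2)) < 1e-12
print(round(posterior_odds, 2))  # 0.75
```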
5 Derivation
5.1 For events

Bayes' theorem may be derived from the definition of conditional probability:

P(A|B) = P(A ∩ B) / P(B), if P(B) ≠ 0,

P(B|A) = P(A ∩ B) / P(A), if P(A) ≠ 0.

Equating the two expressions for P(A ∩ B) and dividing by P(B) gives

P(A|B) = P(B|A) P(A) / P(B), if P(B) ≠ 0.
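The event-space derivation can be verified exactly on a toy finite probability space (the outcomes, probabilities and events below are assumptions chosen for illustration), using exact fractions to avoid rounding error:

```python
from fractions import Fraction as F

# Assumed four-outcome space with exact probabilities summing to 1;
# A = {1, 2} and B = {2, 3} are arbitrary illustrative events.
prob = {1: F(1, 4), 2: F(1, 4), 3: F(1, 3), 4: F(1, 6)}
A, B = {1, 2}, {2, 3}

def p(event):
    return sum(prob[o] for o in event)

p_ab = p(A & B)            # P(A ∩ B)
p_a_given_b = p_ab / p(B)  # definition of conditional probability
p_b_given_a = p_ab / p(A)

# Bayes' theorem recovers P(A|B) from the inverse conditional.
assert p_a_given_b == p_b_given_a * p(A) / p(B)
print(p_a_given_b)  # 3/7
```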
5.2 For random variables

For two continuous random variables X and Y, Bayes' theorem may be analogously derived from the definition of conditional density:

fX(x|Y=y) = fX,Y(x, y) / fY(y),

fY(y|X=x) = fX,Y(x, y) / fX(x),

⟹ fX(x|Y=y) = fY(y|X=x) fX(x) / fY(y).

6 Examples

6.1 Frequentist example

Tree diagram illustrating the frequentist example. R, C, P and P̄ are the events rare, common, pattern and no pattern; percentages in parentheses are calculated: P(R) = 0.1%, P(C) = 99.9%, P(P|R) = 98%, P(P̄|R) = 2%, P(P|C) = 5%, P(P̄|C) = 95%, P(R ∩ P) = 0.098%, P(R ∩ P̄) = 0.002%, P(C ∩ P) = 4.995%, P(C ∩ P̄) = 94.905%.

P(Rare|Pattern) = P(Pattern|Rare) P(Rare) / (P(Pattern|Rare) P(Rare) + P(Pattern|Common) P(Common))
= (0.98 × 0.001) / (0.98 × 0.001 + 0.05 × 0.999)
≈ 1.9%.

6.2 Coin flip example

Concrete example from a 5 August 2011 New York Times article by John Allen Paulos: a coin is chosen at random from three coins, two fair and one biased that always lands heads, and is flipped three times, landing heads each time. Then

P(coin Biased) = 1/3, P(coin Fair) = 2/3, P(H|coin Fair) = 1/2,

P(HHH|coin Fair) = (1/2) × (1/2) × (1/2) = 1/8,

P(HHH|coin Biased) = 1 × 1 × 1 = 1,

so

P(coin Biased|HHH) = (1 × 1/3) / (1 × 1/3 + 1/8 × 2/3) = (8/24) / (10/24) = 4/5.
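The coin-flip posterior can be reproduced exactly with fractions; this sketch just mechanizes the arithmetic above:

```python
from fractions import Fraction as F

p_biased, p_fair = F(1, 3), F(2, 3)  # one biased coin among three
p_hhh_fair = F(1, 2) ** 3            # 1/8
p_hhh_biased = F(1)                  # the biased coin always lands heads

posterior_biased = (p_hhh_biased * p_biased) / (
    p_hhh_biased * p_biased + p_hhh_fair * p_fair
)
print(posterior_biased)  # 4/5
```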
6.3 Drug testing

Tree diagram illustrating the drug testing example. U, Ū, "+" and "−" are the events representing user, non-user, positive result and negative result; percentages in parentheses are calculated: P(U) = 0.5%, P(Ū) = 99.5%, P(+|U) = 99%, P(−|U) = 1%, P(+|Ū) = 1%, P(−|Ū) = 99%, P(U ∩ +) = 0.495%, P(U ∩ −) = 0.005%, P(Ū ∩ +) = 0.995%, P(Ū ∩ −) = 98.505%.

Suppose a drug test is 99% sensitive and 99% specific. That is, the test will produce 99% true positive results for drug users and 99% true negative results for non-drug users. Suppose that 0.5% of people are users of the drug. If a randomly selected individual tests positive, what is the probability he or she is a user?

P(User|+) = P(+|User) P(User) / (P(+|User) P(User) + P(+|Non-user) P(Non-user))
= (0.99 × 0.005) / (0.99 × 0.005 + 0.01 × 0.995)
≈ 33.2%

Despite the apparent accuracy of the test, if an individual tests positive, it is more likely that they do not use the drug than that they do. This surprising result arises because the number of non-users is very large compared to the number of users; thus the number of false positives (0.995%) outweighs the number of true positives (0.495%). To use concrete numbers, if 1000 individuals are tested, there are expected to be 995 non-users and 5 users. From the 995 non-users, 0.01 × 995 ≈ 10 false positives are expected. From the 5 users, 0.99 × 5 ≈ 5 true positives are expected. Out of 15 positive results, only 5, about 33%, are genuine.

Note: The importance of specificity can be illustrated by showing that even if sensitivity is 100% and specificity is 99%, the probability that the person is a drug user is 33%, but if the specificity is changed to 99.5% and the sensitivity is dropped to 99%, the probability rises to 49.8%.

7 Applications

Although Bayes' theorem is commonly used to determine the probability of an event occurring, it can also be applied to verify someone's credibility as a prognosticator. Many pundits claim to be able to predict the outcome of events: political elections, trials, the weather and even sporting events. Larry Sabato, founder of Sabato's Crystal Ball, is a perfect example. His website provides free political analysis and election predictions. His success at predictions has even led him to be called "a pundit with an opinion for every reporter's phone call."[5] We even have Punxsutawney Phil, the famous groundhog, who tells us whether we can expect a longer winter or an early spring. Bayes' theorem tells us the difference between who is on a hot streak and who is what they claim to be.

Let's say we live in an area where everyone gambles on the outcome of coin flips. Because of that, there is a big business for predicting coin flips. Suppose that 5% of predictors can actually win in the long run, and 80% of those are winners over a 2-year period. 95% of predictors are pretenders who are just guessing, and 20% of them are winners over a 2-year period (everyone gets lucky once in a while). This means that 82.6% of bettors who are winners over a 2-year period are actually long-term losers who are winning above their real average.

8 History

Bayes' theorem was named after the Reverend Thomas Bayes (1701–61), who studied how to compute a distribution for the probability parameter of a binomial distribution (in modern terminology). His friend Richard Price edited and presented this work in 1763, after Bayes' death, as An Essay towards solving a Problem in the Doctrine of Chances.[6] The French mathematician Pierre-Simon Laplace reproduced and extended Bayes' results in 1774, apparently quite unaware of Bayes' work.[7][8] Stephen Stigler suggested in 1983 that Bayes' theorem was discovered by Nicholas Saunderson some time before Bayes.[9] However, this interpretation has been disputed.[10]

Martyn Hooper[11] and Sharon McGrayne[12] have argued that Richard Price's contribution was substantial:

    By modern standards, we should refer to the Bayes–Price rule. Price discovered Bayes' work, recognized its importance, corrected it, contributed to the article, and found a use for it. The modern convention of employing Bayes' name alone is unfair but so entrenched that anything else makes little sense.[12]
9 See also

Probabiliorism

10 Notes
11 Further reading
Bruss, F. Thomas (2013), "250 years of 'An Essay towards solving a Problem in the Doctrine of Chance. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S.'", Jahresbericht der Deutschen Mathematiker-Vereinigung, Springer Verlag, Vol. 115, Issue 3-4 (2013), 129–133, DOI 10.1365/s13291-013-0077-z.

Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003), Bayesian Data Analysis, Second Edition, CRC Press.

Grinstead, C. M. and Snell, J. L. (1997), Introduction to Probability (2nd edition), American Mathematical Society (free pdf available).

Hazewinkel, Michiel, ed. (2001), "Bayes formula", Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4.

McGrayne, S. B. (2011). The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines & Emerged Triumphant from Two Centuries of Controversy. Yale University Press. ISBN 978-0-300-18822-6.

Laplace, P. (1774/1986), "Memoir on the Probability of the Causes of Events", Statistical Science 1(3):364–378.

Lee, P. M. (2012), Bayesian Statistics: An Introduction, Wiley.

Rosenthal, J. S. (2005), Struck by Lightning: The Curious World of Probabilities. HarperCollins.

Stigler, S. M. (1986), "Laplace's 1774 Memoir on Inverse Probability", Statistical Science 1(3):359–363.

Stone, J. V. (2013), Bayes' Rule: A Tutorial Introduction to Bayesian Analysis, Sebtel Press, England (chapter 1 available for download).
12 External links