Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Bayesian Statistics
Bayesian techniques are an alternative to classical or frequentist methods.
Although frequentist statistical techniques are more commonly used, the
use of Bayesian methods is increasingly widespread. Bayesian techniques
are used in a broad range of situations such as radar tracking, inertial
navigation, spotting credit card fraud and analysing clinical trials.
Bayesian methods are increasingly popular • The Bayesian framework more naturally
due to the advantages they offer over allows for the modelling of biases or
frequentist statistical methods: systematic error, and for modelling any
hierarchical structure in the data or in the
• A Bayesian analysis provides direct
problem. For instance in a clinical trial with
statements about the quantities of interest,
200 subjects at 5 different hospitals, we
providing more intuitive results and feeding
can model not just the overall effect of the
naturally into a decision making process.
treatment studied, but also variations in
• Bayesian statistics allows us to obtain outcome between different hospitals.
some useful information even from a single
• Bayesian statistics provides greater
piece of evidence, whereas many more
flexibility. It offers a natural way to adapt an
samples would be required for a frequentist
experiment in progress in the light of
statistical result.
results collected so far, or to halt the
• The Bayesian approach allows all evidence experiment early if the result is clearer than
to be taken into account in an explicit way. expected and no more data is needed to
Different forms of evidence can be achieve the desired degree of certainty.
combined in the overall probability model
or included via the prior belief. Analysing
the data using different priors allows the
data to be interpreted from different points
of view, such as regulatory or business.
What is Bayesian Statistics? Should Statistics be Objective?
The essence of Bayesian methods is a From the Bayesian point of view, inferences
mathematical rule explaining how you should are always conditional, conditioned on prior
change your existing beliefs in the light of new information, i.e., what (we believe) we already
evidence. It allows people to combine new know. The same data can lead to different
data with their existing knowledge or expertise. conclusions depending upon other available
information.
In Bayesian statistics, probability represents
an individual’s degree of belief that a particular Objections are often raised because Bayesian
event will occur. Thus the Bayesian approach results are subjective. A Bayesian statistician
is based on personal or subjective would answer that in order to use frequentist
probabilities. Central to Bayesian methods is statistics, subjective decisions have to be
the use of Bayes’ Theorem, which states that if taken and that Bayesian statistics merely
A and B are events and P(B), the probability of makes this explicit.
event B, is greater than zero, then:
As an example, suppose we toss a coin 10
P( A).P(B | A) times and get 10 heads. Should we conclude
P( A | B ) = that the coin has two heads?
P (B )
A frequentist would calculate a P-value of
The theorem is most commonly interpreted 0.001 for the hypothesis that the coin is
such that A represents a model that we think normal. This does not mean that we are 99.9%
might be true (in frequentist terms, a sure that we have a two headed coin. It merely
hypothesis) and B represents our says that given a normal coin, what happened
observations. In this case: is highly unlikely. This is an objective fact but
is not an answer to the original question. We
• The prior estimate of probability, P(A), is
now need to decide how small the P-value
our initial belief about the probability of A
must be in order to convince us that we have a
being true.
two-headed coin rather than just being lucky.
• The posterior estimate P(A|B) is the This is where the frequentist analysis becomes
probability of A being true given that B has subjective.
been observed.
The Bayesian approach requires a prior for
• The likelihood factor, P(B|A) is the how likely two-headed coins are. Say we
probability of event B occurring if A is true. believe that, never having seen one, only one
• Finally we consider a range of possible A’s coin in a million has two heads. Then Bayes’
so we can calculate P(B), the total Theorem tells us that despite seeing 10 heads
probability of B happening for any A. Since in a row, the probability of this being a two-
B did happen, we need to divide by P(B) to headed coin is still only 0.001. If on the other
normalize the answer. hand the coin is very old and we think two-
headed coins were more common when it was
In frequentist inference, tests of significance
minted, this changed prior belief, through
are performed by supposing that one particular
Bayes’ Theorem, would lead to a higher
hypothesis, the null hypothesis, is true, and
posterior probability of the coin having two
then computing the probability of observing a
heads.
statistic at least as extreme as the one actually
observed during hypothetical future repeated Thus the Bayesian paradigm has given a clear
trials (this is the P-value). In other words, answer to the original question but the answer
frequentist statistics examine the probability of is subjective, depending on our beliefs about
the data given a model (hypothesis). By two-headed coins. However, since this
contrast, Bayesian statistics allows us to subjectivity is included explicitly, we can easily
examine the probability that a possible model test how important the choice of prior was by
is true, given the available data. It also allows trying different priors.
us to compare the different possible models
and assess their relative probabilities of being This can lead people new to the Bayesian
true. approach to wonder “what, or whose, prior
should I use?” or to think that they can prove
anything because they can set the prior to
whatever they want. But the answer is simple:
you need to use the prior of the person you
have to convince of the result.
Bayesian Networks In a decision-making process, Bayesian
An important subset of Bayesian statistics is Networks can also be used to optimize costs
Bayesian Networks, or Bayesian Belief or benefits. Outcomes are associated with
Networks (BBN). probabilities, but actions that might lead to
those outcomes can also be associated with
A Bayesian Network models a decision risks or rewards. The Bayesian Network can
problem by mapping out cause-and-effect then be used to calculate the route that is most
relationships among key variables and likely to give the best return, for example the
assigning to them probabilities that represent largest monetary profit.
the extent to which one variable is likely to
affect another. Each variable in the problem Adaptive Testing
domain is assigned to a node. Each node can Suppose we are trying to conduct an
take different states, and the probability of the experiment that consists of a number of tests
node taking each state, given the states of its (or measurements) to determine some
parent nodes, is defined using a Conditional parameters for a model – for instance the dose
Probability Table. response curve of a drug or the performance
of a car engine with different tunings. Whereas
A simple example is a network that models the
the frequentist approach tends to yield fixed
effect of a possible train strike on whether two
experiments with a pre-determined set of
workers are late (see diagram below).
tests, the Bayesian approach allows us to
This network has been used to calculate the modify the tests to optimize what we learn
overall probability of each worker being late. during the experiment. We do this by
But the main advantage of this method is the “updating” our prior after each test with the
capacity to update the probabilities given new results of that test, to get a new prior for the
information – for example, if Andy turns up next test. We can then simulate running this
late, then it is possible to calculate the next test with a computer program to
probability that there is a train strike, and thus determine which set of test conditions (which
calculate the new probability that Bill will be dose to give the next subject, which values of
late, given that Andy has already arrived late. tuning parameters to next run the engine with)
will tell us most about the outcome we are
Bayesian Networks can have hundreds of interested in. Such sets of tests are known as
individual nodes with probabilities that can be “sequential adaptive” experiments. They
updated every time new information is allows us to focus on the range of values for
received. This makes them ideal for assessing the parameters that prove most promising,
complex situations, and they are used as tools allowing the experiment to use fewer tests, be
to support decision-making processes. They completed quicker and yield more information
are suitable for problems where uncertainty is than a traditional frequentist design.
unavoidable, since they express such
uncertainty in an explicit way. They can be
used to generate optimal predictions or
decisions even when key pieces of information
are missing.
Train Strike
References
Many books are available that explain the principles of Bayesian statistics and Bayesian inference.
A few examples are given here:
• P. M. Lee, Bayesian Statistics: An Introduction, 3rd edition (Hodder Education, 2004)
• A. Gelman, Bayesian Data Analysis (Chapman & Hall / CRC Press, 2003)
• A. O’Hagan and J. Forster, Kendall's Advanced Theory of Statistics, Vol. 2B Bayesian
Inference, 2nd edition (Wiley, 2004)
Many more references to Bayesian statistics can be found on the Internet, including the following:
• For a good introduction to Bayesian statistics, without too much mathematics, see “A Brief
Introduction to Bayesian Statistics” by Floyd Bullard at
http://www.ncssm.edu/courses/math/Talks/PDFS/BullardNCTM2001.pdf
• An essay on the relative merits of Bayesian and Classical statistics in High Energy Physics, by
G. D’Agostini, at
http://nedwww.ipac.caltech.edu/level5/March01/Dagostini/Dagostini_contents.html
• For a more detailed critique of Classical vs. Bayesian statistics in a clinical arena, see Toward
Evidence-Based Medical Statistics. 1: The P Value Fallacy” by S. Goodman at
http://www.annals.org/cgi/content/full/130/12/995 and “Toward Evidence-Based Medical
Statistics. 2: The Bayes Factor” by S. Goodman at
http://www.annals.org/cgi/content/full/130/12/1005
Tessella plc 26 The Quadrant, Abingdon Science Park, Abingdon, Oxfordshire OX14 3YS, UK
T: +44 (0)1235 555511 | F: +44 (0)1235 553301 | E: info@tessella.com
Tessella Inc 233 Needham Street, Suite 300, Newton, MA 02464, USA
T: 1 617 454 1220 | F: 1 617 454 1001 | E: info@tessella.com
Tessella – successfully delivering IT and consulting services to world leaders in R&D, science and engineering.
For decades, Tessella has been successfully delivering IT and consulting services to world leaders in R&D, science, and engineering. Through the application of scientific methods and rigorous quality procedures,
we enable clients in life sciences, energy, the public sector, and consumer industries to achieve a wide range of objectives, including, forecasting floods, developing fusion power, enhancing military sensor capability,
increasing drug discovery and development efficiency, and reducing risk to health and the environment in the extraction and production of oil and gas. With offices in Europe and North America, global companies rely
on Tessella for business critical assignments.
Copyright © Tessella plc 2009, all trademarks acknowledged. Issue: V2.R1.M0 | June 2009
www.tessella.com