
Gist of the attached paper:

2. Bayesian decision fundamentals: how does it operate/make decisions?


The Bayesian approach is a particular way of formulating and dealing with statistical decision problems. More specifically, it offers a method of formalizing a priori beliefs and of combining them with the available observations, with the goal of allowing a rational derivation of optimal (in some sense) decision criteria. To understand Bayesian decisions we first need to look into statistical decision theory.

Statistical Decision Theory: Basic Elements. The fundamental conceptual elements supporting the theory of statistical decision making are the following:

Formalization of the underlying unknown reality. This is done by considering that all that is unknown but relevant for the decision maker, the so-called state of nature, can be represented by an entity s taking values on a state space S. Often, this will be a single unknown numerical quantity (a parameter), or an ordered set of numerical parameters (a vector). In other problems, the elements of S may not be of numerical nature. Throughout most of this chapter, we will implicitly assume that s is a single quantity.

Formal model of the observations. The observations, based on which decisions are to be made, are possibly random and depend on the state of nature s. In formal probabilistic terms, this dependence is expressed by assuming that the observations are a sample x of a random variable (or process, or vector, or field) X, taking values on a sample space X, whose probability (density or mass) function, for x ∈ X, is conditioned on the true state of nature s, i.e., we write fX(x|s). This probability function appears in the literature under several different names: class-conditional probability function (usually in pattern recognition problems, where the observations x are called features); observation model (typically in signal/image processing applications, where x is usually referred to as the observed signal or observed image); parametric statistical model, or likelihood function (terms from the statistics literature but also adopted by other communities).

Formal decision rules. These are the goal of decision theory in the following sense: based on the observations, a decision rule has to choose an action amongst a set A of allowed decisions or actions. Formally, a decision rule is a function δ(x) from X into A, specifying how actions/decisions are chosen, given observation(s) x. A set or class D of allowed decision rules may be specified.

Quantification of the consequences of the decisions. This is formally expressed via a loss function L(s, a) : S × A → IR, specifying the cost that is incurred when the true state of nature is s and the chosen decision is a. It is usually required that L(s, a) ≥ Lmin > −∞, often (but not necessarily) with Lmin = 0. Although L(s, a) is required to be a real-valued function, its range does not necessarily have to be all of IR; it can be some subset of IR, with typical examples being the nonnegative reals and {0, 1}. Sometimes, the consequences are viewed optimistically (for example, in the economics and business literature) and, rather than losses, one talks about a utility function U(s, a) : S × A → IR, specifying the gain that is obtained when the state of nature is s and a is the chosen action. Writing L(s, a) = −U(s, a) makes it clear that these are two equivalent concepts.

A statistical decision problem is then formalized by specifying this set of elements {S, A, X, L(s, a), D, fX(x|s)}. It will be considered solved when a decision rule δ(x) (from D, the set of allowed rules) is chosen such that it achieves some sort of optimality criterion (associated with the loss function). Below, we will look in more detail at how this is done, both under the (so-called) classical (or frequentist) and Bayesian frameworks.
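To make these elements concrete, here is a minimal sketch (all names and numbers are hypothetical, not from the text) of a two-class problem: a state space S and action set A, a Gaussian class-conditional observation model fX(x|s), a 0/1 loss, and a simple threshold decision rule δ(x).

```python
from scipy.stats import norm

# Hypothetical elements {S, A, X, L(s, a), delta, f_X(x|s)} for a two-class problem.
STATES = ACTIONS = ("healthy", "diseased")          # S = A
class_means = {"healthy": 0.0, "diseased": 2.0}     # class-conditional Gaussian means

def f_x_given_s(x, s, std=1.0):
    """Class-conditional probability density f_X(x|s) (the observation model)."""
    return norm.pdf(x, loc=class_means[s], scale=std)

def loss(s, a):
    """0/1 loss: no cost for a correct decision, unit cost for a wrong one."""
    return 0.0 if s == a else 1.0

def delta(x, threshold=1.0):
    """A simple (hypothetical) decision rule delta: X -> A based on a threshold."""
    return "diseased" if x > threshold else "healthy"

x_obs = 1.3                                          # a single observed feature
print({s: f_x_given_s(x_obs, s) for s in STATES})    # likelihood of x under each state
print(delta(x_obs), loss("healthy", delta(x_obs)))   # decision, and its loss if s were "healthy"
```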

Parameter estimation problems (also called point estimation problems), that is, problems in which some unknown scalar (real-valued) quantity is to be estimated, can be viewed from a statistical decision perspective: simply let the unknown quantity be the state of nature s ∈ S ⊆ IR; take A = S, meaning that the decision rule will output estimates (guesses) of the true s; and design a loss function L(s, a) expressing how much wrong estimates are to be penalized (usually verifying L(s, s) = 0 and L(s, a) > 0 for any a ≠ s). With this setup, the decision rule will output estimates, denoted ŝ = δ(x), of the state of nature s. This approach can, of course, be extended to multidimensional states of nature (when s is a set of unknowns). This class of problems then contains all types of signal and image estimation problems, including restoration and reconstruction.

In addition to estimation problems, many other problems in image analysis and pattern recognition can naturally be cast into a statistical decision framework. For example, pattern classification is clearly a decision making problem, where S = A is the set of possible classes; here, x ∈ X is the observed feature vector which is input to the decision rule δ(x) : X → A, and fX(x|s) is the class-conditional probability function. For example, a system may have to decide what is the dominant type of vegetation in some satellite image, denoted x; then, as an illustration, S = A = {pine, eucalyptus, oak}. Notice here the non-numerical, often called categorical, nature of the set S = A.

Signal detection, so widely studied in the statistical communications and signal processing literature, can also be interpreted as a classification problem. The goal here is to design receivers which are able to optimally decide which of a possible set of symbols (for example, 0 or 1, in binary digital communication) was sent by an emitter, from a possibly noisy and somewhat corrupted received signal. Here again, S = A is the set of possible symbols (e.g., S = A = {bit 0, bit 1}). Both classification and estimation scenarios, i.e., those where the goal is to guess the state of nature (thus A = S), may be commonly referred to as inference problems (although this is a non-standard use of the term inference). The distinguishing feature is the discrete (classification) or continuous (estimation) nature of the sets S and A. Nevertheless, the naming convention is not rigid; e.g., many situations where S = A is a discrete set with a large number of values are referred to as estimation problems.
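As a small illustration of the estimation-as-decision view above (hypothetical numbers; the squared-error loss and sample-mean rule are common choices, not prescribed by the text):

```python
import numpy as np

# Estimation as a decision problem: the state of nature s is an unknown real number,
# A = S, the observations are noisy measurements of s, and we use squared-error loss.
def loss(s, a):
    return (s - a) ** 2           # L(s, a) >= 0, with L(s, s) = 0

def delta(x):
    """A simple decision rule: estimate s by the sample mean of the observations."""
    return np.mean(x)

rng = np.random.default_rng(0)
s_true = 3.0                                   # hypothetical true state of nature
x = s_true + rng.normal(0.0, 1.0, size=20)     # observed sample drawn from f_X(x|s)
s_hat = delta(x)
print(s_hat, loss(s_true, s_hat))              # estimate and the loss it incurs
```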

Coming back to Bayesian decision theory: the Bayesian approach to decision theory brings into play another element: a priori knowledge concerning the state of nature s, in the form of a probability function, usually referred to as the prior. Mainly in statistics texts, priors are often denoted as π(s), but we will not adopt that convention here. Instead, we will use the conventional notation for probability functions, pS(s). From a conceptual point of view, the Bayesian approach to decision theory implies viewing probabilities as measures of knowledge or belief, the so-called personal, or subjective, probabilities. Only by accepting such a view is it possible to use probabilities to formalize a priori knowledge. The classical frequency-based interpretation of probability is clearly inadequate in many situations: for example, suppose that the unknown state of nature under which a decision has to be made (e.g., whether or not to perform surgery) is the presence or absence (a binary variable) of some disease in a certain patient. Clearly, there is nothing random about it: either the patient does or does not have the disease. Any probabilistic statement such as "there is a 75% chance that the patient has the disease" has no frequency interpretation; there is no way we can have a sequence of outcomes of that same patient. Another example is any statement involving the probability of there being extra-terrestrial intelligent life in the known universe. Since there is only one known universe, there can be no frequency interpretation of such a probability; it simply expresses a degree of belief or a state of knowledge. Nevertheless, the statement is perfectly valid under the perspective of subjective probability; it expresses quantitatively a degree of personal belief. Several authors have undertaken the task of building formal theories for dealing with degrees of belief; these theories have to be consistent with standard (Boolean) logic but involve additional aspects allowing them to deal quantitatively with degrees of belief. It turns out that such belief measures are subject to the classical rules of probability theory, which itself does not depend on any frequency interpretation. This fact legitimates the use of probabilistic tools to formally deal with degrees of belief. The reader interested in these foundational aspects is encouraged to consult the references in the original paper.

Let us then assume that the available knowledge (set of beliefs) about the unknown state of nature can be formalized by considering it as a random variable S, characterized by its probability function pS(s), for s ∈ S; this will be a probability density or mass function, depending on S being a continuous or discrete set, respectively. In rare cases it can also happen that S contains both isolated points and continuous subsets, and thus pS(s) will have to be a mixed probability function including both point masses and a continuous part.
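Continuing the hypothetical disease example above, the prior pS(s) encodes the belief held before seeing any data, and Bayes' rule combines it with the observation model fX(x|s) to produce the posterior belief, which is what a statement like "there is a 75% chance that the patient has the disease" quantifies. A minimal sketch (all numbers assumed):

```python
from scipy.stats import norm

# Hypothetical prior belief p_S(s) about the state of nature (presence of a disease).
prior = {"healthy": 0.9, "diseased": 0.1}
class_means = {"healthy": 0.0, "diseased": 2.0}      # same Gaussian observation model as before

def posterior(x, std=1.0):
    """Posterior p_S(s|x), proportional to p_S(s) * f_X(x|s) and normalized over S."""
    unnorm = {s: prior[s] * norm.pdf(x, loc=class_means[s], scale=std) for s in prior}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

print(posterior(2.4))   # belief about the disease after observing the measurement x = 2.4
```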

3. Practical use: Whenever a quantity is to be inferred, or some conclusion is to be drawn, from observed data, Bayesian principles and tools can be used. Examples, and this is by no means an exhaustive list of mutually exclusive areas, include: statistics, signal processing, speech analysis, image processing, computer vision, astronomy, telecommunications, neural networks, pattern recognition, machine learning, artificial intelligence, psychology, sociology, medical decision making, econometrics, and biostatistics. Focusing more closely on the topic of interest to this book, we mention that, in addition to playing a major role in the design of machine (computer) vision techniques, the Bayesian framework has also been found very useful in understanding natural (e.g., human) perception; this fact is a strong testimony in favor of the Bayesian paradigm.

4. Loss-calibrated inference: The loss-calibrated framework highlights which key ingredients need to be considered when calibrating approximate inference to a task. When inference is carried out only approximately, treating (approximate) inference and decision making independently can lead to suboptimal decisions for a fixed loss under consideration. The authors thus investigate whether one can "calibrate" the approximate inference algorithm to a fixed loss, and propose an analysis framework for this situation.
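A toy sketch (hypothetical numbers; not the analysis framework from the cited work) of why the loss matters when an (approximate) posterior is turned into a decision: under an asymmetric loss, the action minimizing the posterior expected loss can differ from simply acting on the most probable state.

```python
# Hypothetical (approximate) posterior over a binary state of nature, and an
# asymmetric loss: missing the disease is penalized much more than a false alarm.
posterior = {"healthy": 0.7, "diseased": 0.3}
loss = {("healthy", "treat"): 1.0, ("healthy", "ignore"): 0.0,
        ("diseased", "treat"): 0.0, ("diseased", "ignore"): 10.0}

def expected_loss(action):
    """Posterior expected loss of taking the given action."""
    return sum(posterior[s] * loss[(s, action)] for s in posterior)

best = min(("treat", "ignore"), key=expected_loss)
print(best)   # "treat", even though "healthy" is the more probable state
```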

5. Gaussian Process Classification


Gaussian Processes for Machine Learning (GPML) is a generic supervised learning method primarily designed to solve regression problems. It has also been extended to probabilistic classification, but in the present implementation this is only a post-processing of the regression exercise. The advantages of Gaussian Processes for Machine Learning are: the prediction interpolates the observations (at least for regular correlation models); the prediction is probabilistic (Gaussian), so that one can compute empirical confidence intervals and exceedance probabilities that might be used to refit (online fitting, adaptive fitting) the prediction in some region of interest; and it is versatile: different linear regression models and correlation models can be specified. Common models are provided, but it is also possible to specify custom models, provided they are stationary.

An introductory regression example


Say we want to build a surrogate of a target function. To do so, the function is evaluated on a design of experiments. Then, we define a GaussianProcess model whose regression and correlation models may be specified using additional kwargs, and ask for the model to be fitted to the data. Depending on the number of parameters provided at instantiation, the fitting procedure may resort to maximum likelihood estimation of the parameters, or alternatively it uses the given parameters.
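A minimal sketch of this workflow, assuming the current scikit-learn API (GaussianProcessRegressor and kernel objects, which replaced the GaussianProcess class of the 0.11 documentation linked below); the target function and kernel are illustrative choices, not the ones from the linked example:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Illustrative target function to surrogate (an assumption for this sketch).
def g(x):
    return x * np.sin(x)

# Design of experiments: a handful of training points where g is evaluated.
X_train = np.atleast_2d([1.0, 3.0, 5.0, 6.0, 7.0, 8.0]).T
y_train = g(X_train).ravel()

# Kernel hyperparameters are fitted by maximum likelihood during .fit().
kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
gp.fit(X_train, y_train)

# Probabilistic prediction: the mean and standard deviation give confidence intervals.
X_test = np.atleast_2d(np.linspace(0, 10, 5)).T
mean, std = gp.predict(X_test, return_std=True)
print(np.c_[mean, mean - 1.96 * std, mean + 1.96 * std])   # mean and 95% interval
```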

Example: http://scikit-learn.org/0.11/auto_examples/gaussian_process/plot_gp_probabilistic_classification_after_regression.html. See also http://www.informatik.uni-ulm.de/ni/ANNPR10/InvitedTalks/AtiyaANNPR.pdf for additional help if needed.

6. Bayesian Posterior Risk


Previously we discussed Bayesian decision problems. As with the general decision problem setting, we have data X, a model P = {Pθ : θ ∈ Ω}, a loss function L(θ, d), and risk R(θ, δ). In the Bayesian setting we additionally specify a prior Λ, which is a measure over the parameter space Ω. Assuming this measure defines a probability distribution, we interpret the parameter θ as an outcome of the random variable Θ. Conditional on Θ = θ, we think of the data as being generated by the distribution Pθ. Now, the optimality goal for our decision problem of estimating g(θ) is the minimization of the average risk r(Λ, δ) = E[L(Θ, δ(X))] = E[ E[L(Θ, δ(X)) | X] ]; the inner conditional expectation is the posterior risk, so a rule that minimizes the posterior risk for each observed x minimizes the average risk. There is a great worked example of exactly this in http://www.stanford.edu/~lmackey/stats300a/doc/stats300a_fall13_lecture8.pdf.

Have a look at that lecture.
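A small Monte Carlo sketch (a standard normal-normal conjugate setup chosen for illustration, not taken from the lecture notes) comparing the average risk r(Λ, δ) of two rules under squared-error loss; the Bayes rule, built from the posterior, achieves the smaller average risk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Prior Theta ~ N(0, tau^2); given Theta = theta, X_1..X_n ~ N(theta, sigma^2).
tau, sigma, n, trials = 1.0, 1.0, 5, 200_000

theta = rng.normal(0.0, tau, size=trials)                    # draws from the prior
x = rng.normal(theta[:, None], sigma, size=(trials, n))      # data generated given each theta
xbar = x.mean(axis=1)

# Two decision rules for estimating theta under squared-error loss L(theta, d) = (theta - d)^2.
delta_mle = xbar                                                    # sample mean (MLE)
delta_bayes = (n / sigma**2) / (n / sigma**2 + 1 / tau**2) * xbar   # posterior mean

avg_risk = lambda d: np.mean((theta - d) ** 2)               # Monte Carlo estimate of r(Lambda, delta)
print(avg_risk(delta_mle), avg_risk(delta_bayes))            # the Bayes rule has the smaller average risk
```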

7. Asymmetric Losses: this is a great read, see point no. 2: http://www.ism.ac.jp/editsec/aism/pdf/047_3_0401.pdf
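As a generic illustration of an asymmetric loss (a sketch under assumed values, not reproduced from the linked paper): under the LINEX loss, over- and under-estimation are penalized at different rates, so the estimate minimizing the posterior expected loss shifts away from the posterior mean; for a Gaussian posterior N(μ, σ²) the minimizer is μ − aσ²/2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed posterior samples for illustration: theta | x ~ N(1.0, 0.5^2).
mu, sd, a = 1.0, 0.5, 2.0
theta = rng.normal(mu, sd, size=50_000)

def linex(d, theta, a=a):
    """LINEX loss: penalizes overestimation (d > theta) exponentially, underestimation ~linearly."""
    delta = d - theta
    return np.exp(a * delta) - a * delta - 1.0

# Minimize the posterior expected LINEX loss over a grid of candidate estimates.
grid = np.linspace(mu - 1.0, mu + 1.0, 401)
expected = [linex(d, theta).mean() for d in grid]
d_star = grid[int(np.argmin(expected))]

# Numerical minimizer approximately matches the closed form mu - a*sigma^2/2 (= 0.75 here),
# i.e., the optimal estimate is pulled below the posterior mean by the asymmetry.
print(d_star, mu - a * sd**2 / 2)
```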
