Formal decision rules. These are the goal of decision theory in the following sense: based on the observations, a decision rule has to choose an action from a set A of allowed decisions or actions. Formally, a decision rule is a function δ(x) from X into A, specifying how actions/decisions are chosen, given observation(s) x. A set or class D of allowed decision rules may be specified.

Quantification of the consequences of the decisions. This is formally expressed via a loss function L(s, a): S × A → IR, specifying the cost that is incurred when the true state of nature is s and the chosen decision is a. It is usually required that L(s, a) ≥ Lmin > −∞, often (but not necessarily) with Lmin = 0. Although L(s, a) is required to be a real-valued function, its range does not necessarily have to be all of IR; it can be some subset of IR, with typical examples being IR₀⁺ and {0, 1}. Sometimes, the consequences are viewed optimistically (for example, in the economics and business literature) and, rather than losses, one talks about a utility function U(s, a): S × A → IR, specifying the gain that is obtained when the state of nature is s and a is the chosen action. Writing L(s, a) = −U(s, a) makes it clear that these are two equivalent concepts.

A statistical decision problem is then formalized by specifying this set of elements {S, A, X, L(s, a), D, fX(x|s)}. It will be considered solved when a decision rule δ(x) (from D, the set of allowed rules) is chosen such that it achieves some sort of optimality criterion (associated with the loss function). Below, we will look in more detail at how this is done, both under the (so-called) classical (or frequentist) and Bayesian frameworks.
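The elements {S, A, L(s, a), δ(x)} above can be made concrete in a small sketch. The states, actions, loss values, and the threshold rule below are all invented for illustration, not taken from the text.

```python
# A minimal sketch of the decision-theoretic setup {S, A, L, delta}.
# States, actions, loss values, and the rule itself are illustrative.

states = ["s1", "s2"]    # S: possible states of nature
actions = ["a1", "a2"]   # A: allowed actions/decisions

# Loss function L(s, a): cost incurred when the true state is s
# and the chosen action is a.
loss = {
    ("s1", "a1"): 0.0, ("s1", "a2"): 1.0,
    ("s2", "a1"): 5.0, ("s2", "a2"): 0.0,
}

def decision_rule(x):
    """A decision rule delta(x): X -> A (here, a trivial threshold rule)."""
    return "a1" if x < 0.5 else "a2"

# Given an observation x, the rule outputs an action:
action = decision_rule(0.2)
```

A class D of allowed rules would then be, e.g., all threshold rules of this form, and "solving" the problem means picking the threshold that optimizes some criterion built from the loss.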
Parameter estimation problems (also called point estimation problems), that is, problems in which some unknown scalar (real-valued) quantity is to be estimated, can be viewed from a statistical decision perspective: simply let the unknown quantity be the state of nature s ∈ S ⊆ IR; take A = S, meaning that the decision rule will output estimates (guesses) of the true s; and design a loss function L(s, a) expressing how much wrong estimates are to be penalized (usually verifying L(s, s) = 0, and L(s, a) > 0 for any a ≠ s). With this setup, the decision rule will output estimates, denoted ŝ = δ(x), of the state of nature s. This approach can, of course, be extended to multidimensional states of nature (when s is a set of unknowns). This class of problems then contains all types of signal and image estimation problems, including restoration and reconstruction.

In addition to estimation problems, many other problems in image analysis and pattern recognition can naturally be cast into a statistical decision framework. For example, pattern classification is clearly a decision-making problem, where S = A is the set of possible classes; here, x ∈ X is the observed feature vector which is input to the decision rule δ(x): X → A, and fX(x|s) is the class-conditional probability function. For example, a system may have to decide what is the dominant type of vegetation in some satellite image, denoted x; then, as an illustration, S = A = {pine, eucalyptus, oak}. Notice here the non-numerical, often called categorical, nature of the set S = A. Signal detection, so widely studied in the statistical communications and signal processing
literature, can also be interpreted as a classification problem. The goal here is to design receivers which are able to optimally decide which of a possible set of symbols (for example, 0 or 1, in binary digital communication) was sent by an emitter, from a possibly noisy and somewhat corrupted received signal. Here again, S = A is the set of possible symbols (e.g., S = A = {bit 0, bit 1}). Both classification and estimation scenarios, i.e., those where the goal is to guess the state of nature (thus A = S), may be commonly referred to as inference problems (although this is a non-standard use of the term inference). The distinguishing feature is the discrete (classification) or continuous (estimation) nature of the sets S and A. Nevertheless, the naming convention is not rigid; e.g., many situations where S = A is a discrete set with a large number of values are referred to as estimation problems.
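The vegetation example can be sketched as a classification rule δ(x): X → A built from class-conditional densities fX(x|s). The one-dimensional Gaussian densities and their means below are invented for illustration; a real system would use a feature vector and learned class models.

```python
import numpy as np

# Hypothetical 1-D Gaussian class-conditional densities f_X(x|s) for the
# three vegetation classes; means and variance are invented for illustration.
classes = ["pine", "eucalyptus", "oak"]
means = {"pine": 0.0, "eucalyptus": 2.0, "oak": 4.0}

def class_conditional(x, s, sigma=1.0):
    """Gaussian density f_X(x|s)."""
    z = (x - means[s]) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))

def classify(x):
    """Decision rule delta(x): pick the class maximizing f_X(x|s)
    (maximum likelihood; equivalently MAP under a uniform prior)."""
    return max(classes, key=lambda s: class_conditional(x, s))
```

For instance, an observation near 2.0 would be assigned to the "eucalyptus" class, since its class-conditional density dominates there.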
Coming back to Bayesian decision theory: the Bayesian approach to decision theory brings into play another element: a priori knowledge concerning the state of nature s, in the form of a probability function, usually referred to as the prior. Mainly in statistics texts, priors are often denoted as π(s), but we will not adopt that convention here. Instead, we will use the conventional notation for probability functions, pS(s). From a conceptual point of view, the Bayesian approach to decision theory implies viewing probabilities as measures of knowledge or belief, the so-called personal, or subjective, probabilities. Only by accepting such a view is it possible to use probabilities to formalize a priori knowledge. The classical frequency-based interpretation of probability is clearly inadequate in many situations: for example, suppose that the unknown state of nature under which a decision has to be made (e.g., whether or not to perform surgery) is the presence or absence (a binary variable) of some disease in a certain patient. Clearly, there is nothing random about it: either the patient does or does not have the disease. Any probabilistic statement such as "there is a 75% chance that the patient has the disease" has no frequency interpretation; there is no way we can have a sequence of outcomes of that same patient. Another example is any statement involving the probability of there being extraterrestrial intelligent life in the known universe. Since there is only one known universe, there can be no frequency interpretation of such a probability; it simply expresses a degree of belief or a state of knowledge. Nevertheless, the statement is perfectly valid under the perspective of subjective probability; it expresses quantitatively a degree of personal belief.
Several authors have undertaken the task of building formal theories for dealing with degrees of belief; these theories have to be consistent with standard (Boolean) logic but involve additional aspects allowing them to deal quantitatively with degrees of belief. It turns out that such belief measures are subject to the classical rules of probability theory, which itself does not depend on any frequency interpretation. This fact legitimates the use of probabilistic tools to formally deal with degrees of belief. The reader interested in these foundational aspects is encouraged to consult the literature on the foundations of subjective probability. Let us then assume that the available knowledge (set of beliefs) about the unknown state of nature can be formalized by considering it as a random variable S, characterized by its probability function pS(s), for s ∈ S; this will be a probability density or mass function, depending on whether S is a continuous or discrete set, respectively. In rare cases, it can also happen that S contains both isolated points and continuous subsets, and then pS(s) will have to be a mixed probability function, including both point masses and a density.
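Once a prior pS(s) is in place, it is combined with the likelihood fX(x|s) via Bayes' rule to yield a posterior over the state of nature. A sketch for the binary disease example; all numbers (prior prevalence, test sensitivity, false-positive rate) are invented for illustration.

```python
# Prior belief p_S(s) about the state of nature (illustrative numbers).
prior = {"disease": 0.01, "healthy": 0.99}

# Likelihood f_X(x|s) of observing a positive test under each state
# (sensitivity and false-positive rate, both invented).
likelihood_pos = {"disease": 0.95, "healthy": 0.05}

# Posterior p_S(s | x = positive) by Bayes' rule:
# p(s|x) = f(x|s) p(s) / sum_s' f(x|s') p(s').
evidence = sum(likelihood_pos[s] * prior[s] for s in prior)
posterior = {s: likelihood_pos[s] * prior[s] / evidence for s in prior}
```

Note how the low prior prevalence keeps the posterior probability of disease modest (about 0.16 here) even after a positive test, which is precisely the kind of prior knowledge the frequentist setup cannot express.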
3. Practical use: Whenever a quantity is to be inferred, or some conclusion is to be drawn, from observed data, Bayesian principles and tools can be used. Examples, and this is by no means an exhaustive list of mutually exclusive areas, include: statistics, signal processing, speech analysis, image processing, computer vision, astronomy, telecommunications, neural networks, pattern recognition, machine learning, artificial intelligence, psychology, sociology, medical decision making, econometrics, and biostatistics. Focusing more closely on the topic of interest to this book, we mention that, in addition to playing a major role in the design of machine (computer) vision techniques, the Bayesian framework has also been found very useful in understanding natural (e.g., human) perception; this fact is a strong testimony in favor of the Bayesian paradigm.

4. Loss calibration: The loss-calibrated framework highlights which key ingredients need to be considered when calibrating approximate inference to a task. When inference is carried out only approximately, treating (approximate) inference and decision making independently can lead to suboptimal decisions for a fixed loss under consideration. We thus investigate whether one can "calibrate" the approximate inference algorithm to a fixed loss, and propose a framework to analyze this situation.
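The decision step that loss calibration cares about is the minimization of posterior expected loss, δ*(x) = argmin over a of the sum over s of L(s, a) p(s|x). A sketch for the surgery example; the posterior and the asymmetric loss values are invented for illustration.

```python
# Posterior over the state of nature (illustrative numbers; in practice this
# comes from an exact or approximate inference step).
posterior = {"disease": 0.2, "healthy": 0.8}

# Asymmetric loss L(s, a): missing a disease (waiting) is far costlier
# than operating unnecessarily.
loss = {
    ("disease", "operate"): 0.0,  ("disease", "wait"): 100.0,
    ("healthy", "operate"): 10.0, ("healthy", "wait"): 0.0,
}

def expected_loss(a):
    """Posterior expected loss of action a: sum_s L(s, a) p(s | x)."""
    return sum(loss[(s, a)] * posterior[s] for s in posterior)

# Bayes-optimal decision: minimize posterior expected loss.
best_action = min(["operate", "wait"], key=expected_loss)
```

Here the optimal action is to operate even though "healthy" is the more probable state, because the loss is asymmetric; an approximate posterior that is accurate overall but wrong in the tail that drives this loss would flip the decision, which is exactly the failure mode loss calibration targets.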
fitting) the prediction in some region of interest. Versatile: different linear regression models and correlation models can be specified. Common models are provided, but it is also possible to specify custom models, provided they are stationary.
onto a design of experiments. Then, we define a GaussianProcess model whose regression and correlation models may be specified using additional kwargs, and ask for the model to be fitted to the data. Depending on the number of parameters provided at instantiation, the fitting procedure may resort to maximum likelihood estimation of the parameters, or alternatively it uses the given parameters.
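The fit-then-predict procedure described above can be sketched from scratch with NumPy, without assuming any particular library API. This is a minimal Gaussian-process regression sketch with a squared-exponential correlation model; the toy data, the fixed length scale, and the jitter value are all assumptions, and the hyperparameters are set by hand rather than estimated by maximum likelihood.

```python
import numpy as np

def kernel(a, b, length=1.0):
    """Squared-exponential (stationary) correlation model."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

# Toy design of experiments and observations (illustrative).
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(X)

# Correlation matrix of the training points, with a small jitter on the
# diagonal for numerical stability.
K = kernel(X, X) + 1e-6 * np.eye(len(X))

# Posterior mean prediction at new points: k(X*, X) K^{-1} y.
X_new = np.array([1.0, 1.5])
alpha = np.linalg.solve(K, y)
y_pred = kernel(X_new, X) @ alpha
```

With negligible jitter the predictor interpolates the observations, so the prediction at the training point x = 1.0 reproduces sin(1.0) almost exactly, while the prediction at x = 1.5 smoothly fills in between the neighboring observations.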