
Introduction to BUGS

BUGS (Bayesian inference Using Gibbs Sampling) is a statistical software package designed specifically to do Bayesian analyses of simple to intermediate complexity based on numerical simulation rather than solving for analytical solutions. The great thing about BUGS is that it keeps a lot of the mathematical and computational details "under the hood" so that you can focus on the structure of your model at a high level rather than being bogged down in the details. Also, since BUGS is designed just to do Bayesian MCMC computation it is very efficient at this, which is nice since Bayesian MCMC computations can be time consuming. In this tutorial we will work through the basics of how to analyze a model in BUGS: how to write a model, how to load up data, how to compile and execute the model, and how to evaluate the numerical output for convergence.

Resources

BUGS is an open-source project and thus the software is free. From the BUGS homepage http://www.openbugs.info the "Downloads" section will allow you to download not only the BUGS software, but also R packages that allow one to call BUGS directly from R. (Note: It is possible to run BUGS on a Mac or Linux through WINE.) Also noteworthy are the "Documentation" links and the "Community" links, where one can get to the BUGS discussion email list and links to the previous "WinBUGS" site, which contains many useful resources for learning more about BUGS specifically and Bayes in general.

Software

Click on the OpenBUGS icon on your desktop to start the software. This will open up a fairly uninteresting software window. Let's start by looking at some of the OpenBUGS menus. First click File > New to open up a new script window in BUGS. One of the major differences between R and BUGS is that R has a command line prompt (the > where you type things) and BUGS does not. All code in BUGS has to be written into a script. In R when you write things to a script you can evaluate that script one line at a time, in blocks, or all at once.
In BUGS, on the other hand, the script can only be evaluated all at once, so in general each script contains a single discrete analysis.

Next, let's look at Manuals > OpenBUGS User Manual. This will open up a window that shows the BUGS manual, which is the same as you would have seen on the BUGS website. Click on "Contents" and take a look at the list of topics covered. There's a large amount of info in the manual but we will focus our attention on a few sections that you'll find yourself coming back to repeatedly. One of these topics is "Model Specification", which provides a lot of detail on how to write models in BUGS, how data must be formatted, and the list of functions and distributions that BUGS knows about. The naming conventions for distributions in BUGS are very similar to the conventions in R, but there are a few cases where the parameterization is different, so it is always good to check that the values you are passing to a BUGS distribution are what you think they are. One very important example of this is that the Normal distribution in BUGS, dnorm, is parameterized in terms of a mean and precision (1/variance) rather than a mean and standard deviation, which is the parameterization in R.

Next, let's look at Examples > Examples Vol 1. BUGS has three volumes of examples that provide written explanations of analyses with the BUGS code and data embedded in them so that you can run the code directly from the example. Working through these examples is a great way to learn more about how BUGS works, and when you are analyzing your own data it is often easiest to start from an existing example and modify it to meet your needs rather than starting from scratch.

Analysis in BUGS at a glance

All analyses in BUGS follow the same basic outline. This section will present a general overview of the steps involved and then work through a simple example to explain the details of each step. Please run this example as you read along.

BUGS "recipe"

1. Write model
2. Load data set
3. Set initial conditions (optional)
4. Don't forget to save your script before running!
5. Open Model Specification Tool (Model > Specification)
6. Highlight "model", click "Check Model"
7. Highlight data, click "Load Data"
8. Set number of chains (typically 3-5)
9. Click "compile"
10. Click "Gen inits" and/or highlight initial conditions and click "load inits"
11. Open Sample Monitor Tool (Inference > Samples)
12. Specify variables to track (enter variable name in window, click "set")
13. Open Update Tool (Model > Update)
14. Click "update" to run sampler
15. Use Sample Monitor Tool to evaluate model fit, use Update Tool to run longer if necessary

1. Write model

As our first example, let's consider the simple case of finding the mean of a normal distribution with a known variance. This problem has an exact analytical solution, so we can compare the numerical results to the analytical one. Here we're considering the case where both the likelihood of the data and the prior for the mean are assumed to be normal.
$$L = p(y \mid \mu) = N(y \mid \mu, \sigma^2)$$

$$\text{prior} = p(\mu) = N(\mu \mid \mu_0, \tau^2)$$
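Because both pieces are normal, the posterior for the mean is also normal, which is the analytical solution mentioned above. As a sketch, writing the known data variance as \sigma^2, the prior mean as \mu_0, and the prior variance as \tau^2 (these symbol names are our assumption, chosen for illustration), a single observation y gives:

```latex
\begin{aligned}
p(\mu \mid y) &\propto N(y \mid \mu, \sigma^2)\; N(\mu \mid \mu_0, \tau^2) \\
p(\mu \mid y) &= N\!\left(\mu \;\Big|\; \frac{\mu_0/\tau^2 + y/\sigma^2}{1/\tau^2 + 1/\sigma^2},\;
                 \frac{1}{1/\tau^2 + 1/\sigma^2}\right)
\end{aligned}
```

In words, the precisions (1/variance) of the prior and the data add, and the posterior mean is the precision-weighted average of the prior mean and the observation. This is one reason the precision parameterization used by BUGS is convenient.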

In BUGS the specification of any model begins with the word "model" and then encapsulates the rest of the model specification in curly brackets:

    model {
      ## model goes here ##
    }

When writing models in BUGS we have to specify the data model, process model, and parameter model, but we don't need to specify explicitly the connections between them; BUGS figures them out based on the conditional probabilities involved. In BUGS, deterministic calculations (e.g. the process model) make use of an arrow ( <- ) for assignment, similar to R, while assignment of random variables is done with a tilde (~). Deterministic calculations and distributions cannot be combined in the same line of code. For example, a regression model where we're trying to predict data y based on observation x might include

    mu <- b0 + b1*x
    y ~ dnorm(mu,tau)

but in BUGS the same model can NOT be expressed as

    y ~ dnorm(b0 + b1*x, tau)

For our first model of a normal likelihood and normal prior this would be specified as

    model {
      mean ~ dnorm(53,0.04)
      prec <- 1/185.0
      Y ~ dnorm(mean,prec)
    }

The first line of this model specifies the prior and says that "mean" is a random variable (~) that is Normally distributed (dnorm) with an expected value of 53 and a precision of 0.04 (variance of 25). The second line says that "prec" is calculated deterministically as 1/185. Remember that in this model we're assuming that the variance is known (and in this case it is 185). The third line specifies the likelihood and says that there is a single data point "Y" that is a random variable that has a Normal distribution with expected value "mean" and precision "prec".

2. Load data set

The next step is to specify the data for this model. There are two ways to specify data in BUGS, either as a list or as a table. We'll start with the list format, which is the same as the R list format and more flexible. All in all, BUGS is not designed for loading and manipulating data; we recommend that you do that in R or in a spreadsheet first and then cut-and-paste it into your script file. (Aside: it is possible to download R packages that call BUGS directly from R. This is the easiest approach if you are dealing with large/complex data sets or a large number of runs.) The data list in BUGS begins with the word "list" and a set of parentheses. The variable names and the data itself go between these parentheses.
For this example with a single data point, we have a line that follows the model that simply says

    list(Y=42)

One thing that is important is that the variable names in the list match those in the model, because BUGS will check that the data set has the right names and is of the correct length.

3. Set initial conditions (optional)

The next step is to specify the initial conditions. In BUGS this is optional because if initial conditions are not specified then the code will draw them randomly from the priors. If you use informative priors this is not usually a problem, though if you use uninformative priors the initial parameter values may start far from the center of the distribution and may take a long time or even fail to converge, same as with traditional optimization. Initial conditions are also specified using a list, but in this case we want to name the model variables instead of the data set names. One thing that is nice about BUGS is that we only have to specify some of the variable names (those that we don't specify will be initialized based on the priors), and therefore we can often get away with only initializing a subset of variables. In BUGS we will commonly want to run multiple independent chains of the MCMC that start from different initial conditions. In this case we'll want one list for each chain. So in this example if we want to run three chains we might specify the initial conditions as

    list(mean = 40)
    list(mean = 45)
    list(mean = 50)

4. Don't forget to save your script before running!

Remember to save early and often in BUGS. It is particularly important to save your script before you run it in case the model crashes the software.

5. Open Model Specification Tool (Model > Specification)

Hint: running a model in BUGS results in a bunch of small windows being open for different tools. Try to arrange them so they're not overlapping and you can see them.

6. In the script, highlight the word "model". In the Model Specification Tool click "Check Model"

When you do this the message bar at the very bottom of BUGS will either say "model is syntactically correct" or will give an error message that indicates that there is a bug in the code. You'll need to keep an eye on the message bar because the information is vital but understated (a US software designer probably would have made a large & colorful pop-up to make sure you didn't miss these messages). Also, a VERY IMPORTANT behavior of BUGS is that in specifying/running a model, any time you encounter an error you have to start back at this step (e.g. if your data is specified wrong and you get an error message, you can't just fix the data and hit "load data" again, you have to restart from "Check Model").

7. Highlight data in the script, click "Load Data" in the Model Specification Tool

For our data list all you have to do is highlight the word "list" before clicking Load Data. If data are correctly specified and loaded the message bar will say "data loaded".

8. Set number of chains in the Specification Tool (typically 3-5)

9. Click "compile" in the Specification Tool

If this is successful the message bar will say "model compiled".

10. Highlight initial conditions and click "load inits" and/or click "Gen inits"

Since we had specified initial conditions we'll want to use "load inits". Begin by highlighting "list" for the first initial condition and then click "load inits". When you do this the counter for "for chain" will switch from 1 to 2. You'll then repeat this for the second and third initial conditions. When you are done the message bar should say "model is initialized" and the "gen inits" button should be off. If this model had additional variables that we had not initialized we would then hit "gen inits" and BUGS would set the remaining values based on the priors. Alternatively, we could have specified no initial conditions and just hit "gen inits" from the beginning, though as mentioned above that could have large implications for the rate/success of convergence.

11. Open Sample Monitor Tool (Inference > Samples)

12. Specify variables to track (enter variable name in window, click "set")

BUGS will only store the values for the variables that you ask it to. In the window labeled "node" you'll want to write in the name of the variables you want to track. Each time you enter a variable you'll want to hit the "set" button. With only a few exceptions, variables tracked have to be variables in the model. One important exception is that you can always ask BUGS to track the "deviance". (Recall that deviance = -2 ln L.) This is particularly useful in multiple chain models in case not all the chains converge to the same value, in which case you will be interested in the chains with the lowest deviance. If one of the variables in your model is a vector, you can specify either the name of the whole vector or just specific values to track (e.g. "alpha[3]" would only record the third value of the alpha variable). For the model we've specified let's track the variables "mean" and "deviance".

13. Open Update Tool (Model > Update)

14. Click "update" to run sampler

Initially you'll want to set "updates" to a small number (e.g. 10) to make sure the model runs. The model is actually run when you hit "update". After you've run the model a few steps you can change "updates" to a larger number (e.g. 1000) and hit "update" again. This will add more steps to the run rather than starting from scratch. In this way you can check the progress of the MCMC and run it longer if need be. The total number of samples depends upon how quickly the model converges, the acceptance rate/auto-correlation of the samples, and the quantities you are most interested in (e.g. a good estimate of the posterior mean takes far fewer samples than a good estimate of the confidence intervals). As a rule of thumb expect to run the MCMC 5,000 to 500,000 steps, depending upon auto-correlation and time to convergence.
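To make the idea of running the sampler in batches concrete, here is a minimal Python sketch of a random-walk Metropolis sampler for the one-observation normal model used in this example (prior mean 53, prior precision 0.04, known variance 185, Y = 42). This is not how OpenBUGS samples internally; the Chain class, the proposal step size, the burn-in cutoff, and the seed are all assumptions made for illustration. Calling update() a second time extends the same chain, much as hitting "update" again does in the Update Tool.

```python
import math
import random


def log_posterior(mu, y=42.0, prior_mean=53.0, prior_prec=0.04, data_prec=1 / 185.0):
    """Unnormalized log posterior of the mean, written with precisions as in BUGS."""
    log_prior = -0.5 * prior_prec * (mu - prior_mean) ** 2
    log_lik = -0.5 * data_prec * (y - mu) ** 2
    return log_prior + log_lik


class Chain:
    """Hypothetical helper: one MCMC chain that can be extended in batches."""

    def __init__(self, init, step_sd=5.0, seed=1):
        self.rng = random.Random(seed)
        self.current = init
        self.step_sd = step_sd
        self.samples = []
        self.accepted = 0

    def update(self, n_updates):
        """Run n_updates more steps, appending to the samples already stored."""
        for _ in range(n_updates):
            proposal = self.current + self.rng.gauss(0.0, self.step_sd)
            log_ratio = log_posterior(proposal) - log_posterior(self.current)
            # 1 - random() lies in (0, 1], so the log is always defined
            if math.log(1.0 - self.rng.random()) < log_ratio:
                self.current = proposal
                self.accepted += 1
            self.samples.append(self.current)


chain = Chain(init=40.0)
chain.update(10)       # short test run to check that everything works
chain.update(20000)    # longer run continues the same chain
kept = chain.samples[1000:]          # discard early samples as burn-in
post_mean = sum(kept) / len(kept)
accept_rate = chain.accepted / len(chain.samples)
print(len(chain.samples), round(accept_rate, 2))
```

The design point to notice is that the chain keeps its current state between calls, so a short trial run costs nothing: its samples simply become the start of the longer run.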
15. Use Sample Monitor Tool to evaluate model fit, use Update Tool to run longer if necessary

BUGS has a number of options for generating default graphs and statistics within the Sample Monitor Tool. To access these begin by specifying the variable you want to look at in the "node" window. Alternatively you can specify an asterisk (*) in the node window, which will allow you to look at all of the variables you are tracking at once (which can be a bad idea if you're tracking lots of variables). In this case we're not, so let's enter a "*" so that we can look at all the variables. There are a bunch of different options on the Sample Monitor Tool so let's take a look at them one by one. We've underlined the metrics that you'll want to pay the most attention to. To have all these outputs go to a single window, rather than having each go to its own pop-up window, open a log under Info > Open Log.

STATS

If you click on the "stats" button a window should pop up that shows the posterior mean, standard deviation, median, and 95% CI. It will also show the MCMC sample size and the MCMC standard error, which is a reflection of the precision of the numerical approximation. As you run a MCMC longer and longer this error will decrease but the posterior standard deviation will not, because its width is a reflection of the sample size of the DATA rather than the sample size of the MCMC. The MCMC error gives you guidance as to how many digits you should interpret from the numerical approximation. Info from the stats window can be cut-and-pasted elsewhere for safe keeping.
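Since this first example has an exact solution, the numbers in the stats window can be checked against it. The Python sketch below computes the analytical posterior for the model used here (prior mean 53, prior precision 0.04, known variance 185, single observation Y = 42), together with a naive MCMC standard error that assumes independent samples. The function names are ours, and the MC error reported by BUGS will generally differ from the naive value when samples are autocorrelated.

```python
import math


def normal_posterior(y, prior_mean, prior_prec, data_prec):
    """Posterior mean and sd of a normal mean when the data precision is known."""
    post_prec = prior_prec + data_prec                      # precisions add
    post_mean = (prior_prec * prior_mean + data_prec * y) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


def naive_mc_error(post_sd, n_samples):
    """Standard error of the posterior-mean estimate for independent draws."""
    return post_sd / math.sqrt(n_samples)


post_mean, post_sd = normal_posterior(y=42.0, prior_mean=53.0,
                                      prior_prec=0.04, data_prec=1 / 185.0)
print(round(post_mean, 2), round(post_sd, 2))

# The MC error shrinks as the chain gets longer; the posterior sd does not.
for n in (1000, 100000):
    print(n, round(naive_mc_error(post_sd, n), 4))
```

If the posterior mean and sd reported by BUGS agree with these values to within a few multiples of the reported MC error, the run is long enough for that level of precision.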

DENSITY

The "density" button will create histogram graphs of each posterior. We're looking for smooth densities and in general distributions that are unimodal. While truly multimodal posterior densities can exist, multimodality is much more often an indicator of a lack of convergence. All plots in BUGS, including the density plots, can be cut-and-pasted by right clicking on the graph and selecting "copy". You can also change the format of a graph by right clicking and selecting "properties".

CODA

The "coda" button opens one window for each chain and displays the raw MCMC values. This is useful if you want to save these values for subsequent analysis (e.g. in R). The "coda" file format is that used by the "Convergence Diagnostics and Output Analysis" software, which exists as an R package named "coda" and includes a number of additional convergence statistics and graphing options beyond those built into BUGS.

TRACE

Trace depicts the recent history of the MCMC chains, with each chain depicted as a different color. In a model that has converged these chains will be overlapping and will bounce around at random, preferably looking like white noise but sometimes showing longer term trends. The "trace" graph updates in real time if you hit "update" again in the Update Tool. It can be very useful to open the trace window at the start of a model run to follow the progress of the MCMC.

JUMP

Creates a plot of the squared jump distance of the MCMC. This allows you to see if the model is taking large or small steps.

BGR DIAGRAM

The Brooks-Gelman-Rubin statistic is based on the ratio of variability within chains to among chains and is used to assess whether/when a MCMC is converged. Generally we are looking for the red line to converge to 1. Be aware that you should check all the variables in your model because sometimes one appears to have converged while another hasn't yet.

HISTORY

Is analogous to trace but it shows the full history of the MCMC and does not update dynamically.
You will want to assess when the model has converged based on this diagram, the BGR diagram, and the quantile diagram and then set that time point of convergence as the "beg" value in the Sample Monitor Tool. This will cause all values prior to convergence to be ignored in making graphs and calculating summary statistics.

ACCEPT

Graph of the acceptance rate of the MCMC. Generally models that use a Gibbs sampler will always be at 1 (100% acceptance) while other numerical methods typically accept around 30-50% of proposed values once they tune their step size. In general you'll want to thin the MCMC by at least 1/accept.

QUANTILES

Provides a moving average estimate of the median and CI. This can be a useful convergence diagnostic, especially if you are interested in estimating the CI, because it shows whether these statistics for the chains have converged.

AUTO COR

This generates an autocorrelation diagram for the MCMC, with the lag on the x-axis and the correlation coefficient on the y-axis. At lag 0 the chain is always perfectly correlated with itself (autocorrelation of 1). A lag of one would show the correlation between each value in the MCMC and the one next to it, a lag of two would be the correlation between values that are separated by two, etc. We are looking for when the autocorrelation asymptotes to zero (typically the diagram looks roughly exponential). You will want to set the "thin" in the Sample Monitor Tool to the lag at which the correlation is almost zero (i.e. samples are independent). This test is usually more conservative than that coming from evaluating the acceptance rate. Because the "thin" affects all other statistics you'll want to recompute any values you are keeping (e.g. density plots or summary statistics) if you change the thin.

Now that we have a basic feel for BUGS we'll look at ways to progressively increase the complexity of the model.

Case Study: Forest Stand Characteristics

For the next few examples we'll be using data on the diameter of trees from the Duke loblolly pine FACE site. In this example we'll just be looking at the diameter data in order to characterize the stand itself. Let's begin by expanding the model we specified above to account for a larger data set and for the uncertainty in the variance in the data. Our data set has 297 values, so when specifying the model in BUGS we'll need to loop over each value to calculate the likelihood of each data point and use the vector index notation, [i], to specify which value we're computing.
    model {
      ## priors
      mean ~ dnorm(20,0.01)
      prec ~ dgamma(0.1,0.1)
      ## likelihood
      for(i in 1:297){
        Y[i] ~ dnorm(mean,prec)
      }
      ## diagnostics
      sd <- sqrt(1/prec)
    }
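For readers who want to see what the loop over Y[i] amounts to numerically, here is a small Python sketch (not BUGS code) that sums the normal log-likelihood over the data using the mean/precision parameterization, converts it to the deviance (-2 ln L), and reproduces the sd <- sqrt(1/prec) diagnostic. The function names and the three values in the usage lines are illustrative assumptions only.

```python
import math


def normal_loglik(data, mean, prec):
    """Sum of log N(y | mean, precision) over all data points."""
    total = 0.0
    for y in data:  # mirrors: for(i in 1:N) { Y[i] ~ dnorm(mean, prec) }
        total += 0.5 * math.log(prec / (2.0 * math.pi)) - 0.5 * prec * (y - mean) ** 2
    return total


def deviance(data, mean, prec):
    """Deviance as defined in the tutorial: -2 ln L."""
    return -2.0 * normal_loglik(data, mean, prec)


def prec_to_sd(prec):
    """Same conversion as the model's diagnostic line sd <- sqrt(1/prec)."""
    return math.sqrt(1.0 / prec)


example = [20.9, 13.6, 15.7]  # illustrative values only
print(deviance(example, mean=17.0, prec=0.04))
print(prec_to_sd(0.04))
```

Parameter values that fit the data better give a lower deviance, which is why the deviance trace is a useful companion to the traces of the individual parameters.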

The data for this analysis, which you'll want to cut-and-paste into your script, are:
    list(Y = c(20.9, 13.6, 15.7, 6.3, 2.7, 25.6, 4, 20.9, 7.8, 27.1,
    25.2, 19, 17.8, 22.8, 12.5, 21.1, 22, 22.4, 5.1, 16,
    20.7, 15.7, 5.5, 18.9, 22.9, 15.5, 18.6, 19.3, 14.2, 12.3,
    11.8, 26.8, 17, 5.7, 12, 19.8, 19, 23.6, 19.9, 8.4,
    22, 18.1, 21.6, 17, 12.4, 2.9, 22.6, 20.8, 18.2, 14.2,
    17.3, 14.5, 8.6, 9.1, 2.6, 19.8, 20, 22.2, 10.2, 12.9,
    20.9, 21.1, 7.3, 5.8, 23.1, 17, 21.5, 10.1, 18.4, 22.6,
    21.2, 21.5, 22.4, 17.3, 16, 25, 22.4, 23.9, 23, 21.9,
    19, 28.6, 16, 22.5, 23.2, 8.7, 23.4, 15.3, 25.6, 19.2,
    17.4, 23.8, 20.4, 19, 3.6, 23.4, 19.6, 17.5, 16.5, 22,
    19.7, 7.35, 18, 17.8, 9.6, 15, 12, 17.7, 21.4, 17,
    22.1, 18.9, 15.05, 12.9, 19.3, 15.3, 13.6, 15.4, 10.6, 11.3,
    11.8, 22.2, 22.2, 13.1, 7.4, 4.5, 11.7, 19.5, 19.9, 11.6,
    13.9, 15.5, 11, 18.6, 17.6, 12.7, 20.9, 18.8, 22.4, 21.2,
    18.2, 15.3, 13.6, 7.3, 17.4, 17.4, 10.5, 22.9, 23.2, 13.8,
    14.8, 22.2, 20.9, 13, 18.9, 19, 15.2, 16.8, 18, 24.6,
    15.4, 17.2, 23.2, 22.8, 25.5, 7.8, 6, 6.4, 19, 13.5,
    23.7, 18, 22.2, 22.4, 9.3, 13.7, 18.9, 20.5, 23.3, 20.8,
    18.4, 4.5, 12.2, 16.9, 13.5, 17.8, 16.9, 20.4, 19.5, 22.2,
    24.5, 21.2, 16.5, 18, 16.4, 3.9, 17.9, 22, 12.9, 21,
    18, 9.2, 15.9, 8.1, 8.3, 10.7, 12, 19.9, 13.6, 17.3,
    11.5, 12.4, 15.1, 22, 19.3, 17.5, 14.5, 14.7, 17.5, 19.6,
    12.9, 20.3, 17.9, 20.2, 18.3, 9.5, 19, 21, 13.1, 20.4,
    16.3, 18.3, 11.8, 23.3, 15.2, 20, 17.9, 12, 19.6, 18.5,
    16.2, 10.9, 17.8, 13.8, 10, 17.9, 15.6, 20.3, 14.9, 18.6,
    12.5, 18.2, 16, 18.7, 18, 15.3, 19, 17.9, 15.8, 17.7,
    14.4, 19.6, 18.3, 18.7, 17.8, 18, 10.1, 18.8, 16.4, 21.2,
    16.6, 16.7, 17.8, 16.5, 19.3, 16.3, 14.2, 13, 9.4, 19.7,
    13.4, 2.6, 17.6, 16.7, 17.6, 5.8, 17.6, 20.1, 18.2, 16.7,
    14, 13.9, 5.1, 16.6, 3.9, 17.5, 18))
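Since the tutorial recommends preparing data outside BUGS and pasting it into the script, a small helper can generate the list text for you. The Python sketch below is one way to do that; format_bugs_list is a hypothetical helper of ours, not part of BUGS or of any R package, and it only handles scalars and flat vectors.

```python
def format_bugs_list(**variables):
    """Render scalars and flat sequences as a BUGS/R-style list(...) string."""
    parts = []
    for name, value in variables.items():
        if isinstance(value, (list, tuple)):
            # vectors use the c(...) notation
            body = "c(" + ", ".join(repr(v) for v in value) + ")"
        else:
            body = repr(value)
        parts.append(f"{name} = {body}")
    return "list(" + ", ".join(parts) + ")"


single = format_bugs_list(Y=42)
vector = format_bugs_list(Y=[20.9, 13.6, 15.7, 6.3])
print(single)
print(vector)
```

The same helper can be used for initial conditions, one call per chain, since those are also written as lists.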
