BUGS is a program that carries out Bayesian inference on statistical problems using a simulation technique known as Gibbs sampling.
BUGS assumes a Bayesian or full probability model, in which all quantities are treated as random variables. The model consists of a defined joint distribution over all unobserved (parameters and missing data) and observed quantities (the data); we then need to condition on the data in order to obtain a posterior distribution over the parameters and unobserved data. Marginalising over this posterior distribution in order to obtain inferences on the main quantities of interest is carried out using a Monte Carlo approach to numerical integration (Gibbs sampling).
There is a small set of BUGS commands to control a session in which a (possibly very complex) statistical model expressed using the BUGS language is analysed. A compiler processes the model and available data into an internal data structure suitable for efficient computation, and a sampler operates on this structure to generate appropriate values of the unknown quantities. A suite of S-Plus (Statistical Science, Inc) functions, CODA [Best et al., 1995], is provided for analysis and plotting of the output files, and diagnosing convergence.
BUGS is intended for complex models in which there may be many unknown quantities but for which substantial conditional independence assumptions are appropriate. Particular structures include generalised linear models with hierarchical or crossed random effects, latent variable or frailty models, measurement errors in responses and covariates, informative censoring, constrained estimation, and missing data. See Gilks et al. (1993) for examples of typical applications. BUGS is a general purpose program and so it is inevitable that many types of models, such as spatial smoothing, could be more efficiently implemented in special purpose software. There are currently restrictions on the classes of model that can be fitted using this version of BUGS, and these will be made clear in the manual (see Section 3.2).
BUGS is intended for problems for which there is no exact analytic solution, and for which standard approximation techniques have difficulties. Markov Chain Monte Carlo (MCMC) is increasingly being used as an approach for dealing with such problems. The basic philosophy behind MCMC is to take a Bayesian approach and carry out the necessary numerical integrations using simulation: see Gelfand and Smith (1990) and Smith and Roberts (1993) for background. Instead of calculating exact or approximate estimates, this computer-intensive technique generates a stream of simulated values for each quantity of interest. To perform MCMC, additional tools are required to form samples from the relevant distributions, monitor the stream for convergence, and summarise the accumulated samples. See Gilks et al. (1995) for a wide range of articles on all practical aspects of using MCMC.
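The idea of replacing an integral by a stream of simulated values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper `monte_carlo_summary` is hypothetical, the target is a Normal(3, 1) distribution chosen because its exact mean and 95% interval are known, and the draws are independent rather than produced by a Markov chain as in MCMC proper.

```python
import random
import statistics


def monte_carlo_summary(draw, n_samples, seed=0):
    """Summarise a distribution from simulated draws instead of exact integration.

    `draw` is any function taking a random.Random and returning one value.
    """
    rng = random.Random(seed)
    samples = sorted(draw(rng) for _ in range(n_samples))
    mean = statistics.fmean(samples)
    # Empirical 2.5% and 97.5% points of the sorted stream of values
    lower = samples[int(0.025 * n_samples)]
    upper = samples[int(0.975 * n_samples) - 1]
    return mean, (lower, upper)


# Assumed target for illustration: Normal(mean=3, sd=1), where the exact
# answers are mean 3 and 95% interval 3 +/- 1.96.
mean, (lo, hi) = monte_carlo_summary(lambda rng: rng.gauss(3.0, 1.0), 20000)
print(f"mean ~ {mean:.2f}, 95% interval ~ ({lo:.2f}, {hi:.2f})")
```

The estimates approach the exact values as the number of draws grows; with dependent MCMC output the same summaries apply, but convergence of the stream has to be checked first.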
We introduce a trivial problem for which exact solutions are possible in order to illustrate the nature of the Gibbs approach. This example will then be analysed in stages in order both to check the installation of the software and to illustrate the use of BUGS.
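Consider a set of 5 observed pairs $(x_i, Y_i)$, $i = 1, \ldots, 5$. We shall fit a simple linear regression of $Y$ on $x$, using the notation
$$Y_i \sim \text{Normal}(\mu_i, \tau) \qquad (1)$$
$$\mu_i = \alpha + \beta (x_i - \bar{x}) \qquad (2)$$
Note that we have separated out the linear function (2) expressing the dependence on $x_i$ of the expectation of $Y_i$, from the stochastic link (1) between $Y_i$ and $\mu_i$. This is not strictly necessary but enhances clarity and follows the tradition of generalized linear modelling. The parameterisation $\text{Normal}(\mu_i, \tau)$ of the normal distribution is also slightly non-standard, in that $\tau = 1/\text{variance}(Y)$ = the precision of $Y$.
Classical unbiased estimates are $\hat{\alpha} = \bar{Y}$, $\hat{\beta} = \sum_i Y_i (x_i - \bar{x}) / \sum_i (x_i - \bar{x})^2$ and $\hat{\sigma}^2 = \sum_i (Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x}))^2 / (N - 2)$, with $\widehat{\text{var}}(\hat{\alpha}) = \hat{\sigma}^2 / N$ and $\widehat{\text{var}}(\hat{\beta}) = \hat{\sigma}^2 / \sum_i (x_i - \bar{x})^2$.
Both frequentist and Bayesian 'noninformative' priors lead to inference being based on the pivotal quantities $(\hat{\alpha} - \alpha) / \sqrt{\widehat{\text{var}}(\hat{\alpha})}$ and $(\hat{\beta} - \beta) / \sqrt{\widehat{\text{var}}(\hat{\beta})}$, both having $t_3$ distributions with mean 0 and variance 3, and $3\hat{\sigma}^2 / \sigma^2$ having a $\chi^2_3$ distribution, leading to 95% confidence/credible intervals given below.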
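The classical calculation can be checked numerically with a short Python sketch. The five data pairs below are assumed for illustration only, since the values are not reproduced in this text; `classical_fit` and `t_interval` are hypothetical helper names, and the constant `T3_975` is the 97.5% point of the t distribution with 3 degrees of freedom, so the intervals are valid only for N = 5.

```python
import math

# Assumed illustrative data (not taken from the text above)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 3.0, 3.0, 5.0]

T3_975 = 3.182446  # 97.5% point of t with 3 degrees of freedom (N - 2 = 3)


def classical_fit(x, y):
    """Least-squares estimates for Y_i = alpha + beta * (x_i - x_bar) + error."""
    n = len(x)
    x_bar = sum(x) / n
    xc = [xi - x_bar for xi in x]              # centred covariate
    sxx = sum(c * c for c in xc)
    alpha_hat = sum(y) / n                     # equals the mean of Y
    beta_hat = sum(c * yi for c, yi in zip(xc, y)) / sxx
    ssr = sum((yi - alpha_hat - beta_hat * c) ** 2 for c, yi in zip(xc, y))
    sigma2_hat = ssr / (n - 2)                 # unbiased residual variance
    return {
        "alpha": alpha_hat,
        "beta": beta_hat,
        "sigma2": sigma2_hat,
        "var_alpha": sigma2_hat / n,
        "var_beta": sigma2_hat / sxx,
    }


def t_interval(estimate, variance, t_quantile=T3_975):
    """Symmetric 95% interval based on the t pivotal quantity."""
    half_width = t_quantile * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


fit = classical_fit(x, y)
alpha_ci = t_interval(fit["alpha"], fit["var_alpha"])
beta_ci = t_interval(fit["beta"], fit["var_beta"])
print(f"alpha = {fit['alpha']:.2f}, 95% interval ({alpha_ci[0]:.2f}, {alpha_ci[1]:.2f})")
print(f"beta  = {fit['beta']:.2f}, 95% interval ({beta_ci[0]:.2f}, {beta_ci[1]:.2f})")
```

For these assumed data the estimates are 3.00 for alpha and 0.80 for beta, with intervals of roughly (1.96, 4.04) and (0.07, 1.53).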
The BUGS language allows a concise expression for the model which is contained in the file line.bug, with the core relations (1) and (2) described as follows.
for (i in 1:N) {
    Y[i] ~ dnorm(mu[i],tau);
    mu[i] <- alpha + beta*(x[i] - x.bar);
}
Simple commands for running BUGS are in the file line.cmd and are reproduced below.
compile("line.bug")
update(500)
monitor(alpha)
monitor(beta)
monitor(sigma)
monitor(tau)
update(1000)
stats(alpha)
stats(beta)
stats(sigma)
stats(tau)
q()
This compiles the model, generates an initial run of 500 iterations as a "burn-in" in order (with luck) to reach convergence, starts monitoring samples of parameters of interest, and then performs 1000 iterations that result in a file containing a series of values simulated from the joint posterior of the unknown quantities. Summary statistics are available directly, or a graphics program may be used to display the whole sample. For example, the CODA suite of functions for S-Plus provided with BUGS (see Section 7) will give the output shown in Figure 1.
Figure 1: CODA plot showing output for model line.bug run in BUGS for 1000 iterations after a 500 iteration burn-in
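To make the burn-in, monitor and summarise cycle concrete, here is a minimal hand-written Gibbs sampler for the regression model in Python. It is a sketch under several assumptions and is not a description of how BUGS itself samples: the priors are taken to be flat on alpha and beta with p(tau) proportional to 1/tau, the data values are assumed for illustration, sigma is taken to be 1/sqrt(tau), and the names `gibbs_line` and `stats` are hypothetical (the latter only mimics the spirit of the BUGS `stats` command).

```python
import math
import random
import statistics


def gibbs_line(x, y, burn_in=500, n_keep=1000, seed=1):
    """Gibbs sampler for Y_i ~ Normal(alpha + beta*(x_i - x_bar), precision tau).

    Assumed priors: flat on alpha and beta, p(tau) proportional to 1/tau.
    """
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    xc = [xi - x_bar for xi in x]
    sxx = sum(c * c for c in xc)
    y_bar = sum(y) / n
    sxy = sum(c * yi for c, yi in zip(xc, y))
    tau = 1.0  # arbitrary starting value
    kept = {"alpha": [], "beta": [], "sigma": [], "tau": []}
    for iteration in range(burn_in + n_keep):
        # alpha | rest ~ Normal(y_bar, variance 1/(n*tau)); centring x removes beta
        alpha = rng.gauss(y_bar, 1.0 / math.sqrt(n * tau))
        # beta | rest ~ Normal(sxy/sxx, variance 1/(sxx*tau))
        beta = rng.gauss(sxy / sxx, 1.0 / math.sqrt(sxx * tau))
        # tau | rest ~ Gamma(shape n/2, rate ssr/2); gammavariate takes a scale
        ssr = sum((yi - alpha - beta * c) ** 2 for c, yi in zip(xc, y))
        tau = rng.gammavariate(n / 2.0, 2.0 / ssr)
        if iteration >= burn_in:  # monitor only after the burn-in
            kept["alpha"].append(alpha)
            kept["beta"].append(beta)
            kept["tau"].append(tau)
            kept["sigma"].append(1.0 / math.sqrt(tau))
    return kept


def stats(samples):
    """Mean, standard deviation and empirical 95% interval of monitored values."""
    ordered = sorted(samples)
    n = len(ordered)
    return (statistics.fmean(ordered), statistics.stdev(ordered),
            ordered[int(0.025 * n)], ordered[int(0.975 * n) - 1])


# Assumed illustrative data (not taken from the text above)
draws = gibbs_line([1, 2, 3, 4, 5], [1, 3, 3, 3, 5])
for name in ("alpha", "beta", "sigma", "tau"):
    mean, sd, low, high = stats(draws[name])
    print(f"{name}: mean {mean:.3f}, sd {sd:.3f}, 95% interval ({low:.3f}, {high:.3f})")
```

Because x is centred, the full conditional of alpha does not involve beta and vice versa, so the chain mixes quickly in this toy problem; in less convenient models a longer burn-in and formal convergence checks would be needed.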
Version 0.50: Instructions will be given for SUN Sparc Station, Hewlett-Packard, and PC 386+387/486/586 versions. The default for PCs is that a maths co-processor is available: however there is only a minor reduction in speed without one and we can provide versions that do not need it. Other platforms may soon be available - please contact us.
This version of BUGS is written in the high-level computer language Modula-2 and distributed as compiled code. Simple batch-files or shell-scripts are supplied to run BUGS and to pass to the executable file five command line arguments: the main output file (default bugs.out), the main index file (default bugs.ind), the log file (default bugs.log), the summary-output file (default bugs1.out), and the summary-index file (default bugs1.ind). The main output file contains the results of the sampling as a simple list of iteration number and sampled value pairs, and the main index file contains the names of the variables sampled and pointers to where the appropriate values start and finish in the main output file. These files are optionally written at the end of a BUGS run. The summary-output and summary-index files can be written out at any time: details of these files are given in Section 7. Output analysis is best handled in the CODA suite of S-Plus functions, but it is not assumed that S-Plus is available. Conversion programs for other graphics programs may become available.
The log file contains a complete record of everything the user has typed plus all the screen output produced by BUGS.
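For users without S-Plus, the output and index files described above could be read with a few lines of Python. The sketch below assumes a particular layout that is not specified in this section: each index line is taken to hold a variable name followed by the first and last line numbers (1-based, inclusive) of its values in the output file, and each output line is taken to hold an iteration number and a sampled value separated by whitespace. The file contents and the function names are hypothetical; Section 7 should be consulted for the actual format.

```python
def parse_index(index_text):
    """Map each variable name to its (start, end) line pointers."""
    index = {}
    for line in index_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, start, end = line.split()
        index[name] = (int(start), int(end))
    return index


def parse_output(output_text):
    """Return the output file as a list of (iteration, value) pairs."""
    rows = []
    for line in output_text.splitlines():
        if not line.strip():
            continue
        iteration, value = line.split()
        rows.append((int(iteration), float(value)))
    return rows


def samples_for(name, index, rows):
    """Extract the (iteration, value) pairs belonging to one variable."""
    start, end = index[name]      # assumed 1-based and inclusive
    return rows[start - 1:end]


# Hypothetical file contents, invented for illustration
index_text = "alpha 1 3\nbeta 4 6\n"
output_text = "501 3.1\n502 2.9\n503 3.0\n501 0.7\n502 0.9\n503 0.8\n"

index = parse_index(index_text)
rows = parse_output(output_text)
for name in index:
    values = [value for _, value in samples_for(name, index, rows)]
    print(name, values)
```

In practice the two strings would be read from the files named on the command line, for example bugs.ind and bugs.out.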
Section 2 covers installation of the software and checking that a simple example works. Sections 3 to 7 step through the stages of specifying a model, using the BUGS language, preparing data, and using BUGS commands to carry out an analysis. Section 8 discusses some common errors that are made. Section 9 deals with general modelling issues, and some techniques for enhancing performance.
A range of worked examples is provided in separate documents [Spiegelhalter et al. (1995a), Spiegelhalter et al. (1995b)], and potential users are encouraged to study these in detail and, if possible, use the files provided as templates for their own applications. This will probably save a lot of effort!
Papers describing BUGS include Gilks et al. (1994), Spiegelhalter et al. (1995c), Spiegelhalter et al. (1995d) and Best et al. (1996). Alternatively this manual [Spiegelhalter et al., 1995e] may be used as a reference.
Users of the previous distributed versions will find that some irritating things have been taken out, many remain, and new ones have probably been introduced. Additional features include:
Future plans include:
Comments to bugs@mrc-bsu.cam.ac.uk
© 1995 MRC Biostatistics Unit