**Chapter 4 Exercises**

**Student t simulation and analysis**

**Solutions**

1. Generate 4000 samples from a standard Student-t distribution with 4 degrees of freedom. How might you compare this distribution to a standard normal distribution in BUGS?

model {

Y ~ dt(0, 1, 4)

X ~ dnorm(0, 1)

}

node mean sd MC error 2.5% median 97.5% start sample

X -0.008038 0.997 0.01179 -1.954 -0.005036 1.945 1 4000

Y -0.01199 1.372 0.01853 -2.707 -0.01365 2.78 1 4000

Right-clicking on the density for X and selecting "Properties..." allows us to change the axis bounds so that the normal density is plotted on the same scale as the t density:

The central portions of the two densities look very similar but the t distribution has much heavier tails.

2. Adapt your code so that we can save the 4,000 simulated values as a data set for analysis in WinBUGS later. [Hint: create a vector of t-distributed variables and use "Save State" from the "Model" menu after just one iteration.]

model {

for (i in 1:4000) {

Y[i] ~ dt(0, 1, 4)

}

}

Click the arrows below to see the simulated data

3. Now assume that we do not know the values of any parameters and specify a model for the simulated data. Use the following priors: normal with mean zero and standard deviation 100 for the mean; uniform on the interval (0, 10) for the standard deviation; and a uniform prior on the interval (3, 30) for the degrees of freedom. [Remember that the "precision parameter" in the t distribution is given by d / ((d - 2)*variance), where "d" is the degrees of freedom; hence the reason for bounding the distribution of d away from 2.]

model {

for (i in 1:4000) {

Y[i] ~ dt(mu, tau, d)

}

mu ~ dnorm(0, 0.0001)

tau <- d/((d - 2)*sd*sd)
sd ~ dunif(0, 10)
d ~ dunif(3, 30)
}

4. Choose two sets of initial values and run a two-chain analysis. How many iterations should we discard (burn in)? How many iterations are required for accurate inferences?

list(mu = 1, sd = 10, d = 20)

list(mu = -3, sd = 5, d = 10)

After 1000 iterations, visual inspection of history plots suggests that a burn-in of 100 iterations is sufficient.

Running another 9000 iterations confirms this...

Looking at BGR, however, which is conservative, suggests a longer burn-in, just to be sure... The blue and green lines have yet to stabilise, suggesting a safe burn-in of at least 5000 iterations. (The fluctuations are most likely due to small sample sizes + high autocorrelation; a burn-in of 100 would probably be sufficient in practice.)

Running another 10000 iterations confirms that a burn-in of 5000 iterations is probably sufficient:

Results from iterations 5001--20000 are as follows:

node mean sd MC error 2.5% median 97.5% start sample

d 4.283 0.282 0.008279 3.765 4.27 4.871 5001 30000

mu -0.005894 0.01916 1.403E-4 -0.04335 -0.005937 0.0315 5001 30000

sd 1.415 0.03165 7.456E-4 1.357 1.413 1.482 5001 30000

MCSEs expressed as percentages of the posterior standard deviation are 2.9%, 0.73% and 2.4%, for d , mu and sd , respectively. Hence the rule of thumb about MCSEs being less than 5% of the posterior standard deviation is passed. If we required MCSEs to be less than 1% of the postertior standard deviation, then we would need 300,000 posterior samples as opposed to 30,000 -- a 10-fold increase!

node mean sd MC error 2.5% median 97.5% start sample

d 4.3 0.2924 0.002902 3.765 4.287 4.914 5001 300000

mu -0.005821 0.01916 4.753E-5 -0.04342 -0.005838 0.03174 5001 300000

sd 1.413 0.03222 2.605E-4 1.355 1.411 1.482 5001 300000