birkes:dodge:93 apply different regression models
to the much-analysed stack-loss data of brownlee:65. This
features 21 daily responses of stack loss (y), the amount of ammonia
escaping, with covariates being air flow (
), temperature (
) and
acid concentration (
). Part of the data is shown below.
We first assume a linear regression on the expectation of y, with a variety of different error structures. Specifically
where
are covariates
standardised to have zero mean and unit variance.
are initially
given independent ``noninformative" priors.
Maximum likelihood estimates for the double expontential (Laplace) distribution are essentially equivalent to minimising the sum of absolute deviations (LAD), while the other options are alternative heavy-tailed distributions. A t on 4 degrees of freedom has been chosen, although with more data it would be possible to allow this parameter also to be unknown.
We also consider the use of `ridge regression', intended to avoid the instability due to correlated covariates. This has been shown [Lindley and Smith1972] to be equivalent to assuming the regression coefficients of the standardised covariates to be exchangeable, so that
In the following example we extend the work of birkes:dodge:93 by applying this ridge technique to each of the possible error distributions.
birkes:dodge:93 suggest investigating outliers by
examining residuals
greater than 2.5 standard
deviations. We can calculate standardised residuals for each
of these distributions, and create a variable outlier[i]
taking on the value 1 whenever this condition is fulfilled. Mean
values of outlier[i] then show the confidence with which this
definition of outlier is fulfilled.
The appropriate graph for the ridge regression model is shown in Figure 10.
The following BUGS code will fit all the necessary models by changing the lines that are commented out: the version shown here fits a ridge regression for logistic errors.
Stacks: model specification in BUGS
model stacks;
const
p = 3, # number of covariates
N = 21, # number of observations
PI = 3.141593;
var
x[N,p], # raw covariates
z[N,p] , # standardised covariates
Y[N],mu[N], # data and expectations
stres[N], # standardised residuals
outlier[N], # indicator if |stan res| > 2.5
beta0,beta[p], # standardised intercept, coefficients
b0,b[p], # unstandardised intercept, coefficients
phi, # prior precision of standardised coefficeints
tau,sigma,d; # precision, sd and degrees of freedom of t distn
data Y,x in "stacks.dat";
inits in "stacks.in";
{
# Standardise x's and coefficients
for (j in 1:p) {
b[j] <- beta[j]/sd(x[,j]) ;
for (i in 1:N) {
z[i,j] <- (x[i,j] - mean(x[,j]))/sd(x[,j]) ;
}
}
b0 <- beta0-b[1]*mean(x[,1])-b[2]*mean(x[,2])-b[3]*mean(x[,3]);
# Model
d <- 4; # degrees of freedom for t
for (i in 1:N) {
# Y[i] ~ dnorm(mu[i],tau);
# Y[i] ~ ddexp(mu[i],tau);
Y[i] ~ dlogis(mu[i],tau);
# Y[i] ~ dt(mu[i],tau,d);
mu[i] <- beta0 + beta[1]*z[i,1]+beta[2]*z[i,2]+beta[3]*z[i,3];
stres[i] <- (Y[i] - mu[i])/sigma;
outlier[i] <- step(stres[i] - 2.5) + step(-(stres[i]+2.5) );
}
# Priors
beta0 ~ dnorm(0,.00001);
for (j in 1:p) {
# beta[j] ~ dnorm(0,.00001); # coeffs independent
beta[j] ~ dnorm(0,phi); # coeffs exchangeable (ridge regression)
}
tau ~ dgamma(1.0E-3,1.0E-3);
phi ~ dgamma(1.0E-3,1.0E-3);
# standard deviation of error distribution
# sigma <- sqrt(1/tau); # normal errors
# sigma <- sqrt(2)/tau; # double exponential errors
sigma <- sqrt(pow(PI,2)/3)/tau ; # logistic errors
# sigma <- sqrt(d/(tau*(d-2))); # t errors on d degrees of freedom
}
Figure 10:
Graphical model for stacks example
A BUGS run of 500 burn-in + 1000 iterations took between 4 and 14 secs for each model and gave the following output, including observations with probability greater than 0.1 of being outliers. Included are the results of birkes:dodge:93 (B&D) for least squares, least absolute deviation (LAD) and ridge regression.
We note the similar results between the Birkes and Dodge methods and the BUGS runs, and the lack of influence of the ridge technique in this context.