Saturday, November 17, 2007

Bayesian model selection

Here we illustrate one advantage of Bayesian regression modeling: with an informative prior, it is straightforward to implement regression model selection.

Arnold Zellner introduced an attractive method of incorporating prior information into a regression model. He assumes that [beta | sigma^2] is normal with mean beta0 and variance-covariance matrix of the form

V = c sigma^2 (X'X)^(-1)

and then takes [sigma^2] distributed according to the noninformative prior proportional to 1/sigma^2. This is called Zellner's G-prior.

One nice thing about this prior is that it requires only two prior inputs from the user: (1) a choice of the prior mean beta0, and (2) a value of c, which can be interpreted as the amount of information in the prior relative to the sample.
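To make this interpretation of c concrete, here is a small check on simulated data (a side illustration, not part of the puffin analysis below): under the G-prior, the prior variance of beta is c times the sampling variance of the least-squares estimate, so a choice like c = 100 means the prior carries roughly 1/100 of the information in the data.

# Quick check on simulated data (not the puffin data): the G-prior variance of
# beta equals c times the sampling variance of the least-squares estimate.
set.seed(1)
X0=cbind(1,rnorm(50),rnorm(50))
sampling.var=2^2*solve(t(X0)%*%X0)   # sigma^2 (X'X)^(-1) with sigma = 2
prior.var=100*sampling.var           # c sigma^2 (X'X)^(-1) with c = 100
range(prior.var/sampling.var)        # every entry equals c = 100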

We can use Zellner's G-prior to implement model selection in a regression analysis. Suppose we have p predictors of interest -- there are 2^p possible models corresponding to the inclusion or exclusion of each predictor in the model.

A G-prior is placed on the full model that contains all of the predictors. We assume that beta0 is the zero vector and choose c to be a large value reflecting vague prior beliefs. The prior on (beta, sigma^2) for this full model is

N(beta; beta0, c sigma^2 (X'X)^(-1)) (1/sigma^2)

Then for any submodel defined by a reduced design matrix X1, we take the prior on (beta, sigma^2) to be

N(beta; beta0, c sigma^2 (X1'X1)^(-1)) (1/sigma^2)

We can then compare models by computing the associated marginal likelihoods (prior predictive probabilities).

To illustrate this methodology, we consider an interesting dataset on the behavior of puffins from Devore and Peck's text. One is interested in understanding the nesting frequency behavior of these birds (the y variable) on the basis of four covariates: x1, the grass cover; x2, the mean soil depth; x3, the angle of slope; and x4, the distance from the cliff edge.

We first write a function that computes the log posterior of (beta, log(sigma)) for a regression model with normal sampling and a Zellner G-prior with beta0 = 0 and a given value of c.

# Log posterior of theta = (beta, log sigma) under normal sampling and a
# Zellner G-prior with beta0 = 0.  The list "stuff" has components y, X, and c.
# (The multivariate normal density dmnorm is available in the mnormt package.)
regpost.mod=function(theta,stuff)
{
y=stuff$y; X=stuff$X; c=stuff$c
beta=theta[-length(theta)]; sigma=exp(theta[length(theta)])
# log likelihood; the else branch handles the intercept-only model, where X is a vector
if (length(beta)>1)
loglike=sum(dnorm(y,mean=X%*%as.vector(beta),sd=sigma,log=TRUE))
else
loglike=sum(dnorm(y,mean=X*beta,sd=sigma,log=TRUE))
# log of the G-prior density of beta given sigma
logprior=dmnorm(beta,mean=0*beta,varcov=c*sigma^2*solve(t(X)%*%X),log=TRUE)
return(loglike+logprior)
}

We read in the puffin data and define the design matrix X for the full model.

puffin=read.table("puffin.txt",header=T)
X=cbind(1,puffin$x1,puffin$x2,puffin$x3,puffin$x4)

S, the input for the log posterior function, is a list with components y, X, and c.

S=list(y=puffin$y, X=X, c=100)

Since there are 4 covariates, there are 2^4 = 16 possible models. We define a logical matrix GAM of dimension 16 x 5 that describes the inclusion or exclusion of each covariate in each model. (The first column is TRUE since we want to include the constant term in each regression model.)

# Enumerate all 2^4 = 16 patterns of including/excluding the four covariates;
# the intercept column is always TRUE.
GAM=array(T,c(16,5))
TF=c(T,F); k=0
for (i1 in 1:2) {for (i2 in 1:2) {for (i3 in 1:2) {for (i4 in 1:2){
k=k+1; GAM[k,]=cbind(T,TF[i1],TF[i2],TF[i3],TF[i4])}}}}
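As an aside, the same indicator matrix can be built more compactly with expand.grid; this alternative is only a sketch, and its row ordering differs from that of the nested loops above.

# Alternative construction of the 16 x 5 indicator matrix using expand.grid
# (row order differs from the loop above).
TF=c(TRUE,FALSE)
GAM2=cbind(TRUE,as.matrix(expand.grid(TF,TF,TF,TF)))
dim(GAM2)   # 16 x 5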

For each model, we use the laplace function (in the LearnBayes package) to compute the marginal likelihood. The inputs to laplace are the function regpost.mod defining our model, an intelligent guess at the posterior mode (given by a least-squares fit), and the list S that contains y, X, and c.

gof=rep(0,16)
for (j in 1:16)
{
S$X=X[,GAM[j,]]                           # design matrix for model j
theta=c(lm(S$y~0+S$X)$coef,0)             # starting value: least-squares fit and log(sigma)=0
gof[j]=laplace(regpost.mod,theta,S)$int   # Laplace estimate of the log marginal likelihood
}
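If each of the 16 models is assigned the same prior probability, the log marginal likelihoods stored in gof can also be converted into posterior model probabilities. This extra step is not part of the analysis below, but it is a one-line computation:

# Posterior model probabilities assuming equal prior probabilities for the 16 models;
# subtracting max(gof) before exponentiating avoids numerical underflow.
post.prob=exp(gof-max(gof))/sum(exp(gof-max(gof)))
round(post.prob,3)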

We display each model and the associated marginal likelihood values (on the log scale). In the output below, column X1 corresponds to the intercept and columns X2 through X5 to the covariates x1 through x4.

data.frame(GAM,gof)

X1 X2 X3 X4 X5 gof
1 TRUE TRUE TRUE TRUE TRUE -104.1850
2 TRUE TRUE TRUE TRUE FALSE -115.4042
3 TRUE TRUE TRUE FALSE TRUE -102.3523
4 TRUE TRUE TRUE FALSE FALSE -136.3972
5 TRUE TRUE FALSE TRUE TRUE -105.0931
6 TRUE TRUE FALSE TRUE FALSE -113.1782
7 TRUE TRUE FALSE FALSE TRUE -105.5690
8 TRUE TRUE FALSE FALSE FALSE -134.0486
9 TRUE FALSE TRUE TRUE TRUE -101.8833
10 TRUE FALSE TRUE TRUE FALSE -114.9573
11 TRUE FALSE TRUE FALSE TRUE -100.3735
12 TRUE FALSE TRUE FALSE FALSE -134.5129
13 TRUE FALSE FALSE TRUE TRUE -102.8117
14 TRUE FALSE FALSE TRUE FALSE -112.6721
15 TRUE FALSE FALSE FALSE TRUE -103.2963
16 TRUE FALSE FALSE FALSE FALSE -132.1824

The model with the largest value of the log marginal likelihood is the one that includes covariates X3 and X5 (row 11). This tells us that the best model for understanding nesting behavior is the one that includes the mean soil depth (X3, the covariate x2) and the distance from the cliff edge (X5, the covariate x4). One can also compare different models by the use of Bayes factors.
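For instance, the Bayes factor in support of the best model (row 11) over the full model (row 1) is the ratio of their marginal likelihoods, which we can approximate by exponentiating the difference of the corresponding gof values:

# Approximate Bayes factor: best model (row 11) versus the full model (row 1)
BF=exp(gof[11]-gof[1])
BF    # exp(-100.3735 + 104.1850), about 45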

****************************************************************************
For students who have to do their own model selection, I've written a simple function

bayes.model.selection.R

that will give these log marginal likelihood values for all regression models. You have to download this function from bayes.bgsu.edu/m648 and have LearnBayes 1.11 installed on your machine.

Here's an example of using this function. I first load in the puffin dataset and define the response vector y, the covariate matrix X, and the prior parameter c.

puffin=read.table("puffin.txt",header=T)
X=cbind(1,puffin$x1,puffin$x2,puffin$x3,puffin$x4)
y=puffin$y
c=100

Then I just run the function bayes.model.selection with y, X, and c as inputs.

bayes.model.selection(y,X,c)

The output will be the data frame shown above.
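For readers who cannot download the file, here is a rough sketch of how such a function might be written by wrapping the loop from the first part of this post. This is only an illustration, not the actual code in bayes.model.selection.R: it assumes regpost.mod is already defined as above, and the row ordering of its output may differ from the table shown earlier.

# A sketch of a model-selection wrapper: enumerate all 2^p subsets of the p
# covariates (intercept always included), and approximate each log marginal
# likelihood with laplace.  Assumes regpost.mod is defined as above.
bayes.model.selection.sketch=function(y,X,c)
{
p=ncol(X)-1
TF=c(TRUE,FALSE)
GAM=cbind(TRUE,as.matrix(expand.grid(rep(list(TF),p))))
gof=rep(0,nrow(GAM))
for (j in 1:nrow(GAM))
{
S=list(y=y,X=X[,GAM[j,]],c=c)
theta=c(lm(S$y~0+S$X)$coef,0)       # least-squares start for (beta, log sigma)
gof[j]=laplace(regpost.mod,theta,S)$int
}
data.frame(GAM,log.marginal=gof)
}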
