## Thursday, November 29, 2007

### Comparing sampling models by Bayes factors

In the last posting, I illustrated fitting a t(4) sampling model to some baseball data where I suspected there was an outlier. To compare this model to other sampling models, we can use Bayes factors.

Here is a t sampling model with a convenient choice of prior.

1.$y_1, ..., y_n$is a random sample from $t(\mu, \sigma, df)$
2. $\mu$ and $\log \sigma$ are independent, with $\mu$ distributed $N(M_0, S_0)$ and $\log \sigma$ distributed $N(M_1, S_1)$

I define a R function tsampling.R that computes the logarithm of the posterior. Note that I am careful to include all of the normalizing constants, since I am primarily interested in computing the marginal density of $y$.

tsampling=function(theta,datapar)
{
mu=theta[1]; logsigma=theta[2]; sigma=exp(logsigma)

y=datapar\$y; df=datapar\$df
mu.mean=datapar\$mu.mean; mu.sd=datapar\$mu.sd
lsigma.mean=datapar\$lsigma.mean; lsigma.sd=datapar\$lsigma.sd

loglike=sum(dt((y-mu)/sigma,df,log=TRUE)-log(sigma))
logprior=dnorm(mu,mu.mean,mu.sd,log=TRUE)+
dnorm(logsigma,lsigma.mean,lsigma.sd,log=TRUE)

return(loglike+logprior)
}

We use this function together with the function laplace to compute the log marginal density for two models.

Model 1 -- t(4) sampling, $\mu$ is normal(90, 20), $\log \sigma$ is normal(1, 1).

Model 2 -- t(30) sampling, $\mu$ is normal(90, 20), $\log \sigma$ is normal(1, 1).

Note that I'm using the same prior for both models -- the only difference is the choice of degrees of freedom in the sampling density.

dataprior1=list(y=y, df=4, mu.mean=90, mu.sd=20,
lsigma.mean=1, lsigma.sd=1)
log.m1=laplace(tsampling,c(80, 3), dataprior1)\$int

dataprior2=list(y=y, df=30, mu.mean=90, mu.sd=20,
lsigma.mean=1, lsigma.sd=1)
log.m2=laplace(tsampling,c(80, 3), dataprior2)\$int

BF.12=exp(log.m1-log.m2)
BF.12
[1] 14.8463

We see that there is substantial support for the t(4) model over the "close to normal" t(30) model.