Saturday, October 27, 2007

Fitting an Exchangeable Model

Continuing our baseball example, we observe yi home runs in ei at-bats for the ith player. We assume

1. y1, ..., y20 are independent, yi is Poisson(lambda_i)
2. the true rates lambda_1,..., lambda_20 are independent Gamma(alpha, alpha/mu)
3. the hyperparameters alpha, mu are independent, mu is distributed 1/mu, alpha is distributed according to the proper prior z0/(alpha+z0)^2, where z0 is the prior median

The data is stored as a matrix d, where the first column are the ei and the second column are the yi. In our example, we let z0 = 1; that is, the prior median of alpha is one.

Here is our computing strategy.

1. First we learn about the hyperparameters (alpha, mu). The posterior of (log alpha, log mu) is programmed in the LearnBayes function poissgamexch. Here is a contour plot.

ycontour(poissgamexch,c(-1,8,-4.2,-2.5),datapar)
title(xlab="LOG ALPHA",ylab="LOG MU")

We use the function laplace to find the posterior mode and associated variance-covariance matrix. The output from laplace is used to find a proposal variance and scale parameter to use in a random walk Metropolis chain. We simulate 10,000 draws from the posterior of (log alpha, log mu). (This chain has approximately a 30% acceptance rate.)

fit=laplace(poissgamexch,c(2,-3.2),datapar)
proposal=list(var=fit\$var,scale=2)
mcmcfit=rwmetrop(poissgamexch,proposal,c(1,-3),10000,datapar)

By exponentiating the simulated draws from mcmcfit, we get draws from the marginal posteriors of alpha and mu. We draw a density plot of alpha and superimpose the prior density of alpha.

alpha=exp(mcmcfit\$par[,1])
mu=exp(mcmcfit\$par[,2])
main="POSTERIOR OF ALPHA")

prior=function(alpha,z0) z0/(alpha+z0)^2
theta=seq(.0001,20,length=200)
lines(theta,prior(theta,1),col="red",lwd=3)

2. Now we can learn about the true rate parameters. Conditional on hyperparameter values, lambda_1, ..., lambda_20 have independent gamma posteriors.

We write a short function to simulate draws of lambda_j for a particular player j.

trueratesim=function(j,data,alpha,mu)
{
e=data[,1]; y=data[,2]
rgamma(length(alpha),shape=alpha+y[j],rate=alpha/mu+e[j])
}

Then we can simulate draws of all the true rates by use of the sapply function.

lam=sapply(1:20,trueratesim,d,alpha,mu)

The output lam is a 10,000 by 20 matrix. We can compute posterior means by the apply function.

post.means=apply(lam,2,mean)

To show the behavior of the posterior means, we draw a plot where
(1) we show the observed rates yi/ei as red dots
(2) we show the posterior means as blue dots
(3) a horizontal line is draw at the mean home run rates for all 20 players.

1 comment:

Mario Díaz said...

Hi!

Amazing blog! Try to explain this case a bit slowly because all of us are not mathematician!

Best regards from spain!