Wednesday, October 17, 2007

Fitting a Mixture Sampling Model

To illustrate the application of an MCMC random walk algorithm, consider the following mixture model. Suppose we observe season home run counts y1, ..., y20 for a group of baseball players. Some of these players are "sluggers" who average many home runs per season, and other players are "light hitters" who don't hit as many. We know that 40% of all players are sluggers, but we don't know the identity (slugger or light hitter) of each player. The ith player's home run count yi then has the mixture sampling density

f(yi | lambda1, lambda2) = p f(yi | lambda1) + (1 - p) f(yi | lambda2),

where we assume f(yi | lambda) is Poisson with mean lambda and we assume we know p = .4.
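As a quick check of this density, here is a small Python sketch (illustrative only; the post's own code is in R) that evaluates the two-component Poisson mixture at a single count. The rates 29 and 16 are made-up values for illustration:

```python
from math import exp, factorial

def pois(y, lam):
    # Poisson probability of count y with mean lam
    return exp(-lam) * lam**y / factorial(y)

def mix_density(y, lam1, lam2, p=0.4):
    # p f(y | lambda1) + (1 - p) f(y | lambda2), with p = .4 known
    return p * pois(y, lam1) + (1 - p) * pois(y, lam2)

# example: a count of 25 under illustrative rates 29 (slugger) and 16 (light hitter)
print(mix_density(25, 29.0, 16.0))
```

Because each component is a proper probability mass function, the mixture also sums to one over the nonnegative integers.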

Suppose we assign (lambda1, lambda2) the noninformative prior proportional to 1/(lambda1 lambda2). Then the posterior is given (up to a proportionality constant) by

g(lambda1, lambda2 | y) = 1/(lambda1 lambda2) prod from i=1 to 20 [p f(yi | lambda1) + (1 - p) f(yi | lambda2)]

Here is some simulated data:

25 42 29 29 28 25 21 34 19 11 16 13 15 17 10 16 21 17 16 20

We first transform the parameters to the real-valued parameters

theta1 = log(lambda1), theta2 = log(lambda2)

and write the following function that computes the log posterior of (theta1, theta2). (Under this transformation, the Jacobian lambda1 lambda2 cancels the prior, so the prior for (theta1, theta2) is uniform.) Here the input variable data.p is a list that contains two elements: p, the known value of p, and y, the vector of observations.

poisson.mix=function(theta, data.p)
{
lambda1=exp(theta[1]); lambda2=exp(theta[2])
p=data.p$p; y=data.p$y
val=0*y
for (i in 1:length(y))
  val[i]=log(p*dpois(y[i],lambda1)+(1-p)*dpois(y[i],lambda2))
return(sum(val))
}
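For comparison, the same log posterior can be sketched in Python (an illustrative translation, not the post's code); after the change of variables the prior on (theta1, theta2) is flat, so only the log likelihood remains:

```python
import math

def log_poisson_mix(theta, y, p=0.4):
    # log posterior of (theta1, theta2) = (log lambda1, log lambda2);
    # the Jacobian lambda1*lambda2 cancels the 1/(lambda1*lambda2) prior,
    # so the log posterior is just the log likelihood of the mixture
    lam1, lam2 = math.exp(theta[0]), math.exp(theta[1])
    def pois(k, lam):
        return math.exp(-lam) * lam**k / math.factorial(k)
    return sum(math.log(p * pois(k, lam1) + (1 - p) * pois(k, lam2)) for k in y)
```

Note that if p were .5, the log posterior would be exactly symmetric under swapping theta1 and theta2; with p = .4 the symmetry is only approximate, which is relevant to the bimodality discussed below.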

The data is stored in the vector y. We use the mycontour function from the LearnBayes package to graph the bivariate posterior. We also show a perspective plot of this posterior.

You might be surprised to see that the posterior is bimodal. This happens because the mixture model is not identifiable: swapping the labels of the two components (exchanging lambda1 and lambda2) gives a nearly identical sampling density, so the posterior has a second mode at the relabeled values. In the next posting, we'll ignore this identifiability problem and see whether a random walk Metropolis algorithm can successfully sample from this posterior.
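The two modes can be seen numerically by evaluating the log posterior at the two labelings of the components. Here is a self-contained Python sketch; the rates 29 and 16 are rough guesses from the data, not fitted values:

```python
import math

y = [25, 42, 29, 29, 28, 25, 21, 34, 19, 11,
     16, 13, 15, 17, 10, 16, 21, 17, 16, 20]

def log_post(t1, t2, p=0.4):
    # log posterior of (theta1, theta2) up to a constant (flat prior in theta)
    l1, l2 = math.exp(t1), math.exp(t2)
    pois = lambda k, lam: math.exp(-lam) * lam**k / math.factorial(k)
    return sum(math.log(p * pois(k, l1) + (1 - p) * pois(k, l2)) for k in y)

a, b = math.log(29), math.log(16)
print(log_post(a, b))   # labeling 1: lambda1 = 29 (sluggers), lambda2 = 16
print(log_post(b, a))   # labeling 2: the labels swapped
```

Both labelings give high posterior density relative to points in between, which is what produces the two separate modes in the contour plot.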
