Wednesday, October 10, 2007

Learning from selected order statistics

To illustrate some computational methods for summarizing a posterior, I describe a missing data problem motivated from my undergraduate days at Bucknell. Ken Kaminsky and Paul Nelson, two of my Bucknell professors, were interested in learning about populations based on selected order statistics. (I wrote an undergraduate honors thesis on this topic. ) Here is a simple illustration of the problem.

Suppose a sample y1, ..., y20 is taken from the two-parameter exponential distribution of the form f(y | mu, beta) =1/beta exp(-(y-mu)/beta), y > mu. But you don't observe the complete dataset -- all you observe are the two order statistics y(5) and y(15) (the order statistics are the observations arranged in ascending order).

Based on this selected data, we wish to (1) estimate the parameters mu and beta by 90% interval estimates and (2) predict the value of the order statistics y*(5) and y*(20) from a future sample taken from the same population.

Here's the plan:

1. First, we write the likelihood which is the density of the observed data (y(5) and y(20)) given values of the exponential parameters mu and beta. One can show that this likelihood is given by

L(mu, beta) = f(y(5)) f(y(15)) F(y(5))^4 (F(y(15) )-F(y(5)))^9 (1- P(y(15)))^5, mu>0, beta>0.

2. Assuming a flat (uniform) prior on (mu, beta), the posterior density is proportional to the likelihood. We write a R function kaminsky0.R that computes the logarithm of the posterior -- here the parameters are (mu, beta) and the data is (y(5), y(15)).

kaminsky0=function(theta,data)
{

f=function(y,mu,beta)
return(dexp(y-mu,rate=1/beta))
F=function(y,mu,beta)
return(pexp(y-mu,rate=1/beta))

y5=data[1]; y15=data[2]
mu=theta[,1]
beta=theta[,2]

loglike=log(f(y5,mu,beta))+log(f(y15,mu,beta))+
4*log(F(y5,mu,beta))+9*log(F(y15,mu,beta)-F(y5,mu,beta))+
5*log(1-F(y15,mu,beta))

return(loglike)
}

3. Graphing the posterior of (mu, beta), we see strong skewness in both parameters.
It is usually helpful to transform to real-valued parameters

theta1 = log(y(5) - mu) , theta1 = log(beta).

We write the following function kaminsky.R that computes the log posterior of (theta1, theta2).

kaminsky=function(theta,data)
{

f=function(y,mu,beta)
return(dexp(y-mu,rate=1/beta))
F=function(y,mu,beta)
return(pexp(y-mu,rate=1/beta))

y5=data[1]; y15=data[2]
mu=y5-exp(theta[,1])
beta=exp(theta[,2])

loglike=log(f(y5,mu,beta))+log(f(y15,mu,beta))+
4*log(F(y5,mu,beta))+9*log(F(y15,mu,beta)-F(y5,mu,beta))+
5*log(1-F(y15,mu,beta))

logjack=theta[,1]+theta[,2]

return(loglike+logjack)
}

Here's a graph of the posterior of the reexpressed parameters -- note that it is much more normal-shaped.


4. We'll use several functions in the next posting to summarize the posterior.

(a) The laplace function is useful in finding the posterior mode and normal approximation to the posterior.

(b) By use of the rwmetrop function, we construct a random-walk Metropolis algorithm to simulate from the joint posterior.

No comments: