Wednesday, October 10, 2007

Summarizing a posterior

Continuing our selected data example, recall that we have programmed the log posterior of the transformed parameters theta1 = log(y5 - mu) and theta2 = log(beta) in the R function kaminsky.
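
The definition of kaminsky appears in the earlier post. For orientation, here is a sketch of the general shape such a log posterior takes. This is a hypothetical reconstruction, not the actual function: it assumes the data vector holds the 5th and 15th order statistics (y5, y15) of an exponential(mu, beta) sample of size 20 and a flat prior on (mu, beta).

kaminsky.sketch=function(theta,data)
{
y5=data[1]; y15=data[2]
mu=y5-exp(theta[1])      # back-transform theta1 = log(y5 - mu)
beta=exp(theta[2])       # back-transform theta2 = log(beta)
Fexp=function(y) 1-exp(-(y-mu)/beta)       # shifted exponential cdf
logf=function(y) -log(beta)-(y-mu)/beta    # shifted exponential log density
# log joint density of the 5th and 15th order statistics of a sample
# of n=20 (exponents 5-1=4, 15-5-1=9, 20-15=5), plus the log Jacobian
# of the transformation to (theta1, theta2)
4*log(Fexp(y5))+logf(y5)+9*log(Fexp(y15)-Fexp(y5))+logf(y15)+5*log(1-Fexp(y15))+
theta[1]+theta[2]
}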

To find a normal approximation to the posterior, we apply the function laplace in the LearnBayes package. The inputs are (1) the function defining the log posterior, (2) a starting guess at the mode, (3) the number of iterations of the Newton algorithm, and (4) the data vector (the 5th and 15th order statistics).

> start=array(c(-2,-1),c(1,2))
> fit.laplace=laplace(kaminsky,start,10,data)

The output is a list containing mode, the value of the posterior mode, and var, the estimate of the variance-covariance matrix.

> fit.laplace$mode
[,1] [,2]
[1,] -2.367745 -1.091989
> fit.laplace$var
[,1] [,2]
[1,] 0.3201467 0.1191059
[2,] 0.1191059 0.1191059
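
As a quick check, approximate posterior standard deviations can be read off the diagonal of this matrix (a supplementary computation, not part of the original session):

> sqrt(diag(fit.laplace$var))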

We can get more accurate summaries of the posterior by means of a Metropolis random walk algorithm. The function rwmetrop implements this algorithm for an arbitrary posterior. To use this function, we define "proposal", a list containing the variance and scale parameter for the normal proposal density; the remaining inputs are the starting value for the MCMC chain, the number of simulated draws, and the data vector. Note that we use the approximate variance-covariance matrix from laplace in the proposal density for rwmetrop.

> proposal=list(var=fit.laplace$var,scale=2)
> fit.mcmc=rwmetrop(kaminsky,proposal,start,10000,data)

The output of rwmetrop is a list containing accept, the acceptance rate for the chain, and par, the matrix of simulated draws.

At this point, we should run some convergence diagnostics to see if the simulated draws show sufficient mixing and don't display unusually high autocorrelations. For this example, the acceptance rate is about 29%, which is within the acceptable range for this algorithm.
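
For instance, trace plots and autocorrelation plots of the two components give a quick visual check, and the accept component holds the acceptance rate. This is a minimal sketch using base R graphics; these commands are not part of the original session.

par(mfrow=c(2,2))
plot(fit.mcmc$par[,1],type="l",ylab="THETA1")  # trace plots
plot(fit.mcmc$par[,2],type="l",ylab="THETA2")
acf(fit.mcmc$par[,1],main="THETA1")            # autocorrelation plots
acf(fit.mcmc$par[,2],main="THETA2")
fit.mcmc$accept                                # acceptance rate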

We display the simulated draws on top of the contour plot of theta1 and theta2 -- it seems that most of the simulated draws fall within the first contour line.

> mycontour(kaminsky,c(-5,0,-2.5,1),data)
> title(xlab="LOG(Y5-MU)",ylab="LOG BETA")
> points(fit.mcmc$par[,1],fit.mcmc$par[,2])


We are interested in the parameters mu and beta. We first compute vectors of simulated draws of mu and beta by transforming back the simulated draws of theta1 and theta2.

> MU=data[1]-exp(fit.mcmc$par[,1])
> BETA=exp(fit.mcmc$par[,2])

We display the marginal posteriors of mu and beta.

> par(mfrow=c(2,1))
> plot(density(MU),lwd=3,main="POSTERIOR OF MU",xlab="MU")
> plot(density(BETA),lwd=3,main="POSTERIOR OF BETA",xlab="BETA")



We construct 90% interval estimates by extracting quantiles from the collection of simulated draws.

> quantile(MU,c(.05,.95))
5% 95%
9.864888 10.065765
> quantile(BETA,c(.05,.95))
5% 95%
0.2012635 0.6520858

Last, suppose we are interested in predicting the 5th order statistic ys5 from a future sample of 20 observations.

To simulate from the distribution of ys5, we (1) simulate (mu, beta) from the posterior, (2) simulate a future sample y1,...,y20 from the exponential distribution with parameters mu and beta, and (3) store the 5th ordered observation from the simulated sample. We repeat this process 1000 times, obtaining a simulated sample from the distribution of ys5. We display this predictive distribution by a histogram.

ys5=rep(0,1000)
for (j in 1:1000)
{
# draw (mu, beta) from the posterior and simulate a future
# sample of 20 exponential observations
ys=rexp(20,rate=1/BETA[5000+j])+MU[5000+j]
# store the 5th order statistic of the simulated sample
ys5[j]=sort(ys)[5]
}
hist(ys5,col="orange")
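
A 90% prediction interval for ys5 can then be read off the simulated draws, just as we did for mu and beta (a natural follow-up, not part of the original session):

quantile(ys5,c(.05,.95))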
