Tuesday, October 2, 2007

Robust Modeling

To illustrate some of the computational methods to summarize a posterior, suppose that we observe a sample y1, ..., yn from a Cauchy density with location theta and scale parameter 1. If we assign a uniform prior to theta, then the posterior density of theta is proportional to

product from i=1 to n [ (1 + (yi - theta))^{-1} ].

Suppose we observe the following 20 values:

1.3 1.9 3.2 5.1 1.4 5.8 2.5 4.7 2.9 5.2 11.6 8.8 8.5 10.7 10.2
9.8 12.9 7.2 8.1 9.5

In R, we define a new function cpost.R that computes the logarithm of the posterior density.

cpost=function(theta,y)
{
val=0*theta
for (j in 1:length(y))
val=val+dt(y[j]-theta,df=1,log=TRUE)
return(val)
}

We compute and display the posterior on the interval [2, 12].

Note the interesting bimodal shape of the posterior. Clearly a normal approximation to this posterior will not be an accurate representation.

Suppose we wish to simulate a sample from this posterior. A general method for generating a sample is the reject algorithm. To construct a reject algorithm, we find a suitable p that is easily to simulate from and covers the posterior g in the sense that g(theta) <= c p(theta) for all theta. Then one simulates a variate u from a uniform(0, 1) distribution; if u < [ g(u)/c/p(u)], then we accept u as a draw from the target distribution g. Suppose we let p be a t density with mean mu, variance v, and degrees of freedom df. Here we let

MEAN=7; VAR=9; DF=3

To find the bounding constant, we plot the logarithm of the ratio, log g(theta) - log p(theta) over the range of theta.

p.density=dt(theta-MEAN,DF)/sqrt(VAR)
plot(theta,log(post/p.density),type="l") From inspection of the graph (and a little calculation), it appears that the log of the ratio is bounded above by -62.98.

In the R code, we draw the posterior density and the constant x proposal density that covers the posterior.

The function rejectsampling.R will implement reject sampling with a t covering density. The inputs to this function are (1) the definition of the log target density, (2) a list giving the parameters of the t covering density (mean, variance, df), (3) the value of the log of the bounding constant, (4) the number of values simulated from the proposal density, and (5) the data used in the target function. The output of this function is a vector of values from the target distribution.

tpar=list(m=MEAN,var=VAR,df=DF)
dmax=-62.98
n=10000
theta.sim=rejectsampling(cpost,tpar,dmax,n,y)

One can compute the acceptance rate of this algorithm by dividing the length of the output vector theta.sim by the number of simulated draws from the proposal. The acceptance rate for this example is about 12%. To demonstrate that this algorithm works, we draw in the following figure (1) the exact posterior in red and (2) a density estimate of about 6000 simulated draws in blue. I'm convinced that the algorithm has indeed produced a sample from the posterior distribution.

No comments: