Monday, October 1, 2007

Normal approximations to posteriors

Continuing our Phillies example, I'm going to change the example somewhat and consider the relationship between the number of runs the Phillies score and the game outcome. Here is the table:

runs scored
game outcome low (four or less) high (five or more)
Win 15 74
Loss 53 20

There appears to be a notable positive relationship here and we're interested in estimating the underlying correlation coefficient of the bivariate normal.

We first construct a function polycorr.R that computes the logarithm of the posterior of the correlation.

polycorr=function(rho,data)
{

# needs package mvtnorm

n11=data[1,1]; n12=data[1,2]; n21=data[2,1]; n22=data[2,2]

nA=n11+n12; nB=n11+n21; n=n11+n12+n21+n22

c=qnorm(nA/n); d=qnorm(nB/n); pc=nA/n; pd=nB/n

val=0*rho

for (j in 1:length(rho))

{

C=matrix(c(1,rho[j],rho[j],1),c(2,2))

p00=pmvnorm(lower=-Inf*c(1,1),upper=c(c,d),corr=C)

val[j]=n11*log(pc-p00)+n12*log(1-pc-pd+p00)+n21*log(p00)+n22*log(pd-p00)
}
return(val)

}

We input the data as a two by two matrix.

data=matrix(c(15,53,74,20),c(2,2))

We find the normal approximation by use of the laplace function in the LearnBayes package. The inputs are the function, the starting value for the Newton algorithm, the number of iterations of the algorithm, and the data used in the function.


fit=laplace(polycorr,.6,10,data)

From the output of this function, we get that rho is approximately N(.694, .00479). In the R code below we plot the exact and approximate posteriors. In the figure, we see some inaccuracy in the normal approximation.

rho=seq(.3,1,by=.01)
gpost=exp(polycorr(rho,data))

plot(rho,gpost/sum(gpost)/.01,type="l",lwd=3,col="red",
ylab="DENSITY",xlab="RHO")
lines(rho,dnorm(rho,fit$mode,sqrt(fit$var)),lwd=3,col="blue")

legend(locator(1),c("EXACT","NORMAL APPROX"),col=c("red","blue"),lwd=2)



One way of improving the accuracy of the normal approximation is to transform rho to the real-valued parameter theta = log [(rho+1)/(1-rho)]. We write a function to compute the log posterior of theta.

polycorr2=function(theta,data)
{

# needs package mvtnorm

rho=(exp(theta)-1)/(exp(theta)+1)

n11=data[1,1]; n12=data[1,2]; n21=data[2,1]; n22=data[2,2]

nA=n11+n12; nB=n11+n21; n=n11+n12+n21+n22

c=qnorm(nA/n); d=qnorm(nB/n); pc=nA/n; pd=nB/n

val=0*rho

for (j in 1:length(rho))

{

C=matrix(c(1,rho[j],rho[j],1),c(2,2))
p00=pmvnorm(lower=-Inf*c(1,1),upper=c(c,d),corr=C) val[j]=n11*log(pc-p00)+n12*log(1-pc-pd+p00)+n21*log(p00)+n22*log(pd-p00) }
return(val+log(1-rho)+log(1+rho))
}


We find the normal approximation using the function laplace. Here the approximation is that the transformed rho is normal with mean 1.662 and variance .0692.

fit1=laplace(polycorr2,0,10,data)

We plot the exact and approximate posteriors -- here the normal approximation appears very accurate.

theta=seq(0.4,3.0,by=.01)
gpost=exp(polycorr2(theta,data))
plot(theta,gpost/sum(gpost)/.01,type="l",lwd=3,col="red", ylab="DENSITY",xlab="THETA") lines(theta,dnorm(theta,fit1$mode,sqrt(fit1$var)),lwd=3,col="blue")
legend(locator(1),c("EXACT","NORMAL APPROX"),col=c("red","blue"),lwd=2)






No comments: