Wednesday, November 7, 2007

Test of Independence in a 2 x 2 Table

Consider data collected in a study described in Dorn (1954) to assess the relationship between smoking and lung cancer. In this study, a sample of 86 lung-cancer patients and a sample of 86 controls were questioned about their smoking habits. The two groups were chosen to represent random samples from a sub-population of lung-cancer patients and an otherwise similar population of cancer-free individuals.

Here is the 2 x 2 table of responses:

            Cancer   Control
Smokers         83        72
Non-smokers      3        14

Let pL and pC denote the proportions of lung-cancer patients and controls, respectively, who smoke. We wish to test the hypothesis H: pL = pC against the alternative A: pL ≠ pC.

To construct a Bayesian test, we define a suitable model for H and for A, and then compute the Bayes factor in support of the alternative A.
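The Bayes factor in support of A is the ratio of marginal densities BF10 = mA(data)/mH(data), where each marginal density is the likelihood averaged over the prior of the corresponding model. Below we compute the logarithm of each marginal density and exponentiate the difference.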

1. To describe these models, first we transform the proportions to the logits

LogitL = log(pL/(1-pL)), LogitC = log(pC/(1-pC))

2. We then define two parameters theta1, theta2, that are equal to the difference and sum of the logits.

theta1 = LogitL - LogitC, theta2 = LogitL + LogitC.

theta1 is the log odds ratio, a popular measure of association in a 2 x 2 table. Under the hypothesis of independence H, theta1 = 0. (A quick computation of the observed log odds ratio from the data appears after these steps.)

3. Consider the following prior on theta1 and theta2. We assume they are independent where

theta1 is N(0, tau1), theta2 is N(0, tau2).

4. Under H (independence), we assume theta1 = 0, which corresponds to setting tau1 = 0. theta2 is a nuisance parameter to which we arbitrarily assign a N(0, 1) prior. (The Bayes factor will be insensitive to this choice.)

5. Under A (not independence), we assume theta1 is N(0, tau1), where tau1 reflects our beliefs about the likely size of theta1 when the proportions are different. We again assume theta2 is N(0, 1). (This means that our beliefs about theta2 do not depend on our beliefs about theta1.)
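Before fitting the two models, it can help to look at the observed log odds ratio directly from the table counts. Here is a quick sketch (the variable names are ours, chosen only for illustration):

# observed counts from the 2 x 2 table
smoke.cancer = 83; nonsmoke.cancer = 3     # cancer patients: smokers / non-smokers
smoke.control = 72; nonsmoke.control = 14  # controls: smokers / non-smokers

# sample logits for the two groups
logitL = log(smoke.cancer / nonsmoke.cancer)    # log(83/3)
logitC = log(smoke.control / nonsmoke.control)  # log(72/14)

# observed log odds ratio, the sample analogue of theta1
logitL - logitC   # approximately 1.68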

To compute the marginal densities, we write a function that computes the logarithm of the posterior when (theta1, theta2) have the above prior.

logctable.test = function(theta, datapar)
{
  theta1 = theta[1]   # log odds ratio
  theta2 = theta[2]   # log odds product (sum of the logits)

  # counts from the 2 x 2 data table
  s1 = datapar$data[1, 1]
  f1 = datapar$data[1, 2]
  s2 = datapar$data[2, 1]
  f2 = datapar$data[2, 2]

  # recover the individual logits from their difference (theta1) and sum (theta2)
  logitp1 = (theta1 + theta2)/2
  logitp2 = (theta2 - theta1)/2

  # binomial log likelihood (up to a constant) plus the normal log priors
  loglike = s1 * logitp1 - (s1 + f1) * log(1 + exp(logitp1)) +
            s2 * logitp2 - (s2 + f2) * log(1 + exp(logitp2))
  logprior = dnorm(theta1, mean = 0, sd = datapar$tau1, log = TRUE) +
             dnorm(theta2, mean = 0, sd = datapar$tau2, log = TRUE)

  return(loglike + logprior)
}

We enter the data as a 2 x 2 matrix:

data=matrix(c(83,3,72,14),nrow=2,ncol=2)
data
[,1] [,2]
[1,] 83 72
[2,] 3 14

The argument datapar in the function is a list consisting of data, the 2 x 2 data table, and the values of tau1 and tau2.
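As a quick sanity check, we can construct such a list and evaluate the log posterior at theta = (0, 0); the values of tau1 and tau2 here are arbitrary and used only for the check:

datapar = list(data = data, tau1 = 1, tau2 = 1)
logctable.test(c(0, 0), datapar)   # a single numeric value: the log posterior up to a constant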

Suppose we assume theta1 is N(0, .8) under the alternative hypothesis, where .8 is the prior standard deviation. This prior is shown in the figure below.
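A plot of this prior can be reproduced with one line of R; the axis labels are just illustrative choices:

curve(dnorm(x, mean = 0, sd = 0.8), from = -3, to = 3,
      xlab = "theta1 (log odds ratio)", ylab = "prior density")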

Using the laplace function from the LearnBayes package, we compute the log marginal density under both models. (For H, we approximate the point mass of theta1 at 0 by a normal density with a tiny standard deviation tau1.)

library(LearnBayes)
l.marg0=laplace(logctable.test,c(0,0),list(data=data,tau1=.0001,tau2=1))$int
l.marg1=laplace(logctable.test,c(0,0),list(data=data,tau1=0.8,tau2=1))$int

We compute the Bayes factor in support of the hypothesis A.

BF.10=exp(l.marg1-l.marg0)
BF.10
[1] 7.001088

The conclusion is that the data support the alternative hypothesis A by a factor of about seven over the null hypothesis H; in other words, if the two hypotheses are given equal prior probability, A is seven times more plausible than H.
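Assuming equal prior probabilities for H and A (an assumption not stated above), the Bayes factor translates directly into a posterior probability for the alternative:

post.prob.A = BF.10/(1 + BF.10)   # about 0.875 under equal prior odds
post.prob.A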
