Wednesday, September 19, 2007

Fitting a logistic model


Freshman students at BGSU take a mathematics placement test and a "placement score" is used to advise the student on the proper first mathematics course. For students taking a business calculus course, we record (1) his or her placement score and (2) his or her grade in the course. There are five possible placement levels that we code as 1, 2, 3, 4, 5. Let yi denote the number of students receiving A at placement level i. We suppose that yi is binomial(ni, pi), where pi is the probability a student at level i receives an A in the class. We let the probabilities satisfy the logistic model

logit(pi) = beta0 + beta1 i.

Assuming a uniform prior for beta = (beta0, beta1), the posterior distribution is proportional to

g(beta) = product pi^yi (1-pi)^(ni-yi).

The definition of the log posterior of beta is defined in the R function logisticpost.R.

We illustrate a "brute force" method of fitting this model.

1. We first read in the data -- we create three vectors y, n, and level. The matrix
data has columns level, n, and y.

> y=c(2,15,29,39,15)
> n=c(34,170,283,243,59)

> level=1:5

> data=cbind(level,n,y)

> data

level n y

[1,] 1 34 2

[2,] 2 170 15

[3,] 3 283 29

[4,] 4 243 39

[5,] 5 59 15

2. We illustrate the usual MLE fit using the R function glm. The MLE will be helpful in finding where the posterior is located.

> response=cbind(y,n-y)
> glm(response~level,family=binomial)


Coefficients:

(Intercept) level

-3.328 0.423


3. After some trial and error, we find a rectangle where the posterior is concentrated. The function mycontour is used to draw a contour plot.

> mycontour(logisticpost,c(-5,-1.5,-.2,1),data)

4. The function simcontour is used to simulate draws from the posterior computed on this grid. We plot the simulated draws on top of the scatterplot.

> s=simcontour(logisticpost,c(-5,-1.5,-.2,1),data,1000)
> points(s$x,s$y)
> title(xlab="BETA0",ylab="BETA1")

No comments: