Introduction to Bayesian Thinking: A Poisson Change-Point Model

In Chapter 11 of BCWR, I describe an analysis of a famous dataset, the counts of British coal mining disasters described in Carlin et al (1992). We observe the number of disasters $y_t$ for year $t$ , where $t =$ actual year - 1850. We assume for early years ( $t < \tau$ ), $y_t$ has a Poisson distribution with mean $\lambda_1$ , and for the later years, $y_t$ is Poisson( $\lambda_2$ ). Suppose we place vague priors on $(\lambda_1, \lambda_2, \tau)$ . (Specifically, we'll put a common gamma(c0, d0) prior on each Poisson mean.)

This model can be fit by the use of Gibbs sampling through the introduction of latent data. For each year, one introduces a state $Z_t$ where $Z_t =1$ or 2 if $y_t$ is Poisson( $\lambda_1$ ) or Poisson( $\lambda_2$ ). Then one implements Gibbs sampling on the vector $(\lambda_1, \lambda_2, Z_1$ , ..., $Z_n$ ).

In Chapter 11, I illustrate the use of WinBUGS and the R interface to simulate from this model. MCMCpack also offers a R function MCMCpoissonChangepoint to fit from this model which I'll illustrate here.

First we load in the MCMC package.

library(MCMCpack)

We load in the disaster numbers in a vector data.

data=c(4,5,4,1,0,4,3,4,0,6,
3,3,4,0,2,6,3,3,5,4,5,3,1,4,4,1,5,5,3,4,2,5,2,2,3,4,2,1,3,2,
1,1,1,1,1,3,0,0,1,0,1,1,0,0,3,1,0,3,2,2,
0,1,1,1,0,1,0,1,0,0,0,2,1,0,0,0,1,1,0,2,
2,3,1,1,2,1,1,1,1,2,4,2,0,0,0,1,4,0,0,0,
1,0,0,0,0,0,1,0,0,1,0,0)

Suppose we decide to assign gamma(c0, d0) priors on each Poisson mean where c0=1 and d0=1. Then we fit this changepoint model simply by running the function MCMCpoissonChangepoint:

fit=MCMCpoissonChangepoint(data, m = 1, c0 = 1, d0 = 1,
burnin = 10000, mcmc = 10000)

I have included the important arguments: data is obviously the vector of counts, m is the number of unknown changepoints (here m = 1), c0, d0 are the gamma prior parameters, we choose to have a burnin period of 10,000 iterations, and then collect the following 10,000 iterations.

MCMCpack includes several graphical and numerical summaries of the MCMC output: plot(fit), summary(fit), plotState(fit), and plotChangepoint(fit).

plot(fit) shows trace plots and density estimates for the two Poisson means.

summary(fit) gives you summary statistics, including suitable standard errors, for each Poisson mean

Iterations = 10001:20000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:

Mean SD Naive SE Time-series SE
lambda.1 3.0799 0.2870 0.002870 0.002804
lambda.2 0.8935 0.1130 0.001130 0.001170

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%
lambda.1 2.5411 2.8861 3.0660 3.271 3.667
lambda.2 0.6853 0.8153 0.8895 0.966 1.130

plotState(fit) - this shows the probability that the process falls in each of the two states for all years