Tuesday, December 4, 2007

A Poisson Change-Point Model

In Chapter 11 of BCWR, I describe an analysis of a famous dataset, the counts of British coal mining disasters described in Carlin et al (1992). We observe the number of disasters for year , where actual year - 1850. We assume for early years (), has a Poisson distribution with mean , and for the later years, is Poisson(). Suppose we place vague priors on . (Specifically, we'll put a common gamma(c0, d0) prior on each Poisson mean.)

This model can be fit by the use of Gibbs sampling through the introduction of latent data. For each year, one introduces a state where or 2 if is Poisson() or Poisson(). Then one implements Gibbs sampling on the vector , ..., ).

In Chapter 11, I illustrate the use of WinBUGS and the R interface to simulate from this model. MCMCpack also offers a R function MCMCpoissonChangepoint to fit from this model which I'll illustrate here.

First we load in the MCMC package.


We load in the disaster numbers in a vector data.


Suppose we decide to assign gamma(c0, d0) priors on each Poisson mean where c0=1 and d0=1. Then we fit this changepoint model simply by running the function MCMCpoissonChangepoint:

fit=MCMCpoissonChangepoint(data, m = 1, c0 = 1, d0 = 1,
burnin = 10000, mcmc = 10000)

I have included the important arguments: data is obviously the vector of counts, m is the number of unknown changepoints (here m = 1), c0, d0 are the gamma prior parameters, we choose to have a burnin period of 10,000 iterations, and then collect the following 10,000 iterations.

MCMCpack includes several graphical and numerical summaries of the MCMC output: plot(fit), summary(fit), plotState(fit), and plotChangepoint(fit).

plot(fit) shows trace plots and density estimates for the two Poisson means.

summary(fit) gives you summary statistics, including suitable standard errors, for each Poisson mean

Iterations = 10001:20000
Thinning interval = 1
Number of chains = 1
Sample size per chain = 10000

1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:

Mean SD Naive SE Time-series SE
lambda.1 3.0799 0.2870 0.002870 0.002804
lambda.2 0.8935 0.1130 0.001130 0.001170

2. Quantiles for each variable:

2.5% 25% 50% 75% 97.5%
lambda.1 2.5411 2.8861 3.0660 3.271 3.667
lambda.2 0.6853 0.8153 0.8895 0.966 1.130

plotState(fit) - this shows the probability that the process falls in each of the two states for all years

plotChangepoint(fit) -- this displays the posterior distribution of the changepoint location.
This analysis agrees with the analysis of this problem using WinBUGS described in Chapter 11 of BCWR.


adam brown said...

Hi, this is not so related to your page, but it is the site you asked me 1 month ago about the abs diet. I tried it, worked well. Well here is the site

Burak Bayramli said...

Hello, here is a Python implementation for this change point analysis.