Monday, October 1, 2007

Tribute to the Phillies

As some of you might know, the Philadelphia Phillies are in the Major League Baseball playoffs, which is pretty amazing. So we'll have to fit a model to some Phillies data. For each game of the 2007 season, we'll record:

(1) whether they won or lost the game
(2) the margin of victory, which is equal to the winner's score minus the loser's score

We are interested in exploring the relationship between these two variables. Suppose we classify the margin of victory as "close" (3 runs or fewer) or a "blowout" (4 runs or more). Here is a 2 x 2 contingency table classifying all 162 games by result and margin of victory:

          margin
result    close   blowout
  L          44        29
  W          51        38

One of the oldest approaches to estimating the relationship between two ordinal variables is the polychoric correlation coefficient (for a 2 x 2 table such as this one, it is usually called the tetrachoric correlation). One assumes that there is an underlying bivariate normal distribution with zero means, unit variances and correlation rho.

The observed counts are found by cutting the two underlying continuous variables at the cutpoints c (on the x scale, for the game result) and d (on the y scale, for the margin). One can estimate the cutpoints from the marginal proportions of the table (here one solves Phi(c) = 73/162, the proportion of losses, and Phi(d) = 95/162, the proportion of close games), and the likelihood of the correlation coefficient rho is given by

L(rho) = p1^44 p2^29 p3^51 p4^38,

where p1, p2, p3, p4 are the probabilities (dependent on rho) that the bivariate normal falls in the four regions divided by the cutpoints c and d. If we place a uniform prior on rho, then the posterior density will be proportional to the likelihood.
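
To make the likelihood concrete, here is a minimal Python sketch of the calculation, not the code used for this post. It assumes the cutpoints are fixed at the values implied by the table's row and column totals, takes x to be the game result and y the margin, and evaluates the bivariate normal probability with the identity Phi2(c, d; rho) = Phi(c) Phi(d) + integral from 0 to rho of the bivariate normal density at (c, d), approximated by Simpson's rule. The function names (bvn_density, bvn_cdf, log_likelihood) and the grid on (-0.95, 0.95) are choices made for this illustration; the sketch assumes the posterior mass outside that grid is negligible.

```python
import math
from statistics import NormalDist

# Counts from the 2 x 2 table: rows L/W, columns close/blowout.
n_L_close, n_L_blow, n_W_close, n_W_blow = 44, 29, 51, 38
n = n_L_close + n_L_blow + n_W_close + n_W_blow  # 162 games

std_normal = NormalDist()  # standard normal, mean 0 and sd 1

# Cutpoints from the marginal proportions of the table.
c = std_normal.inv_cdf((n_L_close + n_L_blow) / n)   # proportion of losses
d = std_normal.inv_cdf((n_L_close + n_W_close) / n)  # proportion of close games


def bvn_density(x, y, r):
    """Standard bivariate normal density at (x, y) with correlation r."""
    q = (x * x - 2.0 * r * x * y + y * y) / (2.0 * (1.0 - r * r))
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - r * r))


def bvn_cdf(x, y, rho, steps=200):
    """P(X < x, Y < y) via Phi(x)Phi(y) + integral_0^rho density dr (Simpson)."""
    h = rho / steps  # signed step; steps must be even for Simpson's rule
    total = bvn_density(x, y, 0.0) + bvn_density(x, y, rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * bvn_density(x, y, i * h)
    return std_normal.cdf(x) * std_normal.cdf(y) + total * h / 3.0


def log_likelihood(rho):
    """log L(rho) = 44 log p1 + 29 log p2 + 51 log p3 + 38 log p4."""
    p1 = bvn_cdf(c, d, rho)          # loss, close
    p2 = std_normal.cdf(c) - p1      # loss, blowout
    p3 = std_normal.cdf(d) - p1      # win, close
    p4 = 1.0 - p1 - p2 - p3          # win, blowout
    cells = [(n_L_close, p1), (n_L_blow, p2), (n_W_close, p3), (n_W_blow, p4)]
    # Guard against tiny negative values caused by numerical error.
    return sum(k * math.log(max(p, 1e-300)) for k, p in cells)


# Uniform prior on rho: the posterior is the normalized likelihood on a grid.
step = 0.01
rhos = [i / 100 for i in range(-95, 96)]
loglik = [log_likelihood(r) for r in rhos]
top = max(loglik)
unnorm = [math.exp(v - top) for v in loglik]  # subtract max for stability
area = sum(unnorm) * step
posterior = [u / area for u in unnorm]

mode = rhos[posterior.index(max(posterior))]
post_mean = sum(r * p for r, p in zip(rhos, posterior)) * step
print("cutpoints:", round(c, 3), round(d, 3))
print("posterior mode:", mode, "posterior mean:", round(post_mean, 3))
```

A grid like this is the simplest possible summary: because rho is a single bounded parameter, the normalized likelihood can be tabulated directly and any posterior summary computed as a sum over the grid.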

We'll use this example to illustrate different computational approaches to summarizing the posterior distribution.
