<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-5441519817884072981</id><updated>2011-07-07T17:55:28.093-07:00</updated><title type='text'>Introduction to Bayesian Thinking</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>51</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-9028812913667263531</id><published>2008-10-31T05:26:00.001-07:00</published><updated>2008-10-31T05:45:56.447-07:00</updated><title type='text'>How Many Electoral Votes will Obama Get?</title><content type='html'>Yesterday Chris Rump at BGSU gave an interesting presentation about simulating the 2008 Presidential Election.  
He explained the methodology used by Nate Silver on the fivethirtyeight.com site.&lt;br /&gt;&lt;br /&gt;Here is a relatively simple Bayesian approach for estimating the number of electoral votes that Barack Obama will get in the election on Tuesday.&lt;br /&gt;&lt;br /&gt;First, using the polling data on cnn.com, I collected the percentages for McCain and Obama in the latest poll in each state.  The web site only gives the survey percentages and not the sample sizes.  A typical sample size in an election poll is 1000 -- I will assume that each sample size is 500.  This is conservative, and it allows for some changes in voting behavior in the weeks before Election Day.&lt;br /&gt;&lt;br /&gt;Suppose 500 voters in Ohio are sampled and 47% are for McCain and 51% are for Obama -- this means that 235 and 255 voters were for the two candidates, leaving 10 voters for someone else.  Let p.M and p.O denote the proportions of the voting population in Ohio for the two candidates -- 1 - p.M - p.O denotes the proportion of the population for someone else.  Assuming a vague (uniform) prior on (p.M, p.O, 1-p.M-p.O), the posterior distribution for the proportions is proportional to&lt;br /&gt;&lt;br /&gt;p.M^235 p.O^255 (1 - p.M - p.O)^10&lt;br /&gt;&lt;br /&gt;which is a Dirichlet distribution with parameters (236, 256, 11).  The probability that McCain wins the state is simply the posterior probability&lt;br /&gt;&lt;br /&gt;P(p.M &gt; p.O)&lt;br /&gt;&lt;br /&gt;For each state, I can easily estimate this probability by simulation.  One simulates 5000 draws from the Dirichlet posterior and computes the proportion of draws where p.M &gt; p.O.&lt;br /&gt;&lt;br /&gt;The following table summarizes my calculations.  
For each state, I give the percentage of voters for McCain and Obama in the latest poll and my computed probability that McCain wins the state based on this data.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;                       State M.pct O.pct prob.M.wins EV&lt;br /&gt;1         Alabama    58    36       1.000  9&lt;br /&gt;2          Alaska    55    37       1.000  3&lt;br /&gt;3         Arizona    53    46       0.946 10&lt;br /&gt;4        Arkansas    53    41       0.997  6&lt;br /&gt;5      California    33    56       0.000 55&lt;br /&gt;6        Colorado    45    53       0.032  9&lt;br /&gt;7     Connecticut    31    56       0.000  7&lt;br /&gt;8        Delaware    38    56       0.000  3&lt;br /&gt;9            D.C.    13    82       0.000  3&lt;br /&gt;10        Florida    47    51       0.189 27&lt;br /&gt;11        Georgia    52    47       0.869 15&lt;br /&gt;12         Hawaii    32    63       0.000  4&lt;br /&gt;13          Idaho    68    26       1.000  4&lt;br /&gt;14       Illinois    35    59       0.000 21&lt;br /&gt;15        Indiana    45    46       0.416 11&lt;br /&gt;16           Iowa    42    52       0.009  7&lt;br /&gt;17         Kansas    63    31       1.000  6&lt;br /&gt;18       Kentucky    55    39       1.000  8&lt;br /&gt;19      Louisiana    50    43       0.949  9&lt;br /&gt;20          Maine    35    56       0.000  4&lt;br /&gt;21       Maryland    39    54       0.000 10&lt;br /&gt;22  Massachusetts    34    53       0.000 12&lt;br /&gt;23       Michigan    36    58       0.000 17&lt;br /&gt;24      Minnesota    38    57       0.000 10&lt;br /&gt;25    Mississippi    46    33       1.000  6&lt;br /&gt;26       Missouri    50    48       0.675 11&lt;br /&gt;27        Montana    48    44       0.825  3&lt;br /&gt;28       Nebraska    43    45       0.329  5&lt;br /&gt;29         Nevada    45    52       0.058  5&lt;br /&gt;30  New Hampshire    39    55       0.000  4&lt;br /&gt;31     New Jersey    
36    59       0.000 15&lt;br /&gt;32     New Mexico    40    45       0.117  5&lt;br /&gt;33       New York    31    62       0.000 31&lt;br /&gt;34 North Carolina    46    52       0.088 15&lt;br /&gt;35   North Dakota    43    45       0.318  3&lt;br /&gt;36           Ohio    47    51       0.182 20&lt;br /&gt;37       Oklahoma    61    34       1.000  7&lt;br /&gt;38         Oregon    34    48       0.000  7&lt;br /&gt;39   Pennsylvania    43    55       0.004 21&lt;br /&gt;40   Rhode Island    31    45       0.000  4&lt;br /&gt;41 South Carolina    59    37       1.000  8&lt;br /&gt;42   South Dakota    48    41       0.951  3&lt;br /&gt;43      Tennessee    55    39       1.000 11&lt;br /&gt;44          Texas    57    38       1.000 34&lt;br /&gt;45           Utah    55    32       1.000  5&lt;br /&gt;46        Vermont    36    57       0.000  3&lt;br /&gt;47       Virginia    44    53       0.022 13&lt;br /&gt;48     Washington    34    55       0.000 11&lt;br /&gt;49  West Virginia    53    44       0.978  5&lt;br /&gt;50      Wisconsin    42    53       0.007 10&lt;br /&gt;51        Wyoming    58    32       1.000  3&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Once we have these win probabilities for all states, it is easy to simulate the election.  Essentially one flips 51 biased coins, one for each state and D.C., where the probabilities that McCain wins are the win probabilities in the table.  Once the state winners have been simulated, one can accumulate the electoral votes for the two candidates.  I'll focus on the electoral count for Obama since he is predicted to win the election.&lt;br /&gt;&lt;br /&gt;I repeated this process for 5000 simulated elections.  Here is a histogram of the Obama electoral count.  
Note that all of the simulated counts exceed 300, well above the 270 electoral votes needed to win, so the estimated probability that Obama wins the election under this model is essentially 1.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/SQr9VyuCpEI/AAAAAAAAASE/4rUotRPZ_T0/s1600-h/election2008.jpg"&gt;&lt;img style="margin: 0pt 10px 10px 0pt; float: left; cursor: pointer; width: 553px; height: 550px;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/SQr9VyuCpEI/AAAAAAAAASE/4rUotRPZ_T0/s320/election2008.jpg" alt="" id="BLOGGER_PHOTO_ID_5263297665369809986" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-9028812913667263531?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/9028812913667263531/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=9028812913667263531' title='45 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/9028812913667263531'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/9028812913667263531'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2008/10/how-many-electoral-votes-will-obama-get.html' title='How Many Electoral Votes will Obama Get?'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/_V8g1rNtmHuM/SQr9VyuCpEI/AAAAAAAAASE/4rUotRPZ_T0/s72-c/election2008.jpg' height='72' width='72'/><thr:total>45</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-71280467887666841</id><published>2008-06-23T18:15:00.000-07:00</published><updated>2008-06-23T19:13:15.049-07:00</updated><title type='text'>Variance components model</title><content type='html'>Here is a simple illustration of a variance components model given by "Dyes" in the WinBUGS 1.4 Examples, volume 1:&lt;br /&gt;&lt;br /&gt;******************************************************&lt;br /&gt;Box and Tiao (1973) analyse data first presented by Davies (1967) concerning batch to batch variation in yields of dyestuff. The data (shown below) arise from a balanced experiment whereby the total product yield was determined for 5 samples from each of 6 randomly chosen batches of raw material.&lt;br /&gt;&lt;br /&gt;Batch        Yield (in grams)&lt;br /&gt;_______________________________________&lt;br /&gt;1    1545    1440    1440    1520    1580&lt;br /&gt;2    1540    1555    1490    1560    1495&lt;br /&gt;3    1595    1550    1605    1510    1560&lt;br /&gt;4    1445    1440    1595    1465    1545&lt;br /&gt;5    1595    1630    1515    1635    1625&lt;br /&gt;6    1520    1455    1450    1480    1445&lt;br /&gt;*******************************************************&lt;br /&gt;&lt;br /&gt;Let &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y_%7Bij%7D" align="middle" border="0" /&gt; denote the jth observation in batch i.   To determine the relative importance of between-batch variation versus sampling variation, we fit the following multilevel model.&lt;br /&gt;&lt;br /&gt;1.  &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y_%7Bij%7D" align="middle" border="0" /&gt; is N(&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cmu%20+%20b_i,%20%5Csigma_y%5E2" align="middle" border="0" /&gt;)&lt;br /&gt;&lt;br /&gt;2.  
&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?b_1,%20...,%20b_N" align="middle" border="0" /&gt; are iid N(0, &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Csigma%5E2_b%29" align="middle" border="0" /&gt;&lt;br /&gt;&lt;br /&gt;3.  &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%28%5Csigma_y%5E2,%20%5Csigma_b%5E2%29" align="middle" border="0" /&gt; is assigned a uniform prior&lt;br /&gt;&lt;br /&gt;In this situation, the focus is on the marginal posterior distribution of &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%28%5Cmu,%20%5Csigma_y%5E2,%20%5Csigma_b%5E2%29" align="middle" border="0" /&gt;.  It is possible to analytically integrate out the random effects &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?b_1,%20...,%20b_N" align="middle" border="0" /&gt;, resulting in the marginal posterior density&lt;br /&gt;&lt;br /&gt;&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cprod_%7Bi=1%7D%5EN%20%28%5Cexp%5C%7B-%5Cfrac%7B1%7D%7B2%5Csigma_y%5E2%7D%20S_i%5C%7D%20%5Cfrac%7B1%7D%7B%5Csigma_y%7D%29" align="middle" border="0" /&gt; &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cprod_%7Bi=1%7D%5EN%20%28%5Cexp%5C%7B-%5Cfrac%7B1%7D%7B2%28%5Csigma%5E2_y/n+%5Csigma%5E2_b%29%7D%20%28%5Cbar%20y_i%20-%20%5Cmu%29%5E2%5C%7D%20%5Cfrac%7B1%7D%7B%5Csqrt%7B%5Csigma%5E2_y/n+%5Csigma%5E2_b%7D%7D%20%29" align="middle" border="0" /&gt;&lt;br /&gt;&lt;br /&gt;where &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?S_i" align="middle" border="0" /&gt; is the "within batch" sum of squares for the ith batch, the barred y_i is the mean of the ith batch, and n is the number of observations per batch.  
To use the computational algorithms in LearnBayes, we consider the log posterior distribution of&lt;br /&gt;&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%28%5Cmu,%20%5Clog%20%5Csigma_y,%20%5Clog%20%5Csigma_b%29" align="middle" border="0" /&gt; that is programmed in the function logpostnorm1:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;logpostnorm1=function(theta,y)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;mu = theta[1]; sigma.y = exp(theta[2]); sigma.b = exp(theta[3])&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;p.means=apply(y,1,mean); n=dim(y)[2]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;like1=-(apply(sweep(y,1,p.means)^2,1,sum))/2/sigma.y^2-n*log(sigma.y)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;like2=-(p.means-mu)^2/2/(sigma.y^2/n+sigma.b^2)-.5*log(sigma.y^2/n+sigma.b^2)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;return(sum(like1+like2)+theta[2]+theta[3])&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;In the following R code, I load the LearnBayes package, read in the function logpostnorm1 from the file logpostnorm1.R, and read in the Dyes dataset stored in "dyes.txt".&lt;br /&gt;&lt;br /&gt;Then I summarize the posterior by use of the laplace function -- the mode of (&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clog%20%5Csigma_y,%20%5Clog%20%5Csigma_b" align="middle" border="0" /&gt;) is (3.80, 3.79).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; library(LearnBayes)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; source("logpostnorm1.R")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; y=read.table("dyes.txt")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; 
fit=laplace(logpostnorm1,c(1500,3,3),y)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; fit$mode&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;       [,1]     [,2]     [,3]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;[1,] 1527.5 3.804004 3.787452&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-71280467887666841?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/71280467887666841/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=71280467887666841' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/71280467887666841'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/71280467887666841'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2008/06/variance-components-model.html' title='Variance components model'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-2248246146780615578</id><published>2008-01-06T12:33:00.000-08:00</published><updated>2008-01-06T12:50:36.636-08:00</updated><title type='text'>Modeling airline on-time arrival rates</title><content type='html'>I am beginning to teach a new course on multilevel modeling using a new book by Gelman and Hill.&lt;br /&gt;&lt;br /&gt;Here is a simple example of 
multilevel modeling.  In May 2007 the Department of Transportation issued the Air Travel Consumer Report, designed to give information to consumers regarding the quality of services of the airlines.  For 290 airports across the U.S., this report gives the on-time percentage for arriving flights.  Below I've plotted the on-time percentage against the log of the number of flights for these airports.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/R4E8I5Zyd0I/AAAAAAAAAJ4/_-yEz6e7g6E/s1600-h/plot1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/R4E8I5Zyd0I/AAAAAAAAAJ4/_-yEz6e7g6E/s320/plot1.jpg" alt="" id="BLOGGER_PHOTO_ID_5152465572234164034" border="0" /&gt;&lt;/a&gt;What do we notice in this figure?  There is a lot of variation in the on-time percentages.  Also, the variation in the on-time percentages seems to decrease as the number of flights increases.&lt;br /&gt;&lt;br /&gt;What explains this variation?  There are a couple of causes.  First, there are genuine differences in the quality of service at the airports that would cause differences in on-time performance.  But one would also expect some natural binomial variability.  Even if a particular airport's planes are on time 80% of the time in the long run, one would expect some variation in the on-time performance of the airport in a short time interval.&lt;br /&gt;&lt;br /&gt;In multilevel modeling, we are able to isolate the two types of variation.  
We are able to model the binomial variability and also model the differences between the true on-time performances of the airports.&lt;br /&gt;&lt;br /&gt;To show how the multilevel model estimates behave, I've graphed the estimates in red in the following graph.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/R4E-x5Zyd1I/AAAAAAAAAKA/HNfK_l6oH0k/s1600-h/plot2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/R4E-x5Zyd1I/AAAAAAAAAKA/HNfK_l6oH0k/s320/plot2.jpg" alt="" id="BLOGGER_PHOTO_ID_5152468475632056146" border="0" /&gt;&lt;/a&gt;I call these multilevel estimates "bayes" in the figure.  Note that there are substantial differences between the basic estimates and the multilevel estimates for small airports with a relatively small number of flights.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-2248246146780615578?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/2248246146780615578/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=2248246146780615578' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2248246146780615578'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2248246146780615578'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2008/01/modeling-airline-on-time-arrival-rates.html' title='Modeling airline on-time arrival rates'/><author><name>Jim 
Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_V8g1rNtmHuM/R4E8I5Zyd0I/AAAAAAAAAJ4/_-yEz6e7g6E/s72-c/plot1.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-8938387254874665353</id><published>2007-12-31T10:04:00.000-08:00</published><updated>2007-12-31T10:34:38.102-08:00</updated><title type='text'>New version of LearnBayes</title><content type='html'>Based on my experience teaching Bayes this fall, I've added some new functions to LearnBayes.  The new version, LearnBayes 1.20, is available from the book website http://bayes.bgsu.edu/bcwr&lt;br /&gt;&lt;br /&gt;One simple way of extending conjugate priors is by the use of discrete mixtures.   There are three new functions, binomial.beta.mix, poisson.gamma.mix, and normal.normal.mix, that do the posterior computations for binomial, Poisson, and normal problems using mixtures of conjugate priors.&lt;br /&gt;&lt;br /&gt;Here I'll illustrate one of the functions for Poisson sampling.  Suppose we are interested in learning about the home run rate &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clambda" align="middle" border="0" /&gt; for Derek Jeter before the start of the 2004 season.  (A home run rate is the proportion of official at-bats that are home runs.)  Suppose our prior beliefs are that the median of &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clambda" align="middle" border="0" /&gt; is equal to 0.05 and the 90th percentile is equal to 0.081.&lt;br /&gt;Here are two priors that match this information.  
The first is a conjugate Gamma prior and the second is a mixture of two conjugate Gamma priors.&lt;br /&gt;&lt;br /&gt;Prior 1:  Gamma(shape = 6, rate = 113.5)&lt;br /&gt;&lt;br /&gt;Prior 2:  0.88 x Gamma(shape = 10, rate = 193.5) + 0.12 x Gamma(shape = 0.2,  rate = 0.415)&lt;br /&gt;&lt;br /&gt;I check below that these two priors match the prior information.  (The objects probs and gammapar hold the mixture probabilities and Gamma parameters of Prior 2; they are defined further down.)&lt;br /&gt;&lt;br /&gt;pgamma(c(.05,.081),shape=6,rate=113.5)&lt;br /&gt;[1] 0.5008145 0.8955648&lt;br /&gt;probs[1]*pgamma(c(.05,.081),shape=gammapar[1,1],rate=gammapar[1,2])+&lt;br /&gt;probs[2]*pgamma(c(.05,.081),shape=gammapar[2,1],rate=gammapar[2,2])&lt;br /&gt;[1] 0.5007136 0.9012596&lt;br /&gt;&lt;br /&gt;Graphs of the two priors are shown below.  They look similar, but the mixture prior has flatter tails.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/R3kycJZydxI/AAAAAAAAAJg/YxwZp_3fUoA/s1600-h/priors.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/R3kycJZydxI/AAAAAAAAAJg/YxwZp_3fUoA/s320/priors.jpg" alt="" id="BLOGGER_PHOTO_ID_5150203108016682770" border="0" /&gt;&lt;/a&gt;Now we observe some data -- the number of at-bats and home runs hit by Jeter in the 2004 season.  These data are in the dataset jeter2004 in the LearnBayes package.&lt;br /&gt;&lt;br /&gt;library(LearnBayes)&lt;br /&gt;data(jeter2004)&lt;br /&gt;&lt;br /&gt;The function poisson.gamma.mix will compute the posterior for the mixture prior.  
The inputs are probs, the vector of prior probabilities of the components; gammapar, a matrix of the gamma parameters for the components (one row of shape and rate per component); and data, a list with components y (the observed counts) and t (the corresponding time intervals).&lt;br /&gt;&lt;br /&gt;probs=c(.88,.12)&lt;br /&gt;gammapar=rbind(c(10,193.5),c(.2,.415))&lt;br /&gt;data=list(y=jeter2004$HR,t=jeter2004$AB)&lt;br /&gt;&lt;br /&gt;Now we can run the function.&lt;br /&gt;&lt;br /&gt;poisson.gamma.mix(probs,gammapar,data)&lt;br /&gt;$probs&lt;br /&gt;[1] 0.98150453 0.01849547&lt;br /&gt;&lt;br /&gt;$gammapar&lt;br /&gt;     [,1]    [,2]&lt;br /&gt;[1,] 33.0 836.500&lt;br /&gt;[2,] 23.2 643.415&lt;br /&gt;&lt;br /&gt;We see from the output that the posterior for &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clambda" align="middle" border="0" /&gt; is distributed as&lt;br /&gt;&lt;br /&gt;0.98 x Gamma(33.0, 836.5) + 0.02 x Gamma(23.2, 643.4)&lt;br /&gt;&lt;br /&gt;Does the choice of prior make a difference here?  The following figure displays the posteriors for the two priors.  They look similar, indicating that inference about &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clambda" align="middle" border="0" /&gt; is robust to the choice of prior.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/R3kyipZydyI/AAAAAAAAAJo/xeaUJV5js2k/s1600-h/posteriors1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/R3kyipZydyI/AAAAAAAAAJo/xeaUJV5js2k/s320/posteriors1.jpg" alt="" id="BLOGGER_PHOTO_ID_5150203219685832482" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;But this robustness will depend on the observed data.  
Suppose that Jeter is a "steroid slugger" during the 2004 season and hits 70 home runs in 500 at-bats.&lt;br /&gt;&lt;br /&gt;We run the function poisson.gamma.mix for these data.&lt;br /&gt;&lt;br /&gt;poisson.gamma.mix(probs,gammapar,list(y=70,t=500))&lt;br /&gt;$probs&lt;br /&gt;[1] 0.227754 0.772246&lt;br /&gt;&lt;br /&gt;$gammapar&lt;br /&gt;     [,1]    [,2]&lt;br /&gt;[1,] 80.0 693.500&lt;br /&gt;[2,] 70.2 500.415&lt;br /&gt;&lt;br /&gt;Here the posterior is&lt;br /&gt;&lt;br /&gt;0.23 x Gamma(80, 693.5) + 0.77 x Gamma(70.2, 500.4)&lt;br /&gt;&lt;br /&gt;I've graphed the posteriors from the two priors.  Here we see that the two posteriors are substantially different, indicating that the inference depends on the choice of prior.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/R3kypJZydzI/AAAAAAAAAJw/152i6OOKdBU/s1600-h/posteriors2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/R3kypJZydzI/AAAAAAAAAJw/152i6OOKdBU/s320/posteriors2.jpg" alt="" id="BLOGGER_PHOTO_ID_5150203331354982194" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-8938387254874665353?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/8938387254874665353/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=8938387254874665353' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8938387254874665353'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8938387254874665353'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/12/new-version-of-learnbayes.html' title='New version of LearnBayes'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/R3kycJZydxI/AAAAAAAAAJg/YxwZp_3fUoA/s72-c/priors.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-4776278474613177435</id><published>2007-12-04T06:50:00.000-08:00</published><updated>2007-12-04T11:12:42.428-08:00</updated><title type='text'>A Poisson Change-Point Model</title><content type='html'>In Chapter 11 of BCWR, I describe an analysis of a famous dataset, the counts of British coal mining disasters described in Carlin et al (1992).  We observe the number of disasters &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y_t" align="middle" border="0" /&gt; for year &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?t" align="middle" border="0" /&gt;, where &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?t%20=" align="middle" border="0" /&gt; actual year - 1850.  
We assume for early years (&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?t%20%3C%20%5Ctau" align="middle" border="0" /&gt;), &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y_t" align="middle" border="0" /&gt; has a Poisson distribution with mean &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clambda_1" align="middle" border="0" /&gt;, and for the later years, &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y_t" align="middle" border="0" /&gt; is Poisson(&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clambda_2" align="middle" border="0" /&gt;).  Suppose we place vague priors on &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%28%5Clambda_1,%20%5Clambda_2,%20%5Ctau%29" align="middle" border="0" /&gt;.  (Specifically, we'll put a common gamma(c0, d0) prior on each Poisson mean.)&lt;br /&gt;&lt;br /&gt;This model can be fit by the use of Gibbs sampling through the introduction of latent data.  For each year, one introduces a state &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?Z_t" align="middle" border="0" /&gt; where &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?Z_t%20=1" align="middle" border="0" /&gt; or 2 if &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y_t" align="middle" border="0" /&gt; is Poisson(&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clambda_1" align="middle" border="0" /&gt;) or Poisson(&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clambda_2" align="middle" border="0" /&gt;).  Then one implements Gibbs sampling on the vector &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%28%5Clambda_1,%20%5Clambda_2,%20Z_1" align="middle" border="0" /&gt;, ..., &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?Z_n" align="middle" border="0" /&gt;).&lt;br /&gt;&lt;br /&gt;In Chapter 11, I illustrate the use of WinBUGS and the R interface to simulate from this model.  
MCMCpack also offers an R function, MCMCpoissonChangepoint, to fit this model, which I'll illustrate here.&lt;br /&gt;&lt;br /&gt;First we load the MCMCpack package.&lt;br /&gt;&lt;br /&gt;library(MCMCpack)&lt;br /&gt;&lt;br /&gt;We store the disaster counts in a vector named data.&lt;br /&gt;&lt;br /&gt;data=c(4,5,4,1,0,4,3,4,0,6,&lt;br /&gt;3,3,4,0,2,6,3,3,5,4,5,3,1,4,4,1,5,5,3,4,2,5,2,2,3,4,2,1,3,2,&lt;br /&gt;1,1,1,1,1,3,0,0,1,0,1,1,0,0,3,1,0,3,2,2,&lt;br /&gt;0,1,1,1,0,1,0,1,0,0,0,2,1,0,0,0,1,1,0,2,&lt;br /&gt;2,3,1,1,2,1,1,1,1,2,4,2,0,0,0,1,4,0,0,0,&lt;br /&gt;1,0,0,0,0,0,1,0,0,1,0,0)&lt;br /&gt;&lt;br /&gt;Suppose we decide to assign gamma(c0, d0) priors on each Poisson mean where c0=1 and d0=1.  Then we fit this changepoint model simply by running the function MCMCpoissonChangepoint:&lt;br /&gt;&lt;br /&gt;fit=MCMCpoissonChangepoint(data,  m = 1, c0 = 1, d0 = 1,&lt;br /&gt;    burnin = 10000, mcmc = 10000)&lt;br /&gt;&lt;br /&gt;I have included the important arguments: data is the vector of counts, m is the number of unknown changepoints (here m = 1), c0 and d0 are the gamma prior parameters, burnin = 10000 sets a burn-in period of 10,000 iterations, and mcmc = 10000 collects the following 10,000 iterations.&lt;br /&gt;&lt;br /&gt;MCMCpack includes several graphical and numerical summaries of the MCMC output:  plot(fit), summary(fit), plotState(fit), and plotChangepoint(fit).&lt;br /&gt;&lt;br /&gt;plot(fit)  shows trace plots and density estimates for the two Poisson means.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/R1WlSJu6IMI/AAAAAAAAAJI/n92yNC5JlAw/s1600-h/changepoint1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/R1WlSJu6IMI/AAAAAAAAAJI/n92yNC5JlAw/s320/changepoint1.jpg" alt="" id="BLOGGER_PHOTO_ID_5140196280982184130" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;summary(fit) gives you 
summary statistics, including suitable standard errors, for each Poisson mean&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;Iterations = 10001:20000&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;Thinning interval = 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;Number of chains = 1&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;Sample size per chain = 10000&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;1. Empirical mean and standard deviation for each variable,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt; plus standard error of the mean:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;         Mean     SD Naive SE Time-series SE&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;lambda.1 3.0799 0.2870 0.002870       0.002804&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;lambda.2 0.8935 0.1130 0.001130       0.001170&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;2. 
Quantiles for each variable:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;         2.5%    25%    50%   75% 97.5%&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;lambda.1 2.5411 2.8861 3.0660 3.271 3.667&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family: courier new;"&gt;lambda.2 0.6853 0.8153 0.8895 0.966 1.130&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;plotState(fit) - this shows the probability that the process falls in each of the two states for all years&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/R1WlyJu6INI/AAAAAAAAAJQ/yWire4oAqqE/s1600-h/changepoint2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/R1WlyJu6INI/AAAAAAAAAJQ/yWire4oAqqE/s320/changepoint2.jpg" alt="" id="BLOGGER_PHOTO_ID_5140196830737998034" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;plotChangepoint(fit) -- this displays the posterior distribution of the changepoint location.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/R1WmCZu6IOI/AAAAAAAAAJY/y2vS5e6r0wQ/s1600-h/changepoint3.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/R1WmCZu6IOI/AAAAAAAAAJY/y2vS5e6r0wQ/s320/changepoint3.jpg" alt="" id="BLOGGER_PHOTO_ID_5140197109910872290" border="0" /&gt;&lt;/a&gt;This analysis agrees with the analysis of this problem using WinBUGS described in Chapter 11 of BCWR.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-4776278474613177435?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/4776278474613177435/comments/default' 
title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=4776278474613177435' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4776278474613177435'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4776278474613177435'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/12/poisson-change-point-model.html' title='A Poisson Change-Point Model'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/R1WlSJu6IMI/AAAAAAAAAJI/n92yNC5JlAw/s72-c/changepoint1.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-2848700098245633089</id><published>2007-12-01T16:26:00.000-08:00</published><updated>2007-12-01T16:48:32.456-08:00</updated><title type='text'>Probit modeling via MCMCpack</title><content type='html'>I thought briefly about doing a survey of Bayesian R packages in my book.  I'm sure a comparative survey would be helpful to many users, but it is difficult to cover all of the packages  in any depth in a 30 page chapter.  Also, since packages are evolving so fast, much of what I could say would quickly be out of date.&lt;br /&gt;&lt;br /&gt;One package that looks attractive is the MCMCpack package written by Andrew Martin and Kevin Quinn.   
They provide MCMC algorithms for many popular statistical models and it seems, at first glance, easy to use.&lt;br /&gt;&lt;br /&gt;Since I just demonstrated the use of Gibbs sampling for a probit model with a normal prior, let's fit this model by MCMCpack.&lt;br /&gt;&lt;br /&gt;The appropriate R function to use is MCMCprobit, which uses the same Albert-Chib sampling algorithm -- in its most basic form, the function looks like&lt;br /&gt;&lt;br /&gt;fit = MCMCprobit(model,  data, burnin, mcmc, thin, b0, B0)&lt;br /&gt;&lt;br /&gt;Here&lt;br /&gt;&lt;br /&gt;model:  is a description of the probit model, written as an R model formula as in lm.&lt;br /&gt;data:  is the data frame that is used&lt;br /&gt;burnin:  is the number of iterations for the burnin period&lt;br /&gt;mcmc:  is the number of Gibbs iterations&lt;br /&gt;thin:  is the thinning interval&lt;br /&gt;b0:  is the prior mean of the multivariate prior&lt;br /&gt;B0:  is the prior precision matrix&lt;br /&gt;&lt;br /&gt;For my model, here is the syntax:&lt;br /&gt;&lt;br /&gt;fit=MCMCprobit(success~prev.success+act, data=as.data.frame(DATA), burnin=0,&lt;br /&gt; mcmc=10000, thin=1, b0=prior$beta, B0=prior$P)&lt;br /&gt;&lt;br /&gt;After it is run, one can get summaries of the simulated draws of beta by the summary command.&lt;br /&gt;&lt;br /&gt;summary(fit)&lt;br /&gt;&lt;br /&gt;Iterations = 1:10000&lt;br /&gt;Thinning interval = 1&lt;br /&gt;Number of chains = 1&lt;br /&gt;Sample size per chain = 10000&lt;br /&gt;&lt;br /&gt;1. Empirical mean and standard deviation for each variable,&lt;br /&gt;  plus standard error of the mean:&lt;br /&gt;&lt;br /&gt;                Mean      SD  Naive SE Time-series SE&lt;br /&gt;(Intercept)  -1.53215 0.75595 0.0075595      0.0110739&lt;br /&gt;prev.success  1.03590 0.24887 0.0024887      0.0038060&lt;br /&gt;act           0.05093 0.03700 0.0003700      0.0005172&lt;br /&gt;&lt;br /&gt;2. 
Quantiles for each variable:&lt;br /&gt;&lt;br /&gt;                2.5%      25%      50%      75%    97.5%&lt;br /&gt;(Intercept)  -3.03647 -2.03970 -1.53613 -1.02398 -0.04206&lt;br /&gt;prev.success  0.54531  0.86842  1.03246  1.20129  1.52685&lt;br /&gt;act          -0.02117  0.02567  0.05132  0.07559  0.12451&lt;br /&gt;&lt;br /&gt;Also, one can get trace plots and density estimates by the plot command.&lt;br /&gt;&lt;br /&gt;plot(fit)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/R1H-rpu6ILI/AAAAAAAAAJA/FCTisNv0F5k/s1600-R/mcmcpackplot1.jpeg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/R1H-rpu6ILI/AAAAAAAAAJA/0nouuKIcuxY/s320/mcmcpackplot1.jpeg" alt="" id="BLOGGER_PHOTO_ID_5139168675696877746" border="0" /&gt;&lt;/a&gt;What do I think about this particular function in MCMCpack?&lt;br /&gt;&lt;br /&gt;1.  The execution time for the MCMC is much faster using MCMCprobit since the sampling is done using compiled C++ code.  How much faster?  For 10,000 iterations of Gibbs sampling, it took my laptop 0.58 seconds to do this sampling in MCMCpack compared with 4.53 seconds using my R function.&lt;br /&gt;&lt;br /&gt;2.  MCMCprobit allows for more user input such as the burnin period, thinning rate, starting values, random number seed, etc.&lt;br /&gt;&lt;br /&gt;3.  It allows one to output latent residuals (Albert and Chib, Biometrika) and compute marginal likelihoods by the Laplace method.&lt;br /&gt;&lt;br /&gt;Generally, the function worked fine and I got essentially the same results as I had before.  My only quibble is that it took two tries for MCMCprobit to run.  It complained that my prior precision matrix was not symmetric, although I computed this matrix by the var command in R.  
There was a quick fix -- I rounded the values of this matrix to two decimal places and MCMCprobit didn't complain.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-2848700098245633089?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/2848700098245633089/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=2848700098245633089' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2848700098245633089'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2848700098245633089'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/12/probit-modeling-via-mcmcpack.html' title='Probit modeling via MCMCpack'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/R1H-rpu6ILI/AAAAAAAAAJA/0nouuKIcuxY/s72-c/mcmcpackplot1.jpeg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-7205116905382747893</id><published>2007-12-01T06:52:00.000-08:00</published><updated>2007-12-01T07:24:52.136-08:00</updated><title type='text'>Probit Modeling</title><content type='html'>I should have put more prior modeling in my Bayesian R book.  One of the obvious advantages of the Bayesian approach is the ability to incorporate prior information.  
I'll illustrate the use of informative priors in a simple setting -- binary regression modeling with a probit link where one has prior information about the regression vector.&lt;br /&gt;&lt;br /&gt;At my school, many students typically take a precalculus class that prepares them to take a business calculus class.  We want our students to do well (that is, get an A or B) in the calculus class and wish to understand how the student's performance in the precalculus class and his/her ACT math score are useful in predicting the student's success.&lt;br /&gt;&lt;br /&gt;Here is the probit model.  If y=1 and y=0 represent respectively a student doing well and not well in the calculus class, then we model the probability that y=1 by&lt;br /&gt;&lt;br /&gt;Prob(y=1) = Phi(beta0 + beta1 (PrecalculusGrade) + beta2 (ACT))&lt;br /&gt;&lt;br /&gt;where Phi() is the standard normal cdf, PrecalculusGrade is 1 (0) if the student gets an A (B or C) in the precalculus class, and ACT is the math ACT score.&lt;br /&gt;&lt;br /&gt;Suppose that the regression vector beta = (beta0, beta1, beta2) is assigned a multivariate normal prior with mean vector beta0 and precision matrix P.  Then there is a simple Gibbs sampling algorithm for simulating from the posterior distribution of beta.  The algorithm is based on the idea of augmenting the problem with latent continuous data from a normal distribution.&lt;br /&gt;&lt;br /&gt;Although you might not understand the code, one can implement one iteration of Gibbs sampling by three lines of R code:&lt;br /&gt;&lt;br /&gt;      z=rtruncated(N,LO,HI,pnorm,qnorm,X%*%beta,1)&lt;br /&gt;      mn=solve(BI+t(X)%*%X,BIbeta0+t(X)%*%z)&lt;br /&gt;      beta = t(aa) %*% array(rnorm(p), c(p, 1)) + mn&lt;br /&gt;&lt;br /&gt;Anyway, I want to focus on using this model with prior information.&lt;br /&gt;&lt;br /&gt;1.  First suppose I have a "prior dataset" of 50 students.  I fit this probit model with a vague prior on beta.  
The inputs to the function bayes.probit.prior are (1) the vector of binary responses y, (2) the covariate matrix X, and (3) the number of iterations of the Gibbs sampler.&lt;br /&gt;&lt;br /&gt;fit1=bayes.probit.prior(prior.data[,1],prior.data[,-1],1000)&lt;br /&gt;&lt;br /&gt;2.  I compute the posterior mean and posterior variance-covariance matrix of the simulated draws of beta.  I use these values for my multivariate normal prior on beta.&lt;br /&gt;&lt;br /&gt;prior=list(beta=apply(fit1,2,mean),P=solve(var(fit1)))&lt;br /&gt;&lt;br /&gt;(Note that I'm inputting the precision matrix P that is the inverse of the var-cov matrix.)&lt;br /&gt;&lt;br /&gt;3.  Using this informative prior, I fit the probit model with a new sample of 100 students.&lt;br /&gt;&lt;br /&gt;fit2=bayes.probit.prior(DATA[,1],DATA[,-1],10000,prior=prior)&lt;br /&gt;&lt;br /&gt;The only change in the input is that I input a list "prior" that includes the mean "beta" and the precision matrix P.&lt;br /&gt;&lt;br /&gt;4.  Now I summarize my fit to learn how previous grade and ACT relate to the success of the calculus students.  I define a grid of ACT scores and consider two sets of covariates corresponding to students who were not successful (0) and successful (1) in precalculus.&lt;br /&gt;&lt;br /&gt;act=seq(15,29)&lt;br /&gt;x0=cbind(1,0,act)&lt;br /&gt;x1=cbind(1,1,act)&lt;br /&gt;&lt;br /&gt;Then I use the function bprobit.probs to obtain posterior samples of the probability of success in calculus for each set of covariates.&lt;br /&gt;&lt;br /&gt;fit.x0=bprobit.probs(x0,fit2)&lt;br /&gt;fit.x1=bprobit.probs(x1,fit2)&lt;br /&gt;&lt;br /&gt;I graph the posterior means of the fitted probabilities in the graph below.  
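&lt;br /&gt;&lt;br /&gt;The idea behind these fitted probabilities can be sketched in base R.  This is an illustration of the calculation, not the source of bprobit.probs: each simulated beta is turned into a probability Phi(x beta) for every row of covariates, and the draws are then averaged.  The matrix draws below is made-up stand-in output, not the fit from this post.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```r
# Sketch only: fitted probit probabilities from simulated draws of beta.
# 'draws' is a made-up stand-in for the simulated beta matrix (one row per draw).
set.seed(1)
draws = cbind(rnorm(1000, -1.5, 0.7), rnorm(1000, 1.0, 0.25), rnorm(1000, 0.05, 0.04))
act = seq(15, 29)
x1  = cbind(1, 1, act)                 # intercept, aced precalculus, ACT score

probs = pnorm(draws %*% t(x1))         # 1000 by 15 matrix of success probabilities
post.means = apply(probs, 2, mean)     # posterior mean at each ACT score
round(post.means, 2)
```
&lt;/pre&gt;Plotting post.means against act, once for each set of covariates, gives curves of the kind shown in the graph.&lt;br /&gt;&lt;br /&gt;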
There are two lines -- one corresponding to the students who aced the precalculus class, and another line corresponding to the students who did not ace precalculus.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/R1F7mpu6IKI/AAAAAAAAAI4/Io2MQwzvXG8/s1600-R/probit.fit.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/R1F7mpu6IKI/AAAAAAAAAI4/V7xXBVH_fvs/s320/probit.fit.jpg" alt="" id="BLOGGER_PHOTO_ID_5139024553774293154" border="0" /&gt;&lt;/a&gt;Several things are clear from this graph.  The performance in the precalc class matters -- students who ace precalculus have a 30% higher chance of succeeding in the calculus class.  On the other hand, the ACT score (that the student took during high school) has essentially no predictive ability.  It is interesting that the slopes of the lines are negative, but this clearly isn't significant.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-7205116905382747893?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/7205116905382747893/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=7205116905382747893' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7205116905382747893'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7205116905382747893'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/12/probit-modeling.html' title='Probit Modeling'/><author><name>Jim 
Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/R1F7mpu6IKI/AAAAAAAAAI4/V7xXBVH_fvs/s72-c/probit.fit.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-2728221368445859842</id><published>2007-11-29T06:19:00.000-08:00</published><updated>2007-11-29T06:33:43.395-08:00</updated><title type='text'>Comparing sampling models by Bayes factors</title><content type='html'>In the last posting, I illustrated fitting  a t(4) sampling model to some baseball data where I suspected there was an outlier.  To compare this model to other sampling models, we can use Bayes factors.&lt;br /&gt;&lt;br /&gt;Here is a t sampling model with a convenient choice of prior.&lt;br /&gt;&lt;br /&gt;1.&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y_1,%20...,%20y_n" align="middle" border="0" /&gt;is a random sample from &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?t%28%5Cmu,%20%5Csigma,%20df%29" align="middle" border="0" /&gt;&lt;br /&gt;2. 
&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cmu" align="middle" border="0" /&gt; and &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clog%20%5Csigma" align="middle" border="0" /&gt; are independent, with &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cmu" align="middle" border="0" /&gt; distributed &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?N%28M_0,%20S_0%29" align="middle" border="0" /&gt; and &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clog%20%5Csigma" align="middle" border="0" /&gt; distributed &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?N%28M_1,%20S_1%29" align="middle" border="0" /&gt;&lt;br /&gt;&lt;br /&gt;I define a R function tsampling.R that computes the logarithm of the posterior.  Note that I am careful to include all of the normalizing constants, since I am primarily interested in computing the marginal density of &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y" align="middle" border="0" /&gt;.&lt;br /&gt;&lt;br /&gt;tsampling=function(theta,datapar)&lt;br /&gt;{&lt;br /&gt;mu=theta[1]; logsigma=theta[2]; sigma=exp(logsigma)&lt;br /&gt;&lt;br /&gt;y=datapar$y; df=datapar$df&lt;br /&gt;mu.mean=datapar$mu.mean; mu.sd=datapar$mu.sd&lt;br /&gt;lsigma.mean=datapar$lsigma.mean; lsigma.sd=datapar$lsigma.sd&lt;br /&gt;&lt;br /&gt;loglike=sum(dt((y-mu)/sigma,df,log=TRUE)-log(sigma))&lt;br /&gt;logprior=dnorm(mu,mu.mean,mu.sd,log=TRUE)+&lt;br /&gt;         dnorm(logsigma,lsigma.mean,lsigma.sd,log=TRUE)&lt;br /&gt;&lt;br /&gt;return(loglike+logprior)&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;We use this function together with the function laplace to compute the log marginal density for two models.&lt;br /&gt;&lt;br /&gt;Model 1 -- t(4) sampling, &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cmu" align="middle" border="0" /&gt; is normal(90, 20), &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clog%20%5Csigma" align="middle" border="0" /&gt; is 
normal(1, 1).&lt;br /&gt;&lt;br /&gt;Model 2 -- t(30) sampling, &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cmu" align="middle" border="0" /&gt; is normal(90, 20), &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clog%20%5Csigma" align="middle" border="0" /&gt; is normal(1, 1).&lt;br /&gt;&lt;br /&gt;Note that I'm using the same prior for both models -- the only difference is the choice of degrees of freedom in the sampling density.&lt;br /&gt;&lt;br /&gt;dataprior1=list(y=y, df=4, mu.mean=90, mu.sd=20,&lt;br /&gt;      lsigma.mean=1, lsigma.sd=1)&lt;br /&gt;log.m1=laplace(tsampling,c(80, 3), dataprior1)$int&lt;br /&gt;&lt;br /&gt;dataprior2=list(y=y, df=30, mu.mean=90, mu.sd=20,&lt;br /&gt;      lsigma.mean=1, lsigma.sd=1)&lt;br /&gt;log.m2=laplace(tsampling,c(80, 3), dataprior2)$int&lt;br /&gt;&lt;br /&gt;BF.12=exp(log.m1-log.m2)&lt;br /&gt;BF.12&lt;br /&gt;[1] 14.8463&lt;br /&gt;&lt;br /&gt;We see that there is substantial support for the t(4) model over the "close to normal" t(30) model.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-2728221368445859842?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/2728221368445859842/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=2728221368445859842' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2728221368445859842'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2728221368445859842'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/comparing-sampling-models-by-bayes.html' title='Comparing sampling models by Bayes 
factors'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-2411376072932355167</id><published>2007-11-29T05:40:00.000-08:00</published><updated>2007-11-29T06:19:22.864-08:00</updated><title type='text'>Robust Modeling using Gibbs Sampling</title><content type='html'>To illustrate the use of Gibbs sampling for robust modeling, here are the batting statistics for the "regular" players on the San Francisco Giants for the 2007 baseball season:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;Pos Player              Ag   G   AB    R    H   2B 3B  HR  RBI  BB  SO   BA    OBP   SLG  SB  CS  GDP HBP  SH  SF IBB  OPS+&lt;br /&gt;---+-------------------+--+----+----+----+----+---+--+---+----+---+----+-----+-----+-----+---+---+---+---+---+---+---+----+&lt;br /&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="full"&gt;C   &lt;a href="http://www.baseball-reference.com/m/molinbe01.shtml"&gt;Bengie Molina&lt;/a&gt;       32  134  497   38  137  19  1  19   81  15   53  .276  .298  .433   0   0  13   2   1   2   2   86&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;1B *&lt;a href="http://www.baseball-reference.com/k/kleskry01.shtml"&gt;Ryan Klesko&lt;/a&gt;         36  116  362   51   94  27  3   6   44  46   68  .260  .344  .401   5   1  14   1   1   1   2   92&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;2B #&lt;a href="http://www.baseball-reference.com/d/durhara01.shtml"&gt;Ray Durham&lt;/a&gt;          35  138  464   56  101  21  2  11   71  53   75  .218  .295  .343  10   2  18   2   0   9   6   65&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;3B  &lt;a href="http://www.baseball-reference.com/f/felizpe01.shtml"&gt;Pedro Feliz&lt;/a&gt;         32  150  557   61  141  28  2  20  
 72  29   70  .253  .290  .418   2   2  15   1   0   3   2   81&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;SS #&lt;a href="http://www.baseball-reference.com/v/vizquom01.shtml"&gt;Omar Vizquel&lt;/a&gt;        40  145  513   54  126  18  3   4   51  44   48  .246  .305  .316  14   6  14   1  14   3   6   62&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;LF *&lt;a href="http://www.baseball-reference.com/b/bondsba01.shtml"&gt;Barry Bonds&lt;/a&gt;         42  126  340   75   94  14  0  28   66 132   54  .276  .480  .565   5   0  13   3   0   2  43  170&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;CF *&lt;a href="http://www.baseball-reference.com/r/roberda07.shtml"&gt;Dave Roberts&lt;/a&gt;        35  114  396   61  103  17  9   2   23  42   66  .260  .331  .364  31   5   4   0   4   0   1   80&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;RF #&lt;a href="http://www.baseball-reference.com/w/winnra01.shtml"&gt;Randy Winn&lt;/a&gt;          33  155  593   73  178  42  1  14   65  44   85  .300  .353  .445  15   3  12   7   4   5   3  105&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;    &lt;a href="http://www.baseball-reference.com/a/aurilri01.shtml"&gt;Rich Aurilia&lt;/a&gt;        35   99  329   40   83  19  2   5   33  22   45  .252  .304  .368   0   0   8   4   0   3   1   73&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;    &lt;a href="http://www.baseball-reference.com/f/frandke01.shtml"&gt;Kevin Frandsen&lt;/a&gt;      25  109  264   26   71  12  1   5   31  21   24  .269  .331  .379   4   3  17   5   3   3   3   84&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;   *&lt;a href="http://www.baseball-reference.com/l/lewisfr02.shtml"&gt;Fred Lewis&lt;/a&gt;          26   58  157   34   45   6  2   3   19  19   32  .287  .374  .408   5   1   4   3   1   0   0  103&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;   #&lt;a href="http://www.baseball-reference.com/o/ortmeda01.shtml"&gt;Dan Ortmeier&lt;/a&gt;        26   62  157   20   45   7  4   6   
16   7   41  .287  .317  .497   2   1   2   1   0   2   1  107&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;    &lt;a href="http://www.baseball-reference.com/d/davisra01.shtml"&gt;Rajai Davis&lt;/a&gt;         26   51  142   26   40   9  1   1    7  14   25  .282  .363  .380  17   4   0   4   2   0   1   93&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;   *&lt;a href="http://www.baseball-reference.com/s/schiena01.shtml"&gt;Nate Schierholtz&lt;/a&gt;    23   39  112    9   34   5  3   0   10   2   19  .304  .316  .402   3   1   0   1   0   2   0   85&lt;br /&gt;&lt;/span&gt;&lt;span class="full"&gt;   *&lt;a href="http://www.baseball-reference.com/s/sweenma01.shtml"&gt;Mark Sweeney&lt;/a&gt;        37   76   90   18   23   8  0   2   10  13   18  .256  .368  .411   2   0   0   3   1   0   0  102&lt;/span&gt;&lt;/pre&gt;We'll focus on the last batting measure OPS that is a summary of a player's batting effectiveness.&lt;br /&gt;&lt;br /&gt;We read the OPS values for the 15 players into R into the vector y.&lt;br /&gt;&lt;br /&gt;&gt; y&lt;br /&gt;[1]  86  92  65  81  62 170  80 105  73  84 103 107  93  85 102&lt;br /&gt;&lt;br /&gt;We assume that the observations y1, ..., y15 are iid from a t distribution with location &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cmu" align="middle" border="0" /&gt;, scale &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Csigma" align="middle" border="0" /&gt; and 4 degrees of freedom.  We place the usual noninformative prior &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?1/%5Csigma" align="middle" border="0" /&gt; on &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%28%5Cmu,%20%5Csigma%29" align="middle" border="0" /&gt;.&lt;br /&gt;&lt;br /&gt;We implement 10,000 iterations of Gibbs sampling by use of the function robustt.R in the LearnBayes package.  
This function is easy to use -- we just input the data vector y, the degrees of freedom, and the number of iterations.&lt;br /&gt;&lt;br /&gt;fit=robustt(y,4,10000)&lt;br /&gt;&lt;br /&gt;The object fit is a list with components mu, sigma2, and lam -- mu is a vector of simulated draws of &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cmu" align="middle" border="0" /&gt;, sigma2 is a vector of simulated draws of &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Csigma%5E2" align="middle" border="0" /&gt;, and lam is a matrix of simulated draws of the scale parameters &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clambda_1,%20...,%20%5Clambda_n" align="middle" border="0" /&gt;.  (The &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Clambda_i" align="middle" border="0" /&gt; are helpful for identifying outliers in the data -- here the outlier is pretty obvious.)&lt;br /&gt;&lt;br /&gt;Below I have graphed the data as solid dots and placed a density estimate of the posterior of &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cmu" align="middle" border="0" /&gt; on top.  
We see that the estimate of &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cmu" align="middle" border="0" /&gt; appears to ignore the one outlier (Barry Bonds) in the data.&lt;br /&gt;&lt;br /&gt;plot(y,0*y,cex=1.5,pch=19,ylim=c(0,.1),ylab="DENSITY",xlab="MU")&lt;br /&gt;lines(density(fit$mu),lwd=3,col="red")&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/R07EeASLcYI/AAAAAAAAAIw/wyKJ71XyQ08/s1600-h/sfgiants.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/R07EeASLcYI/AAAAAAAAAIw/wyKJ71XyQ08/s320/sfgiants.jpg" alt="" id="BLOGGER_PHOTO_ID_5138260244627681666" border="0" /&gt;&lt;/a&gt;By the way, I assumed that robust modeling was suitable for this dataset since I knew that the Giants had one unusually good hitter on their team.  Can we formally show that robust modeling with the t(4) distribution is better than normal modeling for these data?&lt;br /&gt;&lt;br /&gt;Sure -- we can define two models (with the choice of proper prior distributions) and compare the models by use of a Bayes factor.  
I'll illustrate this in my next posting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-2411376072932355167?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/2411376072932355167/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=2411376072932355167' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2411376072932355167'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2411376072932355167'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/robust-modeling-using-gibbs-sampling.html' title='Robust Modeling using Gibbs Sampling'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/R07EeASLcYI/AAAAAAAAAIw/wyKJ71XyQ08/s72-c/sfgiants.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-4313499735735846811</id><published>2007-11-26T15:34:00.000-08:00</published><updated>2007-11-26T16:19:53.233-08:00</updated><title type='text'>Gibbs Sampling for Censored Regression</title><content type='html'>Gibbs sampling is very convenient for many "missing data" problems.  
To illustrate this situation, suppose we have the simple regression model&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?z_i%20=%20%5Cbeta_0%20+%20%5Cbeta_1%20x_i%20+%20%5Cepsilon_i" align="middle" border="0" /&gt;,&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;where the errors &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cepsilon_1,%20...,%20%5Cepsilon_n" align="middle" border="0" /&gt; are iid &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?N%280,%20%5Csigma%5E2%29" align="middle" border="0" /&gt;.  The problem is that some of the response variables &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?z_i" align="middle" border="0" /&gt; are right-censored and we actually observe &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y_i%20=%20%5Cmin%28z_i,%20c_i%29" align="middle" border="0" /&gt;, where &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?c_i" align="middle" border="0" /&gt; is a known censoring value.  We know which observations are uncensored (&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y_i%20=%20z_i" align="middle" border="0" /&gt;) and which observations are censored (&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y_i%20=%20c_i" align="middle" border="0" /&gt;).&lt;br /&gt;&lt;br /&gt;We illustrate this situation by use of a picture inspired by a similar one in Martin Tanner's book.  We show a scatterplot of the covariate &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?x" align="middle" border="0" /&gt; against the observed &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?y" align="middle" border="0" /&gt;.  
The points highlighted in red correspond to censored observations -- the actual observations (the &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?z_i" align="middle" border="0" /&gt;) exceed the censored values, which we indicate by arrows.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/R0thQwSLcXI/AAAAAAAAAIo/CABxtkPkHNU/s1600-h/scatterplot.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/R0thQwSLcXI/AAAAAAAAAIo/CABxtkPkHNU/s320/scatterplot.jpg" alt="" id="BLOGGER_PHOTO_ID_5137306740413133170" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;To apply Gibbs sampling to this situation, we imagine a complete data set where all of the &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?z_i" align="middle" border="0" /&gt;'s are known.   The unknowns in this problem are the regression coefficients &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cbeta_0,%20%5Cbeta_1" align="middle" border="0" /&gt;, the error variance &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Csigma%5E2" align="middle" border="0" /&gt;, and the &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?z_i" align="middle" border="0" /&gt;'s corresponding to the censored observations.&lt;br /&gt;&lt;br /&gt;The joint posterior of all unobservables (assuming the usual vague prior for regression) has the form&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%3Cbr%3E%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D%20%5Cprod_%7Bi=1%7D%5En%20%5Cphi%28z_i,%20%5Cbeta_0+%5Cbeta_1%20x_i,%20%5Csigma%5E2%29%20I%28z_i,%20y_i%29,%3Cbr%3E" align="middle" border="0" /&gt;&lt;br /&gt;&lt;/div&gt;where &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?I%28z_i,%20y_i%29" align="middle" border="0" /&gt; is equal to 1 if the observation is not censored, 
and &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?I%28z_i,%20y_i%29%20=%20I%28z_i%20%3E%20c_i%29" align="middle" border="0" /&gt; if the observation is censored at &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?c_i" align="middle" border="0" /&gt;.&lt;br /&gt;&lt;br /&gt;With the introduction of the complete data set, Gibbs sampling is straightforward.&lt;br /&gt;&lt;br /&gt;Suppose one has initial estimates of the regression coefficients and the error variance.  Then&lt;br /&gt;&lt;br /&gt;(1) One simulates from the distribution of the missing data (the &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?z_i" align="middle" border="0" /&gt; for the censored observations) given the parameters.  Specifically &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?z_i" align="middle" border="0" /&gt; is simulated from a normal(&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cbeta_0%20+%20%5Cbeta_1%20x_i,%20%5Csigma%5E2" align="middle" border="0" /&gt;) distribution truncated below by &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?c_i" align="middle" border="0" /&gt;.&lt;br /&gt;&lt;br /&gt;(2)  Given the complete data, one simulates values of the parameters using the usual approach for a normal linear regression model.&lt;br /&gt;&lt;br /&gt;To do this in R, I have a couple of useful tools.  
The function rtruncated.R will simulate draws from an arbitrary truncated distribution&lt;br /&gt;&lt;br /&gt;rtruncated=function(n,lo,hi,pf,qf,...)&lt;br /&gt; qf(pf(lo,...)+runif(n)*(pf(hi,...)-pf(lo,...)),...)&lt;br /&gt;&lt;br /&gt;For example, if one wishes to simulate 20 draws of a normal(mean = 10, sd = 2) distribution that is truncated below by 4, one writes&lt;br /&gt;&lt;br /&gt;rtruncated(20, 4, Inf, pnorm, qnorm, 10, 2)&lt;br /&gt;&lt;br /&gt;Also the function blinreg.R in the LearnBayes package will simulate draws of a regression coefficient &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Cbeta" align="middle" border="0" /&gt; and the error variance &lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%5Csigma%5E2" align="middle" border="0" /&gt; for a normal linear model with a noninformative prior.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-4313499735735846811?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/4313499735735846811/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=4313499735735846811' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4313499735735846811'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4313499735735846811'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/gibbs-sampling-for-censored-regression.html' title='Gibbs Sampling for Censored Regression'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/R0thQwSLcXI/AAAAAAAAAIo/CABxtkPkHNU/s72-c/scatterplot.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-8316302680776553943</id><published>2007-11-25T05:33:00.000-08:00</published><updated>2007-11-25T05:56:57.852-08:00</updated><title type='text'>Latex in my blog?</title><content type='html'>I didn't know if it was possible to add latex to my postings.  I asked John Shonder who is currently working on solutions on my book and has some latex in his WordPress blog.  John referred me to a page&lt;br /&gt;&lt;br /&gt;http://wolverinex02.googlepages.com/emoticonsforblogger2&lt;br /&gt;&lt;br /&gt;that describes a simple procedure for typing in latex in one's postings.&lt;br /&gt;&lt;br /&gt;Anyway, I tried it out on the "Gibbs sampling for hierarchical models" posting and it works!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-8316302680776553943?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/8316302680776553943/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=8316302680776553943' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8316302680776553943'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8316302680776553943'/><link rel='alternate' type='text/html' 
href='http://learnbayes.blogspot.com/2007/11/latex-in-my-blog.html' title='Latex in my blog?'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-8508498262722714002</id><published>2007-11-24T10:09:00.000-08:00</published><updated>2007-11-24T10:43:48.580-08:00</updated><title type='text'>A Comparison of Two MCMC Methods</title><content type='html'>In the last posting, I illustrated a Gibbs sampling algorithm for simulating from a normal/normal hierarchical model.  This method was based on successive substitution sampling from the conditional posterior densities of theta1, ..., thetak, mu, and tau2.  A second sampling method is outlined in Chapter 7 of BCUR.  In this method that we'll call "Exact MCMC" one integrates out the first-stage parameters theta1, ..., thetak and uses a random-walk Metropolis algorithm to simulate from the marginal posterior of (mu, log tau).&lt;br /&gt;&lt;br /&gt;We'll use a few pictures to compare these two methods, specifically in learning about the variance hyperparameter tau.  Here's the exact MCMC method:&lt;br /&gt;&lt;br /&gt;1.  We write a function defining the log posterior of mu and log tau.&lt;br /&gt;&lt;br /&gt;2.  We simulate from (mu, log tau) using the function rwmetrop.R in the LearnBayes package.  We choose a suitable proposal function from output from the laplace.R function.  The acceptance rate of this random walk algorithm is 20%.   We sample for 10,000 iterations, saving the draws of log tau in a vector.&lt;br /&gt;&lt;br /&gt;We then run the Gibbs sampler for 10,000 iterations, also saving the draws of log tau.&lt;br /&gt;&lt;br /&gt;DENSITY PLOTS.  
We first compare density estimates of the simulated samples of log tau using the two methods.  The two estimates seem pretty similar.  The density estimate for the Gibbs sampling draws looks smoother, but that doesn't mean this sample is a better estimate of the posterior of log tau.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/R0huFQSLcTI/AAAAAAAAAII/TtJi_uz2vUk/s1600-h/densities.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/R0huFQSLcTI/AAAAAAAAAII/TtJi_uz2vUk/s320/densities.jpg" alt="" id="BLOGGER_PHOTO_ID_5136476411565666610" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;TRACE PLOTS.  We compare trace plots of the two sets of simulated draws.  They look pretty similar, but there are differences on closer inspection.  The draws from the Gibbs sampling run look more irregular, or perhaps they have more of a "snake-like" appearance.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/R0hvOASLcWI/AAAAAAAAAIg/T3itjv8Y468/s1600-h/traceplots.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/R0hvOASLcWI/AAAAAAAAAIg/T3itjv8Y468/s320/traceplots.jpg" alt="" id="BLOGGER_PHOTO_ID_5136477661401149794" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;AUTOCORRELATION PLOTS.  An autocorrelation plot is a graph of the autocorrelations&lt;br /&gt;&lt;br /&gt;corr (theta(j), theta(j-g))&lt;br /&gt;&lt;br /&gt;graphed against the lag g.  For most MCMC runs, the stream of simulated draws will show positive autocorrelation, but hopefully the autocorrelation values decrease quickly as the lag increases.  
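&lt;br /&gt;&lt;br /&gt;As a minimal sketch, these autocorrelations can be computed with R's built-in acf function.  The AR(1) series below is only a hypothetical stand-in for a stream of MCMC draws (it is not output from either sampler in this post), and the names draws and rho are illustrative.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```r
set.seed(1)
# Hypothetical stand-in for MCMC output: an AR(1) series with correlation 0.8
draws = as.numeric(arima.sim(model = list(ar = 0.8), n = 10000))
# acf() returns the sample autocorrelations at lags 0, 1, ..., lag.max
rho = acf(draws, lag.max = 15, plot = FALSE)$acf[, 1, 1]
round(rho, 2)
```
&lt;/pre&gt;Calling acf(draws, lag.max = 15) without plot = FALSE draws the autocorrelation plot directly.&lt;br /&gt;&lt;br /&gt;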
It is clear from the plots that the autocorrelations drop off more slowly for method 2 (Gibbs sampling); in contrast, the autocorrelations for method 1 (Exact MCMC) decrease to zero over lags 1 to 15.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/R0huSgSLcVI/AAAAAAAAAIY/2NHVLW0F9M4/s1600-h/autocorr.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/R0huSgSLcVI/AAAAAAAAAIY/2NHVLW0F9M4/s320/autocorr.jpg" alt="" id="BLOGGER_PHOTO_ID_5136476639198933330" border="0" /&gt;&lt;/a&gt;The moral of the story is that the exact MCMC method shows better mixing than Gibbs sampling in this example.  This means that, for a given sample size (here 10,000), one will have more accurate estimates of the posterior of tau using the exact MCMC method.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-8508498262722714002?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/8508498262722714002/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=8508498262722714002' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8508498262722714002'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8508498262722714002'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/comparison-of-two-mcmc-methods.html' title='A Comparison of Two MCMC Methods'/><author><name>Jim 
Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_V8g1rNtmHuM/R0huFQSLcTI/AAAAAAAAAII/TtJi_uz2vUk/s72-c/densities.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-3739349256188205252</id><published>2007-11-24T08:20:00.000-08:00</published><updated>2007-11-25T11:11:46.203-08:00</updated><title type='text'>Gibbs Sampling for Hierarchical Models</title><content type='html'>Gibbs sampling is an attractive "automatic" method of setting up a MCMC algorithm for many classes of models.  Here we illustrate using R to write a Gibbs sampling algorithm for the normal/normal exchangeable model.&lt;br /&gt;&lt;br /&gt;We write the model in three stages as follows.&lt;br /&gt;&lt;br /&gt;1. The observations y1, ..., yk are independent where yj is N(thetaj, sigmaj^2), where we write N(mu, sigma2) to denote the normal density with mean mu and variance sigma2.  We assume the sampling variances sigma1^2, ..., sigmak^2 are known.&lt;br /&gt;&lt;br /&gt;2.  The means theta1,..., thetak are a random sample from a N(mu, tau2) population.  (tau2 is the variance of the population).&lt;br /&gt;&lt;br /&gt;3.  The hyperparameters (mu, tau) are assumed to have a uniform distribution.  
This implies that the parameters (mu, tau2) have a prior proportional to (tau2)^(-1/2).&lt;br /&gt;&lt;br /&gt;To write a Gibbs sampling algorithm, we write down the joint posterior of all parameters (theta1, ..., thetak, mu, tau2):&lt;br /&gt;&lt;br /&gt;&lt;img src="http://www.forkosh.dreamhost.com/mimetex.cgi?%3Cbr%3E%5Cfrac%7B1%7D%7B%5Ctau%7D%20%5Cprod_%7Bj=1%7D%5Ek%20%5Cfrac%7B1%7D%7B%5Csqrt%7B%5Csigma_j%5E2%7D%7D%20%5Cexp%5C%7B-%28y_j-%5Ctheta_j%29%5E2/%282%5Csigma_j%5E2%29%5C%7D%20%3Cbr%3E%5Ctimes%20%5Cprod_%7Bj=1%7D%5Ek%20%5Cfrac%7B1%7D%7B%5Csqrt%7B%5Ctau%5E2%7D%7D%20%5Cexp%5C%7B-%28%5Ctheta_j%20-%20%5Cmu%29%5E2/%282%20%5Ctau%5E2%29%5C%7D%3Cbr%3E" align="middle" border="0" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;From this expression, one can see&lt;br /&gt;&lt;br /&gt;1.  The posterior of thetaj conditional on all remaining parameters is normal, where the mean and variance are given by the usual normal density/normal prior updating formula.&lt;br /&gt;&lt;br /&gt;2.  The hyperparameter mu has a normal conditional posterior with mean theta_bar (the sample mean of the thetaj) and variance tau2/k.&lt;br /&gt;&lt;br /&gt;3.  The hyperparameter tau2 has an inverse gamma conditional posterior with shape (k-1)/2 and rate (1/2) sum((thetaj - mu)^2).&lt;br /&gt;&lt;br /&gt;Given that all the conditional posteriors have convenient functional forms, we write an R function to implement the Gibbs sampling.  
The only inputs are the data matrix (columns containing yj and sigmaj^2) and the number of iterations m.&lt;br /&gt;&lt;br /&gt;I'll display the function normnormexch.gibbs.R with notes.&lt;br /&gt;&lt;br /&gt;normnormexch.gibbs=function(data,m)&lt;br /&gt;{&lt;br /&gt;y=data[,1]; k=length(y); sigma2=data[,2]  # HERE I READ IN THE DATA&lt;br /&gt;&lt;br /&gt;THETA=array(0,c(m,k)); MU=rep(0,m); TAU2=rep(0,m)  # SET UP STORAGE&lt;br /&gt;&lt;br /&gt;mu=mean(y); tau2=median(sigma2) # INITIAL ESTIMATES OF MU AND TAU2&lt;br /&gt;&lt;br /&gt;for (j in 1:m)  # HERE'S THE GIBBS SAMPLING&lt;br /&gt;{&lt;br /&gt;p.means=(y/sigma2+mu/tau2)/(1/sigma2+1/tau2)  # CONDITIONAL POSTERIORS&lt;br /&gt;p.sds=sqrt(1/(1/sigma2+1/tau2))                                 # OF THETA1,...THETAK&lt;br /&gt;theta=rnorm(k,mean=p.means,sd=p.sds)&lt;br /&gt;&lt;br /&gt;mu=rnorm(1,mean=mean(theta),sd=sqrt(tau2/k))  # CONDITIONAL POSTERIOR OF MU&lt;br /&gt;&lt;br /&gt;tau2=rigamma(1,(k-1)/2,sum((theta-mu)^2)/2)   # CONDITIONAL POSTERIOR OF TAU2&lt;br /&gt;&lt;br /&gt;THETA[j,]=theta; MU[j]=mu; TAU2[j]=tau2       # STORE SIMULATED DRAWS&lt;br /&gt;}&lt;br /&gt;return(list(theta=THETA,mu=MU,tau2=TAU2))  # RETURN A LIST WITH SAMPLES&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;Here is an illustration of this algorithm for the SAT example from Gelman et al.&lt;br /&gt;&lt;br /&gt;y=c(28,8,-3,7,-1,1,18,12)&lt;br /&gt;sigma=c(15,10,16,11,9,11,10,18)&lt;br /&gt;data=cbind(y,sigma^2)&lt;br /&gt;fit=normnormexch.gibbs(data,1000)&lt;br /&gt;&lt;br /&gt;In the Chapter 7 exercise that used this example, a different sampling algorithm was used to simulate from the joint posterior of (theta, mu, tau2) -- it was a direct sampling algorithm based on the decomposition&lt;br /&gt;&lt;br /&gt;[theta, mu, tau2] = [mu, tau2] [theta | mu, tau2]&lt;br /&gt;&lt;br /&gt;In a later posting, I'll compare the two sampling algorithms.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/5441519817884072981-3739349256188205252?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/3739349256188205252/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=3739349256188205252' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/3739349256188205252'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/3739349256188205252'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/gibbs-sampling-for-hierarchical-models.html' title='Gibbs Sampling for Hierarchical Models'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-7393620582237155561</id><published>2007-11-17T05:10:00.000-08:00</published><updated>2007-11-18T16:56:25.404-08:00</updated><title type='text'>Bayesian model selection</title><content type='html'>Here we illustrate one advantage of Bayesian regression modeling.  By the use of an informative prior, it is straightforward to implement regression model selection.&lt;br /&gt;&lt;br /&gt;Arnold Zellner introduced an attractive method of implementing prior information into a regression model.  
He assumes [beta | sigma^2] is normal with mean beta0 and variance-covariance matrix of the form&lt;br /&gt;&lt;br /&gt;V = c sigma^2 (X'X)^(-1)&lt;br /&gt;&lt;br /&gt;and then takes [sigma^2] distributed according to the noninformative prior proportional to 1/sigma^2.  This is called Zellner's G-prior.&lt;br /&gt;&lt;br /&gt;One nice thing about this prior is that it requires only two prior inputs from the user:  (1) a choice of the prior mean beta0, and (2) c that can be interpreted as the amount of information in the prior relative to the sample.&lt;br /&gt;&lt;br /&gt;We can use Zellner's G-prior to implement model selection in a regression analysis.  Suppose we have p predictors of interest -- there are 2^p possible models corresponding to the inclusion or exclusion of each predictor in the model.&lt;br /&gt;&lt;br /&gt;A G-prior is placed on the full model that contains all parameters.  We assume that beta0 is the zero vector and choose c to be a large value reflecting vague prior beliefs.   The prior on (beta, sigma^2) for this full model is&lt;br /&gt;&lt;br /&gt;N(beta; beta0, c sigma^2 (X'X)^(-1)) (1/sigma^2)&lt;br /&gt;&lt;br /&gt;Then for any submodel defined by a reduced design matrix X1, we take the prior on (beta, sigma^2) to be&lt;br /&gt;&lt;br /&gt;N(beta; beta0, c sigma^2 (X1'X1)^(-1)) (1/sigma^2)&lt;br /&gt;&lt;br /&gt;Then we can compare models by the computation of associated predictive probabilities.&lt;br /&gt;&lt;br /&gt;To illustrate this methodology, we consider an interesting dataset on the behavior of puffins from Devore and Peck's text.  
One is interested in understanding the nesting frequency behavior of these birds (the y variable) on the basis of four covariates:  x1, the grass cover, x2, the mean soil depth, x3, the angle of slope, and x4, the distance from cliff edge.&lt;br /&gt;&lt;br /&gt;We first write a function that computes the log posterior of (beta, log(sigma)) for a regression model with normal sampling and a Zellner G prior with beta0 = 0 and a given value of c.&lt;br /&gt;&lt;br /&gt;regpost.mod=function(theta,stuff)&lt;br /&gt;{&lt;br /&gt;y=stuff$y; X=stuff$X; c=stuff$c&lt;br /&gt;beta=theta[-length(theta)]; sigma=exp(theta[length(theta)])&lt;br /&gt;if (length(beta)&gt;1)&lt;br /&gt; loglike=sum(dnorm(y,mean=X%*%as.vector(beta),sd=sigma,log=TRUE))&lt;br /&gt;else&lt;br /&gt; loglike=sum(dnorm(y,mean=X*beta,sd=sigma,log=TRUE))&lt;br /&gt;logprior=dmnorm(beta,mean=0*beta,varcov=c*sigma^2*solve(t(X)%*%X),log=TRUE)&lt;br /&gt;return(loglike+logprior)&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;We read in the puffin data and define the design matrix X for the full model.&lt;br /&gt;&lt;br /&gt;puffin=read.table("puffin.txt",header=T)&lt;br /&gt;X=cbind(1,puffin$x1,puffin$x2,puffin$x3,puffin$x4)&lt;br /&gt;&lt;br /&gt;S, the input for the log posterior function, is a list with components y, X, and c.&lt;br /&gt;&lt;br /&gt;S=list(y=puffin$y, X=X, c=100)&lt;br /&gt;&lt;br /&gt;Since there are 4 covariates, there are 2^4 = 16 possible models.  We define a logical matrix GAM of dimension 16 x 5 that describes the inclusion or exclusion of each covariate in the model. 
(The first column is TRUE since we want to include the constant term in each regression model.)&lt;br /&gt;&lt;br /&gt;GAM=array(T,c(16,5))&lt;br /&gt;TF=c(T,F); k=0&lt;br /&gt;for (i1 in 1:2) {for (i2 in 1:2) {for (i3 in 1:2) {for (i4 in 1:2){&lt;br /&gt; k=k+1; GAM[k,]=cbind(T,TF[i1],TF[i2],TF[i3],TF[i4])}}}}&lt;br /&gt;&lt;br /&gt;For each model, we use the laplace function (in the LearnBayes package) to compute the marginal likelihood.  The inputs to laplace are the function regpost.mod defining our model, an intelligent guess at the posterior mode (given by a least-squares fit), and the list S that contains y, X, and c.&lt;br /&gt;&lt;br /&gt;gof=rep(0,16)&lt;br /&gt;for (j in 1:16)&lt;br /&gt; {&lt;br /&gt; S$X=X[,GAM[j,]]&lt;br /&gt; theta=c(lm(S$y~0+S$X)$coef,0)&lt;br /&gt; gof[j]=laplace(regpost.mod,theta,S)$int&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt;We display each model and the associated marginal likelihood values (on the log scale).&lt;br /&gt;&lt;br /&gt;data.frame(GAM,gof)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;   X1    X2    X3    X4    X5       gof&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;1  TRUE  TRUE  TRUE  TRUE  TRUE -104.1850&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;2  TRUE  TRUE  TRUE  TRUE FALSE -115.4042&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;3  TRUE  TRUE  TRUE FALSE  TRUE -102.3523&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;4  TRUE  TRUE  TRUE FALSE FALSE -136.3972&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;5  TRUE  TRUE FALSE  TRUE  TRUE -105.0931&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;6  TRUE  TRUE FALSE  TRUE FALSE -113.1782&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;7  TRUE  TRUE FALSE FALSE  TRUE -105.5690&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;8  TRUE  TRUE FALSE FALSE FALSE 
-134.0486&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;9  TRUE FALSE  TRUE  TRUE  TRUE -101.8833&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;10 TRUE FALSE  TRUE  TRUE FALSE -114.9573&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold; color: rgb(255, 0, 0);font-family:courier new;" &gt;11 TRUE FALSE  TRUE FALSE  TRUE -100.3735&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;12 TRUE FALSE  TRUE FALSE FALSE -134.5129&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;13 TRUE FALSE FALSE  TRUE  TRUE -102.8117&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;14 TRUE FALSE FALSE  TRUE FALSE -112.6721&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;15 TRUE FALSE FALSE FALSE  TRUE -103.2963&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;16 TRUE FALSE FALSE FALSE FALSE -132.1824&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I highlight the model (inclusion of covariates X3 and X5) that has the largest value of the log marginal likelihood.  This tells us that the best model for understanding nesting behavior is the one that includes mean soil depth (X3) and the distance from cliff edge (X5) .  One can also compare different models by the use of Bayes factors.&lt;br /&gt;&lt;br /&gt;****************************************************************************&lt;br /&gt;For students who have to do their own model selection, I've written a simple function&lt;br /&gt;&lt;br /&gt;bayes.model.selection.R&lt;br /&gt;&lt;br /&gt;that will give these log marginal likelihood values for all regression models.  You have to download this function from bayes.bgsu.edu/m648 and have LearnBayes 1.11 installed on your machine.&lt;br /&gt;&lt;br /&gt;Here's an example of using this function.  
I first load in the puffin dataset and define the response vector y, the covariate matrix X, and the prior parameter c.&lt;br /&gt;&lt;br /&gt;puffin=read.table("puffin.txt",header=T)&lt;br /&gt;X=cbind(1,puffin$x1,puffin$x2,puffin$x3,puffin$x4)&lt;br /&gt;y=puffin$y&lt;br /&gt;c=100&lt;br /&gt;&lt;br /&gt;Then I just run the function bayes.model.selection with y, X, and c as inputs.&lt;br /&gt;&lt;br /&gt;bayes.model.selection(y,X,c)&lt;br /&gt;&lt;br /&gt;The output will be the data frame shown above.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-7393620582237155561?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/7393620582237155561/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=7393620582237155561' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7393620582237155561'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7393620582237155561'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/bayesian-model-selection.html' title='Bayesian model selection'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-5604419986649870946</id><published>2007-11-15T05:11:00.000-08:00</published><updated>2007-11-15T05:56:39.471-08:00</updated><title type='text'>Bayesian regression</title><content 
type='html'>To introduce Bayesian regression modeling, we consider a dataset from De Veaux, Velleman and Bock which collected physical measurements from a sample of 250 males.   One is interested in predicting a person's body fat from his height, waist, and chest measurements.&lt;br /&gt;&lt;br /&gt;The file Body_fat.txt contains the data that we read in R.&lt;br /&gt;&lt;br /&gt;data=read.table("Body_fat.txt",sep="\t",header=TRUE)&lt;br /&gt;names(data)&lt;br /&gt;[1] "Pct.BF" "Height" "Waist"  "Chest"&lt;br /&gt;attach(data)&lt;br /&gt;&lt;br /&gt;Suppose we wish to fit the regression model&lt;br /&gt;&lt;br /&gt;Pct.BF ~ Height + Waist + Chest&lt;br /&gt;&lt;br /&gt;The standard least-squares fit is done using the lm command in R.&lt;br /&gt;&lt;br /&gt;fit=lm(Pct.BF~Height+Waist+Chest)&lt;br /&gt;&lt;br /&gt;Here is a portion of the summary of the fit.&lt;br /&gt;&lt;br /&gt;summary(fit)&lt;br /&gt;&lt;br /&gt;Coefficients:&lt;br /&gt;           Estimate Std. Error t value Pr(&gt;|t|)  &lt;br /&gt;(Intercept)  2.06539    7.80232   0.265  0.79145  &lt;br /&gt;Height      -0.56083    0.10940  -5.126 5.98e-07 ***&lt;br /&gt;Waist        2.19976    0.16755  13.129  &lt; 2e-16 ***&lt;br /&gt;Chest       -0.23376    0.08324  -2.808  0.00538 **&lt;br /&gt;---&lt;br /&gt;Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1&lt;br /&gt;&lt;br /&gt;Residual standard error: 4.399 on 246 degrees of freedom&lt;br /&gt;Multiple R-Squared: 0.7221,    Adjusted R-squared: 0.7187&lt;br /&gt;F-statistic: 213.1 on 3 and 246 DF,  p-value: &lt; 2.2e-16&lt;br /&gt;&lt;br /&gt;Now let's consider a Bayesian fit of this model.   Suppose we place the usual noninformative prior on the regression vector beta and the error variance sigma^2.&lt;br /&gt;&lt;br /&gt;g(beta, sigma^2) = 1/sigma^2.&lt;br /&gt;&lt;br /&gt;Then there is a simple direct method (outlined in BCUR) of simulating from the posterior distribution of (beta, sigma^2).&lt;br /&gt;&lt;br /&gt;1.  
We first create the design matrix X:&lt;br /&gt;&lt;br /&gt;X=cbind(1,Height,Waist,Chest)&lt;br /&gt;&lt;br /&gt;The response is contained in the vector Pct.BF.&lt;br /&gt;&lt;br /&gt;2.  To simulate 5000 draws from the posterior of (beta, sigma), we use the function blinreg in the LearnBayes package.&lt;br /&gt;&lt;br /&gt;fit=blinreg(Pct.BF, X, 5000)&lt;br /&gt;&lt;br /&gt;The output fit is a list with two components:  beta is a matrix of simulated draws of beta where each column contains the draws of one component of beta, and sigma is a vector of draws from the marginal posterior of sigma.&lt;br /&gt;&lt;br /&gt;We can summarize the simulated draws of beta by computing posterior means and standard deviations.&lt;br /&gt;&lt;br /&gt;apply(fit$beta,2,mean)&lt;br /&gt;        X    XHeight     XWaist     XChest&lt;br /&gt;1.8748122 -0.5579671  2.2031474 -0.2350661&lt;br /&gt;&lt;br /&gt;apply(fit$beta,2,sd)&lt;br /&gt;        X    XHeight     XWaist     XChest&lt;br /&gt;7.84069390 0.11050071 0.16839919 0.08286758&lt;br /&gt;&lt;br /&gt;Here is a graph of density estimates of the simulated draws of the four regression parameters.&lt;br /&gt;&lt;br /&gt;par(mfrow=c(2,2))&lt;br /&gt;for (j in 1:4) plot(density(fit$beta[,j]),main=paste("BETA ",j),&lt;br /&gt;         lwd=3, col="red", xlab="PAR")&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RzxPtASLcSI/AAAAAAAAAIA/xFjDBdkEhAg/s1600-h/regression1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/RzxPtASLcSI/AAAAAAAAAIA/xFjDBdkEhAg/s320/regression1.jpg" alt="" id="BLOGGER_PHOTO_ID_5133065309884477730" border="0" /&gt;&lt;/a&gt;What's so special about Bayesian regression if we are essentially replicating the frequentist regression fit?&lt;br /&gt;&lt;br /&gt;We'll talk about the advantages of Bayesian regression in the next blog 
posting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-5604419986649870946?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/5604419986649870946/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=5604419986649870946' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/5604419986649870946'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/5604419986649870946'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/bayesian-regression.html' title='Bayesian regression'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/RzxPtASLcSI/AAAAAAAAAIA/xFjDBdkEhAg/s72-c/regression1.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-1382405949719436675</id><published>2007-11-11T11:12:00.000-08:00</published><updated>2007-11-11T13:25:27.939-08:00</updated><title type='text'>Looking for True Streakiness</title><content type='html'>There is a lot of interest in streaky behavior in sports.   One observes players or teams that appear streaky with the implicit conclusion that this says something about the character of the athlete.&lt;br /&gt;&lt;br /&gt;Eric Byrnes had 412 opportunities to hit during the 2005 baseball season.  
Here is his sequence of hits (successes) and outs (failures) during the season.&lt;br /&gt;&lt;br /&gt;[1] 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 1 0&lt;br /&gt;[38] 0 1 1 1 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0 1 0&lt;br /&gt;[75] 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0&lt;br /&gt;[112] 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0&lt;br /&gt;[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 0&lt;br /&gt;[186] 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 0 0 0&lt;br /&gt;[223] 0 0 0 0 0 0 0 0 0 1 0 1 1 0 1 0 0 1 1 0 0 0 1 1 0 1 1 0 1 1 0 1 1 0 1 0 0&lt;br /&gt;[260] 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0&lt;br /&gt;[297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0&lt;br /&gt;[334] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0&lt;br /&gt;[371] 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 1 1 1 0 0&lt;br /&gt;[408] 0 0 0 0 0&lt;br /&gt;&lt;br /&gt;One way of seeing the streaky behavior in this sequence is by a moving average graph where one plots the success rate (batting average) for windows of 40 at-bats.   I wrote a short program mavg.R to compute the moving averages.  
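&lt;br /&gt;&lt;br /&gt;The program mavg.R is not listed in this post.  As a minimal sketch of the idea (the name mavg.sketch and the cumulative-sum implementation are assumptions, not the original code), a moving average over windows of k at-bats could be computed as follows.&lt;br /&gt;&lt;br /&gt;mavg.sketch=function(x,k=40)&lt;br /&gt;{&lt;br /&gt; n=length(x)&lt;br /&gt; stopifnot(n &gt;= k)  # need at least one full window&lt;br /&gt; cs=cumsum(c(0,x))  # cs[i+1] = x[1] + ... + x[i]&lt;br /&gt; (cs[(k+1):(n+1)]-cs[1:(n-k+1)])/k  # mean of each window of k consecutive values&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;For a 0/1 sequence of length 412 and k = 40, this returns 373 window averages, one for each run of 40 consecutive at-bats.&lt;br /&gt;&lt;br /&gt;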
The following R code plots the moving averages and plots a lowess smooth on top to help see the pattern.&lt;br /&gt;&lt;br /&gt;MAVG=mavg(byrne$x,k=40)&lt;br /&gt;plot(MAVG,type="l",lwd=2,col="red",xlab="GAME",ylab="AVG",&lt;br /&gt;main="ERIC BYRNES")&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/Rzdai4J22fI/AAAAAAAAAHg/I5XggIq_WPI/s1600-h/brynes1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/Rzdai4J22fI/AAAAAAAAAHg/I5XggIq_WPI/s320/brynes1.jpg" alt="" id="BLOGGER_PHOTO_ID_5131669855647750642" border="0" /&gt;&lt;/a&gt;We see some interesting patterns.  It seems that Byrnes had a cold spell in the first part of the season, followed by a hot period, and then a very cold period.&lt;br /&gt;&lt;br /&gt;The interesting question is:  is this streaky pattern "real" or is it just a byproduct of Bernoulli chance variation?&lt;br /&gt;&lt;br /&gt;We answer this question by means of a Bayes factor.  Suppose we partition Byrnes' 412 at-bats into groups of 20 at-bats.  We observe counts y1, ..., yn, where yi is the number of hits in the ith group.  Suppose yi is binomial(20, pi) where pi is the probability of a hit in the ith period.&lt;br /&gt;&lt;br /&gt;We define two hypotheses:&lt;br /&gt;&lt;br /&gt;H (not streaky)  the probabilities across periods are equal, p1 = ... = pn = p&lt;br /&gt;&lt;br /&gt;A (streaky) the probabilities across periods vary according to a beta distribution with mean eta and precision K.  This model is indexed by the parameter K.&lt;br /&gt;&lt;br /&gt;The functions bfexch and laplace in the LearnBayes package can be used to compute a Bayes factor in support of A over H.  Here is how we do it.&lt;br /&gt;&lt;br /&gt;1.  
The raw data is in the matrix BRYNE -- the first column contains the data (0's and 1's) and the second column contains the attempts (a column of 1's).  We regroup the data into periods of 20 at-bats using the regroup function.&lt;br /&gt;&lt;br /&gt;regroup(BRYNE, 20)&lt;br /&gt;&lt;br /&gt;2.  The following R function laplace.int computes the base-10 logarithm of the Bayes factor in support of streakiness for a fixed value of log(K).&lt;br /&gt;&lt;br /&gt;laplace.int=function(logK,data=data1)&lt;br /&gt;log10(exp(laplace(bfexch,0,list(data=data,K=exp(logK)))$int))&lt;br /&gt;&lt;br /&gt;To illustrate, suppose we want to compute the log10 Bayes factor for our data for logK = 3:&lt;br /&gt;&lt;br /&gt;&gt; laplace.int(3,regroup(BRYNE,20))&lt;br /&gt;   [,1]&lt;br /&gt;[1,] 1.386111&lt;br /&gt;&lt;br /&gt;This indicates support for streakiness -- the log10 Bayes factor is 1.39, so the Bayes factor is 10^1.39, or about 24, and the data support A over H by a factor of more than 10.&lt;br /&gt;&lt;br /&gt;3.  Generally we'd like to compute the log10 Bayes factor for a sequence of values of log K.  I first write a simple function that does this:&lt;br /&gt;&lt;br /&gt;s.laplace.int=function(logK,data)&lt;br /&gt;list(x=logK,y=sapply(logK,laplace.int,data))&lt;br /&gt;&lt;br /&gt;and then I use this function to compute the log10 Bayes factor for values of log K from 2 to 6 in steps of 0.2.  I use the plot command to graph these values.  
I draw a line at the value log10 BF = 0 -- this corresponds to the case where neither model is supported.&lt;br /&gt;&lt;br /&gt;plot(s.laplace.int(seq(2,6,by=.2),regroup(BRYNE,20)),type="l",&lt;br /&gt;xlab="LOG K", ylab="LOG 10 BAYES FACTOR", lwd=3, col="red", ylim=c(-3,2))&lt;br /&gt;lines(c(1,7),c(0,0),lwd=3,col="blue")&lt;br /&gt;title(main="ERIC BYRNES")&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RzdkroJ22gI/AAAAAAAAAHo/SZXQKWaa5bU/s1600-h/brynes2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/RzdkroJ22gI/AAAAAAAAAHo/SZXQKWaa5bU/s320/brynes2.jpg" alt="" id="BLOGGER_PHOTO_ID_5131681001087883778" border="0" /&gt;&lt;/a&gt;We see that, for a range of values of K, the Bayes factor favors the model A by a factor of 10 or more.&lt;br /&gt;&lt;br /&gt;Actually, we looked at Eric Byrnes only because he exhibited unusually streaky behavior during the 2005 season.  What if we look at other players?  
Here are the Bayes factor graphs for the hitting data for two other players, Chase Utley and Damian Miller (we are grouping the data in the same way).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/Rzdo2IJ22hI/AAAAAAAAAHw/q-t0Pm6geQ4/s1600-h/utley.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/Rzdo2IJ22hI/AAAAAAAAAHw/q-t0Pm6geQ4/s320/utley.jpg" alt="" id="BLOGGER_PHOTO_ID_5131685579523021330" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/Rzdo94J22iI/AAAAAAAAAH4/xlk3ENoPSjo/s1600-h/miller.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/Rzdo94J22iI/AAAAAAAAAH4/xlk3ENoPSjo/s320/miller.jpg" alt="" id="BLOGGER_PHOTO_ID_5131685712667007522" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;For both players, the log10 Bayes factors are negative over the entire range of K values, so there is support for the non-streaky model H.  
One distinctive feature of Bayes factors is that they can provide support for either the null or the alternative hypothesis.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-1382405949719436675?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/1382405949719436675/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=1382405949719436675' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/1382405949719436675'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/1382405949719436675'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/looking-for-true-streakiness.html' title='Looking for True Streakiness'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_V8g1rNtmHuM/Rzdai4J22fI/AAAAAAAAAHg/I5XggIq_WPI/s72-c/brynes1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-995508772497042540</id><published>2007-11-07T16:33:00.000-08:00</published><updated>2007-11-08T05:11:03.690-08:00</updated><title type='text'>Test of Independence in a 2 x 2 Table</title><content type='html'>&lt;span style="font-family:georgia;"&gt;Consider data collected in a study described in Dorn (1954) to assess the&lt;/span&gt;&lt;span style="font-family:georgia;"&gt; 
relationship between smoking and lung cancer. In this study, a sample of 86&lt;/span&gt;&lt;span style="font-family:georgia;"&gt; lung-cancer patients and a sample of 86 controls were questioned about their&lt;/span&gt;&lt;span style="font-family:georgia;"&gt; smoking habits. The two groups were chosen to represent random samples from&lt;/span&gt;&lt;span style="font-family:georgia;"&gt; a sub-population of lung-cancer patients and an otherwise similar population of&lt;/span&gt;&lt;span style="font-family:georgia;"&gt; cancer-free individuals.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here is the 2 x 2 table of responses:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;           Cancer Control&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Smokers      83     72&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Non-smokers   3     14&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:times new roman;"&gt;&lt;span style="font-family:georgia;"&gt;Let pL and pC denote the proportions of lung-cancer patients and controls who smoke.  We wish to test H: pL = pC against the alternative A: pL &lt;&gt; pC.&lt;br /&gt;&lt;br /&gt;To construct a Bayesian test, we define a suitable model for H and for A, and then compute the Bayes factor in support of the alternative A.&lt;br /&gt;&lt;br /&gt;1.  To describe these models, first we transform the proportions to the logits&lt;br /&gt;&lt;br /&gt;LogitL = log(pL/(1-pL)),  Logit C = log(pC/(1-pC))&lt;br /&gt;&lt;br /&gt;2.  We then define two parameters theta1, theta2, that are equal to the difference and sum of the logits.&lt;br /&gt;&lt;br /&gt;theta1 = LogitL - LogitC,  theta2 = LogitL + LogitC.&lt;br /&gt;&lt;br /&gt;theta1 is the log odds ratio, a popular measure of association in a 2 x 2 table.   Under the hypothesis of independence H, theta1 = 0.&lt;br /&gt;&lt;br /&gt;3.  Consider the following prior on theta1 and theta2.  
We assume they are independent, where&lt;br /&gt;&lt;br /&gt;theta1 is N(0, tau1),  theta2 is N(0, tau2),&lt;br /&gt;&lt;br /&gt;and tau1 and tau2 are the prior standard deviations.&lt;br /&gt;&lt;br /&gt;4.  Under H (independence), we assume theta1 = 0, so we set tau1 = 0.  theta2 is a nuisance parameter that we arbitrarily assume to be N(0, 1).  (The Bayes factor will be insensitive to this choice.)&lt;br /&gt;&lt;br /&gt;5.  Under A (not independence), we assume theta1 is N(0, tau1), where tau1 reflects our beliefs about the likely size of theta1 when the proportions are different.  We also assume again that theta2 is N(0, 1).  (This means that our beliefs about theta2 are insensitive to our beliefs about theta1.)&lt;br /&gt;&lt;br /&gt;To compute the marginal densities, we write a function that computes the logarithm of the posterior when (theta1, theta2) have the above prior.&lt;br /&gt;&lt;br /&gt;logctable.test=function (theta, datapar)&lt;br /&gt;{&lt;br /&gt;theta1 = theta[1]  # log odds ratio&lt;br /&gt;theta2 = theta[2]  # log odds product&lt;br /&gt;&lt;br /&gt;s1 = datapar$data[1,1]&lt;br /&gt;f1 = datapar$data[1,2]&lt;br /&gt;s2 = datapar$data[2,1]&lt;br /&gt;f2 = datapar$data[2,2]&lt;br /&gt;&lt;br /&gt;logitp1 = (theta1 + theta2)/2&lt;br /&gt;logitp2 = (theta2 - theta1)/2&lt;br /&gt;loglike = s1 * logitp1 - (s1 + f1) * log(1 + exp(logitp1))+&lt;br /&gt;  s2 * logitp2 - (s2 + f2) * log(1 + exp(logitp2))&lt;br /&gt;logprior = dnorm(theta1,mean=0,sd=datapar$tau1,log=TRUE)+&lt;br /&gt;  dnorm(theta2,mean=0,sd=datapar$tau2,log=TRUE)&lt;br /&gt;&lt;br /&gt;return(loglike+logprior)&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;We enter the data as a 2 x 2 matrix:&lt;br /&gt;&lt;br /&gt;data=matrix(c(83,3,72,14),c(2,2))&lt;br /&gt;data&lt;br /&gt;  [,1] [,2]&lt;br /&gt;[1,]   83   72&lt;br /&gt;[2,]    3   14&lt;br /&gt;&lt;br /&gt;The argument datapar in the function is a list consisting of data, the 2 x 2 data table, and the values of tau1 and tau2.&lt;br /&gt;&lt;br /&gt;Suppose we assume theta1 is N(0, .8) under the alternative hypothesis.   
This prior is shown in the figure below.&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/RzMKx4J22eI/AAAAAAAAAHY/OIj69JTe-TE/s1600-h/prior.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/RzMKx4J22eI/AAAAAAAAAHY/OIj69JTe-TE/s320/prior.jpg" alt="" id="BLOGGER_PHOTO_ID_5130456252508723682" border="0" /&gt;&lt;/a&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-family:times new roman;"&gt;&lt;span style="font-family:georgia;"&gt;By using the laplace function, we compute the log marginal density under both models.  (For H, we are approximating the point mass of theta1 at 0 by a normal density with a tiny standard deviation tau1.)&lt;br /&gt;&lt;br /&gt;l.marg0=laplace(logctable.test,c(0,0),list(data=data,tau1=.0001,tau2=1))$int&lt;br /&gt;l.marg1=laplace(logctable.test,c(0,0),list(data=data,tau1=0.8,tau2=1))$int&lt;br /&gt;&lt;br /&gt;We compute the Bayes factor in support of the hypothesis A.&lt;br /&gt;&lt;br /&gt;BF.10=exp(l.marg1-l.marg0)&lt;br /&gt;BF.10&lt;br /&gt;[1] 7.001088&lt;br /&gt;&lt;br /&gt;The conclusion is that the data support the alternative hypothesis A over the null hypothesis H by a factor of about seven.&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-995508772497042540?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/995508772497042540/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=995508772497042540' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5441519817884072981/posts/default/995508772497042540'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/995508772497042540'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/test-of-independence-in-2-x-2-table.html' title='Test of Independence in a 2 x 2 Table'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_V8g1rNtmHuM/RzMKx4J22eI/AAAAAAAAAHY/OIj69JTe-TE/s72-c/prior.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-6579545137266684582</id><published>2007-11-07T02:48:00.000-08:00</published><updated>2007-11-07T03:15:20.998-08:00</updated><title type='text'>Simple Illustration of Bayes Factors</title><content type='html'>Suppose we collect the number of accidents in a year for 30 Belgian drivers.  We assume that y1,..., y30 are independent Poisson(lambda), where lambda is the average number of accidents for all Belgian drivers.&lt;br /&gt;&lt;br /&gt;Consider the following four priors for lambda:&lt;br /&gt;&lt;br /&gt;PRIOR 1:  lambda is gamma with shape 3.8 and rate 8.1.  This prior reflects the belief that the quartiles of lambda are 0.29 and 0.60.&lt;br /&gt;&lt;br /&gt;PRIOR 2:  lambda is gamma with shape 3.8 and rate 4.  The mean of this prior is 0.95 so this prior reflects one's belief that lambda is close to 1.&lt;br /&gt;&lt;br /&gt;PRIOR 3:  lambda is gamma with shape 0.38 and rate 0.81.  
This prior has the same mean as PRIOR 1, but it is much more diffuse, reflecting weaker information about lambda.&lt;br /&gt;&lt;br /&gt;PRIOR 4:  log lambda is normal with mean -0.87 and standard deviation 0.60.  On the surface, this looks different from the previous priors, but this prior also matches the belief that the quartiles of lambda are 0.29 and 0.60.&lt;br /&gt;&lt;br /&gt;Suppose we observe some data -- for the 30 drivers, 22 had no accidents, 7 had exactly one accident, and 1 had two accidents, for a total of 9 accidents.  The likelihood is given by&lt;br /&gt;&lt;br /&gt;LIKE = exp(-30 lambda) lambda^9&lt;br /&gt;&lt;br /&gt;In the graphs below, we display the likelihood in blue and show the four priors in red.  Here's the R code to produce one of the graphs.  Viewed as a function of lambda, the likelihood is proportional to a gamma density with shape 10 and rate 30, so we simulate draws from this gamma density and from the prior and display density estimates.&lt;br /&gt;&lt;br /&gt;like=rgamma(10000,shape=10,rate=30)&lt;br /&gt;p1=rgamma(10000,shape=3.8,rate=8.1)&lt;br /&gt;plot(density(p1),xlim=c(0,3),ylim=c(0,4),&lt;br /&gt;main="PRIOR 1",xlab="LAMBDA",lwd=3,col="red",col.main="red")&lt;br /&gt;lines(density(like),lwd=3,col="blue")&lt;br /&gt;text(1.2,3,"LIKELIHOOD",col="blue")&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RzGeJraLXgI/AAAAAAAAAHQ/1c3qclia1EY/s1600-h/fourpriors.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RzGeJraLXgI/AAAAAAAAAHQ/1c3qclia1EY/s320/fourpriors.jpg" alt="" id="BLOGGER_PHOTO_ID_5130055339660238338" border="0" /&gt;&lt;/a&gt;Note that Priors 1 and 4 are pretty consistent with the likelihood.  There is some conflict between Prior 2 and the likelihood, and Prior 3 is pretty flat relative to the likelihood.&lt;br /&gt;&lt;br /&gt;We can compare the four models by use of Bayes factors.  We first need a function that computes the log posterior for each prior.  
There already is a function logpoissgamma in the LearnBayes package that computes the log posterior of log lambda with Poisson sampling and a gamma prior. (This can be used for priors 1, 2, and 3.)  The function logpoissnormal can be used for Poisson sampling and a normal prior (prior 4).  Then we use the function laplace to approximate the value of the log predictive density.&lt;br /&gt;&lt;br /&gt;For example, here's the code to compute the log marginal density for prior 1.&lt;br /&gt;&lt;br /&gt;datapar=list(data=d,par=c(3.8,8.1))&lt;br /&gt;laplace(logpoissgamma,.5,datapar)$int&lt;br /&gt;0.4952788&lt;br /&gt;&lt;br /&gt;So log m(y) for prior 1 is about 0.5.&lt;br /&gt;&lt;br /&gt;We do this for each prior and get the following values:&lt;br /&gt;&lt;br /&gt;model    log m(y)&lt;br /&gt;-----------------&lt;br /&gt;PRIOR 1   0.495&lt;br /&gt;PRIOR 2  -0.729&lt;br /&gt;PRIOR 3  -0.454&lt;br /&gt;PRIOR 4   0.558&lt;br /&gt;&lt;br /&gt;We can use this output to compute Bayes factors.  For example, the Bayes factor in support of PRIOR 1 over PRIOR 2 is&lt;br /&gt;&lt;br /&gt;BF_12 = exp(0.495 - (-0.729)) = 3.4&lt;br /&gt;&lt;br /&gt;This means that the data support the model with PRIOR 1 over the model with PRIOR 2 by a factor of about 3.4.  
This is not surprising, given the conflict between the likelihood and Prior 2 seen in the graph.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-6579545137266684582?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/6579545137266684582/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=6579545137266684582' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/6579545137266684582'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/6579545137266684582'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/simple-illustration-of-bayes-factors.html' title='Simple Illustration of Bayes Factors'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_V8g1rNtmHuM/RzGeJraLXgI/AAAAAAAAAHQ/1c3qclia1EY/s72-c/fourpriors.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-4117343865702769140</id><published>2007-11-04T14:49:00.000-08:00</published><updated>2007-11-04T16:07:26.633-08:00</updated><title type='text'>Conflict between Bayesian and Frequentist Measures of Evidence</title><content type='html'>Here's a simple illustration of the conflict between a p-value and a Bayesian measure of evidence.&lt;br /&gt;&lt;br /&gt;Suppose you take a 
sample y1,..., yn from a normal population with mean mu and known standard deviation sigma.   You wish to test&lt;br /&gt;&lt;br /&gt;H:  mu = mu0  A:  mu not equal to mu0&lt;br /&gt;&lt;br /&gt;The usual test is based on the statistic Z = sqrt(n)*(ybar - mu0)/sigma.  One computes the p-value&lt;br /&gt;&lt;br /&gt;p-value = 2 x P(Z &gt;= |z0|),&lt;br /&gt;&lt;br /&gt;where z0 is the observed value of Z, and rejects H if the p-value is small.  Suppose mu0 = 0, sigma = 1, and one takes a sample of size n = 4 and observes ybar = 0.98.  Then one computes&lt;br /&gt;&lt;br /&gt;Z = sqrt(4)*0.98 = 1.96&lt;br /&gt;&lt;br /&gt;and the p-value is&lt;br /&gt;&lt;br /&gt;p-value = 2 x P(Z &gt;= 1.96) = 0.05.&lt;br /&gt;&lt;br /&gt;Consider the following Bayes test of H against A.  A Bayesian model is a specification of the sampling density and the prior density.  One model M0 says that the mean mu = mu0.   To complete the second model M1, we place a normal prior with mean mu0 and standard deviation tau on mu.  The Bayes factor in support of the model M0 over the model M1 is given by the ratio of predictive densities&lt;br /&gt;&lt;br /&gt;BF = m(y | M0)/m(y | M1)&lt;br /&gt;&lt;br /&gt;and the posterior probability of M0 is given by&lt;br /&gt;&lt;br /&gt;P(M0 | y) = p0 BF/(p0 BF + p1),&lt;br /&gt;&lt;br /&gt;where p0 is the prior probability of M0 and p1 = 1 - p0 is the prior probability of M1.&lt;br /&gt;&lt;br /&gt;The function mnormt.twosided in the LearnBayes package does this calculation.  
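&lt;br /&gt;&lt;br /&gt;As a rough cross-check, the same two quantities can be sketched in base R.  This assumes that, marginally, ybar is N(mu0, sigma^2/n) under M0 and N(mu0, sigma^2/n + tau^2) under M1 (the Bayes factor depends on the data only through ybar); the function name bf.sketch is made up for illustration and is not part of LearnBayes.&lt;br /&gt;&lt;br /&gt;bf.sketch=function(mu0,prob,tau,ybar,n,sigma)&lt;br /&gt;{&lt;br /&gt; m0=dnorm(ybar,mu0,sigma/sqrt(n))  # predictive density of ybar under M0&lt;br /&gt; m1=dnorm(ybar,mu0,sqrt(sigma^2/n+tau^2))  # predictive density of ybar under M1&lt;br /&gt; bf=m0/m1  # Bayes factor in support of M0&lt;br /&gt; list(bf=bf,post=prob*bf/(prob*bf+1-prob))  # posterior probability of M0&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;For instance, bf.sketch(0,.5,.5,.98,4,1) gives a Bayes factor of about 0.54 and a posterior probability of about 0.35.  The remaining steps use mnormt.twosided itself.&lt;br /&gt;&lt;br /&gt;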
To use mnormt.twosided, we specify (1) the value mu0 to be tested, (2) the prior probability of H, (3) the value of tau (the spread of the prior under A), and (4) the data vector that is (ybar, n, sigma).&lt;br /&gt;&lt;br /&gt;Here we specify the inputs:&lt;br /&gt;&lt;br /&gt;mu0=0; prob=.5; tau=0.5&lt;br /&gt;ybar = 0.98; n = 4; sigma=1&lt;br /&gt;data=c(ybar,n,sigma)&lt;br /&gt;&lt;br /&gt;Then we can use mnormt.twosided -- the outputs are the Bayes factor and the posterior probability of H:&lt;br /&gt;&lt;br /&gt;mnormt.twosided(mu0,prob,tau,data)&lt;br /&gt;$bf&lt;br /&gt;[1] 0.5412758&lt;br /&gt;&lt;br /&gt;$post&lt;br /&gt;[1] 0.3511868&lt;br /&gt;&lt;br /&gt;We see that the posterior probability of H is 0.35, which is substantially higher than the p-value of 0.05.&lt;br /&gt;&lt;br /&gt;In this calculation, we assumed that tau = 0.5 -- this reflects our belief about the spread of mu about mu0 under the alternative hypothesis.  What if we chose a different value for tau?&lt;br /&gt;&lt;br /&gt;We investigate the sensitivity of this posterior probability calculation with respect to tau.&lt;br /&gt;&lt;br /&gt;We write a function that computes the posterior probability for a given value of tau.&lt;br /&gt;&lt;br /&gt;post.prob=function(tau)&lt;br /&gt; {&lt;br /&gt; data=c(.98,4,1); mu0=0; prob=.5  # prob is the prior probability of H&lt;br /&gt; mnormt.twosided(mu0,prob,tau,data)$post&lt;br /&gt; }&lt;br /&gt;&lt;br /&gt;Then we use the curve function to plot this function for values of tau between 0.01 and 4.&lt;br /&gt;&lt;br /&gt;curve(post.prob,from=.01,to=4,xlab="TAU",ylab="PROB(H0)",lwd=3,col="red")&lt;br /&gt;&lt;br /&gt;In the figure below, it looks like the probability of H exceeds 0.32 for all tau.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/Ry5dTraLXeI/AAAAAAAAAHA/5KXOUbzPTWc/s1600-h/bfplot1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" 
src="http://2.bp.blogspot.com/_V8g1rNtmHuM/Ry5dTraLXeI/AAAAAAAAAHA/5KXOUbzPTWc/s320/bfplot1.jpg" alt="" id="BLOGGER_PHOTO_ID_5129139618272992738" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-4117343865702769140?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/4117343865702769140/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=4117343865702769140' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4117343865702769140'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4117343865702769140'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/11/conflict-between-bayesian-and.html' title='Conflict between Bayesian and Frequentist Measures of Evidence'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/Ry5dTraLXeI/AAAAAAAAAHA/5KXOUbzPTWc/s72-c/bfplot1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-8556991080318884690</id><published>2007-10-30T06:14:00.000-07:00</published><updated>2007-10-30T06:39:27.863-07:00</updated><title type='text'>A Prediction Contest</title><content type='html'>Concluding our baseball example, recall that we observed the home run rates for 20 players in the 
month of April and we were interested in predicting their home run rates for the next month.  Since we have collected data for both April and May, we can check the accuracy of three prediction methods.&lt;br /&gt;&lt;br /&gt;1.  The naive method would be to simply use the April rates to predict the May rates.  Recall that the data matrix is d, where the first column contains the at-bats and the second column contains the home run counts.&lt;br /&gt;&lt;br /&gt;pred1=d[,2]/d[,1]&lt;br /&gt;&lt;br /&gt;2.   A second method, which we called the "pooled" method, predicts each player's home run rate by the pooled home run rate for all 20 players in April.&lt;br /&gt;&lt;br /&gt;pred2=sum(d[,2])/sum(d[,1])&lt;br /&gt;&lt;br /&gt;3.  The Bayesian method predicts a player's May rate by the posterior mean of his true rate lambda_j.&lt;br /&gt;&lt;br /&gt;pred3=post.means&lt;br /&gt;&lt;br /&gt;One measure of accuracy is the sum of absolute prediction errors.&lt;br /&gt;&lt;br /&gt;The May home run rates are stored in the vector may.rates.  We first compute the individual prediction errors for all three methods.&lt;br /&gt;&lt;br /&gt;error1=abs(pred1-may.rates)&lt;br /&gt;error2=abs(pred2-may.rates)&lt;br /&gt;error3=abs(pred3-may.rates)&lt;br /&gt;&lt;br /&gt;We use the apply statement to sum the absolute errors.&lt;br /&gt;&lt;br /&gt;errors=cbind(error1,error2,error3)&lt;br /&gt;apply(errors,2,sum) # sum of absolute errors for all methods&lt;br /&gt;&lt;br /&gt; error1    error2    error3&lt;br /&gt;0.3393553 0.3111552 0.2622523&lt;br /&gt;&lt;br /&gt;By this criterion, Bayes beats Pooled, which in turn beats Naive.&lt;br /&gt;&lt;br /&gt;Finally, suppose we are interested in predicting the number of home runs hit by the first player Chase Utley in May.  Suppose we know that he'll have 115 at-bats in May.&lt;br /&gt;&lt;br /&gt;1.  
We have already simulated 10,000 draws from the posterior of Utley's true home run rate lambda1 -- these are stored in the first column of the matrix lam.&lt;br /&gt;&lt;br /&gt;2.  Then 10,000 draws from Utley's posterior predictive distribution are obtained by use of the rpois function.&lt;br /&gt;&lt;br /&gt;AB=115  # Utley's assumed number of May at-bats&lt;br /&gt;ys1=rpois(10000,AB*lam[,1])&lt;br /&gt;&lt;br /&gt;We graph this predictive distribution by the command&lt;br /&gt;&lt;br /&gt;plot(table(ys1),xlab="NUMBER OF MAY HOME RUNS")&lt;br /&gt;&lt;br /&gt;The most likely number of May home runs is 3, but a 90% prediction interval is [1, 9].&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/Rycz7baLXdI/AAAAAAAAAG4/4Yw1vH82sao/s1600-h/baseball4.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/Rycz7baLXdI/AAAAAAAAAG4/4Yw1vH82sao/s320/baseball4.jpg" alt="" id="BLOGGER_PHOTO_ID_5127123796847451602" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-8556991080318884690?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/8556991080318884690/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=8556991080318884690' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8556991080318884690'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8556991080318884690'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/prediction-contest.html' title='A Prediction 
Contest'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/Rycz7baLXdI/AAAAAAAAAG4/4Yw1vH82sao/s72-c/baseball4.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-59815827043242498</id><published>2007-10-27T06:30:00.001-07:00</published><updated>2007-10-27T06:47:46.024-07:00</updated><title type='text'>Fitting an Exchangeable Model</title><content type='html'>Continuing our baseball example, we observe yi home runs in ei at-bats for the ith player.  We assume&lt;br /&gt;&lt;br /&gt;1.  y1, ..., y20 are independent, yi is Poisson(lambda_i)&lt;br /&gt;2.  the true rates lambda_1,..., lambda_20 are independent Gamma(alpha, alpha/mu)&lt;br /&gt;3.  the hyperparameters alpha, mu are independent, mu is distributed 1/mu, alpha is distributed according to the proper prior z0/(alpha+z0)^2, where z0 is the prior median&lt;br /&gt;&lt;br /&gt;The data is stored as a matrix d, where the first column are the ei and the second column are the yi.  In our example, we let z0 = 1; that is, the prior median of alpha is one.&lt;br /&gt;&lt;br /&gt;Here is our computing strategy.&lt;br /&gt;&lt;br /&gt;1.  First we learn about the hyperparameters (alpha, mu).  The posterior of (log alpha, log mu) is programmed in the LearnBayes function poissgamexch.  
Here is a contour plot.&lt;br /&gt;&lt;br /&gt;mycontour(poissgamexch,c(-1,8,-4.2,-2.5),datapar)&lt;br /&gt;title(xlab="LOG ALPHA",ylab="LOG MU")&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/RyNA-raLXaI/AAAAAAAAAGg/FBeVdON4c2U/s1600-h/baseball1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/RyNA-raLXaI/AAAAAAAAAGg/FBeVdON4c2U/s320/baseball1.jpg" alt="" id="BLOGGER_PHOTO_ID_5126012246426344866" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We use the function laplace to find the posterior mode and associated variance-covariance matrix.  The output from laplace is used to find a proposal variance and scale parameter to use in a random walk Metropolis chain.  We simulate 10,000 draws from the posterior of (log alpha, log mu). (This chain has approximately a 30% acceptance rate.)&lt;br /&gt;&lt;br /&gt;fit=laplace(poissgamexch,c(2,-3.2),datapar)&lt;br /&gt;proposal=list(var=fit$var,scale=2)&lt;br /&gt;mcmcfit=rwmetrop(poissgamexch,proposal,c(1,-3),10000,datapar)&lt;br /&gt;&lt;br /&gt;By exponentiating the simulated draws from mcmcfit, we get draws from the marginal posteriors of alpha and mu.  
We draw a density plot of alpha and superimpose the prior density of alpha.&lt;br /&gt;&lt;br /&gt;alpha=exp(mcmcfit$par[,1])&lt;br /&gt;mu=exp(mcmcfit$par[,2])&lt;br /&gt;plot(density(alpha,adjust=2),xlim=c(0,20),col="blue",lwd=3,xlab="ALPHA",&lt;br /&gt;main="POSTERIOR OF ALPHA")&lt;br /&gt;&lt;br /&gt;prior=function(alpha,z0) z0/(alpha+z0)^2&lt;br /&gt;theta=seq(.0001,20,length=200)&lt;br /&gt;lines(theta,prior(theta,1),col="red",lwd=3)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/RyNBJraLXbI/AAAAAAAAAGo/eZ3XySCG3SU/s1600-h/baseball2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/RyNBJraLXbI/AAAAAAAAAGo/eZ3XySCG3SU/s320/baseball2.jpg" alt="" id="BLOGGER_PHOTO_ID_5126012435404905906" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;2.  Now we can learn about the true rate parameters.  Conditional on hyperparameter values, lambda_1, ..., lambda_20 have independent gamma posteriors.&lt;br /&gt;&lt;br /&gt;We write a short function to simulate draws of lambda_j for a particular player j.&lt;br /&gt;&lt;br /&gt;trueratesim=function(j,data,alpha,mu)&lt;br /&gt;{&lt;br /&gt;e=data[,1]; y=data[,2]&lt;br /&gt;rgamma(length(alpha),shape=alpha+y[j],rate=alpha/mu+e[j])&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;Then we can simulate draws of all the true rates by use of the sapply function.&lt;br /&gt;&lt;br /&gt;lam=sapply(1:20,trueratesim,d,alpha,mu)&lt;br /&gt;&lt;br /&gt;The output lam is a 10,000 by 20 matrix.  
We can compute posterior means by the apply function.&lt;br /&gt;&lt;br /&gt;post.means=apply(lam,2,mean)&lt;br /&gt;&lt;br /&gt;To show the behavior of the posterior means, we draw a plot where&lt;br /&gt;(1) we show the observed rates yi/ei as red dots&lt;br /&gt;(2) we show the posterior means as blue dots&lt;br /&gt;(3) a horizontal line is drawn at the mean home run rate for all 20 players.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/RyNBUraLXcI/AAAAAAAAAGw/Zcg7wVvnrzg/s1600-h/baseball3.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/RyNBUraLXcI/AAAAAAAAAGw/Zcg7wVvnrzg/s320/baseball3.jpg" alt="" id="BLOGGER_PHOTO_ID_5126012624383466946" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-59815827043242498?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/59815827043242498/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=59815827043242498' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/59815827043242498'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/59815827043242498'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/fitting-exchangeable-model.html' title='Fitting an Exchangeable Model'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_V8g1rNtmHuM/RyNA-raLXaI/AAAAAAAAAGg/FBeVdON4c2U/s72-c/baseball1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-2099667034163746395</id><published>2007-10-26T13:54:00.000-07:00</published><updated>2007-10-26T13:57:30.947-07:00</updated><title type='text'>Fun to watch!</title><content type='html'>&lt;a href="http://www.youtube.com/watch?v=AA98PYAKzYo"&gt;http://www.youtube.com/watch?v=AA98PYAKzYo&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-2099667034163746395?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/2099667034163746395/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=2099667034163746395' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2099667034163746395'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2099667034163746395'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/fun-to-watch.html' title='Fun to watch!'/><author><name>Imed Jmiai</name><uri>http://www.blogger.com/profile/16849558239281533984</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-3056961108196419541</id><published>2007-10-25T11:58:00.000-07:00</published><updated>2007-10-25T12:11:06.790-07:00</updated><title type='text'>Predicting home run rates</title><content type='html'>Here is a simple prediction problem.  Suppose we observe the number of home runs and the number of at-bats for 20 baseball players during the first month of the baseball season (April).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RyDoY7aLXZI/AAAAAAAAAGY/z3bCK_hoZT0/s1600-h/datatable,jpg.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RyDoY7aLXZI/AAAAAAAAAGY/z3bCK_hoZT0/s320/datatable,jpg.JPG" alt="" id="BLOGGER_PHOTO_ID_5125351890909617554" border="0" /&gt;&lt;/a&gt;We observe corresponding home run rates:  Utley 4/88, Weeks 1/77, Aurila 4/71, and so on. &lt;br /&gt;&lt;br /&gt;From this data, we want to predict the home run rates for the same 20 players for the next month (May).&lt;br /&gt;&lt;br /&gt;How can we best do this?  Here are some opening comments.&lt;br /&gt;&lt;br /&gt;1.  One idea would be to estimate the May rates just by using the April rates.  So we predict Utley's May rate to be 4/88, Weeks rate to be 1/77, etc.  This may not be a good idea since we have limited data for each player.&lt;br /&gt;&lt;br /&gt;2.  Or maybe a better strategy would be to combine the data.  Collectively in April, this group hit 60 home runs in 1697 at-bats for a rate of 60/1697 = 0.035.  We could estimate each player's May rate by 0.035.  But this would ignore the fact that players have different abilities to hit home runs.&lt;br /&gt;&lt;br /&gt;3.  Actually, the best prediction strategy is a compromise between the first two ideas.  
A good plan is to predict a player's May rate by a "shrinkage" estimate that shrinks a player's individual rate towards the combined rate.&lt;br /&gt;&lt;br /&gt;In the following postings, we'll illustrate how to fit a Bayesian exchangeable model to these data that gives a good prediction strategy.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-3056961108196419541?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/3056961108196419541/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=3056961108196419541' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/3056961108196419541'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/3056961108196419541'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/predicting-home-run-rates.html' title='Predicting home run rates'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_V8g1rNtmHuM/RyDoY7aLXZI/AAAAAAAAAGY/z3bCK_hoZT0/s72-c/datatable,jpg.JPG' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-126442850066053053</id><published>2007-10-23T10:41:00.001-07:00</published><updated>2007-10-23T11:17:45.646-07:00</updated><title type='text'>View of an Exchangeable Prior</title><content type='html'>In Chapter 7, we consider the following exchangeable prior for Poisson rates lam_1, ..., lam_k that is described in two stages.&lt;br /&gt;&lt;br /&gt;Stage I.  Conditional on parameters alpha, mu, lam_1, ..., lam_k are independent Gamma(alpha, alpha/mu)&lt;br /&gt;&lt;br /&gt;Stage II. The parameters (alpha, mu) come from a specified prior g(alpha, mu).&lt;br /&gt;&lt;br /&gt;Here mu is the prior mean of lam_i and alpha is a precision parameter.  This structure induces the following prior on lam_1, ..., lam_k:&lt;br /&gt;&lt;br /&gt;g(lam_1, ..., lam_k) = integral prod P(lam_j | alpha, mu) g(alpha, mu) dalpha dmu.&lt;br /&gt;&lt;br /&gt;To see how this prior reflects dependence between the parameters, suppose we fix alpha to the value alpha_0 and let mu be distributed inverse gamma(a, b).  Then one can show the prior on lam_1, ..., lam_k is given (up to a proportionality constant) by&lt;br /&gt;&lt;br /&gt;g(lam_1, ..., lam_k) = P^(alpha_0-1)/(alpha_0 S + b)^(k alpha_0 + a),&lt;br /&gt;&lt;br /&gt;where P is the product of lam_j and S is the sum of lam_j.&lt;br /&gt;&lt;br /&gt;To see this prior, we program a simple function pgexchprior that computes the logarithm of the prior of lam_1 and lam_2 given parameter values (alpha_0, a, b).&lt;br /&gt;&lt;br /&gt;pgexchprior=function(lambda,pars)&lt;br /&gt;{&lt;br /&gt;alpha=pars[1]&lt;br /&gt;a=pars[2]&lt;br /&gt;b=pars[3]&lt;br /&gt;(alpha-1)*log(prod(lambda))-(2*alpha+a)*log(alpha*sum(lambda)+b)&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;The following R commands construct contour plots of the prior for lam_1 and lam_2 for the precision parameters alpha_0 = 5, 20, 80, and 400.  
(In each case, we assign mu an inverse-gamma (10, 10) prior.)&lt;br /&gt;&lt;br /&gt;alpha=c(5,20,80,400)&lt;br /&gt;par(mfrow=c(2,2))&lt;br /&gt;for (j in 1:4)&lt;br /&gt;{&lt;br /&gt;mycontour(pgexchprior,c(.001,5,.001,5),c(alpha[j],10,10))&lt;br /&gt;title(main=paste("ALPHA = ",alpha[j]),xlab="LAMBDA1",ylab="LAMBDA2")&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;These plots clearly show that, as alpha increases, the prior induces stronger correlation between the two Poisson rates.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/Rx42CttbgRI/AAAAAAAAAGQ/LwkeVvue33Q/s1600-h/cplots.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/Rx42CttbgRI/AAAAAAAAAGQ/LwkeVvue33Q/s320/cplots.jpg" alt="" id="BLOGGER_PHOTO_ID_5124592846251983122" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-126442850066053053?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/126442850066053053/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=126442850066053053' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/126442850066053053'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/126442850066053053'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/view-of-exchangeable-prior.html' title='View of an Exchangeable Prior'/><author><name>Jim 
Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/Rx42CttbgRI/AAAAAAAAAGQ/LwkeVvue33Q/s72-c/cplots.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-65787979840768358</id><published>2007-10-22T15:40:00.000-07:00</published><updated>2007-10-22T16:08:29.978-07:00</updated><title type='text'>Illustration of posterior predictive checking</title><content type='html'>To illustrate posterior predictive checking, suppose we observe data y1,...,yn that we assume are N(mu, sigma).  We place a noninformative prior on (mu, sigma) and learn about the parameters based on the posterior distribution.  We use the posterior predictive distribution to check our model -- to see if our sample is consistent with samples predicted from our fitted model.&lt;br /&gt;&lt;br /&gt;In practice, we construct a checking function d that is a function of the future sample y*.  In this example (66 measurements of the speed of light), we suspect that the smallest observation is an outlier.  
So we use a checking function d(y) = y_min.&lt;br /&gt;&lt;br /&gt;Here's an R procedure for checking our model using this diagnostic.&lt;br /&gt;&lt;br /&gt;We load in the speed of light data:&lt;br /&gt;&lt;br /&gt;&gt; y=scan("speedoflight.txt")&lt;br /&gt;Read 66 items&lt;br /&gt;&gt; y&lt;br /&gt;[1]  28  22  36  26  28  28  26  24  32  30  27  24  33  21  36  32  31  25  24&lt;br /&gt;[20]  25  28  36  27  32  34  30  25  26  26  25 -44  23  21  30  33  29  27  29&lt;br /&gt;[39]  28  22  26  27  16  31  29  36  32  28  40  19  37  23  32  29  -2  24  25&lt;br /&gt;[58]  27  24  16  29  20  28  27  39  23&lt;br /&gt;&lt;br /&gt;We simulate 1000 draws from the normal/scale x inv-chi-square posterior using the function normpostsim.&lt;br /&gt;&lt;br /&gt;parameters=normpostsim(y,1000)&lt;br /&gt;&lt;br /&gt;The output is a list -- here parameters$mu contains draws from mu, and parameters$sigma2 contains draws of sigma2.&lt;br /&gt;&lt;br /&gt;The simulation of samples from the posterior predictive distribution is done by the function normpostpred.  
By use of comments, we explain how this function works.&lt;br /&gt;&lt;br /&gt;normpostpred=function(parameters,sample.size,f=min)&lt;br /&gt;{&lt;br /&gt;# the function normalsample simulates a single sample given parameter values mu and sigma2.&lt;br /&gt;# the index j is the index of the simulation number&lt;br /&gt;&lt;br /&gt; normalsample=function(j,parameters,sample.size)&lt;br /&gt;   rnorm(sample.size,mean=parameters$mu[j],sd=sqrt(parameters$sigma2[j]))&lt;br /&gt;&lt;br /&gt;# we use the sapply command to simulate many samples, where the number of samples&lt;br /&gt;# corresponds to the number of parameter simulations&lt;br /&gt;&lt;br /&gt; m=length(parameters$mu)&lt;br /&gt; post.pred.samples=sapply(1:m,normalsample,parameters,sample.size)&lt;br /&gt;&lt;br /&gt;# on each posterior predictive sample we compute a stat, the default stat is min&lt;br /&gt;&lt;br /&gt; stat=apply(post.pred.samples,2,f)&lt;br /&gt;&lt;br /&gt;# the function returns all of the posterior predictive samples and a vector of values of the&lt;br /&gt;# checking diagnostic.&lt;br /&gt;&lt;br /&gt; return(list(samples=post.pred.samples,stat=stat))&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;Here the size of the original sample is 66.  To generate 1000 samples of size 66 from the posterior predictive distribution, and store the minimum of each sample, we type&lt;br /&gt;&lt;br /&gt;post.pred=normpostpred(parameters,66)&lt;br /&gt;&lt;br /&gt;The minimum observations of all the samples are stored in the vector post.pred$stat.  We display the mins in a histogram and draw a vertical line showing the location of the min of the data.&lt;br /&gt;&lt;br /&gt;&gt; hist(post.pred$stat,xlim=c(-50,15))&lt;br /&gt;&gt; lines(min(y)*c(1,1),c(0,350),lwd=3,col="red")&lt;br /&gt;&lt;br /&gt;Clearly the smallest observation is inconsistent with the mins generated from samples from the posterior predictive distribution.  
This casts doubt on the normality assumption for the data.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/Rx0tT9tbgQI/AAAAAAAAAGI/mFvCbUlYWgU/s1600-h/histogram1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/Rx0tT9tbgQI/AAAAAAAAAGI/mFvCbUlYWgU/s320/histogram1.jpg" alt="" id="BLOGGER_PHOTO_ID_5124301772023365890" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-65787979840768358?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/65787979840768358/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=65787979840768358' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/65787979840768358'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/65787979840768358'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/illustration-of-posterior-predictive.html' title='Illustration of posterior predictive checking'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/Rx0tT9tbgQI/AAAAAAAAAGI/mFvCbUlYWgU/s72-c/histogram1.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-3323008563168557120</id><published>2007-10-20T06:58:00.001-07:00</published><updated>2007-10-20T07:39:45.341-07:00</updated><title type='text'>No more loops!</title><content type='html'>One aspect of the LearnBayes package that I've been uncomfortable with is the use of loops in my definition of the functions defining log posteriors.  I had a reason for using the loops (the theta argument to the function could be a matrix), but it is generally bad programming practice.   The worst aspect of the looping is that it makes the process of writing a posterior more difficult than it really is.  Looping is bad from the user's perspective.  After all, we are teaching statistics, not programming, and I want to encourage people to code the posteriors for their problems using R.&lt;br /&gt;&lt;br /&gt;Here is a simple example.  Suppose your model is that y1,..., yn are independent Cauchy with location mu and scale sigma.  The log posterior is given by&lt;br /&gt;&lt;br /&gt;log g = sum (log f),&lt;br /&gt;&lt;br /&gt;where log f is the log of the Cauchy density conditional on parameters.  My old way of programming the posterior had the loop&lt;br /&gt;&lt;br /&gt;  for (i in 1:length(data))&lt;br /&gt;      val = val + log(dt((data[i] - mu)/sigma, df = 1)/sigma)&lt;br /&gt;&lt;br /&gt;I think this new way is preferable.  
First you define the function logf for a single observation y:&lt;br /&gt;&lt;br /&gt;logf=function(y,mu,sigma) log(dt((y-mu)/sigma,df=1)/sigma)&lt;br /&gt;&lt;br /&gt;Then the log posterior is given  by&lt;br /&gt;&lt;br /&gt;sum(logf(data,mu,sigma))&lt;br /&gt;&lt;br /&gt;Anyway, I think that by avoiding loops, the function for the log posterior becomes more transparent.&lt;br /&gt;&lt;br /&gt;The new version of the LearnBayes package will contain fewer loops.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-3323008563168557120?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/3323008563168557120/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=3323008563168557120' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/3323008563168557120'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/3323008563168557120'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/no-more-loops.html' title='No more loops!'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-1982288136724736367</id><published>2007-10-17T12:42:00.000-07:00</published><updated>2007-10-17T15:24:17.785-07:00</updated><title type='text'>Fitting a Mixture Sampling Model, Part II</title><content type='html'>In the previous post, we 
introduced the mixture sampling model and showed that the posterior had an interesting bimodal shape.  Here we illustrate the use of two Metropolis random walk algorithms to sample from this posterior.&lt;br /&gt;&lt;br /&gt;Recall that the definition of the log posterior is in poisson.mix.R and the data is contained in the vector y.&lt;br /&gt;&lt;br /&gt;We begin with a random walk algorithm using an identity var-cov matrix and a scale constant of 0.01.  We begin at value (3, 3) and run for 10,000 iterations.  We store the fit in the variable fit1.&lt;br /&gt;&lt;br /&gt;proposal=list(var=diag(c(1,1)),scale=.01)&lt;br /&gt;start=array(c(3,3),c(1,2))&lt;br /&gt;fit1=rwmetrop(poisson.mix,proposal,start,10000,list(y=y,p=.4))&lt;br /&gt;&lt;br /&gt;In our second algorithm, we also use an identity var-cov matrix, but use the larger scale constant of 0.2.&lt;br /&gt;&lt;br /&gt;proposal=list(var=diag(c(1,1)),scale=.2)&lt;br /&gt;start=array(c(3,3),c(1,2))&lt;br /&gt;fit2=rwmetrop(poisson.mix,proposal,start,10000,list(y=y,p=.4))&lt;br /&gt;&lt;br /&gt;The acceptance rates of the two algorithms are:&lt;br /&gt;&lt;br /&gt;Acceptance rate&lt;br /&gt;fit1    94%&lt;br /&gt;fit2   23%&lt;br /&gt;&lt;br /&gt;One basic method of MCMC output analysis uses trace plots, which are time series plots of the simulated draws.  We show trace plots for each random walk below.&lt;br /&gt;&lt;br /&gt;Random walk (scale = 0.01)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/RxZpuD-aImI/AAAAAAAAAF0/tcN4wwXJo4E/s1600-h/mcmcfit1.trace.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/RxZpuD-aImI/AAAAAAAAAF0/tcN4wwXJo4E/s320/mcmcfit1.trace.jpg" alt="" id="BLOGGER_PHOTO_ID_5122397866242482786" border="0" /&gt;&lt;/a&gt;Note the snake-like appearance of this graph.  
This is typical of MCMC random walk runs with high acceptance rates.&lt;br /&gt;&lt;br /&gt;Random walk (scale = 0.2)&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RxZqRz-aIoI/AAAAAAAAAGA/0PWw3OQ8FYY/s1600-h/mcmcfit2.trace.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/RxZqRz-aIoI/AAAAAAAAAGA/0PWw3OQ8FYY/s320/mcmcfit2.trace.jpg" alt="" id="BLOGGER_PHOTO_ID_5122398480422806146" border="0" /&gt;&lt;/a&gt;This plot displays better mixing and it visits both areas of concentration of the posterior.&lt;br /&gt;&lt;br /&gt;To check the accuracy of each chain in simulating the marginal posterior of theta1 = log(lambda1), we first use the function simcontour that is based on simulations from the grid that covers the actual posterior.  A density estimate on the draws of theta1 based on this "exact" algorithm is presented as a red line and the density estimate of the draws of theta1 from the MCMC algorithm are presented in blue.&lt;br /&gt;&lt;br /&gt;We first show the random walk with scale 0.01 and then the random walk with scale 0.2.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RxZpWj-aIkI/AAAAAAAAAFk/YBCmZ1wBwJw/s1600-h/mcmcfit1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RxZpWj-aIkI/AAAAAAAAAFk/YBCmZ1wBwJw/s320/mcmcfit1.jpg" alt="" id="BLOGGER_PHOTO_ID_5122397462515556930" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RxZpcz-aIlI/AAAAAAAAAFs/74vIl1gquTc/s1600-h/mcmcfit2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" 
src="http://2.bp.blogspot.com/_V8g1rNtmHuM/RxZpcz-aIlI/AAAAAAAAAFs/74vIl1gquTc/s320/mcmcfit2.jpg" alt="" id="BLOGGER_PHOTO_ID_5122397569889739346" border="0" /&gt;&lt;/a&gt;In this example, it seems that the second MCMC run produced a better approximation to the marginal posterior density of theta1.  The first run with scale 0.01 actually missed the important area of the posterior.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-1982288136724736367?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/1982288136724736367/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=1982288136724736367' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/1982288136724736367'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/1982288136724736367'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/fitting-mixture-sampling-model-part-ii.html' title='Fitting a Mixture Sampling Model, Part II'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_V8g1rNtmHuM/RxZpuD-aImI/AAAAAAAAAF0/tcN4wwXJo4E/s72-c/mcmcfit1.trace.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-7026004337682664084</id><published>2007-10-17T05:06:00.000-07:00</published><updated>2007-10-17T05:24:57.703-07:00</updated><title type='text'>Fitting a Mixture Sampling Model</title><content type='html'>To illustrate the application of a MCMC random walk algorithm, consider the following mixture model.  Suppose we observe season home run counts y1, ..., y20 from a group of baseball players.  Some of these players are "sluggers" who average many home runs (per season) and other players are "light hitters" who don't hit as many home runs.  We know that 40% of all players are sluggers, but we don't know the identity (slugger or non-slugger) for each player.  The sampling density for the ith player's home run count yi has the density&lt;br /&gt;&lt;br /&gt;f(yi | lambda1, lambda2) = p f(yi | lambda1) + (1- p) f(yi | lambda2),&lt;br /&gt;&lt;br /&gt;where we assume f(yi | lambda) is Poisson with mean lambda and we assume we know p = .4.&lt;br /&gt;&lt;br /&gt;Suppose we assume that (lambda1, lambda2) has the prior 1/(lambda1 lambda2).  Then the posterior is given (up to a proportionality constant) by&lt;br /&gt;&lt;br /&gt;g(lambda1, lambda2 | y) = 1/(lambda1 lambda2) prod from i=1 to 20 [p f(yi | lambda1) + (1- p) f(yi | lambda2)]&lt;br /&gt;&lt;br /&gt;Here is some simulated data&lt;br /&gt;&lt;br /&gt;25 42 29 29 28 25 21 34 19 11 16 13 15 17 10 16 21 17 16 20&lt;br /&gt;&lt;br /&gt;We first transform the parameters to the real-valued parameters&lt;br /&gt;&lt;br /&gt;theta1 = log(lambda1), theta2 = log(lambda2)&lt;br /&gt;&lt;br /&gt;and write the following function that computes the posterior of (theta1, theta2).  
Here the input variable data.p is a list that contains two elements:  p, the known value of p, and y, the vector of observations.&lt;br /&gt;&lt;br /&gt;poisson.mix=function(theta, data.p)&lt;br /&gt;{&lt;br /&gt;lambda1=exp(theta[,1])&lt;br /&gt;lambda2=exp(theta[,2])&lt;br /&gt;y=data.p$y&lt;br /&gt;p=data.p$p&lt;br /&gt;val=0*lambda1&lt;br /&gt;for (i in 1:length(y))&lt;br /&gt;val=val+log(p*dpois(y[i],lambda1)+(1-p)*dpois(y[i],lambda2))&lt;br /&gt;return(val)&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;The data is stored in the vector y.  We use the mycontour function to graph the bivariate posterior.  We also show a perspective plot of this posterior.&lt;br /&gt;&lt;br /&gt;mycontour(poisson.mix,c(2.2,3.8,2.2,3.8),list(y=y,p=.4))&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/RxX-BT-aIeI/AAAAAAAAAE8/JOGaNfh9ock/s1600-h/mixpoiss1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/RxX-BT-aIeI/AAAAAAAAAE8/JOGaNfh9ock/s320/mixpoiss1.jpg" alt="" id="BLOGGER_PHOTO_ID_5122279449699164642" border="0" /&gt;&lt;/a&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/RxX-2T-aIfI/AAAAAAAAAFE/CuB_-yI8o8k/s1600-h/mixpoiss2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/RxX-2T-aIfI/AAAAAAAAAFE/CuB_-yI8o8k/s320/mixpoiss2.jpg" alt="" id="BLOGGER_PHOTO_ID_5122280360232231410" border="0" /&gt;&lt;/a&gt;You might be surprised to see that the posterior is bimodal.  This is happening since the model is not well-defined.  
But in the next posting, we'll ignore this identifiability problem, and see if a random walk Metropolis can be successful in sampling from this posterior.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-7026004337682664084?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/7026004337682664084/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=7026004337682664084' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7026004337682664084'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7026004337682664084'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/fitting-mixture-sampling-model.html' title='Fitting a Mixture Sampling Model'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_V8g1rNtmHuM/RxX-BT-aIeI/AAAAAAAAAE8/JOGaNfh9ock/s72-c/mixpoiss1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-1839672704598133503</id><published>2007-10-14T16:33:00.000-07:00</published><updated>2007-10-14T16:46:49.207-07:00</updated><title type='text'>Choosing the scale for a Metropolis RW algorithm</title><content type='html'>Let's return to the example where we are sampling from a Cauchy density with unknown median theta and known scale 
parameter 1 and we place a uniform prior on theta.  We observe the data&lt;br /&gt;&lt;br /&gt;1.3  1.9  3.2  5.1  1.4  5.8  2.5  4.7  2.9  5.2 11.6  8.8  8.5 10.7 10.2&lt;br /&gt;9.8 12.9  7.2  8.1  9.5&lt;br /&gt;&lt;br /&gt;An attractive method of obtaining a simulated sample from this posterior is the Metropolis random walk algorithm.  If theta^(t-1) represents the current simulated draw, then the next candidate is generated from the distribution&lt;br /&gt;&lt;br /&gt;theta^(t) = theta^(t-1) + c Z,&lt;br /&gt;&lt;br /&gt;where Z is standard normal.  The main issue here is the selection of the scale constant c.&lt;br /&gt;&lt;br /&gt;I did a simple experiment.  I simulated samples of 10,000 draws using the scale values  0.2, 1, 5, 25.  Here's the R code for the Metropolis random walk with the choice of scale parameter 0.2.&lt;br /&gt;&lt;br /&gt;cpost=function(theta,y)&lt;br /&gt;{&lt;br /&gt;val=0*theta&lt;br /&gt;for (j in 1:length(y))&lt;br /&gt;val=val+dt(y[j]-theta,df=1,log=TRUE)&lt;br /&gt;return(val)&lt;br /&gt;}&lt;br /&gt;proposal=list(var=1,scale=.2)&lt;br /&gt;fit1=rwmetrop(cpost,proposal,20,10000,data)&lt;br /&gt;&lt;br /&gt;The acceptance rates for these four values are given by&lt;br /&gt;&lt;br /&gt;scale  acceptance rate&lt;br /&gt;0.2     93%&lt;br /&gt;1.0     73%&lt;br /&gt;5.0     31%&lt;br /&gt;25       7%&lt;br /&gt;&lt;br /&gt;To assess the accuracy of a particular sample, we draw the exact posterior in red, and superimpose a density estimate of the simulated draws in blue. 
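&lt;br /&gt;&lt;br /&gt;The comparison just described can be sketched in base R without the LearnBayes package.  This is only an illustration under stated assumptions: the helper name rw.sketch, the seed, and the grid on [0, 14] are hypothetical choices made here and are not part of rwmetrop, so the acceptance rate it reports need not match the table above exactly.&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;
```r
# Data from the Cauchy example above.
y = c(1.3, 1.9, 3.2, 5.1, 1.4, 5.8, 2.5, 4.7, 2.9, 5.2,
      11.6, 8.8, 8.5, 10.7, 10.2, 9.8, 12.9, 7.2, 8.1, 9.5)

# Log posterior of theta under a uniform prior (same form as cpost above).
cpost = function(theta, y) {
  val = 0 * theta
  for (j in 1:length(y))
    val = val + dt(y[j] - theta, df = 1, log = TRUE)
  return(val)
}

# Hypothetical helper (an assumption, not the rwmetrop code):
# random walk Metropolis for a single real-valued parameter.
rw.sketch = function(logpost, scale, start, m, y) {
  draws = rep(0, m)
  n.accept = 0
  current = start
  lp.current = logpost(current, y)
  for (t in 1:m) {
    candidate = current + scale * rnorm(1)
    lp.candidate = logpost(candidate, y)
    # Accept the candidate with probability min(1, posterior ratio).
    prob = min(1, exp(lp.candidate - lp.current))
    if (rbinom(1, 1, prob) == 1) {
      current = candidate
      lp.current = lp.candidate
      n.accept = n.accept + 1
    }
    draws[t] = current
  }
  list(par = draws, accept = n.accept / m)
}

set.seed(1)
fit = rw.sketch(cpost, 1, 20, 10000, y)

# Exact posterior on a grid, normalized so it sums to one over the grid.
grid = seq(0, 14, length.out = 701)
post = exp(cpost(grid, y))
post = post / (sum(post) * (grid[2] - grid[1]))

# Exact posterior in red, density estimate of the draws in blue.
plot(grid, post, type = "l", col = "red", lwd = 2,
     xlab = "THETA", ylab = "DENSITY")
lines(density(fit$par), col = "blue", lwd = 2)
```
&lt;/pre&gt;Rerunning the sketch with the scale argument set to 0.2, 5, or 25 gives the other three comparisons, and fit$accept holds the corresponding acceptance rate.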
&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RxKqOz-aIcI/AAAAAAAAAEs/kqMGVNXwcoY/s1600-h/mcmcplot1.jpeg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/RxKqOz-aIcI/AAAAAAAAAEs/kqMGVNXwcoY/s320/mcmcplot1.jpeg" alt="" id="BLOGGER_PHOTO_ID_5121342897720533442" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Comparing these four figures, the scale values of 1 and 5 seem to do pretty well.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-1839672704598133503?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/1839672704598133503/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=1839672704598133503' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/1839672704598133503'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/1839672704598133503'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/choosing-scale-for-metropolis-rw.html' title='Choosing the scale for a Metropolis RW algorithm'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/RxKqOz-aIcI/AAAAAAAAAEs/kqMGVNXwcoY/s72-c/mcmcplot1.jpeg' height='72' 
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-8798209919738018535</id><published>2007-10-10T17:41:00.001-07:00</published><updated>2007-10-11T12:40:40.281-07:00</updated><title type='text'>Summarizing a posterior</title><content type='html'>Continuing our selected data example, remember that we have programmed the log posterior of the transformed parameters theta1 and theta2 in the R function kaminsky.&lt;br /&gt;&lt;br /&gt;To find a normal approximation to the posterior, we apply the function laplace in the LearnBayes package.  The inputs are (1) the function defining the log posterior, (2) a starting guess at the mode, (3) the number of iterations of the Newton algorithm, and (4) the data vector (the 5th and 15th order statistics).&lt;br /&gt;&lt;br /&gt;&gt; start=array(c(-2,-1),c(1,2))&lt;br /&gt;&gt; fit.laplace=laplace(kaminsky,start,10,data)&lt;br /&gt;&lt;br /&gt;The output is a list containing mode, the value of the posterior mode, and var, an estimate of the variance-covariance matrix.&lt;br /&gt;&lt;br /&gt;&gt; fit.laplace$mode&lt;br /&gt;       [,1]      [,2]&lt;br /&gt;[1,] -2.367745 -1.091989&lt;br /&gt;&gt; fit.laplace$var&lt;br /&gt;       [,1]      [,2]&lt;br /&gt;[1,] 0.3201467 0.1191059&lt;br /&gt;[2,] 0.1191059 0.1191059&lt;br /&gt;&lt;br /&gt;We can get more accurate summaries of the posterior by means of a Metropolis random walk algorithm.  The function rwmetrop implements this algorithm for an arbitrary posterior.  To use this function, we supply "proposal", a list containing the variance and scale parameter for the normal proposal density, along with the starting value for the MCMC chain, the number of simulated draws, and the data vector.  
Note that we are using the approximate variance-covariance matrix from laplace in the proposal density for rwmetrop.&lt;br /&gt;&lt;br /&gt;&gt; proposal=list(var=fit.laplace$var,scale=2)&lt;br /&gt;&gt; fit.mcmc=rwmetrop(kaminsky,proposal,start,10000,data)&lt;br /&gt;&lt;br /&gt;The output of rwmetrop is a list containing accept, the acceptance rate for the chain, and par, the matrix of simulated draws.&lt;br /&gt;&lt;br /&gt;At this point, we should run some convergence diagnostics to see if the simulated draws show sufficient mixing and don't display unusually high autocorrelations.  For this example, the acceptance rate is about 29%, which is within the acceptable range for this algorithm.&lt;br /&gt;&lt;br /&gt;We display the simulated draws on top of the contour plot of theta1 and theta2 -- it seems that most of the simulated draws fall within the first contour line.&lt;br /&gt;&lt;br /&gt;&gt; mycontour(kaminsky,c(-5,0,-2.5,1),data)&lt;br /&gt;&gt; title(xlab="LOG(Y5-MU)",ylab="LOG BETA")&lt;br /&gt;&gt; points(fit.mcmc$par[,1],fit.mcmc$par[,2])&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/Rw11O8jVbkI/AAAAAAAAAEU/QWS1dWL35rc/s1600-h/mcmc1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/Rw11O8jVbkI/AAAAAAAAAEU/QWS1dWL35rc/s320/mcmc1.jpg" alt="" id="BLOGGER_PHOTO_ID_5119877251023072834" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;We are interested in the parameters mu and beta.  
We first compute vectors of simulated draws of mu and beta by transforming back the simulated draws of theta1 and theta2.&lt;br /&gt;&lt;br /&gt;&gt; MU=data[1]-exp(fit.mcmc$par[,1])&lt;br /&gt;&gt; BETA=exp(fit.mcmc$par[,2])&lt;br /&gt;&lt;br /&gt;We display the marginal posteriors of mu and beta.&lt;br /&gt;&lt;br /&gt;&gt; par(mfrow=c(2,1))&lt;br /&gt;&gt; plot(density(MU),lwd=3,main="POSTERIOR OF MU",xlab="MU")&lt;br /&gt;&gt; plot(density(BETA),lwd=3,main="POSTERIOR OF BETA",xlab="BETA")&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/Rw11WcjVblI/AAAAAAAAAEc/W6gs7Ls-t_I/s1600-h/mcmc2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/Rw11WcjVblI/AAAAAAAAAEc/W6gs7Ls-t_I/s320/mcmc2.jpg" alt="" id="BLOGGER_PHOTO_ID_5119877379872091730" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;We construct 90% interval estimates by extracting quantiles from the collection of simulated draws.&lt;br /&gt;&lt;br /&gt;&gt; quantile(MU,c(.05,.95))&lt;br /&gt;    5%       95%&lt;br /&gt;9.864888 10.065765&lt;br /&gt;&gt; quantile(BETA,c(.05,.95))&lt;br /&gt;    5%       95%&lt;br /&gt;0.2012635 0.6520858&lt;br /&gt;&lt;br /&gt;Last, suppose we are interested in predicting the 5th order statistic ys5 from a future sample of 20 observations.&lt;br /&gt;&lt;br /&gt;To simulate from the distribution of ys5, we (1) simulate (mu, beta) from the posterior, (2) simulate a future sample y1,...,y20 from the exponential distribution with parameters mu and beta, and (3) store the 5th ordered observation from the simulated sample.  We repeat this process 1000 times, obtaining a simulated sample from the predictive distribution of ys5.  
We display this predictive distribution by a histogram.&lt;br /&gt;&lt;br /&gt;ys5=rep(0,1000)&lt;br /&gt;for (j in 1:1000)&lt;br /&gt;{&lt;br /&gt;ys=rexp(20,rate=1/BETA[5000+j])+MU[5000+j]&lt;br /&gt;ys5[j]=sort(ys)[5]&lt;br /&gt;}&lt;br /&gt;hist(ys5,col="orange")&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/Rw11csjVbmI/AAAAAAAAAEk/uT6e1WbUaDc/s1600-h/mcmc3.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/Rw11csjVbmI/AAAAAAAAAEk/uT6e1WbUaDc/s320/mcmc3.jpg" alt="" id="BLOGGER_PHOTO_ID_5119877487246274146" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-8798209919738018535?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/8798209919738018535/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=8798209919738018535' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8798209919738018535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8798209919738018535'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/summarizing-posterior.html' title='Summarizing a posterior'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/Rw11O8jVbkI/AAAAAAAAAEU/QWS1dWL35rc/s72-c/mcmc1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-8370635760052432507</id><published>2007-10-10T05:24:00.001-07:00</published><updated>2007-10-10T16:58:30.709-07:00</updated><title type='text'>Learning from selected order statistics</title><content type='html'>To illustrate some computational methods for summarizing a posterior, I describe a missing data problem motivated by my undergraduate days at Bucknell.  Ken Kaminsky and Paul Nelson, two of my Bucknell professors, were interested in learning about populations based on selected order statistics.  (I wrote an undergraduate honors thesis on this topic.)  Here is a simple illustration of the problem.&lt;br /&gt;&lt;br /&gt;Suppose a sample y1, ..., y20 is taken from the two-parameter exponential distribution of the form f(y | mu, beta) = (1/beta) exp(-(y-mu)/beta), y &gt; mu.  But you don't observe the complete dataset -- all you observe are the two order statistics y(5) and y(15) (the order statistics are the observations arranged in ascending order).&lt;br /&gt;&lt;br /&gt;Based on this selected data, we wish to (1) estimate the parameters mu and beta by 90% interval estimates and (2) predict the value of the order statistics y*(5) and y*(20) from a future sample taken from the same population.&lt;br /&gt;&lt;br /&gt;Here's the plan:&lt;br /&gt;&lt;br /&gt;1.  First, we write the likelihood, which is the density of the observed data (y(5) and y(15)) given values of the exponential parameters mu and beta.  One can show that this likelihood is given, up to a multiplicative constant, by&lt;br /&gt;&lt;br /&gt;L(mu, beta) = f(y(5)) f(y(15)) F(y(5))^4 (F(y(15)) - F(y(5)))^9 (1 - F(y(15)))^5, mu&gt;0, beta&gt;0,&lt;br /&gt;&lt;br /&gt;where f and F denote the density and the distribution function of the two-parameter exponential distribution.&lt;br /&gt;&lt;br /&gt;2.  Assuming a flat (uniform) prior on (mu, beta), the posterior density is proportional to the likelihood.  
We write an R function kaminsky0.R that computes the logarithm of the posterior -- here the parameters are (mu, beta) and the data is (y(5), y(15)).&lt;br /&gt;&lt;br /&gt;kaminsky0=function(theta,data)&lt;br /&gt;{&lt;br /&gt;&lt;br /&gt;f=function(y,mu,beta)&lt;br /&gt;return(dexp(y-mu,rate=1/beta))&lt;br /&gt;F=function(y,mu,beta)&lt;br /&gt;return(pexp(y-mu,rate=1/beta))&lt;br /&gt;&lt;br /&gt;y5=data[1]; y15=data[2]&lt;br /&gt;mu=theta[,1]&lt;br /&gt;beta=theta[,2]&lt;br /&gt;&lt;br /&gt;loglike=log(f(y5,mu,beta))+log(f(y15,mu,beta))+&lt;br /&gt;4*log(F(y5,mu,beta))+9*log(F(y15,mu,beta)-F(y5,mu,beta))+&lt;br /&gt;5*log(1-F(y15,mu,beta))&lt;br /&gt;&lt;br /&gt;return(loglike)&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;3.  Graphing the posterior of (mu, beta), we see strong skewness in both parameters.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/RwzInMjVbiI/AAAAAAAAAEE/FCKWCgzXbco/s1600-h/untransformedpost.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/RwzInMjVbiI/AAAAAAAAAEE/FCKWCgzXbco/s320/untransformedpost.jpg" alt="" id="BLOGGER_PHOTO_ID_5119687452123295266" border="0" /&gt;&lt;/a&gt;It is usually helpful to transform to real-valued parameters&lt;br /&gt;&lt;br /&gt;theta1 = log(y(5) - mu), theta2 = log(beta).&lt;br /&gt;&lt;br /&gt;We write the following function kaminsky.R that computes the log posterior of (theta1, theta2).&lt;br /&gt;&lt;br /&gt;kaminsky=function(theta,data)&lt;br /&gt;{&lt;br /&gt;&lt;br /&gt;f=function(y,mu,beta)&lt;br /&gt;return(dexp(y-mu,rate=1/beta))&lt;br /&gt;F=function(y,mu,beta)&lt;br /&gt;return(pexp(y-mu,rate=1/beta))&lt;br /&gt;&lt;br /&gt;y5=data[1]; y15=data[2]&lt;br /&gt;mu=y5-exp(theta[,1])&lt;br /&gt;beta=exp(theta[,2])&lt;br /&gt;&lt;br /&gt;loglike=log(f(y5,mu,beta))+log(f(y15,mu,beta))+&lt;br 
/&gt;4*log(F(y5,mu,beta))+9*log(F(y15,mu,beta)-F(y5,mu,beta))+&lt;br /&gt;5*log(1-F(y15,mu,beta))&lt;br /&gt;&lt;br /&gt;logjack=theta[,1]+theta[,2]&lt;br /&gt;&lt;br /&gt;return(loglike+logjack)&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;Here's a graph of the posterior of the reexpressed parameters -- note that it is much more normal-shaped.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/RwzIsMjVbjI/AAAAAAAAAEM/0zI2HimBMs0/s1600-h/transformedpost.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/RwzIsMjVbjI/AAAAAAAAAEM/0zI2HimBMs0/s320/transformedpost.jpg" alt="" id="BLOGGER_PHOTO_ID_5119687538022641202" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;4.  We'll use several functions in the next posting to summarize the posterior.&lt;br /&gt;&lt;br /&gt;(a)  The laplace function is useful in finding the posterior mode and normal approximation to the posterior.&lt;br /&gt;&lt;br /&gt;(b)  By use of the rwmetrop function, we construct a random-walk Metropolis algorithm to simulate from the joint posterior.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-8370635760052432507?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/8370635760052432507/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=8370635760052432507' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8370635760052432507'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/8370635760052432507'/><link 
rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/learning-from-selected-order-statistics.html' title='Learning from selected order statistics'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/RwzInMjVbiI/AAAAAAAAAEE/FCKWCgzXbco/s72-c/untransformedpost.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-4276497872169151084</id><published>2007-10-07T10:15:00.000-07:00</published><updated>2007-10-07T10:52:41.770-07:00</updated><title type='text'>The SIR algorithm</title><content type='html'>There is a simple method, called the SIR algorithm, of taking a simulated sample of draws from one distribution p, and using these draws to produce a sample from a different distribution g.  We illustrate this method for the Cauchy sampling model example introduced in the last post.&lt;br /&gt;&lt;br /&gt;Suppose that we have a proposal density p(theta) that we believe is a rough approximation to the posterior g(theta) (in terms of location and spread).  Here we suppose that a t density with mean 7, variance 9 and degrees of freedom 3 is a rough approximation to our bimodal posterior density for theta.&lt;br /&gt;&lt;br /&gt;There are three steps in the SIR algorithm.&lt;br /&gt;&lt;br /&gt;1. (S) We Sample 1000 draws from the proposal density p.  We store these in the vector theta.p.&lt;br /&gt;&lt;br /&gt;theta.p=sqrt(VAR)*rt(1000,DF)+MEAN&lt;br /&gt;&lt;br /&gt;2. 
(I) We compute Importance sampling weights for this sample equal to the ratios of the target density (g) to the proposal density (p).  Since theta.p is a t variate that has been rescaled by sqrt(VAR) and shifted by MEAN, the proposal density standardizes theta.p in the same way.&lt;br /&gt;&lt;br /&gt;p.theta=dt((theta.p-MEAN)/sqrt(VAR),DF)/sqrt(VAR)&lt;br /&gt;g.theta=exp(cpost(theta.p,y))&lt;br /&gt;weights=g.theta/p.theta&lt;br /&gt;&lt;br /&gt;The following figure plots the weights against the simulated draws from the proposal density.&lt;br /&gt;&lt;br /&gt;plot(theta.p,weights,xlim=c(2,12),xlab="THETA",ylab="POST/PROPOSAL")&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RwkcwMjVbhI/AAAAAAAAAD8/YAuv8iWpr9w/s1600-h/sirplot1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RwkcwMjVbhI/AAAAAAAAAD8/YAuv8iWpr9w/s320/sirplot1.jpg" alt="" id="BLOGGER_PHOTO_ID_5118654065812008466" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;3. (R)  We resample 1000 draws with replacement from the simulated draws theta.p, where the sampling probabilities are proportional to the weights.&lt;br /&gt;&lt;br /&gt;probs=weights/sum(weights)&lt;br /&gt;theta.sample=sample(theta.p,size=1000,prob=probs,replace=TRUE)&lt;br /&gt;&lt;br /&gt;The values in the vector theta.sample should be (approximately) from the posterior density g.&lt;br /&gt;&lt;br /&gt;The SIR algorithm with a t proposal density is implemented in the function sir.R in the LearnBayes package.&lt;br /&gt;&lt;br /&gt;To illustrate this function, remember the definition of the log posterior is in cpost and the data is stored in the vector y.  
We create a list tpar that contains the components of the t proposal density.&lt;br /&gt;&lt;br /&gt;MEAN=7; VAR=9; DF=3&lt;br /&gt;tpar=list(m=MEAN,var=VAR,df=DF)&lt;br /&gt;&lt;br /&gt;Then we implement the algorithm using the sir function -- the output is a vector of simulated draws from the posterior.&lt;br /&gt;&lt;br /&gt;s=sir(cpost,tpar,1000,y)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-4276497872169151084?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/4276497872169151084/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=4276497872169151084' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4276497872169151084'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4276497872169151084'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/sir-algorithm.html' title='The SIR algorithm'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_V8g1rNtmHuM/RwkcwMjVbhI/AAAAAAAAAD8/YAuv8iWpr9w/s72-c/sirplot1.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-7696258324388464064</id><published>2007-10-02T15:11:00.000-07:00</published><updated>2007-10-02T18:28:20.561-07:00</updated><title type='text'>Robust 
Modeling</title><content type='html'>To illustrate some of the computational methods to summarize a posterior, suppose that we observe a sample y1, ..., yn from a Cauchy density with location theta and scale parameter 1.  If we assign a uniform prior to theta, then the posterior density of theta is proportional to&lt;br /&gt;&lt;br /&gt;product from i=1 to n [ (1 + (yi - theta)^2)^{-1} ].&lt;br /&gt;&lt;br /&gt;Suppose we observe the following 20 values:&lt;br /&gt;&lt;br /&gt;1.3  1.9  3.2  5.1  1.4  5.8  2.5  4.7  2.9  5.2 11.6  8.8  8.5 10.7 10.2&lt;br /&gt;9.8 12.9  7.2  8.1  9.5&lt;br /&gt;&lt;br /&gt;In R, we define a new function cpost.R that computes the logarithm of the posterior density.&lt;br /&gt;&lt;br /&gt;cpost=function(theta,y)&lt;br /&gt;{&lt;br /&gt;val=0*theta&lt;br /&gt;for (j in 1:length(y))&lt;br /&gt;val=val+dt(y[j]-theta,df=1,log=TRUE)&lt;br /&gt;return(val)&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;We compute and display the posterior on the interval [2, 12].&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/RwLDfVk2HZI/AAAAAAAAADc/qpAzJZPe3Vc/s1600-h/plot1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/RwLDfVk2HZI/AAAAAAAAADc/qpAzJZPe3Vc/s320/plot1.jpg" alt="" id="BLOGGER_PHOTO_ID_5116867069781351826" border="0" /&gt;&lt;/a&gt;Note the interesting bimodal shape of the posterior.  Clearly a normal approximation to this posterior will not be an accurate representation.&lt;br /&gt;&lt;br /&gt;Suppose we wish to simulate a sample from this posterior.  A general method for generating a sample is the reject algorithm.  To construct a reject algorithm, we find a suitable density p that is easy to simulate from and covers the posterior g in the sense that g(theta) &lt;= c p(theta) for all theta, for some constant c.  
Then one simulates a candidate theta from p and a variate u from a uniform(0, 1) distribution; if u &lt; g(theta)/(c p(theta)), then we accept theta as a draw from the target distribution g.  Suppose we let p be a t density with mean mu, variance v, and degrees of freedom df.   Here we let &lt;br /&gt;&lt;br /&gt;MEAN=7; VAR=9; DF=3 &lt;br /&gt;&lt;br /&gt;To find the bounding constant, we plot the logarithm of the ratio, log g(theta) - log p(theta), over the range of theta. &lt;br /&gt;&lt;br /&gt;p.density=dt((theta-MEAN)/sqrt(VAR),DF)/sqrt(VAR)&lt;br /&gt;plot(theta,log(post/p.density),type="l")  &lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/RwLGkVk2HaI/AAAAAAAAADk/IkEBXkilPmI/s320/plot2.jpg" alt="" id="BLOGGER_PHOTO_ID_5116870454215581090" border="0" /&gt;From inspection of the graph (and a little calculation), it appears that the log of the ratio is bounded above by -62.98.&lt;br /&gt;&lt;br /&gt;In the R code, we draw the posterior density and the constant times the proposal density that covers the posterior.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RwLHslk2HbI/AAAAAAAAADs/0QFC-21VMoM/s1600-h/plot3.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RwLHslk2HbI/AAAAAAAAADs/0QFC-21VMoM/s320/plot3.jpg" alt="" id="BLOGGER_PHOTO_ID_5116871695461129650" border="0" /&gt;&lt;/a&gt;The function rejectsampling.R will implement reject sampling with a t covering density.  The inputs to this function are (1) the definition of the log target density, (2) a list giving the parameters of the t covering density (mean, variance, df), (3) the value of the log of the bounding constant, (4) the number of values simulated from the proposal density, and (5) the data used in the target function.  
The output of this function is a vector of values from the target distribution.&lt;br /&gt;&lt;br /&gt;tpar=list(m=MEAN,var=VAR,df=DF)&lt;br /&gt;dmax=-62.98&lt;br /&gt;n=10000&lt;br /&gt;theta.sim=rejectsampling(cpost,tpar,dmax,n,y)&lt;br /&gt;&lt;br /&gt;One can compute the acceptance rate of this algorithm by dividing the length of the output vector theta.sim by the number of simulated draws from the proposal.  The acceptance rate for this example is about 12%.  To demonstrate that this algorithm works, we draw in the following figure (1) the exact posterior in red and (2) a density estimate of about 6000 simulated draws in blue.  I'm convinced that the algorithm has indeed produced a sample from the posterior distribution.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RwLIh1k2HcI/AAAAAAAAAD0/jTR2i9vK6ik/s1600-h/plot4.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/RwLIh1k2HcI/AAAAAAAAAD0/jTR2i9vK6ik/s320/plot4.jpg" alt="" id="BLOGGER_PHOTO_ID_5116872610289163714" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-7696258324388464064?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/7696258324388464064/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=7696258324388464064' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7696258324388464064'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7696258324388464064'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/robust-modeling.html' title='Robust Modeling'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/RwLDfVk2HZI/AAAAAAAAADc/qpAzJZPe3Vc/s72-c/plot1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-7636870621090887032</id><published>2007-10-01T15:41:00.000-07:00</published><updated>2007-10-02T18:38:37.775-07:00</updated><title type='text'>Normal approximations to posteriors</title><content type='html'>Continuing our Phillies example, I'm going to change the example somewhat and consider the relationship between the number of runs the Phillies score and the game outcome.  
Here is the table:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;                                                runs scored&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;game outcome low (four or less) high (five or more)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Win                           15                   74&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Loss                          53                   20&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There appears to be a notable positive relationship here and we're interested in estimating the underlying correlation coefficient of the bivariate normal.&lt;br /&gt;&lt;br /&gt;We first construct a function polycorr.R that computes the logarithm of the posterior of the correlation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;polycorr=function(rho,data)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;{&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;# needs package mvtnorm&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;n11=data[1,1]; n12=data[1,2]; n21=data[2,1]; n22=data[2,2]&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;nA=n11+n12; nB=n11+n21; n=n11+n12+n21+n22&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;c=qnorm(nA/n); d=qnorm(nB/n); pc=nA/n; pd=nB/n&lt;/span&gt;  &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;val=0*rho&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;for (j in 1:length(rho))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;{&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;C=matrix(c(1,rho[j],rho[j],1),c(2,2))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;p00=pmvnorm(lower=-Inf*c(1,1),upper=c(c,d),corr=C)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br 
/&gt;val[j]=n11*log(pc-p00)+n12*log(1-pc-pd+p00)+n21*log(p00)+n22*log(pd-p00)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;}&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;return(val)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;We input the data as a two by two matrix.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;data=matrix(c(15,53,74,20),c(2,2))&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We find the normal approximation by use of the laplace function in the LearnBayes package.  The inputs are the function, the starting value for the Newton algorithm, the number of iterations of the algorithm, and the data used in the function.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;fit=laplace(polycorr,.6,10,data)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;From the output of this function, we get that rho is approximately N(.694, .00479).  In the R code below we plot the exact and approximate posteriors.  
In the figure, we see some inaccuracy in the normal approximation.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;rho=seq(.3,1,by=.01)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;gpost=exp(polycorr(rho,data))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;plot(rho,gpost/sum(gpost)/.01,type="l",lwd=3,col="red",&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;              ylab="DENSITY",xlab="RHO")&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;lines(rho,dnorm(rho,fit$mode,sqrt(fit$var)),lwd=3,col="blue")&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;legend(locator(1),c("EXACT","NORMAL APPROX"),col=c("red","blue"),lwd=2)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RwF8A1k2HXI/AAAAAAAAADM/Q33QV0ktj48/s1600-h/figure1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RwF8A1k2HXI/AAAAAAAAADM/Q33QV0ktj48/s320/figure1.jpg" alt="" id="BLOGGER_PHOTO_ID_5116507005493058930" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-family:georgia;"&gt;One way of improving the accuracy of the normal approximation is to transform rho to the real-valued parameter theta = log [(rho+1)/(1-rho)].  
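&lt;br /&gt;&lt;br /&gt;As a quick sanity check on this change of variables, here is a minimal sketch (not part of the original analysis; the names rho.back, drho and h are only illustrative).  It checks numerically that the inverse transformation is rho = (exp(theta)-1)/(exp(theta)+1) and that d rho/d theta = (1-rho)(1+rho)/2.  The logarithm of this derivative is, up to the constant -log 2, the Jacobian term log(1-rho)+log(1+rho) that gets added to the log posterior of theta.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;# illustrative check of the transformation and its Jacobian&lt;br /&gt;rho=seq(-.9,.9,by=.3)&lt;br /&gt;theta=log((1+rho)/(1-rho))&lt;br /&gt;rho.back=(exp(theta)-1)/(exp(theta)+1)&lt;br /&gt;all.equal(rho,rho.back)  # compares rho with its round trip through theta&lt;br /&gt;h=1e-6  # step size for a central difference&lt;br /&gt;drho=((exp(theta+h)-1)/(exp(theta+h)+1)-(exp(theta-h)-1)/(exp(theta-h)+1))/(2*h)&lt;br /&gt;all.equal(drho,(1-rho)*(1+rho)/2)  # compares numerical and analytic derivatives&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;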
We write a function to compute the log posterior of theta.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;polycorr2=function(theta,data)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;{&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;# needs package mvtnorm&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;rho=(exp(theta)-1)/(exp(theta)+1)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;n11=data[1,1]; n12=data[1,2]; n21=data[2,1]; n22=data[2,2]&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;nA=n11+n12; nB=n11+n21; n=n11+n12+n21+n22&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;c=qnorm(nA/n); d=qnorm(nB/n); pc=nA/n; pd=nB/n&lt;/span&gt;  &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;val=0*rho&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;for (j in 1:length(rho))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;{&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;C=matrix(c(1,rho[j],rho[j],1),c(2,2))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;p00=pmvnorm(lower=-Inf*c(1,1),upper=c(c,d),corr=C)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;val[j]=n11*log(pc-p00)+n12*log(1-pc-pd+p00)+n21*log(p00)+n22*log(pd-p00)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;return(val+log(1-rho)+log(1+rho))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We find the normal approximation using the function laplace.  
Here the approximation is that the transformed rho is normal with mean 1.662 and variance .0692.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;fit1=laplace(polycorr2,0,10,data)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We plot the exact and approximate posteriors -- here the normal approximation appears very accurate.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;theta=seq(0.4,3.0,by=.01)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;gpost=exp(polycorr2(theta,data))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;plot(theta,gpost/sum(gpost)/.01,type="l",lwd=3,col="red",&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;            ylab="DENSITY",xlab="THETA")&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;lines(theta,dnorm(theta,fit1$mode,sqrt(fit1$var)),lwd=3,col="blue")&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;legend(locator(1),c("EXACT","NORMAL APPROX"),col=c("red","blue"),lwd=2)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RwF8L1k2HYI/AAAAAAAAADU/DuuLuUWBrr8/s1600-h/figure2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RwF8L1k2HYI/AAAAAAAAADU/DuuLuUWBrr8/s320/figure2.jpg" alt="" id="BLOGGER_PHOTO_ID_5116507194471619970" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;span style="font-family:georgia;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-7636870621090887032?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://learnbayes.blogspot.com/feeds/7636870621090887032/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=7636870621090887032' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7636870621090887032'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7636870621090887032'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/normal-approximations-to-posteriors.html' title='Normal approximations to posteriors'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_V8g1rNtmHuM/RwF8A1k2HXI/AAAAAAAAADM/Q33QV0ktj48/s72-c/figure1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-4458848943628306395</id><published>2007-10-01T06:03:00.000-07:00</published><updated>2007-10-01T06:30:02.072-07:00</updated><title type='text'>Tribute to the Phillies</title><content type='html'>As some of you might know, the Philadelphia Phillies are in the Major League Baseball playoffs, which is pretty amazing.  So we'll have to fit a model to some Phillies data.  For each game of the 2007 season, we'll record&lt;br /&gt;&lt;br /&gt;(1) whether they won or lost the game&lt;br /&gt;(2) the margin of victory, which is equal to the winner's score minus the loser's score&lt;br /&gt;&lt;br /&gt;We are interested in exploring the relationship between these two variables.  
Suppose we classify the margin of victory as "close" (3 runs or fewer) or a "blowout" (4 runs or more).   Here is a 2 x 2 contingency table classifying all games by result and margin of victory:&lt;br /&gt;&lt;br /&gt;                          &lt;span style="font-family:courier new;"&gt;    margin&lt;br /&gt;    close blowout&lt;br /&gt;  L    44      29&lt;br /&gt;  W    51      38&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;One of the oldest approaches to estimating the relationship between two ordinal variables is the polychoric coefficient.  One assumes that there is an underlying bivariate normal distribution with zero means, unit variances and correlation rho.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/RwD1E1k2HWI/AAAAAAAAADE/Xr7HJ6-nOVA/s1600-h/binorm1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/RwD1E1k2HWI/AAAAAAAAADE/Xr7HJ6-nOVA/s320/binorm1.jpg" alt="" id="BLOGGER_PHOTO_ID_5116358640142785890" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The observed counts are found by dividing this continuous measure by the cutpoints c (on the x scale) and d (on the y scale).  One can estimate the cutpoints from the data (here one solves Phi(c) = 73/162 and Phi(d) = 95/162, the observed proportions of losses and of close games), and the likelihood of the correlation coefficient rho is given by&lt;br /&gt;&lt;br /&gt;L(rho) = p1^44 p2^29 p3^51 p4^38,&lt;br /&gt;&lt;br /&gt;where p1, p2, p3, p4 are the probabilities (dependent on rho) that the bivariate normal falls in the four regions divided by the cutpoints c and d.  
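&lt;br /&gt;&lt;br /&gt;To make the four probabilities concrete, here is a minimal sketch (not from the original post) that evaluates them for a trial value of rho using the function pmvnorm in the mvtnorm package.  It assumes that the x scale corresponds to the game result (a loss falls below c) and the y scale to the margin (a close game falls below d); the names counts, cut.c, cut.d, cell.probs and loglike are only illustrative.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;library(mvtnorm)&lt;br /&gt;counts=matrix(c(44,51,29,38),c(2,2))  # rows L, W; columns close, blowout&lt;br /&gt;n=sum(counts)&lt;br /&gt;cut.c=qnorm(sum(counts[1,])/n)  # cutpoint c from the proportion of losses&lt;br /&gt;cut.d=qnorm(sum(counts[,1])/n)  # cutpoint d from the proportion of close games&lt;br /&gt;cell.probs=function(rho)&lt;br /&gt;{&lt;br /&gt;C=matrix(c(1,rho,rho,1),c(2,2))&lt;br /&gt;# p1 is the probability of the region below both cutpoints&lt;br /&gt;p1=as.numeric(pmvnorm(lower=c(-Inf,-Inf),upper=c(cut.c,cut.d),corr=C))&lt;br /&gt;p2=pnorm(cut.c)-p1  # below c, above d&lt;br /&gt;p3=pnorm(cut.d)-p1  # above c, below d&lt;br /&gt;p4=1-p1-p2-p3  # above both cutpoints&lt;br /&gt;c(p1,p2,p3,p4)&lt;br /&gt;}&lt;br /&gt;# log likelihood of a single value of rho&lt;br /&gt;loglike=function(rho) sum(c(44,29,51,38)*log(cell.probs(rho)))&lt;br /&gt;loglike(0.1)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Evaluating loglike over a grid of rho values between -1 and 1 traces out the log likelihood.&lt;br /&gt;&lt;br /&gt;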
If we place a uniform prior on rho, then the posterior density will be proportional to the likelihood.&lt;br /&gt;&lt;br /&gt;We'll use this example to illustrate different computational approaches to summarizing the posterior distribution.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-4458848943628306395?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/4458848943628306395/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=4458848943628306395' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4458848943628306395'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4458848943628306395'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/10/tribute-to-phillies.html' title='Tribute to the Phillies'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/RwD1E1k2HWI/AAAAAAAAADE/Xr7HJ6-nOVA/s72-c/binorm1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-2703530759104158805</id><published>2007-09-26T13:13:00.001-07:00</published><updated>2007-09-26T13:33:09.030-07:00</updated><title type='text'>Inferences for Gamma Sampling Problem</title><content type='html'>In the previous post, I considered the problem of modeling 
lengths of cell phone calls.  Here we focus on several types of inferences and predictions that might be of interest.&lt;br /&gt;&lt;br /&gt;Following the general computing strategy described in Chapter 5 of BCWR, I first transform the gamma parameters (alpha, beta) to (theta1 = log alpha, theta2 = log mu= log (alpha beta)).   The function gamma.sampling.post computes the posterior of (theta1, theta2).  The function mycontour draws a contour plot and the function simcontour simulates from this grid.  The figure shows the contour plot with the simulated draws placed on top.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; y=c(12.2,.9,.8,5.3,2,1.2,1.2,1,.3,1.8,3.1,2.8)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; library(LearnBayes)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; gamma.sampling.post=function(theta,data)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;+ {&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;+ a=exp(theta[,1])&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;+ mu=exp(theta[,2])&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;+ n=length(data)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;+ val=0*a&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;+ for (i in 1:n) val=val+dgamma(data[i],shape=a,scale=mu/a,log=TRUE)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;+ return(val-log(a)+log(a)+log(mu))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;+ }&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;br /&gt;&gt; mycontour(gamma.sampling.post,c(-1.5,1.5,0,3),y)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; title(main="POSTERIOR OF (LOG ALPHA, LOG MU)",xlab="log alpha",&lt;/span&gt; &lt;span style="font-family:courier 
new;"&gt;&lt;br /&gt;+   ylab="log mu")&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; s=simcontour(gamma.sampling.post,c(-1.5,1.5,0,3),y,1000)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; points(s$x,s$y)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RvrA11k2HVI/AAAAAAAAAC8/Ii374k7sl2I/s1600-h/nplot2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/RvrA11k2HVI/AAAAAAAAAC8/Ii374k7sl2I/s320/nplot2.jpg" alt="" id="BLOGGER_PHOTO_ID_5114612357979839826" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Suppose we are interested in the mean length of cell phone calls mu.  In particular, what is the probability that the mean length exceeds 4 minutes?  The figure displays a density estimate of the simulated draws of mu, and I have labeled the desired probability.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; mu=exp(s$y)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; alpha=exp(s$x)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; beta=mu/alpha&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; ss=density(mu)  # density estimate of mu, reused for the line at mu = 4&lt;br /&gt;&gt; plot(ss,main="POSTERIOR OF MEAN LENGTH",xlab="mu",lwd=3)&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; lines(c(4,4),c(0,approx(ss$x,ss$y,xout=4)$y),lwd=3)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; text(8,.15,"P(MU &gt; 4) = 0.178")&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; arrows(7,.1,4.5,.05,lwd=2)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RvrAslk2HUI/AAAAAAAAAC0/tyMHwlan9OQ/s1600-h/nplot3.jpg"&gt;&lt;img style="margin: 0px auto 
10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RvrAslk2HUI/AAAAAAAAAC0/tyMHwlan9OQ/s320/nplot3.jpg" alt="" id="BLOGGER_PHOTO_ID_5114612199066049858" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Next, suppose we are interested in the predictive distribution of the length of a single cell phone call.  Since we have already collected simulated draws from the posterior of (alpha, beta), it just takes one additional command to simulate the predictive distribution of y* (using the function rgamma).  I have displayed a density estimate of the predictive density.&lt;br /&gt;&lt;br /&gt;Note that the probability that the mean call length exceeds 4 minutes is 0.178, while the probability that a single future call exceeds 4 minutes is 0.263.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; ys=rgamma(1000,shape=alpha,scale=beta)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; plot(density(ys),xlab="CALL LENGTH", lwd=3, main="POSTERIOR PREDICTIVE DENSITY")&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; mean(ys&gt;4)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;[1] 0.263&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/RvrAjFk2HTI/AAAAAAAAACs/ePnxFtluNS4/s1600-h/nplot4.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/RvrAjFk2HTI/AAAAAAAAACs/ePnxFtluNS4/s320/nplot4.jpg" alt="" id="BLOGGER_PHOTO_ID_5114612035857292594" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Last, suppose you plan on making 20 calls next month and you're interested in the total amount of time used.  
By use of a loop, we simulate 20 draws from the predictive distribution -- the variable ysum contains 1000 realizations of the total.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; ysum=rep(0,1000)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; for (j in 1:20) ysum=ysum+rgamma(1000,shape=alpha,scale=beta)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; hist(ysum, main="PREDICTIVE DISTRIBUTION OF LENGTH OF 20 CALLS")&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RvrAWlk2HSI/AAAAAAAAACk/bdvrJQIQwMM/s1600-h/nplot5.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RvrAWlk2HSI/AAAAAAAAACk/bdvrJQIQwMM/s320/nplot5.jpg" alt="" id="BLOGGER_PHOTO_ID_5114611821108927778" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-2703530759104158805?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/2703530759104158805/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=2703530759104158805' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2703530759104158805'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2703530759104158805'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/inferences-for-gamma-sampling-problem.html' title='Inferences for Gamma Sampling Problem'/><author><name>Jim 
Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/RvrA11k2HVI/AAAAAAAAAC8/Ii374k7sl2I/s72-c/nplot2.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-7092830650576190015</id><published>2007-09-25T17:19:00.000-07:00</published><updated>2007-09-25T17:44:29.791-07:00</updated><title type='text'>Modeling Cell Phone Call Durations with a Gamma Density</title><content type='html'>Suppose we observe a sample y1, ..., yn from a gamma(alpha, beta) density where the sampling density is proportional to y^{alpha-1} exp(-y/beta), and we assign a uniform prior on (alpha, beta). &lt;br /&gt;&lt;br /&gt;As an example, suppose we wish to fit a gamma density to the durations (in minutes) of a group of cell phone calls.&lt;br /&gt;&lt;br /&gt;12.2  0.9  0.8  5.3  2.0  1.2  1.2  1.0  0.3  1.8  3.1  2.8&lt;br /&gt;&lt;br /&gt;Here is the R function that computes the log posterior of the density:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;gamma.sampling.post1=function(theta,data)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;{&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;a=theta[,1]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;b=theta[,2]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;n=length(data)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;val=0*a&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;for (i in 1:n) val=val+dgamma(data[i],shape=a,scale=b,log=TRUE)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier 
new;"&gt;return(val)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;The first figure is a contour graph of the posterior density of (alpha, beta).  (In R, beta is called the scale parameter.)&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RvmmZFk2HPI/AAAAAAAAACM/qIfM6TWk3Xw/s1600-h/plot.par1.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/RvmmZFk2HPI/AAAAAAAAACM/qIfM6TWk3Xw/s320/plot.par1.jpg" alt="" id="BLOGGER_PHOTO_ID_5114301801779567858" border="0" /&gt;&lt;/a&gt;Note the strong curvature in the posterior.&lt;br /&gt;&lt;br /&gt;Instead, suppose we consider the joint posterior of alpha and the "rate" parameter theta = 1/beta.  Here is a contour plot of the posterior of (alpha, theta).&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/Rvmm0Fk2HQI/AAAAAAAAACU/k9Eq0FWyRJI/s1600-h/plot.par2.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/Rvmm0Fk2HQI/AAAAAAAAACU/k9Eq0FWyRJI/s320/plot.par2.jpg" alt="" id="BLOGGER_PHOTO_ID_5114302265636035842" border="0" /&gt;&lt;/a&gt;This doesn't display the strong curvature.&lt;br /&gt;&lt;br /&gt;Last, suppose you consider the joint posterior of alpha and the mean mu = alpha beta.  
The last figure displays the posterior of (alpha, mu).&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/RvmnVlk2HRI/AAAAAAAAACc/NsCpUupEya4/s1600-h/plot.par3.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/RvmnVlk2HRI/AAAAAAAAACc/NsCpUupEya4/s320/plot.par3.jpg" alt="" id="BLOGGER_PHOTO_ID_5114302841161653522" border="0" /&gt;&lt;/a&gt;The moral here is that the choice of parameterization can be important when summarizing the posterior distribution.  In the next chapter, we'll suggest a rule of thumb for transforming parameters that makes it easier to summarize many posteriors.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-7092830650576190015?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/7092830650576190015/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=7092830650576190015' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7092830650576190015'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7092830650576190015'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/modeling-cell-phone-call-durations-with.html' title='Modeling Cell Phone Call Durations with a Gamma Density'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/RvmmZFk2HPI/AAAAAAAAACM/qIfM6TWk3Xw/s72-c/plot.par1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-3424985737579694187</id><published>2007-09-24T18:04:00.000-07:00</published><updated>2007-09-24T18:23:53.819-07:00</updated><title type='text'>Fitting a Beta Sampling Model</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/Rvhhq1k2HNI/AAAAAAAAAB8/HJrePxx9O8k/s1600-h/plot1.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/Rvhhq1k2HNI/AAAAAAAAAB8/HJrePxx9O8k/s320/plot1.jpg" alt="" id="BLOGGER_PHOTO_ID_5113944765443218642" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;To illustrate a "brute-force" method of summarizing a posterior, suppose that we observe a sample y1, ..., yn from a beta distribution with parameters a and b.  If we assign (a, b) a uniform prior, then the posterior density is given by&lt;br /&gt;&lt;br /&gt;g(a, b | data) propto prod_{i=1}^n  f(y_i; a, b),&lt;br /&gt;&lt;br /&gt;where f(y; a, b) = 1/B(a, b) y^(a-1) (1-y)^(b-1) is the beta density.   As an example, suppose we are given the following proportions of students who are "math proficient" on the Ohio Graduation Test for a random sample of 20 schools in Ohio.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;y=c(0.955, 0.819, 0.472, 0.925, 0.780, 0.931, 0.945, 0.926, 0.852, 0.920, 0.885, 0.890,&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; 0.789, 0.973, 0.831, 0.835, 0.884, 0.904, 0.900, 0.806)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here is our method:&lt;br /&gt;&lt;br /&gt;1.  
We write a short R function betasampling.post.R that computes the logarithm of the posterior density.  Note that the built-in function dbeta is used -- the log=TRUE option gives the logarithm of the beta density.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;betasampling.post=function(theta,data)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;{&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;a=theta[,1]&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;b=theta[,2]&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;n=length(data)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;val=0*a&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;for (i in 1:n) val=val+dbeta(data[i],a,b,log=TRUE)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;return(val)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;2.  Next, by trial and error, we find a rectangle (a_lo, a_hi, b_lo, b_hi) that contains the contour plot of the joint posterior (remember that the contours given in the function mycontour.R are located at 10%, 1%, and 0.1% of the height of the density at the mode.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;mycontour(betasampling.post,c(.001,35,.001,6),y)&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;title(main="Posterior density of (a, b)",xlab="a",ylab="b")&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;3.  
We then sample from the grid of values of (a, b) that is used in constructing the scatterplot.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;s=simcontour(betasampling.post,c(.001,35,.001,6),y,1000)&lt;/span&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RvhhyFk2HOI/AAAAAAAAACE/w8E5rBL3OV0/s1600-h/plot2.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RvhhyFk2HOI/AAAAAAAAACE/w8E5rBL3OV0/s320/plot2.jpg" alt="" id="BLOGGER_PHOTO_ID_5113944889997270242" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;4.  Suppose we are interested in the marginal posterior densities of a and b.  We find these by use of density estimates on the simulated draws of a and b.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;par(mfrow=c(2,1))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;plot(density(s$x),main="POSTERIOR OF a",xlab="a",lwd=3)&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;plot(density(s$y),main="POSTERIOR OF b",xlab="b",lwd=3)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-3424985737579694187?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/3424985737579694187/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=3424985737579694187' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/3424985737579694187'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/3424985737579694187'/><link 
rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/fitting-beta-sampling-model.html' title='Fitting a Beta Sampling Model'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/Rvhhq1k2HNI/AAAAAAAAAB8/HJrePxx9O8k/s72-c/plot1.jpg' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-1529085132475284535</id><published>2007-09-23T16:19:00.000-07:00</published><updated>2007-09-23T16:30:36.318-07:00</updated><title type='text'>Conditional means prior</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/Rvb3EFk2HMI/AAAAAAAAAB0/5gVbof-83J8/s1600-h/conditionalmeansplot.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/Rvb3EFk2HMI/AAAAAAAAAB0/5gVbof-83J8/s320/conditionalmeansplot.jpg" alt="" id="BLOGGER_PHOTO_ID_5113546076514032834" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In an earlier post, we illustrated Bayesian fitting of a logistic model using a noninformative prior.  Suppose instead that we have subjective beliefs about the regression vector.  A convenient way of representing these beliefs is by use of a conditional means prior.  We illustrate this for our math placement example.&lt;br /&gt;&lt;br /&gt;First, we consider the probability p1 that a student in placement level 1 receives an A in the class.  Our best guess at p1 is 0.05 and this belief is worth 200 observations -- we match this info to a beta(200*0.05, 200*0.95) prior.  
Next, we consider the probability p5 that a student in level 5 gets an A -- our guess at this probability is 0.15 and this guess is worth 200 observations.  We match this belief to a beta(200*0.15, 200*0.85) prior.  Assuming that our beliefs about p1 and p5 are independent, the joint prior on (p1, p5) is a product of beta densities.  Transforming back to the (beta0, beta1) scale, one can show that the prior on beta is given by&lt;br /&gt;&lt;br /&gt;g(beta) = p1^a1 (1-p1)^b1 p5^a2 (1-p5)^b2&lt;br /&gt;&lt;br /&gt;(Note that the conditional means prior translates to the same functional form as the likelihood where the beta parameters a1, b1, a2, b2 play the role of "prior data".)&lt;br /&gt;&lt;br /&gt;The accompanying figure displays (in red) a sample from the likelihood and (in blue) a sample from the prior.  Here we see some conflict between the prior beliefs and the data information about the parameter.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-1529085132475284535?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/1529085132475284535/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=1529085132475284535' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/1529085132475284535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/1529085132475284535'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/conditional-means-prior.html' title='Conditional means prior'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_V8g1rNtmHuM/Rvb3EFk2HMI/AAAAAAAAAB0/5gVbof-83J8/s72-c/conditionalmeansplot.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-7149496099261515153</id><published>2007-09-22T11:33:00.000-07:00</published><updated>2007-09-22T12:04:47.669-07:00</updated><title type='text'>Using a Mixture of Conjugate Priors</title><content type='html'>One way to extend the use of conjugate priors is by the use of discrete mixtures.   Here is a simple example.  Suppose you have a coin that you believe may be fair or biased towards heads.  If p represents the probability of flipping heads, then suppose your prior is&lt;br /&gt;&lt;br /&gt;g(p) = 0.5 beta(p, 20, 20) + 0.5 beta(p, 30, 10)&lt;br /&gt;&lt;br /&gt;Suppose that we flip the coin 30 times and get 20 heads.  It can be shown that the posterior density can also be represented by a mixture of beta distributions.&lt;br /&gt;&lt;br /&gt;I have written a short R function pbetamix that computes the posterior distribution when a proportion has a prior that is a mixture of beta distributions.&lt;br /&gt;&lt;br /&gt;The matrix bpar contains the beta parameters where each row gives the beta parameters for each component.  The vector prob gives the prior probabilities of the components.  The vector data contains the number of successes and failures.  
Here are the values of these components for the example.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; bpar=rbind(c(20,20),c(30,10))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; prob=c(.5,.5)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; data=c(20,10)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We use the function pbetamix with the inputs prob, bpar, and data.  The output is the posterior probabilities of the components and the beta parameters of each component.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; pbetamix(prob,bpar,data)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;$probs&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;[1] 0.3457597 0.6542403&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;$betapar&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;     [,1] [,2]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;[1,]   40   30&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;[2,]   50   20&lt;/span&gt;&lt;br /&gt;&lt;a style="font-family: courier new;" onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/RvVlGlk2HLI/AAAAAAAAABs/HJ224P_xNUQ/s1600-h/mixbeta.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/RvVlGlk2HLI/AAAAAAAAABs/HJ224P_xNUQ/s320/mixbeta.jpg" alt="" id="BLOGGER_PHOTO_ID_5113104115789339826" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The posterior density is&lt;br /&gt;&lt;br /&gt;g(p | data) = 0.346 beta(p, 40, 30) + 0.654 beta(p, 50, 20)&lt;br /&gt;&lt;br /&gt;We plot the prior and posterior densities on a grid of values of p.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; p=seq(0,1,length=500)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; plot(p,.5*dbeta(p,20,20)+.5*dbeta(p,30,10),type="l",ylim=c(0,5),col="red",lwd=2)&lt;/span&gt;&lt;br 
/&gt;&lt;span style="font-family:courier new;"&gt;&gt; lines(p,.346*dbeta(p,40,30)+.654*dbeta(p,50,20),lwd=2,col="blue")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; text(locator(n=1),"PRIOR")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; text(locator(n=1),"POSTERIOR")&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family: georgia;"&gt;Here the data (20 heads and 10 tails) is more supportive of the belief that the coin is biased and the posterior places more of its mass where p &gt; .5.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-7149496099261515153?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/7149496099261515153/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=7149496099261515153' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7149496099261515153'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7149496099261515153'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/using-mixture-of-conjugate-priors.html' title='Using a Mixture of Conjugate Priors'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/RvVlGlk2HLI/AAAAAAAAABs/HJ224P_xNUQ/s72-c/mixbeta.jpg' 
height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-4251411508920619828</id><published>2007-09-19T19:00:00.000-07:00</published><updated>2007-09-20T04:52:43.218-07:00</updated><title type='text'>Fitting a logistic model</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/RvHZuhnB-sI/AAAAAAAAABk/UBA8B-oSMc4/s1600-h/logisticplot1.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/RvHZuhnB-sI/AAAAAAAAABk/UBA8B-oSMc4/s320/logisticplot1.jpg" alt="" id="BLOGGER_PHOTO_ID_5112106445361511106" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Freshman students at BGSU take a mathematics placement test and a "placement score" is used to advise the student on the proper first mathematics course.   For each student taking a business calculus course, we record (1) his or her placement score and (2) his or her grade in the course.   There are five possible placement levels that we code as 1, 2, 3, 4, 5.   Let yi denote the number of students receiving A at placement level i.  We suppose that yi is binomial(ni, pi), where pi is the probability that a student at level i receives an A in the class.  We let the probabilities satisfy the logistic model&lt;br /&gt;&lt;br /&gt;logit(pi) = beta0 + beta1 i.&lt;br /&gt;&lt;br /&gt;Assuming a uniform prior for beta = (beta0, beta1), the posterior distribution is proportional to&lt;br /&gt;&lt;br /&gt;g(beta) = product pi^yi (1-pi)^(ni-yi).&lt;br /&gt;&lt;br /&gt;The log posterior of beta is defined in the R function logisticpost.R.&lt;br /&gt;&lt;br /&gt;We illustrate a "brute force" method of fitting this model.&lt;br /&gt;&lt;br /&gt;1.  We first read in the data -- we create three vectors y, n, and level.  
The matrix data has columns level, n, and y.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; y=c(2,15,29,39,15)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; n=c(34,170,283,243,59)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; level=1:5&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; data=cbind(level,n,y)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; data&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;   level   n  y&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;[1,]     1  34  2&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;[2,]     2 170 15&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;[3,]     3 283 29&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;[4,]     4 243 39&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;[5,]     5  59 15&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;2.  We illustrate the usual MLE fit using the R function glm.  The MLE will be helpful in finding where the posterior is located.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; response=cbind(y,n-y)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; glm(response~level,family=binomial)&lt;/span&gt;  &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;br /&gt;Coefficients:&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;(Intercept)        level  &lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;    -3.328        0.423  &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;3.  After some trial and error, we find a rectangle where the posterior is concentrated.  
The function mycontour is used to draw a contour plot.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; mycontour(logisticpost,c(-5,-1.5,-.2,1),data)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;4.  The function simcontour is used to simulate draws from the posterior computed on this grid.  We plot the simulated draws on top of the contour plot.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; s=simcontour(logisticpost,c(-5,-1.5,-.2,1),data,1000)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; points(s$x,s$y)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; title(xlab="BETA0",ylab="BETA1")&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-4251411508920619828?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/4251411508920619828/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=4251411508920619828' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4251411508920619828'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4251411508920619828'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/fitting-logistic-model.html' title='Fitting a logistic model'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://3.bp.blogspot.com/_V8g1rNtmHuM/RvHZuhnB-sI/AAAAAAAAABk/UBA8B-oSMc4/s72-c/logisticplot1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-264237194782277079</id><published>2007-09-18T06:56:00.000-07:00</published><updated>2007-09-18T07:09:08.142-07:00</updated><title type='text'>Example of Normal Inference</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/Ru_bLEJ4SvI/AAAAAAAAABU/k-akTbd2GBI/s1600-h/normplot1.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/Ru_bLEJ4SvI/AAAAAAAAABU/k-akTbd2GBI/s320/normplot1.jpg" alt="" id="BLOGGER_PHOTO_ID_5111545085229026034" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;To illustrate inference about a normal mean, suppose we are interested in learning about the mean math ACT score for the students who are currently taking business calculus.  I take a random sample of 20 students and put the values in the R vector y.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; y=sample(placedata$ACT,size=20)&lt;br /&gt;&gt; y&lt;br /&gt; [1] 20 22 22 19 27 17 21 21 20 19 17 18 20 21 20 17 18 21 17 20&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;I compute some summary statistics.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; ybar=mean(y)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; S=sum((y-ybar)^2)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; n=length(y)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The definition of the log posterior of (mean, variance) with a noninformative prior is stored in the R function normchi2post.  
I construct a contour plot of this density by use of the mycontour function.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; mycontour(normchi2post, c(17,23,1.8,20),y)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; title(xlab="MEAN",ylab="VARIANCE")&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;To simulate draws of (mean, variance),  I first simulate 1000 draws of the variance parameter from its scaled inverse chi-square posterior density, and then simulate draws of the mean parameter from its normal density conditional on each simulated variance.  I plot the simulated draws on top of the contour plot.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; sigma2 = S/rchisq(1000, n - 1)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; mu = rnorm(1000, mean = ybar, sd = sqrt(sigma2)/sqrt(n))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; points(mu,sigma2)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Suppose we are interested in inferences about the 90th percentile of the population curve that is given by Q = mu + sigma z, where z is the 90th percentile of the standard normal.  
One can obtain a simulated sample from this posterior by simply computing Q on the simulated draws of (mu, variance).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; Q=mu+sqrt(sigma2)*qnorm(.90)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I draw the posterior density of Q by use of a density estimate on the simulated sample.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/Ru_bQkJ4SwI/AAAAAAAAABc/KvcJRwUkfng/s1600-h/normplot2.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/Ru_bQkJ4SwI/AAAAAAAAABc/KvcJRwUkfng/s320/normplot2.jpg" alt="" id="BLOGGER_PHOTO_ID_5111545179718306562" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; plot(density(Q),main="POSTERIOR FOR 90TH PERCENTILE",lwd=3)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-264237194782277079?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/264237194782277079/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=264237194782277079' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/264237194782277079'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/264237194782277079'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/example-of-normal-inference.html' title='Example of Normal Inference'/><author><name>Jim 
Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/Ru_bLEJ4SvI/AAAAAAAAABU/k-akTbd2GBI/s72-c/normplot1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-66783230733489372</id><published>2007-09-14T05:14:00.000-07:00</published><updated>2007-09-14T05:17:33.268-07:00</updated><title type='text'>Evolution of statistical thought</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/Rup710J4SuI/AAAAAAAAABM/YKhRWpR7hbc/s1600-h/evobayes.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/Rup710J4SuI/AAAAAAAAABM/YKhRWpR7hbc/s400/evobayes.jpg" alt="" id="BLOGGER_PHOTO_ID_5110032891668613858" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;One of you asked for some Bayesian/frequentist humor.  This cartoon describes the historical evolution of statistical thinking.  
(This was taken from Mike West's website.)&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-66783230733489372?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/66783230733489372/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=66783230733489372' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/66783230733489372'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/66783230733489372'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/evolution-of-statistical-thought.html' title='Evolution of statistical thought'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/Rup710J4SuI/AAAAAAAAABM/YKhRWpR7hbc/s72-c/evobayes.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-7691613817889556509</id><published>2007-09-13T07:18:00.002-07:00</published><updated>2007-09-13T07:19:51.023-07:00</updated><title type='text'>Any R problems?</title><content type='html'>Here's your opportunity to post any problems you're having using R.  
I'll try to respond to each problem.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-7691613817889556509?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/7691613817889556509/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=7691613817889556509' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7691613817889556509'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/7691613817889556509'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/any-r-problems.html' title='Any R problems?'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-6358633010471825058</id><published>2007-09-12T19:41:00.000-07:00</published><updated>2007-09-12T19:57:14.077-07:00</updated><title type='text'>Test of hypothesis that coin is fair</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/RuimnUJ4SsI/AAAAAAAAAA8/RNW9k3Ni5d0/s1600-h/coinfair.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/RuimnUJ4SsI/AAAAAAAAAA8/RNW9k3Ni5d0/s320/coinfair.jpg" alt="" id="BLOGGER_PHOTO_ID_5109516971607083714" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In Section 3.5, I 
describe a Bayesian test of the hypothesis H that a proportion is equal to 0.5.  The R function pbetat.R (in the LearnBayes package) computes the posterior probability of H assuming a beta prior on the alternative hypothesis.  We illustrate how the posterior probability depends on the beta parameter a.&lt;br /&gt;&lt;br /&gt;Here's a quick way of constructing Figure 3.5 (page 52) using the curve function.&lt;br /&gt;&lt;br /&gt;First we revise the function pbetat to accept a matrix argument for the beta parameters where each row of the matrix corresponds to a pair (a, b).  The new function is called pbetat.v.  Next, we write a short function best that computes the posterior probability of H for our example (5 successes and 15 failures) for a vector of values of log a.  (We are assuming a symmetric beta(a, a) prior on the alternative hypothesis.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;best=function(loga)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;{p0=.5; prob=.5&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;data=c(5,15)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;AB=exp(cbind(loga,loga))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;s=pbetat.v(p0,prob,AB,data)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;return(s$post)}&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;To generate the figure, we can use the curve function.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;curve(best,from=-4,to=5,xlab="log a",ylab="Prob(coin is fair)",lwd=3)&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;We see that for all values of the beta parameter a, the posterior probability of the hypothesis exceeds 0.2.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-6358633010471825058?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/6358633010471825058/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=6358633010471825058' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/6358633010471825058'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/6358633010471825058'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/test-of-hypothesis-that-coin-is-fair.html' title='Test of hypothesis that coin is fair'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/RuimnUJ4SsI/AAAAAAAAAA8/RNW9k3Ni5d0/s72-c/coinfair.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-759307235956602380</id><published>2007-09-11T12:40:00.000-07:00</published><updated>2007-09-11T12:54:00.958-07:00</updated><title type='text'>Illustration of a gui for the Bayesian triplot</title><content type='html'>Richard Gonzalez found my triplot.R function and enhanced it by adding a graphical user interface.  By the use of sliders, one can change the beta parameters a and b, the data values s and f, and see the effect of the changes on the triplot (prior, likelihood, and posterior) and the predictive distribution.&lt;br /&gt;&lt;br /&gt;Here are a couple of illustrations. 
&lt;br /&gt;In the first example, the prior is beta(3, 7) and one observes 6 successes&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_V8g1rNtmHuM/Rubwrgj05tI/AAAAAAAAAAs/FY75I1eXHkg/s1600-h/triplot2.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://3.bp.blogspot.com/_V8g1rNtmHuM/Rubwrgj05tI/AAAAAAAAAAs/FY75I1eXHkg/s320/triplot2.jpg" alt="" id="BLOGGER_PHOTO_ID_5109035457563649746" border="0" /&gt;&lt;/a&gt; and 14 failures.  The prior information is consistent with the data information and the observed number of successes is in the middle of the predictive distribution.&lt;br /&gt;&lt;br /&gt;In the second example, the beta(6, 2) prior is in conflict with the data (s=6, f=14) and the observed number of successes is in the tail of the predictive distribution.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/Rubw2Aj05uI/AAAAAAAAAA0/fTeS_HpNWOU/s1600-h/triplot1.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/Rubw2Aj05uI/AAAAAAAAAA0/fTeS_HpNWOU/s320/triplot1.jpg" alt="" id="BLOGGER_PHOTO_ID_5109035637952276194" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-759307235956602380?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/759307235956602380/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=759307235956602380' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/759307235956602380'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/5441519817884072981/posts/default/759307235956602380'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/illustration-of-gui-for-bayesian.html' title='Illustration of a gui for the Bayesian triplot'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_V8g1rNtmHuM/Rubwrgj05tI/AAAAAAAAAAs/FY75I1eXHkg/s72-c/triplot2.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-2534575563430697708</id><published>2007-09-11T07:02:00.000-07:00</published><updated>2007-09-11T07:17:44.394-07:00</updated><title type='text'>Brute-force computation of a posterior</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RuajKQj05sI/AAAAAAAAAAk/H2hlAUudRQk/s1600-h/norm.tprior.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/RuajKQj05sI/AAAAAAAAAAk/H2hlAUudRQk/s320/norm.tprior.jpg" alt="" id="BLOGGER_PHOTO_ID_5108950223937660610" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;Suppose we observe y that is normal with mean theta and standard deviation sigma.  
Instead of using a conjugate prior, suppose that theta has a t distribution with location mu, scale tau, and degrees of freedom df.&lt;/span&gt;  &lt;span style="font-family:georgia;"&gt;Although there is no convenient closed form for the posterior density, it is straightforward to compute the posterior by use of the "prior x likelihood" recipe.&lt;/span&gt;  &lt;span style="font-family:georgia;"&gt;We write a function post.norm.t (stored in the file post.norm.t.R) that computes the posterior.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;# we source this function into R&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;source(url("http://bayes.bgsu.edu/m648/post.norm.t.R"))&lt;br /&gt;&lt;br /&gt;# define parameters of problem&lt;br /&gt;&lt;br /&gt;s=list(y=125,sigma=15/2,mu=100,tau=6.85,df=2)&lt;br /&gt;&lt;br /&gt;# set up grid of values of theta&lt;br /&gt;&lt;br /&gt;theta=seq(80,160,length=100)&lt;br /&gt;&lt;br /&gt;# compute the posterior on the grid&lt;br /&gt;&lt;br /&gt;post=post.norm.t(theta,s)&lt;br /&gt;&lt;br /&gt;# convert the posterior values to probabilities&lt;br /&gt;&lt;br /&gt;post.prob=post/sum(post)&lt;br /&gt;&lt;br /&gt;# sample from discrete distribution on grid&lt;br /&gt;&lt;br /&gt;sim.theta=sample(theta,size=10000,replace=TRUE,prob=post.prob)&lt;br /&gt;&lt;br /&gt;# construct a histogram of simulated sample&lt;br /&gt;# and place exact posterior on top&lt;br /&gt;&lt;br /&gt;hist(sim.theta, freq=FALSE)&lt;br /&gt;d=diff(theta[1:2])&lt;br /&gt;con=sum(d*post)    # this is the normalizing constant&lt;br /&gt;lines(theta,post/con)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:georgia;"&gt;From the simulated sample, we can compute any summary of the posterior of interest.&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-2534575563430697708?l=learnbayes.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/2534575563430697708/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=2534575563430697708' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2534575563430697708'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2534575563430697708'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/brute-force-computation-of-posterior.html' title='Brute-force computation of a posterior'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_V8g1rNtmHuM/RuajKQj05sI/AAAAAAAAAAk/H2hlAUudRQk/s72-c/norm.tprior.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-6928478722917637158</id><published>2007-09-10T05:02:00.000-07:00</published><updated>2007-09-10T18:06:51.731-07:00</updated><title type='text'>Inference for a Poisson Rate</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_V8g1rNtmHuM/RuU0UQj05qI/AAAAAAAAAAU/DhH2h7RE_Sk/s1600-h/howard1.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://1.bp.blogspot.com/_V8g1rNtmHuM/RuU0UQj05qI/AAAAAAAAAAU/DhH2h7RE_Sk/s320/howard1.jpg" alt="" id="BLOGGER_PHOTO_ID_5108546874968958626" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In 2006, Ryan Howard had 58 
home runs in 581 at-bats.  Suppose the number of home runs y is distributed Poisson with mean ab*lambda, where ab is the number of at-bats (t in the R code) and lambda is the true home run rate per at-bat.  If we assume the usual noninformative prior (proportional to 1/lambda), the posterior for lambda is Gamma with shape alpha=y and rate ab.&lt;br /&gt;&lt;br /&gt;To learn about lambda, we simulate 1000 draws from the Gamma posterior.  We construct a density estimate of the simulated draws and then find a 90% interval estimate from the 5th and 95th percentiles of the draws.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; y=58&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; t=581&lt;br /&gt;&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&gt; lambda=rgamma(1000,shape=y,rate=t)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; plot(density(lambda),xlab="LAMBDA",main="POSTERIOR")&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; quantile(lambda,c(.05,.95))&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;       5%        95% &lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;0.07897515 0.12170184 &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;We see that a 90% interval estimate for lambda is (.079, .122).&lt;br /&gt;&lt;br /&gt;Next, suppose that Howard has 500 at-bats in 2007 -- how many home runs will he hit?   We learn about the future number of home runs ys by use of the posterior predictive distribution.  
In the R output, we (1) simulate 1000 draws from the posterior predictive distribution, (2) tabulate the values by use of the table function, (3) construct a graph of this distribution, and (4) construct a 90% interval estimate for ys by use of the discint function (from the LearnBayes package).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; ys=rpois(1000,500*lambda)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; T=table(ys)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&gt; plot(T,main="POSTERIOR PREDICTIVE")&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; &lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; ys=as.integer(names(T))&lt;/span&gt; &lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RuU0bgj05rI/AAAAAAAAAAc/OdRKdBBByuY/s1600-h/howard2.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://2.bp.blogspot.com/_V8g1rNtmHuM/RuU0bgj05rI/AAAAAAAAAAc/OdRKdBBByuY/s320/howard2.jpg" alt="" id="BLOGGER_PHOTO_ID_5108546999523010226" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; freq=as.integer(T)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; freq=freq/sum(freq)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; dist=cbind(ys,freq)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&gt; discint(dist,.9)&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;$prob&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;[1] 0.905&lt;/span&gt;  &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;$set&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;[1] 36 37 
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60&lt;/span&gt; &lt;span style="font-family:courier new;"&gt;&lt;br /&gt;[26] 61 63 64 65 66 67&lt;br /&gt;&lt;/span&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_V8g1rNtmHuM/RuU0bgj05rI/AAAAAAAAAAc/OdRKdBBByuY/s1600-h/howard2.jpg"&gt;&lt;br /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-6928478722917637158?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/6928478722917637158/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=6928478722917637158' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/6928478722917637158'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/6928478722917637158'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/inference-for-poisson-rate.html' title='Inference for a Poisson Rate'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_V8g1rNtmHuM/RuU0UQj05qI/AAAAAAAAAAU/DhH2h7RE_Sk/s72-c/howard1.jpg' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-1909537235842500502</id><published>2007-09-06T09:28:00.000-07:00</published><updated>2007-09-06T09:43:56.015-07:00</updated><title type='text'>Plot of two distributions for proportion inference</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_V8g1rNtmHuM/RuAtWQj05pI/AAAAAAAAAAM/79RXU4Ghk1I/s1600-h/twographs.jpg"&gt;&lt;img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://4.bp.blogspot.com/_V8g1rNtmHuM/RuAtWQj05pI/AAAAAAAAAAM/79RXU4Ghk1I/s320/twographs.jpg" alt="" id="BLOGGER_PHOTO_ID_5107131837863749266" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;In tomorrow's class, I'll hand out the graph below that shows a Bayesian "triplot" (showing the likelihood, prior, and posterior) and the prior predictive distribution.  The triplot is useful to show how the two types of information (data and prior) are combined.  The predictive plot is helpful in judging the suitability of the Bayesian model.&lt;br /&gt;&lt;br /&gt;I wrote two short R functions, triplot and predplot, that construct the graphs.   
I assume that you have already loaded the LearnBayes package.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; library(LearnBayes)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; source(url("http://bayes.bgsu.edu/m648/triplot.R"))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; source(url("http://bayes.bgsu.edu/m648/predplot.R"))&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; prior=c(6.8,2.5)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; data=c(9,15)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; n=sum(data)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; par(mfrow=c(2,1))&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; triplot(prior,data)&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; predplot(prior,n,data[1])&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-1909537235842500502?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/1909537235842500502/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=1909537235842500502' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/1909537235842500502'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/1909537235842500502'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/plot-of-two-distributions-for.html' title='Plot of 
two distributions for proportion inference'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_V8g1rNtmHuM/RuAtWQj05pI/AAAAAAAAAAM/79RXU4Ghk1I/s72-c/twographs.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-2543575272508196749</id><published>2007-09-05T11:17:00.001-07:00</published><updated>2007-09-05T11:35:24.873-07:00</updated><title type='text'>R computations for a proportion using a beta prior</title><content type='html'>Today we talked about using a beta prior to learn about a proportion.  Inference about p is done by use of the beta posterior distribution and prediction about future samples is done by means of the predictive distribution.&lt;br /&gt;&lt;br /&gt;Here are the R computations for the cell-phone example.  
I'll first illustrate inference for the proportion p, and then I'll illustrate the use of the special function pbetap (in the LearnBayes package) to compute the beta-binomial predictive distribution to learn about the number of successes in a future sample.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; library(LearnBayes)&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; a=6.8; b=2.5  # parameters of beta prior&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; n=24; y=9     # sample size and number of yes's in sample&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; a1=a+y; b1=b+n-y  # parameters of beta posterior&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # I'll illustrate different types of inferences&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # a point estimate is given by the posterior mean&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; a1/(a1+b1)&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[1] 0.4744745&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # or you could find the posterior median&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; qbeta(.5,a1,b1)&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[1] 0.4739574&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # a 90% interval estimate is found by use of the 5th and 
95th quantiles&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # of the beta curve&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; qbeta(c(.05,.95),a1,b1)&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[1] 0.3348724 0.6158472&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # we illustrate prediction by use of the function pbetap&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # suppose we take a future sample of 20&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # how many will be using a cell phone while driving?&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; m=20; ys=0:m&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; pred.probs=pbetap(c(a1,b1),m,ys)&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # display the predictive probabilities&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; cbind(ys,pred.probs)&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;      ys   pred.probs&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; [1,]  0 7.443708e-05&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; [2,]  1 6.444416e-04&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; [3,]  2 2.897264e-03&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; [4,]  3 8.968922e-03&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; [5,]  4 2.139155e-02&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; [6,]  5 
4.170364e-02&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; [7,]  6 6.884411e-02&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; [8,]  7 9.841322e-02&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt; [9,]  8 1.236003e-01&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[10,]  9 1.376228e-01&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[11,] 10 1.365218e-01&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[12,] 11 1.208324e-01&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[13,] 12 9.524434e-02&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[14,] 13 6.650657e-02&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[15,] 14 4.075296e-02&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[16,] 15 2.159001e-02&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[17,] 16 9.665296e-03&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[18,] 17 3.527764e-03&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[19,] 18 9.889799e-04&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[20,] 19 1.901993e-04&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[21,] 20 1.891124e-05&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # what is the probability that there are at least&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt; # 10 cell phone drivers in my sample?&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&gt; sum(pred.probs*(ys&gt;=10))&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;[1] 0.4958392&lt;br /&gt;&lt;/span&gt;&lt;div 
class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-2543575272508196749?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/2543575272508196749/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=2543575272508196749' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2543575272508196749'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/2543575272508196749'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/r-computations-for-proportion-using.html' title='R computations for a proportion using a beta prior'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-5441519817884072981.post-4450313571992426864</id><published>2007-09-04T19:37:00.000-07:00</published><updated>2007-09-04T19:51:26.638-07:00</updated><title type='text'>Welcome to the MATH 648 Blog</title><content type='html'>We are now talking about the elements of Bayesian inference, starting with the problem of learning about a population proportion.   In class, I will talk about ideas and methods, and I will use this blog to describe computational aspects using the R language.&lt;br /&gt;&lt;br /&gt;In the computer lab last week, I gave a brief introduction to the R language, describing manipulations of vectors and matrices.  
Also, I described how to load the LearnBayes package and illustrated the use of several functions for learning about a population proportion.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/5441519817884072981-4450313571992426864?l=learnbayes.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://learnbayes.blogspot.com/feeds/4450313571992426864/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=5441519817884072981&amp;postID=4450313571992426864' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4450313571992426864'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/5441519817884072981/posts/default/4450313571992426864'/><link rel='alternate' type='text/html' href='http://learnbayes.blogspot.com/2007/09/welcome-to-math-648-blog.html' title='Welcome to the MATH 648 Blog'/><author><name>Jim Albert</name><uri>http://www.blogger.com/profile/12622333572321654094</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry></feed>
