To introduce Bayesian regression modeling, we consider a dataset from De Veaux, Velleman and Bock which collected physical measurements from a sample of 250 males. One is interested in predicting a person's body fat from his height, waist, and chest measurements.
The file Body_fat.txt contains the data that we read in R.
 "Pct.BF" "Height" "Waist" "Chest"
Suppose we wish to fit the regression model
Pct.BF ~ Height + Waist + Chest
The standard least-squares fit is done using the lm command in R.
Here is a portion of the summary of the fit.
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.06539 7.80232 0.265 0.79145
Height -0.56083 0.10940 -5.126 5.98e-07 ***
Waist 2.19976 0.16755 13.129 < 2e-16 ***
Chest -0.23376 0.08324 -2.808 0.00538 **
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.399 on 246 degrees of freedom
Multiple R-Squared: 0.7221, Adjusted R-squared: 0.7187
F-statistic: 213.1 on 3 and 246 DF, p-value: < 2.2e-16
Now let's consider a Bayesian fit of this model. Suppose we place the usual noninformative prior on the regression vector beta and the error variance sigma^2.
g(beta, sigma^2) = 1/sigma^2.
Then there is a simple direct method (outlined in BCUR) of simulating from the posterior distribution of (beta, sigma^2).
1. We first create the design matrix X:
The response is contained in the vector Pct.BF.
2. To simulate 5000 draws from the posterior of (beta, sigma), we use the function blinreg in the LearnBayes package.
fit=blinreg(Pct.BF, X, 5000)
The output fit is a list with two components: beta is a matrix of simulated draws of beta where each column corresponds to a sample from component of beta, and sigma is a vector of draws from the marginal posterior of sigma.
We can summarize the simulated draws of beta by computing posterior means and standard deviations.
X XHeight XWaist XChest
1.8748122 -0.5579671 2.2031474 -0.2350661
X XHeight XWaist XChest
7.84069390 0.11050071 0.16839919 0.08286758
Here is a graph of density estimates of simulated draws from the four regression parameters.
for (j in 1:4) plot(density(fit$beta[,j]),main=paste("BETA ",j),
lwd=3, col="red", xlab="PAR")
What's so special about Bayesian regression if we are essentially replicating the frequentist regression fit?
We'll talk about the advantages of Bayesian regression in the next blog posting.