Introduction to Bayesian Thinking: Bayesian regression

To introduce Bayesian regression modeling, we consider a dataset from De Veaux, Velleman and Bock which collected physical measurements from a sample of 250 males. One is interested in predicting a person's body fat from his height, waist, and chest measurements.

The file Body_fat.txt contains the data that we read in R.

data=read.table("Body_fat.txt",sep="\t",header=TRUE)
names(data)
[1] "Pct.BF" "Height" "Waist" "Chest"
attach(data)

Suppose we wish to fit the regression model

Pct.BF ~ Height + Waist + Chest

The standard least-squares fit is done using the lm command in R.

fit=lm(Pct.BF~Height+Waist+Chest)

Here is a portion of the summary of the fit.

summary(fit)

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.06539 7.80232 0.265 0.79145
Height -0.56083 0.10940 -5.126 5.98e-07 ***
Waist 2.19976 0.16755 13.129 < 2e-16 ***
Chest -0.23376 0.08324 -2.808 0.00538 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.399 on 246 degrees of freedom
Multiple R-Squared: 0.7221, Adjusted R-squared: 0.7187
F-statistic: 213.1 on 3 and 246 DF, p-value: < 2.2e-16

Now let's consider a Bayesian fit of this model. Suppose we place the usual noninformative prior on the regression vector beta and the error variance sigma^2.

g(beta, sigma^2) = 1/sigma^2.

Then there is a simple direct method (outlined in BCUR) of simulating from the posterior distribution of (beta, sigma^2).

1. We first create the design matrix X:

X=cbind(1,Height,Waist,Chest)

The response is contained in the vector Pct.BF.

2. To simulate 5000 draws from the posterior of (beta, sigma), we use the function blinreg in the LearnBayes package.

fit=blinreg(Pct.BF, X, 5000)

The output fit is a list with two components: beta is a matrix of simulated draws of beta where each column corresponds to a sample from component of beta, and sigma is a vector of draws from the marginal posterior of sigma.

We can summarize the simulated draws of beta by computing posterior means and standard deviations.

apply(fit$beta,2,mean)
X XHeight XWaist XChest
1.8748122 -0.5579671 2.2031474 -0.2350661

apply(fit$beta,2,sd)
X XHeight XWaist XChest
7.84069390 0.11050071 0.16839919 0.08286758

Here is a graph of density estimates of simulated draws from the four regression parameters.

par(mfrow=c(2,2))
for (j in 1:4) plot(density(fit$beta[,j]),main=paste("BETA ",j),
lwd=3, col="red", xlab="PAR")

What's so special about Bayesian regression if we are essentially replicating the frequentist regression fit?

We'll talk about the advantages of Bayesian regression in the next blog posting.

Introduction to Bayesian Thinking

Thursday, November 15, 2007

Bayesian regression

1 comment:

Blog Archive

Contributors