Sunday, January 6, 2008

Modeling airline on-time arrival rates

I am beginning to teach a new course on multilevel modeling using a new book by Gelman and Hill.

Here is a simple example of multilevel modeling. In May 2007 the Department of Transportation issued the Air Travel Consumer Report, designed to give consumers information about the quality of the airlines' services. For 290 airports across the U.S., this report gives the on-time percentage for arriving flights. Below I've plotted the on-time percentage against the log of the number of flights for these airports.
What do we notice in this figure? There is a lot of variation in the on-time percentages. Also, the variation in the on-time percentages seems to decrease as the number of flights increases.

What explains this variation? There are a couple of causes. First, there are genuine differences in the quality of service at the airports that would cause differences in on-time performance. But one would also expect some natural binomial variability. Even if a particular airport's planes are on time 80% of the time in the long run, one would expect some variation in the airport's on-time performance over a short time interval.
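To get a feel for the size of the binomial variability, here is a small Python sketch. It takes the 80% long-run rate from the example above and computes the standard deviation of the observed on-time proportion, sqrt(p(1 - p)/n), for a few flight counts. The flight counts and the function name binomial_se are made up for illustration; they are not from the report.

```python
import math


def binomial_se(p, n):
    """Standard deviation of the observed proportion from n flights with true rate p."""
    return math.sqrt(p * (1 - p) / n)


p_true = 0.80  # long-run on-time rate used in the text

# Hypothetical flight counts, chosen only to show the trend.
for n_flights in (100, 1_000, 10_000):
    sd = binomial_se(p_true, n_flights)
    print(f"{n_flights:>6} flights: sd of on-time proportion = {sd:.4f}")
```

The standard deviation shrinks like 1/sqrt(n), which is consistent with the narrowing spread of the points as the number of flights increases.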

In multilevel modeling, we are able to isolate the two types of variation. We are able to model the binomial variability and also model the differences between the true on-time performances of the airports.

To show how the multilevel model estimates behave, I've plotted them in red in the following graph.

I call these multilevel estimates "bayes" in the figure. Note that there are substantial differences between the basic estimates (the raw on-time percentages) and the multilevel estimates for airports with a relatively small number of flights.
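Why do the small airports move the most? Here is a rough Python sketch of the idea. It is not the model fit for the figure; as a stand-in, it assumes each airport's true on-time rate comes from a Beta(a, b) distribution, so the estimate is the posterior mean (y + a)/(n + a + b). The prior values and the two airports are hypothetical, and in a real multilevel fit a and b would be estimated from all of the airports rather than fixed by hand.

```python
def shrunk_rate(on_time, flights, prior_a, prior_b):
    """Posterior mean of the on-time rate under a Beta(prior_a, prior_b) prior."""
    return (on_time + prior_a) / (flights + prior_a + prior_b)


# Hypothetical prior: centered at 0.80 and worth about 500 flights of information.
PRIOR_A, PRIOR_B = 400.0, 100.0

# Hypothetical airports: (on-time arrivals, total flights), both with a raw rate of 60%.
airports = {"small": (30, 50), "large": (6_000, 10_000)}

for name, (on_time, flights) in airports.items():
    raw = on_time / flights
    bayes = shrunk_rate(on_time, flights, PRIOR_A, PRIOR_B)
    print(f"{name}: raw = {raw:.3f}, shrunk = {bayes:.3f}")
```

Both airports have the same raw rate, but the one with 50 flights is pulled most of the way toward 0.80, while the one with 10,000 flights barely moves.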