Friday, March 14, 2008

Happy Pi Day

Well, my wife, who is way more connected to the blogosphere than I currently am, informs me that today is Pi Day (3/14, with 3.14 being an approximation to π). Last week I went off at a tangent, so to speak, while chatting amiably about Fermat’s Last Theorem, by getting rather worked up about the aesthetic glory of this identity: exp(iπ) = -1. Of course, I now realize that I should have held off on that one until today.

Never mind. Instead I will appeal to everyone’s love of statistics in this modest little contribution to Pi Day. Let’s start with the formula for the bell-curve distribution about an average of zero:

f(x) = exp(-x²/(2σ²)) / (√(2π) σ)

which, we observe, contains a π. Here exp(x) means e to the power x, and σ is the standard deviation, which sets the width of the curve.
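As a quick sanity check of the formula, here is a minimal Python sketch (my own illustration, not taken from any particular source) that evaluates the density directly, compares it with the standard library's statistics.NormalDist (Python 3.8 or later), and confirms numerically that the total area comes out as one. The function names, the midpoint-rule integration and the choice σ = 2 are arbitrary.

```python
import math
from statistics import NormalDist

def bell_curve(x, sigma=1.0):
    """Normal density with mean zero: exp(-x²/(2σ²)) / (√(2π) σ)."""
    return math.exp(-x**2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def total_area(sigma=1.0, half_width=10.0, steps=100_000):
    """Midpoint-rule estimate of the area under bell_curve over [-10σ, 10σ]."""
    a = -half_width * sigma
    h = 2 * half_width * sigma / steps
    return sum(bell_curve(a + (i + 0.5) * h, sigma) for i in range(steps)) * h

# Compare our formula with the standard library's normal distribution.
for x in (0.0, 1.0, 2.5):
    print(x, bell_curve(x, sigma=2.0), NormalDist(mu=0.0, sigma=2.0).pdf(x))

print(total_area(sigma=2.0))  # very close to 1: the √(2π)σ divisor normalizes the curve
```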

Now we’ve all heard of the bell-curve, which describes how measurements (people’s heights, exam scores, etc.) are often distributed about their average. The central limit theorem says that when many independent sources of variation are added together, their combined effect tends to have a bell-shaped distribution (otherwise known as normal or Gaussian). Given how well-known the bell-curve is, I wonder how many people, like myself, don’t really know why exp(-x²) is the correct mathematical shape for such a generic distribution? It seems to embody some very basic things we take for granted about how to deal with measurements subject to random variation. These include the fact that we measure variation itself in terms of squared deviations, because the variances (mean squared deviations) of independent random variations add, whereas their typical sizes do not. Most of us were given simplistic explanations of this in school, in terms of combining positive and negative deviations so that they don’t cancel out. I seem to remember being told that, if one wanted to, one could estimate the amount of variation by averaging the absolute values of the deviations from the average instead of their squares, but the x² in exp(-x²) leads me to doubt that.
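To see why squared deviations are the natural currency, here is a small simulation sketch with made-up inputs: two independent sources of variation, one uniform on [-1, 1] and one bell-shaped with σ = 2 (both choices are arbitrary). The variance of their sum matches the sum of their variances, whereas the mean absolute deviations do not add up in the same way.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
N = 100_000

# Two hypothetical independent sources of variation with different shapes.
a = [random.uniform(-1.0, 1.0) for _ in range(N)]  # variance 1/3
b = [random.gauss(0.0, 2.0) for _ in range(N)]     # variance 4
total = [x + y for x, y in zip(a, b)]

def variance(xs):
    """Mean squared deviation from the average."""
    m = statistics.fmean(xs)
    return statistics.fmean([(x - m) ** 2 for x in xs])

def mean_abs_dev(xs):
    """Mean absolute deviation from the average."""
    m = statistics.fmean(xs)
    return statistics.fmean([abs(x - m) for x in xs])

var_a, var_b, var_total = variance(a), variance(b), variance(total)
mad_a, mad_b, mad_total = mean_abs_dev(a), mean_abs_dev(b), mean_abs_dev(total)

print(var_total, var_a + var_b)  # these agree closely: squared deviations add
print(mad_total, mad_a + mad_b)  # these do not: absolute deviations don't add
```

The mean absolute deviation of the sum comes out noticeably smaller than the sum of the two individual ones, so it is still a usable measure of spread but it does not combine as simply.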

When I first learnt about fitting a line to data by minimizing the sum of the squared deviations of the data-points from the line, I was never made aware of how this follows directly from the form of the bell-curve. If the deviations are independent, the likelihood of acquiring the data for a given fit is the product of the individual probabilities of getting the deviation at each data-point. The exp(-x²) factors represent these probabilities, so we multiply exp(-x₁²)exp(-x₂²)..., which equals exp(-(x₁² + x₂² + ...)). So to maximize the likelihood, we simply minimize the sum of squares (x₁² + x₂² + ...). Apart from a scale factor and an additive constant, this quantity is the negative of the log likelihood, which my blogsite is named in honour of.
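Here is a minimal sketch of that argument, assuming made-up data scattered about the line y = 2x + 1 with bell-curve noise of known σ = 0.5. It fits the line by ordinary least squares and computes the negative log likelihood as the sum of squares divided by 2σ², plus a constant that does not depend on the fit.

```python
import math
import random

random.seed(0)

# Hypothetical data: points near the line y = 2x + 1 with Gaussian noise of known σ.
SIGMA = 0.5
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, SIGMA) for x in xs]

def sum_of_squares(slope, intercept):
    """Sum of squared deviations of the data points from the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def neg_log_likelihood(slope, intercept, sigma=SIGMA):
    """-log of the product of normal densities of the deviations (independent errors)."""
    n = len(xs)
    return sum_of_squares(slope, intercept) / (2 * sigma**2) + n * math.log(math.sqrt(2 * math.pi) * sigma)

def least_squares_line():
    """Closed-form slope and intercept that minimize the sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

best = least_squares_line()
print("least-squares fit:", best)

# The negative log likelihood is the sum of squares scaled by 1/(2σ²) plus a constant,
# so nudging the least-squares fit in any direction raises it.
checks = [
    neg_log_likelihood(best[0] + ds, best[1] + di) > neg_log_likelihood(*best)
    for ds, di in ((0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05))
]
print(checks)
```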

I have seen a proof of why exp(-x²) describes the distribution resulting from the central limit theorem, but it’s not as straightforward as one might suppose. Although the author of the Wikipedia article claims it is, the proof depends on knowing how to work with characteristic functions of probability distributions (essentially their Fourier transforms). In essence, it involves what is called a Taylor series expansion of the characteristic function. This is simply an approximation in terms of constant, linear, squared, etc. terms, and it turns out that when you add a large number of independent sources of variation (rescaling the sum so that its spread stays fixed), everything above the squared term becomes insignificant. The characteristic function then approaches exp(-t²/2), which is the characteristic function of the bell-curve itself. Hence the squared term in the bell-shaped curve.
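The Taylor-series argument can be watched numerically. This sketch assumes each source of variation is uniform on [-√3, √3], chosen so that its variance is one; its characteristic function is sin(√3 t)/(√3 t). The characteristic function of the rescaled sum (X1 + ... + Xn)/√n of n independent copies is that function evaluated at t/√n and raised to the power n, and the sketch measures how far it sits from exp(-t²/2).

```python
import math

TS = [k / 10 for k in range(31)]  # t values from 0 to 3

def phi_uniform(t):
    """Characteristic function of a uniform variable on [-√3, √3] (mean 0, variance 1)."""
    if t == 0:
        return 1.0
    a = math.sqrt(3) * t
    return math.sin(a) / a

def phi_rescaled_sum(t, n):
    """Characteristic function of (X1 + ... + Xn)/√n for n independent uniform copies."""
    return phi_uniform(t / math.sqrt(n)) ** n

def max_gap(n):
    """Largest difference from exp(-t²/2), the bell-curve's characteristic function, over TS."""
    return max(abs(phi_rescaled_sum(t, n) - math.exp(-t * t / 2)) for t in TS)

for n in (1, 2, 10, 100):
    print(n, max_gap(n))
```

Expanding sin(a)/a to fourth order shows that the exponent of the rescaled characteristic function differs from -t²/2 by about -t⁴/(20n), so the gap shrinks roughly in proportion to 1/n: that leftover is exactly the "above the squared term" part that fades away.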

So what does all this have to do with Pi Day? Well, nothing really, except that the formula for the bell-curve has a π in it. This is because the area under exp(-x²/(2σ²)) equals √(2π)σ, so we divide by this amount in order to make the total probability equal one. Now here is something I do remember being shown at school, and I still think it’s cute. Figuring out how to integrate exp(-x²) analytically isn’t so obvious, but if you multiply it by exp(-y²) you get exp(-(x² + y²)) = exp(-r²), where r is the radial distance from the origin. In fact, this corresponds to taking a bell-curve and rotating it so as to sweep out a real three-dimensional bell. The volume inside the bell is the square of the area under the original curve, and it can be worked out analytically by adding up thin rings of circumference 2πr: the integral of 2πr·exp(-r²) from 0 to ∞ comes to exactly π. Not surprisingly, then, the area under exp(-x²) is √π, and stretching the curve sideways by a factor of √2σ turns this into √(2π)σ. Happy Pi Day!
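These integrals can also be checked numerically with a crude midpoint rule (my own choice of method; the cut-off at ±10 and the step count are arbitrary). The sketch integrates exp(-x²) along a line, integrates the rotated bell exp(-r²) by adding up thin rings of circumference 2πr (the volume comes out as π, the square of the one-dimensional area), and checks that the area under exp(-x²/(2σ²)) is √(2π)σ for an arbitrary σ.

```python
import math

def midpoint_integral(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

# Area under exp(-x²); the tails beyond ±10 are negligible.
area_1d = midpoint_integral(lambda x: math.exp(-x * x), -10.0, 10.0)

# Volume under the rotated bell exp(-r²), built from rings of circumference 2πr.
volume = midpoint_integral(lambda r: 2 * math.pi * r * math.exp(-r * r), 0.0, 10.0)

print(area_1d, math.sqrt(math.pi))    # the area is √π
print(volume, area_1d ** 2, math.pi)  # the volume is the area squared, i.e. π

# With the σ scaling, the area under exp(-x²/(2σ²)) should be √(2π)σ.
sigma = 1.7
area_sigma = midpoint_integral(lambda x: math.exp(-x * x / (2 * sigma**2)), -10 * sigma, 10 * sigma)
print(area_sigma, math.sqrt(2 * math.pi) * sigma)
```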

4 comments:

Laura said...

Thank you for the link back. I believe it may even be called a ping back. Whatever it may be, thanks! And a very happy Pi Day to you too!

sarah said...

hello!!!! happy belated pi day to you too!!!!!!
I enjoyed your post... I'm waiting for more! bye!

hui said...

It is the first time I heard about Pi Day. It's interesting.

rachel said...

Happy very, very belated Pi Day! Anyday is Pi Day! =D