Wednesday, January 9, 2008

How were the polls wrong?

RT raised an interesting point:
"Aren't polls just a measure based on a small selection of the population?"

How wrong were the polls? This is a rather difficult question to answer: it requires a good deal of quantitative analysis and more. Instead, I'll examine one of the ways in which the polls were wrong. And while there is plenty of mathematics that could be added, and assumptions that should be clarified/removed/annihilated, I'm aiming for a simple discussion here.
First: there were a rather large number of polls taken over the past few days. They used reasonably small samples, it is true, but there were many of them. If my recollection is correct, there were around a dozen, all of which showed Obama in the lead by smaller or larger margins. So the key points are that there were 12 of them, and that they all showed Obama ahead.

A very simple model for polling is the urn: a huge bowl or jug holding a large number of balls, from which a few are drawn at random. Suppose that the urn holds a very large number of balls, exactly half of them labeled O (for Obama) and the other half labeled C (for Clinton).
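As a rough illustration, the urn model can be simulated directly. This is a minimal sketch under the same idealisation as the text: the urn is treated as effectively infinite, so each draw is an independent O with probability `p_obama` (which amounts to drawing with replacement). The function names, the sample size of 101 and the number of trials are illustrative choices, not anything taken from a real poll.

```python
import random


def draw_poll(rng: random.Random, n: int, p_obama: float) -> int:
    """Draw n balls from an effectively infinite urn; return how many are O.

    Assumption: each draw is independent and is O with probability p_obama.
    """
    return sum(1 for _ in range(n) if rng.random() < p_obama)


def fraction_obama_majority(n: int, p_obama: float, trials: int, seed: int = 0) -> float:
    """Estimate, by repeated simulation, the chance that one poll of n balls shows an O majority."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    wins = 0
    for _ in range(trials):
        # Strict majority; with n odd there can be no tie.
        if 2 * draw_poll(rng, n, p_obama) > n:
            wins += 1
    return wins / trials


if __name__ == "__main__":
    # Evenly split urn: roughly half of the simulated polls should show O ahead.
    print(fraction_obama_majority(n=101, p_obama=0.5, trials=20_000))
```

The estimate wanders a little around one half from run to run (with a different seed); the exact value follows from a symmetry argument rather than simulation.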
Now reach in and select a single ball at random, and do this 101 times (I want an odd number so that we don't get ties). By symmetry, the probability that a majority of the balls drawn are O is exactly 1/2. If we repeat this twelve times (returning the balls each time and remixing the contents), the probability that every one of the twelve samples has an O majority is (1/2)^12 = 1/4096, or about one in four thousand. Tightening the assumptions only strengthens the point: the actual vote last night was near 13:12 in Clinton's favor rather than 1:1, and the polls all showed Obama ahead by rather larger margins than a razor-thin win. Both facts make this outcome less likely, not more. So I think I can persuade you that a dozen polls showing a consistent outcome reflect rather more certainty than just one poll does. Note that it is important that these be all the polls taken (otherwise we could have had a dozen with an O majority and a dozen with a C majority, and simply selected the O's; that would not be the same thing at all!)
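The arithmetic above can be checked exactly with a binomial sum. A minimal sketch, assuming (as the text does) independent draws and twelve independent polls of 101 balls each; the helper names are hypothetical, and the value 12/25 is just the text's approximate 13:12 vote ratio in Clinton's favor turned into a share of O balls, not a figure from any actual poll.

```python
from fractions import Fraction
from math import comb


def p_majority_obama(n: int, p: Fraction) -> Fraction:
    """Exact probability that more than half of n independent draws are O."""
    q = 1 - p
    # Sum the binomial probabilities for k = n//2 + 1, ..., n O-balls.
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(n // 2 + 1, n + 1))


def p_all_polls_obama(n: int, p: Fraction, polls: int) -> Fraction:
    """Probability that every one of `polls` independent polls shows an O majority."""
    return p_majority_obama(n, p) ** polls


# Evenly split urn: one poll is a coin flip, twelve in a row is 1/4096.
even = Fraction(1, 2)
print(p_majority_obama(101, even))       # 1/2
print(p_all_polls_obama(101, even, 12))  # 1/4096

# Urn tilted 13:12 towards Clinton, i.e. O balls make up 12/25 of the urn.
tilted = Fraction(12, 25)
print(float(p_majority_obama(101, tilted)))
print(float(p_all_polls_obama(101, tilted, 12)))
```

Tilting the urn towards C pushes the single-poll chance below one half, so the chance of twelve O majorities in a row falls well below 1 in 4096, which is the direction the argument needs.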

For those interested in more sophisticated analyses of what happened, I'll point you to Pollster.com's superb discussion. There are better explanations than mine there, too (which suggest that my McCain explanation is incorrect)!

Yours, likely in error,
N.
