Friday, February 8, 2008

6001, A Race Oddity

The race to the White House had its little oddity on Tuesday: there was a tied vote, 6001 - 6001 Clinton-Obama, out of 12346 votes cast (it would have been so much prettier if there had been exactly one fewer vote cast!). National Public Radio today stated (without clarification) that a mathematics professor had shown that the odds of this happening were less than one in a million.

I'd beg to differ, except that I poked around a little, and discovered that what she showed was that the odds of this happening, if the vote were a random sample from the same distribution as the entire state, were less than one in a million. Hence, she concluded, there is a different distribution for the folks in Syracuse than for the rest of the state, in which Clinton won by a reasonably large margin.

"Well", says I, "hit me with the stupid stick and colour me purple."

Of course there was a different distribution. Look at the bloody vote! It counts everyone who chose to vote. The votes are different.

Now, if she had been talking instead about polling, or random samples, then she might have a point. But the vote in Syracuse was not a random sample of opinion there, it was an actual vote: it is not a sample, it is the true outcome!

And even if it were referring to sampling, it wouldn't be surprising. It would be reflecting a common outcome, namely that populations are not homogeneous in their opinions: she was testing an hypothesis (that the distribution in Syracuse is the same as the average distribution around the state) which one would almost certainly expect to be false in this election. Obama is a hit on college campuses, and Syracuse is a college town: it is to be expected that he might do better there than he did in some other areas.

To give Professor Kim her due, she was probably asked to perform this calculation by the press: specifically the Post-Standard.

Here, though, is the calculation that I wish that she had added:
Now that we have convinced ourselves that the distribution is not the same as the rest of the state, let us ask instead the better question: suppose that the population were tied in its opinion of Clinton and Obama, and we take a random sample of 12346 voters: what is the probability that we'd end up with an exact tie then? To persuade you that this is not as unlikely as you think, let's simplify it slightly: instead of 12346 voters, some of whom might vote for a third candidate or spoil a ballot, let's toss a fair coin 12002 times, and ask for the probability that it comes up heads 6001 times and tails 6001 times.

This is an easy calculation to do: it turns out that the most likely number of heads you'll get is exactly 6001, and that the probability that this happens is about 0.00728: that is, about 0.7% of the time, or roughly one time in 137. In other words, while ties are unlikely, they are not prohibitively so, when taking samples from tied distributions. Furthermore, if you perform this experiment, say, one hundred times, the odds of seeing a tie at least once are better than 50-50 (about 52% in fact). So, if this sampling model is a good model of the elections, and if there were 100 precincts in which opinion was tied, you would not be surprised to see tied votes!
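
For anyone who wants to check the arithmetic, here is a minimal sketch in Python of the coin-toss calculation. The function name tie_probability and the variable names are my own inventions for illustration; the model is just the simplified fair-coin one described above, not anything Professor Kim used. It computes the exact binomial probability of 6001 heads in 12002 tosses, compares it with the standard approximation 1/sqrt(pi * n) for n = 6001, and then works out the chance of at least one tie in 100 independent repetitions.

```python
from math import comb, pi, sqrt


def tie_probability(tosses: int) -> float:
    """Probability that a fair coin lands heads exactly half the time."""
    if tosses % 2:
        return 0.0  # an odd number of tosses can never split evenly
    # Exact big-integer arithmetic; only the final division yields a float.
    return comb(tosses, tosses // 2) / 2 ** tosses


# 6001 heads and 6001 tails out of 12002 tosses.
p_tie = tie_probability(12002)

# Stirling-style approximation for the central binomial term: 1 / sqrt(pi * n).
p_approx = 1 / sqrt(pi * 6001)

# Chance of seeing at least one tie in 100 independent tied "precincts".
p_at_least_once = 1 - (1 - p_tie) ** 100

print(f"exact tie probability:       {p_tie:.5f}")
print(f"approximate tie probability: {p_approx:.5f}")
print(f"at least one tie in 100:     {p_at_least_once:.3f}")
```

The exact and approximate figures should agree to several decimal places, which is a handy sanity check: the probability of an exact tie shrinks only like one over the square root of the number of voters, not exponentially.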
I've drastically simplified the model, mainly for the ease of computation, but more sophisticated models give the same qualitative behaviour: once you accept that different areas in the state have different populations, this sort of outcome becomes much less unlikely than the media, and Professor Kim, have suggested!

Rather than giving the media the message that 6001-6001 is this dramatically unlikely event, I wish she had taken the opportunity to demonstrate yet another circumstance in which coincidences do happen!

Yours, counting votes,
N.
