
Kos charges polling fraud

OK, I think I do understand the statistical problem after starting to read the report

1. Polls taken of different groups of people may reflect broadly similar opinions but should not show any detailed connections between minor random details. Let's look at a little sample of R2K's recent results for men (M) and women (F).
Code:
6/3/10       Favorable     Unfavorable    Undecided
Question    Men  Women     Men  Women    Men  Women

Obama        43    59       54    34        3     7
Pelosi       22    52       66    38       12    10
Reid         28    36       60    54       12    10
McConnell    31    17       50    70       19    13
Boehner      26    16       51    67       33    17
Cong. (D)    28    44       64    54        8     2
Cong. (R)    31    13       58    74       11    13
Party (D)    31    45       64    46        5     9
Party (R)    38    20       57    71        5     9

A combination of random sampling error and systematic difference should make the M results differ a bit from the F results, and in almost every case they do differ. In one respect, however, the numbers for M and F do not differ: if one is even, so is the other, and likewise for odd. Given that the M and F results usually differ, knowing that say 43% of M were favorable (Fav) to Obama gives essentially no clue as to whether say 59% or say 60% of F would be. Thus knowing whether M Fav is even or odd tells us essentially nothing about whether F Fav would be even or odd.

Thus the even-odd property should match about half the time, just like the odds of getting both heads or both tails if you tossed a penny and nickel. If you were to toss the penny and the nickel 18 times (like the 18 entries in the first two columns of the table) you would expect them to show about the same number of heads, but would rightly be shocked if they each showed exactly the same random-looking pattern of heads and tails.
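
To make that concrete, here is a minimal sketch in Python. The M-F pairs are hand-copied from the little table above; the random comparison is my own toy illustration, not part of the report.

Code:
import random

# Fav and Unf (Men, Women) pairs copied from the 6/3/10 table above.
pairs = [
    (43, 59), (22, 52), (28, 36), (31, 17), (26, 16),   # Fav
    (28, 44), (31, 13), (31, 45), (38, 20),
    (54, 34), (66, 38), (60, 54), (50, 70), (51, 67),   # Unf
    (64, 54), (58, 74), (64, 46), (57, 71),
]
same_parity = sum((m % 2) == (f % 2) for m, f in pairs)
print(same_parity, "of", len(pairs), "pairs share even/odd parity")  # 18 of 18

# Independently drawn percentages, by contrast, should match about half
# the time, like tossing a penny and a nickel together.
random.seed(0)
trials = 100_000
matches = sum(random.randint(0, 100) % 2 == random.randint(0, 100) % 2
              for _ in range(trials))
print("random pairs match", matches / trials, "of the time")  # ~0.5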

Were the results in our little table a fluke? The R2K weekly polls report 778 M-F pairs. For their favorable ratings (Fav), the even-odd property matched 776 times. For unfavorable (Unf) there were 777 matches.

Common sense says that that result is highly unlikely, but it helps to do a more precise calculation. Since the odds of getting a match each time are essentially 50%, the odds of getting 776/778 matches are just like those of getting 776 heads on 778 tosses of a fair coin. Results that extreme happen less than one time in 10^228. That’s one followed by 228 zeros. (The number of atoms within our cosmic horizon is something like 1 followed by 80 zeros.) For the Unf, the odds are less than one in 10^231. (Having some Undecideds makes Fav and Unf nearly independent, so these are two separate wildly unlikely events.)
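
If you want to check that arithmetic yourself, here is a quick Python sketch. It computes the exact binomial tail with integer arithmetic and reports it as a power of ten, since the probability underflows ordinary floating point.

Code:
from math import comb, log10

n = 778  # M-F pairs in the R2K weekly polls
k = 776  # observed even-odd matches for the Fav ratings

# Exact tail probability P(X >= k) for a fair coin, as a power of 10.
tail = sum(comb(n, j) for j in range(k, n + 1))
log10_p = log10(tail) - n * log10(2)
print("P(at least", k, "matches out of", n, ") = 10^%.1f" % log10_p)
# prints 10^-228.7, i.e. less than one time in 10^228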

If that's true, no way can that be a coincidence.
 

It could be, depending on how many statistical oddities you looked for before finding one. But then there are other issues.

What I can't work out is any process that would produce data showing that effect.
 

There are not one but two "nearly independent" results, each less probable than one in the number of atoms in the universe squared.
 

I guess the article fails to explain to me, in terms I can grasp, the difference between saying what it says and saying that the odds of each individual person answering exactly the way they did (in any poll) are incredibly long.

I mean, what are the odds that exactly 43 percent of men would be favorable towards Obama, while exactly 22 percent of them would be favorable towards Pelosi, while exactly 17 percent of women would be undecided towards Boehner, etc. etc.? When you put all those together, the odds of getting those results are astronomical. Why should I be concerned that two incredibly specific patterns appeared in the results any more than any other two incredibly specific patterns?

Then again, I'm really not running on all cylinders tonight, so the point is probably something ridiculously basic.
 
Honestly, this is what I think is the most significant aspect of the OP's article:
The explosive charge by the liberal Internet pioneer could invalidate dozens of polls taken in House, Senate and governor’s races that were aggressively touted by his popular website and widely covered by news outlets over the last year and a half.

This is precisely what's wrong with the internet echo chamber when it comes to politics, both on the left and the right. It skews and obfuscates data while amplifying ideological choir-preaching. It's not the fault of the internet, since this happened before the internet as well, but the phenomenon is much more obvious these days when stuff like this happens.
 
I guess the article fails to explain to me, in terms I can grasp, the difference between saying what it says and saying that the odds of each individual person answering exactly the way they did (in any poll) are incredibly long.


No, you've misunderstood it. Here's what (I think) it's saying. In any poll, some percentage of men are going to answer one way, and some percentage of women are going to answer that way as well. Due to randomness, sampling, and other errors, those percentages may drift a bit from poll to poll.

So, in any given complex poll, what are the odds that the percentage of men who choose option A will be an even percentage (... 44%, 46%, 48%, 50%, 52% ...)? Those odds should be about 1/2. What are the odds that the percentage of women who choose option A will be even? The same thing - 1/2.

Now, what are the chances that if the percentage of men who chose option A was even, the percentage of women would also be even? (44% men, 56% women; 32% men, 46% women.) And what are the chances that if the men were an odd percentage, the women would be as well? (61% men, 57% women; 39% men, 33% women)

Whether odd matches odd and even matches even should be completely random - a 50% chance that a poll with an odd men's percentage will also have an odd women's percentage, and likewise for even.

This study found that in a remarkable number of polls, if the men were odd, the women were odd. If the men were even, the women were even. The chance of that happening by accident was so slight that there's a greater likelihood that the poll numbers were made up/massaged/otherwise interfered with.
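
A rough back-of-envelope check of just how slight (my own numbers, not from the report): under independence the matches behave like 778 fair coin flips, so you would expect about 389 of them, give or take 14.

Code:
from math import sqrt

# Under independence, parity matches ~ Binomial(778, 1/2).
n = 778
expected = n / 2           # 389 matches expected
sd = sqrt(n) / 2           # about 13.9
z = (776 - expected) / sd  # about 27.7 standard deviations above that
print(expected, round(sd, 1), round(z, 1))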

I don't know if that's any clearer.
 
So how much money did they get him for?

I have no idea what they paid Research 2000 but I heard that Kos and the unions threw a lot of money behind Bill Halter (against Blanche Lincoln in Arkansas) and polling may have played a role in how much money they spent. Maybe they spent more (or less) than they would have if the polling had been better.
 
Could it just be a stupid rounding routine? But what sort of weird routine would produce this effect?
 
This study found that in a remarkable number of polls, if the men were odd, the women were odd. If the men were even, the women were even. The chance of that happening by accident was so slight that there's a greater likelihood that the poll numbers were made up/massaged/otherwise interfered with.

I don't know if that's any clearer.


Ding ding! Thank you. For some reason I wasn't grasping that he was looking at multiple polls for this.
 
Yes, this looks like clear evidence of fraud. There was some discussion at Silver's blog about suspiciously few "no change" results in week-to-week polling as well, and about very strong correlations in Research 2000's age-group results in multiple Senate elections.

It will be amusing to see what Kos does about his upcoming book, American Taliban, in which he argues that there's no way for Democrats to work with Republicans, because the latter believe so much nutty stuff. As you can probably guess, Kos relied on Research 2000 for polling data that showed how zany conservatives are. Ouch!
 

Kos himself said:

"Book was stripped of references to R2K, except in two instances where I couldn't do so without affecting page count (too late for me to do that since the index was done), but those two examples also references other supporting polling, so my premise didn't depend on the R2K results."

For what it's worth.
 
Could it just be a stupid rounding routine? But what sort of weird routine would produce this effect?

The variables are meant to be independent, so this isn't something you could generate through pure rounding.

It's possible that there are some semi-legitimate reasons why they are not independent. For example, if all your non-whites are in the male half, you may want to make some adjustments to the female results to try and simulate the results you would have got if your sample had a better ethnic distribution.

Another possibility is that the adjustments made for people lying to pollsters are the same across both genders, and that creates an issue.

What I can't see is a way to get this kind of result if you are making up numbers (simplest method: pick the number you want, then add noise).
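
To illustrate, here is a toy simulation of that simplest method (the noise model is my own invention, purely for illustration): pick a target, jitter the M and F numbers independently, and the parities only agree about half the time.

Code:
import random

# "Pick the number you want, then add noise": fabricate M and F from a
# shared target with independent jitter, then check parity agreement.
random.seed(2)
trials = 100_000
matches = 0
for _ in range(trials):
    target = random.randint(15, 70)
    m = target + random.randint(-4, 4)
    f = target + random.randint(-4, 4)
    matches += (m % 2) == (f % 2)
print(matches / trials)  # about 0.5, nothing like 776 matches in 778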
 
I don't know, but I bet it is similar to the application of Benford's Law (http://en.wikipedia.org/wiki/Benford's_law). When you fudge the numbers you're just more likely to violate the law.

Not really comparable. The problem with Benford's Law when creating fake data is that real data has more order than you expect.

In this case we have the problem that the data has a degree of order we wouldn't expect to see in real data.
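
For anyone unfamiliar with it: Benford's law says the leading digit d of many naturally occurring data sets shows up with probability log10(1 + 1/d), so real data lean heavily on small first digits, which is exactly what naive fabricators tend to miss. A two-line check:

Code:
from math import log10

# Benford's law: leading digit d occurs with probability log10(1 + 1/d).
# About 30% of values start with 1, versus the flat 1/9 (about 11%)
# that a naive number-maker tends to produce.
for d in range(1, 10):
    print(d, round(log10(1 + 1 / d), 3))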
 

Ah, so exactly the opposite of Benford's law. Thanks, that helps.
 
Kos himself said:

"Book was stripped of references to R2K, except in two instances where I couldn't do so without affecting page count (too late for me to do that since the index was done), but those two examples also references other supporting polling, so my premise didn't depend on the R2K results."

For what it's worth.
That doesn't seem worth much, really. I mean, wasn't the whole point of commissioning the poll to validate Kos's preconceived notions about conservatives?

Removing the references from the book leaves behind the preconceived notions, presented as bald, unsupported assertions rather than rational, evidence-based conclusions... And meanwhile everybody now knows that the evidence upon which those assertions are based, and around which the book was written, is bogus.

Unless he's going to entirely re-write the book, referencing reputable polls and reaching conclusions based on real data, I don't really see how it can be anything other than a dog's breakfast.

In fact, deleting the references but keeping the claims seems pretty slimy to me.
 
