The following is Fox News's map

[imgw=640]https://pbs.twimg.com/media/CunLRkeUsAIwfP6.jpg[/imgw]

https://twitter.com/megynkelly/status/786383230375038978

Don't entirely agree with it. Iowa is a toss-up, not leaning GOP, whereas Utah is pretty solidly GOP unless, and this is the rider, the Mormon Church abandons Trump for Johnson (I don't see them going to Hillary), in which case Johnson may take Utah. I'd also say that NC is leaning Dem currently.
 
Don't entirely agree with it. Iowa is a toss-up, not leaning GOP, whereas Utah is pretty solidly GOP unless, and this is the rider, the Mormon Church abandons Trump for Johnson (I don't see them going to Hillary), in which case Johnson may take Utah. I'd also say that NC is leaning Dem currently.

Actually, Utah is a toss-up. And Johnson isn't third there, McMullin is, and he's who I expect to carry the state come November.
 
I will say, however, that I think Silver underestimates the probabilities for the states, or at least he has been recently. In fact, his claim that he correctly called 99/100 states in the last two elections would suggest that. Unless his probabilities are in the 99% range, he should be getting a lot more wrong than he is. As I pointed out above, if all the states have a 90% probability, the odds of getting them all right are only 0.5%.
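That 0.5% figure can be checked directly: if each of 50 calls is independently correct with probability 0.9, the chance of getting all 50 right is 0.9^50.

```python
# Chance of calling all 50 states right, if each call is
# independently correct with probability 0.9.
p_all_right = 0.9 ** 50
print(f"{p_all_right:.4f}")  # ~0.0052, i.e. about 0.5%
```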


You're assuming that the state outcomes are independent. They're not. This means that you can't simply multiply the probability that state A will go to Clinton and the probability that state B will go to Clinton to get the joint probability that states A and B both go to Clinton. For example, say there is an 80% chance that NM goes to Clinton and an 80% chance that CO goes to Clinton. If the states' outcomes were independent, then there would be a 64% chance that both states go to Clinton, a 4% chance that they both go to Trump, and a 32% chance that they split.

But say, at the extreme, that these two states are perfectly correlated: whichever way one goes, so does the other. Then, there is an 80% chance that they both go to Clinton, a 20% chance that they both go to Trump, and no chance that they split. So, in this scenario, Nate either gets both states right or both states wrong; it's one extreme or the other.

The state outcomes are, in fact, correlated (though not perfectly, of course). What this means is that across 50 state predictions, Nate will tend to get either more states right or more states wrong than his individual state probabilities suggest.
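A quick sketch of the two scenarios above, using the hypothetical 80% probabilities for NM and CO:

```python
import random

random.seed(0)
N = 100_000

# Independent: each state flips its own coin.
both_ind = sum(1 for _ in range(N)
               if random.random() < 0.8 and random.random() < 0.8)

# Perfectly correlated: a single coin decides both states.
both_corr = sum(1 for _ in range(N) if random.random() < 0.8)

print(both_ind / N)   # ≈ 0.64
print(both_corr / N)  # ≈ 0.80
```

Same 80% marginal probability per state in both cases, very different joint behavior.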
 
I have to backtrack on my own words again. For the expected number of EC votes, you can simply multiply the chance of winning each state by its number of EC votes and sum those products. It doesn't matter that the races in the various states are not independent - for that single number, the "expected value".
I don't mean to :deadhorse but I don't think that is right.

Nate's core tool is a simulation. Just one single run. In that one simulation there are no statistics at all. Each candidate is allocated the deterministic (not statistical) EC votes that that simulation produced.

Next, he does that a gazillion times.

NOW, and only now, is when the statistical work is done. He has a distribution of EC votes per candidate and uses some statistic (mean, mode, etc.; I don't know which) to come up with his final probability. The absolute key here is that there is NO state-by-state averaging done.
 
Consider the following: suppose they had predicted that for every state, Clinton had a 90% chance of winning (ignore Maine and Nebraska and DC issues). Now, if that is exactly true, then that means that we should expect her to LOSE 5 states. And the average number of electoral votes would be 53.8 for Trump and the rest for Clinton.
Again, that is not a valid approach using Monte Carlo tools (which is what Nate's simulations are). He does NOT predict statistics by state. Each simulation run gives a deterministic number, NOT a probability. And if you take the totality of his simulations and compute probability statistics for each state, you CANNOT use the process you are talking about here to analyze the data.

The problem is, you don't know which are the 5 states that she would end up losing. Or if it would be 4, or 6. The chance that she wins them all is only 0.5%. It could be that the 5 states she loses would be the 5 largest in electoral votes. In that case, Trump would get a lot more than 50 votes. Alternatively, maybe it is the 5 smallest states, in which case he might get 20.
No, that part is not right.

Now, in this scenario, any of these outcomes is equally likely. However, in the real case, where probabilities vary all over the place, the math is really hard. So in that situation, it's probably a lot easier to run 10 million simulations using the probabilities for each state and looking at the outcomes that way. If I had the probabilities, I could do it easily.
That part is exactly right.
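A minimal sketch of such a simulation. The per-state probabilities and EC-vote counts here are made up for illustration, and the states are drawn independently, which, as discussed above, real state outcomes are not:

```python
import random

random.seed(1)

# Hypothetical per-state (win probability, EC votes) pairs -- made-up
# numbers for illustration, and the states are treated as independent.
states = [(0.9, 55), (0.6, 29), (0.5, 38), (0.3, 18), (0.8, 20)]

def one_run():
    """One simulation: Clinton's EC total for a single random draw."""
    return sum(ev for p, ev in states if random.random() < p)

runs = [one_run() for _ in range(100_000)]
mc_mean = sum(runs) / len(runs)

# Linearity of expectation: the mean of the runs should match sum(p * ev).
print(mc_mean, sum(p * ev for p, ev in states))  # both ≈ 107.3
```

With real per-state probabilities you would do exactly this, just with 50-odd entries and the correlations built into the draws.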

I will say, however, that I think Silver underestimates the probabilities for the states, or at least he has been recently. In fact, his claim that he correctly called 99/100 states in the last two elections would suggest that. Unless his probabilities are in the 99% range, he should be getting a lot more wrong than he is. As I pointed out above, if all the states have a 90% probability, the odds of getting them all right are only 0.5%.
No, that part is not right. You cannot take his state probabilities to do statistics because his core prediction is EC votes, NOT state probabilities for each candidate.


I pointed this out after the last election. Silver's model isn't working as well as he asserts, because if he were right, he'd be getting a lot more wrong. If that makes any sense.
You're right in the sense that if he were predicting what you have asserted, then you would be right. But the premise is wrong. The fact that he is not getting a lot more wrong should be a big clue as to why *your* statistical analysis is wrong. He is NOT predicting state probabilities.
 
The following is Fox News's map

[imgw=640]https://pbs.twimg.com/media/CunLRkeUsAIwfP6.jpg[/imgw]

https://twitter.com/megynkelly/status/786383230375038978

Those kinds of maps are so bogus. No wonder Fox uses them. They imply that votes are allocated by acreage, not by population. Now, if they sized the states by number of EC votes, then such a depiction would have merit. But then the map would be so completely distorted that you might have trouble reading it.
 
You're assuming that the state outcomes are independent. They're not. This means that you can't simply multiply the probability that state A will go to Clinton and the probability that state B will go to Clinton to get the joint probability that states A and B both go to Clinton. For example, say there is an 80% chance that NM goes to Clinton and an 80% chance that CO goes to Clinton. If the states' outcomes were independent, then there would be a 64% chance that both states go to Clinton, a 4% chance that they both go to Trump, and a 32% chance that they split.
jt512, please read my previous posts and you will hopefully see that the simulations would include the correlations you are talking about. Thus, you are correct that the individual states are not independent so the state-by-state analysis cannot be done as you state. So you are correct, but for the wrong reason.
 
Those kinds of maps are so bogus. No wonder Fox uses them. They imply that votes are allocated by acreage, not by population. Now, if they sized the states by number of EC votes, then such a depiction would have merit. But then the map would be so completely distorted that you might have trouble reading it.
I think most people understand that relatively few people live in most of those red states.

Anyway, here is such a map that you described from 538.

[imgw=640]http://i.imgur.com/HnTZDrh.jpg[/imgw]
 
jt512, please read my previous posts and you will hopefully see that the simulations would include the correlations you are talking about. Thus, you are correct that the individual states are not independent so the state-by-state analysis cannot be done as you state. So you are correct, but for the wrong reason.


I was explaining to pgwenthold why his analysis was wrong, not yours.
 
ddt said:
I have to backtrack on my own words again. For the expected number of EC votes, you can simply multiply the chance of winning each state by its number of EC votes and sum those products. It doesn't matter that the races in the various states are not independent - for that single number, the "expected value".
I don't mean to :deadhorse but I don't think that is right.

Nate's core tool is a simulation. Just one single run. In that one simulation there are no statistics at all. Each candidate is allocated the deterministic (not statistical) EC votes that that simulation produced.

Next, he does that a gazillion times.

NOW, and only now, is when the statistical work is done. He has a distribution of EC votes per candidate and uses some statistic (mean, mode, etc.; I don't know which) to come up with his final probability. The absolute key here is that there is NO state-by-state averaging done.
Yes, that's right.

I think we largely agree on what Nate's doing. Only I would not say his core tool is the simulation. It's his model underlying that simulation. The process has the following steps.

First, there's the input data, of two kinds:
1) polling data from various polls, which is continuously added to by new polls
2) demographic data, which is static throughout the election process

Second, there's Nate's model, which says how to interpret these data. The model says how to weight each poll: some are better than others, some have an inherent bias toward one of the parties. The model also interprets the demographic data into correlations between the outcomes in the various states. To take jt512's hypothetical example: if the demographics of CO and NM are exactly the same, the model translates this into those states voting identically. This is exaggerated, and the model is certainly more subtle than that; it probably also takes polling data into account for establishing the correlations between the various states.

Third, the model produces 54 stochasts (random variables), to put it in mathematical terms, one for each of the separate state elections. That is, you have a probability distribution for each state of what percentage each party will get. To make a crude layman's analogy, you now have 54 dice, each with a "Clinton" side and a "Trump" side. Those dice are all differently weighted, and the correlations from the model say that some dice are connected. The CO and NM dice - to carry on that example - have a 100% correlation, so they're effectively glued together. The OH and PA dice are more loosely tied together, so that whenever the OH die turns up "Clinton", the PA one is more likely to as well.

Fourth, those stochasts are what he runs his simulation with. Each run of the simulation gives one discrete outcome, e.g., Trump wins OH with x% margin and Clinton wins PA with y% margin, and overall, Clinton has n EV and Trump m. He runs the simulation a gazillion times. Yes, that's simply a Monte Carlo run.

Fifth, all the numbers on the 538 page are the averages over those gazillion simulation runs. The probability per state that Clinton wins is simply the fraction of simulation runs in which she won that state, and the expected value of Clinton's EC vote is the average number of EC votes over all of those simulations.

And this is a bit where I struggle with why Nate needs simulations at all. You can only run a simulation when you have a probability distribution to begin with. The probability distribution for, say, Ohio, already rolls out of his model and is plugged into the simulation algorithm. Basically, already at step (3) you can say "Clinton has a 65.1% chance of winning Ohio". I surmise it's the correlations between the various states that make his model too difficult to calculate analytically, and that's why he needs a Monte Carlo run. I admit to a bit of a bias against Monte Carlo runs, mainly because I all too often see people building simulations for trivial questions, like "what is the chance of throwing 7 with 2 dice?", which you can perfectly well calculate with pencil and paper. But Nate is a professional statistician, so he surely knows what he's doing.
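The dice question is a good illustration of when Monte Carlo is overkill: the answer is exactly 6/36, and a simulation only approximates it.

```python
import random
from itertools import product

# Exact answer: count the favourable outcomes among all 36 equally
# likely rolls of two dice.
exact = sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == 7) / 36

# Monte Carlo answer: approximate the same number by sampling.
random.seed(2)
N = 100_000
mc = sum(1 for _ in range(N)
         if random.randint(1, 6) + random.randint(1, 6) == 7) / N

print(exact, mc)  # 6/36 ≈ 0.1667, and a noisy estimate near it
```

The simulation earns its keep only when the exact calculation is intractable, as with 50-odd correlated state races.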

Finally, to come back to my statement you objected to. Yes, that statement is true. Let's do that in proper mathematical terms, and define the stochasts:
D_OH = number of Democratic electoral votes from Ohio
That's a stochast with outcome either 0 or 18. The chance that it is 18 is that 65.1% that comes out of the simulation. Define the respective stochasts for all state races.

Then define the stochast:
D_USA = number of overall Democratic electoral votes
which is a stochast with discrete values between 0 and 538.
Then it obviously holds that:
D_USA = SUM (i in states) D_i
And then basic probability theory says about their expected values:
E[D_USA] = SUM (i in states) E[D_i]
It doesn't matter for the latter formula whether the various state stochasts are independent or not (they're not). Independence doesn't matter for the single number "expected value"; it does matter for the distribution.
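That claim can be checked numerically: even with perfectly correlated states, the expected total equals the sum of the per-state expectations. The probabilities and EC-vote counts below are hypothetical:

```python
import random

random.seed(3)
N = 200_000

# Two hypothetical states worth 18 and 20 EC votes, each won with
# probability 0.651 -- and perfectly correlated: one draw decides both.
totals = [(18 + 20) if random.random() < 0.651 else 0 for _ in range(N)]
mc_expectation = sum(totals) / N

# Sum of the per-state expectations, ignoring the correlation entirely.
analytic = 0.651 * 18 + 0.651 * 20

print(mc_expectation, analytic)  # both ≈ 24.7
```

The correlation changes the shape of the distribution (here it's all-or-nothing), but not the expected value.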
 
Yes, that's right.

I think we largely agree on what Nate's doing. Only I would not say his core tool is the simulation. It's his model underlying that simulation. The process has the following steps.

First, there's the input data, of two kinds:
1) polling data from various polls, which is continuously added to by new polls
2) demographic data, which is static throughout the election process

Second, there's Nate's model, which says how to interpret these data. The model says how to weight each poll: some are better than others, some have an inherent bias toward one of the parties. The model also interprets the demographic data into correlations between the outcomes in the various states. To take jt512's hypothetical example: if the demographics of CO and NM are exactly the same, the model translates this into those states voting identically. This is exaggerated, and the model is certainly more subtle than that; it probably also takes polling data into account for establishing the correlations between the various states.

Third, the model produces 54 stochasts (random variables), to put it in mathematical terms, one for each of the separate state elections. That is, you have a probability distribution for each state of what percentage each party will get. To make a crude layman's analogy, you now have 54 dice, each with a "Clinton" side and a "Trump" side. Those dice are all differently weighted, and the correlations from the model say that some dice are connected. The CO and NM dice - to carry on that example - have a 100% correlation, so they're effectively glued together. The OH and PA dice are more loosely tied together, so that whenever the OH die turns up "Clinton", the PA one is more likely to as well.

Fourth, those stochasts are what he runs his simulation with. Each run of the simulation gives one discrete outcome, e.g., Trump wins OH with x% margin and Clinton wins PA with y% margin, and overall, Clinton has n EV and Trump m. He runs the simulation a gazillion times. Yes, that's simply a Monte Carlo run.

Fifth, all the numbers on the 538 page are the averages over those gazillion simulation runs. The probability per state that Clinton wins is simply the fraction of simulation runs in which she won that state, and the expected value of Clinton's EC vote is the average number of EC votes over all of those simulations.

And this is a bit where I struggle with why Nate needs simulations at all. You can only run a simulation when you have a probability distribution to begin with. The probability distribution for, say, Ohio, already rolls out of his model and is plugged into the simulation algorithm. Basically, already at step (3) you can say "Clinton has a 65.1% chance of winning Ohio". I surmise it's the correlations between the various states that make his model too difficult to calculate analytically, and that's why he needs a Monte Carlo run. I admit to a bit of a bias against Monte Carlo runs, mainly because I all too often see people building simulations for trivial questions, like "what is the chance of throwing 7 with 2 dice?", which you can perfectly well calculate with pencil and paper. But Nate is a professional statistician, so he surely knows what he's doing.
Finally, to come back to my statement you objected to. Yes, that statement is true. Let's do that in proper mathematical terms, and define the stochasts:
D_OH = number of Democratic electoral votes from Ohio
That's a stochast with outcome either 0 or 18. The chance that it is 18 is that 65.1% that comes out of the simulation. Define the respective stochasts for all state races.

Then define the stochast:
D_USA = number of overall Democratic electoral votes
which is a stochast with discrete values between 0 and 538.
Then it obviously holds that:
D_USA = SUM (i in states) D_i
And then basic probability theory says about their expected values:
E[D_USA] = SUM (i in states) E[D_i]
It doesn't matter for the latter formula whether the various state stochasts are independent or not (they're not). Independence doesn't matter for the single number "expected value"; it does matter for the distribution.

The highlighted part. I'm not a mathematician, but my understanding of a Monte Carlo analysis is that one could play around with the distributions a bit, i.e., instead of using a single probability distribution for, say, Nevada, one has a distribution of the distributions, so the weighting could be altered over the runs. I suspect that would be pretty ugly to work out analytically (or even if you do just have the probability distribution for each of the 50 states).
 
The highlighted part. I'm not a mathematician, but my understanding of a Monte Carlo analysis is that one could play around with the distributions a bit, i.e., instead of using a single probability distribution for, say, Nevada, one has a distribution of the distributions, so the weighting could be altered over the runs. I suspect that would be pretty ugly to work out analytically (or even if you do just have the probability distribution for each of the 50 states).
Thank you. I am a mathematician by education, but I took only a few probability and statistics classes - I never quite liked the field from a philosophical point of view; for me, maths is about certainty (the Dutch word for it, wiskunde, literally means that). There must be at least something like a shifting probability distribution to model the correlation between states' outcomes.
 
Thank you. I am a mathematician by education, but I took only a few probability and statistics classes - I never quite liked the field from a philosophical point of view; for me, maths is about certainty (the Dutch word for it, wiskunde, literally means that). There must be at least something like a shifting probability distribution to model the correlation between states' outcomes.

In my physics undergrad class our lecturer said that the physicists assumed that the mathematicians had proved the theories, whilst the mathematicians assumed the physicists had empirically demonstrated the theories to be true.
 
