
10/10 scale

X and Y
=MIN(X,Y)

X or Y
=MAX(X,Y)

That's fuzzy. Not only is it legitimate; it's a better model of thought than probability is. After all, people say "a chain is only as strong as its weakest link." They don't find the product of the strengths of all the links normalized to 0..1.
 
That's fuzzy. Not only is it legitimate; it's a better model of thought than probability is. After all, people say "a chain is only as strong as its weakest link." They don't find the product of the strengths of all the links normalized to 0..1.

Actually, no. It's neither a good model of thought nor a good model of the real world.

Nor, for that matter, is "finding the product of the strengths of all the links normalized to 0..1," unless you have some assurance of independence among the links. Which is why "independence" gets stressed so hard in the first week of a decent probability class.

The problem with using min() as a linking function is that it doesn't accurately capture the unlikelihood, for example, of winning several horse races in succession. I submit that it's substantially less likely for me to win the Daily Double than it is for me to win either one of the underlying bets. Using min() doesn't capture this...
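Here's a quick Python sketch of the point; the two win probabilities are made up purely for illustration:

[code]
# Two hypothetical horse-race bets (probabilities invented for illustration).
p_race1 = 0.4   # chance my horse wins race 1
p_race2 = 0.5   # chance my horse wins race 2

# Fuzzy-style chaining: the conjunction is as strong as its weakest link.
fuzzy_daily_double = min(p_race1, p_race2)   # 0.4

# Probability-style chaining, assuming the two races are independent.
prob_daily_double = p_race1 * p_race2        # 0.2

print(fuzzy_daily_double, prob_daily_double)
# min() says the Daily Double is exactly as likely as the harder single bet;
# the product says it's less likely than either single bet, which is the point above.
[/code]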
 
This still seems okay to me, but I need to think about it more. There are many related questions:

1. What is the probability that a person will win a bet given that they choose by flipping a coin (depends on our assumptions; my inner cynic and pessimist says <1/2)?

2. What is the probability that A wins the race (if we know nothing, I say 1/2)?

3. What is the probability a person flipping a coin chooses A (I'm sure we all agree 1/2)?

4. Given that a person chose A by flipping a coin, what is the probability they will win the bet (you could use either 1 or 2 here)?

5. Given that a person chose A by flipping a coin, what is the probability that A wins the race (intuitively, we suspect these events are independent, but it's less obvious when the person is betting with no information)?

6. Given that A wins the race, what is the probability a person chose A by flipping a coin?

Bayes's Theorem says:
P(A|B)=P(B|A)P(A)/P(B)

So many of these are related. Unfortunately, I've had a long day and can't quite work it all out yet.

1. Depends on no assumptions (except silly ones like "the runners know what you picked and want to screw you over"). The answer is 1/2. Suppose B wins; then you have picked B with probability 1/2. In fact, suppose A will win with probability p. Then you will win with probability 1/2*p + 1/2*(1-p) = 1/2.

2. If we know nothing, we assign this a probability of 1/2, as you said.

3. 1/2, yes.

4. Given that they chose A, this is the probability that A wins. Before any extra information is known but after the coin is flipped, this is still 1/2. Perhaps someone very knowledgeable standing by will say "In my expert opinion A will win with probability p," but who knows.

5. Independent, as you say, unless the above silly assumption or similar applies.

6. P(chose A | A won) = ... If the winner is independent of the coin flip, i.e., the runners and anyone controlling them are not out to get you, then P(chose A | A won) = P(chose A) = 1/2.

It seems that the biggest hurdle is the idea of abstracting away the evil demon who is out to get you. Just assume he is gone and work from there. :)
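Here's a minimal simulation of answer 1 in Python; the value of p assumed for runner A is arbitrary, which is exactly the point, since the result doesn't depend on it:

[code]
import random

def win_rate(p_a_wins, trials=100_000):
    """Pick A or B by a fair coin flip; return the fraction of bets won."""
    wins = 0
    for _ in range(trials):
        my_pick = random.choice(["A", "B"])                  # the coin flip
        winner = "A" if random.random() < p_a_wins else "B"  # the race
        wins += (my_pick == winner)
    return wins / trials

# Whatever probability we assume for A, the bettor wins about half the time,
# matching 1/2*p + 1/2*(1-p) = 1/2.
for p in (0.1, 0.5, 0.9):
    print(p, win_rate(p))
[/code]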
 
Actually, no. It's neither a good model of thought nor a good model of the real world.

It's better than probability. Back when I did AI, I found fuzzy a lot better.

The problem with using min() as a linking function is that it doesn't accurately capture the unlikelihood, for example, of winning several horse races in succession.

And the problem with screwdrivers is that they aren't very good for hammering in nails. Do you have a point?
 
It's better than probability.

Well, we obviously disagree.


And the problem with screwdrivers is that they aren't very good for hammering in nails. Do you have a point?

Yes. You said that fuzzy [sic] was a better model of thought than straight-up probability, and pointed out that it better captures a proverb describing causal chains. I pointed out, first, that capturing the proverb is not the same as capturing the idea behind the proverb, and second, that not all reasoning involves causal chains.
 
Sure, no problem.

You're simply assuming that the two are in fact different. I say, "what's the difference? In both cases you have no idea who will win." And you reply, "yeah, but in one case I know what the chances are that each will win." And I say, "Right. But, what does that mean, actually?"

And round and round we go... :D

It means, among other things, that in the case where I don't know the chances it is at least possible that one of the runners is so much better than the other that they are guaranteed to win. In the other case I know this is not true.

If you say that "the chances are 50:50" means something more than just "I don't know who will win this race," I think that any attempt to describe exactly what you do mean by it will have to involve what you think will happen in a real or hypothetical string of similar races between the two runners. When talking about a string of races, the two statements are different. But we're not talking about a string of races, and what fraction of them will be won by each runner; we're just talking about this single race, and which runner will win it. That's all we're interested in: who will win this race. (Because that's all that determines whether we win our bet or lose it.)

It would only require a real or hypothetical string of races if our knowledge of biology, body mechanics, the runners' physiques and so forth was inadequate to gauge their chances in a race. A string of races is the obvious way to go, but nothing about the universe dictates that such a string of races is necessary.
 
It seems that the biggest hurdle is the idea of abstracting away the evil demon who is out to get you. Just assume he is gone and work from there. :)
But he's there I tell you! You're just like all the doctors who wouldn't listen to me!

Seriously though, I see the problem in my thinking. The absolute worst you can do assuming no information is a 50/50 chance. Thanks, Greedy.
 
There is a lot of research on how people respond to scales like this. Likert-type scales generally use odd numbers of points. I discovered that 5 scale points in one direction are not sufficient. People need to be able to go "between" verbally anchored points, apparently. They do. If you give them 5 points, some will still mark in between the 4 and the 5, no matter how hard you try to anchor the "strongly" to "somewhat" to "seldom" to the numbers.

Check out my Likert-type survey below in my sig.

Reload the page, and you can see how I set it up to try to avoid people automatically marking the center or the extremes.
 
So in other words the people saying that, for Bayesians, no evidence equals evidence of a 50/50 proposition were just wrong?

Kevin makes a good point. He saw that assigning axiomatic flat priors doesn't make sense, because it says that total ignorance on one scale translates into information on another scale.

I guess an answer is that

1) it has been shown to be useful in practice, despite philosophical pitfalls, and

2) for large data, Bayesian and frequentist answers converge, because the likelihood dominates the prior.
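A minimal sketch of point 2, assuming a simple Beta-Binomial coin model (the model and the numbers are mine, just for illustration):

[code]
# Under a Beta(a, b) prior, after h heads and t tails the posterior mean of P(H)
# is (a + h) / (a + b + h + t).  Suppose the data run 60% heads.
def posterior_mean(a, b, h, t):
    return (a + h) / (a + b + h + t)

for n in (10, 100, 10_000):
    h = int(0.6 * n)
    t = n - h
    flat   = posterior_mean(1, 1, h, t)    # "know nothing" flat prior
    skewed = posterior_mean(20, 2, h, t)   # prior strongly favoring heads
    print(n, round(flat, 3), round(skewed, 3))
# With 10 tosses the two posteriors differ a lot (0.583 vs 0.812);
# with 10,000 they both sit at about 0.6, because the likelihood dominates the prior.
[/code]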
 
Kevin makes a good point. He saw that assigning axiomatic flat priors doesn't make sense, because it says that total ignorance on one scale translates into information on another scale.

To which the only sensible response is: "so don't translate."

Or, more accurately, don't translate ignorantly. You might as well complain that Spanish doesn't make sense because the word "pretender" doesn't mean "pretend," but "try," and because the word "actual" means "current," and the word "arena" means "sand."

If you speak Spanish badly, the solution isn't to complain about Spanish-speakers, but to learn better Spanish.
 
How does 'know nothing' translate to 'uniform distribution', pray tell?

If we use the same model but get different results from using different priors, subjectivity won't help us out here. And every analysis we do had better include a sensitivity analysis on the priors.

Bayes, in his paper, wrote that the prior in his example came from data generated by an auxiliary experiment (i.e., it wasn't just made up), which actually makes sense.
 
How does 'know nothing' translate to 'uniform distribution', pray tell?

If we use the same model but get different results from using different priors, subjectivity won't help us out here. And every analysis we do had better include a sensitivity analysis on the priors.

Bayes, in his paper, wrote that the prior in his example came from data generated by an auxiliary experiment (i.e., it wasn't just made up), which actually makes sense.
Agreed. Priors should come from all available data. If you have an auxiliary experiment, great! If you don't, you still need priors. And yes, you should do a sensitivity analysis. If it turns out the result is chaotic with respect to the priors, then maybe you should consider other models. And "know nothing" translates to "uniform distribution" because it's the distribution most closely approximating knowing nothing, just like "experiments show 27 heads, 23 tails out of 50 tosses" translates to "P(H)=0.54, P(T)=0.46" because it's the distribution most closely approximating our knowledge.
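For concreteness, here's a small Python sketch of that sensitivity analysis, assuming a Beta-Binomial model for the 27-heads, 23-tails example (the specific priors are just illustrative choices):

[code]
# How much does the posterior mean of P(H) depend on the prior?
h, t = 27, 23   # the observed data from the example above

priors = {
    "uniform Beta(1,1)":         (1, 1),
    "mild heads-bias Beta(2,1)": (2, 1),
    "strongly-fair Beta(50,50)": (50, 50),
}
for name, (a, b) in priors.items():
    mean = (a + h) / (a + b + h + t)   # posterior mean under a Beta(a,b) prior
    print(f"{name}: posterior mean P(H) = {mean:.3f}")

# The uniform prior gives 28/52 = 0.538, close to the raw 0.54;
# the strongly "fair" prior pulls the estimate back toward 0.5 (77/150 = 0.513).
# If conclusions flipped between reasonable priors, that would be a warning sign.
[/code]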
 
And "know nothing" translates to "uniform distribution" because it's the distribution most closely approximating knowing nothing,

I think it is sensible for practice, but I don't buy the philosophy.

For example, I don't know anything about when the busses will arrive during the day; therefore, I will assume they are equally likely to arrive at any time of the day. I go from literally knowing nothing to positing a specific distribution.
 
I think it is sensible for practice, but I don't buy the philosophy.

For example, I don't know anything about when the busses will arrive during the day; therefore, I will assume they are equally likely to arrive at any time of the day. I go from literally knowing nothing to positing a specific distribution.
Have you read the section in Wikipedia on the probabilities of probabilities under Bayesian probability? This seems to be your main objection, that P(H|fair coin) = P(H|coin with completely unknown bias), and there isn't a way to distinguish the two. But just abstract a level and you may feel more comfortable... now you can say that

P(P(H|fair coin) = 0.5) = 1, but

P(a <= P(H|coin with completely unknown bias) <= b) = b - a.

This tells you that you know for a fact a fair coin has a 50% chance of landing on heads, but if there's a coin with utterly and completely unknown bias, all you can say is that as far as you know it's just as likely to land on heads with probability 0.3 as with probability 0.66 or probability 0.9374. If you want to assign P(H|coin with completely unknown bias) you'd then calculate something like
[latex]$\int_0^1{p\ dp}=\frac{p^2}{2}|_0^1=\frac{1}{2}$[/latex]
So now P(H|unknown)=0.5, which if we had to assign it a number is the only possible number we'd ever assign it, but we can see the difference between the fair coin and the completely unknown coin.
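Here's the same idea as a few lines of Python; the 8-heads-in-10-flips data are invented just to show the two states of knowledge coming apart:

[code]
# Fair coin: P(H) is known to be exactly 0.5.
# Unknown coin: P(H) has a uniform prior density on [0, 1].
# Before any data, both predict heads with probability 0.5 (the integral above).

h, t = 8, 2   # suppose we then observe 8 heads and 2 tails

# Fair coin: the predictive probability of heads is still 0.5.
fair_predictive = 0.5

# Unknown coin: a uniform prior updated on h heads and t tails is Beta(1+h, 1+t),
# whose predictive probability of heads is (1+h)/(2+h+t) (Laplace's rule of succession).
unknown_predictive = (1 + h) / (2 + h + t)

print(fair_predictive, unknown_predictive)   # 0.5 vs 0.75
[/code]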
 
For example, I don't know anything about when the busses will arrive during the day; therefore, I will assume they are equally likely to arrive at any time of the day. I go from literally knowing nothing to positing a specific distribution.
Are you talking about the time of arrival of the next bus (i.e. a single time), or are you talking about the entire bus schedule for the day?

If a Bayesian is uncertain about something, he describes the incomplete knowledge he does have about it by putting a probability distribution on the uncertain thing.

If the uncertain thing under discussion is a single time---the time of arrival of the next bus---then put a distribution on times. If the uncertain thing is the bus schedule, then put a distribution on bus schedules.

You seem to think, if a Bayesian describes his ignorance about the next arrival time by putting a distribution on that time, that he is thereby claiming to have completely certain knowledge about the day's schedule. He is not. For example, if he is sure about the schedule, seeing a few bus arrivals won't change his mind about when the next bus will arrive; if he isn't sure about the schedule, it generally will. So the two states of knowledge are not equivalent, and the Bayesian agrees that they are not, even though both of them might, before seeing any busses arrive, result in the same probability distribution for the (single) next arrival time.

If I roll a single standard die, what's the probability that it will come up three? It's 1/6, of course.

If I have two nonstandard dice, one with two threes and one with none, and I choose one of them at random and roll it, what's the probability that it will come up three? It's also 1/6.

Saying that the probability is 1/6 that a three will come up is not saying that I know for sure I rolled a die with a single three. There are other possibilities that can result in a probability of 1/6.

Saying that the probability is p that a bus will arrive in the next five minutes is not saying that I know the bus schedule for sure. Sure knowledge of the bus schedule is just one way of coming up with such a probability.
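The dice version can be written out explicitly; here's a short Python sketch using the numbers above:

[code]
# Two nonstandard dice: one has two faces showing three, the other has none.
p_three_given_die1 = 2 / 6
p_three_given_die2 = 0 / 6
p_die1 = p_die2 = 0.5   # one die chosen at random

# Probability of rolling a three before knowing which die was picked:
p_three = p_die1 * p_three_given_die1 + p_die2 * p_three_given_die2
print(p_three)   # 1/6, the same as for a single standard die

# But the two states of knowledge respond differently to data:
# if a three does come up, Bayes's theorem says the two-threes die was certainly chosen.
p_die1_given_three = p_die1 * p_three_given_die1 / p_three
print(p_die1_given_three)   # 1.0
[/code]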
 
How does 'know nothing' translate to 'uniform distribution', pray tell?
This is just a matter of definition. Bayesian probability distributions describe a person's knowledge, not a supposedly objective fact about the world.

Bayes, in his paper, wrote that the prior in his example came from data generated by an auxiliary experiment (i.e., it wasn't just made up), which actually makes sense.
Without an even earlier prior, representing our state of knowledge before the generation of that data, how are we to decide, after seeing the data, how likely they are to be typical and how likely they are to be atypical?

A fair coin won't always come up half heads and half tails. Without deciding, before we toss a coin, how likely we think it is to be fair, we have no way of interpreting the results of any tosses. If we're initially quite sure it's fair, we'll still be pretty sure it's fair even after getting, say, 16 heads and 4 tails in 20 tosses. If initially we aren't at all sure it's fair, then after the same 16 heads and 4 tails, we'll think the coin unlikely to be fair.

There's no getting around it, really. We have to start somewhere. It's better to make our starting point explicit.
 
A fair coin won't always come up half heads and half tails. Without deciding, before we toss a coin, how likely we think it is to be fair, we have no way of interpreting the results of any tosses. If we're initially quite sure it's fair, we'll still be pretty sure it's fair even after getting, say, 16 heads and 4 tails in 20 tosses. If initially we aren't at all sure it's fair, then after the same 16 heads and 4 tails, we'll think the coin unlikely to be fair.
Yes, exactly. We can assign probabilities to probabilities. We don't have to say just P(H)=0.5, because then, when we update on observations, we can't distinguish a coin known to be fair from one whose fairness is unknown.

If we wanted to represent the belief that the coin was not fair, and even knew (for instance) that unfair coins all come with either P(H)=0.75 (call this uH) or P(H)=0.25 (call this uT), we would say P(uH)=0.5, P(uT)=0.5. Then in 69dodge's example Bayes's Theorem would say:

P(H) = P(H|uH)*P(uH) + P(H|uT)*P(uT) = 0.75*0.5+0.25*0.5 = 0.5

P(16H,4T | uH) = 0.189685
P(16H,4T | uT) = 0.0000003569266
P(16H,4T) = P(16H,4T | uH)*P(uH) + P(16H,4T | uT)*P(uT) = 0.0948429

P(uH | 16H,4T) = P(16H,4T | uH)*P(uH) / P(16H,4T) = 0.189685*0.5/0.0948429 = 0.999998

Notice that even though we can now distinguish between types of coin, P(H) before seeing the coin flipped was still 0.5, and that didn't imply we thought that the coin was fair.
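Those figures check out; here's a short Python verification, treating the 16 heads and 4 tails binomially (which is how the numbers above were evidently computed):

[code]
from math import comb

def binom_prob(k, n, p):
    """Probability of k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_uH = p_uT = 0.5
like_uH = binom_prob(16, 20, 0.75)            # ~0.189685
like_uT = binom_prob(16, 20, 0.25)            # ~3.57e-7
evidence = like_uH * p_uH + like_uT * p_uT    # ~0.0948429
posterior_uH = like_uH * p_uH / evidence      # ~0.999998

print(like_uH, like_uT, evidence, posterior_uH)
[/code]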
 
If a Bayesian is uncertain about something, he describes the incomplete knowledge he does have about it by putting a probability distribution on the uncertain thing.

I'm interested in distributions on the busses, not distributions about the Bayesian's uncertainty about his own knowledge. And that is because I am studying busses, not Bayesians.

If I roll a single standard die, what's the probability that it will come up three? It's 1/6, of course.

And we know this from repeatedly tossing dice, not from any axiomatic prior.
 
I'm interested in distributions on the busses, not distributions about the Bayesian's uncertainty about his own knowledge. And that is because I am studying busses, not Bayesians.
Sorry, I still don't know what "distributions on the busses" means. Are you talking about the single time of arrival of the next bus, or are you talking about the entire bus schedule for the day?

Whatever it is that you're interested in, it will often be the case that you don't know everything about it, but only some things. How do you deal with that uncertainty? How do you describe being somewhat sure that a proposition is true, but not entirely sure? Purely qualitatively? Maybe something like, "I'm kind of sure it's true" vs. "I'm pretty sure it's true" vs. "I'm almost totally sure it's true, but not quite totally"?

Bayesians use a number between 0 and 1, with 0 representing being sure that it's false, 1 representing being sure that it's true, and intermediate numbers representing intermediate degrees of sureness. Is there any particular reason they shouldn't do that?

And we know this from repeatedly tossing dice, not from any axiomatic prior.
Was my previous post unclear?

Unless we decide what we knew before tossing the die, we can't decide what we know after tossing it, because the latter depends on the former in accordance with Bayes's theorem.

Let's get specific. Suppose we toss a die ten times, and we get: 5, 1, 6, 6, 6, 4, 5, 3, 6, 1. What do you say we now know about this die?

(I don't have a die, so I simulated tossing a die by flipping a penny a bunch of times. Interesting exercise: how?)
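(One standard way to do it, for anyone who wants to check their answer, and not necessarily what was actually done here, is rejection sampling on three flips. A Python sketch:)

[code]
import random

def die_from_coin_flips():
    """Simulate a fair six-sided die using only fair coin flips."""
    while True:
        flips = [random.randint(0, 1) for _ in range(3)]    # three flips = 3 bits
        n = flips[0] * 4 + flips[1] * 2 + flips[2]          # a number from 0 to 7
        if 1 <= n <= 6:                                     # reject 0 and 7, flip again
            return n

print([die_from_coin_flips() for _ in range(10)])
[/code]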
 
