Merged Odds Standard for Preliminary Test

I don't think Cuddles et al are doing what you say here.

One must distinguish between the claim and the protocol to test that claim. The claim is what the applicant, and only the applicant, can state. The protocol is what requires negotiation and agreement.

Perhaps I misunderstood.

My understanding of Pavel's claim is the ability to do clairvoyance, albeit imperfectly. The issues about number of successes, number of trials, etc. are about the protocol.
 
See Linda's response above. Based on that, I retract my comment about it making statistical sense.
I think Linda is misunderstanding what I'm proposing. If Pavel were to obtain 29 hits in the initial 40 trials, that would be 72.5% hits, which is better than his claimed average hit rate of 70%. Would it make sense to say that he "failed" the test simply because he did not beat odds of better than 1000 to 1? So, under that circumstance (or one where he obtained 28 hits or even, arguably, 26-27 hits), I am proposing that another 40 trials be conducted in the near future. I am not proposing that Pavel be allowed to stop after only another 10 trials if at that point he has beaten odds of better than 1000 to 1, so early stopping is not an issue.
 
I think Linda is misunderstanding what I'm proposing. If Pavel were to obtain 29 hits in the initial 40 trials, that would be 72.5% hits, which is better than his claimed average hit rate of 70%. Would it make sense to say that he "failed" the test simply because he did not beat odds of better than 1000 to 1? So, under that circumstance (or one where he obtained 28 hits or even, arguably, 26-27 hits), I am proposing that another 40 trials be conducted in the near future. I am not proposing that Pavel be allowed to stop after only another 10 trials if at that point he has beaten odds of better than 1000 to 1, so early stopping is not an issue.

The initial 40 trials is the early stopping. Your intention is to perform 80 trials. However, you intend to stop early if you get a positive result* after 40 trials.

Linda

*beat the 1000 to 1 odds.
 
It doesn't. What you have just described is 'early stopping', a well-known* way of introducing bias and increasing the possibility of obtaining a positive result.

Except that "increasing the possibility of obtaining a positive result" is not necessarily a bad thing, nor does it necessarily increase bias.

You simply have to emend the statistics properly.

If I have a task with a one in 2000 chance of success, then trying it twice in succession will give me approximately one chance in 1000 of getting at least one success. If I have a task with a one in ten thousand chance of success, doing it ten times will give me about 1:1000 of getting at least one success. Allowing the claimant to stop at the halfway point if he hits the 2000:1 mark isn't that unreasonable.

(Of course, this analysis assumes independence in the trials since it makes the math easier)
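
The arithmetic in this post can be checked with a short sketch. It assumes independent attempts, as the post itself does; the function name `at_least_one_success` is made up for illustration.

```python
def at_least_one_success(p_single: float, attempts: int) -> float:
    """Chance of one or more successes in `attempts` independent tries."""
    return 1.0 - (1.0 - p_single) ** attempts


# Two tries at 1 in 2000, and ten tries at 1 in 10,000.
two_tries = at_least_one_success(1 / 2000, 2)
ten_tries = at_least_one_success(1 / 10_000, 10)

print(f"two tries at 1/2000:  {two_tries:.6f}")
print(f"ten tries at 1/10000: {ten_tries:.6f}")
```

Both values come out just under 1 in 1000, which is why the per-look threshold has to be tightened when a claimant gets more than one chance to pass.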
 
The initial 40 trials is the early stopping. Your intention is to perform 80 trials. However, you intend to stop early if you get a positive result* after 40 trials.

Linda

*beat the 1000 to 1 odds.
No, it's the JREF that seems to want only 40 trials. If I had my way, Pavel would be given at least 100 trials. I'm simply trying to accommodate the JREF's concern about its expenditure of time.
 
Except that "increasing the possibility of obtaining a positive result" is not necessarily a bad thing, nor does it necessarily increase bias.

I should have been more specific. I meant "increasing the possibility of obtaining a positive result when one should not be obtained" (i.e. the definition of bias) which is what we are talking about when we refer to chance.

You simply have to emend the statistics properly.

If I have a task with a one in 2000 chance of success, then trying it twice in succession will give me approximately one chance in 1000 of getting at least one success. If I have a task with a one in ten thousand chance of success, doing it ten times will give me about 1:1000 of getting at least one success. Allowing the claimant to stop at the halfway point if he hits the 2000:1 mark isn't that unreasonable.

(Of course, this analysis assumes independence in the trials since it makes the math easier)

Rather than attempting to find a reasonable way to patch up the hole, my hope is that one understands that it is simpler to avoid making the hole in the first place.

Linda
 
No, it's the JREF that seems to want only 40 trials. If I had my way, Pavel would be given at least 100 trials. I'm simply trying to accommodate the JREF's concern about its expenditure of time.

It is not relevant who prescribed the trials or what the total number is. I'm just pointing out that your proposed accommodation introduces an error.

Linda
 
It is not relevant who prescribed the trials or what the total number is. I'm just pointing out that your proposed accommodation introduces an error.
What I'm saying is that, if the JREF does not want to generally commit to more than 40 trials, the least it should do is extend the preliminary test if Pavel performs at his claimed performance level. Considering that he has to beat odds of 1,000 to 1 to pass the preliminary test, it is extremely unlikely that he would be able to do that by luck irrespective of the number of trials. If you disagree, give me an example where someone could beat those odds by early stopping.
 
What I'm saying is that, if the JREF does not want to generally commit to more than 40 trials, the least it should do is extend the preliminary test if Pavel performs at his claimed performance level. Considering that he has to beat odds of 1,000 to 1 to pass the preliminary test, it is extremely unlikely that he would be able to do that by luck irrespective of the number of trials. If you disagree, give me an example where someone could beat those odds by early stopping.

What you have proposed is just a way to increase (nearly double, in fact) the odds of false positive.

The originally suggested protocol: Applicant passes if the odds of the results are about a thousand to one, say, 30 or more out of 40.
Your suggested protocol: Applicant passes if they get 30 or more out of 40. If applicant gets 26 or more, then they are given 40 more attempts. If the odds of the results are then about a thousand to one (that would be, say, 54 or more out of 80), the applicant passes.

The probability of succeeding by chance alone in the originally suggested protocol: about 1 in 900.
The probability of succeeding by chance alone in your suggested protocol: about 1 in 537.

Do you see what you have done there?
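
For anyone who wants to check the 1 in 900 and 1 in 537 figures, here is a minimal sketch. It assumes each trial is an independent 50/50 guess (the two-envelope setup mentioned elsewhere in the thread) and uses the thresholds stated above: pass at 30 or more of 40, extend at 26 to 29, pass at 54 or more of 80. The helper names are illustrative only.

```python
from math import comb


def pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k hits in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k_min: int, n: int, p: float = 0.5) -> float:
    """Probability of k_min or more hits in n independent trials."""
    return sum(pmf(k, n, p) for k in range(max(k_min, 0), n + 1))


# Original protocol: pass on 30 or more hits out of 40.
single_stage = upper_tail(30, 40)

# Proposed protocol: an immediate pass at 30+, or 26-29 hits followed by
# enough hits in 40 more trials to reach 54 of 80 overall.
extension = sum(
    pmf(first, 40) * upper_tail(54 - first, 40) for first in range(26, 30)
)
two_stage = single_stage + extension

print(f"original: about 1 in {1 / single_stage:.0f}")
print(f"proposed: about 1 in {1 / two_stage:.0f}")
```

The extension term is pure added false-positive probability: every path through it is a chance pass that the original 40-trial protocol would have scored as a fail.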
 
What you have proposed is just a way to increase (nearly double, in fact) the odds of false positive.

The originally suggested protocol: Applicant passes if the odds of the results are about a thousand to one, say, 30 or more out of 40.
Your suggested protocol: Applicant passes if they get 30 or more out of 40. If applicant gets 26 or more, then they are given 40 more attempts. If the odds of the results are then about a thousand to one (that would be, say, 54 or more out of 80), the applicant passes.

The probability of succeeding by chance alone in the originally suggested protocol: about 1 in 900.
The probability of succeeding by chance alone in your suggested protocol: about 1 in 537.

Do you see what you have done there?

Also consider the other aspect of early-stopping, and likely where it gets its name from:

Applicant gets 32 out of 40. This is considered a 'pass'. However, if applicant had continued, consider that they might have only gotten 10 out of the next 40 correct, for a total of 42 out of 80 -- a 'fail', but the applicant does not have the chance to do this; the early stop guarantees the pass, even though the applicant might have failed.

This scenario (and the above-quoted) is why Linda is pointing out that you actually _intend_ to do 80 tests with your protocol, and also why the number of tests should never be modified after the testing has begun. Either you're going to do 40, or you're going to do 80, and that's the end of it.

Put another way, the following possibilities could occur in an 80-trial scenario. We will consider the word 'pass' to be achieving 30 out of 40, or 75%.

1) Applicant passes the first 40, and passes the second 40. Result: PASS
2) Applicant passes the first 40, and fails the second 40 by enough to drop the percentage below 75%. Result: FAIL
3) Applicant passes the first 40, and fails the second 40, but not by enough to drop the percentage below 75%: Result: PASS
4) Applicant fails the first 40, and fails the second 40. Result: FAIL
5) Applicant fails the first 40, and passes the second 40 by enough to raise the percentage above 75%. Result: PASS
6) Applicant fails the first 40, and passes the second 40, but not by enough to raise the percentage above 75%. Result: FAIL

You will note that there are three 'pass' scenarios, and three 'fail' scenarios in there; an even distribution for chance alone. What your 'early stop' protocol does is change #2 from a "FAIL" to a "PASS" -- a very bad flaw, as now the test is highly skewed towards passing by chance alone.
 
But you could do it this way:

55 out of 80 two-envelope trials has a <.001 chance (.00053).

But 28 out of 40 does not (.0083).

So if his claimed success rate is around 69-70% and his claim is accurate, he will succeed in beating 1 in 1000 odds in an 80-trial protocol but not in a 40-trial protocol.

Require that the claimant succeeds at the p <.05 level after 40 trials (that would be 26 or more, p = .040) in order to continue with the latter 40 trials. There is no "win" in the first round regardless of the score, only passing a sufficient threshold to be eligible to complete the test.

That way, the early stopping only increases the chance of failure in a protocol that otherwise gives him a greater chance of success if he has the claimed ability. Say he scored only 25 hits in the first 40. By agreeing to this procedure (and getting the benefit of passing with a relatively low success rate), he'd be giving up the chance to attempt to "come back in the stretch" by scoring 30 out of 40 in the latter 40 trials.

Suppose his actual success rate is .75 which is independent for each trial. With 40 rounds he still has a .416 chance of failing (29 or fewer hits) in a protocol requiring him to score 30 out of 40. With a straight 80 round protocol, he would have only a .08 chance of failing (54 or fewer). With the compromise two-stage protocol he has a .054 chance of failing (25 or fewer) in the first stage and an additional .052 chance (this one is a rather complex calculation) of failing in the second stage. The two-stage version has a slightly higher chance of a false negative outcome than the straight 80 trials because cases where he scored very poorly in the first 40 but came back very strong in the latter 40 would be ruled out as successes. But that's still a much lower chance of a false negative than the protocol with only 40 trials.

Respectfully,
Myriad
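
The figures in the post above can be recomputed with a rough sketch like the following. It assumes independent trials, a chance hit rate of 0.5, and a true hit rate of 0.75 as in the post; the variable and function names are invented for illustration, and the printed values are meant to be compared against the post rather than taken as authoritative.

```python
from math import comb


def pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k hits in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k_min: int, n: int, p: float) -> float:
    """Probability of k_min or more hits in n independent trials."""
    return sum(pmf(k, n, p) for k in range(max(k_min, 0), n + 1))


# Chance-level (p = 0.5) significance of the thresholds in the post.
p_55_of_80 = upper_tail(55, 80, 0.5)
p_28_of_40 = upper_tail(28, 40, 0.5)
p_26_of_40 = upper_tail(26, 40, 0.5)

# False-negative rates for a claimant whose true hit rate is 0.75.
skill = 0.75
fail_40 = 1 - upper_tail(30, 40, skill)         # needs 30 of 40
fail_80 = 1 - upper_tail(55, 80, skill)         # needs 55 of 80
fail_stage_one = 1 - upper_tail(26, 40, skill)  # eliminated at the gate
fail_stage_two = sum(                           # clears the gate, ends below 55
    pmf(first, 40, skill) * (1 - upper_tail(55 - first, 40, skill))
    for first in range(26, 41)
)
fail_two_stage = fail_stage_one + fail_stage_two

print(f"chance tails: {p_55_of_80:.5f} {p_28_of_40:.4f} {p_26_of_40:.3f}")
print(f"fail, 40 trials:  {fail_40:.3f}")
print(f"fail, 80 trials:  {fail_80:.3f}")
print(f"fail, two-stage:  {fail_stage_one:.3f} + {fail_stage_two:.3f}")
```

Because there is no way to win at the 40-trial mark, the gate can only remove passes, so the false-positive rate of this two-stage design cannot exceed that of a straight 80-trial test.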
 
What you have proposed is just a way to increase (nearly double, in fact) the odds of false positive.

The originally suggested protocol: Applicant passes if the odds of the results are about a thousand to one, say, 30 or more out of 40.
Your suggested protocol: Applicant passes if they get 30 or more out of 40. If applicant gets 26 or more, then they are given 40 more attempts. If the odds of the results are then about a thousand to one (that would be, say, 54 or more out of 80), the applicant passes.

The probability of succeeding by chance alone in the originally suggested protocol: about 1 in 900.
The probability of succeeding by chance alone in your suggested protocol: about 1 in 537.

Do you see what you have done there?
Yes, but it's a minor difference. The way the preliminary test is tentatively designed for Pavel, the odds are he will fail even if he has a paranormal ability. Is that the design you want?
 
But you could do it this way:
(details snipped)

The two-stage version has a slightly higher chance of a false negative outcome than the straight 80 trials because cases where he scored very poorly in the first 40 but came back very strong in the latter 40 would be ruled out as successes. But that's still a much lower chance of a false negative than the protocol with only 40 trials.

What you've demonstrated is very basic, though -- it's well known that as the number of trials increases, the chances of both a false positive -and- false negative decrease. The problem here is that in a two-stage version, you're still going to potentially do 80 trials, and since you need to prepare for that possibility, you may as well just do 80 trials outright, rather than working up some elaborate formula which (even you have admitted) is less accurate than just doing 80 trials.
 
Yes, but it's a minor difference.

I'm sorry, the difference between 1 in 900 and 1 in 537 is _not_ minor; not even close. It is, in fact, approaching double the chance of success.

Rodney said:
The way the preliminary test is tentatively designed for Pavel, the odds are he will fail even if he has a paranormal ability.

So your contention is that Pavel has a greater than 50% chance of failure in the test as designed if he has his claimed paranormal ability? Remember, that's what "odds are" means -- either he passes or he fails, and for your statement to be true, he would have to have a better than 50% chance of failure of the test even though he has the ability. If this is your contention, please show your math in detail.

Rodney said:
Is that the design you want?

As has been repeated over and over to you, this is not a scientific study to determine whether the paranormal exists; this is a challenge. Either you know what you can do, and you can demonstrate that you can do it, or you fail.

If Pavel knows he can only perform consistently at a rate of 28 out of 40, then he should not accept terms that require him to perform at a rate of 30 out of 40. In the end, though, this is not about how many of 'X' Pavel can do -- each challenge is primarily designed to rule out the possibility of performing claim "X" by random chance alone.
 
I'm sorry, the difference between 1 in 900 and 1 in 537 is _not_ minor; not even close. It is, in fact, approaching double the chance of success.
It depends whether you're talking about relative or absolute odds. In relative terms, I have a vastly greater chance of being eaten by a shark than being struck by an asteroid, but that doesn't stop me from swimming in the ocean. Why? Because the absolute odds of being eaten by a shark are minuscule.

So your contention is that Pavel has a greater than 50% chance of failure in the test as designed if he has his claimed paranormal ability? Remember, that's what "odds are" means -- either he passes or he fails, and for your statement to be true, he would have to have a better than 50% chance of failure of the test even though he has the ability. If this is your contention, please show your math in detail.
If Pavel is correct that he averages 70% hits, the math couldn't be easier: In 40 trials he would be expected to average 28 hits, and his odds of getting 30 or more -- according to the binomial distribution -- would be 31%.

As has been repeated over and over to you, this is not a scientific study to determine whether the paranormal exists; this is a challenge. Either you know what you can do, and you can demonstrate that you can do it, or you fail.

If Pavel knows he can only perform consistently at a rate of 28 out of 40, then he should not accept terms that require him to perform at a rate of 30 out of 40.
And what happens if the JREF refuses to accept a protocol that allows him to average only 70%?

In the end, though, this is not about how many of 'X' Pavel can do -- each challenge is primarily designed to rule out the possibility of performing claim "X" by random chance alone.
I guess it would be quite a blow to the JREF if someone passed the preliminary test -- several folks here would probably have to be put on suicide watch. ;)
 
If someone passed the preliminary test, I would consider it a lucky hit. I have no idea how many claimants the JREF has tested over the years, but it is probably rather far from a thousand. However, a pass through a lucky hit is not impossible. Having once rolled two zeros twice with 10-sided dice in a game where it really mattered, I know the feeling of having used up a lifetime of luck, if such a thing were possible. And just like with the preliminary test, nobody believed that everything was quite all right.

If the claimant went on to take the prize money as well, I would be a believer!
 
I guess it would be quite a blow to the JREF if someone passed the preliminary test -- several folks here would probably have to be put on suicide watch.

I see you have trouble distinguishing between the words "probably" and "in my fantasies".

Randi's Personal FAQ states that "... a couple hundred have completed and failed the preliminaries." Assuming an average chance of a false positive in the preliminaries on the order of 1 in a thousand, someone having passed a preliminary test by now would be, statistically, utterly unremarkable.
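
A back-of-the-envelope version of that estimate, as a sketch only: it assumes exactly 200 independent applicants (a stand-in for "a couple hundred") and a flat 1 in 1000 false-positive chance for each, neither of which is a documented figure.

```python
# Both numbers below are assumptions for illustration, not JREF data.
applicants = 200            # stand-in for "a couple hundred"
false_positive = 1 / 1000   # assumed per-test chance of a lucky pass

# Chance that at least one applicant passes by luck alone.
any_lucky_pass = 1 - (1 - false_positive) ** applicants

print(f"chance of at least one lucky pass: {any_lucky_pass:.3f}")
```

Under those assumptions the chance of at least one lucky pass is a little under one in five, so a single pass would not by itself be strong evidence of anything.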
 
It depends whether you're talking about relative or absolute odds. In relative terms, I have a vastly greater chance of being eaten by a shark than being struck by an asteroid, but that doesn't stop me from swimming in the ocean. Why? Because the absolute odds of being eaten by a shark are minuscule.

This is a false analogy, a red herring, and an evasion; the odds of being eaten by a shark are far, far lower than 1 in 900. Neither 1 in 900 nor 1 in 500 is particularly minuscule in the first place, and this does not change the fact that your proposed protocol gives the applicant a far greater chance of succeeding by luck alone.

Rodney said:
If Pavel is correct that he averages 70% hits, the math couldn't be easier: In 40 trials he would be expected to average 28 hits, and his odds of getting 30 or more -- according to the binomial distribution -- would be 31%.

Then you can show that math in _detail_, right, not just handwaving some numbers out of nowhere? Perhaps you could pretend that I'm only facile in basic algebra.

Rodney said:
And what happens if the JREF refuses to accept a protocol that allows him to average only 70%?

And what happens if Pavel wrongly accepts a protocol that requires him to perform above his capacity? This is a negotiation. Both parties are expected to act in their own best interest. If an impasse is reached, _then_ this question is appropriate for discussion, and not until then.

Rodney said:
I guess it would be quite a blow to the JREF if someone passed the preliminary test -- several folks here would probably have to be put on suicide watch. ;)

Earth to Rodney -- most of us would be delighted if anyone even passed the _preliminary_ test. It would indicate that we might, indeed, be dealing with something real... and if nothing else, it would indicate that someone has found a new way to trick people that we hadn't thought of.
 
Originally Posted by Rodney
If Pavel is correct that he averages 70% hits, the math couldn't be easier: In 40 trials he would be expected to average 28 hits, and his odds of getting 30 or more -- according to the binomial distribution -- would be 31%.

Then you can show that math in _detail_, right, not just handwaving some numbers out of nowhere? Perhaps you could pretend that I'm only facile in basic algebra.


And what happens if Pavel wrongly accepts a protocol that requires him to perform above his capacity? This is a negotiation. Both parties are expected to act in their own best interest. If an impasse is reached, _then_ this question is appropriate for discussion, and not until then.

On the specific technical issue here, Rodney is right. Honestly, I don't remember the details of the math either. But here is a Matlab program that gives the result.

Code:
pp=betainc(.3,40-(30-1),30,'upper')
pp =
    0.3087
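
For readers without Matlab, the same number can be cross-checked by summing the binomial terms directly. This is a minimal sketch assuming 40 independent trials at a 70% hit rate; the variable names are illustrative.

```python
from math import comb

trials = 40
hit_rate = 0.70   # Pavel's claimed average
needed = 30       # hits required to pass

# P(X >= needed) for X ~ Binomial(trials, hit_rate):
# add up the probability of every passing score from 30 through 40.
p_pass = sum(
    comb(trials, k) * hit_rate**k * (1 - hit_rate) ** (trials - k)
    for k in range(needed, trials + 1)
)

print(f"P(30 or more hits out of 40) = {p_pass:.4f}")
```

The incomplete beta function call gives the same quantity in closed form: with the `'upper'` option it returns the probability of 10 or fewer misses at a 30% miss rate, which is the same event as 30 or more hits.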

You raise an interesting point about both sides negotiating "in their own best interest." Presumably, JREF's interest is in conducting a fair test. Primarily, they want to be sure the applicant neither cheats nor wins by luck. Past that, I hope JREF would do what it can to help an applicant demonstrate any paranormal skills he has.
 
